[Docker] Setting up CentOS 7.4 + Python 2.7 + selenium + Firefox + tesseract

2020-01-13 09:21

Current Docker container configuration:

  • CentOS 7.4
  • Python 2.7.5

Target Docker container configuration:

  • CentOS 7.4
  • Python 2.7.5
  • selenium 3.141.0
  • geckodriver 0.15.0
  • Firefox 56.0.2
  • Pillow 6.1.0
  • pytesseract 0.2.7
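
Once the steps below are finished, the Python side of this target list can be compared against what is actually installed. The following is a minimal sketch, not part of the original setup: the names EXPECTED, installed_version and check are made up for illustration, and only the three pip-installed packages are covered (Firefox and geckodriver are not Python distributions).

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Versions taken from the target configuration list above.
EXPECTED = {"selenium": "3.141.0", "Pillow": "6.1.0", "pytesseract": "0.2.7"}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        import pkg_resources  # Python 2.7 fallback, provided by setuptools
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check(expected, lookup=installed_version):
    """Map each package name to (expected, found, matches)."""
    report = {}
    for name, want in expected.items():
        found = lookup(name)
        report[name] = (want, found, found == want)
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in sorted(check(EXPECTED).items()):
        status = "OK" if ok else "MISMATCH"
        print("%-12s expected %-8s found %-8s %s" % (name, want, found, status))
```

A mismatch is not necessarily a problem; pip simply installs the newest release that still supports Python 2.7 unless a version is pinned (for example `pip install Pillow==6.1.0`).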

Install the build dependencies

yum install -y zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel libffi-devel gcc gcc-c++ make wget git unzip libjpeg-devel libpng-devel giflib-devel

Create a directory for the downloaded packages

mkdir -p /usr/local/download
cd /usr/local/download

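Install pip

# The get-pip.py script for Python 2.7 is served from the versioned /pip/2.7/ path
wget --no-check-certificate https://bootstrap.pypa.io/pip/2.7/get-pip.py
python get-pip.py
# If pip was installed to /usr/local/bin, back up the old /usr/bin/pip and link the new one
mv /usr/bin/pip /usr/bin/pip_bak
ln -s /usr/local/bin/pip /usr/bin/pip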

Install the Python packages you need

pip install requests
pip install Pillow
pip install httplib2
pip install excel

Install tesseract

# Install leptonica
cd /usr/local/download/
wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
tar xvzf leptonica-1.72.tar.gz
cd leptonica-1.72/
./configure
make && make install

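# Install tesseract 3.04
cd /usr/local/download/
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.zip
unzip 3.04.zip && cd tesseract-3.04/
# If the archive ships no configure script, generate it first with ./autogen.sh
# (this needs autoconf, automake and libtool)
./configure
make && make install
# Refresh the dynamic linker cache so the newly installed libraries are found
ldconfig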
pip install pytesseract
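# Install the language data
Download the model file for each language you need from https://github.com/tesseract-ocr/tessdata (for tesseract 3.04, take the files from the repository's 3.04.00 tag, since the default branch targets newer releases).
Only phone numbers and English text need to be recognized for now, so the single file eng.traineddata is enough.
Move the model file to /usr/local/share/tessdata.
Recognition should then work.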
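
Before running OCR, it can help to confirm that the model file really is where tesseract looks for it. A small sketch, assuming the source-build default directory used above; the helper name missing_languages is hypothetical:

```python
import os

# Data directory used in the steps above (default for a source build under /usr/local).
TESSDATA_DIR = "/usr/local/share/tessdata"


def missing_languages(langs, tessdata_dir=TESSDATA_DIR):
    """Return the language codes that have no <code>.traineddata file."""
    return [code for code in langs
            if not os.path.isfile(os.path.join(tessdata_dir, code + ".traineddata"))]


if __name__ == "__main__":
    missing = missing_languages(["eng"])
    print("missing language data: %s" % (", ".join(missing) or "none"))
```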

# Example
import pytesseract
from PIL import Image

image = Image.open('bb.png')
code = pytesseract.image_to_string(image)
print(code)
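
Since the stated goal is reading phone numbers, the plain call above can be narrowed to digits only. The following is a rough sketch under a few assumptions: the image holds a single line of text (page segmentation mode 7), phone numbers are 11 digits starting with 1 (mainland-China mobile format), and the helper names build_config, extract_phone_numbers and read_phone_numbers are made up for illustration. Only `pytesseract.image_to_string` with its `lang` and `config` arguments comes from the library.

```python
import re


def build_config(psm=7, whitelist="0123456789"):
    """Build tesseract options: single text line, digits only.

    Tesseract 3.x spells the flag "-psm"; 4.x and later use "--psm".
    """
    return "-psm %d -c tessedit_char_whitelist=%s" % (psm, whitelist)


def extract_phone_numbers(text):
    """Find 11-digit runs starting with 1 (assumed phone number format)."""
    cleaned = re.sub(r"[ \t-]", "", text)  # drop spaces and dashes inside numbers
    return re.findall(r"(?<!\d)1\d{10}(?!\d)", cleaned)


def read_phone_numbers(path):
    """Run OCR on an image file and return the phone numbers found in it."""
    import pytesseract
    from PIL import Image
    text = pytesseract.image_to_string(Image.open(path), lang="eng",
                                       config=build_config())
    return extract_phone_numbers(text)
```

Calling `read_phone_numbers('bb.png')` would reuse the image from the example above. Restricting the character set tends to reduce confusions such as O versus 0, though results still depend on image quality.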

Install selenium + Firefox + Xvfb

yum install -y Xvfb gtk3 gtk3-devel libXfont 'xorg-x11-fonts*' libgtk-3.so.0 bzip2
pip install xvfbwrapper selenium pyvirtualdisplay

# Install the browser
cd /usr/local/download/
wget https://ftp.mozilla.org/pub/firefox/releases/56.0.2/linux-x86_64/en-US/firefox-56.0.2.tar.bz2
tar xjvf firefox-56.0.2.tar.bz2
rm -f /usr/bin/firefox
ln -s /usr/local/download/firefox/firefox /usr/bin/firefox

# Install geckodriver
wget https://github.com/mozilla/geckodriver/releases/download/v0.15.0/geckodriver-v0.15.0-linux64.tar.gz
tar xvzf geckodriver-*.tar.gz
rm -f /usr/bin/geckodriver
ln -s /usr/local/download/geckodriver /usr/bin/geckodriver	# use an absolute path for the symlink target
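
A quick way to confirm that the symlinks created above resolve is to look the binaries up on PATH. This is a sketch only; find_tool and missing_tools are hypothetical helper names, and the list of tools simply mirrors what this section installs.

```python
import os


def find_tool(name):
    """Return the full path of an executable on PATH, or None if not found."""
    try:
        from shutil import which  # Python 3.3+
    except ImportError:
        from distutils.spawn import find_executable as which  # Python 2.7
    return which(name)


def missing_tools(names=("firefox", "geckodriver", "Xvfb")):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if find_tool(name) is None]


if __name__ == "__main__":
    for name in ("firefox", "geckodriver", "Xvfb"):
        path = find_tool(name)
        # realpath follows the symlink, showing where the binary really lives
        print("%-12s %s" % (name, os.path.realpath(path) if path else "NOT FOUND"))
```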

Test case:

#!/usr/bin/python
# -*- coding:utf-8 -*-
from selenium import webdriver
from pyvirtualdisplay import Display
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

# Start a virtual X display (Xvfb) so Firefox can run without a physical screen
display = Display(visible=0, size=(800, 600))
display.start()
try:
    binary = FirefoxBinary('/usr/bin/firefox')
    driver = webdriver.Firefox(firefox_binary=binary)
    try:
        driver.get('https://www.baidu.com')
        print(driver.title.encode('utf8'))
    finally:
        # Always close the browser, even if the page load fails
        driver.quit()
finally:
    display.stop()
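
The two halves of this setup (browser automation and OCR) can be combined by cropping a screenshot to one page element and passing the crop to tesseract. The sketch below is an illustration rather than the author's method: element_box and ocr_element are hypothetical helpers, the screenshot path is arbitrary, and the crop assumes the element lies inside the visible viewport with no scrolling and a device pixel ratio of 1.

```python
def element_box(location, size):
    """Convert Selenium's element.location / element.size dicts to a PIL crop box."""
    left = int(location["x"])
    top = int(location["y"])
    return (left, top, left + int(size["width"]), top + int(size["height"]))


def ocr_element(driver, element, shot_path="/tmp/page.png"):
    """Screenshot the page, crop to one element, and run OCR on the crop."""
    import pytesseract
    from PIL import Image
    driver.save_screenshot(shot_path)
    box = element_box(element.location, element.size)
    crop = Image.open(shot_path).crop(box)
    return pytesseract.image_to_string(crop)
```

With the driver from the test case above, a call might look like `ocr_element(driver, driver.find_element_by_id('captcha'))`, where the element id `captcha` is a placeholder for whatever the target page actually uses.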

Author: 西加加先生