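Installing GPU Caffe and GPU TensorFlow on CentOS 7
I previously wrote up the installation procedure for Ubuntu. CentOS differs from Ubuntu and other distributions, so this post records the CentOS steps separately.
Install Caffe first and TensorFlow afterwards; doing it in the other order leads to errors.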
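I. Check the system environment
Before installing, check that the machine's hardware and software can support CUDA and TensorFlow. The main steps are:
- Check that the machine has an NVIDIA graphics card
[code]$ /usr/sbin/lspci | grep -i nvidia
- The command should print at least one line naming an NVIDIA device; empty output means no NVIDIA card was detected.
- Check that a suitable GCC version is installed
[code]$ gcc --version
- The command prints the installed GCC version.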
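The two checks above can also be scripted. The following is a minimal Python sketch, not part of the original procedure: `has_nvidia_device` mirrors `grep -i nvidia`, and `parse_gcc_version` extracts the first dotted version number from `gcc --version` output. All function names here are hypothetical helpers introduced for illustration.

```python
import re
import subprocess


def has_nvidia_device(lspci_output):
    """Return True if any line mentions NVIDIA (case-insensitive, like grep -i)."""
    return any("nvidia" in line.lower() for line in lspci_output.splitlines())


def parse_gcc_version(version_output):
    """Return the first x.y.z version found as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def run(command):
    """Run a command (given as a list) and return its stdout as text."""
    result = subprocess.run(command, stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout
```

On the target machine, `has_nvidia_device(run(["/usr/sbin/lspci"]))` and `parse_gcc_version(run(["gcc", "--version"]))` would give the two answers in one go.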
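II. Install CUDA and cuDNN
Once the system check is done, you can install CUDA and cuDNN. This step is mandatory for TensorFlow with GPU support; without it the GPU cannot be used for machine learning tasks. Installing CUDA is also the step with by far the most pitfalls, and nearly all installation problems are concentrated here, so proceed carefully. The steps are:
1. Determine the CUDA and cuDNN versions
Refer to the table at the beginning of the article and pick the CUDA and cuDNN versions that match your TensorFlow version. For example, I installed TensorFlow 1.4.1, which corresponds to CUDA 8.0 and cuDNN 6.0. Download links:
- CUDA: https://developer.nvidia.com/cuda-downloads
- cuDNN (requires an NVIDIA account): https://developer.nvidia.com/cudnn
Be careful not to pick the wrong version. As of 2017-12-21 the official site already offers CUDA 9.0 and cuDNN 7.0, but TensorFlow is not forward compatible with newer releases: you must install exactly the matching versions, otherwise TensorFlow will fail to import. To get an older CUDA release, scroll to the bottom of the page and click Legacy Releases; for cuDNN, follow the instructions on the page.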
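Because an exact match is required, it can help to encode the pairing and compare it against what is installed. This is a small Python sketch under stated assumptions: the table contains only the single pairing given in this article (TensorFlow 1.4.1 with CUDA 8.0 and cuDNN 6.0), and `REQUIRED_VERSIONS` and `check_versions` are hypothetical names, not part of any library.

```python
# Only the pairing stated in this article is listed. Extend the table from
# the official TensorFlow compatibility list before relying on it.
REQUIRED_VERSIONS = {
    "1.4.1": {"cuda": "8.0", "cudnn": "6.0"},
}


def check_versions(tf_version, cuda_version, cudnn_version):
    """Return a list of mismatch descriptions; an empty list means the versions match."""
    required = REQUIRED_VERSIONS.get(tf_version)
    if required is None:
        return ["no entry for TensorFlow %s in the table" % tf_version]
    problems = []
    if cuda_version != required["cuda"]:
        problems.append("CUDA %s found, %s required" % (cuda_version, required["cuda"]))
    if cudnn_version != required["cudnn"]:
        problems.append("cuDNN %s found, %s required" % (cudnn_version, required["cudnn"]))
    return problems
```

For instance, `check_versions("1.4.1", "9.0", "7.0")` reports both mismatches, which is the situation you end up in if you simply download the newest releases.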
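2. Download CUDA
Choose the RPM installer. The runfile installer has some serious pitfalls and needs more involved handling; the reason is explained below.
3. Download cuDNN
4. Install CUDA
After the download finishes, change to the directory containing the installer, for example cd ~/Downloads, and run the following commands to install CUDA:
[code]$ sudo rpm -i cuda_installer_downloaded_file.rpm
$ sudo yum clean all
$ sudo yum install cuda
Network connections in China are often unstable, so the download may be slow or get interrupted; be patient and retry. During installation you will be asked whether to install additional components such as OpenGL. If you are installing CUDA only for TensorFlow, answer yes only for "toolkit". Otherwise you may run into a series of graphics driver problems and, in the worst case, have to reinstall the NVIDIA driver.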
5. Install cuDNN
To install cuDNN, just extract the archive and copy its files into the matching CUDA directories:
[code]$ tar -zxvf cudnn-8.0-linux-x64-v6.0.tgz
$ cd cuda
$ sudo cp include/* /usr/local/cuda-8.0/include
$ sudo cp lib64/lib* /usr/local/cuda-8.0/lib64
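6. Set environment variables
Run sudo vim /etc/profile and add these two lines below the export PATH line:
[code]export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
Afterwards, remember to run source /etc/profile; otherwise the new variables will not take effect until you log in again or reboot.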
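To confirm that the variables really took effect in the current shell, you can inspect them from Python. The sketch below is illustrative only: `path_contains` and `missing_cuda_entries` are hypothetical helpers, and the default `cuda_home` assumes the /usr/local/cuda-8.0 location used in this article.

```python
import posixpath


def path_contains(path_value, directory):
    """Return True if directory is one entry of a colon-separated path string."""
    wanted = posixpath.normpath(directory)
    entries = [entry for entry in path_value.split(":") if entry]
    return any(posixpath.normpath(entry) == wanted for entry in entries)


def missing_cuda_entries(environ, cuda_home="/usr/local/cuda-8.0"):
    """Return the sorted names of variables that lack the expected CUDA directory."""
    expected = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    return sorted(
        name
        for name, directory in expected.items()
        if not path_contains(environ.get(name, ""), directory)
    )
```

Calling `missing_cuda_entries(os.environ)` (after `import os`) in a fresh shell should return an empty list; any name it returns points to a line in /etc/profile that was not picked up.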
III. Install Caffe
1. Install dependencies
[code]sudo yum install atlas-devel snappy-devel boost-devel leveldb leveldb-devel hdf5 hdf5-devel glog glog-devel gflags gflags-devel protobuf protobuf-devel opencv opencv-devel lmdb lmdb-devel
2. Download Caffe
[code]git clone https://github.com/BVLC/caffe.git
cd caffe
cp Makefile.config.example Makefile.config
3. Edit the build files
Makefile
Change the BLAS libraries to link against. Replace
LIBRARIES += cblas atlas
with
LIBRARIES += satlas tatlas
Protobuf build problems usually arise when several protobuf versions are installed and the one found by default does not meet Caffe's build requirements. In that case, edit the following two lines of the Makefile so that they call the protoc you want, for example the system one:
[code]$(Q)protoc --proto_path=$(PROTO_SRC_DIR) --cpp_out=$(PROTO_BUILD_DIR) $<
$(Q)protoc --proto_path=$(PROTO_SRC_DIR) --python_out=$(PY_PROTO_BUILD_DIR) $<
Change them to
[code]$(Q)/usr/bin/protoc --proto_path=$(PROTO_SRC_DIR) --cpp_out=$(PROTO_BUILD_DIR) $<
$(Q)/usr/bin/protoc --proto_path=$(PROTO_SRC_DIR) --python_out=$(PY_PROTO_BUILD_DIR) $<
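Before hard-coding a protoc path, it is useful to see which copies exist and in what order the shell would find them. The following Python sketch is an illustration, not something from the original post; `find_executables` is a hypothetical helper that walks a colon-separated PATH string.

```python
import os


def find_executables(name, path_value):
    """Return every executable called `name` on the given path, in lookup order."""
    found = []
    for directory in path_value.split(":"):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Keep only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in found:
                found.append(candidate)
    return found
```

`find_executables("protoc", os.environ["PATH"])` lists all candidates; the first entry is the one a bare `protoc` in the Makefile would run, and more than one entry suggests the multiple-version situation described above.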
Makefile.config
Make the following changes:
Remove the # in front of BLAS_INCLUDE := /path/to/your/blas and BLAS_LIB := /path/to/your/blas, then set your own paths:
BLAS_INCLUDE := /usr/include
BLAS_LIB := /usr/lib64/atlas
Uncomment USE_CUDNN := 1.
Here is my Makefile.config:
[code]## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!
# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1
# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1
# uncomment to disable IO dependencies and corresponding data layers
# USE_OPENCV := 0
# USE_LEVELDB := 0
# USE_LMDB := 0
# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
# You should not set this flag if you will be reading LMDBs with any
# possibility of simultaneous read and write
# ALLOW_LMDB_NOLOCK := 1
# Uncomment if you're using OpenCV 3
# OPENCV_VERSION := 3
# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++
# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_52,code=sm_52 \
    -gencode arch=compute_60,code=sm_60 \
    -gencode arch=compute_61,code=sm_61 \
    -gencode arch=compute_61,code=compute_61
# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
BLAS_INCLUDE := /usr/include
BLAS_LIB := /usr/lib64/atlas
# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib
# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app
# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
# PYTHON_INCLUDE := /usr/include/python2.7 \
    /usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
    $(ANACONDA_HOME)/include/python3.6m \
    $(ANACONDA_HOME)/lib/python3.6/site-packages/numpy/core/include
# Uncomment to use Python 3 (default is Python 2)
# PYTHON_LIBRARIES := boost_python3 python3.6m
# PYTHON_INCLUDE := /usr/include/python3.5m \
#     /usr/lib/python3.5/dist-packages/numpy/core/include
# We need to be able to find libpythonX.X.so or .dylib.
# PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib
# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib
# Uncomment to support layers written in Python (will link against Python libs)
# WITH_PYTHON_LAYER := 1
# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib
# NCCL acceleration switch (uncomment to build with NCCL)
# https://github.com/NVIDIA/nccl (last tested version: v1.2.3-1+cuda8.0)
# USE_NCCL := 1
# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1
# N.B. both build and distribute dirs are cleared on `make clean`
BUILD_DIR := build
DISTRIBUTE_DIR := distribute
# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1
# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0
# enable pretty build (comment to see full commands)
Q ?= @
Next, build Caffe and run its tests:
[code]make all
make test
make runtest
Test
[code]./build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt --gpu=0
Importing caffe afterwards works the same way as in the earlier Ubuntu post, so it is not repeated here.
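IV. Install TensorFlow
[code]conda install tensorflow-gpu
This single command installs TensorFlow. Note that the package must be tensorflow-gpu; otherwise the installed TensorFlow may not support GPU computation.
That completes the configuration.
Test whether the installation succeeded:
[code]import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
Seeing the output is not yet cause for celebration; make sure the GPU is actually being used:
[code]from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
If the output lists a GPU device, everything is working.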
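The device list can be long, so it may be easier to filter it down to the GPUs. Below is a minimal sketch, assuming the entries returned by `device_lib.list_local_devices()` expose `name` and `device_type` attributes; `gpu_device_names`, `list_tf_gpus`, and `FakeDevice` are hypothetical names introduced here, and `FakeDevice` is only a stand-in so the filter can be tried without TensorFlow.

```python
from collections import namedtuple


def gpu_device_names(devices):
    """Return the names of all devices whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def list_tf_gpus():
    """Ask TensorFlow for its local devices and keep only the GPUs."""
    # Imported lazily so gpu_device_names can be used without TensorFlow installed.
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())


# Hypothetical stand-in for the entries TensorFlow returns, for illustration only.
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])
```

On a correctly configured machine `list_tf_gpus()` should return a non-empty list; an empty list means TensorFlow is running on the CPU only.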
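Below are some problems encountered during configuration:
1. ldconfig (fixing "*.so is not a symbolic link")
Running sudo ldconfig reported that
/usr/local/cuda-8.0/lib64/libcudnn.so.5 is not a symbolic link. The fix is simple: recreate the link, replacing the existing file.
First go to /usr/local/cuda-8.0/lib64/ and search for libcudnn. There are
two files,
libcudnn.so.5 and libcudnn.so.5.0.5, whereas in theory only libcudnn.so.5.0.5 should be a real file.
Run in a terminal:
[code]sudo ln -sf /usr/local/cuda-8.0/lib64/libcudnn.so.5.0.5 /usr/local/cuda-8.0/lib64/libcudnn.so.5
After that, sudo ldconfig succeeds. In /usr/local/cuda-8.0/lib64/ the only regular file is now
libcudnn.so.5.0.5; libcudnn.so.5 is no longer a separate copy, just a symbolic link to it.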
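To see what the fix does without touching the real CUDA directory, the situation can be reproduced in a temporary folder. This Python sketch is illustrative only: `force_symlink` is a hypothetical helper that behaves roughly like `ln -sf` for plain files, and `demo` uses placeholder files rather than real cuDNN libraries.

```python
import os
import tempfile


def force_symlink(target, link_path):
    """Point link_path at target, replacing an existing file or link."""
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(target, link_path)


def demo():
    """Recreate the duplicate-file situation in a temp directory and repair it."""
    workdir = tempfile.mkdtemp()
    real = os.path.join(workdir, "libcudnn.so.5.0.5")
    alias = os.path.join(workdir, "libcudnn.so.5")
    # Two regular files, as in the broken installation.
    for path in (real, alias):
        with open(path, "w") as handle:
            handle.write("placeholder")
    force_symlink(real, alias)
    is_link = os.path.islink(alias)
    same_file = os.path.realpath(alias) == os.path.realpath(real)
    return is_link, same_file
```

After `demo()` runs, the shorter name is a symbolic link resolving to the versioned file, which is the layout ldconfig expects.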
2. nvcc fatal : Unsupported gpu architecture 'compute_20'
This error comes from the default CUDA_ARCH setting in Makefile.config:
[code]# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
    -gencode arch=compute_20,code=sm_21 \
    -gencode arch=compute_30,code=sm_30 \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_52,code=sm_52 \
    -gencode arch=compute_60,code=sm_60 \
    -gencode arch=compute_61,code=sm_61 \
    -gencode arch=compute_61,code=compute_61
Simply delete the following two entries, keeping CUDA_ARCH := at the start of the first remaining -gencode line:
[code]-gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \
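The same edit can be expressed as a small text transformation, which is handy when regenerating Makefile.config from the example file. This is a hedged Python sketch: `drop_gencode` is a hypothetical helper, and it assumes the flags have already been joined onto a single line (no trailing backslashes).

```python
import re


def drop_gencode(cuda_arch, compute):
    """Remove every '-gencode arch=<compute>,code=...' flag from a flag string."""
    pattern = r"-gencode\s+arch=%s,code=\S+\s*" % re.escape(compute)
    return re.sub(pattern, "", cuda_arch).strip()
```

Applied with `compute="compute_20"`, it removes both the sm_20 and sm_21 entries and leaves the other architectures untouched.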
3. ./include/caffe/util/cudnn.hpp:5:19: fatal error: cudnn.h: No such file or directory #include <cudnn.h>
Copy the cuDNN files into the CUDA directories:
[code]cd cudnn
sudo cp lib* /usr/local/cuda/lib64
sudo cp include/cudnn.h /usr/local/cuda/include
4. proto/caffe.pb.h:17:2: error: #error This file was generated by an older version of protoc
See the previous post: https://blog.csdn.net/qq_33144323/article/details/81259985
5. make runtest fails: .build_release/tools/caffe: error while loading shared libraries: libboost_system.so.1.67.0
Fix: export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH