
Configuring the GPU build of caffe and installing the GPU version of TensorFlow on CentOS 7

2018-07-28 16:01

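I previously wrote up the installation procedure for Ubuntu. CentOS differs from Ubuntu and other distributions, so I am recording the steps again here.

Install caffe first and TensorFlow afterwards; doing it in the other order causes errors.

I. Check the system environment

Before installing, check that the machine's hardware and software support CUDA and TensorFlow. The main steps are:

Check whether the machine has an NVIDIA graphics card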

[code]$ /usr/sbin/lspci | grep -i nvidia

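If an NVIDIA card is present, the command prints at least one line describing it.

Check that a suitable GCC version is installed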

[code]$ gcc --version

The command prints the installed GCC version.
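
The two checks above can also be scripted. The following is a minimal sketch, not part of the original procedure: the helper names (`has_nvidia_device`, `parse_gcc_version`, `run_command`, `check_environment`) are made up for illustration, and it assumes `lspci` lives at `/usr/sbin/lspci` as in the command above.

```python
import re
import shutil
import subprocess


def has_nvidia_device(lspci_output):
    """Return True if any line of lspci output mentions NVIDIA."""
    return any("nvidia" in line.lower() for line in lspci_output.splitlines())


def parse_gcc_version(version_output):
    """Return (major, minor, patch) from `gcc --version` output, or None."""
    lines = version_output.splitlines()
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", lines[0]) if lines else None
    return tuple(int(part) for part in match.groups()) if match else None


def run_command(argv):
    """Return the command's stdout, or '' if the program is not installed."""
    if shutil.which(argv[0]) is None:
        return ""
    completed = subprocess.run(argv, stdout=subprocess.PIPE,
                               stderr=subprocess.DEVNULL,
                               universal_newlines=True)
    return completed.stdout


def check_environment():
    """Run both checks on the current machine and print the findings."""
    print("NVIDIA card found:",
          has_nvidia_device(run_command(["/usr/sbin/lspci"])))
    print("GCC version:",
          parse_gcc_version(run_command(["gcc", "--version"])))
```

Calling `check_environment()` on the target machine prints both results; the parsing helpers can also be fed saved command output.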

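II. Install CUDA and cuDNN

Once the system checks pass, you can install CUDA and cuDNN. This step is mandatory for GPU-enabled TensorFlow; without it, the GPU cannot be used for machine-learning workloads. Installing CUDA is also the step with by far the most pitfalls, and nearly all installation problems come from it, so proceed carefully. The steps are:

1. Decide on the CUDA and cuDNN versions

Choose the CUDA and cuDNN versions that match your TensorFlow version, following the table at the beginning of the article. For example, I installed TensorFlow 1.4.1, which requires CUDA 8.0 and cuDNN 6.0. Download links:

CUDA: https://developer.nvidia.com/cuda-downloads
cuDNN (requires an NVIDIA account): https://developer.nvidia.com/cudnn

Be careful not to pick the wrong version. At the time of writing (2017-12-21) the NVIDIA site already offers CUDA 9.0 and cuDNN 7.0, but a TensorFlow release does not work with CUDA/cuDNN versions newer than the ones it was built for. You must pick the exact versions, or TensorFlow will fail to import. To get older versions: for CUDA, scroll to the bottom of the page and click Legacy Releases; for cuDNN, follow the instructions on the page.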

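2. Download CUDA

The RPM installer is recommended. The runfile installer has several serious pitfalls and requires more complicated steps; the reasons are explained below.

3. Download cuDNN

4. Install CUDA

After the download finishes, change to the directory containing the installer (for example, cd ~/Downloads) and run the following commands to install CUDA: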

[code]$ sudo rpm -i cuda_installer_downloaded_file.rpm
$ sudo yum clean all
$ sudo yum install cuda

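Because network conditions in mainland China can be unstable, the download may be slow or get interrupted; be patient and retry. During installation you are asked whether to install additional components such as OpenGL. If you are installing CUDA only for TensorFlow, answer yes only for the "toolkit"; otherwise you may run into a series of graphics-related problems and, in the worst case, have to reinstall the NVIDIA driver.

5. Install cuDNN

To install cuDNN, extract the archive and copy its files into the corresponding CUDA directories: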

[code]$ tar -zxvf cudnn-8.0-linux-x64-v6.0.tgz
$ cd cuda
$ sudo cp include/* /usr/local/cuda-8.0/include
$ sudo cp -P lib64/lib* /usr/local/cuda-8.0/lib64
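
To confirm that the copied header really is the cuDNN release you intended, you can read the version macros from cudnn.h. This is a minimal sketch: `parse_cudnn_version` is a hypothetical helper, and it assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` directly, as the cuDNN releases of this era do.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) read from cudnn.h text, or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 6".
        match = re.search(r"^[ \t]*#[ \t]*define[ \t]+" + macro + r"[ \t]+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def installed_cudnn_version(header_path="/usr/local/cuda-8.0/include/cudnn.h"):
    """Read the header at header_path (assumed location) and parse it."""
    with open(header_path) as header:
        return parse_cudnn_version(header.read())
```

On the machine from this article, `installed_cudnn_version()` should report a major version of 6 if the cuDNN 6.0 archive was installed.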

6. Set the environment variables

Run sudo vim /etc/profile and add the following two lines below the export PATH line:

[code]export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH

Afterwards, remember to run source /etc/profile; otherwise the new variables only take effect after a reboot.
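
One detail of the prepend pattern is worth illustrating: if the variable was empty beforehand, the result ends in a colon, and the dynamic loader treats an empty entry in LD_LIBRARY_PATH as the current directory; sourcing the profile repeatedly also piles up duplicates. The sketch below shows the same prepend done without either effect. `prepend_path` is a made-up helper for illustration, not something the CUDA installer provides.

```python
def prepend_path(current_value, new_dir):
    """Return a colon-separated path list with new_dir first and no duplicates."""
    kept = [entry for entry in current_value.split(":")
            if entry and entry != new_dir]  # drop empty entries and old copies
    return ":".join([new_dir] + kept)


example = prepend_path("/usr/bin:/bin", "/usr/local/cuda-8.0/bin")
print(example)  # /usr/local/cuda-8.0/bin:/usr/bin:/bin
```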

III. Install caffe

1. Install the dependencies

[code]sudo yum install atlas-devel snappy-devel boost-devel leveldb leveldb-devel hdf5 hdf5-devel  glog glog-devel gflags gflags-devel protobuf protobuf-devel opencv opencv-devel lmdb lmdb-devel

2. Download caffe

[code]git clone https://github.com/BVLC/caffe.git
cd caffe
cp Makefile.config.example Makefile.config

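3. Modify the files

Makefile

Change the BLAS libraries from

LIBRARIES += cblas atlas

to

LIBRARIES += satlas tatlas

A common build problem is that several protobuf versions are installed and the default one does not meet caffe's requirements. In that case, edit the following two lines of the Makefile so that they point at the protoc you want to use, for example the system one: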

[code]$(Q)protoc --proto_path=$(PROTO_SRC_DIR) --cpp_out=$(PROTO_BUILD_DIR) $<
$(Q)protoc --proto_path=$(PROTO_SRC_DIR) --python_out=$(PY_PROTO_BUILD_DIR) $<

Change them to:

[code]$(Q)/usr/bin/protoc --proto_path=$(PROTO_SRC_DIR) --cpp_out=$(PROTO_BUILD_DIR) $<
$(Q)/usr/bin/protoc --proto_path=$(PROTO_SRC_DIR) --python_out=$(PY_PROTO_BUILD_DIR) $<
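
Before hard-coding a path it helps to see every protoc that is reachable through PATH. The following is a small sketch for that; `find_all_executables` is a hypothetical helper name, and it only inspects the directories listed in the PATH string it is given.

```python
import os


def find_all_executables(name, path_value):
    """Return every executable called `name` in the colon-separated path list."""
    found = []
    for directory in path_value.split(":"):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        is_executable = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        if is_executable and candidate not in found:
            found.append(candidate)
    return found


def list_protoc_candidates():
    """Print each protoc on the current PATH, in lookup order."""
    for path in find_all_executables("protoc", os.environ.get("PATH", "")):
        print(path)
```

`list_protoc_candidates()` prints the candidates in the order the shell would find them; running each one with `--version` then shows which copy to reference in the Makefile.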

Makefile.config

Make the following changes:

Remove the # in front of BLAS_INCLUDE := /path/to/your/blas and BLAS_LIB := /path/to/your/blas, then set your own paths:
BLAS_INCLUDE := /usr/include
BLAS_LIB := /usr/lib64/atlas

Uncomment USE_CUDNN := 1.

Here is my Makefile.config:

[code]## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
# USE_OPENCV := 0
# USE_LEVELDB := 0
# USE_LMDB := 0

# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
#	You should not set this flag if you will be reading LMDBs with any
#	possibility of simultaneous read and write
# ALLOW_LMDB_NOLOCK := 1

# Uncomment if you're using OpenCV 3
# OPENCV_VERSION := 3

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
CUDA_ARCH := 	-gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_50,code=sm_50 \
-gencode arch=compute_52,code=sm_52 \
-gencode arch=compute_60,code=sm_60 \
-gencode arch=compute_61,code=sm_61 \
-gencode arch=compute_61,code=compute_61

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
BLAS_INCLUDE := /usr/include
BLAS_LIB := /usr/lib64/atlas

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
# PYTHON_INCLUDE := /usr/include/python2.7 \
#		/usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
#		$(ANACONDA_HOME)/include/python3.6m \
#		$(ANACONDA_HOME)/lib/python3.6/site-packages/numpy/core/include

# Uncomment to use Python 3 (default is Python 2)
# PYTHON_LIBRARIES := boost_python3 python3.6m
# PYTHON_INCLUDE := /usr/include/python3.5m \
#                 /usr/lib/python3.5/dist-packages/numpy/core/include

# We need to be able to find libpythonX.X.so or .dylib.
# PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
# WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# NCCL acceleration switch (uncomment to build with NCCL)
# https://github.com/NVIDIA/nccl (last tested version: v1.2.3-1+cuda8.0)
# USE_NCCL := 1

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

# N.B. both build and distribute dirs are cleared on `make clean`
BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @
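
If you would rather apply the Makefile.config changes by script than by hand, something along these lines could work. It is only a sketch: `set_make_variable` is a made-up helper, it handles simple one-line `NAME := value` settings only, and it rewrites the first matching line (commented or not).

```python
import re


def set_make_variable(config_text, name, value):
    """Uncomment (if needed) and set `name := value`; append it if absent."""
    pattern = re.compile(r"^#?[ \t]*" + re.escape(name) + r"[ \t]*:=.*$",
                         re.MULTILINE)
    new_line = "{} := {}".format(name, value)
    if pattern.search(config_text):
        # Replace only the first occurrence, commented or not.
        return pattern.sub(lambda match: new_line, config_text, count=1)
    return config_text.rstrip("\n") + "\n" + new_line + "\n"


def patch_config(path="Makefile.config"):
    """Apply the three settings from this article to the file at `path`."""
    with open(path) as handle:
        text = handle.read()
    for name, value in (("USE_CUDNN", "1"),
                        ("BLAS_INCLUDE", "/usr/include"),
                        ("BLAS_LIB", "/usr/lib64/atlas")):
        text = set_make_variable(text, name, value)
    with open(path, "w") as handle:
        handle.write(text)
```

Run `patch_config()` from the caffe directory after copying Makefile.config.example, and review the result with a diff before building.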

Next, build caffe and run its tests:

[code]make all
make test
make runtest

Test

[code]./build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt --gpu=0

Importing caffe afterwards works the same way as described in the earlier Ubuntu post, so it is not repeated here.

IV. Install TensorFlow

[code]conda install tensorflow-gpu

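This is all it takes to install TensorFlow. Note that the package must be tensorflow-gpu; the plain tensorflow package may not support GPU computation.

The configuration is now complete.

Check that the installation works: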

[code]import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

Seeing output here is not enough to celebrate; make sure the GPU is actually being used:

[code]from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
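
The device list is fairly verbose, so a tiny filter makes the result easier to read. This is a sketch under the assumption that each entry returned by `list_local_devices()` has `name` and `device_type` attributes; `gpu_device_names` and `report_gpus` are illustrative names, not TensorFlow functions.

```python
def gpu_device_names(devices):
    """Return the names of all entries whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def report_gpus():
    """Query TensorFlow for local devices and print the GPU ones."""
    # Imported here so the helper above stays usable without TensorFlow.
    from tensorflow.python.client import device_lib
    names = gpu_device_names(device_lib.list_local_devices())
    print("GPUs visible to TensorFlow:", names if names else "none")
```

Calling `report_gpus()` in the environment where tensorflow-gpu was installed prints "none" if TensorFlow fell back to the CPU.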

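If the output lists a GPU device, the setup is complete.

Below are some problems I ran into during the configuration:

1. ldconfig: *.so is not a symbolic link

Running sudo ldconfig reported that /usr/local/cuda-8.0/lib64/libcudnn.so.5 is not a symbolic link. The fix is simple: recreate the link, replacing the original file.

In /usr/local/cuda-8.0/lib64/, searching for libcudnn turns up two files, libcudnn.so.5 and libcudnn.so.5.0.5. In principle only libcudnn.so.5.0.5 should be a real file.

Run in a terminal: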

[code]sudo ln -sf /usr/local/cuda-8.0/lib64/libcudnn.so.5.0.5 /usr/local/cuda-8.0/lib64/libcudnn.so.5

Running sudo ldconfig again now succeeds. In /usr/local/cuda-8.0/lib64/ only libcudnn.so.5.0.5 remains as a regular file; libcudnn.so.5 is no longer a separate copy but a symbolic link to it.
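
To find every library with this problem in one pass, rather than waiting for ldconfig to complain about each, a short scan of the directory is enough. This is a sketch with a made-up helper name, `non_symlink_sonames`; it flags any libNAME.so.N that is a regular file although a more specific libNAME.so.N.x.y sits next to it.

```python
import os
import re


def non_symlink_sonames(lib_dir):
    """List libNAME.so.N entries that should be symlinks but are regular files."""
    names = set(os.listdir(lib_dir))
    problems = []
    for name in sorted(names):
        if not re.match(r"^lib.+\.so\.\d+$", name):
            continue
        # A fully versioned sibling such as libNAME.so.N.x.y must exist.
        has_full_version = any(other.startswith(name + ".") for other in names)
        if has_full_version and not os.path.islink(os.path.join(lib_dir, name)):
            problems.append(name)
    return problems
```

For the situation described above, `non_symlink_sonames("/usr/local/cuda-8.0/lib64")` would report libcudnn.so.5 before the fix and nothing for it afterwards.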

2. nvcc fatal : Unsupported gpu architecture 'compute_20'

[code]# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \
-gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_50,code=sm_50 \
-gencode arch=compute_52,code=sm_52 \
-gencode arch=compute_60,code=sm_60 \
-gencode arch=compute_61,code=sm_61 \
-gencode arch=compute_61,code=compute_61

Just delete the following two lines:

[code]-gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \
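
The same deletion can be done programmatically, which is handy if the build is set up by script. This is a minimal sketch: `remove_gencode` is a hypothetical helper, and it only removes entries that end with a line-continuation backslash, which is the case for the compute_20 entries shown above.

```python
import re


def remove_gencode(config_text, arch):
    """Remove every '-gencode arch=<arch>,code=...' entry followed by a backslash."""
    pattern = re.compile(r"-gencode[ \t]+arch=" + re.escape(arch) +
                         r",code=\w+[ \t]*\\\n[ \t]*")
    return pattern.sub("", config_text)


def drop_compute_20(path="Makefile.config"):
    """Rewrite the file at `path` without the compute_20 entries."""
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(remove_gencode(text, "compute_20"))
```

Because the first compute_20 entry shares its line with `CUDA_ARCH :=`, the helper removes only the flag itself and leaves the assignment in place.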

3. ./include/caffe/util/cudnn.hpp:5:19: fatal error: cudnn.h: No such file or directory #include <cudnn.h>

Copy the cuDNN files into the CUDA directories:

[code]cd cudnn
sudo cp -P lib64/lib* /usr/local/cuda/lib64
sudo cp include/cudnn.h /usr/local/cuda/include

4. proto/caffe.pb.h:17:2: error: #error This file was generated by an older version of protoc

See the previous post: https://blog.csdn.net/qq_33144323/article/details/81259985

5. make runtest fails: .build_release/tools/caffe: error while loading shared libraries: libboost_system.so.1.67.0

Fix: export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
