
numpy

2015-10-27 09:27

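0. Introduction

When learning machine learning algorithms with Python, a large share of the time goes into data preprocessing, and nearly all of that work is done with the numpy library, so a solid grasp of numpy syntax is essential.

The sections below cover the numpy syntax that comes up most often in practice.

1. Creating numpy arrays

1.1. From a list

First import the numpy package:

import numpy as np

A vector is created from a list, and a matrix from a nested list, as follows: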

# a vector: the argument to the array function is a Python list
v = np.array([1,2,3,4])
# a matrix: the argument to the array function is a nested Python list
M = np.array([[1, 2], [3, 4]])
print(type(v), type(M))

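Printing the types of v and M gives:

<class 'numpy.ndarray'> <class 'numpy.ndarray'>

The vector v and the matrix M have different shapes. Printing them:

print(v.shape, M.shape)

gives:

(4,) (2, 2)

1.2. From array-generating functions

For arrays with many elements, building them from a list is impractical, so an array-generating function is normally used instead.

The most commonly used one is arange: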

# create a range
x = np.arange(0, 10, 1) # arguments: start, stop, step

The resulting x is:

[0 1 2 3 4 5 6 7 8 9]
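As a small illustration of how the stop value is handled, the sketch below contrasts arange with np.linspace. The function linspace is not covered in the original text and is brought in here only for comparison; the specific ranges are arbitrary.

```python
import numpy as np

# arange excludes the stop value, so 1.0 is not part of the result.
a = np.arange(0, 1, 0.25)

# linspace takes the number of points and includes both endpoints by default.
b = np.linspace(0, 1, 5)

print(a)
print(b)
```

When the number of points matters more than the step size, linspace tends to be the less surprising choice, because a floating-point step in arange can be affected by rounding.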

1.3. Generating random numbers

Uniformly distributed values in the half-open interval [0, 1):

# uniform random numbers in [0, 1)
m1 = np.random.rand(3,3)

The result (the values differ on every run):

[[ 0.84599869  0.89854308  0.85086676]
 [ 0.36870318  0.21339847  0.89252805]
 [ 0.45885099  0.76874549  0.24587492]]

Values drawn from the standard normal distribution:

# standard normal distributed random numbers
m2 = np.random.randn(3,3)

The result is:

[[-0.71449246  1.04965555 -0.07128566]
 [ 0.41860965 -0.6161033  -0.17298205]
 [ 1.17018284  1.36366572  0.05347902]]
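Because random output changes on every run, it can help to fix a seed while experimenting. The following is a minimal sketch using the Generator interface (np.random.default_rng, available in NumPy 1.17 and later); the seed 42 is an arbitrary value chosen for illustration, and this interface is an alternative to the rand and randn calls shown above, not something the original text uses.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for illustration only

u = rng.random((3, 3))            # uniform samples in [0, 1)
n = rng.standard_normal((3, 3))   # standard normal samples

# Re-creating the generator with the same seed reproduces the same numbers.
u_again = np.random.default_rng(42).random((3, 3))
print(np.array_equal(u, u_again))   # True
```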

1.4. Arrays of zeros or ones

An all-zeros array and an all-ones array:

m3 = np.zeros((3,3))
m4 = np.ones((3,3))

Evaluating m3, m4 in an interactive session shows:

(array([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]),
 array([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]]))

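2. Reading and saving files

2.1. CSV

A common format for storing data is CSV (comma-separated values). Text files of this kind are read with the numpy.genfromtxt function. By default it splits columns on whitespace, which is what the call below relies on; for a file that really is comma-separated, pass delimiter=',':

data = np.genfromtxt('stockholm_td_adj.dat')
print(data.shape)

The result is:

(77431, 7)

The numpy.savetxt function writes an array to a text file. Its default delimiter is a space, so pass delimiter=',' to get an actual CSV file:

M = np.random.rand(3,3)
np.savetxt("random-matrix.csv", M, delimiter=',')  # comma-separated output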
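To check that a text file written by savetxt can be read back, a round trip can be sketched as follows. The temporary directory and the file name inside it are assumptions made for this example so that it does not leave files behind.

```python
import os
import tempfile
import numpy as np

M = np.random.rand(3, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "random-matrix.csv")   # hypothetical location
    np.savetxt(path, M, delimiter=",")              # write comma-separated text
    M_back = np.genfromtxt(path, delimiter=",")     # read it with the same delimiter

print(M_back.shape)
print(np.allclose(M, M_back))
```

The delimiter used for reading has to match the one used for writing; otherwise genfromtxt cannot split the columns correctly.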

2.2. numpy's native file format

The numpy.save and numpy.load functions write and read npy files:

np.save("random-matrix.npy", M)
np.load("random-matrix.npy")
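Unlike a text file, an npy file stores the dtype and shape together with the data. The sketch below illustrates this with a small integer array; the file name and the temporary directory are assumptions for the example.

```python
import os
import tempfile
import numpy as np

M = np.arange(6, dtype=np.int32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.npy")   # hypothetical file name
    np.save(path, M)
    M_back = np.load(path)

print(M_back.dtype, M_back.shape)    # int32 (2, 3)
print(np.array_equal(M, M_back))     # True
```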

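3. Operations on numpy arrays

3.1. Indexing

Indexing in numpy looks much like MATLAB, except that indices start at 0 and square brackets are used:

M = np.random.rand(3,3)
print(M[1,:])     # the row with index 1 (the second row)
print(M[:,1])     # the column with index 1 (the second column)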

3.2. Slicing

numpy slicing works like list slicing:

A = np.array([1,2,3,4,5])
print(A[::2]) # step is 2, lower and upper default to the beginning and end of the array
print(A[:3]) # first three elements
print(A[3:]) # elements from index 3
print(A[-3:]) # the last three elements

The result is:

[1 3 5]
[1 2 3]
[4 5]
[3 4 5]

Slicing a multidimensional array works the same way, with one slice per axis:

A = np.array([[n+m*10 for n in range(5)] for m in range(5)])
print(A[1:4, 1:4])
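The original shows no output for the two-dimensional slice, so here is a small self-contained sketch of what it selects. It also illustrates one property worth knowing: a basic slice is a view on the original data, so writing through it changes the source array.

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])

block = A[1:4, 1:4]      # rows 1..3 and columns 1..3
print(block)
# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]

# A basic slice is a view: writing through it changes A as well.
block[0, 0] = -1
print(A[1, 1])           # -1
```

When an independent array is needed, call .copy() on the slice.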

3.3. Fancy indexing

Fancy indexing means selecting elements by passing a list or array of positions as the index:

A = np.array([[n+m*10 for n in range(5)] for m in range(5)])
row_indices = [1, 2, 3]
print(A[row_indices])
col_indices = [1, 2, -1] # remember, index -1 means the last element
print(A[:,col_indices])

The result is:

[[10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]]
[[ 1  2  4]
 [11 12 14]
 [21 22 24]
 [31 32 34]
 [41 42 44]]

An index mask made of boolean values can also be used:

B = np.array([0, 1, 2, 3, 4])
row_mask = np.array([True, False, True, False, False])
print(B[row_mask])

The result is:

[0 2]

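In practice a boolean mask is usually built from a condition on the array and then used to select the matching elements. Element-wise conditions are combined with & (and) and | (or), with each condition in parentheses:

x = np.arange(0, 10, 0.5)
mask = (5 < x) & (x < 7.5)
print(mask)
print(x[mask])

The result is:

[False False False False False False False False False False False  True
  True  True  True False False False False False]
[ 5.5  6.   6.5  7. ]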
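Beyond selecting values, a mask can be turned into integer positions or used as an assignment target. The sketch below uses the same condition as above; np.where is not mentioned in the original and is included as one common way to get the positions.

```python
import numpy as np

x = np.arange(0, 10, 0.5)
mask = (5 < x) & (x < 7.5)

idx = np.where(mask)[0]          # integer positions of the True entries
print(idx)                       # [11 12 13 14]

y = x.copy()                     # work on a copy so x stays unchanged
y[mask] = 0.0                    # assign to the selected elements only
print(mask.sum())                # 4, the number of selected elements
```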

4. Linear algebra

One route to efficient code is to vectorize it with numpy, so that explicit loops over elements are replaced by matrix and vector operations.

4.1. Element-wise array-array operations

A = np.array([[n+m*10 for n in range(5)] for m in range(5)])
print(A * A) # element-wise multiplication

The result is:

[[   0    1    4    9   16]
 [ 100  121  144  169  196]
 [ 400  441  484  529  576]
 [ 900  961 1024 1089 1156]
 [1600 1681 1764 1849 1936]]

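4.2. Matrix multiplication

Matrix and vector products are computed with the numpy.dot function:

v = np.array([1,2,3,4])
print(np.dot(v, v)) # inner product; transposing a 1-D array has no effect, so v.T is not needed

The result is:

30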
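To make the difference between the matrix product and the element-wise product concrete, here is a short sketch with a 2x2 matrix. The @ operator (Python 3.5 and later) is an assumption beyond the original text, which only uses np.dot; for 2-D arrays the two give the same result.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
v = np.array([1, 2])

print(np.dot(M, v))    # matrix-vector product: [ 5 11]
print(M @ M)           # matrix-matrix product
# [[ 7 10]
#  [15 22]]
print(M * M)           # element-wise product, for contrast
# [[ 1  4]
#  [ 9 16]]
```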

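4.3. Computing statistics of an array

numpy provides many functions for computing statistics of an array.

Mean:

data = np.genfromtxt('stockholm_td_adj.dat')
print(np.mean(data[:,3])) # the temperature data is in column 3

Standard deviation and variance:

print(np.std(data[:,3]), np.var(data[:,3]))

Minimum and maximum:

print(data[:,3].min()) # lowest daily average temperature
print(data[:,3].max()) # highest daily average temperature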

4.4. Statistics of multidimensional arrays

The axis=i argument selects the dimension along which the statistic is computed:

m = np.random.rand(3,3)
print(m)
print(m.max(axis=0)) # max in each column
print(m.max(axis=1)) # max in each row

The result is:

[[ 0.97873141  0.6057547   0.86625548]
 [ 0.8024376   0.45005895  0.59669569]
 [ 0.36579093  0.40856885  0.55954883]]
[ 0.97873141  0.6057547   0.86625548]
[ 0.97873141  0.8024376   0.55954883]
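Since random values make the axis argument hard to verify by eye, the following sketch repeats the idea on a small fixed array chosen for illustration. The axis named in the call is the one that gets collapsed.

```python
import numpy as np

m = np.array([[1, 5, 3],
              [4, 2, 6]])

print(m.max(axis=0))    # collapse the rows, one value per column: [4 5 6]
print(m.max(axis=1))    # collapse the columns, one value per row: [5 6]
print(m.mean(axis=0))   # [2.5 3.5 4.5]
print(m.sum())          # no axis given: reduce over all elements, 21
```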

5. Changing the shape and size of arrays

5.1. Reshaping

The shape of a numpy array is changed with reshape:

A = np.array([[n+m*10 for n in range(5)] for m in range(5)])
print(A)
n, m = A.shape
B = A.reshape((1,n*m))
print(B)

The result is:

[[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]
[[ 0  1  2  3  4 10 11 12 13 14 20 21 22 23 24 30 31 32 33 34 40 41 42 43
  44]]

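Note that the reshaped array here is not a copy of the original. reshape returns a view whenever the memory layout allows it, as it does in this case, so both arrays refer to the same data. Modify some elements of B:

B[0,0:5] = 5 # modify the array
print(A)
print(B)

Both A and B have changed:

[[ 5  5  5  5  5]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]
[[ 5  5  5  5  5 10 11 12 13 14 20 21 22 23 24 30 31 32 33 34 40 41 42 43
  44]]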
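Whether two arrays share data can be checked directly instead of inferred from side effects. The sketch below uses np.shares_memory, which the original does not mention, on a small array chosen for illustration. It also shows a case where reshape has to copy because the input is not contiguous.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

B = A.reshape(1, 6)       # contiguous input: reshape returns a view
C = A.T.reshape(1, 6)     # transposed input is not contiguous: reshape copies
D = A.flatten()           # flatten always returns a copy

print(np.shares_memory(A, B))   # True
print(np.shares_memory(A, C))   # False
print(np.shares_memory(A, D))   # False
```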

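The flatten function also turns a multidimensional array into a vector. Unlike reshape, flatten always returns a copy of the original array:

B = A.flatten()
B[0:5] = 10
print(A)
print(B)

As the output shows, the elements of A and B are no longer the same; B is now one-dimensional and A is untouched:

[[ 5  5  5  5  5]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]
[10 10 10 10 10 10 11 12 13 14 20 21 22 23 24 30 31 32 33 34 40 41 42 43
 44]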

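5.2. Adding a new dimension

np.newaxis is not a function but a special index object. Placing it in an index expression inserts an axis of length 1, which turns a vector into a row or column matrix:

v = np.array([1,2,3])
print(v)
print(v[:, np.newaxis])  # make a column matrix of the vector v
print(v.shape, v[:, np.newaxis].shape)

The result is:

[1 2 3]
[[1]
 [2]
 [3]]
(3,) (3, 1)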
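One common reason to add an axis is to let broadcasting combine a column with a row. The following sketch builds a multiplication table this way; broadcasting is not discussed in the original, so treat this as an illustrative aside.

```python
import numpy as np

v = np.array([1, 2, 3])

col = v[:, np.newaxis]    # shape (3, 1)
row = v[np.newaxis, :]    # shape (1, 3)

table = col * row         # broadcasts to shape (3, 3)
print(table)
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]
```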

5.3. Stacking and joining arrays

Arrays can be joined with np.concatenate:

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
c = np.concatenate((a, b), axis=0)
d = np.concatenate((a, b.T), axis=1)
print(c)
print(d)

The output is:

[[1 2]
 [3 4]
 [5 6]]
[[1 2 5]
 [3 4 6]]

The same can be done with np.hstack and np.vstack:

c = np.vstack((a,b))
d = np.hstack((a,b.T))
print(c)
print(d)

The output matches that of np.concatenate; the difference is that no axis argument is needed:

[[1 2]
 [3 4]
 [5 6]]
[[1 2 5]
 [3 4 6]]

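6. Vectorized functions

As mentioned earlier, replacing loops over array elements with vectorized operations speeds up computation, but this requires functions that accept vector input.

Consider the following function:

def Theta(x):
    """
    Scalar implementation of the Heaviside step function.
    """
    if x >= 0:
        return 1
    else:
        return 0

It accepts only scalar input and raises an error when given a vector. It can be vectorized with numpy.vectorize, which is provided mainly for convenience, since internally it still loops over the elements:

Theta_vec = np.vectorize(Theta)
print(Theta_vec(np.array([-3,-2,-1,0,1,2,3])))

The output is:

[0 0 0 1 1 1 1]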

Alternatively, the function can be written to accept vector input from the start, as in this version:

def Theta(x):
    """
    Vector-aware implementation of the Heaviside step function.
    """
    return 1 * (x >= 0)

Passing it a vector:

print(Theta(np.array([-3,-2,-1,0,1,2,3])))

The output is:

[0 0 0 1 1 1 1]
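For comparison, the same step function can be expressed with built-in numpy routines. This sketch uses np.where and np.heaviside (the latter available in NumPy 1.13 and later); neither appears in the original, so they are shown as alternatives rather than as the author's approach.

```python
import numpy as np

x = np.array([-3, -2, -1, 0, 1, 2, 3])

# Pick 1 where the condition holds and 0 elsewhere.
theta_where = np.where(x >= 0, 1, 0)

# The second argument is the value returned where x == 0.
theta_builtin = np.heaviside(x, 1)

print(theta_where)      # [0 0 0 1 1 1 1]
print(theta_builtin)    # same values, returned as floats
```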
