[Python for Data Analysis] CH04 NumPy Basics -- Arrays and Vectorized Computation
2016-02-18 14:57
NumPy Basics: Arrays and Vectorized Computation
NumPy, short for Numerical Python, is the fundamental package required for high-performance scientific computing and data analysis. Among other things, it provides:
- ndarray, a multidimensional array object
- Mathematical functions for fast operations on entire arrays of data without having to write loops
- Tools for reading data from disk
- Linear algebra, random number generation, and Fourier transform capabilities
- Tools for integrating code written in C, C++, and Fortran
Basic setup

    # IPython/Jupyter magic; omit it in a plain Python script
    %matplotlib inline
    from __future__ import division  # only needed on Python 2
    from numpy.random import randn
    import numpy as np
    np.set_printoptions(precision=4, suppress=True)
NumPy ndarray: A Multidimensional Array Object
Basic usage

    data = randn(2, 3)
    data * 10     # multiply every element by 10
    data + data   # element-wise addition
    data.shape    # (2, 3)
    data.dtype    # dtype('float64')
Creating ndarray
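array
array accepts any sequence-like object and creates a new NumPy array containing the passed data.
zeros and ones
zeros and ones create arrays of the given shape filled with 0s and 1s, respectively.
empty
empty creates an array without initializing its values to any particular value.
arange
arange is the array-valued version of the built-in range.

    # array
    data1 = [6, 7.5, 8, 0, 1]
    arr1 = np.array(data1)
    # nested sequences become a two-dimensional array
    data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
    arr2 = np.array(data2)
    # zeros, ones
    a1 = np.zeros(10)
    a2 = np.ones((2, 3))
    # empty (the contents are whatever happened to be in memory)
    np.empty(10)
    # arange
    np.arange(15)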
Function | Description |
---|---|
array | Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype. Copies the input data by default. |
asarray | Convert input to ndarray, but do not copy if the input is already an ndarray |
arange | Like the built-in range but returns an ndarray instead of a list. |
ones, ones_like | Produce an array of all 1’s with the given shape and dtype. ones_like takes another array and produces a ones array of the same shape and dtype. |
zeros, zeros_like | Like ones and ones_like but producing arrays of 0’s instead |
empty, empty_like | Create new arrays by allocating new memory, but do not populate with any values like ones and zeros |
eye, identity | Create a square N x N identity matrix (1’s on the diagonal and 0’s elsewhere) |
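
The copy behaviour that separates array from asarray in the table can be checked directly. A minimal sketch; the variable names (base, copied, aliased) are chosen here only for illustration:

```python
import numpy as np

base = np.arange(5)

copied = np.array(base)      # np.array copies its input by default
aliased = np.asarray(base)   # np.asarray hands back the input when it is already an ndarray

base[0] = 99
print(copied[0])        # 0: the copy is unaffected
print(aliased[0])       # 99: same object as base
print(aliased is base)  # True
```

This is why asarray is handy inside functions that should accept lists or arrays without paying for a copy when an array is passed in.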
Data Types for ndarrays
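The dtype mainly determines how much memory each element occupies; the number at the end of the type name is the element size in bits. A double-precision float takes 8 bytes, hence float64.

    arr1 = np.array([1, 2, 3], dtype=np.float64)
    arr2 = np.array([1, 2, 3], dtype=np.int32)
    arr1.dtype   # dtype('float64')
    arr2.dtype   # dtype('int32')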
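The link between the dtype name and memory use can be read off the itemsize and nbytes attributes. A small sketch, with array names made up for this example:

```python
import numpy as np

arr_f64 = np.array([1, 2, 3], dtype=np.float64)
arr_i32 = np.array([1, 2, 3], dtype=np.int32)

print(arr_f64.itemsize)  # 8 bytes per element (64 bits)
print(arr_i32.itemsize)  # 4 bytes per element (32 bits)
print(arr_f64.nbytes)    # 24 = 3 elements * 8 bytes
print(arr_i32.nbytes)    # 12 = 3 elements * 4 bytes
```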
casting dtypes between different arrays
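There are three ways a dtype gets assigned:
1. Inferred by default at creation
2. Specified explicitly at creation
3. Converted with arr.astype(dtype), where the dtype is given directly or taken from another array (arr2.dtype)
astype always creates a new array, whether or not the dtype actually changes.

    # 1. dtype inferred at creation
    arr = np.arange(1, 6)
    # 2. dtype specified at creation (np.bytes_ was spelled np.string_ in NumPy 1.x)
    numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
    # 3. convert the dtype
    float_arr = arr.astype(np.float64)   # cast the default integer dtype to float64
    numeric_strings.astype(float)        # NumPy aliases Python types to the equivalent dtypes
    # If a cast fails (for example a string that is not a number), a ValueError is raised.
    # Reuse another array's dtype
    arr1 = np.arange(10)
    arr2 = randn(2, 3)
    arr1.astype(arr2.dtype), arr1.dtype  # arr1 itself keeps its integer dtype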
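The two claims above (astype always copies, and a failed cast raises an error) can be verified with a short sketch. The names ints, same_dtype and floats are illustrative only:

```python
import numpy as np

ints = np.arange(1, 6)

# Same dtype requested, yet astype still returns a new array.
same_dtype = ints.astype(ints.dtype)
print(same_dtype is ints)                  # False
print(np.shares_memory(same_dtype, ints))  # False

floats = ints.astype(np.float64)
print(floats.dtype)                        # float64

# A string that does not parse as a number makes the cast fail.
cast_failed = False
try:
    np.array(['1.25', 'not-a-number']).astype(float)
except ValueError as exc:
    cast_failed = True
    print('cast failed:', exc)
```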
Operations between Arrays and Scalars
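As in R and MATLAB, the operators *, +, - and / work element-wise.

    arr = np.array([[1., 2., 3.], [4., 5., 6.]])
    arr
    # arithmetic between equal-shape arrays
    arr + arr
    arr - arr
    arr * arr
    arr / arr
    # arithmetic with a scalar is applied to every element
    1 / arr
    arr ** 0.5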
Basic Indexing and Slicing
One dimension
Array slices are views on the original array, so any modification to the view is reflected in the source array.

    arr = np.arange(10)
    arr
    arr[5]
    arr[5:8]
    arr[5:8] = 12          # the scalar is assigned to the whole slice
    arr

    arr_slice = arr[5:8]
    arr_slice[1] = 12345   # also changes arr[6]
    arr
    arr_slice[:] = 64      # also changes arr[5:8]
    arr

To get a copy of a slice instead of a view, call copy() explicitly.

    arr[5:8].copy()
    arr_slice_copy = arr[5:8].copy()
    arr_slice_copy[1] = 1
    arr_slice_copy
    arr                    # unchanged
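Whether two arrays share data can be tested with np.shares_memory. The following sketch contrasts a slice with an explicit copy; the names view and copy are used here only for illustration:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]          # basic slicing returns a view
copy = arr[5:8].copy()   # explicit, independent copy

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False

view[:] = -1    # writes through to arr
copy[:] = 100   # leaves arr alone
print(arr.tolist())  # [0, 1, 2, 3, 4, -1, -1, -1, 8, 9]
```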
Higher Dimension
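In a multidimensional array, the element at each index is no longer a scalar but a lower-dimensional array.

    arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    arr2d[2]                   # third row: array([7, 8, 9])
    arr2d[0][2], arr2d[0, 2]   # two equivalent ways to reach one element
    arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
    arr3d
    arr3d.shape                # (2, 2, 3)
    arr3d[0]                   # a 2 x 3 array
    arr3d[0] = 42              # a scalar can be assigned to the whole sub-array
    arr3d[1, 0]                # array([7, 8, 9])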
Indexing with slices
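Slicing also returns a view of the original array.

    arr[1:6]
    arr2d
    # a single slice selects along the first axis (rows)
    arr2d[:2]
    # two slices select rows and columns, respectively
    arr2d[:2, 1:]
    # an integer index combined with a slice
    arr2d[1, :2]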
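One detail that is easy to miss is how the result's shape depends on whether an axis is indexed with an integer or a slice. A brief sketch (row_part and block_part are names invented for this example):

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_part = arr2d[1, :2]      # integer index drops that axis
block_part = arr2d[1:2, :2]  # a slice keeps the axis, even with length 1

print(row_part.shape)    # (2,)
print(block_part.shape)  # (1, 2)

# Both results are views, so writing through them changes arr2d.
row_part[0] = -4
print(arr2d[1].tolist())  # [-4, 5, 6]
```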
Boolean Indexing
    names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
    data = randn(7, 4)
    names
    data

    names == 'Bob'
    data[names == 'Bob']
    data[names == 'Bob', 2:]
    data[names == 'Bob', 3]
    # the Python keywords and / or do not work with boolean arrays; use & and |
    mask = (names == 'Bob') | (names == 'Will')
    mask
    data[mask]
    data[data < 0] = 0
    data
    data[names != 'Joe'] = 7
    data
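Because randn produces different numbers on every run, the same ideas are easier to follow on a small deterministic array. A sketch with shortened, made-up data; it also shows that boolean selection returns a copy, unlike slicing:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob'])
data = np.arange(8).reshape(4, 2)   # rows: [0, 1], [2, 3], [4, 5], [6, 7]

mask = (names == 'Bob') | (names == 'Will')
print(data[mask].tolist())    # [[0, 1], [4, 5], [6, 7]]
print(data[~mask].tolist())   # [[2, 3]]  (~ negates a boolean array)

selected = data[mask]         # boolean indexing copies the data
selected[:] = -1
print(data[0].tolist())       # [0, 1]: the original is untouched
```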
Fancy Indexing
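Fancy indexing means indexing with integer arrays.

    arr = np.empty((8, 4))
    for i in range(8):
        arr[i] = i
    arr

    arr[[4, 3, 0, 6]]     # select rows in the given order
    arr[[-3, -5, -7]]     # negative indices count from the end

    arr = np.arange(32).reshape((8, 4))
    arr
    arr[[1, 5, 7, 2], [0, 3, 1, 2]]           # the elements (1, 0), (5, 3), (7, 1), (2, 2)
    arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]        # a rectangular block: rows first, then columns
    arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]   # the same block via np.ix_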
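The three expressions on the last array behave differently: passing two index lists pairs them up element by element, while the other two forms select a rectangular block. A sketch that checks this (pairs, block_a and block_b are illustrative names):

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

pairs = arr[rows, cols]             # elements (1, 0), (5, 3), (7, 1), (2, 2)
print(pairs.tolist())               # [4, 23, 29, 10]

block_a = arr[rows][:, cols]        # select rows, then reorder columns
block_b = arr[np.ix_(rows, cols)]   # same block in a single indexing step
print(block_a.shape)                      # (4, 4)
print(np.array_equal(block_a, block_b))   # True
```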
Transposing arrays and swapping axes
    arr = np.arange(15).reshape((3, 5))
    arr
    arr.T

    arr = np.random.randn(6, 3)
    np.dot(arr.T, arr)   # inner matrix product, shape (3, 3)

transpose() and swapaxes() are not needed for now.
Universal Functions: Element-wise Array Functions
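Universal functions (ufuncs) are fast functions that operate element-wise on arrays.

    arr = np.arange(10)
    np.sqrt(arr)
    np.exp(arr)

Taking several arrays as arguments

    x = randn(8)
    y = randn(8)
    x
    y
    np.maximum(x, y)   # element-wise maximum

Returning several arrays

    arr = randn(7) * 5
    np.modf(arr)       # (fractional parts, integral parts)

Unary functions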
Function | Description |
---|---|
abs, fabs | Compute the absolute value element-wise for integer, floating point, or complex values. Use fabs as a faster alternative for non-complex-valued data |
sqrt | Compute the square root of each element. Equivalent to arr ** 0.5 |
square | Compute the square of each element. Equivalent to arr ** 2 |
exp | Compute the exponent e^x of each element |
log, log10, log2, log1p | Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively |
sign | Compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative) |
ceil | Compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element |
floor | Compute the floor of each element, i.e. the largest integer less than or equal to each element |
rint | Round elements to the nearest integer, preserving the dtype |
modf | Return fractional and integral parts of array as separate arrays |
isnan | Return boolean array indicating whether each value is NaN (Not a Number) |
isfinite, isinf | Return boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively |
cos, cosh, sin, sinh, tan, tanh | Regular and hyperbolic trigonometric functions |
arccos, arccosh, arcsin, arcsinh, arctan, arctanh | Inverse trigonometric functions |
logical_not | Compute truth value of not x element-wise. Equivalent to ~arr for boolean arrays. |
Binary functions
Function | Description |
---|---|
add | Add corresponding elements in arrays |
subtract | Subtract elements in second array from first array |
multiply | Multiply array elements |
divide, floor_divide | Divide or floor divide (truncating the remainder) |
power | Raise elements in first array to powers indicated in second array |
maximum, fmax | Element-wise maximum. fmax ignores NaN |
minimum, fmin | Element-wise minimum. fmin ignores NaN |
mod | Element-wise modulus (remainder of division) |
copysign | Copy sign of values in second argument to values in first argument |
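
Two entries in these tables are easier to grasp with numbers: the NaN handling that separates maximum from fmax, and the pair of arrays returned by modf. A minimal sketch with made-up inputs:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 2.0, np.nan])

propagated = np.maximum(x, y)   # NaN wins whenever either input is NaN
ignored = np.fmax(x, y)         # NaN is ignored when the other input is a number
print(propagated)               # second and third entries are nan
print(ignored.tolist())         # [2.0, 2.0, 3.0]

frac, whole = np.modf(np.array([1.5, -2.25]))
print(frac.tolist())    # [0.5, -0.25]
print(whole.tolist())   # [1.0, -2.0]
```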
Data processing using arrays
Vectorization replaces explicit loops with array expressions, which is usually much faster.
Expressing conditional logic as array operations
Pure Python

    result = [xv if cv else yv for xv, yv, cv in zip(x, y, c)]

NumPy

    result = np.where(c, x, y)
    arr = randn(4, 4)
    arr
    np.where(arr > 0, 2, -2)
    np.where(arr > 0, 2, arr)   # set only positive values to 2
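The snippets above assume x, y and c already exist. A self-contained sketch with small made-up arrays shows that the list comprehension and np.where give the same answer:

```python
import numpy as np

# Illustrative inputs; any equal-length arrays would do.
x = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
y = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
c = np.array([True, False, True, True, False])

loop_result = [xv if cv else yv for xv, yv, cv in zip(x, y, c)]
vec_result = np.where(c, x, y)
print(vec_result.tolist())   # [1.1, 2.2, 1.3, 1.4, 2.5]

arr = np.array([[-1.5, 0.0], [2.0, -0.5]])
print(np.where(arr > 0, 2, -2).tolist())    # [[-2, -2], [2, -2]]
print(np.where(arr > 0, 2, arr).tolist())   # [[-1.5, 0.0], [2.0, -0.5]]
```

The second and third arguments of np.where can be scalars or arrays, which is what makes the last line possible.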
Mathematical and statistical methods
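mean and sum

    arr = np.random.randn(5, 4)   # normally distributed data
    arr.mean()
    np.mean(arr)
    arr.sum()

Along an axis: axis=0 computes down the rows and gives one result per column; axis=1 computes across the columns and gives one result per row.

    arr.mean(axis=1)
    arr.sum(0)

cumsum, cumprod

    arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    arr.cumsum(0)
    arr.cumprod(1)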
Method | Description |
---|---|
sum | Sum of all the elements in the array or along an axis. Zero-length arrays have sum 0. |
mean | Arithmetic mean. Zero-length arrays have NaN mean. |
std, var | Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator n). |
min, max | Minimum and maximum. |
argmin, argmax | Indices of minimum and maximum elements, respectively. |
cumsum | Cumulative sum of elements starting from 0 |
cumprod | Cumulative product of elements starting from 1 |
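
The axis argument is the part that most often causes confusion, so here is a worked example on the 3 x 3 array used earlier, with the results written out:

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

print(arr.sum(axis=0).tolist())    # [9, 12, 15]    one sum per column
print(arr.sum(axis=1).tolist())    # [3, 12, 21]    one sum per row
print(arr.mean(axis=1).tolist())   # [1.0, 4.0, 7.0]

print(arr.cumsum(0).tolist())      # [[0, 1, 2], [3, 5, 7], [9, 12, 15]]
print(arr.cumprod(1).tolist())     # [[0, 0, 0], [3, 12, 60], [6, 42, 336]]
```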
Methods for boolean arrays
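Counting positive values

    arr = randn(100)
    (arr > 0).sum()   # number of positive values

any tests whether at least one value is True; all tests whether every value is True.

    bools = np.array([False, False, True, False])
    bools.any()   # True
    bools.all()   # False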
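A deterministic version of the same checks, as a quick sketch with made-up values. It also shows that any and all accept non-boolean arrays, where non-zero elements count as True:

```python
import numpy as np

arr = np.array([-1.5, 0.3, 2.0, -0.1])
positive = arr > 0

print(positive.sum())   # 2, since each True counts as 1
print(positive.any())   # True
print(positive.all())   # False

print(np.array([0, 0, 3]).any())   # True
print(np.array([1, 0, 3]).all())   # False
```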
Sorting
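arr.sort() sorts in place

    arr = randn(8)
    arr
    arr.sort()
    arr

arr.sort(1) sorts each row of a two-dimensional array

    arr = randn(5, 3)
    arr.sort(1)
    arr

np.sort() returns a sorted copy and leaves its argument unchanged

    np.sort(arr)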
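The difference between the method and the function is easy to verify on a tiny array. A sketch with illustrative names:

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_copy = np.sort(arr)    # new array, arr is left alone
print(sorted_copy.tolist())   # [1, 2, 3]
print(arr.tolist())           # [3, 1, 2]

result = arr.sort()           # sorts in place and returns None
print(result)                 # None
print(arr.tolist())           # [1, 2, 3]

arr2d = np.array([[3, 1, 2], [9, 7, 8]])
arr2d.sort(1)                 # sort along axis 1, i.e. within each row
print(arr2d.tolist())         # [[1, 2, 3], [7, 8, 9]]
```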
Unique and other set logic
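np.unique(arr) returns the sorted unique values

    names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
    np.unique(names)
    ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
    np.unique(ints)

np.in1d(arr1, arr2) tests whether each element of arr1 is contained in arr2

    values = np.array([6, 0, 0, 3, 2, 5, 6])
    np.in1d(values, [2, 3, 6])   # np.isin(values, [2, 3, 6]) is the newer equivalent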
Method | Description |
---|---|
unique(x) | Compute the sorted, unique elements in x |
intersect1d(x, y) | Compute the sorted, common elements in x and y |
union1d(x, y) | Compute the sorted union of elements |
in1d(x, y) | Compute a boolean array indicating whether each element of x is contained in y |
setdiff1d(x, y) | Set difference, elements in x that are not in y |
setxor1d(x, y) | Set symmetric differences; elements that are in either of the arrays, but not both |
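
The set functions in the table can be tried on the values array from above. This sketch uses np.isin in place of in1d, on the assumption that a reasonably recent NumPy is installed (in1d is deprecated in NumPy 2.x):

```python
import numpy as np

x = np.array([6, 0, 0, 3, 2, 5, 6])
y = np.array([2, 3, 6])

print(np.unique(x).tolist())           # [0, 2, 3, 5, 6]
print(np.intersect1d(x, y).tolist())   # [2, 3, 6]
print(np.union1d(x, y).tolist())       # [0, 2, 3, 5, 6]
print(np.isin(x, y).tolist())          # [True, False, False, True, True, False, True]
print(np.setdiff1d(x, y).tolist())     # [0, 5]
print(np.setxor1d(x, y).tolist())      # [0, 5]
```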
File input and output with arrays
Storing arrays on disk in binary format
    arr = np.arange(10)
    np.save('some_array', arr)   # the .npy extension is appended automatically
    np.load('some_array.npy')

    np.savez('array_archive.npz', a=arr, b=arr)
    arch = np.load('array_archive.npz')
    arch['b']                    # dict-like access
Saving and loading text files
For text files, pandas' read_csv and read_table are more commonly used.

    arr = np.loadtxt('array_ex.txt', delimiter=',')
    arr
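The file array_ex.txt is not included with these notes, so the sketch below writes its own comma-separated file into a temporary directory with np.savetxt and reads it back. The array contents are made up for the round trip:

```python
import os
import tempfile

import numpy as np

arr = np.arange(6, dtype=float).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'array_ex.txt')
    np.savetxt(path, arr, delimiter=',')       # write as comma-separated text
    loaded = np.loadtxt(path, delimiter=',')   # read it back as floats

print(loaded.shape)                 # (2, 3)
print(np.array_equal(loaded, arr))  # True
```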
Linear algebra
1. Matrix multiplication (A %*% B in R)
```python
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x
y
x.dot(y)  # equivalently np.dot(x, y)
```
2. Inverse and QR decomposition
```python
from numpy.linalg import inv, qr
X = randn(5, 5)
mat = X.T.dot(X)
inv(mat)
mat.dot(inv(mat))  # approximately the identity matrix
q, r = qr(mat)
r                  # upper triangular
```
Function | Description |
---|---|
diag | Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal |
dot | Matrix multiplication |
trace | Compute the sum of the diagonal elements |
det | Compute the matrix determinant |
eig | Compute the eigenvalues and eigenvectors of a square matrix |
inv | Compute the inverse of a square matrix |
pinv | Compute the Moore-Penrose pseudo-inverse of a matrix |
qr | Compute the QR decomposition |
svd | Compute the singular value decomposition (SVD) |
solve | Solve the linear system Ax = b for x, where A is a square matrix |
lstsq | Compute the least-squares solution to y = Xb |
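
A few of the functions in the table can be sanity-checked on a small, well-conditioned matrix. This is only a sketch; A and b are made-up values chosen so the results are easy to verify by hand:

```python
import numpy as np
from numpy.linalg import inv, qr, solve

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = solve(A, b)                        # solves A x = b, here x = [1/11, 7/11]
print(np.allclose(A.dot(x), b))        # True
print(np.allclose(x, inv(A).dot(b)))   # True: same answer via the explicit inverse

q, r = qr(A)
print(np.allclose(q.dot(r), A))             # True: the factors reproduce A
print(np.allclose(q.T.dot(q), np.eye(2)))   # True: q has orthonormal columns
print(np.allclose(r, np.triu(r)))           # True: r is upper triangular
```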
Random number generation
    samples = np.random.normal(size=(4, 4))
    samples

    from random import normalvariate
    N = 1000000
    # %timeit is an IPython magic; range replaces Python 2's xrange
    %timeit samples = [normalvariate(0, 1) for _ in range(N)]
    %timeit np.random.normal(size=N)
Function | Description |
---|---|
seed | Seed the random number generator |
permutation | Return a random permutation of a sequence, or return a permuted range |
shuffle | Randomly permute a sequence in place |
rand | Draw samples from a uniform distribution |
randint | Draw random integers from a given low-to-high range |
randn | Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface) |
binomial | Draw samples from a binomial distribution |
normal | Draw samples from a normal (Gaussian) distribution |
beta | Draw samples from a beta distribution |
chisquare | Draw samples from a chi-square distribution |
gamma | Draw samples from a gamma distribution |
uniform | Draw samples from a uniform [0, 1) distribution |
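
As a quick illustration of seed, permutation and shuffle from the table, the sketch below re-seeds the global generator to reproduce a draw and contrasts the copying and in-place permutation functions. The seed value is arbitrary:

```python
import numpy as np

np.random.seed(12345)
first = np.random.normal(size=3)
np.random.seed(12345)                  # same seed, same stream of numbers
second = np.random.normal(size=3)
print(np.array_equal(first, second))   # True

seq = np.arange(5)
perm = np.random.permutation(seq)      # returns a new array, seq is unchanged
print(seq.tolist())                    # [0, 1, 2, 3, 4]
np.random.shuffle(seq)                 # reorders seq in place
print(sorted(seq.tolist()) == sorted(perm.tolist()))   # True: same elements
```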
Example: Random Walks
Pure Python

    import random
    position = 0
    walk = [position]
    steps = 1000
    for i in range(steps):
        step = 1 if random.randint(0, 1) else -1
        position += step
        walk.append(position)

NumPy

    np.random.seed(12345)
    nsteps = 1000
    draws = np.random.randint(0, 2, size=nsteps)   # 0 or 1
    steps = np.where(draws > 0, 1, -1)
    walk = steps.cumsum()
A first look at the walk

    walk.min()
    walk.max()

Find the first step at which the walk reaches 10 or -10

    (np.abs(walk) >= 10).argmax()
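The argmax trick works because True is the maximum of a boolean array and argmax returns the first position where the maximum occurs. A hand-checkable sketch with a short, made-up walk, including the caveat that argmax returns 0 when the threshold is never reached:

```python
import numpy as np

steps = np.array([1, 1, -1, 1, 1, 1, -1])
walk = steps.cumsum()
print(walk.tolist())          # [1, 2, 1, 2, 3, 4, 3]

hit = np.abs(walk) >= 3       # [False, False, False, False, True, True, True]
print(hit.argmax())           # 4: index of the first True

never = np.abs(walk) >= 10    # all False, the walk never gets that far
print(never.argmax())         # 0, which is misleading on its own
print(never.any())            # False: check this before trusting argmax
```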
Simulating many random walks at once
    nwalks = 5000
    nsteps = 1000
    draws = np.random.randint(0, 2, size=(nwalks, nsteps))   # 0 or 1
    steps = np.where(draws > 0, 1, -1)
    walks = steps.cumsum(1)   # cumulative sum along each row (one walk per row)
    walks

A first look at the walks

    walks.max()
    walks.min()
    hits30 = (np.abs(walks) >= 30).any(1)
    hits30
    hits30.sum()   # number of walks that hit 30 or -30
    crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)
    crossing_times.mean()

Random walk with normally distributed steps

    steps = np.random.normal(loc=0, scale=0.25, size=(nwalks, nsteps))