
[Python for Data Analysis] CH04 NumPy Basics -- Arrays and Vectorized Computation

2016-02-18 14:57

NumPy Basics: Arrays and Vectorized Computation

NumPy, short for Numerical Python, is the fundamental package required for high performance scientific computing and data analysis.

ndarray, a multidimensional array object

Mathematical functions for fast operations on entire arrays of data without having to write loops

Tools for reading data from disk

Linear algebra, random number generation, and Fourier transforms

Tools for integrating code written in C, C++, and Fortran

Basic setup

%matplotlib inline
from __future__ import division
from numpy.random import randn
import numpy as np
np.set_printoptions(precision=4, suppress=True)


NumPy ndarray: A Multidimensional Array Object

Basic usage

data = randn(2, 3)
data * 10
data + data
data.shape
data.dtype


Creating ndarray

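array

array accepts any sequence-like object and creates a new NumPy array containing the passed data.

zeros and ones

zeros and ones create arrays of the given shape filled with 0s and 1s, respectively.

empty

empty creates an array without initializing its values to any particular value.

arange

arange is the array-valued version of the built-in range.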

# array
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
# nested sequences become a two-dimensional array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

# zeros, ones
a1 = np.zeros(10)
a2 = np.ones((2, 3))

# empty (the returned values are arbitrary, not guaranteed to be zero)
np.empty(10)

# arange
np.arange(15)


| Function | Description |
| --- | --- |
| array | Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype. Copies the input data by default. |
| asarray | Convert input to ndarray, but do not copy if the input is already an ndarray |
| arange | Like the built-in range but returns an ndarray instead of a list. |
| ones, ones_like | Produce an array of all 1's with the given shape and dtype. ones_like takes another array and produces a ones array of the same shape and dtype. |
| zeros, zeros_like | Like ones and ones_like but producing arrays of 0's instead |
| empty, empty_like | Create new arrays by allocating new memory, but do not populate with any values like ones and zeros |
| eye, identity | Create a square N x N identity matrix (1's on the diagonal and 0's elsewhere) |
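
The difference between array and asarray in the table is easiest to see by modifying the input afterwards. The following is a minimal sketch; the variable names (source, copied, aliased, template) are made up for illustration.

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)      # np.array copies its input by default
aliased = np.asarray(source)   # np.asarray returns the same object for an ndarray input

source[0] = 99
print(copied)    # the copy still holds the old value
print(aliased)   # the alias sees the change

# The *_like functions take shape and dtype from an existing array.
template = np.zeros((2, 3), dtype=np.int32)
ones = np.ones_like(template)
print(ones.shape, ones.dtype)

# eye and identity both build a square identity matrix.
print(np.array_equal(np.eye(3), np.identity(3)))
```

This is why asarray is a common choice inside functions that only read their input: it avoids a copy when the caller already passes an ndarray.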

Data Types for ndarrays

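The dtype mainly determines how much memory each element takes. The number in the type name is the number of bits per element: a double-precision float takes 8 bytes, hence float64.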

arr1 = np.array([1,2,3],dtype = np.float64)
arr2 = np.array([1,2,3],dtype = np.int32)
arr1.dtype
arr2.dtype


casting dtypes between different arrays

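Ways to set the dtype:

1. Inferred by default at creation

2. Given explicitly at creation

3. arr.astype(dtype), where dtype is given directly or taken from another array (arr2.dtype)

astype always creates a new array (a copy of the data), even if the new dtype is the same as the old one.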

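# 1. dtype inferred by default at creation
arr = np.arange(1, 6)
# 2. dtype given explicitly at creation (np.bytes_ was spelled np.string_ in older NumPy)
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
# 3. change the dtype with astype
float_arr = arr.astype(np.float64)  # cast the integer array to float64
numeric_strings.astype(float)
# If a cast fails (for example, a string that cannot be parsed as a float), a ValueError is raised.
# NumPy aliases Python types such as float to the equivalent dtypes.

# astype also accepts another array's dtype
arr1 = np.arange(10)
arr2 = randn(2, 3)
arr1.astype(arr2.dtype), arr1.dtype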
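
Two consequences of astype are worth checking by hand: casting floats to integers truncates the fractional part, and the result is a new array even when the dtype does not change. A small sketch with made-up values:

```python
import numpy as np

floats = np.array([3.7, -1.2, 0.5, 10.9])
ints = floats.astype(np.int32)       # the fractional part is truncated toward zero
print(ints)

same = floats.astype(floats.dtype)   # same dtype, but still a new array
same[0] = 0.0
print(floats[0])                     # the original is untouched

strings = np.array(['1.25', '-9.6', '42'])
parsed = strings.astype(float)       # strings that look like numbers can be parsed
print(parsed)
```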


Operations between Arrays and Scalars

As in R and MATLAB, the operators *, +, -, and / between equal-size arrays are applied element by element.

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr
# array with array: element-wise
arr + arr
arr - arr
arr * arr
arr / arr


# array with scalar: the scalar is applied to every element
1 / arr
arr ** 0.5


Basic Indexing and Slicing

One dimension

Array slices are views on the original array, and any modifications to the view will be reflected in the source array.

arr = np.arange(10)
arr
arr[5]
arr[5:8]
arr[5:8] = 12
arr


arr_slice = arr[5:8]
arr_slice[1] = 12345
arr
arr_slice[:] = 64
arr


To get a copy of a slice instead of a view, call .copy() explicitly

arr[5:8].copy()
arr_slice_copy = arr[5:8].copy()
arr_slice_copy[1] = 1
arr_slice_copy
arr
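
Whether two arrays refer to the same data can be checked directly with np.shares_memory. The sketch below repeats the slice and the copy from above side by side; the names view and copy are illustrative.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]           # basic slicing returns a view
copy = arr[5:8].copy()    # an explicit, independent copy

print(np.shares_memory(arr, view))
print(np.shares_memory(arr, copy))

view[:] = 64     # writes through to arr
copy[:] = -1     # leaves arr alone
print(arr)
```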


Higher Dimension

In a multidimensional array, the element at each index is no longer a scalar but an array with one fewer dimension

arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2d[2]
arr2d[0][2],arr2d[0,2]

arr3d = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
arr3d
arr3d.shape
arr3d[0]
arr3d[0] = 42
arr3d[1, 0]


Indexing with slices

Slicing a multidimensional array also returns a view of the original array

arr[1:6]
arr2d
# a single slice selects along axis 0 (rows)
arr2d[:2]
# two slices select rows and columns, respectively
arr2d[:2, 1:]
arr2d[1, :2]
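
Mixing integer indexes and slices changes the number of dimensions of the result, which is easy to miss when only looking at the printed values. A short sketch on the same 3 x 3 array (block, row, and row_2d are illustrative names):

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

block = arr2d[:2, 1:]     # slices on both axes keep both dimensions
row = arr2d[1, :2]        # an integer index drops that axis
row_2d = arr2d[1:2, :2]   # a length-one slice keeps it

print(block.shape, row.shape, row_2d.shape)

arr2d[:2, 1:] = 0         # assigning to a slice expression writes to the whole selection
print(arr2d)
```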


Boolean Indexing

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = randn(7, 4)
names
data


names == 'Bob'
data[names == 'Bob']
data[names == 'Bob', 2:]
data[names == 'Bob', 3]

mask = (names == 'Bob') | (names == 'Will')
# the Python keywords and / or do not work with boolean arrays; use & and | instead
mask
data[mask]

data[data<0] = 0
data
data[names!='Joe'] = 7
data
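
Unlike slicing, selecting with a boolean array returns a copy of the data, while assigning through a boolean mask modifies the original. The sketch below uses a deterministic array in place of randn(7, 4) so the results can be checked by eye; that substitution is an assumption made for illustration.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape(7, 4)   # deterministic stand-in for randn(7, 4)

not_bob = ~(names == 'Bob')          # ~ negates a boolean array
bob_or_will = (names == 'Bob') | (names == 'Will')

selected = data[bob_or_will]         # boolean selection returns a copy
selected[:] = -1
print(data[0])                       # the original row is unchanged

print(data[not_bob].shape)           # five rows are not 'Bob'
bob_tail = data[names == 'Bob', 2:]  # rows 0 and 3, columns 2 onward
print(bob_tail)
```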


Fancy Indexing

Indexing using integer arrays

arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr


arr[[4, 3, 0, 6]]
arr[[-3,-5,-7]]


arr = np.arange(32).reshape((8, 4))
arr
arr[[1, 5, 7, 2], [0, 3, 1, 2]]
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
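
The three expressions above look similar but select different things: passing two index lists picks individual elements pairwise, while the other two forms select a rectangular block. A sketch that makes the difference explicit (rows, cols, and the result names are illustrative):

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

pairs = arr[rows, cols]              # elements (1, 0), (5, 3), (7, 1), (2, 2)
print(pairs)

block_a = arr[rows][:, cols]         # pick rows first, then reorder the columns
block_b = arr[np.ix_(rows, cols)]    # the same rectangular block in one step
print(block_b)

picked = arr[rows]                   # fancy indexing copies the data
picked[:] = 0
print(arr[1])                        # the original row is unchanged
```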


Transposing arrays and swapping axes

arr = np.arange(15).reshape((3, 5))
arr
arr.T


arr = np.random.randn(6, 3)
np.dot(arr.T, arr)


transpose() and swapaxes() are not needed for now
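
For later reference, here is a brief sketch of what these calls do on a three-dimensional array, plus a small check of the np.dot(arr.T, arr) pattern used above. The shapes and values are chosen only for illustration.

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))

print(arr.T.shape)                       # .T reverses the axes
print(arr.transpose((1, 0, 2)).shape)    # transpose takes an explicit axis order
swapped = arr.swapaxes(1, 2)             # swapaxes exchanges two axes
print(swapped.shape)
print(np.shares_memory(arr, swapped))    # the result is a view, not a copy

X = np.arange(6.0).reshape((3, 2))
gram = np.dot(X.T, X)                    # inner products of the columns of X
print(gram)
```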

Universal Functions: Element-wise Array Functions

A universal function (ufunc) performs fast element-wise operations on the data in an ndarray

arr = np.arange(10)
np.sqrt(arr)
np.exp(arr)


Binary ufuncs take two arrays as arguments

x = randn(8)
y = randn(8)
x
y
np.maximum(x, y) # element-wise maximum


Some ufuncs return more than one array

arr = randn(7) * 5
np.modf(arr)
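
Because randn produces different numbers on every run, the behavior of modf is easier to see with fixed inputs. A small sketch with hand-picked values:

```python
import numpy as np

values = np.array([1.5, -2.25, 3.0])
frac, whole = np.modf(values)   # two arrays: fractional parts and integral parts

print(frac)    # both parts carry the sign of the input
print(whole)
```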


Unary functions

| Function | Description |
| --- | --- |
| abs, fabs | Compute the absolute value element-wise for integer, floating point, or complex values. Use fabs as a faster alternative for non-complex-valued data |
| sqrt | Compute the square root of each element. Equivalent to arr ** 0.5 |
| square | Compute the square of each element. Equivalent to arr ** 2 |
| exp | Compute the exponential e^x of each element |
| log, log10, log2, log1p | Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively |
| sign | Compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative) |
| ceil | Compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element |
| floor | Compute the floor of each element, i.e. the largest integer less than or equal to each element |
| rint | Round elements to the nearest integer, preserving the dtype |
| modf | Return fractional and integral parts of array as separate arrays |
| isnan | Return boolean array indicating whether each value is NaN (Not a Number) |
| isfinite, isinf | Return boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively |
| cos, cosh, sin, sinh, tan, tanh | Regular and hyperbolic trigonometric functions |
| arccos, arccosh, arcsin, arcsinh, arctan, arctanh | Inverse trigonometric functions |
| logical_not | Compute truth value of not x element-wise. Equivalent to ~arr for boolean arrays. |

Binary functions

| Function | Description |
| --- | --- |
| add | Add corresponding elements in arrays |
| subtract | Subtract elements in second array from first array |
| multiply | Multiply array elements |
| divide, floor_divide | Divide or floor divide (truncating the remainder) |
| power | Raise elements in first array to powers indicated in second array |
| maximum, fmax | Element-wise maximum. fmax ignores NaN |
| minimum, fmin | Element-wise minimum. fmin ignores NaN |
| mod | Element-wise modulus (remainder of division) |
| copysign | Copy sign of values in second argument to values in first argument |
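
The note that fmax ignores NaN is the main practical difference from maximum. A sketch with hand-picked inputs, which also touches floor_divide, mod, and copysign from the table:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 0.5, np.nan])

with_nan = np.maximum(a, b)   # NaN wins wherever either input is NaN
no_nan = np.fmax(a, b)        # NaN is ignored when the other value is a number
print(with_nan)
print(no_nan)

print(np.floor_divide(7, 2))                  # quotient rounded down
print(np.mod(-7, 3))                          # result takes the sign of the divisor
signed = np.copysign([1.0, 2.0], [-1.0, 1.0])
print(signed)
```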

Data processing using arrays

Vectorization replaces explicit loops with array expressions, which is generally much faster than the pure Python equivalent

Expressing conditional logic as array operations

pure python

result = [xv if cv else yv for xv, yv, cv in zip(x, y, c)]


numpy

result = np.where(c,x,y)
arr = randn(4, 4)
arr
np.where(arr > 0, 2, -2)
np.where(arr > 0, 2, arr) # set only positive values to 2
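
The snippets above use x, y, and c without defining c. A self-contained sketch of the same comparison, with hypothetical arrays xarr, yarr, and cond standing in for them:

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure Python: take from xarr where cond is True, otherwise from yarr.
loop_result = [xv if cv else yv for xv, yv, cv in zip(xarr, yarr, cond)]

# NumPy: the same selection as a single array expression.
vec_result = np.where(cond, xarr, yarr)
print(vec_result)
```

The second and third arguments of np.where can also be scalars, as in the np.where(arr > 0, 2, -2) call above.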


Mathematical and statistical methods

mean

arr = np.random.randn(5, 4) # normally-distributed data
arr.mean()
np.mean(arr)
arr.sum()


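Aggregations take an optional axis argument: axis=0 aggregates down the rows (one result per column), axis=1 aggregates across the columns (one result per row)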

arr.mean(axis=1)
arr.sum(0)


cumsum, cumprod

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr.cumsum(0)
arr.cumprod(1)
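
The axis convention is easy to mix up, so here are the results on the small 3 x 3 array written out. This is a sketch for checking the direction of each operation:

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

col_sums = arr.sum(0)         # one value per column
row_means = arr.mean(axis=1)  # one value per row
print(col_sums, row_means)

running_sum = arr.cumsum(0)    # accumulates down each column
running_prod = arr.cumprod(1)  # accumulates along each row
print(running_sum)
print(running_prod)
```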


| Method | Description |
| --- | --- |
| sum | Sum of all the elements in the array or along an axis. Zero-length arrays have sum 0. |
| mean | Arithmetic mean. Zero-length arrays have NaN mean. |
| std, var | Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator n). |
| min, max | Minimum and maximum. |
| argmin, argmax | Indices of minimum and maximum elements, respectively. |
| cumsum | Cumulative sum of elements starting from 0 |
| cumprod | Cumulative product of elements starting from 1 |
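
The "degrees of freedom adjustment" for std and var is controlled by the ddof argument: the denominator is n - ddof, with ddof=0 by default. A sketch with values small enough to verify by hand:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

population_var = values.var()      # denominator n
sample_var = values.var(ddof=1)    # denominator n - 1
print(population_var, sample_var)

scores = np.array([3, 9, 2, 9])
print(scores.argmax())   # index of the first occurrence of the maximum
print(scores.argmin())
```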

Methods for boolean arrays

Counting positive values

arr = randn(100)
(arr > 0).sum() # Number of positive values


any tests whether at least one value is True; all tests whether every value is True

bools = np.array([False, False, True, False])

bools.any()

bools.all()


Sorting

arr.sort()

arr = randn(8)
arr
arr.sort()
arr


arr = randn(5, 3)
arr.sort(1)  # sort each row in place
arr


np.sort() returns a sorted copy instead of sorting in place

np.sort(arr)
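
The practical difference between the method and the function is whether the original array changes. A sketch with fixed values (sorted_copy and grid are illustrative names):

```python
import numpy as np

arr = np.array([3, 1, 2])
sorted_copy = np.sort(arr)   # new array; arr keeps its order
print(arr, sorted_copy)

arr.sort()                   # sorts arr itself and returns None
print(arr)

grid = np.array([[3, 1, 2], [9, 7, 8]])
grid.sort(1)                 # sort along axis 1, i.e. within each row
print(grid)
```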


Unique and other set logic

np.unique(arr)

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)


np.in1d(arr1,arr2)

values = np.array([6, 0, 0, 3, 2, 5, 6])
np.in1d(values, [2, 3, 6])


| Method | Description |
| --- | --- |
| unique(x) | Compute the sorted, unique elements in x |
| intersect1d(x, y) | Compute the sorted, common elements in x and y |
| union1d(x, y) | Compute the sorted union of elements |
| in1d(x, y) | Compute a boolean array indicating whether each element of x is contained in y |
| setdiff1d(x, y) | Set difference, elements in x that are not in y |
| setxor1d(x, y) | Set symmetric differences; elements that are in either of the arrays, but not both |
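
The remaining set functions in the table behave like their Python set counterparts but return sorted arrays. A sketch with two small arrays; the return_counts option of np.unique is an extra not covered in the notes above.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([3, 4, 5])

common = np.intersect1d(x, y)
either = np.union1d(x, y)
only_x = np.setdiff1d(x, y)
exactly_one = np.setxor1d(x, y)
print(common, either, only_x, exactly_one)

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
labels, counts = np.unique(names, return_counts=True)   # unique values and how often each occurs
print(labels, counts)
```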

File input and output with arrays

Storing arrays on disk in binary format

arr = np.arange(10)
np.save('some_array', arr)
np.load('some_array.npy')


np.savez('array_archive.npz', a=arr, b=arr)
arch = np.load('array_archive.npz')
arch['b'] #dict-like


Saving and loading text files

For text files, pandas's read_csv and read_table are more commonly used in practice

arr = np.loadtxt('array_ex.txt', delimiter=',')
arr
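
The loadtxt call above assumes that array_ex.txt already exists. A self-contained round trip that writes the files first might look like the sketch below; it uses a temporary directory so nothing is left behind, and the array contents are made up for illustration.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)
grid = np.array([[1.5, 2.0], [3.25, 4.0]])

with tempfile.TemporaryDirectory() as tmp:
    # Binary format: a single array per .npy file.
    npy_path = os.path.join(tmp, 'some_array.npy')
    np.save(npy_path, arr)
    loaded = np.load(npy_path)

    # Several arrays in one .npz archive, accessed by keyword name.
    npz_path = os.path.join(tmp, 'array_archive.npz')
    np.savez(npz_path, a=arr, b=arr * 2)
    with np.load(npz_path) as arch:
        b = arch['b']

    # Text format: savetxt writes what loadtxt reads back.
    txt_path = os.path.join(tmp, 'array_ex.txt')
    np.savetxt(txt_path, grid, delimiter=',')
    from_text = np.loadtxt(txt_path, delimiter=',')

print(loaded, b, from_text)
```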


Linear algebra

from numpy.linalg import inv, qr

1. Matrix multiplication (A %*% B in R)

```python
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x
y
x.dot(y)  # equivalently np.dot(x, y)
```


2. Inverse and QR decomposition

```python
from numpy.linalg import inv, qr

X = randn(5, 5)
mat = X.T.dot(X)
inv(mat)
mat.dot(inv(mat))  # approximately the identity matrix
q, r = qr(mat)
r                  # upper triangular
```

| Function | Description |
| --- | --- |
| diag | Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal |
| dot | Matrix multiplication |
| trace | Compute the sum of the diagonal elements |
| det | Compute the matrix determinant |
| eig | Compute the eigenvalues and eigenvectors of a square matrix |
| inv | Compute the inverse of a square matrix |
| pinv | Compute the Moore-Penrose pseudo-inverse of a matrix |
| qr | Compute the QR decomposition |
| svd | Compute the singular value decomposition (SVD) |
| solve | Solve the linear system Ax = b for x, where A is a square matrix |
| lstsq | Compute the least-squares solution to y = Xb |
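
The results of inv, qr, and solve are hard to judge from printed random numbers, so a quick way to gain confidence is to check the defining identities with np.allclose. The sketch below uses a small hand-written symmetric matrix A and right-hand side b, both chosen only for illustration.

```python
import numpy as np
from numpy.linalg import inv, qr, solve

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# A times its inverse is the identity, up to floating point error.
inverse_ok = np.allclose(A.dot(inv(A)), np.eye(3))

# QR: q has orthonormal columns, r is upper triangular, and q.dot(r) rebuilds A.
q, r = qr(A)
qr_ok = np.allclose(q.dot(r), A)
r_upper = np.allclose(r, np.triu(r))
q_orthonormal = np.allclose(q.T.dot(q), np.eye(3))

# solve finds x with A.dot(x) == b without forming the inverse explicitly.
x = solve(A, b)
solve_ok = np.allclose(A.dot(x), b)

print(inverse_ok, qr_ok, r_upper, q_orthonormal, solve_ok)
```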

Random number generation

samples = np.random.normal(size=(4, 4))
samples


from random import normalvariate
N = 1000000
%timeit samples = [normalvariate(0, 1) for _ in range(N)]
%timeit np.random.normal(size=N)


| Function | Description |
| --- | --- |
| seed | Seed the random number generator |
| permutation | Return a random permutation of a sequence, or return a permuted range |
| shuffle | Randomly permute a sequence in place |
| rand | Draw samples from a uniform distribution |
| randint | Draw random integers from a given low-to-high range |
| randn | Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface) |
| binomial | Draw samples from a binomial distribution |
| normal | Draw samples from a normal (Gaussian) distribution |
| beta | Draw samples from a beta distribution |
| chisquare | Draw samples from a chi-square distribution |
| gamma | Draw samples from a gamma distribution |
| uniform | Draw samples from a uniform [0, 1) distribution |
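
Seeding makes random results repeatable, which helps when comparing runs. A sketch of seed together with permutation, shuffle, and randint from the table; the seed value 12345 is arbitrary.

```python
import numpy as np

np.random.seed(12345)
first = np.random.normal(size=(4, 4))
np.random.seed(12345)
second = np.random.normal(size=(4, 4))
print(np.array_equal(first, second))   # the same seed reproduces the same draws

perm = np.random.permutation(5)        # a new, permuted copy of range(5)
deck = np.arange(5)
np.random.shuffle(deck)                # shuffles deck in place and returns None
draws = np.random.randint(0, 2, size=10)   # low is inclusive, high is exclusive
print(perm, deck, draws)
```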

Example: Random Walks

pure python

import random
position = 0
walk = [position]
steps = 1000
for i in range(steps):
    step = 1 if random.randint(0, 1) else -1
    position += step
    walk.append(position)


numpy

np.random.seed(12345)
nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)
steps = np.where(draws > 0, 1, -1)
walk = steps.cumsum()


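A first look at the random walk

walk.min()
walk.max()

Find the first step at which the walk is 10 or more steps away from the origin in either direction:

(np.abs(walk) >= 10).argmax()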
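
This works because argmax on a boolean array returns the index of the first True (True is the maximum value). One caveat: if no element is True, argmax still returns 0, so it is worth checking any() first. A sketch on a short hand-written walk:

```python
import numpy as np

steps = np.array([1, 1, -1, 1, 1, 1, -1, 1])
walk = steps.cumsum()                # positions: 1, 2, 1, 2, 3, 4, 3, 4

reached = np.abs(walk) >= 3
first_hit = reached.argmax()         # index of the first True
print(first_hit)

never = np.abs(walk) >= 10           # the walk never gets this far
print(never.any(), never.argmax())   # argmax returns 0 even though nothing matched
```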


Simulating many random walks at once

nwalks = 5000
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps)) # 0 or 1
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)  # cumulative sum along each row (axis 1)
walks


A first look at the random walks

walks.max()
walks.min()

hits30 = (np.abs(walks) >= 30).any(1)
hits30
hits30.sum() # Number that hit 30 or -30

crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)
crossing_times.mean()


Random walks with normally distributed steps

steps = np.random.normal(loc=0, scale=0.25,
                         size=(nwalks, nsteps))
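
The same analysis as for the coin-flip walks carries over to normally distributed steps. The following is a sketch under assumed parameters: 500 walks and a crossing threshold of 5.0 are illustrative choices, not values from the notes above.

```python
import numpy as np

np.random.seed(12345)
nwalks, nsteps = 500, 1000
threshold = 5.0   # assumed distance from the origin, for illustration

steps = np.random.normal(loc=0, scale=0.25, size=(nwalks, nsteps))
walks = steps.cumsum(1)   # one walk per row

hits = (np.abs(walks) >= threshold).any(1)   # which walks ever reach the threshold
crossing_times = (np.abs(walks[hits]) >= threshold).argmax(1)   # first crossing per walk
print(hits.sum(), crossing_times.mean())
```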