
Data Analysis and Presentation: NumPy Data Access and Functions

2017-11-02 11:40

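Getting Started with the NumPy Library

NumPy Data Access and Functions

Storing and Loading Data with CSV Files

CSV Files

CSV (Comma-Separated Values) is a common file format for storing batches of tabular data.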

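np.savetxt(frame, array, fmt='%.18e', delimiter=' ')


frame: a file, filename string, or file handle; a filename ending in .gz or .bz2 produces a compressed file.

array: the array to write to the file.

fmt: format string for the values, e.g. %d, %.2f, %.18e.

delimiter: string that separates the values; the default is a single space.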

Example: saving files with savetxt()

In [1]: import numpy as np

In [2]: a = np.arange(100).reshape(5,20)

In [3]: np.savetxt('a.csv', a, fmt='%d', delimiter=',')


Contents of "a.csv":

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59
60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99


In [4]: np.savetxt('a1.csv', a, fmt='%.1f', delimiter=',')


Contents of "a1.csv":

0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0
20.0,21.0,22.0,23.0,24.0,25.0,26.0,27.0,28.0,29.0,30.0,31.0,32.0,33.0,34.0,35.0,36.0,37.0,38.0,39.0
40.0,41.0,42.0,43.0,44.0,45.0,46.0,47.0,48.0,49.0,50.0,51.0,52.0,53.0,54.0,55.0,56.0,57.0,58.0,59.0
60.0,61.0,62.0,63.0,64.0,65.0,66.0,67.0,68.0,69.0,70.0,71.0,72.0,73.0,74.0,75.0,76.0,77.0,78.0,79.0
80.0,81.0,82.0,83.0,84.0,85.0,86.0,87.0,88.0,89.0,90.0,91.0,92.0,93.0,94.0,95.0,96.0,97.0,98.0,99.0
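To see exactly what fmt and delimiter produce without creating files on disk, here is a minimal sketch that writes the same array into an in-memory io.StringIO buffer (used only for illustration; an open file handle behaves the same way). It also shows that when delimiter is omitted, savetxt separates values with a single space.

```python
import io

import numpy as np

a = np.arange(100).reshape(5, 20)

# Integer format with a comma delimiter, written to a text buffer instead of a file
buf = io.StringIO()
np.savetxt(buf, a, fmt='%d', delimiter=',')
lines = buf.getvalue().splitlines()
print(lines[0])  # 0,1,2,...,19 (the first row of a)

# Default delimiter is a single space; '%.1f' keeps one decimal place
buf2 = io.StringIO()
np.savetxt(buf2, a[:1, :3], fmt='%.1f')
print(repr(buf2.getvalue()))  # '0.0 1.0 2.0\n'
```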


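np.loadtxt(frame, dtype=float, delimiter=None, unpack=False)


frame: a file, filename string, or generator; .gz and .bz2 compressed files are read transparently.

dtype: data type of the resulting array; optional, defaults to float.

delimiter: string that separates the values; the default (None) means any whitespace.

unpack: if True, the result is transposed so that each column can be assigned to a separate variable.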

Example: reading files with loadtxt()

In [5]: b = np.loadtxt('a1.csv', delimiter=',')

In [6]: b
Out[6]:
array([[  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.],
[ 20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.,  30.,
31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.],
[ 40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.,  50.,
51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.],
[ 60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.,  70.,
71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.],
[ 80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.,  90.,
91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.]])

In [7]: b = np.loadtxt('a.csv', dtype=int, delimiter=',')   # use int: np.int was removed in NumPy 1.24

In [8]: b
Out[8]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99]])
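The unpack parameter is easiest to see with a small two-column input. The sketch below uses a made-up inline string instead of a real file (the data and the variable names x and y are hypothetical) and compares the default one-row-per-line result with unpack=True, which transposes the array so each column lands in its own variable.

```python
import io

import numpy as np

# Hypothetical two-column data: x values and y values
text = "1,10\n2,20\n3,30\n"

# Default: one row per line, so the shape is (3, 2)
data = np.loadtxt(io.StringIO(text), delimiter=',')
print(data.shape)  # (3, 2)

# unpack=True transposes the result, so each column goes to its own variable
x, y = np.loadtxt(io.StringIO(text), delimiter=',', unpack=True)
print(x)  # [1. 2. 3.]
print(y)  # [10. 20. 30.]

# Use the builtin int as dtype (np.int no longer exists in NumPy >= 1.24)
ints = np.loadtxt(io.StringIO(text), delimiter=',', dtype=int)
print(ints.dtype.kind)  # i (signed integer)
```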


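Limitations of CSV Files

CSV can only store one- and two-dimensional arrays effectively, and np.savetxt() and np.loadtxt() likewise only work with 1-D and 2-D arrays; np.savetxt() raises a ValueError for an array with more than two dimensions.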
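One common workaround, not described in the original article, is to flatten every axis after the first, save the resulting 2-D array, and restore the shape by hand when loading. The sketch below assumes the shape is known at load time (here it is simply hard-coded), because the CSV file itself does not record it.

```python
import io

import numpy as np

a3 = np.arange(24).reshape(2, 3, 4)

# savetxt rejects arrays with more than two dimensions
error_message = None
try:
    np.savetxt(io.StringIO(), a3, fmt='%d', delimiter=',')
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# Workaround: collapse all axes after the first into one, then save as 2-D
buf = io.StringIO()
np.savetxt(buf, a3.reshape(a3.shape[0], -1), fmt='%d', delimiter=',')

# When reading back, the original shape must be restored explicitly
buf.seek(0)
restored = np.loadtxt(buf, delimiter=',', dtype=int).reshape(2, 3, 4)
print(np.array_equal(restored, a3))  # True
```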

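Storing and Loading Multidimensional Data

a.tofile(frame, sep='', format='%s')


frame: a file or a filename string.

sep: string that separates the values; if it is the empty string, the file is written in binary.

format: format string for the values; it is only used when writing text (non-empty sep).

Example: storing multidimensional data with tofile()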

In [9]: a = np.arange(100).reshape(5,10,2)

In [10]: a.tofile('b.dat', sep=',', format='%d')


Contents of "b.dat" (the array is written flattened, so its shape is not recorded):

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99


In [11]: a.tofile('b1.dat', format='%d')


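"b1.dat" is a binary file (sep is empty, so format is ignored) and cannot be viewed as text. Read back and reshaped to the original shape (5, 10, 2), its data is: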

array([[[ 0,  1],
[ 2,  3],
[ 4,  5],
[ 6,  7],
[ 8,  9],
[10, 11],
[12, 13],
[14, 15],
[16, 17],
[18, 19]],

[[20, 21],
[22, 23],
[24, 25],
[26, 27],
[28, 29],
[30, 31],
[32, 33],
[34, 35],
[36, 37],
[38, 39]],

[[40, 41],
[42, 43],
[44, 45],
[46, 47],
[48, 49],
[50, 51],
[52, 53],
[54, 55],
[56, 57],
[58, 59]],

[[60, 61],
[62, 63],
[64, 65],
[66, 67],
[68, 69],
[70, 71],
[72, 73],
[74, 75],
[76, 77],
[78, 79]],

[[80, 81],
[82, 83],
[84, 85],
[86, 87],
[88, 89],
[90, 91],
[92, 93],
[94, 95],
[96, 97],
[98, 99]]])
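Since neither file records the array's shape or dtype, reading them back requires supplying both. The sketch below round-trips the same (5, 10, 2) array through a text file and a binary file with np.fromfile, using a temporary directory so nothing is left on disk (the file locations are an assumption for illustration). It passes a.dtype rather than a hard-coded type because the default integer type of np.arange is platform dependent.

```python
import os
import tempfile

import numpy as np

a = np.arange(100).reshape(5, 10, 2)

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, 'b.dat')
    bin_path = os.path.join(tmp, 'b1.dat')

    # Text mode: non-empty sep, each value formatted with '%d'
    a.tofile(text_path, sep=',', format='%d')
    # Binary mode: sep='' (the default) writes raw bytes
    a.tofile(bin_path)

    # Neither file stores shape or dtype, so both are supplied when reading
    from_text = np.fromfile(text_path, dtype=a.dtype, sep=',').reshape(5, 10, 2)
    from_bin = np.fromfile(bin_path, dtype=a.dtype).reshape(5, 10, 2)

print(np.array_equal(from_text, a), np.array_equal(from_bin, a))  # True True
```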



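NumPy Random Number Functions

NumPy's random subpackage

Basic form: np.random.*

np.random.rand(), np.random.randn(), np.random.randint()

Random number functions in np.random

rand(d0, d1, ..., dn): creates a random array of shape (d0, d1, ..., dn) with floats drawn uniformly from [0, 1)
randn(d0, d1, ..., dn): creates a random array of shape (d0, d1, ..., dn) drawn from the standard normal distribution
randint(low[, high, size]): creates a random integer, or an integer array of shape size, with values in [low, high) (high is excluded)
seed(s): seeds the random number generator; s is the seed value

Example: trying the functions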

In [18]: a = np.random.rand(3,4,5)

In [19]: a
Out[19]:
array([[[ 0.97845512,  0.90466706,  0.92576248,  0.77775142,  0.84334893],
[ 0.39599821,  0.31917683,  0.7961439 ,  0.01324569,  0.97660396],
[ 0.5049603 ,  0.80952265,  0.67359257,  0.89334316,  0.94496225],
[ 0.04840473,  0.04665257,  0.20956817,  0.62255095,  0.36600489]],

[[ 0.58059326,  0.28464266,  0.23596248,  0.16677631,  0.86467069],
[ 0.14691968,  0.60863245,  0.71725038,  0.69206766,  0.18301705],
[ 0.73197901,  0.99051723,  0.10489076,  0.33979432,  0.0354286 ],
[ 0.73696453,  0.48268632,  0.99294233,  0.06285961,  0.93090147]],

[[ 0.07853777,  0.827061  ,  0.66325364,  0.52289669,  0.96894828],
[ 0.41912388,  0.01883408,  0.80978245,  0.93082898,  0.98095581],
[ 0.58614214,  0.55996867,  0.37734444,  0.79280598,  0.03626233],
[ 0.233132  ,  0.22514788,  0.32245147,  0.13739658,  0.18866422]]])

In [20]: sn = np.random.randn(3,4,5)

In [21]: sn
Out[21]:
array([[[-0.54821321,  0.35733947,  0.74102173, -1.26679716, -0.75072289],
[ 0.13182283,  2.32578442, -0.52208189,  2.5041796 , -0.96995644],
[ 1.00171095,  0.97037733,  1.55386206, -0.94515087,  0.75707273],
[-1.2481768 ,  0.53095038,  0.92527818, -0.17261088, -0.13667463]],

[[ 2.18760173, -0.93813162,  0.19032109, -1.59605908, -0.96802666],
[ 0.30649913,  1.32375007,  0.72547761, -1.59253182, -0.72385311],
[-2.22923637, -1.05462649,  1.82672301,  0.47343961, -0.9786459 ],
[-0.36857965,  0.59003624,  1.80140997,  1.00965744,  1.9037593 ]],

[[ 0.36273071, -0.0447364 ,  1.27120325,  0.21076423, -0.40820945],
[-1.22315321, -1.94670543,  0.17959233, -1.1020581 ,  0.17423733],
[-1.16368644,  0.00589158,  1.19701291, -0.4255035 , -0.7508364 ],
[-1.61788168,  0.50386607,  0.15993032,  0.36881486, -0.41457221]]])

In [22]: b = np.random.randint(100,200,(3,4))

In [23]: b
Out[23]:
array([[163, 171, 163, 168],
[166, 127, 160, 109],
[135, 111, 196, 190]])

In [24]: np.random.seed(10)

In [25]: np.random.randint(100,200,(3,4))
Out[25]:
array([[109, 115, 164, 128],
[189, 193, 129, 108],
[173, 100, 140, 136]])

In [26]: np.random.seed(10)

In [27]: np.random.randint(100,200,(3,4))
Out[27]:
array([[109, 115, 164, 128],
[189, 193, 129, 108],
[173, 100, 140, 136]])
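The seed behaviour above, and the half-open range of randint, can be checked directly. This is a small sketch with illustrative sizes; since the exact numbers drawn may vary across NumPy versions and platforms, it checks properties rather than specific values. Newer code often uses a Generator from np.random.default_rng() instead of the global seed, but the functions here are the ones the article covers.

```python
import numpy as np

# Same seed, same sequence of draws
np.random.seed(10)
first = np.random.randint(100, 200, (3, 4))
np.random.seed(10)
second = np.random.randint(100, 200, (3, 4))
print(np.array_equal(first, second))  # True

# randint's upper bound is exclusive: values lie in [100, 200)
many = np.random.randint(100, 200, 10_000)
print(many.min() >= 100, many.max() <= 199)  # True True

# rand takes the dimensions as separate arguments, not as a tuple
u = np.random.rand(3, 4, 5)
print(u.shape, ((u >= 0) & (u < 1)).all())  # (3, 4, 5) True
```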


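More random number functions in np.random

shuffle(a): randomly permutes array a in place along its first axis (axis 0); a itself is modified
permutation(a): returns a new array with a randomly permuted along its first axis; a is not modified
choice(a[, size, replace, p]): draws elements from the 1-D array a with probabilities p to form a new array of shape size; replace says whether an element may be drawn more than once and defaults to True

Example: trying the functions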

In [28]: a = np.random.randint(100,200,(3,4))

In [29]: a
Out[29]:
array([[116, 111, 154, 188],
[162, 133, 172, 178],
[149, 151, 154, 177]])

In [30]: np.random.shuffle(a)

In [31]: a
Out[31]:
array([[116, 111, 154, 188],
[149, 151, 154, 177],
[162, 133, 172, 178]])

In [32]: np.random.shuffle(a)

In [33]: a
Out[33]:
array([[162, 133, 172, 178],
[116, 111, 154, 188],
[149, 151, 154, 177]])

In [34]: a = np.random.randint(100,200,(3,4))

In [35]: a
Out[35]:
array([[113, 192, 186, 130],
[130, 189, 112, 165],
[131, 157, 136, 127]])

In [36]: np.random.permutation(a)
Out[36]:
array([[113, 192, 186, 130],
[130, 189, 112, 165],
[131, 157, 136, 127]])

In [37]: a
Out[37]:
array([[113, 192, 186, 130],
[130, 189, 112, 165],
[131, 157, 136, 127]])

In [38]: b = np.random.randint(100,200,(8,))

In [39]: b
Out[39]: array([177, 122, 123, 194, 111, 128, 174, 188])

In [40]: np.random.choice(b,(3,2))
Out[40]:
array([[122, 188],
[123, 177],
[174, 188]])

In [41]: np.random.choice(b,(3,2),replace=False)
Out[41]:
array([[123, 111],
[128, 188],
[174, 122]])

In [42]: np.random.choice(b,(3,2),p= b/np.sum(b))
Out[42]:
array([[174, 122],
[188, 194],
[174, 123]])
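The key difference between shuffle and permutation is whether the input changes. This sketch (with an arbitrary seed and small made-up arrays) checks that permutation leaves its argument alone, that shuffle returns None and only reorders whole rows, and that choice with replace=False never repeats an element.

```python
import numpy as np

np.random.seed(0)
a = np.arange(12).reshape(3, 4)
original = a.copy()

# permutation returns a shuffled copy; a itself is untouched
shuffled_copy = np.random.permutation(a)
print(np.array_equal(a, original))  # True

# shuffle works in place and returns None; rows move, but each row stays intact
result = np.random.shuffle(a)
print(result)  # None
print(sorted(map(tuple, a)) == sorted(map(tuple, original)))  # True

# choice without replacement never repeats an element of b
b = np.arange(100, 108)
picked = np.random.choice(b, (2, 3), replace=False)
print(len(np.unique(picked)) == picked.size)  # True

# With p, larger values of b are proportionally more likely to be drawn
weights = b / b.sum()
draws = np.random.choice(b, 5, p=weights)
```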


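uniform(low, high, size): array drawn from a uniform distribution; low is the start, high the (excluded) end, size the shape
normal(loc, scale, size): array drawn from a normal distribution; loc is the mean, scale the standard deviation, size the shape
poisson(lam, size): array drawn from a Poisson distribution; lam is the expected number of events (the event rate), size the shape
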
In [43]: u = np.random.uniform(0,10,(3,4))

In [44]: u
Out[44]:
array([[ 8.8393648 ,  3.25511638,  1.65015898,  3.92529244],
[ 0.93460375,  8.21105658,  1.5115202 ,  3.84114449],
[ 9.44260712,  9.87625475,  4.56304547,  8.26122844]])

In [45]: n = np.random.normal(10,5,(3,4))

In [46]: n
Out[46]:
array([[ 12.8882903 ,   2.6251256 ,  10.39394227,  14.59206826],
[  7.5365132 ,  10.48231186,   6.73620032,   8.89118781],
[  4.65856717,   3.86153973,   1.00713488,   6.5739633 ]])
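With many samples, the sample statistics should approach the parameters listed above. The sketch below uses an arbitrary seed and sample size, and also draws from poisson, which the transcript does not show; for a Poisson distribution the mean and the variance both equal lam. The printed values are approximate.

```python
import numpy as np

np.random.seed(42)
size = 100_000

u = np.random.uniform(0, 10, size)   # low=0, high=10 (high excluded)
n = np.random.normal(10, 5, size)    # loc (mean) = 10, scale (std) = 5
k = np.random.poisson(3.0, size)     # lam = 3 events per interval

# Sample statistics should be close to the theoretical values
print(round(u.mean(), 1))                      # about 5.0, the midpoint of [0, 10)
print(round(n.mean(), 1), round(n.std(), 1))   # about 10 and 5
print(round(k.mean(), 1), round(k.var(), 1))   # both about 3
```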


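NumPy Statistical Functions

Statistical functions provided directly by NumPy

Basic form: np.*

np.std(), np.var(), np.average()

Statistical functions in NumPy

sum(a, axis=None): sum of the elements of a along the given axis; axis is an integer or a tuple
mean(a, axis=None): arithmetic mean (expected value) of the elements of a along the given axis; axis is an integer or a tuple
average(a, axis=None, weights=None): weighted average of the elements of a along the given axis
std(a, axis=None): standard deviation of the elements of a along the given axis
var(a, axis=None): variance of the elements of a along the given axis
axis=None is the default for all of these statistical functions and means the computation runs over every element of the array, as if it were flattened.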

In [47]: a = np.arange(15).reshape(3,5)

In [48]: a
Out[48]:
array([[ 0,  1,  2,  3,  4],
[ 5,  6,  7,  8,  9],
[10, 11, 12, 13, 14]])

In [49]: np.sum(a)
Out[49]: 105

In [50]: np.mean(a,axis=1)      # 2. = (0+1+2+3+4)/5
Out[50]: array([  2.,   7.,  12.])

In [51]: np.mean(a,axis=0)      # 7. = (2+7+12)/3
Out[51]: array([ 5.,  6.,  7.,  8.,  9.])

In [52]: np.average(a, axis=0, weights=[10,5,1])   # weighted average: 4.1875 = (2*10+7*5+12*1)/(10+5+1)
Out[52]: array([ 2.1875,  3.1875,  4.1875,  5.1875,  6.1875])

In [53]: np.std(a)
Out[53]: 4.3204937989385739

In [54]: np.var(a)
Out[54]: 18.666666666666668
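The axis semantics and the weighted-average formula from the transcript can be verified by hand. A short sketch reusing the same 3 x 5 array; the manual computation broadcasts the weights down each column.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

# axis=1 averages each row, axis=0 averages each column
row_means = np.mean(a, axis=1)   # [2., 7., 12.]
col_means = np.mean(a, axis=0)   # [5., 6., 7., 8., 9.]

# Weighted average along axis 0, written out by hand
w = np.array([10, 5, 1])
manual = (a * w[:, None]).sum(axis=0) / w.sum()
print(np.allclose(manual, np.average(a, axis=0, weights=w)))  # True

# std is the square root of var (both use ddof=0 by default)
print(np.isclose(np.std(a), np.sqrt(np.var(a))))  # True

# axis may also be a tuple; (0, 1) covers both axes of a 2-D array
print(np.sum(a, axis=(0, 1)))  # 105
```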


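min(a), max(a): minimum and maximum of the elements of a
argmin(a), argmax(a): index of the minimum and maximum element in the flattened (1-D) array
unravel_index(index, shape): converts the flat index into a multidimensional index for the given shape
ptp(a): difference between the maximum and minimum elements of a (peak to peak)
median(a): median of the elements of a
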
In [55]: b = np.arange(15,0,-1).reshape(3,5)

In [56]: b
Out[56]:
array([[15, 14, 13, 12, 11],
[10,  9,  8,  7,  6],
[ 5,  4,  3,  2,  1]])

In [57]: np.max(b)
Out[57]: 15

In [58]: np.argmax(b)   # index in the flattened array
Out[58]: 0

In [59]: np.unravel_index(np.argmax(b), b.shape)    # convert to a multidimensional index
Out[59]: (0, 0)

In [60]: np.ptp(b)
Out[60]: 14

In [61]: np.median(b)
Out[61]: 8.0
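The flat-index behaviour of argmin and argmax is the main pitfall here. The sketch below uses the same array b, maps a flat index back with unravel_index, and contrasts it with argmax along an explicit axis; the final line (an extra example not in the transcript) shows that the median of an even number of elements is the mean of the two middle values.

```python
import numpy as np

b = np.arange(15, 0, -1).reshape(3, 5)

# argmin gives a position in the flattened array ...
flat = np.argmin(b)
# ... and unravel_index maps it back to a (row, column) pair
row, col = np.unravel_index(flat, b.shape)
print(flat, row, col)          # 14 2 4
print(b[row, col] == b.min())  # True

# With axis, argmax returns one index per column instead of a flat index
print(np.argmax(b, axis=0))    # [0 0 0 0 0]

# ptp is max minus min; an even-length median averages the middle two values
print(np.ptp(b) == b.max() - b.min())  # True
print(np.median([1, 2, 3, 4]))         # 2.5
```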


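NumPy Gradient Function

The gradient function in NumPy

np.gradient(f): computes the gradient of the elements of array f; when f is multidimensional, it returns the gradient along each dimension
Gradient: the rate of change between consecutive values, i.e. the slope. If three consecutive X coordinates have Y values a, b, c, the gradient at b is (c-a)/2. At the two ends, where only one neighbor exists, a one-sided difference is used instead, e.g. b-a at the first element.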

In [62]: a = np.random.randint(0,20,(5))

In [63]: a
Out[63]: array([14, 16, 10, 17,  0])

In [64]: np.gradient(a)     # both neighbors exist: -2. = (10-14)/2
Out[64]: array([  2. ,  -2. ,   0.5,  -5. , -17. ])

In [65]: b = np.random.randint(0,20,(5))

In [66]: b
Out[66]: array([17,  9, 16,  9, 12])

In [67]: np.gradient(b)     # only one neighbor exists: -8. = (9-17)/1
Out[67]: array([-8. , -0.5,  0. , -2. ,  3. ])

In [68]: c = np.random.randint(0, 50, (3,5))

In [69]: c
Out[69]:
array([[30, 17, 17, 16,  0],
[31, 37,  9,  0, 38],
[22, 32,  2,  3, 31]])

In [70]: np.gradient(c)
Out[70]:
[array([[  1. ,  20. ,  -8. , -16. ,  38. ],
[ -4. ,   7.5,  -7.5,  -6.5,  15.5],
[ -9. ,  -5. ,  -7. ,   3. ,  -7. ]]),
array([[-13. ,  -6.5,  -0.5,  -8.5, -16. ],
[  6. , -11. , -18.5,  14.5,  38. ],
[ 10. , -10. , -14.5,  14.5,  28. ]])]
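To make the central and one-sided differences concrete, here is a hand-written 1-D version for unit spacing. The helper gradient_1d is hypothetical, written only for this illustration, and assumes at least two elements; it is compared against np.gradient on the data from the transcript. For the 2-D array c, the second returned array holds the differences along each row.

```python
import numpy as np


def gradient_1d(y):
    """Hypothetical helper: 1-D gradient with unit spacing (needs len(y) >= 2)."""
    y = np.asarray(y, dtype=float)
    g = np.empty_like(y)
    g[1:-1] = (y[2:] - y[:-2]) / 2   # interior: central difference (c - a) / 2
    g[0] = y[1] - y[0]               # left edge: one-sided difference
    g[-1] = y[-1] - y[-2]            # right edge: one-sided difference
    return g


a = np.array([14, 16, 10, 17, 0])
print(gradient_1d(a))                               # matches Out[64] above
print(np.allclose(gradient_1d(a), np.gradient(a)))  # True

# For 2-D input, np.gradient returns one array per axis (axis 0 first, then axis 1)
c = np.array([[30, 17, 17, 16, 0],
              [31, 37, 9, 0, 38],
              [22, 32, 2, 3, 31]])
d_rows, d_cols = np.gradient(c)
print(np.allclose(d_cols[0], gradient_1d(c[0])))  # True
```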