Data Analysis and Presentation: NumPy Data Storage and Functions
2017-11-02 11:40
NumPy Library Basics
NumPy Data Storage and Functions
Storing and Loading Data with CSV Files
CSV Files
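CSV (Comma-Separated Values) is a common file format for storing bulk data.

np.savetxt(frame, array, fmt='%.18e', delimiter=' ')
frame: a file name or file object; if the name ends in .gz, the file is written gzip-compressed.
array: the array to write to the file.
fmt: the format used for each value, for example %d, %.2f, %.18e.
delimiter: the string separating values; defaults to a single space.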
Example: saving files with savetxt()

```
In [1]: import numpy as np

In [2]: a = np.arange(100).reshape(5,20)

In [3]: np.savetxt('a.csv', a, fmt='%d', delimiter=',')
```

"a.csv" contains:

```
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59
60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
```
```
In [4]: np.savetxt('a1.csv', a, fmt='%.1f', delimiter=',')
```

"a1.csv" contains:

```
0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0
20.0,21.0,22.0,23.0,24.0,25.0,26.0,27.0,28.0,29.0,30.0,31.0,32.0,33.0,34.0,35.0,36.0,37.0,38.0,39.0
40.0,41.0,42.0,43.0,44.0,45.0,46.0,47.0,48.0,49.0,50.0,51.0,52.0,53.0,54.0,55.0,56.0,57.0,58.0,59.0
60.0,61.0,62.0,63.0,64.0,65.0,66.0,67.0,68.0,69.0,70.0,71.0,72.0,73.0,74.0,75.0,76.0,77.0,78.0,79.0
80.0,81.0,82.0,83.0,84.0,85.0,86.0,87.0,88.0,89.0,90.0,91.0,92.0,93.0,94.0,95.0,96.0,97.0,98.0,99.0
```
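To check what savetxt() writes without creating files on disk, the sketch below sends the same array to in-memory text buffers. Using io.StringIO in place of a.csv and a1.csv is an assumption of this example; savetxt() accepts any file object opened for text writing.

```python
import io
import numpy as np

a = np.arange(100).reshape(5, 20)

# In-memory buffers stand in for the files a.csv and a1.csv.
buf_int = io.StringIO()
np.savetxt(buf_int, a, fmt='%d', delimiter=',')

buf_float = io.StringIO()
np.savetxt(buf_float, a, fmt='%.1f', delimiter=',')

int_lines = buf_int.getvalue().splitlines()
float_lines = buf_float.getvalue().splitlines()

print(len(int_lines))        # 5: one line per row of a
print(int_lines[0][:10])     # 0,1,2,3,4,
print(float_lines[1][:14])   # 20.0,21.0,22.0
```

The fmt string alone decides how each value is rendered; the array itself is unchanged.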
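np.loadtxt(frame, dtype=float, delimiter=None, unpack=False)
frame: a file, string, or generator; files ending in .gz or .bz2 are decompressed automatically.
dtype: data type of the result, optional (default float).
delimiter: the string separating values; by default any whitespace.
unpack: if True, the result is transposed so that each column can be assigned to a separate variable.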
Example: reading files with loadtxt()

```
In [5]: b = np.loadtxt('a1.csv', delimiter=',')

In [6]: b
Out[6]:
array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.],
       [ 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.],
       [ 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., 56., 57., 58., 59.],
       [ 60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77., 78., 79.],
       [ 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]])

In [7]: b = np.loadtxt('a.csv', dtype=int, delimiter=',')

In [8]: b
Out[8]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
```
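The unpack and dtype parameters are easiest to see on a tiny input. In this sketch the two-column text is made-up sample data, and io.StringIO again stands in for a real file:

```python
import io
import numpy as np

text = "1,10\n2,20\n3,30\n"   # hypothetical two-column CSV content

# Default: one 2-D array laid out like the file.
m = np.loadtxt(io.StringIO(text), delimiter=',')

# unpack=True transposes the result, so each column lands in its own variable.
x, y = np.loadtxt(io.StringIO(text), delimiter=',', unpack=True)

# dtype=int parses the values as integers instead of floats.
mi = np.loadtxt(io.StringIO(text), dtype=int, delimiter=',')

print(m.shape)        # (3, 2)
print(x.tolist())     # [1.0, 2.0, 3.0]
print(y.tolist())     # [10.0, 20.0, 30.0]
print(mi.dtype.kind)  # i
```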
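Limitations of CSV

CSV can only store one- and two-dimensional arrays effectively, so np.savetxt() and np.loadtxt() can only save and load 1-D and 2-D arrays effectively.

Storing and Loading Multidimensional Data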
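One common way to keep using CSV for a higher-dimensional array is to flatten its trailing axes into columns before saving and reshape after loading. This is a sketch of that workaround, not part of the original material; it assumes the original shape is known when reading (here it is simply reused), since the CSV text itself does not record it.

```python
import io
import numpy as np

a = np.arange(100).reshape(5, 10, 2)

# Collapse axes 1 and 2 into a single axis so the data fits in 2-D.
buf = io.StringIO()
np.savetxt(buf, a.reshape(a.shape[0], -1), fmt='%d', delimiter=',')

# Reading back yields a 5 x 20 array; reshape restores the 3-D layout.
buf.seek(0)
restored = np.loadtxt(buf, dtype=int, delimiter=',').reshape(a.shape)

print(restored.shape)               # (5, 10, 2)
print(np.array_equal(restored, a))  # True
```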
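a.tofile(frame, sep='', format='%s')
frame: a file object or a file name (string).
sep: the separator written between items; if it is the empty string, the file is written as binary.
format: the format string applied to each item when writing text (it has no effect in binary mode).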
Example: storing multidimensional data with tofile()

```
In [9]: a = np.arange(100).reshape(5,10,2)

In [10]: a.tofile('b.dat', sep=',', format='%d')
```

"b.dat" contains:

```
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
```
Without sep, the data is written as binary, so format is ignored:

```
In [11]: a.tofile('b1.dat', format='%d')
```

Because "b1.dat" is a raw binary file, it cannot be displayed as text. It stores the same 100 integers as a but not their shape; read back and reshaped to (5, 10, 2), the data is:

```
array([[[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19]],
       [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35], [36, 37], [38, 39]],
       [[40, 41], [42, 43], [44, 45], [46, 47], [48, 49], [50, 51], [52, 53], [54, 55], [56, 57], [58, 59]],
       [[60, 61], [62, 63], [64, 65], [66, 67], [68, 69], [70, 71], [72, 73], [74, 75], [76, 77], [78, 79]],
       [[80, 81], [82, 83], [84, 85], [86, 87], [88, 89], [90, 91], [92, 93], [94, 95], [96, 97], [98, 99]]])
```
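Since tofile() records neither the shape nor the dtype, reading its output back takes np.fromfile() with the dtype given explicitly, followed by a reshape. A minimal round-trip sketch; writing into a temporary directory is an assumption of this example, chosen so nothing is left behind:

```python
import os
import tempfile
import numpy as np

a = np.arange(100).reshape(5, 10, 2)

with tempfile.TemporaryDirectory() as d:
    text_path = os.path.join(d, 'b.dat')
    bin_path = os.path.join(d, 'b1.dat')

    a.tofile(text_path, sep=',', format='%d')  # text: "0,1,2,...,99"
    a.tofile(bin_path)                          # sep='' means raw binary

    # Neither file knows its shape or dtype, so both are supplied here.
    from_text = np.fromfile(text_path, dtype=a.dtype, sep=',').reshape(a.shape)
    from_bin = np.fromfile(bin_path, dtype=a.dtype).reshape(a.shape)

    bin_size = os.path.getsize(bin_path)

print(np.array_equal(from_text, a), np.array_equal(from_bin, a))  # True True
print(bin_size == a.nbytes)  # True: the binary file is exactly the raw data
```

Reading the binary file with the wrong dtype would silently produce different numbers, which is why the dtype has to be tracked alongside the file.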
NumPy Random Number Functions
NumPy's random subpackage
Basic form: np.random.*, for example np.random.rand(), np.random.randn(), np.random.randint()
Random number functions in np.random
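Function | Description |
---|---|
rand(d0, d1, ..., dn) | Creates an array of shape d0..dn filled with random floats drawn uniformly from [0, 1) |
randn(d0, d1, ..., dn) | Creates an array of shape d0..dn filled with samples from the standard normal distribution |
randint(low[, high, shape]) | Creates a random integer or an integer array of the given shape, with values in [low, high) |
seed(s) | Seeds the random number generator; s is the seed value |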
```
In [18]: a = np.random.rand(3,4,5)

In [19]: a
Out[19]:
array([[[ 0.97845512, 0.90466706, 0.92576248, 0.77775142, 0.84334893],
        [ 0.39599821, 0.31917683, 0.7961439 , 0.01324569, 0.97660396],
        [ 0.5049603 , 0.80952265, 0.67359257, 0.89334316, 0.94496225],
        [ 0.04840473, 0.04665257, 0.20956817, 0.62255095, 0.36600489]],

       [[ 0.58059326, 0.28464266, 0.23596248, 0.16677631, 0.86467069],
        [ 0.14691968, 0.60863245, 0.71725038, 0.69206766, 0.18301705],
        [ 0.73197901, 0.99051723, 0.10489076, 0.33979432, 0.0354286 ],
        [ 0.73696453, 0.48268632, 0.99294233, 0.06285961, 0.93090147]],

       [[ 0.07853777, 0.827061  , 0.66325364, 0.52289669, 0.96894828],
        [ 0.41912388, 0.01883408, 0.80978245, 0.93082898, 0.98095581],
        [ 0.58614214, 0.55996867, 0.37734444, 0.79280598, 0.03626233],
        [ 0.233132  , 0.22514788, 0.32245147, 0.13739658, 0.18866422]]])

In [20]: sn = np.random.randn(3,4,5)

In [21]: sn
Out[21]:
array([[[-0.54821321, 0.35733947, 0.74102173, -1.26679716, -0.75072289],
        [ 0.13182283, 2.32578442, -0.52208189, 2.5041796 , -0.96995644],
        [ 1.00171095, 0.97037733, 1.55386206, -0.94515087, 0.75707273],
        [-1.2481768 , 0.53095038, 0.92527818, -0.17261088, -0.13667463]],

       [[ 2.18760173, -0.93813162, 0.19032109, -1.59605908, -0.96802666],
        [ 0.30649913, 1.32375007, 0.72547761, -1.59253182, -0.72385311],
        [-2.22923637, -1.05462649, 1.82672301, 0.47343961, -0.9786459 ],
        [-0.36857965, 0.59003624, 1.80140997, 1.00965744, 1.9037593 ]],

       [[ 0.36273071, -0.0447364 , 1.27120325, 0.21076423, -0.40820945],
        [-1.22315321, -1.94670543, 0.17959233, -1.1020581 , 0.17423733],
        [-1.16368644, 0.00589158, 1.19701291, -0.4255035 , -0.7508364 ],
        [-1.61788168, 0.50386607, 0.15993032, 0.36881486, -0.41457221]]])

In [22]: b = np.random.randint(100,200,(3,4))

In [23]: b
Out[23]:
array([[163, 171, 163, 168],
       [166, 127, 160, 109],
       [135, 111, 196, 190]])

In [24]: np.random.seed(10)

In [25]: np.random.randint(100,200,(3,4))
Out[25]:
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])

In [26]: np.random.seed(10)

In [27]: np.random.randint(100,200,(3,4))
Out[27]:
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])
```
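The seed experiment above, together with the value ranges stated in the table, can be checked programmatically. A small sketch; the sample size of 10,000 is an arbitrary choice for illustration:

```python
import numpy as np

# Re-seeding with the same value replays the same sequence of draws.
np.random.seed(10)
first = np.random.randint(100, 200, (3, 4))
np.random.seed(10)
second = np.random.randint(100, 200, (3, 4))

u = np.random.rand(3, 4, 5)               # floats in [0, 1)
r = np.random.randint(100, 200, 10_000)   # integers in [100, 200): 200 never appears

print(np.array_equal(first, second))      # True
print(u.shape)                            # (3, 4, 5)
print(r.min() >= 100, r.max() <= 199)     # True True
```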
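Random number functions in np.random (continued)
Function | Description |
---|---|
shuffle(a) | Randomly permutes array a along its first axis, modifying a in place |
permutation(a) | Returns a new array with a's elements randomly reordered along the first axis; a itself is unchanged |
choice(a[, size, replace, p]) | Draws elements from the 1-D array a with probabilities p to form a new array of shape size; replace says whether elements may be reused (default True) |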
```
In [28]: a = np.random.randint(100,200,(3,4))

In [29]: a
Out[29]:
array([[116, 111, 154, 188],
       [162, 133, 172, 178],
       [149, 151, 154, 177]])

In [30]: np.random.shuffle(a)

In [31]: a
Out[31]:
array([[116, 111, 154, 188],
       [149, 151, 154, 177],
       [162, 133, 172, 178]])

In [32]: np.random.shuffle(a)

In [33]: a
Out[33]:
array([[162, 133, 172, 178],
       [116, 111, 154, 188],
       [149, 151, 154, 177]])

In [34]: a = np.random.randint(100,200,(3,4))

In [35]: a
Out[35]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])

In [36]: np.random.permutation(a)
Out[36]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])

In [37]: a
Out[37]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])

In [38]: b = np.random.randint(100,200,(8,))

In [39]: b
Out[39]: array([177, 122, 123, 194, 111, 128, 174, 188])

In [40]: np.random.choice(b,(3,2))
Out[40]:
array([[122, 188],
       [123, 177],
       [174, 188]])

In [41]: np.random.choice(b,(3,2),replace=False)
Out[41]:
array([[123, 111],
       [128, 188],
       [174, 122]])

In [42]: np.random.choice(b,(3,2),p= b/np.sum(b))
Out[42]:
array([[174, 122],
       [188, 194],
       [174, 123]])
```
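The in-place versus copy distinction between shuffle() and permutation(), and the effect of replace=False in choice(), can be verified directly. In this sketch the seed and the small arrays built with np.arange are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(0)                  # arbitrary seed for a repeatable run
a = np.arange(12).reshape(3, 4)
original = a.copy()

p = np.random.permutation(a)       # new array with rows reordered
unchanged = np.array_equal(a, original)   # a is untouched so far

ret = np.random.shuffle(a)         # reorders the rows of a itself, returns None

b = np.arange(100, 108)
picked = np.random.choice(b, (3, 2), replace=False)    # 6 distinct elements of b
weighted = np.random.choice(b, (3, 2), p=b / b.sum())  # larger values slightly more likely

print(unchanged, ret)              # True None
```

Both functions only move whole rows (the first axis); the contents of each row stay together.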
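Function | Description |
---|---|
uniform(low, high, size) | Array drawn from a uniform distribution; low is the start, high the (exclusive) end, size the shape |
normal(loc, scale, size) | Array drawn from a normal distribution; loc is the mean, scale the standard deviation, size the shape |
poisson(lam, size) | Array drawn from a Poisson distribution; lam is the rate of the random event, size the shape |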
```
In [43]: u = np.random.uniform(0,10,(3,4))

In [44]: u
Out[44]:
array([[ 8.8393648 , 3.25511638, 1.65015898, 3.92529244],
       [ 0.93460375, 8.21105658, 1.5115202 , 3.84114449],
       [ 9.44260712, 9.87625475, 4.56304547, 8.26122844]])

In [45]: n = np.random.normal(10,5,(3,4))

In [46]: n
Out[46]:
array([[ 12.8882903 , 2.6251256 , 10.39394227, 14.59206826],
       [ 7.5365132 , 10.48231186, 6.73620032, 8.89118781],
       [ 4.65856717, 3.86153973, 1.00713488, 6.5739633 ]])
```
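The session above only shows uniform() and normal() on small arrays. With many samples, the empirical statistics should sit close to the parameters, which is a quick way to confirm what loc, scale and lam mean. The seed, the rate 3.0 and the sample size 100,000 are arbitrary choices for this sketch:

```python
import numpy as np

np.random.seed(42)                       # arbitrary seed so the run is repeatable

u = np.random.uniform(0, 10, 100_000)    # floats in [0, 10)
n = np.random.normal(10, 5, 100_000)     # mean 10, standard deviation 5
p = np.random.poisson(3.0, (3, 4))       # non-negative integer counts, rate 3
big_p = np.random.poisson(3.0, 100_000)

print(u.min() >= 0, u.max() < 10)                        # True True
print(abs(n.mean() - 10) < 0.1, abs(n.std() - 5) < 0.1)  # True True
print(abs(big_p.mean() - 3) < 0.05)                      # True
print(p.shape)                                           # (3, 4)
```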
NumPy Statistical Functions
Statistical functions provided directly by NumPy
Basic form: np.*, for example np.std(), np.var(), np.average()
Statistical functions in np
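Function | Description |
---|---|
sum(a, axis=None) | Sum of the elements of a along the given axis; axis may be an integer or a tuple |
mean(a, axis=None) | Mean (expected value) of the elements of a along the given axis; axis may be an integer or a tuple |
average(a, axis=None, weights=None) | Weighted average of the elements of a along the given axis |
std(a, axis=None) | Standard deviation of the elements of a along the given axis |
var(a, axis=None) | Variance of the elements of a along the given axis |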
```
In [47]: a = np.arange(15).reshape(3,5)

In [48]: a
Out[48]:
array([[ 0, 1, 2, 3, 4],
       [ 5, 6, 7, 8, 9],
       [10, 11, 12, 13, 14]])

In [49]: np.sum(a)
Out[49]: 105

In [50]: np.mean(a,axis=1)   # row means: 2. = (0+1+2+3+4)/5
Out[50]: array([ 2., 7., 12.])

In [51]: np.mean(a,axis=0)   # column means: 7. = (2+7+12)/3
Out[51]: array([ 5., 6., 7., 8., 9.])

In [52]: np.average(a, axis=0, weights=[10,5,1])   # weighted average: 4.1875 = (2*10+7*5+12*1)/(10+5+1)
Out[52]: array([ 2.1875, 3.1875, 4.1875, 5.1875, 6.1875])

In [53]: np.std(a)
Out[53]: 4.3204937989385739

In [54]: np.var(a)
Out[54]: 18.666666666666668
```
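Writing the weighted average and the variance out by hand makes the axis and weight semantics explicit. This sketch reuses the array and weights from the session; the broadcasting expression w[:, None] * a is one way to apply a weight per row:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
w = np.array([10, 5, 1])

# axis=0 collapses the rows (one result per column); axis=1 collapses the columns.
col_mean = np.mean(a, axis=0)   # 5, 6, 7, 8, 9
row_mean = np.mean(a, axis=1)   # 2, 7, 12

# Weighted average along axis 0: sum(w_i * row_i) / sum(w_i)
manual = (w[:, None] * a).sum(axis=0) / w.sum()
builtin = np.average(a, axis=0, weights=w)

# Variance is the mean squared deviation from the mean; std is its square root.
var_manual = np.mean((a - a.mean()) ** 2)

print(manual.tolist())                   # [2.1875, 3.1875, 4.1875, 5.1875, 6.1875]
print(np.isclose(var_manual, np.var(a))) # True
```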
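Function | Description |
---|---|
min(a), max(a) | Minimum and maximum of the elements of a |
argmin(a), argmax(a) | Index of the minimum or maximum element in the flattened (1-D) array |
unravel_index(index, shape) | Converts a flat index into a multi-dimensional index for the given shape |
ptp(a) | Range of a: its maximum minus its minimum |
median(a) | Median of the elements of a |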
```
In [55]: b = np.arange(15,0,-1).reshape(3,5)

In [56]: b
Out[56]:
array([[15, 14, 13, 12, 11],
       [10, 9, 8, 7, 6],
       [ 5, 4, 3, 2, 1]])

In [57]: np.max(b)
Out[57]: 15

In [58]: np.argmax(b)   # index in the flattened array
Out[58]: 0

In [59]: np.unravel_index(np.argmax(b), b.shape)   # converted to a multi-dimensional index
Out[59]: (0, 0)

In [60]: np.ptp(b)
Out[60]: 14

In [61]: np.median(b)
Out[61]: 8.0
```
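In the session the maximum happens to sit at index 0, which hides how the flat index maps to a position. On a random array (the seed and shape are arbitrary choices for this sketch) the relationship becomes visible: for a 2-D array in the default C order, flat index = row * number_of_columns + col.

```python
import numpy as np

np.random.seed(1)                          # arbitrary seed for a repeatable example
c = np.random.randint(0, 100, (4, 6))

flat_idx = np.argmax(c)                    # position of the maximum in c.ravel()
row, col = np.unravel_index(flat_idx, c.shape)

print(flat_idx == row * c.shape[1] + col)  # True
print(c[row, col] == c.max())              # True
print(np.ptp(c) == c.max() - c.min())      # True
```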
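NumPy's Gradient Function
Gradient function in np
Function | Description |
---|---|
np.gradient(f) | Computes the gradient of the elements of f; when f is multi-dimensional, returns the gradient along each dimension |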
```
In [62]: a = np.random.randint(0,20,(5))

In [63]: a
Out[63]: array([14, 16, 10, 17, 0])

In [64]: np.gradient(a)   # interior point, neighbors on both sides: -2. = (10-14)/2
Out[64]: array([ 2. , -2. , 0.5, -5. , -17. ])

In [65]: b = np.random.randint(0,20,(5))

In [66]: b
Out[66]: array([17, 9, 16, 9, 12])

In [67]: np.gradient(b)   # end point, neighbor on one side only: -8. = (9-17)/1
Out[67]: array([-8. , -0.5, 0. , -2. , 3. ])

In [68]: c = np.random.randint(0, 50, (3,5))

In [69]: c
Out[69]:
array([[30, 17, 17, 16, 0],
       [31, 37, 9, 0, 38],
       [22, 32, 2, 3, 31]])

In [70]: np.gradient(c)   # first array: along axis 0 (down the columns); second: along axis 1 (across the rows)
Out[70]:
[array([[ 1. , 20. , -8. , -16. , 38. ],
        [ -4. , 7.5, -7.5, -6.5, 15.5],
        [ -9. , -5. , -7. , 3. , -7. ]]),
 array([[-13. , -6.5, -0.5, -8.5, -16. ],
        [ 6. , -11. , -18.5, 14.5, 38. ],
        [ 10. , -10. , -14.5, 14.5, 28. ]])]
```
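The rule illustrated by the comments above (central differences inside, one-sided differences at the two ends) fits in a few lines. The helper gradient_1d below is hypothetical, written only to mirror np.gradient's default unit spacing and edge_order=1; the sketch then checks it against np.gradient on the arrays from the session, including the per-axis results for the 2-D case.

```python
import numpy as np

def gradient_1d(f):
    """Hypothetical helper: unit-spacing gradient of a 1-D sequence,
    central differences inside, one-sided differences at the ends."""
    f = np.asarray(f, dtype=float)
    g = np.empty_like(f)
    g[1:-1] = (f[2:] - f[:-2]) / 2   # (f[i+1] - f[i-1]) / 2
    g[0] = f[1] - f[0]               # forward difference at the first element
    g[-1] = f[-1] - f[-2]            # backward difference at the last element
    return g

a = np.array([14, 16, 10, 17, 0])
print(gradient_1d(a).tolist())       # [2.0, -2.0, 0.5, -5.0, -17.0]

# For a 2-D array, np.gradient returns one array per axis; each one equals
# the 1-D rule applied along that axis.
c = np.array([[30, 17, 17, 16, 0],
              [31, 37, 9, 0, 38],
              [22, 32, 2, 3, 31]])
g0, g1 = np.gradient(c)
print(np.allclose(g0, np.apply_along_axis(gradient_1d, 0, c)))  # True
print(np.allclose(g1, np.apply_along_axis(gradient_1d, 1, c)))  # True
```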