您的位置:首页 > 其它

数据分析之Pandas(三):汇总、统计、相关系数和协方差

2017-08-09 10:33 627 查看
pandas对象中拥有一组常用的数学和统计方法,跟NumPy数组相比,它们是基于没有缺失数据的加上构建的。

In [71]: df = DataFrame([[1.4,np.nan],[7.1,-4.5],[np.nan,np.nan],[0.75,-1.3]],index=['a','b','c','d'],columns=['one','two'])

In [72]: df
Out[72]:
one  two
a  1.40  NaN
b  7.10 -4.5
c   NaN  NaN
d  0.75 -1.3

In [73]: df.sum()
Out[73]:
one    9.25
two   -5.80
dtype: float64


NA值会自动排除,当然我们也可以通过skipna参数禁用该功能

有些方法可以累计统计数据

df.cumsum()
Out[74]:
one  two
a  1.40  NaN
b  8.50 -4.5
c   NaN  NaN
d  9.25 -5.8


idmin和idmax返回间接统计,即返回达到最大致或最小值的索引

In [77]: df.idxmax()
Out[77]:
one    b
two    d
dtype: object

In [78]: df.idxmin()
Out[78]:
one    d
two    b
dtype: object


常用的描述和统计函数,详情见

http://blog.csdn.net/u011707148/article/details/76822877

相关系数和协方差

有些统计时通过参数计算得到。看个例子,它的数据来自Yahoo!Finace的股票价格和成交量

pip install pandas_datareader

In [81]: from pandas_datareader import data as web

In [82]: all_data={}

In [83]: for ticker in ['AAPL','IBM','GOOG']:
...:     all_data[ticker] = web.get_data_yahoo(ticker,'1/1/2000','1/1/2010')
...:

In [84]: price = DataFrame({tic:data['Adj Close'] for tic,data in all_data.iteritems()})

In [85]: volume = DataFrame({tic:data['Volume'] for tic,data in all_data.iteritems()})

In [86]: returns = price.pct_change() #计算百分比

In [87]: returns.tail()
Out[87]:
AAPL      GOOG       IBM
Date
2009-12-24  0.449644 -0.585705  0.033312
2009-12-28  0.286843  0.977260  0.359968
2009-12-29 -0.309294 -0.160838 -0.278636
2009-12-30 -0.074395  0.028634 -0.075809
2009-12-31 -0.144809 -0.167703  0.092164


有关的pandas_datareader用法,详情见

https://pandas-datareader.readthedocs.io/en/latest/

获取的数据集格式为:

all_data
{'AAPL':                  Open       High        Low      Close  Adj Close     Volume
Date
2000-01-03   3.745536   4.017857   3.631696   3.997768   3.610740  133949200
2000-01-04   3.866071   3.950893   3.613839   3.660714   3.306317  128094400
2000-01-05   3.705357   3.948661   3.678571   3.714286   3.354702  194580400
2000-01-06   3.790179   3.821429   3.392857   3.392857   3.064391  191993200
2000-01-07   3.446429   3.607143   3.410714   3.553571   3.209547  115183600
2000-01-10   3.642857   3.651786   3.383929   3.491071   3.153097  126266000
2000-01-11   3.426339   3.549107   3.232143   3.312500   2.991814  110387200
2000-01-12   3.392857   3.410714   3.089286   3.113839   2.812385  244017200
2000-01-13   3.374439   3.526786   3.303571   3.455357   3.120841  258171200
2000-01-14   3.571429   3.651786   3.549107   3.587054   3.239787   97594000
2000-01-18   3.607143   3.785714   3.587054   3.712054   3.352686  114794400
2000-01-19   3.772321   3.883929   3.691964   3.805804   3.437360  149410800
2000-01-20   4.125000   4.339286   4.053571   4.053571   3.661140  457783200
2000-01-21   4.080357   4.080357   3.935268   3.975446   3.590579  123981200
2000-01-24   3.872768   4.026786   3.754464   3.794643   3.427280  110219200
2000-01-25   3.750000   4.040179   3.656250   4.008929   3.620820  124286400
2000-01-26   3.928571   4.078125   3.919643   3.935268   3.554291   91789600
2000-01-27   3.886161   4.035714   3.821429   3.928571   3.548242   85036000
2000-01-28   3.863839   3.959821   3.593750   3.629464   3.278092  105837200
2000-01-31   3.607143   3.709821   3.375000   3.705357   3.346637  175420000
2000-02-01   3.714286   3.750000   3.571429   3.580357   3.233739   79508800
2000-02-02   3.598214   3.647321   3.464286   3.529018   3.187370  116048800
2000-02-03   3.582589   3.723214   3.580357   3.689732   3.332525  118798400
2000-02-04   3.712054   3.928571   3.700893   3.857143   3.483729  106330000
2000-02-07   3.857143   4.080357   3.783482   4.073661   3.679286  110266800
2000-02-08   4.071429   4.147321   3.973214   4.102679   3.705494  102160800
2000-02-09   4.075893   4.183036   4.015625   4.022321   3.632916   74841200
2000-02-10   4.031250   4.066964   3.928571   4.053571   3.661140   75745600
2000-02-11   4.058036   4.075893   3.866071   3.883929   3.507921   53062800
2000-02-14   3.904018   4.138393   3.879464   4.136161   3.735734   91884800
...        ...        ...        ...        ...        ...
2009-11-18  29.505714  29.571428  29.142857  29.422857  26.574398   93580200
2009-11-19  29.230000  29.230000  28.542856  28.644285  25.871201  135581600
2009-11-20  28.450001  28.627142  28.251429  28.559999  25.795074  101666600
2009-11-23  29.000000  29.428572  28.992857  29.411428  26.564075  118724200
2009-11-24  29.332857  29.411428  28.985714  29.205715  26.378279   79609600
2009-11-25  29.342857  29.378571  29.108572  29.170000  26.346025   71613500
2009-11-27  28.459999  28.994286  28.338572  28.655714  25.881527   73814300
2009-11-30  28.730000  28.811428  28.395714  28.558571  25.793781  106214500
2009-12-01  28.891428  28.967142  28.118572  28.138571  25.414444  116440800
2009-12-02  28.422857  28.774286  27.964285  28.032858  25.318968  178815000
2009-12-03  28.202858  28.425714  28.038572  28.068571  25.351227  112179900
2009-12-04  28.528572  28.554285  27.182858  27.617144  24.943501  206721200
2009-12-07  27.617144  27.681429  26.954287  26.992857  24.379650  178689700
2009-12-08  27.051428  27.478571  26.957144  27.124287  24.498354  172599700
2009-12-09  27.325714  28.308571  27.187143  28.257143  25.521540  171195500
2009-12-10  28.500000  28.528572  28.017143  28.061428  25.344772  122417400
2009-12-11  28.254286  28.285715  27.632856  27.809999  25.117678  107443700
2009-12-14  27.910000  28.204287  27.508572  28.139999  25.415737  123947600
2009-12-15  27.975714  28.215714  27.610001  27.738571  25.053171  104864900
2009-12-16  27.871429  28.071428  27.792856  27.861429  25.164133   88246200
2009-12-17  27.751429  27.857143  27.285715  27.408571  24.755119   97209700
2009-12-18  27.595715  27.928572  27.514286  27.918571  25.215746  152192600
2009-12-21  28.007143  28.535715  27.952858  28.318571  25.577024  152976600
2009-12-22  28.491428  28.692858  28.379999  28.622858  25.851851   87378900
2009-12-23  28.742857  28.911428  28.687143  28.871429  26.076355   86381400
2009-12-24  29.078571  29.907143  29.049999  29.862858  26.971806  125222300
2009-12-28  30.245714  30.564285  29.944286  30.230000  27.303402  161141400
2009-12-29  30.375713  30.388571  29.818571  29.871429  26.979546  111301400
2009-12-30  29.832857  30.285715  29.758572  30.234285  27.307270  103021100
2009-12-31  30.447144  30.478571  30.080000  30.104286  27.189861   88102700

[2515 rows x 6 columns],
'GOOG':                   Open        High         Low       Close   Adj Close  \
Date
2004-08-19   49.813286   51.835709   47.800831   49.982655   49.982655
2004-08-20   50.316402   54.336334   50.062355   53.952770   53.952770
2004-08-23   55.168217   56.528118   54.321388   54.495735   54.495735
2004-08-24   55.412300   55.591629   51.591621   52.239193   52.239193
2004-08-25   52.284027   53.798351   51.746044   52.802086   52.802086
2004-08-26   52.279045   53.773445   52.134586   53.753517   53.753517
2004-08-27   53.848164   54.107193   52.647663   52.876804   52.876804
2004-08-30   52.443428   52.548038   50.814533   50.814533   50.814533
2004-08-31   50.958992   51.661362   50.889256   50.993862   50.993862
2004-09-01   51.158245   51.292744   49.648903   49.937820   49.937820
2004-09-02   49.409801   50.993862   49.285267   50.565468   50.565468
2004-09-03   50.286514   50.680038   49.474556   49.818268   49.818268
2004-09-07   50.316402   50.809555   49.619015   50.600338   50.600338
2004-09-08   50.181908   51.322632   50.062355   50.958992   50.958992
2004-09-09   51.073563   51.163227   50.311420   50.963974   50.963974
2004-09-10   50.610302   53.081039   50.460861   52.468334   52.468334
2004-09-13   53.115910   54.002586   53.031227   53.549286   53.549286
2004-09-14   53.524376   55.790882   53.195610   55.536835   55.536835
2004-09-15   55.073570   56.901718   54.894241   55.790882   55.790882
2004-09-16   55.960247   57.683788   55.616535   56.772205   56.772205
2004-09-17   56.996365   58.525631   56.562988   58.525631   58.525631
2004-09-20   58.256641   60.572956   58.166977   59.457142   59.457142
2004-09-21   59.681301   59.985161   58.535595   58.699978   58.699978
2004-09-22   58.480801   59.611561   58.186901   58.968971   58.968971
2004-09-23   59.198112   61.086033   58.291508   60.184414   60.184414
2004-09-24   60.244190   61.818291   59.656395   59.691261   59.691261
2004-09-27   59.556767   60.214302   58.680054   58.909195   58.909195
2004-09-28   60.423519   63.462128   59.880554   63.193138   63.193138
2004-09-29   63.113434   67.257904   62.879314   65.295258   65.295258
2004-09-30   64.707458   65.902977   64.259140   64.558022   64.558022
...         ...         ...         ...         ...
2009-11-18  287.248322  288.309357  284.966888  287.248322  287.248322
2009-11-19  285.813690  285.928284  283.935730  285.425171  285.425171
2009-11-20  283.686676  284.732758  283.636871  283.915802  283.915802
2009-11-23  287.168610  292.204742  286.854797  290.087677  290.087677
2009-11-24  290.172363  291.054047  287.193542  290.456299  290.456299
2009-11-25  292.110107  292.433899  290.257050  291.776337  291.776337
2009-11-27  284.932007  290.142487  284.418915  288.797516  288.797516
2009-11-30  289.230896  290.745209  287.477478  290.411469  290.411469
2009-12-01  292.966888  294.506104  290.411469  293.833649  293.833649
2009-12-02  294.396515  295.397766  292.015442  292.658051  292.658051
2009-12-03  293.420197  294.620697  291.407745  291.776337  291.776337
2009-12-04  295.402771  296.304382  288.508606  291.412720  291.412720
2009-12-07  291.014221  293.245850  289.415192  292.030396  292.030396
2009-12-08  290.660522  294.227173  289.913330  292.428894  292.428894
2009-12-09  292.653076  293.564636  290.700378  293.410217  293.410217
2009-12-10  294.117584  296.244598  294.102631  294.645599  294.645599
2009-12-11  296.229645  296.264526  292.767639  294.152435  294.152435
2009-12-14  296.563416  297.539734  295.198517  296.752686  296.752686
2009-12-15  295.542236  297.076477  294.391541  295.462524  295.462524
2009-12-16  298.182343  299.064026  297.205994  297.763916  297.763916
2009-12-17  297.106384  297.704132  295.771362  295.861053  295.861053
2009-12-18  296.902130  298.346710  296.389069  297.096405  297.096405
2009-12-21  297.689178  298.800018  296.722809  298.222198  298.222198
2009-12-22  299.547211  299.626923  298.306885  299.437622  299.437622
2009-12-23  300.623199  305.290710  300.299408  304.697906  304.697906
2009-12-24  305.320587  308.603271  304.991821  308.085236  308.085236
2009-12-28  309.669281  311.826202  308.085236  310.272034  310.272034
2009-12-29  311.203522  311.253357  307.990570  308.543518  308.543518
2009-12-30  308.095184  310.202271  307.851105  310.202271  310.202271
2009-12-31  311.208527  311.532288  308.832428  308.832428  308.832428

Volume
Date
2004-08-19  44871300
2004-08-20  22942800
2004-08-23  18342800
2004-08-24  15319700
2004-08-25   9232100
2004-08-26   7128600
2004-08-27   6241200
2004-08-30   5221400
2004-08-31   4941200
2004-09-01   9181600
2004-09-02  15190400
2004-09-03   5176800
2004-09-07   5875200
2004-09-08   5009200
2004-09-09   4080900
2004-09-10   8740200
2004-09-13   7881300
2004-09-14  10880300
2004-09-15  10763900
2004-09-16   9310200
2004-09-17   9517400
2004-09-20  10679200
2004-09-21   7263000
2004-09-22   7617100
2004-09-23   8576100
2004-09-24   9166700
2004-09-27   7099600
2004-09-28  17009400
2004-09-29  30661400
2004-09-30  13823300
...
2009-11-18   3110700
2009-11-19   4352200
2009-11-20   4027400
2009-11-23   5114000
2009-11-24   3222400
2009-11-25   2933300
2009-11-27   2779500
2009-11-30   3463000
2009-12-01   4657900
2009-12-02   3338800
2009-12-03   2868000
2009-12-04   5046000
2009-12-07   3284500
2009-12-08   3059300
2009-12-09   3575200
2009-12-10   3349000
2009-12-11   3452800
2009-12-14   3841100
2009-12-15   4577800
2009-12-16   5639800
2009-12-17   5297300
2009-12-18   7089400
2009-12-21   5161600
2009-12-22   3775600
2009-12-23   4160800
2009-12-24   1723800
2009-12-28   3408400
2009-12-29   2860200
2009-12-30   2942100
2009-12-31   2448700

[1353 rows x 6 columns],
'IBM':                   Open        High         Low       Close   Adj Close  \
Date
2000-01-03  112.437500  116.000000  111.875000  116.000000   86.912140
2000-01-04  114.000000  114.500000  110.875000  112.062500   83.961983
2000-01-05  112.937500  119.750000  112.125000  116.000000   86.912140
2000-01-06  118.000000  118.937500  113.500000  114.000000   85.413643
2000-01-07  117.250000  117.937500  110.625000  113.500000   85.038986
2000-01-10  117.250000  119.375000  115.375000  118.000000   88.410614
2000-01-11  117.875000  121.125000  116.625000  119.000000   89.159859
2000-01-12  119.625000  122.000000  118.250000  119.500000   89.534462
2000-01-13  119.937500  121.000000  115.750000  118.250000   88.597923
2000-01-14  120.937500  123.312500  117.500000  119.625000   89.628128
2000-01-18  119.687500  119.750000  115.000000  115.750000   86.724800
2000-01-19  115.562500  122.000000  112.687500  119.500000   89.534462
2000-01-20  123.000000  124.750000  119.000000  119.000000   89.159859
2000-01-21  121.875000  123.000000  119.937500  121.500000   91.032967
2000-01-24  121.875000  122.875000  116.562500  121.500000   91.032967
2000-01-25  116.750000  119.500000  116.000000  119.125000   89.253494
2000-01-26  119.062500  119.937500  116.000000  116.750000   87.474060
2000-01-27  118.000000  118.437500  111.625000  113.500000   85.038986
2000-01-28  112.750000  114.187500  110.062500  111.562500   83.587364
2000-01-31  111.375000  112.812500  109.625000  112.250000   84.102455
2000-02-01  112.375000  113.437500  109.125000  110.000000   82.416672
2000-02-02  112.000000  115.812500  110.375000  113.500000   85.038986
2000-02-03  115.000000  117.500000  114.062500  117.125000   87.754974
2000-02-04  118.187500  118.437500  114.812500  115.625000   86.631165
2000-02-07  116.000000  116.875000  113.125000  114.125000   85.507286
2000-02-08  116.937500  119.000000  116.187500  118.812500   89.113068
2000-02-09  119.000000  119.750000  117.000000  117.375000   88.034897
2000-02-10  117.687500  119.125000  116.875000  119.125000   89.347443
2000-02-11  119.062500  119.437500  114.562500  115.375000   86.534813
2000-02-14  116.000000  116.375000  114.500000  116.062500   87.050453
...         ...         ...         ...         ...
2009-11-18  128.050003  128.350006  127.550003  128.149994  106.745453
2009-11-19  127.279999  127.839996  126.510002  127.540001  106.237350
2009-11-20  127.169998  127.690002  126.459999  126.959999  105.754219
2009-11-23  127.699997  128.940002  127.650002  128.199997  106.787109
2009-11-24  127.900002  128.320007  127.139999  127.930000  106.562210
2009-11-25  127.879997  128.220001  127.080002  127.279999  106.020775
2009-11-27  124.589996  126.620003  124.260002  125.699997  104.704666
2009-11-30  125.120003  126.599998  124.919998  126.349998  105.246086
2009-12-01  127.290001  128.389999  126.849998  127.940002  106.570534
2009-12-02  127.320000  128.389999  127.160004  127.209999  105.962456
2009-12-03  127.599998  128.470001  127.250000  127.550003  106.245659
2009-12-04  128.399994  128.899994  126.000000  127.250000  105.995773
2009-12-07  126.879997  127.529999  126.589996  127.040001  105.820869
2009-12-08  126.970001  127.349998  126.160004  126.800003  105.620956
2009-12-09  126.699997  128.389999  126.110001  128.389999  106.945366
2009-12-10  128.130005  129.470001  128.089996  129.339996  107.736694
2009-12-11  129.009995  129.770004  128.710007  129.679993  108.019913
2009-12-14  129.649994  129.979996  129.600006  129.929993  108.228142
2009-12-15  129.460007  129.860001  127.940002  128.490005  107.028687
2009-12-16  128.740005  129.600006  128.350006  128.710007  107.211914
2009-12-17  128.000000  128.559998  127.120003  127.400002  106.120712
2009-12-18  127.970001  128.389999  127.000000  127.910004  106.545555
2009-12-21  127.800003  128.940002  127.680000  128.649994  107.161919
2009-12-22  129.300003  129.979996  129.190002  129.929993  108.228142
2009-12-23  129.699997  130.000000  129.300003  130.000000  108.286469
2009-12-24  129.889999  130.570007  129.479996  130.570007  108.761246
2009-12-28  130.990005  132.309998  130.720001  132.309998  110.210617
2009-12-29  132.279999  132.369995  131.800003  131.850006  109.827469
2009-12-30  131.229996  132.679993  130.679993  132.570007  110.427193
2009-12-31  132.410004  132.850006  130.750000  130.899994  109.036133

Volume
Date
2000-01-03  10347700
2000-01-04   8227800
2000-01-05  12733200
2000-01-06   7971900
2000-01-07  11856700
2000-01-10   8540500
2000-01-11   7873300
2000-01-12   6803800
2000-01-13   8489700
2000-01-14  10956600
2000-01-18   7643900
2000-01-19   8634500
2000-01-20  17783400
2000-01-21   7868700
2000-01-24   6499200
2000-01-25   6936900
2000-01-26   4895100
2000-01-27   8324600
2000-01-28   6669400
2000-01-31   6202700
2000-02-01   7304700
2000-02-02   7707700
2000-02-03   6213800
2000-02-04   5405300
2000-02-07   5779500
2000-02-08   5464300
2000-02-09   5374700
2000-02-10   4446600
2000-02-11   5749400
2000-02-14   4544000
...
2009-11-18   3970100
2009-11-19   5308400
2009-11-20   5354200
2009-11-23   7003500
2009-11-24   4816500
2009-11-25   3967700
2009-11-27   3319200
2009-11-30   6302700
2009-12-01   6578600
2009-12-02   4597900
2009-12-03   5760000
2009-12-04   7068500
2009-12-07   4144400
2009-12-08   5351400
2009-12-09   6071900
2009-12-10   7077800
2009-12-11   6597200
2009-12-14   5201300
2009-12-15   7862600
2009-12-16   6372500
2009-12-17   5909500
2009-12-18   9106600
2009-12-21   4772500
2009-12-22   5535500
2009-12-23   4127600
2009-12-24   4265100
2009-12-28   5800400
2009-12-29   4184200
2009-12-30   3867000
2009-12-31   4223400

[2515 rows x 6 columns]}


Series的corr方法用于计算两个Series中重叠的、非NA的、按索引对齐的值的相关系数,cov用于计算协方差:下面我们来计算AAPL和IBM的相关系数和协方差。如果直接调用两个方法,则直接返回完整的相关系数和协方差。

In [94]: returns.AAPL.corr(returns.IBM)
Out[94]: 0.27155239676044468

In [95]: returns.AAPL.cov(returns.IBM)
Out[95]: 0.061849759999671697

In [96]: returns.corr()
Out[96]:
AAPL      GOOG       IBM
AAPL  1.000000  0.265386  0.271552
GOOG  0.265386  1.000000  0.278625
IBM   0.271552  0.278625  1.000000

In [97]: returns.cov()
Out[97]:
AAPL      GOOG       IBM
AAPL  0.329952  0.048933  0.061850
GOOG  0.048933  0.174015  0.043413
IBM   0.061850  0.043413  0.157224


利用corrwith方法,可以计算其列或行跟另一个Series或DataFrame之间的相关系数。传入一个series将会返回一个相关系数值series(针对各列进行计算)

In [98]: returns.corrwith(returns.IBM)
Out[98]:
AAPL    0.271552
GOOG    0.278625
IBM     1.000000
dtype: float64

In [100]: volume = DataFrame({tic:data['Volume'] for tic,data in all_data.iteritems()})

In [101]: returns.corrwith(volume)
Out[101]:
AAPL    0.317785
GOOG    0.385612
IBM     0.440509
dtype: float64


传入axis=1,则按行计算。在计算相关系数之前,所有的数据项都会被标签对齐。
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
相关文章推荐