Commonly used pandas functions

2017-05-17 22:54
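1. When an index contains duplicate values, obj.index.is_unique reports it (it returns False). Selecting a duplicated label, for example obj['a'], returns all the values stored under that label.
2. Common arithmetic and descriptive statistics functions for a DataFrame: https://chrisalbon.com/python/pandas_dataframe_descriptive_stats.html
For the list of functions, see Python for Data Analysis, p. 139, Table 5-10.
3. import pandas_datareader as web can fetch stock data to use as a statistical sample. For the supported web sources and how to use them, see the documentation linked below.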

https://pandas-datareader.readthedocs.io/en/latest/

(1) Series with Series

returns.MSFT.corr(returns.IBM)  correlation coefficient

returns.MSFT.cov(returns.IBM)  covariance

(2) DataFrame with itself (pairwise between its columns)

returns.corr()

returns.cov()
(3) DataFrame with a Series

returns.corrwith(returns.IBM)
(4) DataFrame with a DataFrame

returns.corrwith(volume)
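
The four cases above can be tried without a network connection. The following is a minimal sketch that uses synthetic random data in place of real stock returns. The column names reuse the tickers from this post, but the numbers, the seed, and the volume frame are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the real data (assumption: random values, not market data)
rng = np.random.default_rng(0)
tickers = ["AAPL", "IBM", "MSFT", "GOOG"]
returns = pd.DataFrame(rng.normal(0, 0.01, size=(250, 4)), columns=tickers)
volume = pd.DataFrame(rng.integers(1_000, 10_000, size=(250, 4)), columns=tickers)

# (1) Series with Series: each call returns a single number
pair_corr = returns["MSFT"].corr(returns["IBM"])
pair_cov = returns["MSFT"].cov(returns["IBM"])

# (2) DataFrame with itself: a square matrix, one row and column per ticker
corr_matrix = returns.corr()
cov_matrix = returns.cov()

# (3) DataFrame with a Series: one correlation per column
with_ibm = returns.corrwith(returns["IBM"])

# (4) DataFrame with a DataFrame: columns are matched by name
with_volume = returns.corrwith(volume)

print(pair_corr, pair_cov)
print(corr_matrix)
print(with_ibm)
print(with_volume)
```

The entry for MSFT and IBM in the matrix from (2) is the same number that (1) computes for that pair, which is a quick way to check that the calls agree.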

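import numpy as np
from pandas import DataFrame, Series

# Axis indexes with duplicate values
print("Axis indexes with duplicate values")
obj = Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
print("obj is \n", obj)
# is_unique is False here because 'a' and 'b' each appear twice
print("obj.index.is_unique is ", obj.index.is_unique)
# Selecting a duplicated label returns every entry stored under it
print("obj['a'] is \n", obj['a'])
print("obj['b'] is \n", obj['b'])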
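
One consequence worth checking: the return type of a selection depends on whether the label is duplicated. A duplicated label gives back a Series, while a label that occurs once gives back a scalar. The sketch below rebuilds the same Series and also lists the duplicated labels with Index.duplicated().

```python
import pandas as pd

obj = pd.Series(range(5), index=["a", "a", "b", "b", "c"])

dup = obj["a"]      # label appears twice: a Series holding both entries
single = obj["c"]   # label appears once: a scalar

print(type(dup).__name__, list(dup))
print(type(single).__name__, single)

# Index.duplicated() flags repeated labels (by default the first occurrence is not flagged)
repeated_labels = obj.index[obj.index.duplicated()].unique()
print(list(repeated_labels))
```

Code that selects by label from an index that may contain duplicates should therefore be prepared for either result type.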

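# The same holds for a DataFrame: selecting a duplicated label returns all matching rows
df = DataFrame(np.random.randn(4, 3), index=['a', 'a', 'b', 'b'])
print("df is \n", df)
# .ix has been removed from pandas; .loc does label-based selection
print("df.loc['b'] is \n ", df.loc['b'])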

df = DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
               index=['a', 'b', 'c', 'd'], columns=['one', 'two'])
print("df is \n", df)
print("Calling DataFrame's sum method returns a Series containing column sums")
print("df.sum() is \n", df.sum())
print("Passing axis=1 sums over the rows instead")
print("df.sum(axis=1) \n", df.sum(axis=1))
# Note: in recent pandas versions sum() of an all-NA row returns 0 rather than NaN; mean() still returns NaN
print("NA values are excluded unless the entire slice is NA. This can be disabled using the skipna option")
print("df.mean(axis=1, skipna=False) \n ", df.mean(axis=1, skipna=False))
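
To see the effect of skipna on concrete rows, the sketch below rebuilds the same frame and compares the row means with and without it. Row a has one missing value and row c is entirely missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

mean_skip = df.mean(axis=1)                    # default skipna=True: NaN entries are ignored
mean_strict = df.mean(axis=1, skipna=False)    # any NaN in the row makes the result NaN

print(mean_skip)
print(mean_strict)
```

Row a has a mean of 1.4 with the default and NaN with skipna=False. Row c is NaN in both cases because it has no values to average.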

print("df.idxmax() returns indirect statistics, such as the index label where the maximum is attained \n", df.idxmax())
print("df.cumsum() returns the cumulative sum of values \n", df.cumsum())
print("df.describe() returns multiple summary statistics in one shot \n", df.describe())
obj = Series(['a', 'a', 'b', 'c'] * 4)
print("obj is \n", obj)
print("obj.describe() returns alternate summary statistics for non-numeric data \n", obj.describe())
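
For non-numeric data, describe() reports count, unique, top and freq instead of mean and quantiles. A small sketch on the same Series, with value_counts() added as a cross-check (value_counts() is not used in the original notes):

```python
import pandas as pd

obj = pd.Series(["a", "a", "b", "c"] * 4)

summary = obj.describe()      # count, unique, top (most frequent value), freq (its count)
counts = obj.value_counts()   # occurrences of each distinct value

print(summary)
print(counts)
```

Here the Series has 16 entries and 3 distinct values, and the most frequent value 'a' appears 8 times, which matches the count that value_counts() reports for it.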

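import pandas_datareader as web
# Documentation: https://pandas-datareader.readthedocs.io/en/latest/
# Note: the Google Finance source used here has since been discontinued, so this
# call may fail; see the documentation above for the currently supported sources.
all_data = {}
for ticker in ['AAPL', 'IBM', 'MSFT', 'GOOG']:
    all_data[ticker] = web.get_data_google(ticker, '1/1/2016', '1/1/2017')
print("all data is \n ", all_data)

# One column per ticker: closing prices and traded volumes
price = DataFrame({tic: data['Close']
                   for tic, data in all_data.items()})
volume = DataFrame({tic: data['Volume']
                    for tic, data in all_data.items()})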

returns = price.pct_change()
print("returns.tail()\n",returns.tail())
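
pct_change() computes the relative change from the previous row, (p_t - p_{t-1}) / p_{t-1}, so the first row is always NaN. A small check with made-up prices (the numbers and the names prices, pct and manual are illustrative only):

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])  # hypothetical closing prices

pct = prices.pct_change()                 # relative change from the previous row
manual = prices / prices.shift(1) - 1     # the same calculation written out

print(pct)
print(manual)
```

The first price has no predecessor, so its change is NaN. The move from 100 to 110 is a change of 0.1, and the move from 110 to 99 is a change of -0.1.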

print("returns.MSFT.corr(returns.IBM) \n",returns.MSFT.corr(returns.IBM))
print("returns.MSFT.cov(returns.IBM) \n", returns.MSFT.cov(returns.IBM))

print("returns.corr() \n", returns.corr())
print("returns.cov() \n", returns.cov())

print("returns.corrwith(returns.IBM) \n",returns.corrwith(returns.IBM))

print("volume is \n", volume)
print("returns.corrwith(volume) \n", returns.corrwith(volume))