pandas Documentation Notes, Part 1: Intro to Data Structures
2018-01-02 22:58
Official pandas documentation: http://pandas.pydata.org/pandas-docs/stable/
Intro to Data Structures
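Two important data structures: Series and DataFrame

Series

A Series is a one-dimensional array-like object consisting of an array of values plus an associated array of labels, called its index.

    # Create a Series
    >>> import numpy as np
    >>> import pandas as pd
    >>> s = pd.Series(data, index=index)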
Here, data can be a Python dict, an ndarray, or a scalar value (such as 5).
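The examples below build a Series from a list, a dict, and a scalar, but not from an ndarray. Here is a minimal sketch of that case. The names `s_arr`, `s_default`, and `mismatch_error` are purely illustrative. It shows that an explicit index must have the same length as the data, and that pandas otherwise falls back to a default integer index.

```python
import numpy as np
import pandas as pd

# data as an ndarray, with an explicit index of the same length
arr = np.array([1.5, 2.5, 3.5])
s_arr = pd.Series(arr, index=['x', 'y', 'z'])
print(s_arr)

# without an index, pandas assigns a default integer index 0..n-1
s_default = pd.Series(arr)
print(s_default.index)

# a length mismatch between data and index raises ValueError
mismatch_error = None
try:
    pd.Series(arr, index=['x', 'y'])
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)
```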
From an array (here, a Python list): you can specify the index. If you don't, a default integer index from 0 to n-1 is used.

    In [3]: obj = pd.Series([4, 7, -5, 3])

    In [4]: obj
    Out[4]:
    0    4
    1    7
    2   -5
    3    3
    dtype: int64
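From a dict: the dict's keys become the index. (In pandas 0.23+ on Python 3.6+, the index follows the dict's insertion order. The sorted order in Out[20] comes from an older pandas version.) If an index is passed, values are pulled from the dict for those labels, and labels with no matching key get NaN.

    In [17]: sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

    In [19]: obj3 = pd.Series(sdata)

    In [20]: obj3
    Out[20]:
    Ohio      35000
    Oregon    16000
    Texas     71000
    Utah       5000
    dtype: int64

    In [21]: states = ['California', 'Ohio', 'Oregon', 'Texas']

    In [22]: obj4 = pd.Series(sdata, index=states)

    In [23]: obj4
    Out[23]:
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64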
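The obj4 example hides three side effects of passing both a dict and an index. This short sketch checks each one:

- `'California'` has no key in the dict, so its value is missing.
- `'Utah'` is a key but is not in `states`, so it is dropped.
- The NaN forces the integer values to `float64`.

The variable `missing` is illustrative. `Series.isna` is the standard missing-value check; older pandas versions spell it `isnull`.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)

# True where the value is missing: only 'California' has no key in sdata
missing = obj4.isna()
print(missing)

# 'Utah' is in sdata but not in states, so it does not appear in obj4
print('Utah' in obj4.index)

# one NaN is enough to upcast the integer values to float64
print(obj4.dtype)
```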
From a scalar value: an index must be provided, and the value is repeated to match the length of the index.

    In [24]: pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
    Out[24]:
    a    5.0
    b    5.0
    c    5.0
    d    5.0
    e    5.0
    dtype: float64
A Series behaves both like an ndarray and like a dict, and it supports vectorized operations.
    In [5]: obj.values
    Out[5]: array([ 4,  7, -5,  3])

    In [6]: obj.index
    Out[6]: RangeIndex(start=0, stop=4, step=1)

    # A custom index can be specified
    In [8]: obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

    In [9]: obj2
    Out[9]:
    d    4
    b    7
    a   -5
    c    3
    dtype: int64

    # Look up and modify values by index label
    In [10]: obj2['a']
    Out[10]: -5

    In [11]: obj2['c'] = 9

    In [12]: obj2
    Out[12]:
    d    4
    b    7
    a   -5
    c    9
    dtype: int64

    # `in` tests the index labels, not the values
    In [14]: 5 in obj
    Out[14]: False

    # Boolean filtering
    In [15]: obj2[obj2 > 5]
    Out[15]:
    b    7
    c    9
    dtype: int64

    # Arithmetic
    In [16]: obj2 * 2
    Out[16]:
    d     8
    b    14
    a   -10
    c    18
    dtype: int64

    # Operations align on index labels; a label present in only one operand gives NaN
    In [17]: obj[1:] + obj[:-1]
    Out[17]:
    0     NaN
    1    14.0
    2   -10.0
    3     NaN
    dtype: float64
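Two points from the session above are easy to trip over. First, `in` checks labels, not values. Second, `+` produces NaN wherever the labels do not overlap. This minimal sketch shows the usual workarounds: test values through `.values` or `isin`, and use the `add` method with `fill_value` to treat a missing side as 0. The variable names are illustrative only.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# `in` on a Series checks the index labels (0..3 here)
label_hit = 3 in obj
# to check values, test against the underlying array
value_hit = 5 in obj.values
# element-wise value membership as a boolean mask
isin_mask = obj.isin([7, 3])

# plain + aligns on labels; labels 0 and 3 exist on only one side -> NaN
plain = obj[1:] + obj[:-1]
# add(..., fill_value=0) substitutes 0 for the missing side before adding
filled = obj[1:].add(obj[:-1], fill_value=0)

print(plain)
print(filled)
```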
A Series has a name attribute
    In [27]: s = pd.Series(np.random.randn(5), name='something')

    In [28]: s
    Out[28]:
    0   -0.4949
    1    1.0718
    2    0.7216
    3   -0.7068
    4   -1.0396
    Name: something, dtype: float64

    In [29]: s.name
    Out[29]: 'something'

    # Change the name with pandas.Series.rename()
    In [30]: s2 = s.rename("different")

    In [31]: s2.name
    Out[31]: 'different'
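Passing a scalar to `rename` returns a new Series and leaves the original's name unchanged. The name also matters outside the Series itself: it becomes the column label when the Series is turned into a DataFrame. A small sketch, with deterministic data in place of `randn`:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5.0), name='something')

# a scalar argument to rename() sets the name on a new Series; s is unchanged
s2 = s.rename('different')

# the name is used as the column label when converting to a DataFrame
frame = s2.to_frame()
print(s.name, s2.name, list(frame.columns))
```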
DataFrame
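A DataFrame is a two-dimensional labeled data structure whose columns can hold different types, similar to an SQL table.

Creating a DataFrame object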
The data can come from:
- a dict of 1-D arrays, lists, dicts, or Series
- a 2-D numpy array
- a structured or record ndarray
- a Series
- another DataFrame
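The session that follows does not demonstrate three of the sources in the list above: a 2-D array, a single Series, and another DataFrame. Here is a minimal sketch of those three. The variable names are illustrative. When a Series is passed on its own, its name becomes the column label.

```python
import numpy as np
import pandas as pd

# 2-D ndarray: row and column labels default to 0..n-1 unless given
arr2d = np.arange(6).reshape(2, 3)
df_arr = pd.DataFrame(arr2d, index=['r1', 'r2'], columns=['x', 'y', 'z'])

# a single Series: becomes one column, labeled with the Series name
ser = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'], name='val')
df_ser = pd.DataFrame(ser)

# another DataFrame: builds a new DataFrame with the same labels and values
df_copy = pd.DataFrame(df_arr)

print(df_arr, df_ser, df_copy, sep='\n\n')
```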
    # From a dict of Series or dicts
    In [32]: d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
       ....:      'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
       ....:

    In [33]: df = pd.DataFrame(d)

    In [34]: df
    Out[34]:
       one  two
    a  1.0  1.0
    b  2.0  2.0
    c  3.0  3.0
    d  NaN  4.0

    In [35]: pd.DataFrame(d, index=['d', 'b', 'a'])
    Out[35]:
       one  two
    d  NaN  4.0
    b  2.0  2.0
    a  1.0  1.0

    In [36]: pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
    Out[36]:
       two three
    d  4.0   NaN
    b  2.0   NaN
    a  1.0   NaN

    # Note: each key is a column name, and each Series becomes one column

    # From a dict of arrays/lists
    In [39]: d = {'one' : [1., 2., 3., 4.],
       ....:      'two' : [4., 3., 2., 1.]}
       ....:

    In [40]: pd.DataFrame(d)
    Out[40]:
       one  two
    0  1.0  4.0
    1  2.0  3.0
    2  3.0  2.0
    3  4.0  1.0

    In [41]: pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
    Out[41]:
       one  two
    a  1.0  4.0
    b  2.0  3.0
    c  3.0  2.0
    d  4.0  1.0

    # From a structured array
    In [42]: data = np.zeros((2,), dtype=[('A', 'i4'),('B', 'f4'),('C', 'a10')])

    In [43]: data[:] = [(1,2.,'Hello'), (2,3.,"World")]

    In [44]: pd.DataFrame(data)
    Out[44]:
       A    B         C
    0  1  2.0  b'Hello'
    1  2  3.0  b'World'

    In [45]: pd.DataFrame(data, index=['first', 'second'])
    Out[45]:
            A    B         C
    first   1  2.0  b'Hello'
    second  2  3.0  b'World'

    In [46]: pd.DataFrame(data, columns=['C', 'A', 'B'])
    Out[46]:
              C  A    B
    0  b'Hello'  1  2.0
    1  b'World'  2  3.0

    # From a list of dicts
    In [47]: data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

    In [48]: pd.DataFrame(data2)
    Out[48]:
       a   b     c
    0  1   2   NaN
    1  5  10  20.0

    In [49]: pd.DataFrame(data2, index=['first', 'second'])
    Out[49]:
            a   b     c
    first   1   2   NaN
    second  5  10  20.0

    In [50]: pd.DataFrame(data2, columns=['a', 'b'])
    Out[50]:
       a   b
    0  1   2
    1  5  10

    # From a dict of tuples (tuple keys produce MultiIndex rows and columns)
    In [51]: pd.DataFrame({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
       ....:               ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
       ....:               ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
       ....:               ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
       ....:               ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}})
       ....:
    Out[51]:
           a              b
           a    b    c    a     b
    A B  4.0  1.0  5.0  8.0  10.0
      C  3.0  2.0  6.0  7.0   NaN
      D  NaN  NaN  NaN  NaN   9.0

    # Alternative constructors: from_records, from_items, etc.
    # (from_items was deprecated in pandas 0.23 and removed in pandas 1.0)
    In [52]: data
    Out[52]:
    array([(1, 2., b'Hello'), (2, 3., b'World')],
          dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

    In [53]: pd.DataFrame.from_records(data, index='C')
    Out[53]:
              A    B
    C
    b'Hello'  1  2.0
    b'World'  2  3.0

    In [54]: pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])])
    Out[54]:
       A  B
    0  1  4
    1  2  5
    2  3  6

    In [55]: pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])],
       ....:                         orient='index', columns=['one', 'two', 'three'])
       ....:
    Out[55]:
       one  two  three
    A    1    2      3
    B    4    5      6
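`DataFrame.from_items` no longer exists in pandas 1.0 and later. One possible replacement is to turn the list of pairs into a dict and pass it to `DataFrame.from_dict`. This is a sketch of how In [54] and In [55] could be rewritten, assuming pandas 0.23 or newer, which is when `from_dict` gained its `columns` argument, and Python 3.7+, where dicts keep insertion order. The `items` variable is illustrative.

```python
import pandas as pd

items = [('A', [1, 2, 3]), ('B', [4, 5, 6])]

# column-oriented, like from_items(items): each key becomes a column
df_cols = pd.DataFrame.from_dict(dict(items))

# row-oriented, like from_items(items, orient='index', columns=[...]):
# each key becomes a row label; columns= names the resulting columns
df_rows = pd.DataFrame.from_dict(dict(items), orient='index',
                                 columns=['one', 'two', 'three'])

print(df_cols, df_rows, sep='\n\n')
```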
Selecting, adding, deleting, and inserting columns
    # Select a column
    In [56]: df['one']
    Out[56]:
    a    1.0
    b    2.0
    c    3.0
    d    NaN
    Name: one, dtype: float64

    # Add columns computed from existing ones
    In [57]: df['three'] = df['one'] * df['two']

    In [58]: df['flag'] = df['one'] > 2

    In [59]: df
    Out[59]:
       one  two  three   flag
    a  1.0  1.0    1.0  False
    b  2.0  2.0    4.0  False
    c  3.0  3.0    9.0   True
    d  NaN  4.0    NaN  False

    # Delete columns with del or pop (pop also returns the removed column)
    In [60]: del df['two']

    In [61]: three = df.pop('three')

    In [62]: df
    Out[62]:
       one   flag
    a  1.0  False
    b  2.0  False
    c  3.0   True
    d  NaN  False

    # A scalar is broadcast to fill the whole column
    In [63]: df['foo'] = 'bar'

    In [64]: df
    Out[64]:
       one   flag  foo
    a  1.0  False  bar
    b  2.0  False  bar
    c  3.0   True  bar
    d  NaN  False  bar

    # A shorter Series is aligned to the DataFrame's index; missing labels become NaN
    In [65]: df['one_trunc'] = df['one'][:2]

    In [66]: df
    Out[66]:
       one   flag  foo  one_trunc
    a  1.0  False  bar        1.0
    b  2.0  False  bar        2.0
    c  3.0   True  bar        NaN
    d  NaN  False  bar        NaN

    # Insert a column at a given position: insert(position, column name, data)
    In [67]: df.insert(1, 'bar', df['one'])

    In [68]: df
    Out[68]:
       one  bar   flag  foo  one_trunc
    a  1.0  1.0  False  bar        1.0
    b  2.0  2.0  False  bar        2.0
    c  3.0  3.0   True  bar        NaN
    d  NaN  NaN  False  bar        NaN
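Every column operation above modifies `df` in place. This sketch contrasts them with `DataFrame.assign`, which returns a new DataFrame and leaves the original untouched. It also shows that `insert` refuses, by default, to add a column name that already exists. The frame and variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [1.0, 2.0, 3.0]},
                  index=['a', 'b', 'c'])

# assign() computes the new column from the frame passed to the lambda
# and returns a new DataFrame; df itself keeps only 'one' and 'two'
df2 = df.assign(three=lambda d: d['one'] * d['two'])

# insert() works in place and raises ValueError for an existing column name
duplicate_error = None
try:
    df.insert(1, 'one', df['two'])
except ValueError as exc:
    duplicate_error = exc

print(df2)
print(duplicate_error)
```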
Indexing and selection in a DataFrame
Operation | Syntax | Result |
---|---|---|
Select a column | df[col] | Series |
Select a row by label | df.loc[label] | Series |
Select a row by integer position | df.iloc[loc] | Series |
Slice rows | df[5:10] | DataFrame |
Select rows by boolean vector | df[bool_vec] | DataFrame |
    In [75]: df.loc['b']
    Out[75]:
    one              2
    bar              2
    flag         False
    foo            bar
    one_trunc        2
    Name: b, dtype: object

    In [76]: df.iloc[2]
    Out[76]:
    one             3
    bar             3
    flag         True
    foo           bar
    one_trunc     NaN
    Name: c, dtype: object
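The session above covers only the `loc` and `iloc` rows of the table. This minimal sketch exercises all five rows on a small illustrative frame, so the result type of each can be checked. Note that `df[1:3]` slices rows by position, not by label.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0], 'two': [4.0, 3.0, 2.0, 1.0]},
                  index=['a', 'b', 'c', 'd'])

col = df['one']              # select a column -> Series
row_by_label = df.loc['b']   # row by label -> Series named 'b'
row_by_pos = df.iloc[2]      # row by integer position -> Series named 'c'
sliced = df[1:3]             # slice rows by position -> DataFrame (rows b, c)
masked = df[df['one'] > 2]   # boolean vector -> DataFrame (rows c, d)

print(type(col).__name__, type(sliced).__name__)
print(row_by_label, row_by_pos, sliced, masked, sep='\n\n')
```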
DataFrame arithmetic
    # Generate a DatetimeIndex with pandas
    In [81]: index = pd.date_range('1/1/2000', periods=8)

    In [82]: df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=list('ABC'))

    In [83]: df
    Out[83]:
                     A       B       C
    2000-01-01 -1.2268  0.7698 -1.2812
    2000-01-02 -0.7277 -0.1213 -0.0979
    2000-01-03  0.6958  0.3417  0.9597
    2000-01-04 -1.1103 -0.6200  0.1497
    2000-01-05 -0.7323  0.6877  0.1764
    2000-01-06  0.4033 -0.1550  0.3016
    2000-01-07 -2.1799 -1.3698 -0.9542
    2000-01-08  1.4627 -1.7432 -0.8266

    # Arithmetic with scalars is applied element-wise
    In [86]: df * 5 + 2
    Out[86]:
                     A       B       C
    2000-01-01 -4.1341  5.8490 -4.4062
    2000-01-02 -1.6385  1.3935  1.5106
    2000-01-03  5.4789  3.7087  6.7986
    2000-01-04 -3.5517 -1.0999  2.7487
    2000-01-05 -1.6617  5.4387  2.8822
    2000-01-06  4.0165  1.2252  3.5081
    2000-01-07 -8.8993 -4.8492 -2.7710
    2000-01-08  9.3135 -6.7158 -2.1330

    In [87]: 1 / df
    Out[87]:
                     A       B        C
    2000-01-01 -0.8151  1.2990  -0.7805
    2000-01-02 -1.3742 -8.2436 -10.2163
    2000-01-03  1.4372  2.9262   1.0420
    2000-01-04 -0.9006 -1.6130   6.6779
    2000-01-05 -1.3655  1.4540   5.6675
    2000-01-06  2.4795 -6.4537   3.3154
    2000-01-07 -0.4587 -0.7300  -1.0480
    2000-01-08  0.6837 -0.5737  -1.2098

    In [88]: df ** 4
    Out[88]:
                      A       B           C
    2000-01-01   2.2653  0.3512  2.6948e+00
    2000-01-02   0.2804  0.0002  9.1796e-05
    2000-01-03   0.2344  0.0136  8.4838e-01
    2000-01-04   1.5199  0.1477  5.0286e-04
    2000-01-05   0.2876  0.2237  9.6924e-04
    2000-01-06   0.0265  0.0006  8.2769e-03
    2000-01-07  22.5795  3.5212  8.2903e-01
    2000-01-08   4.5774  9.2332  4.6683e-01

    # Transposing a DataFrame
    # only show the first 5 rows
    In [95]: df[:5].T
    Out[95]:
       2000-01-01  2000-01-02  2000-01-03  2000-01-04  2000-01-05
    A     -1.2268     -0.7277      0.6958     -1.1103     -0.7323
    B      0.7698     -0.1213      0.3417     -0.6200      0.6877
    C     -1.2812     -0.0979      0.9597      0.1497      0.1764
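The examples above combine a DataFrame only with scalars. When the other operand is a DataFrame or a Series, pandas aligns on labels first:

- Two DataFrames align on both rows and columns.
- A Series is matched against the columns by default.
- To match a Series against the rows instead, use a method such as `sub` with `axis=0`.

This is a minimal sketch with small deterministic frames. All names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]},
                  index=['x', 'y', 'z'])
other = pd.DataFrame({'A': [100.0, 200.0]}, index=['x', 'y'])

# DataFrame + DataFrame aligns on row and column labels;
# cells missing from either operand ('z', and all of column 'B') become NaN
total = df + other

# DataFrame - Series: the Series index ('A', 'B') is matched to the columns,
# so row 'x' is subtracted from every row
row_diff = df - df.loc['x']

# sub(..., axis=0) matches the Series index to the rows instead,
# so column 'A' is subtracted from every column
col_diff = df.sub(df['A'], axis=0)

print(total, row_diff, col_diff, sep='\n\n')
```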