
pandas Time Series Basics

2018-04-01 11:11

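The most commonly used kind of time series in pandas is a Series indexed by timestamps. The values here come from np.random.randn, so your numbers will differ:

import numpy as np
import pandas as pd
from datetime import datetime
from pandas import Series

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = Series(np.random.randn(6), index=dates)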

ts
2011-01-02 -1.739738
2011-01-05 -0.813930
2011-01-07 -0.083642
2011-01-08 0.418713
2011-01-10 0.116473
2011-01-12 -1.048764
dtype: float64

# ts is still an ordinary Series (older pandas versions had a separate TimeSeries subclass, which has since been removed):
type(ts)
pandas.core.series.Series

ts.index  # the timestamps are stored in a DatetimeIndex
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
'2011-01-10', '2011-01-12'],
dtype='datetime64[ns]', freq=None)
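
The following is a small deterministic sketch of the same construction. It swaps np.random.randn for fixed values (an assumption made only so the checks are repeatable) and confirms that the list of datetime objects becomes a DatetimeIndex whose entries are pandas Timestamp objects.

```python
from datetime import datetime

import pandas as pd

# Same dates as above; fixed values replace np.random.randn for reproducibility
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# The object itself is a plain Series ...
print(type(ts).__name__)        # Series
# ... but the list of datetime objects was converted into a DatetimeIndex
print(type(ts.index).__name__)  # DatetimeIndex

# Individual index entries come back as pandas Timestamp objects
print(repr(ts.index[0]))
```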

# Arithmetic between differently indexed time series automatically aligns on the dates

ts + ts[::2]

2011-01-02 -3.479476
2011-01-05 NaN
2011-01-07 -0.167284
2011-01-08 NaN
2011-01-10 0.232945
2011-01-12 NaN
dtype: float64
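
To make the alignment visible, here is a sketch with fixed values (chosen for illustration) instead of random ones. ts[::2] keeps every other entry, so the dates it lacks come out as NaN in the sum. The second half uses Series.add with fill_value, an existing pandas option that treats a value missing on one side as 0 instead of producing NaN.

```python
from datetime import datetime

import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# Every other entry: 2011-01-02, 2011-01-07, 2011-01-10
half = ts[::2]

# "+" aligns on the union of both indexes; dates missing from `half` become NaN
result = ts + half
print(result)

# Treat a missing value on either side as 0 instead of propagating NaN
filled = ts.add(half, fill_value=0)
print(filled)
```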

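Because a time series is simply a Series with a DatetimeIndex, indexing and selection mostly behave as they do for any other Series. There is also a more convenient option: you can pass a string that can be interpreted as a date:

ts['1/10/2011']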
0.11647269713229616

ts['20110110']
0.11647269713229616
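
As a quick check (again with fixed illustrative values rather than random ones), the sketch below looks up the same label four ways: a month/day/year string, a compact YYYYMMDD string, a datetime object, and a pd.Timestamp. All four should return the same scalar.

```python
from datetime import datetime

import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# All of these refer to the same label, 2011-01-10
by_us_string = ts['1/10/2011']        # month/day/year string
by_compact_string = ts['20110110']    # YYYYMMDD string
by_datetime = ts[datetime(2011, 1, 10)]
by_timestamp = ts[pd.Timestamp('2011-01-10')]
print(by_us_string, by_compact_string, by_datetime, by_timestamp)
```
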
For longer time series, passing just a year, or a year and month, selects all of the data for that year or month.
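
A minimal sketch of year and month selection, assuming a hypothetical daily series of 1000 days starting on 2000-01-01 (built with pd.date_range) whose values are simply 0 to 999. It uses .loc, the explicit label-based form; on a Series the plain longer_ts['2001'] style behaves the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical longer series: 1000 consecutive days starting 2000-01-01
longer_ts = pd.Series(np.arange(1000, dtype=float),
                      index=pd.date_range('1/1/2000', periods=1000))

# Passing only a year selects every observation in that year
year_2001 = longer_ts.loc['2001']

# Passing year and month selects that month
may_2001 = longer_ts.loc['2001-05']

print(len(year_2001), len(may_2001))  # 365 31
```

Because 2000 is a leap year (366 days), the first 2001 value is 366.0, and May 2001 starts at position 366 + 120 = 486.
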
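Passing a range of dates as a slice returns the observations that fall within that range. Note that, as with NumPy arrays, such a slice is a view on the original time series (at least in pandas versions without Copy-on-Write).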
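
The sketch below slices the six-point example by date range, using fixed illustrative values. On a sorted DatetimeIndex, label slices include both endpoints, and the endpoints do not have to exist in the index. It also checks the equivalence with the truncate method described next by comparing truncate(after=...) with an open-ended slice.

```python
from datetime import datetime

import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# Neither 2011-01-06 nor 2011-01-11 is in the index; the slice still works
window = ts.loc['1/6/2011':'1/11/2011']
print(window)  # 2011-01-07, 2011-01-08, 2011-01-10

# truncate(after=...) drops everything after the given date ...
head = ts.truncate(after='1/9/2011')
# ... which selects the same rows as an open-ended label slice
same_head = ts.loc[:'1/9/2011']
print(head.equals(same_head))  # True
```
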
In addition, there is an equivalent instance method, truncate, that slices a time series between two dates:

ts.truncate(after='1/9/2011')

2011-01-02 -1.739738
2011-01-05 -0.813930
2011-01-07 -0.083642
2011-01-08 0.418713
dtype: float64

When the index contains duplicate timestamps, you can check whether it is unique through the index's is_unique property:

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000',
'1/3/2000'])
dup_ts = Series(np.arange(5), index=dates)
dup_ts

2000-01-01 0
2000-01-02 1
2000-01-02 2
2000-01-02 3
2000-01-03 4
dtype: int32

dup_ts.index.is_unique
False

# Indexing with a duplicated timestamp returns all of the matching values

dup_ts['1/2/2000']
2000-01-02 1
2000-01-02 2
2000-01-02 3
dtype: int32
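
Whether a date is duplicated changes the shape of the result. In this sketch, built from the same dup_ts data, a timestamp that occurs once returns a scalar, while a duplicated one returns a Series containing every match.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000',
                          '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

print(dup_ts.index.is_unique)  # False

# A timestamp that appears once returns a scalar ...
single = dup_ts['1/3/2000']
# ... while a duplicated timestamp returns a Series with every match
several = dup_ts['1/2/2000']
print(single)
print(several)
```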

# To aggregate data with non-unique timestamps, use groupby with level=0 (group by the index values)
grouped = dup_ts.groupby(level=0)
grouped.mean()

2000-01-01 0
2000-01-02 2
2000-01-03 4
dtype: int32

grouped.count()

2000-01-01 1
2000-01-02 3
2000-01-03 1
dtype: int64
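
Putting the aggregation steps together, here is a self-contained sketch using the same dup_ts data. After grouping on level=0 there is one row per distinct timestamp, so the resulting index is unique again. Depending on your pandas version, mean() may return float64 rather than the integer dtype shown above; the checks compare values only.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000',
                          '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# level=0 groups by the (only) index level, i.e. by timestamp
grouped = dup_ts.groupby(level=0)
means = grouped.mean()    # average of the values sharing each timestamp
counts = grouped.count()  # how many observations shared each timestamp
print(means)
print(counts)

# After aggregation the index is unique again
print(means.index.is_unique)  # True
```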
