
Python for Data Analysis, Chapter 10: Time Series Analysis

2016-02-01 16:33
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline



from pandas import Series,DataFrame


#### Time Series Analysis
****
> Built-in packages
time, datetime, calendar

from datetime import datetime


now = datetime.now()


now


datetime.datetime(2016, 2, 1, 11, 11, 8, 934671)

> **Display the current time**

now.year,now.month,now.day


(2016, 2, 1)

datetime stores both the date and the time, down to the microsecond. **datetime.timedelta** represents the difference in time between two datetime objects

delta = datetime(2011,1,7) - datetime(2008,6,24,8,15)


The first value shown is the number of days, the second is the number of seconds
----
delta.days
delta.seconds

delta


datetime.timedelta(926, 56700)
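The two fields of the timedelta above can be unpacked further. The following is a minimal sketch using only the standard library; the variable names `hours`, `minutes`, and `total` are illustrative and not part of the original notebook.

```python
from datetime import datetime

# Same subtraction as in the notebook: the result is a timedelta
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# days holds the whole days; seconds holds the remainder within the last day
hours, remainder = divmod(delta.seconds, 3600)
minutes = remainder // 60

# total_seconds() folds days and seconds into a single number
total = delta.total_seconds()

print(delta.days, delta.seconds, hours, minutes, total)
```

Here the 56700 leftover seconds correspond to 15 hours and 45 minutes, which is the gap between 08:15 and midnight.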

### You can add (or subtract) a timedelta, or a multiple of one, to a datetime object; the result is a new object

from datetime import timedelta


start = datetime(2011, 1, 7)


start + timedelta(12)


datetime.datetime(2011, 1, 19, 0, 0)

start - timedelta(12) * 4


datetime.datetime(2010, 11, 20, 0, 0)

> As shown, a bare number passed to timedelta is interpreted as days
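Other units can be passed to timedelta by keyword. A small sketch with the standard library; `step` and `later` are names chosen here for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2011, 1, 7)

# The first positional argument is days, so these two are equal
assert timedelta(12) == timedelta(days=12)

# Other units are given by keyword and normalized to days/seconds/microseconds
step = timedelta(hours=36, minutes=30)
later = start + step

print(step.days, step.seconds, later)
```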

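#### Data types in the datetime module
----
- date | stores a calendar date (year, month, day) using the Gregorian calendar
- time | stores the time of day as hours, minutes, seconds, and microseconds
- datetime | stores both date and time
- timedelta | represents the difference between two datetime values (as days, seconds, and microseconds)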
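The four types listed above can be seen side by side in a short sketch. The specific values are made up for illustration.

```python
from datetime import date, time, datetime, timedelta

d = date(2011, 1, 7)              # calendar date only
t = time(8, 15, 30, 250000)       # time of day, including microseconds
dt = datetime.combine(d, t)       # date and time together
gap = dt - datetime(2011, 1, 1)   # subtracting two datetimes yields a timedelta

print(type(gap).__name__, gap.days, gap.seconds, gap.microseconds)
```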

## Converting between strings and datetime
Use **str** or **strftime** (passing a format specification) to format datetime objects and pandas.Timestamp objects as strings

stamp = datetime(2011, 1, 3)


str(stamp)


'2011-01-03 00:00:00'

stamp.strftime('%Y-%m-%d')


'2011-01-03'

stamp.strftime('%Y-%m')


'2011-01'

value = '2011-01-03'


datetime.strptime(value, '%Y-%m-%d')


datetime.datetime(2011, 1, 3, 0, 0)

datestrs = ['7/6/2011','8/6/2011']


[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]


[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
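strftime and strptime are inverses when given the same format string. The sketch below shows a round trip and what happens when the text does not match the format; `fmt` and `mismatch_raised` are illustrative names.

```python
from datetime import datetime

stamp = datetime(2011, 1, 3, 14, 30)
fmt = '%Y-%m-%d %H:%M'

# Format to text, then parse the text back with the same format
text = stamp.strftime(fmt)
parsed = datetime.strptime(text, fmt)

# A string that does not match the format raises ValueError
try:
    datetime.strptime('03/01/2011', fmt)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(text, parsed == stamp, mismatch_raised)
```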

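datetime.strptime is the best way to parse a date with a known format, but writing a format specification every time is tedious
- For common date formats, use parser.parse from dateutil instead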

from dateutil.parser import parse


parse('2011-01-03')


datetime.datetime(2011, 1, 3, 0, 0)

parse is very capable and can handle almost any human-intelligible date representation

parse('Jan 31,1997 10:45 PM')


datetime.datetime(1997, 1, 31, 22, 45)

parse('6/30/2011', dayfirst=True)


datetime.datetime(2011, 6, 30, 0, 0)
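The effect of dayfirst is easier to see on an ambiguous date, where both numbers could be a month. A short sketch, assuming dateutil is installed (it ships as a pandas dependency); the variable names are illustrative.

```python
from datetime import datetime
from dateutil.parser import parse

# Ambiguous string: by default the first number is read as the month
month_first = parse('6/12/2011')

# With dayfirst=True the first number is read as the day
day_first = parse('6/12/2011', dayfirst=True)

# When the first number cannot be a month, there is only one valid reading
unambiguous = parse('6/30/2011', dayfirst=True)

print(month_first, day_first, unambiguous)
```

This is why the '6/30/2011' call above still returns June 30 even with dayfirst=True.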

datestrs


['7/6/2011', '8/6/2011']

# pd.to_datetime()

pd.to_datetime(datestrs)


DatetimeIndex(['2011-07-06', '2011-08-06'], dtype='datetime64[ns]', freq=None)
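pd.to_datetime also copes with missing entries, which become NaT (Not a Time). A minimal sketch; appending None to the list is an illustration added here, not part of the original notebook.

```python
import pandas as pd

datestrs = ['7/6/2011', '8/6/2011']

# A missing value in the input becomes NaT in the resulting index
idx = pd.to_datetime(datestrs + [None])
missing = pd.isnull(idx)

print(idx)
print(missing)
```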

dates = [datetime(2011, 1, 2),datetime(2011,1,5),datetime(2011,1,7),
datetime(2011,1,8),datetime(2011,1,10),datetime(2011,1,12)]


ts = Series(np.random.randn(6), index=dates)


ts


2011-01-02 0.573974
2011-01-05 -0.337112
2011-01-07 -1.650845
2011-01-08 0.450012
2011-01-10 -1.253801
2011-01-12 -0.402997
dtype: float64

type(ts)


pandas.core.series.Series

ts.index


DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
'2011-01-10', '2011-01-12'],
dtype='datetime64[ns]', freq=None)

ts + ts[::2]


2011-01-02 1.147949
2011-01-05 NaN
2011-01-07 -3.301690
2011-01-08 NaN
2011-01-10 -2.507602
2011-01-12 NaN
dtype: float64

ts[::2]


2011-01-02 0.573974
2011-01-07 -1.650845
2011-01-10 -1.253801
dtype: float64

## Indexing, selection, and subsetting

ts['1/10/2011']


-1.2538008746706757

Any string that can be interpreted as a date can be passed in place of the index label

ts['20110110']


-1.2538008746706757
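The different spellings of the same date all resolve to the same element. A sketch with deterministic values in place of the random ones used in the notebook, so the result is easy to check:

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
# Values 0.0 .. 5.0 stand in for the random data
ts = pd.Series(np.arange(6.0), index=dates)

by_slash = ts['1/10/2011']               # month/day/year string
by_compact = ts['20110110']              # compact string
by_object = ts[datetime(2011, 1, 10)]    # datetime object

print(by_slash, by_compact, by_object)
```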

longer_ts=Series(np.random.randn(1000),index=pd.date_range('20000101',periods=1000))


longer_ts


2000-01-01 -1.025498
2000-01-02 -0.913267
2000-01-03 0.240895
2000-01-04 -1.475368
2000-01-05 -1.675558
2000-01-06 1.020005
2000-01-07 0.638097
2000-01-08 0.503482
2000-01-09 -0.541771
2000-01-10 -1.107036
2000-01-11 0.797612
2000-01-12 1.691745
2000-01-13 1.889323
2000-01-14 -0.852126
2000-01-15 -0.987578
2000-01-16 0.558084
2000-01-17 -0.842907
2000-01-18 1.932399
2000-01-19 -1.126650
2000-01-20 -0.529707
2000-01-21 0.116756
2000-01-22 -0.012790
2000-01-23 0.501330
2000-01-24 0.346976
2000-01-25 -0.880443
2000-01-26 -0.229017
2000-01-27 0.926648
2000-01-28 0.894491
2000-01-29 -0.573260
2000-01-30 -1.712945
...
2002-08-28 -0.751376
2002-08-29 -1.731035
2002-08-30 -0.150107
2002-08-31 -0.621332
2002-09-01 0.449311
2002-09-02 0.873422
2002-09-03 1.496143
2002-09-04 -0.581023
2002-09-05 2.882920
2002-09-06 -0.347482
2002-09-07 0.165490
2002-09-08 -0.475642
2002-09-09 0.191958
2002-09-10 0.801963
2002-09-11 -1.603021
2002-09-12 1.114401
2002-09-13 0.994800
2002-09-14 -0.974208
2002-09-15 2.096747
2002-09-16 -0.252620
2002-09-17 -0.279536
2002-09-18 -0.059076
2002-09-19 -0.497615
2002-09-20 -0.009895
2002-09-21 1.813504
2002-09-22 0.863885
2002-09-23 1.330777
2002-09-24 -0.394473
2002-09-25 -1.163973
2002-09-26 -0.986664
Freq: D, dtype: float64

longer_ts['2002']


2002-01-01 -1.249172
2002-01-02 -1.368829
2002-01-03 0.097135
2002-01-04 -0.972259
2002-01-05 -0.640629
2002-01-06 0.619072
2002-01-07 1.625769
2002-01-08 -0.893140
2002-01-09 0.113725
2002-01-10 0.446898
2002-01-11 -0.382041
2002-01-12 -1.667311
2002-01-13 -0.307464
2002-01-14 0.623383
2002-01-15 -0.211188
2002-01-16 -1.166355
2002-01-17 0.399710
2002-01-18 -0.171451
2002-01-19 -1.591578
2002-01-20 -0.367654
2002-01-21 0.985778
2002-01-22 0.125848
2002-01-23 1.366708
2002-01-24 0.449383
2002-01-25 0.211848
2002-01-26 -1.033201
2002-01-27 0.668416
2002-01-28 0.402693
2002-01-29 -0.730690
2002-01-30 1.666659
...
2002-08-28 -0.751376
2002-08-29 -1.731035
2002-08-30 -0.150107
2002-08-31 -0.621332
2002-09-01 0.449311
2002-09-02 0.873422
2002-09-03 1.496143
2002-09-04 -0.581023
2002-09-05 2.882920
2002-09-06 -0.347482
2002-09-07 0.165490
2002-09-08 -0.475642
2002-09-09 0.191958
2002-09-10 0.801963
2002-09-11 -1.603021
2002-09-12 1.114401
2002-09-13 0.994800
2002-09-14 -0.974208
2002-09-15 2.096747
2002-09-16 -0.252620
2002-09-17 -0.279536
2002-09-18 -0.059076
2002-09-19 -0.497615
2002-09-20 -0.009895
2002-09-21 1.813504
2002-09-22 0.863885
2002-09-23 1.330777
2002-09-24 -0.394473
2002-09-25 -1.163973
2002-09-26 -0.986664
Freq: D, dtype: float64

longer_ts['2001/03']


2001-03-01 -0.130463
2001-03-02 -1.245341
2001-03-03 1.035173
2001-03-04 1.115275
2001-03-05 0.013602
2001-03-06 0.828075
2001-03-07 -0.802564
2001-03-08 2.067711
2001-03-09 2.158392
2001-03-10 1.348256
2001-03-11 1.282607
2001-03-12 -1.088485
2001-03-13 -0.882978
2001-03-14 -0.030872
2001-03-15 0.840561
2001-03-16 -0.061428
2001-03-17 0.170721
2001-03-18 0.895892
2001-03-19 -0.050714
2001-03-20 0.608656
2001-03-21 1.222177
2001-03-22 0.889833
2001-03-23 -0.932351
2001-03-24 0.163275
2001-03-25 0.001171
2001-03-26 0.969950
2001-03-27 -0.118747
2001-03-28 -0.840478
2001-03-29 -2.654215
2001-03-30 -0.351836
2001-03-31 -0.365322
Freq: D, dtype: float64

ts['20110101':'20110201']


2011-01-02 0.573974
2011-01-05 -0.337112
2011-01-07 -1.650845
2011-01-08 0.450012
2011-01-10 -1.253801
2011-01-12 -0.402997
dtype: float64

ts.truncate(after='20110109')


2011-01-02 0.573974
2011-01-05 -0.337112
2011-01-07 -1.650845
2011-01-08 0.450012
dtype: float64
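Range slicing works even when the endpoints are not themselves in the index, because the index is sorted chronologically. A sketch with deterministic values (0.0 to 5.0) in place of the random data; `window` and `head` are illustrative names.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07',
                          '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series(np.arange(6.0), index=dates)

# Neither 1/6 nor 1/11 is in the index; the slice keeps what falls between them
window = ts['1/6/2011':'1/11/2011']

# truncate(after=...) drops everything later than the given date
head = ts.truncate(after='1/9/2011')

print(window)
print(head)
```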

dates = pd.date_range('20000101', periods=100, freq='W-WED')


dates


DatetimeIndex(['2000-01-05', '2000-01-12', '2000-01-19', '2000-01-26',
'2000-02-02', '2000-02-09', '2000-02-16', '2000-02-23',
'2000-03-01', '2000-03-08', '2000-03-15', '2000-03-22',
'2000-03-29', '2000-04-05', '2000-04-12', '2000-04-19',
'2000-04-26', '2000-05-03', '2000-05-10', '2000-05-17',
'2000-05-24', '2000-05-31', '2000-06-07', '2000-06-14',
'2000-06-21', '2000-06-28', '2000-07-05', '2000-07-12',
'2000-07-19', '2000-07-26', '2000-08-02', '2000-08-09',
'2000-08-16', '2000-08-23', '2000-08-30', '2000-09-06',
'2000-09-13', '2000-09-20', '2000-09-27', '2000-10-04',
'2000-10-11', '2000-10-18', '2000-10-25', '2000-11-01',
'2000-11-08', '2000-11-15', '2000-11-22', '2000-11-29',
'2000-12-06', '2000-12-13', '2000-12-20', '2000-12-27',
'2001-01-03', '2001-01-10', '2001-01-17', '2001-01-24',
'2001-01-31', '2001-02-07', '2001-02-14', '2001-02-21',
'2001-02-28', '2001-03-07', '2001-03-14', '2001-03-21',
'2001-03-28', '2001-04-04', '2001-04-11', '2001-04-18',
'2001-04-25', '2001-05-02', '2001-05-09', '2001-05-16',
'2001-05-23', '2001-05-30', '2001-06-06', '2001-06-13',
'2001-06-20', '2001-06-27', '2001-07-04', '2001-07-11',
'2001-07-18', '2001-07-25', '2001-08-01', '2001-08-08',
'2001-08-15', '2001-08-22', '2001-08-29', '2001-09-05',
'2001-09-12', '2001-09-19', '2001-09-26', '2001-10-03',
'2001-10-10', '2001-10-17', '2001-10-24', '2001-10-31',
'2001-11-07', '2001-11-14', '2001-11-21', '2001-11-28'],
dtype='datetime64[ns]', freq='W-WED')

long_df = DataFrame(np.random.randn(100,4),index=dates,columns=['Colorado','Texas','New York','Ohio'])


long_df.loc['5-2001']


            Colorado     Texas  New York      Ohio
2001-05-02  1.783070  1.090816 -1.035363 -0.089864
2001-05-09 -1.290700  1.311863 -0.596037  0.819694
2001-05-16  0.688693 -0.249644 -0.859212  0.879270
2001-05-23 -1.602660  1.211236 -1.028336  2.022514
2001-05-30 -0.705427 -0.189235 -0.710712 -2.397815

dates = pd.DatetimeIndex(['1/1/2000','1/2/2000',
'1/2/2000','1/2/2000',
'1/3/2000'])


dup_ts = Series(np.arange(5), index=dates)


dup_ts


2000-01-01 0
2000-01-02 1
2000-01-02 2
2000-01-02 3
2000-01-03 4
dtype: int64

Check the index's **is_unique** attribute to tell whether the index is unique

dup_ts.index.is_unique


False

Indexing into this time series produces either a scalar value or a slice, depending on
> **whether the selected timestamp is duplicated**

Not duplicated (2000-1-3)

dup_ts['1/3/2000']


4

Duplicated (2000-1-2)

dup_ts['1/2/2000']


2000-01-02 1
2000-01-02 2
2000-01-02 3
dtype: int64

Check again whether the index is unique

dup_ts.index.is_unique


False

# Aggregating data with non-unique timestamps #

> groupby(level=0)

level=0 refers to the one and only level of the index

----

grouped = dup_ts.groupby(level=0)


grouped.mean(),grouped.count()


(2000-01-01 0
2000-01-02 2
2000-01-03 4
dtype: int64, 2000-01-01 1
2000-01-02 3
2000-01-03 1
dtype: int64)
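After grouping on level 0, each timestamp appears exactly once in the result. The sketch below repeats the aggregation and checks the index of the result; `means` and `counts` are illustrative names.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# Group rows that share the same timestamp (level 0 of the index)
grouped = dup_ts.groupby(level=0)
means = grouped.mean()
counts = grouped.count()

# The aggregated result has one row per distinct timestamp
print(means.index.is_unique)
print(counts)
```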

> Convert the time series to **a fixed-frequency (daily) time series**
- resample

ts.resample('D').asfreq()


2011-01-02 0.573974
2011-01-03 NaN
2011-01-04 NaN
2011-01-05 -0.337112
2011-01-06 NaN
2011-01-07 -1.650845
2011-01-08 0.450012
2011-01-09 NaN
2011-01-10 -1.253801
2011-01-11 NaN
2011-01-12 -0.402997
Freq: D, dtype: float64
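Converting to a daily frequency introduces missing values for the days without observations. The sketch below assumes a recent pandas release, where resample returns a Resampler object and a method such as asfreq() or ffill() produces the actual series; the values 0.0 to 5.0 replace the random data.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07',
                          '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series(np.arange(6.0), index=dates)

# Reindex to every calendar day; days without data become NaN
daily = ts.resample('D').asfreq()

# Alternatively, carry the last observation forward into the gaps
filled = ts.resample('D').ffill()

print(len(daily), int(daily.isnull().sum()), int(filled.isnull().sum()))
```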

Generating date ranges
- pandas.date_range
- returns a DatetimeIndex

index = pd.date_range('4/1/2012','6/1/2012')


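## Base frequencies
- A base frequency is usually referred to by a string alias, such as M for monthly or H for hourly
- Each base frequency has a corresponding object called a
- date offset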

from pandas.tseries.offsets import Hour, Minute


hour = Hour()


hour


> Pass an integer to define a multiple of an offset:

four_hours = Hour(4)


four_hours


pd.date_range('1/1/2000','1/3/2000 23:59',freq='4h')


DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
'2000-01-01 08:00:00', '2000-01-01 12:00:00',
'2000-01-01 16:00:00', '2000-01-01 20:00:00',
'2000-01-02 00:00:00', '2000-01-02 04:00:00',
'2000-01-02 08:00:00', '2000-01-02 12:00:00',
'2000-01-02 16:00:00', '2000-01-02 20:00:00',
'2000-01-03 00:00:00', '2000-01-03 04:00:00',
'2000-01-03 08:00:00', '2000-01-03 12:00:00',
'2000-01-03 16:00:00', '2000-01-03 20:00:00'],
dtype='datetime64[ns]', freq='4H')

Offsets can be combined by addition

Hour(2) + Minute(30)


pd.date_range('1/1/2000', periods=10, freq='1h30min')


DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
'2000-01-01 03:00:00', '2000-01-01 04:30:00',
'2000-01-01 06:00:00', '2000-01-01 07:30:00',
'2000-01-01 09:00:00', '2000-01-01 10:30:00',
'2000-01-01 12:00:00', '2000-01-01 13:30:00'],
dtype='datetime64[ns]', freq='90T')
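A combined offset object can be used wherever a frequency string is accepted. A brief sketch; `combined` is a name chosen here, and passing it as freq is equivalent in effect to the '1h30min' string used above.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

start = pd.Timestamp('2000-01-01')

# Adding two fixed-length offsets gives a single 90-minute offset
combined = Hour(1) + Minute(30)

# The offset object can be added to a timestamp or passed as freq
shifted = start + combined
rng = pd.date_range(start, periods=3, freq=combined)

print(shifted)
print(rng)
```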

### WOM(week of month)

rng = pd.date_range('1/1/2012','9/1/2012',freq='WOM-3FRI')


pd.date_range('1/1/2012','9/1/2012',freq='W-FRI')


DatetimeIndex(['2012-01-06', '2012-01-13', '2012-01-20', '2012-01-27',
'2012-02-03', '2012-02-10', '2012-02-17', '2012-02-24',
'2012-03-02', '2012-03-09', '2012-03-16', '2012-03-23',
'2012-03-30', '2012-04-06', '2012-04-13', '2012-04-20',
'2012-04-27', '2012-05-04', '2012-05-11', '2012-05-18',
'2012-05-25', '2012-06-01', '2012-06-08', '2012-06-15',
'2012-06-22', '2012-06-29', '2012-07-06', '2012-07-13',
'2012-07-20', '2012-07-27', '2012-08-03', '2012-08-10',
'2012-08-17', '2012-08-24', '2012-08-31'],
dtype='datetime64[ns]', freq='W-FRI')
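For comparison with the weekly W-FRI range, the WOM-3FRI frequency keeps only the third Friday of each month. A short sketch that generates the range and checks that property; `third_fridays` is an illustrative name.

```python
import pandas as pd

# Third Friday of every month between the two endpoints
rng = pd.date_range('1/1/2012', '9/1/2012', freq='WOM-3FRI')

# weekday() == 4 means Friday; a third Friday always falls on day 15..21
third_fridays = [(d.weekday(), d.day) for d in rng]

print(list(rng))
```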

> For the frequency aliases, see Table 10-4 (p. 314)

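### Shifting (leading and lagging) data
- Shifting means moving data backward or forward through time
- Both Series and DataFrame have a shift method for naive shifts forward or backward
- The index is left unmodified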

ts = Series(np.random.randn(4),index=pd.date_range('1/1/2000',periods=4,freq='M'))


ts


2000-01-31 -0.550830
2000-02-29 -1.297499
2000-03-31 1.178102
2000-04-30 1.359573
Freq: M, dtype: float64

ts.shift(-2)


2000-01-31 1.178102
2000-02-29 1.359573
2000-03-31 NaN
2000-04-30 NaN
Freq: M, dtype: float64

shift is commonly used to compute percent changes in a series

ts / ts.shift(1) - 1


2000-01-31 NaN
2000-02-29 1.355534
2000-03-31 -1.907979
2000-04-30 0.154037
Freq: M, dtype: float64

ts.pct_change()


2000-01-31 NaN
2000-02-29 1.355534
2000-03-31 -1.907979
2000-04-30 0.154037
Freq: M, dtype: float64

ts.shift(2, freq='M')


2000-03-31 -0.550830
2000-04-30 -1.297499
2000-05-31 1.178102
2000-06-30 1.359573
Freq: M, dtype: float64

ts.shift(3, freq='D')


2000-02-03 -0.550830
2000-03-03 -1.297499
2000-04-03 1.178102
2000-05-03 1.359573
dtype: float64

type(ts)


pandas.core.series.Series

ts.shift()


2000-01-31 NaN
2000-02-29 -0.550830
2000-03-31 -1.297499
2000-04-30 1.178102
Freq: M, dtype: float64

ts.shift(3)


2000-01-31 NaN
2000-02-29 NaN
2000-03-31 NaN
2000-04-30 -0.55083
Freq: M, dtype: float64

ts.shift(freq='D')


2000-02-01 -0.550830
2000-03-01 -1.297499
2000-04-01 1.178102
2000-05-01 1.359573
Freq: MS, dtype: float64

ts.shift(periods=2)


2000-01-31 NaN
2000-02-29 NaN
2000-03-31 -0.550830
2000-04-30 -1.297499
Freq: M, dtype: float64

Passing freq shifts the index (the timestamps) by that frequency instead of shifting the data
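The two behaviors of shift can be compared directly. A minimal sketch on a small daily series with made-up values; `lagged` and `moved` are illustrative names.

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0, 4.0],
               index=pd.date_range('1/1/2000', periods=4, freq='D'))

# Without freq: the data moves down one slot, the index is unchanged
lagged = ts.shift(1)

# With freq: the timestamps advance by one day, the data is unchanged
moved = ts.shift(1, freq='D')

print(lagged)
print(moved)
```

The first form discards data at the edge and introduces NaN; the second keeps every value.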

from pandas.tseries.offsets import Day, MonthEnd


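If you add an anchored offset (such as MonthEnd), the first increment rolls the date forward to the next date that satisfies the frequency rule
- For example, if today is November 17, adding MonthEnd gives the end of this month, November 30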

now = datetime(2011, 11, 17)


now + 3*Day()


Timestamp('2011-11-20 00:00:00')

now + MonthEnd()


Timestamp('2011-11-30 00:00:00')

now + MonthEnd(2)


Timestamp('2011-12-31 00:00:00')

offset = MonthEnd()


offset.rollforward(now)


Timestamp('2011-11-30 00:00:00')

offset.rollback(now)


Timestamp('2011-10-31 00:00:00')
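Rolling and adding differ when the date already sits on the anchor. The sketch below contrasts the two; the month-end date `end` is an example added here for illustration.

```python
from datetime import datetime

import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
mid = datetime(2011, 11, 17)
end = pd.Timestamp('2011-11-30')

# From mid-month, rolling moves to the nearest month end in either direction
forward = offset.rollforward(mid)
backward = offset.rollback(mid)

# A date already on a month end is left alone by rollforward ...
unchanged = offset.rollforward(end)

# ... but adding the offset always moves to the next month end
advanced = end + offset

print(forward, backward, unchanged, advanced)
```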

A clever use of **groupby** with **anchored offsets**

ts = Series(np.random.randn(20), index=pd.date_range('1/15/2000',periods=20,freq='4d'))


ts.groupby(offset.rollforward).mean()


2000-01-31 -0.223943
2000-02-29 -0.241283
2000-03-31 -0.080391
dtype: float64

An easier and faster way is to use
> resample

ts.resample('M').mean()


2000-01-31 -0.223943
2000-02-29 -0.241283
2000-03-31 -0.080391
Freq: M, dtype: float64
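That the two approaches agree can be checked on deterministic data. This sketch assumes a recent pandas release and passes the MonthEnd offset object to resample rather than a string alias; the values 0.0 to 19.0 replace the random data.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

ts = pd.Series(np.arange(20.0),
               index=pd.date_range('1/15/2000', periods=20, freq='4D'))
offset = MonthEnd()

# Group each timestamp under the month end it rolls forward to
by_group = ts.groupby(offset.rollforward).mean()

# Month-end resampling computes the same monthly means
by_resample = ts.resample(offset).mean()

print(by_group)
print(by_resample)
```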

# import pytz
----
pytz is a library of world time zone definitions; time zones are looked up by name

import pytz


pytz.common_timezones[-5:]


['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']

tz = pytz.timezone('US/Eastern')


tz


### Localization and conversion

rng = pd.date_range('3/9/2012 9:30',periods=6, freq='D')
ts = Series(np.random.randn(len(rng)),index=rng)


del index


ts.index.tz


Date ranges can also be generated with a time zone set
- the UTC offset then shows up in the printed index

pd.date_range('3/9/2000 9:30',periods=10, freq='D',tz='UTC')


DatetimeIndex(['2000-03-09 09:30:00+00:00', '2000-03-10 09:30:00+00:00',
'2000-03-11 09:30:00+00:00', '2000-03-12 09:30:00+00:00',
'2000-03-13 09:30:00+00:00', '2000-03-14 09:30:00+00:00',
'2000-03-15 09:30:00+00:00', '2000-03-16 09:30:00+00:00',
'2000-03-17 09:30:00+00:00', '2000-03-18 09:30:00+00:00'],
dtype='datetime64[ns, UTC]', freq='D')

> The +00:00 suffix is
- the UTC offset of the time zone

Use *tz_localize* to convert a naive time series to a localized (time zone-aware) one

ts_utc = ts.tz_localize('UTC')

ts_utc


2012-03-09 09:30:00+00:00 -0.258702
2012-03-10 09:30:00+00:00 -1.019056
2012-03-11 09:30:00+00:00 1.044139
2012-03-12 09:30:00+00:00 0.826684
2012-03-13 09:30:00+00:00 0.998759
2012-03-14 09:30:00+00:00 -0.839695
Freq: D, dtype: float64


ts_utc.index


DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
'2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
'2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00'],
dtype='datetime64[ns, UTC]', freq='D')

Once a time series has been localized, convert it to another time zone with:
> *tz_convert*

ts_utc.tz_convert('US/Eastern')


2012-03-09 04:30:00-05:00 -0.258702
2012-03-10 04:30:00-05:00 -1.019056
2012-03-11 05:30:00-04:00 1.044139
2012-03-12 05:30:00-04:00 0.826684
2012-03-13 05:30:00-04:00 0.998759
2012-03-14 05:30:00-04:00 -0.839695
Freq: D, dtype: float64
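The change from -05:00 to -04:00 in the output above comes from the daylight saving time transition in US/Eastern on March 11, 2012. The sketch below reproduces the conversion with placeholder values and collects the UTC offsets; `offsets` is an illustrative name.

```python
from datetime import timedelta

import pandas as pd

rng = pd.date_range('3/9/2012 9:30', periods=6, freq='D')
ts = pd.Series(range(6), index=rng)   # placeholder values instead of random data

# A naive series has no time zone until it is localized
ts_utc = ts.tz_localize('UTC')
eastern = ts_utc.tz_convert('US/Eastern')

# UTC offset of every converted timestamp
offsets = [stamp.utcoffset() for stamp in eastern.index]

print(ts.index.tz)
print(offsets)
```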

*tz_localize* & *tz_convert* are also instance methods on *DatetimeIndex*

ts.index.tz_localize('Asia/Shanghai')


DatetimeIndex(['2012-03-09 09:30:00+08:00', '2012-03-10 09:30:00+08:00',
'2012-03-11 09:30:00+08:00', '2012-03-12 09:30:00+08:00',
'2012-03-13 09:30:00+08:00', '2012-03-14 09:30:00+08:00'],
dtype='datetime64[ns, Asia/Shanghai]', freq='D')

# Operations with time zone-aware Timestamp objects

Like time series, individual Timestamp objects can be localized from naive to time zone-aware and converted from one time zone to another

stamp = pd.Timestamp('2011-03-12 4:00')
stamp_utc = stamp.tz_localize('utc')


stamp_utc.tz_convert('US/Eastern')


Timestamp('2011-03-11 23:00:00-0500', tz='US/Eastern')

> Time zone-aware Timestamp objects internally store a UTC timestamp value as nanoseconds since the UNIX epoch (January 1, 1970)
- this UTC value is invariant between time zone conversions

stamp_utc.value


1299902400000000000
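The invariance can be checked by comparing the stored value before and after conversion. A minimal sketch; `stamp_eastern` is a name introduced here.

```python
import pandas as pd

stamp = pd.Timestamp('2011-03-12 04:00')
stamp_utc = stamp.tz_localize('UTC')
stamp_eastern = stamp_utc.tz_convert('US/Eastern')

# The wall-clock fields differ, but the underlying nanosecond count does not
print(stamp_utc.hour, stamp_eastern.hour)
print(stamp_utc.value, stamp_eastern.value)
```

Because both objects refer to the same instant, they also compare equal with ==.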

stamp = pd.Timestamp('2012-03-12 01:30', tz='US/Eastern')


stamp


Timestamp('2012-03-12 01:30:00-0400', tz='US/Eastern')

stamp + Hour()


Timestamp('2012-03-12 02:30:00-0400', tz='US/Eastern')

# operations between different time zones

rng = pd.date_range('3/7/2012 9:30',periods=10, freq='B')


ts = Series(np.random.randn(len(rng)), index=rng)


ts


2012-03-07 09:30:00 0.315600
2012-03-08 09:30:00 0.616440
2012-03-09 09:30:00 -1.633940
2012-03-12 09:30:00 0.260501
2012-03-13 09:30:00 -0.394620
2012-03-14 09:30:00 -0.554103
2012-03-15 09:30:00 2.441851
2012-03-16 09:30:00 -3.473308
2012-03-19 09:30:00 -0.339365
2012-03-20 09:30:00 0.335510
Freq: B, dtype: float64

ts1 = ts[:7].tz_localize('Europe/London')


ts2 = ts1[2:].tz_convert('Europe/Moscow')


result = ts1 + ts2


> Time series in different time zones can be combined freely; the result is in UTC

result.index


DatetimeIndex(['2012-03-07 09:30:00+00:00', '2012-03-08 09:30:00+00:00',
'2012-03-09 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
'2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
'2012-03-15 09:30:00+00:00'],
dtype='datetime64[ns, UTC]', freq='B')
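The values of such a sum follow the usual index alignment rules: timestamps present in only one operand yield NaN. A sketch with deterministic values (0.0 to 9.0) in place of the random data, using iloc for the positional slices:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('3/7/2012 9:30', periods=10, freq='B')
ts = pd.Series(np.arange(10.0), index=rng)

ts1 = ts.iloc[:7].tz_localize('Europe/London')
ts2 = ts1.iloc[2:].tz_convert('Europe/Moscow')

# Both operands refer to the same instants, so they align on UTC
result = ts1 + ts2

print(result.index.tz)
print(result)
```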

## Periods and Period Arithmetic

> Periods
- represent time spans
- such as days, months, quarters, or years

p = pd.Period(2007, freq='A-DEC')


p


Period('2007', 'A-DEC')
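Adding or subtracting an integer shifts a Period by that many units of its frequency. The sketch below uses a monthly period, which is an assumption made here for illustration and differs from the annual example above.

```python
import pandas as pd

# A period covering the whole of January 2007
p = pd.Period('2007-01', freq='M')

# Integer arithmetic moves by whole months
later = p + 5
earlier = p - 2

# A period is a span, so it has a start and an end timestamp
print(later, earlier, p.start_time, p.end_time)
```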

## Time Series Plotting

close_px_call = pd.read_csv('/Users/Houbowei/Desktop/SRP/books/pydata-book-master/pydata-book-master/ch09/stock_px.csv', parse_dates=True,index_col=0)


close_px = close_px_call[['AAPL','MSFT','XOM']]


close_px = close_px.resample('B').ffill()


close_px


              AAPL   MSFT    XOM
2003-01-02    7.40  21.11  29.22
2003-01-03    7.45  21.14  29.24
2003-01-06    7.45  21.52  29.96
2003-01-07    7.43  21.93  28.95
2003-01-08    7.28  21.31  28.83
2003-01-09    7.34  21.93  29.44
2003-01-10    7.36  21.97  29.03
2003-01-13    7.32  22.16  28.91
2003-01-14    7.30  22.39  29.17
2003-01-15    7.22  22.11  28.77
2003-01-16    7.31  21.75  28.90
2003-01-17    7.05  20.22  28.60
2003-01-20    7.05  20.22  28.60
2003-01-21    7.01  20.17  27.94
2003-01-22    6.94  20.04  27.58
2003-01-23    7.09  20.54  27.52
2003-01-24    6.90  19.59  26.93
2003-01-27    7.07  19.32  26.21
2003-01-28    7.29  19.18  26.90
2003-01-29    7.47  19.61  27.88
2003-01-30    7.16  18.95  27.37
2003-01-31    7.18  18.65  28.13
2003-02-03    7.33  19.08  28.52
2003-02-04    7.30  18.59  28.52
2003-02-05    7.22  18.45  28.11
2003-02-06    7.22  18.63  27.87
2003-02-07    7.07  18.30  27.66
2003-02-10    7.18  18.62  27.87
2003-02-11    7.18  18.25  27.67
2003-02-12    7.20  18.25  27.12
...            ...    ...    ...
2011-09-05  374.05  25.80  72.14
2011-09-06  379.74  25.51  71.15
2011-09-07  383.93  26.00  73.65
2011-09-08  384.14  26.22  72.82
2011-09-09  377.48  25.74  71.01
2011-09-12  379.94  25.89  71.84
2011-09-13  384.62  26.04  71.65
2011-09-14  389.30  26.50  72.64
2011-09-15  392.96  26.99  74.01
2011-09-16  400.50  27.12  74.55
2011-09-19  411.63  27.21  73.70
2011-09-20  413.45  26.98  74.01
2011-09-21  412.14  25.99  71.97
2011-09-22  401.82  25.06  69.24
2011-09-23  404.30  25.06  69.31
2011-09-26  403.17  25.44  71.72
2011-09-27  399.26  25.67  72.91
2011-09-28  397.01  25.58  72.07
2011-09-29  390.57  25.45  73.88
2011-09-30  381.32  24.89  72.63
2011-10-03  374.60  24.53  71.15
2011-10-04  372.50  25.34  72.83
2011-10-05  378.25  25.89  73.95
2011-10-06  377.37  26.34  73.89
2011-10-07  369.80  26.25  73.56
2011-10-10  388.81  26.94  76.28
2011-10-11  400.29  27.00  76.27
2011-10-12  402.19  26.96  77.16
2011-10-13  408.43  27.18  76.37
2011-10-14  422.00  27.27  78.11
2292 rows × 3 columns

close_px.resample?


close_px['AAPL'].plot()


close_px.loc['2009'].plot()


close_px['AAPL'].loc['01-2011':'03-2011'].plot()


apple_q = close_px['AAPL'].resample('Q-DEC').ffill()


apple_q.loc['2009':].plot()


close_px.AAPL.plot()


close_px.plot()


apple_std250 = close_px.AAPL.rolling(250).std()


apple_std250.describe()


count 2043.000000
mean 20.604571
std 12.606813
min 1.335707
25% 9.121461
50% 22.231490
75% 32.411445
max 39.327273
Name: AAPL, dtype: float64

apple_std250.plot()


close_px.describe()


              AAPL         MSFT          XOM
count  2292.000000  2292.000000  2292.000000
mean    125.339895    23.953010    59.568473
std     107.218553     3.267322    16.731836
min       6.560000    14.330000    26.210000
25%      37.122500    21.690000    49.517500
50%      91.365000    24.000000    62.980000
75%     185.535000    26.280000    72.540000
max     422.000000    34.070000    87.480000

close_px_call.describe()


              AAPL         MSFT          XOM          SPX
count  2214.000000  2214.000000  2214.000000  2214.000000
mean    125.516147    23.945452    59.558744  1183.773311
std     107.394693     3.255198    16.725025   180.983466
min       6.560000    14.330000    26.210000   676.530000
25%      37.135000    21.700000    49.492500  1077.060000
50%      91.455000    24.000000    62.970000  1189.260000
75%     185.605000    26.280000    72.510000  1306.057500
max     422.000000    34.070000    87.480000  1565.150000

spx = close_px_call.SPX.pct_change()


spx


2003-01-02         NaN
2003-01-03   -0.000484
2003-01-06    0.022474
2003-01-07   -0.006545
2003-01-08   -0.014086
2003-01-09    0.019386
2003-01-10    0.000000
2003-01-13   -0.001412
2003-01-14    0.005830
2003-01-15   -0.014426
2003-01-16   -0.003942
2003-01-17   -0.014017
2003-01-21   -0.015702
2003-01-22   -0.010432
2003-01-23    0.010224
2003-01-24   -0.029233
2003-01-27   -0.016160
2003-01-28    0.013050
2003-01-29    0.006779
2003-01-30   -0.022849
2003-01-31    0.013130
2003-02-03    0.005399
2003-02-04   -0.014088
2003-02-05   -0.005435
2003-02-06   -0.006449
2003-02-07   -0.010094
2003-02-10    0.007569
2003-02-11   -0.008098
2003-02-12   -0.012687
2003-02-13   -0.001600
...
2011-09-02   -0.025282
2011-09-06   -0.007436
2011-09-07    0.028646
2011-09-08   -0.010612
2011-09-09   -0.026705
2011-09-12    0.006966
2011-09-13    0.009120
2011-09-14    0.013480
2011-09-15    0.017187
2011-09-16    0.005707
2011-09-19   -0.009803
2011-09-20   -0.001661
2011-09-21   -0.029390
2011-09-22   -0.031883
2011-09-23    0.006082
2011-09-26    0.023336
2011-09-27    0.010688
2011-09-28   -0.020691
2011-09-29    0.008114
2011-09-30   -0.024974
2011-10-03   -0.028451
2011-10-04    0.022488
2011-10-05    0.017866
2011-10-06    0.018304
2011-10-07   -0.008163
2011-10-10    0.034125
2011-10-11    0.000544
2011-10-12    0.009795
2011-10-13   -0.002974
2011-10-14    0.017380
Name: SPX, dtype: float64


returns = close_px.pct_change()


corr = returns.AAPL.rolling(125, min_periods=100).corr(spx)


corr.plot()


<matplotlib.axes._subplots.AxesSubplot at 0x10bf49450>




corr = returns.rolling(125, min_periods=100).corr(spx).plot()







                                            