
Happy learning starts with translation, part 4: Time Series Forecasting with the Long Short-Term Memory Network in Python

2018-06-11 18:59

Transform Time Series to Stationary (translator's note: I am not sure what "stationary" is supposed to mean here)

The Shampoo Sales dataset is not stationary. This means that there is a structure in the data that is dependent on the time. Specifically, there is an increasing trend in the data.
Stationary data is easier to model and will very likely result in more skillful forecasts. (Translator's note: I read "skillful" as "faster", the way you get quicker at something once you are practiced at it, but I don't agree with the author's claim; the logic does not hold up for me.)
The trend can be removed from the observations, then added back to forecasts later to return the prediction to the original scale and calculate a comparable error score.
A standard way to remove a trend is by differencing the data. That is, the observation from the previous time step (t-1) is subtracted from the current observation (t). This removes the trend and we are left with a difference series, or the changes to the observations from one time step to the next. (Translator's note: what on earth is this? I can't take it anymore, time for a round of Honor of Kings.)
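To make the differencing step concrete, here is a minimal sketch on made-up numbers (the toy series below is an assumption for illustration, not the Shampoo Sales data). A series that climbs by a fixed amount each step turns into a flat series of changes once each value has its predecessor subtracted.

```python
# Hypothetical toy data: a straight-line trend that rises by 3 per time step.
trend = [10 + 3 * t for t in range(6)]

# First-order differencing: observation at t minus observation at t-1.
diffs = [trend[t] - trend[t - 1] for t in range(1, len(trend))]

print(trend)  # [10, 13, 16, 19, 22, 25]
print(diffs)  # [3, 3, 3, 3, 3]
```

The differenced list is one element shorter than the input, because the first observation has no predecessor.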
We can achieve this automatically using the diff() function in pandas. Alternatively, we can get finer grained control and write our own function to do this, which is preferred for its flexibility in this case.
Below is a function called difference() that calculates a differenced series. Note that the first observation in the series is skipped as there is no prior observation with which to calculate a differenced value.

from pandas import Series

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    # start at `interval` because earlier items have no prior observation
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)
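As a quick check, the hand-written function can be compared with pandas' own diff(). This is only a sketch: the input is the first five values shown in the output later in this post, and the dropna() and reset_index(drop=True) calls are my additions so that the two results line up.

```python
from pandas import Series

def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        diff.append(dataset[i] - dataset[i - interval])
    return Series(diff)

# First five Shampoo Sales values, copied from the printed output in this post
values = [266.0, 145.9, 183.1, 119.3, 180.3]

custom = difference(values, 1)
# diff() keeps a leading NaN for the first row; drop it and renumber from 0
builtin = Series(values).diff(1).dropna().reset_index(drop=True)

print(custom.round(1).tolist())
print(builtin.round(1).tolist())
```

Both lines print the same four changes. The main practical difference is that diff() keeps the original length and marks the first row as NaN, while difference() simply skips it.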

We also need to invert this process in order to take forecasts made on the differenced series back into their original scale. (Translator's note: this should say "original data", because nothing has been scaled at this point; the author seems to have mixed in the scaling step that comes later.)
The function below, called inverse_difference(), inverts this operation.

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
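A small sketch of how inverse_difference() would be used when forecasting one step ahead. The `history` list and the `predicted_change` value are hypothetical stand-ins: in practice the change would come from a model trained on the differenced series.

```python
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# Observations known so far (first three values from the dataset output)
history = [266.0, 145.9, 183.1]
# Hypothetical model output on the differenced scale
predicted_change = -63.8

# history[-1] is the most recent observation, so the forecast is 183.1 + (-63.8)
forecast = inverse_difference(history, predicted_change)
print(round(forecast, 1))  # 119.3
```

The negative index counts back from the end of the history, so the default interval=1 always picks the latest known value, which is the one a one-step change has to be added to.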

We can test out these functions by differencing the whole series, then returning it to the original scale, as follows: (Translator's note: "scale" again!)

from datetime import datetime
from pandas import read_csv
from pandas import Series

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# load dataset
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')
# pandas.datetime and read_csv's squeeze=/date_parser= arguments are deprecated
# or removed in recent pandas, so squeeze the column and parse the index by hand
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)
print(series.head())
# transform to be stationary (raw values allow plain positional indexing)
raw_values = series.values
differenced = difference(raw_values, 1)
print(differenced.head())
# invert transform
inverted = list()
for i in range(len(differenced)):
    value = inverse_difference(raw_values, differenced[i], len(raw_values)-i)
    inverted.append(value)
inverted = Series(inverted)
print(inverted.head())

Running the example prints the first 5 rows of the loaded data, then the first 5 rows of the differenced series, then finally the first 5 rows with the difference operation inverted.
Note that the first observation in the original dataset was removed from the inverted difference data. Besides that, the last set of data matches the first as expected. (Translator's note: fair enough, but then what is the point of all this? There is clearly simpler code that does the same thing; is it just meant to confuse people? And why does the inverse function have to index from the end, just to make life hard? Maybe my complaints are off the mark; corrections are welcome, haha.)

Month
1901-01-01    266.0
1901-02-01    145.9
1901-03-01    183.1
1901-04-01    119.3
1901-05-01    180.3
Name: Sales, dtype: float64

0   -120.1
1     37.2
2    -63.8
3     61.0
4    -11.8
dtype: float64

0    145.9
1    183.1
2    119.3
3    180.3
4    168.5
dtype: float64
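On the question of why the inversion loop passes len(series)-i as the interval: counting that many items back from the end lands on position i from the start, so each difference is added to the observation just before it. The sketch below checks this on the first five values and also shows one simpler alternative, a cumulative sum started from the first observation. The cumsum variant is my own illustration, not something from the original tutorial.

```python
from pandas import Series

series = Series([266.0, 145.9, 183.1, 119.3, 180.3])
differenced = series.diff().dropna().reset_index(drop=True)
raw = series.values

# Same indexing as the loop in the post: raw[-(len(raw) - i)] is raw[i]
looped = [differenced[i] + raw[-(len(raw) - i)] for i in range(len(differenced))]

# Alternative sketch: first observation plus the running total of the changes
rebuilt = (series.iloc[0] + differenced.cumsum()).round(1).tolist()

print([round(v, 1) for v in looped])
print(rebuilt)
```

Both lines print the series without its first value. The tutorial's version is probably written with negative indexing because, when forecasting, the history keeps growing and only the most recent observations are of interest.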

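For more information on making the time series stationary and differencing, see the posts: (Translator's note: if you found this section useful and want to learn more, go read the following articles.)

Functions used in this section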

pandas.Series

class pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

One-dimensional ndarray with axis labels (including time series). Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN). Operations between Series (+, -, /, *, **) align values based on their associated index values; they need not be the same length. The result index will be the sorted union of the two indexes.
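The index alignment described above is easy to miss, so here is a small sketch with made-up labels and values. Labels that appear in only one operand come out as NaN in the result.

```python
from pandas import Series

# Two Series of different lengths with partly overlapping, differently ordered labels
a = Series([1, 2, 3], index=['x', 'y', 'z'])
b = Series([10, 20], index=['z', 'y'])

# Values are matched by label, not by position
total = a + b
print(total)
```

The result has the sorted union of the labels as its index: 'y' is 2 + 20, 'z' is 3 + 10, and 'x' is NaN because it has no partner in b.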


Parameters:

data : array-like, dict, or scalar value

Contains data stored in Series
Changed in version 0.23.0: If data is a dict, argument order is maintained for Python 3.6 and later.

index : array-like or Index (1d)

Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.

dtype : numpy.dtype or None

If None, dtype will be inferred

copy : boolean, default False

Copy input data
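A short sketch of the constructor parameters listed above, using made-up data. It shows the default RangeIndex and what happens when both a dict and an explicit index are passed.

```python
from pandas import Series

# No index given: labels default to 0, 1, 2, ...
plain = Series([266.0, 145.9, 183.1])
print(list(plain.index))  # [0, 1, 2]

# Dict plus index: the index decides which keys are kept and in what order
picked = Series({'a': 1, 'b': 2, 'c': 3}, index=['b', 'c', 'd'])
print(picked)
```

In the second Series, 'a' is dropped because it is not in the index, and 'd' becomes NaN because the dict has no such key.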

Attributes

T	return the transpose, which is by definition self
asobject	Return object Series which contains boxed values.
at	Access a single value for a row/column label pair.
axes	Return a list of the row axis labels
base	return the base object if the memory of the underlying data is shared
blocks	(DEPRECATED) Internal property, property synonym for as_blocks()
data	return the data pointer of the underlying data
dtype	return the dtype object of the underlying data
dtypes	return the dtype object of the underlying data
flags
ftype	return if the data is sparse|dense
ftypes	return if the data is sparse|dense
hasnans	return if I have any nans; enables various perf speedups
iat	Access a single value for a row/column pair by integer position.
iloc	Purely integer-location based indexing for selection by position.
index	The index (axis labels) of the Series.
is_monotonic	Return boolean if values in the object are monotonic_increasing
is_monotonic_decreasing	Return boolean if values in the object are monotonic_decreasing
is_monotonic_increasing	Return boolean if values in the object are monotonic_increasing
is_unique	Return boolean if values in the object are unique
itemsize	return the size of the dtype of the item of the underlying data
ix	A primarily label-location based indexer, with integer position fallback.
loc	Access a group of rows and columns by label(s) or a boolean array.
nbytes	return the number of bytes in the underlying data
ndim	return the number of dimensions of the underlying data, by definition 1
shape	return a tuple of the shape of the underlying data
size	return the number of elements in the underlying data
strides	return the strides of the underlying data
values	Return Series as ndarray or ndarray-like depending on the dtype

empty
imag
is_copy
name
real

Methods

abs ()	Return a Series/DataFrame with absolute numeric value of each element.
add (other[, level, fill_value, axis])	Addition of series and other, element-wise (binary operator add).
add_prefix (prefix)	Prefix labels with string prefix.
add_suffix (suffix)	Suffix labels with string suffix.
agg (func[, axis])	Aggregate using one or more operations over the specified axis.
aggregate (func[, axis])	Aggregate using one or more operations over the specified axis.
align (other[, join, axis, level, copy, …])	Align two objects on their axes with the specified join method for each axis Index
all ([axis, bool_only, skipna, level])	Return whether all elements are True over series or dataframe axis.
any ([axis, bool_only, skipna, level])	Return whether any element is True over requested axis.
append (to_append[, ignore_index, …])	Concatenate two or more Series.
apply (func[, convert_dtype, args])	Invoke function on values of Series.
argmax ([axis, skipna])	(DEPRECATED) ..
argmin ([axis, skipna])	(DEPRECATED) ..
argsort ([axis, kind, order])	Overrides ndarray.argsort.
as_blocks ([copy])	(DEPRECATED) Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.
as_matrix ([columns])	(DEPRECATED) Convert the frame to its Numpy-array representation.
asfreq (freq[, method, how, normalize, …])	Convert TimeSeries to specified frequency.
asof (where[, subset])	The last row without any NaN is taken (or the last row without NaN considering only the subset of columns in the case of a DataFrame)
astype (dtype[, copy, errors])	Cast a pandas object to a specified dtype.
at_time (time[, asof])	Select values at particular time of day (e.g.
autocorr ([lag])	Lag-N autocorrelation
between (left, right[, inclusive])	Return boolean Series equivalent to left <= series <= right.
between_time (start_time, end_time[, …])	Select values between particular times of the day (e.g., 9:00-9:30 AM).
bfill ([axis, inplace, limit, downcast])	Synonym for DataFrame.fillna(method='bfill')
bool ()	Return the bool of a single element PandasObject.
cat	alias of pandas.core.arrays.categorical.CategoricalAccessor
clip ([lower, upper, axis, inplace])	Trim values at input threshold(s).
clip_lower (threshold[, axis, inplace])	Return copy of the input with values below a threshold truncated.
clip_upper (threshold[, axis, inplace])	Return copy of input with values above given value(s) truncated.
combine (other, func[, fill_value])	Perform elementwise binary operation on two Series using given function with optional fill value when an index is missing from one Series or the other
combine_first (other)	Combine Series values, choosing the calling Series’s values first.
compound ([axis, skipna, level])	Return the compound percentage of the values for the requested axis
compress (condition, *args, **kwargs)	Return selected slices of an array along given axis as a Series
consolidate ([inplace])	(DEPRECATED) Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray).
convert_objects ([convert_dates, …])	(DEPRECATED) Attempt to infer better dtype for object columns.
copy ([deep])	Make a copy of this object’s indices and data.
corr (other[, method, min_periods])	Compute correlation with other Series, excluding missing values
count ([level])	Return number of non-NA/null observations in the Series
cov (other[, min_periods])	Compute covariance with Series, excluding missing values
cummax ([axis, skipna])	Return cumulative maximum over a DataFrame or Series axis.
cummin ([axis, skipna])	Return cumulative minimum over a DataFrame or Series axis.
cumprod ([axis, skipna])	Return cumulative product over a DataFrame or Series axis.
cumsum ([axis, skipna])	Return cumulative sum over a DataFrame or Series axis.
describe ([percentiles, include, exclude])	Generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
diff ([periods])	First discrete difference of element.
div (other[, level, fill_value, axis])	Floating division of series and other, element-wise (binary operator truediv).
divide (other[, level, fill_value, axis])	Floating division of series and other, element-wise (binary operator truediv).
divmod (other[, level, fill_value, axis])	Integer division and modulo of series and other, element-wise (binary operator divmod).
dot (other)	Matrix multiplication with DataFrame or inner-product with Series objects.
drop ([labels, axis, index, columns, level, …])	Return Series with specified index labels removed.
drop_duplicates ([keep, inplace])	Return Series with duplicate values removed.
dropna ([axis, inplace])	Return a new Series with missing values removed.
dt	alias of pandas.core.indexes.accessors.CombinedDatetimelikeProperties
duplicated ([keep])	Indicate duplicate Series values.
eq (other[, level, fill_value, axis])	Equal to of series and other, element-wise (binary operator eq).
equals (other)	Determines if two NDFrame objects contain the same elements.
ewm ([com, span, halflife, alpha, …])	Provides exponential weighted functions
expanding ([min_periods, center, axis])	Provides expanding transformations.
factorize ([sort, na_sentinel])	Encode the object as an enumerated type or categorical variable.
ffill ([axis, inplace, limit, downcast])	Synonym for DataFrame.fillna(method='ffill')
fillna ([value, method, axis, inplace, …])	Fill NA/NaN values using the specified method
filter ([items, like, regex, axis])	Subset rows or columns of dataframe according to labels in the specified index.
first (offset)	Convenience method for subsetting initial periods of time series data based on a date offset.
first_valid_index ()	Return index for first non-NA/null value.
floordiv (other[, level, fill_value, axis])	Integer division of series and other, element-wise (binary operator floordiv).
from_array (arr[, index, name, dtype, copy, …])	Construct Series from array.
from_csv (path[, sep, parse_dates, header, …])	(DEPRECATED) Read CSV file.
ge (other[, level, fill_value, axis])	Greater than or equal to of series and other, element-wise (binary operator ge).
get (key[, default])	Get item from object for given key (DataFrame column, Panel slice, etc.).
get_dtype_counts ()	Return counts of unique dtypes in this object.
get_ftype_counts ()	(DEPRECATED) Return counts of unique ftypes in this object.
get_value (label[, takeable])	(DEPRECATED) Quickly retrieve single value at passed index label
get_values ()	same as values (but handles sparseness conversions); is a view
groupby ([by, axis, level, as_index, sort, …])	Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns.
gt (other[, level, fill_value, axis])	Greater than of series and other, element-wise (binary operator gt).
head ([n])	Return the first n rows.
hist ([by, ax, grid, xlabelsize, xrot, …])	Draw histogram of the input series using matplotlib
idxmax ([axis, skipna])	Return the row label of the maximum value.
idxmin ([axis, skipna])	Return the row label of the minimum value.
infer_objects ()	Attempt to infer better dtypes for object columns.
interpolate ([method, axis, limit, inplace, …])	Interpolate values according to different methods.
isin (values)	Check whether values are contained in Series.
isna ()	Detect missing values.
isnull ()	Detect missing values.
item ()	return the first element of the underlying data as a python scalar
items ()	Lazily iterate over (index, value) tuples
iteritems ()	Lazily iterate over (index, value) tuples
keys ()	Alias for index
kurt ([axis, skipna, level, numeric_only])	Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
kurtosis ([axis, skipna, level, numeric_only])	Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
last (offset)	Convenience method for subsetting final periods of time series data based on a date offset.
last_valid_index ()	Return index for last non-NA/null value.
le (other[, level, fill_value, axis])	Less than or equal to of series and other, element-wise (binary operator le).
lt (other[, level, fill_value, axis])	Less than of series and other, element-wise (binary operator lt).
mad ([axis, skipna, level])	Return the mean absolute deviation of the values for the requested axis
map (arg[, na_action])	Map values of Series using input correspondence (a dict, Series, or function).
mask (cond[, other, inplace, axis, level, …])	Return an object of same shape as self and whose corresponding entries are from self where cond is False and otherwise are from other.
max ([axis, skipna, level, numeric_only])	This method returns the maximum of the values in the object.
mean ([axis, skipna, level, numeric_only])	Return the mean of the values for the requested axis
median ([axis, skipna, level, numeric_only])	Return the median of the values for the requested axis
memory_usage ([index, deep])	Return the memory usage of the Series.
min ([axis, skipna, level, numeric_only])	This method returns the minimum of the values in the object.
mod (other[, level, fill_value, axis])	Modulo of series and other, element-wise (binary operator mod).
mode ()	Return the mode(s) of the dataset.
mul (other[, level, fill_value, axis])	Multiplication of series and other, element-wise (binary operator mul).
multiply (other[, level, fill_value, axis])	Multiplication of series and other, element-wise (binary operator mul).
ne (other[, level, fill_value, axis])	Not equal to of series and other, element-wise (binary operator ne).
nlargest ([n, keep])	Return the largest n elements.
nonzero ()	Return the integer indices of the elements that are non-zero
notna ()	Detect existing (non-missing) values.
notnull ()	Detect existing (non-missing) values.
nsmallest ([n, keep])	Return the smallest n elements.
nunique ([dropna])	Return number of unique elements in the object.
pct_change ([periods, fill_method, limit, freq])	Percentage change between the current and a prior element.
pipe (func, *args, **kwargs)	Apply func(self, *args, **kwargs)
plot	alias of pandas.plotting._core.SeriesPlotMethods
pop (item)	Return item and drop from frame.
pow (other[, level, fill_value, axis])	Exponential power of series and other, element-wise (binary operator pow).
prod ([axis, skipna, level, numeric_only, …])	Return the product of the values for the requested axis
product ([axis, skipna, level, numeric_only, …])	Return the product of the values for the requested axis
ptp ([axis, skipna, level, numeric_only])	Returns the difference between the maximum value and the minimum value in the object.
put (*args, **kwargs)	Applies the put method to its values attribute if it has one.
quantile ([q, interpolation])	Return value at the given quantile, a la numpy.percentile.
radd (other[, level, fill_value, axis])	Addition of series and other, element-wise (binary operator radd).
rank ([axis, method, numeric_only, …])	Compute numerical data ranks (1 through n) along axis.
ravel ([order])	Return the flattened underlying data as an ndarray
rdiv (other[, level, fill_value, axis])	Floating division of series and other, element-wise (binary operator rtruediv).
reindex ([index])	Conform Series to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
reindex_axis (labels[, axis])	(DEPRECATED) Conform Series to new index with optional filling logic.
reindex_like (other[, method, copy, limit, …])	Return an object with matching indices to myself.
rename ([index])	Alter Series index labels or name
rename_axis (mapper[, axis, copy, inplace])	Alter the name of the index or columns.
reorder_levels (order)	Rearrange index levels using input order.
repeat (repeats, *args, **kwargs)	Repeat elements of a Series.
replace ([to_replace, value, inplace, limit, …])	Replace values given in to_replace with value.
resample (rule[, how, axis, fill_method, …])	Convenience method for frequency conversion and resampling of time series.
reset_index ([level, drop, name, inplace])	Generate a new DataFrame or Series with the index reset.
rfloordiv (other[, level, fill_value, axis])	Integer division of series and other, element-wise (binary operator rfloordiv).
rmod (other[, level, fill_value, axis])	Modulo of series and other, element-wise (binary operator rmod).
rmul (other[, level, fill_value, axis])	Multiplication of series and other, element-wise (binary operator rmul).
rolling (window[, min_periods, center, …])	Provides rolling window calculations.
round ([decimals])	Round each value in a Series to the given number of decimals.
rpow (other[, level, fill_value, axis])	Exponential power of series and other, element-wise (binary operator rpow).
rsub (other[, level, fill_value, axis])	Subtraction of series and other, element-wise (binary operator rsub).
rtruediv (other[, level, fill_value, axis])	Floating division of series and other, element-wise (binary operator rtruediv).
sample ([n, frac, replace, weights, …])	Return a random sample of items from an axis of object.
searchsorted (value[, side, sorter])	Find indices where elements should be inserted to maintain order.
select (crit[, axis])	(DEPRECATED) Return data corresponding to axis labels matching criteria
sem ([axis, skipna, level, ddof, numeric_only])	Return unbiased standard error of the mean over requested axis.
set_axis (labels[, axis, inplace])	Assign desired index to given axis.
set_value (label, value[, takeable])	(DEPRECATED) Quickly set single value at passed label.
shift ([periods, freq, axis])	Shift index by desired number of periods with an optional time freq
skew ([axis, skipna, level, numeric_only])	Return unbiased skew over requested axis Normalized by N-1
slice_shift ([periods, axis])	Equivalent to shift without copying data.
sort_index ([axis, level, ascending, …])	Sort Series by index labels.
sort_values ([axis, ascending, inplace, …])	Sort by the values.
sortlevel ([level, ascending, sort_remaining])	(DEPRECATED) Sort Series with MultiIndex by chosen level.
squeeze ([axis])	Squeeze length 1 dimensions.
std ([axis, skipna, level, ddof, numeric_only])	Return sample standard deviation over requested axis.
str	alias of pandas.core.strings.StringMethods
sub (other[, level, fill_value, axis])	Subtraction of series and other, element-wise (binary operator sub).
subtract (other[, level, fill_value, axis])	Subtraction of series and other, element-wise (binary operator sub).
sum ([axis, skipna, level, numeric_only, …])	Return the sum of the values for the requested axis
swapaxes (axis1, axis2[, copy])	Interchange axes and swap values axes appropriately
swaplevel ([i, j, copy])	Swap levels i and j in a MultiIndex
tail ([n])	Return the last n rows.
take (indices[, axis, convert, is_copy])	Return the elements in the given positional indices along an axis.
to_clipboard ([excel, sep])	Copy object to the system clipboard.
to_csv ([path, index, sep, na_rep, …])	Write Series to a comma-separated values (csv) file
to_dense ()	Return dense representation of NDFrame (as opposed to sparse)
to_dict ([into])	Convert Series to {label -> value} dict or dict-like object.
to_excel (excel_writer[, sheet_name, na_rep, …])	Write Series to an excel sheet
to_frame ([name])	Convert Series to DataFrame
to_hdf (path_or_buf, key, **kwargs)	Write the contained data to an HDF5 file using HDFStore.
to_json ([path_or_buf, orient, date_format, …])	Convert the object to a JSON string.
to_latex ([buf, columns, col_space, header, …])	Render an object to a tabular environment table.
to_msgpack ([path_or_buf, encoding])	msgpack (serialize) object to input file path
to_period ([freq, copy])	Convert Series from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)
to_pickle (path[, compression, protocol])	Pickle (serialize) object to file.
to_sparse ([kind, fill_value])	Convert Series to SparseSeries
to_sql (name, con[, schema, if_exists, …])	Write records stored in a DataFrame to a SQL database.
to_string ([buf, na_rep, float_format, …])	Render a string representation of the Series
to_timestamp ([freq, how, copy])	Cast to datetimeindex of timestamps, at beginning of period
to_xarray ()	Return an xarray object from the pandas object.
tolist ()	Return a list of the values.
transform (func, *args, **kwargs)	Call function producing a like-indexed NDFrame and return a NDFrame with the transformed values
transpose (*args, **kwargs)	return the transpose, which is by definition self
truediv (other[, level, fill_value, axis])	Floating division of series and other, element-wise (binary operator truediv).
truncate ([before, after, axis, copy])	Truncate a Series or DataFrame before and after some index value.
tshift ([periods, freq, axis])	Shift the time index, using the index’s frequency if available.
tz_convert (tz[, axis, level, copy])	Convert tz-aware axis to target time zone.
tz_localize (tz[, axis, level, copy, ambiguous])	Localize tz-naive TimeSeries to target time zone.
unique ()	Return unique values of Series object.
unstack ([level, fill_value])	Unstack, a.k.a.
update (other)	Modify Series in place using non-NA values from passed Series.
valid ([inplace])	(DEPRECATED) Return Series without null values.
value_counts ([normalize, sort, ascending, …])	Returns object containing counts of unique values.
var ([axis, skipna, level, ddof, numeric_only])	Return unbiased variance over requested axis.
view ([dtype])	Create a new view of the Series.
where (cond[, other, inplace, axis, level, …])	Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
xs (key[, axis, level, drop_level])	Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
