
Assorted problems encountered while learning Python

2015-11-13 21:49

The examples come from Wes McKinney's "Python for Data Analysis" (O'Reilly), run in the Anaconda3 distribution.

1.1 MovieLens data processing example: output the first five users' information. The code is as follows:

import pandas as pd
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('ch02/movielens/users.dat', sep = "::", header = None, names = unames)


Warning message:

D:\Program Files\Anaconda\lib\site-packages\pandas\io\parsers.py:648: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators;you can avoid this warning by specifying engine='python'.  ParserWarning)


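Fix: add engine='python' at the end of the read_table call that creates users, so the python engine is chosen explicitly for the multi-character separator.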

users = pd.read_table('ch02/movielens/users.dat', sep = "::", header = None, names = unames, engine = 'python')
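The fix can be tried without the dataset on disk. The following is a minimal sketch, assuming two made-up rows in the same "::"-separated layout (the values are placeholders, not taken from users.dat) that are read from an in-memory string instead of a file:

```python
import io
import pandas as pd

# Two made-up rows in the "::"-separated layout (illustrative values only).
sample = "1::F::25::3::12345\n2::M::40::7::54321\n"

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']

# A multi-character separator is treated as a regex, which the C engine
# does not support; naming the python engine explicitly avoids the fallback warning.
users = pd.read_table(io.StringIO(sample), sep='::', header=None,
                      names=unames, engine='python')

# Show (up to) the first five users.
print(users[:5])
```

Replacing the StringIO object with the path to users.dat should give the same call as in the fix above.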


1.2 In the MovieLens 1M dataset example, use pivot_table() to compute each movie's mean rating by gender

mean_ratings = data.pivot_table('rating', rows = 'title', cols = 'gender', aggfunc = 'mean')


Error message:

Traceback (most recent call last):

File "<ipython-input-28-669a36c33797>", line 1, in <module>
mean_ratings = data.pivot_table('rating', rows = 'title', cols = 'gender', aggfunc = 'mean')

TypeError: pivot_table() got an unexpected keyword argument 'rows'


Fix: replace rows with index and cols with columns. Newer pandas versions no longer accept the old keyword names.

mean_ratings = data.pivot_table('rating', index= 'title', columns= 'gender', aggfunc = 'mean')
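To see what the corrected call returns, here is a small sketch on a hand-made DataFrame. The titles and ratings are invented for illustration; only the column names mirror the ones used above:

```python
import pandas as pd

# Tiny stand-in for the merged MovieLens table (invented values).
data = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'B', 'B', 'B'],
    'gender': ['F', 'F', 'M', 'F', 'M', 'M'],
    'rating': [5, 3, 4, 2, 4, 2],
})

# One row per title, one column per gender, each cell the mean rating.
mean_ratings = data.pivot_table(values='rating', index='title',
                                columns='gender', aggfunc='mean')
print(mean_ratings)
```

The result has the titles as its index and the columns F and M, which is the shape the book's example relies on.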


1.3 numpy.cumsum: computing cumulative sums

# Create a 5x4 array (numpy is imported as np)
In [1]: arr = np.arange(1, 21).reshape(5, 4); arr
Out[1]:
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20]])

# cumsum() with no argument: cumulative sum over the flattened array
In [2]: arr.cumsum()
Out[2]:
array([  1,   3,   6,  10,  15,  21,  28,  36,  45,  55,  66,  78,  91, 105, 120, 136, 153, 171, 190, 210])

# cumsum(0): with axis 0, accumulate down the rows, giving a running sum within each column
In [3]: arr.cumsum(0)
Out[3]:
array([[ 1,  2,  3,  4],
       [ 6,  8, 10, 12],
       [15, 18, 21, 24],
       [28, 32, 36, 40],
       [45, 50, 55, 60]])

# cumsum(1): with axis 1, accumulate across the columns, giving a running sum within each row
In [4]: arr.cumsum(1)
Out[4]:
array([[ 1,  3,  6, 10],
       [ 5, 11, 18, 26],
       [ 9, 19, 30, 42],
       [13, 27, 42, 58],
       [17, 35, 54, 74]])
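
One way to keep the two axes straight is to compare the last entry of each running sum with the corresponding plain sum. This is only a sanity-check sketch on the same 5x4 array, with the axis passed by keyword for readability:

```python
import numpy as np

arr = np.arange(1, 21).reshape(5, 4)

flat = arr.cumsum()          # running sum over the flattened array
down = arr.cumsum(axis=0)    # running sum down each column
across = arr.cumsum(axis=1)  # running sum along each row

# The final element of a running sum equals the total along that axis.
print(flat[-1] == arr.sum())
print(np.array_equal(down[-1], arr.sum(axis=0)))
print(np.array_equal(across[:, -1], arr.sum(axis=1)))
```

All three checks should print True, since the final running total along an axis is the sum along that axis.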