
First steps processing data with Python's pandas

2016-07-02 10:36

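1. This is the Titanic problem from Kaggle. A forum post walked through a data exploration process, so I followed along. So far it feels much like R, probably because I have not gone deep into it yet.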

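2. Start with pyplot, the plotting subpackage of matplotlib. plt.figure() creates a figure, and the figure's add_subplot() method adds subplots to it; the three arguments give the number of rows, the number of columns, and the position of the subplot in that grid.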

import pandas as pd

import matplotlib.pyplot as plt

dt=pd.read_csv('C:\\Users\\Administrator\\Desktop\\tatannic\\train.csv')

# First plot: distribution of passenger ages

age=dt.Age

mean=age.mean()

age=age.fillna(mean)

fig=plt.figure()

ax=fig.add_subplot(1,2,1)

ax.hist(age,bins=10)

plt.xlabel('Age')

plt.ylabel('Count of people')

# Second plot: distribution of ticket fares

fare=dt.Fare

ax=fig.add_subplot(1,2,2)

ax.hist(fare,bins=10)

plt.xlabel('fare')

plt.show()

# Box plot of ticket fares

fig2=plt.figure()

ax=fig2.add_subplot(1,1,1)

ax.boxplot(fare)

plt.ylabel('fare')  # the y-axis of a box plot shows fare values, not counts

plt.show()
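The same three plots can also be drawn by calling methods on the axes objects themselves, which avoids relying on plt.xlabel() acting on whichever subplot was created last. This is only a minimal sketch and not the code from the tutorial: the `toy` DataFrame and its values are made up to stand in for train.csv, and the Agg backend is selected so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for train.csv (values are illustrative only)
toy = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0, np.nan, 27.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.07, 8.05, 11.13],
})

# Fill missing ages with the mean age, as in the code above
age = toy["Age"].fillna(toy["Age"].mean())

# One figure with a 1x2 grid of subplots; axes is an array of two Axes objects
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].hist(age, bins=10)
axes[0].set_xlabel("Age")
axes[0].set_ylabel("Count of people")
axes[1].hist(toy["Fare"], bins=10)
axes[1].set_xlabel("Fare")
axes[1].set_ylabel("Count of people")
fig.tight_layout()

# A separate figure for the box plot of fares
fig2, ax2 = plt.subplots()
ax2.boxplot(toy["Fare"])
ax2.set_ylabel("Fare")
```

Because every label is set on a specific Axes object, the order in which the subplots are created no longer matters.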

3. Grouping data with groupby

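I have been learning data processing by comparing SQL, R, and Python side by side; the basic idea in all three is operating on a single table.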

In pandas, groupby is a method of the DataFrame; its argument is a column name or a list of column names.

>dt.groupby(['Pclass','Survived']).Pclass.count()

Out[52]:

Pclass  Survived
1       0            80
        1           136
2       0            97
        1            87
3       0           372
        1           119

Name: Pclass, dtype: int64

This groups the rows by both Pclass and Survived and counts each combination separately.
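For readers coming from SQL, grouping on two columns plays roughly the role of GROUP BY Pclass, Survived with COUNT(*). The sketch below shows the idea on a small made-up table (the `toy` rows are invented, not taken from train.csv). It uses size() rather than count(), which is a swapped-in choice: size() counts rows per group, while count() counts non-null values in the selected column.

```python
import pandas as pd

# Made-up passenger table (illustrative values only)
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Number of rows for every (Pclass, Survived) combination;
# the result is a Series with a two-level index
counts = toy.groupby(["Pclass", "Survived"]).size()

# Move the inner index level (Survived) into columns:
# one row per Pclass, one column per Survived value
table = counts.unstack(fill_value=0)
print(table)
```

The unstacked table is often easier to read than the two-level Series, because each class sits on a single row.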

Count the passengers in each Pclass, that is, the number of Survived values recorded for each class.

>dt.groupby('Pclass').Survived.count()

Out[66]:

Pclass

1 216

2 184

3 491

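Count the passengers with Survived equal to 1 in each Pclass. Because Survived is either 0 or 1, summing the column gives that count.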

>dt.groupby(['Pclass']).Survived.sum()

Out[71]:

Pclass

1 136

2 87

3 119

Name: Survived, dtype: int64
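The count and the sum can be computed in one pass, and dividing one by the other gives a survival rate per class (with the numbers above: 136/216, 87/184, and 119/491, roughly 0.63, 0.47, and 0.24). A minimal sketch on made-up data follows; the `toy` table and the column names `passengers`, `survivors`, and `survival_rate` are chosen here for illustration only.

```python
import pandas as pd

# Made-up passenger table (illustrative values only)
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Apply several aggregations to the Survived column of each Pclass group
summary = toy.groupby("Pclass")["Survived"].agg(["count", "sum", "mean"])

# Rename the columns: for a 0/1 column, the mean is the share of ones
summary.columns = ["passengers", "survivors", "survival_rate"]
print(summary)
```

Since Survived only takes the values 0 and 1, mean() is the same as sum() divided by count().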
