
Getting Started with pandas: Merging Data with the merge Function

2017-08-15 19:56

Merging data with the merge function

Creating the datasets

# Import pandas and numpy
import pandas as pd
import numpy as np

# Create two DataFrames
df_left = pd.DataFrame(data=np.ones((5,6)),columns=["a","b","c","d","e","f"],index=["k1","k2","k3","k4","k5"])
df_right = pd.DataFrame(data=np.ones((5,6))*2,columns=["e","f","g","h","j","k"],index=["k3","k4","k5","k6","k7"])

df_left["key1"] = ["k1","k0","k0","k1","k1"]
df_left["key2"] = ["k0","k0","k1","k1","k0"]

df_right["key1"] = ["k1","k0","k0","k0","k1"]
df_right["key2"] = ["k0","k1","k1","k1","k0"]

print(df_right)
print(df_left)


      e    f    g    h    j    k  key1  key2
k3  2.0  2.0  2.0  2.0  2.0  2.0    k1    k0
k4  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k5  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k6  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k7  2.0  2.0  2.0  2.0  2.0  2.0    k1    k0

      a    b    c    d    e    f  key1  key2
k1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0
k2  1.0  1.0  1.0  1.0  1.0  1.0    k0    k0
k3  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1
k4  1.0  1.0  1.0  1.0  1.0  1.0    k1    k1
k5  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0


The default join type for merge is inner

print(pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="inner"))


     a    b    c    d  e_x  f_x  key1  key2  e_y  f_y    g    h    j    k
0  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
2  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
3  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
4  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
5  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
6  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
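
The seven rows above follow from how often each (key1, key2) pair occurs on each side: every left row is paired with every right row that carries the same key pair. The sketch below rebuilds the article's two frames, checks that omitting how= gives the same result as how="inner", and predicts the row count from the per-key counts. The variable names (keys, left_counts, expected_rows, and so on) are illustrative only, and row order is deliberately not checked because it can differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Same frames as in the article.
df_left = pd.DataFrame(np.ones((5, 6)), columns=list("abcdef"),
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=list("efghjk"),
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

keys = ["key1", "key2"]

# Leaving out how= falls back to the default join type, "inner".
default_merge = pd.merge(df_left, df_right, on=keys)
inner_merge = pd.merge(df_left, df_right, on=keys, how="inner")
same_result = default_merge.equals(inner_merge)

# Count how often each key pair occurs on each side.
left_counts = df_left.groupby(keys).size()
right_counts = df_right.groupby(keys).size()

# Each key pair contributes (left count) * (right count) rows;
# a pair missing on one side counts as 0 and contributes nothing.
rows_per_key = left_counts.mul(right_counts, fill_value=0)
expected_rows = int(rows_per_key.sum())

print(same_result, expected_rows, len(inner_merge))
```

Here (k1, k0) appears twice on each side and (k0, k1) appears once on the left and three times on the right, so the merge yields 2*2 + 1*3 = 7 rows.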


Outer merge, with indicator=True to show which side each row came from

pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="outer",indicator=True)


     a    b    c    d  e_x  f_x  key1  key2  e_y  f_y    g    h    j    k     _merge
0  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
2  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
3  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
4  1.0  1.0  1.0  1.0  1.0  1.0    k0    k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
5  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
6  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
7  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
8  1.0  1.0  1.0  1.0  1.0  1.0    k1    k1  NaN  NaN  NaN  NaN  NaN  NaN  left_only
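
The _merge column is mainly useful for filtering and counting afterwards. A minimal sketch, assuming the same two frames as in the article (the names outer, merge_counts and left_only_rows are illustrative): it tallies how many rows came from both sides or from one side only, then pulls out the unmatched left rows.

```python
import numpy as np
import pandas as pd

# Same frames as in the article.
df_left = pd.DataFrame(np.ones((5, 6)), columns=list("abcdef"),
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=list("efghjk"),
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

keys = ["key1", "key2"]
outer = pd.merge(df_left, df_right, on=keys, how="outer", indicator=True)

# Tally rows by origin: "both", "left_only" or "right_only".
merge_counts = outer["_merge"].value_counts()
print(merge_counts)

# Keep only the left rows that found no partner in df_right.
left_only_rows = outer[outer["_merge"] == "left_only"]
print(left_only_rows[keys])
```

With this data every key pair in df_right also exists in df_left, which is why the table above contains no right_only rows.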


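Left merge on the index, with the merge indicator

Recent pandas releases raise a MergeError when on= is combined with left_index=True/right_index=True, so this call merges on the index alone. The key columns are dropped from df_right first; otherwise key1 and key2 would exist on both sides and come back with _x/_y suffixes.

pd.merge(left=df_left,right=df_right.drop(columns=["key1","key2"]),how="left",left_index=True,right_index=True,indicator=True)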


      a    b    c    d  e_x  f_x  key1  key2  e_y  f_y    g    h    j    k     _merge
k1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
k2  1.0  1.0  1.0  1.0  1.0  1.0    k0    k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
k3  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
k4  1.0  1.0  1.0  1.0  1.0  1.0    k1    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
k5  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
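
Two details of index-based merging are easy to trip over, and the sketch below illustrates both under the assumption of a recent pandas release. First, passing on= together with left_index/right_index is expected to raise pandas.errors.MergeError there (older releases accepted it), so the attempt is wrapped in try/except and only recorded. Second, when the merge uses the index alone and both frames keep their key1/key2 columns, those columns overlap and receive the default _x/_y suffixes. Names such as combination_accepted and by_index are illustrative.

```python
import numpy as np
import pandas as pd

# Same frames as in the article.
df_left = pd.DataFrame(np.ones((5, 6)), columns=list("abcdef"),
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=list("efghjk"),
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

keys = ["key1", "key2"]

# Record whether this pandas version accepts on= plus index flags.
try:
    pd.merge(df_left, df_right, on=keys, how="left",
             left_index=True, right_index=True)
    combination_accepted = True
except pd.errors.MergeError:
    combination_accepted = False
print("on= with index flags accepted:", combination_accepted)

# Merge on the index alone, keeping the key columns on both sides.
by_index = pd.merge(df_left, df_right, how="left",
                    left_index=True, right_index=True, indicator=True)

# key1/key2 exist in both frames, so they are suffixed like e and f.
key_columns = [c for c in by_index.columns if c.startswith("key")]
print(key_columns)
print(by_index["_merge"])
```

Because the match is now made on the index labels, only k3, k4 and k5 are found on both sides, regardless of what key1 and key2 contain.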


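Right merge on the index, with custom suffixes for overlapping columns

on= is left out here because recent pandas releases raise a MergeError when it is combined with left_index=True/right_index=True; the key columns are dropped from df_right so that key1 and key2 are not duplicated.

pd.merge(left=df_left,right=df_right.drop(columns=["key1","key2"]),how="right",left_index=True,right_index=True,indicator=True,suffixes=("_left","_right"))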


      a    b    c    d  e_left  f_left  key1  key2  e_right  f_right    g    h    j    k      _merge
k3  1.0  1.0  1.0  1.0     1.0     1.0    k0    k1      2.0      2.0  2.0  2.0  2.0  2.0        both
k4  1.0  1.0  1.0  1.0     1.0     1.0    k1    k1      2.0      2.0  2.0  2.0  2.0  2.0        both
k5  1.0  1.0  1.0  1.0     1.0     1.0    k1    k0      2.0      2.0  2.0  2.0  2.0  2.0        both
k6  NaN  NaN  NaN  NaN     NaN     NaN   NaN   NaN      2.0      2.0  2.0  2.0  2.0  2.0  right_only
k7  NaN  NaN  NaN  NaN     NaN     NaN   NaN   NaN      2.0      2.0  2.0  2.0  2.0  2.0  right_only
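
The table above shows two effects at once: suffixes= renames only the columns that exist in both frames, and rows found only in df_right come back with NaN in every left-hand column. The following sketch checks both on the article's data, assuming an index-only merge with the key columns dropped from df_right; the names merged, renamed, right_only_labels and left_cols_all_nan are illustrative.

```python
import numpy as np
import pandas as pd

# Same frames as in the article.
df_left = pd.DataFrame(np.ones((5, 6)), columns=list("abcdef"),
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=list("efghjk"),
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

keys = ["key1", "key2"]

merged = pd.merge(df_left, df_right.drop(columns=keys), how="right",
                  left_index=True, right_index=True, indicator=True,
                  suffixes=("_left", "_right"))

# Only e and f exist on both sides, so only they are renamed.
renamed = [c for c in merged.columns if c.endswith(("_left", "_right"))]
print(renamed)

# Index labels that exist only in df_right.
right_only_labels = merged.index[merged["_merge"] == "right_only"].tolist()
print(right_only_labels)

# For those labels, every column taken from df_left is missing.
left_columns = ["a", "b", "c", "d", "e_left", "f_left", "key1", "key2"]
left_cols_all_nan = bool(
    merged.loc[right_only_labels, left_columns].isna().all().all()
)
print(left_cols_all_nan)
```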