您的位置:首页 > 其它

pandas versus excel 学习总结1—excel相关操作

2018-11-28 14:11 337 查看

最近在网易云听了听pandas versus excel这个课,感觉Timothy老师讲的蛮好的,对所讲解的知识点进行一下总结,已备之后翻阅。

1.创建一个excel文件

  • 主要使用to_excel(‘filename’)
import os
import pandas as pd
DIR = os.path.dirname(__file__)
filename = os.path.join(DIR,'output.xlsx')
df = pd.DataFrame({'id':[1,2,3],'name':['liu','jiang','kobe']})
df = df.set_index('id')
df.to_excel(filename)

2.读取一个excel

  • 使用read_excel(‘filename’,header=),None时是没有头,1是以第二行为头,以此类推
  • shape:返回行列数目
  • columns:访问和设置列名
  • head(),tail():前后几行
  • set_index(‘ID’, inplace=True) 第一个是以某列为索引,设置inplace为True避免使用people = people.set_index()
  • List item
import pandas as pd
people = pd.read_excel('People.xlsx')#header=None, 1
print(people.shape)
people.columns = ['ID', 'Type', 'Title', 'FirstName', 'MiddleName', 'LastName']
people.set_index('ID',inplace=True)
people.to_excel('People.xlsx')
print(people.columns)
print(people.head())
print('-----------'*10)
print(people.tail())

3.行、列、单元格

3.1.序列Series:与字典d = {‘x’:100,‘y’:200,‘z’:300}很像,两种方式创建

import pandas as pd
d = {'x':100,'y':200,'z':300}
s1 = pd.Series(d)

l1 = [100,200,300]
l2 = ['x','y','z']
s2 = pd.Series(l1,index=l2)

3.2.Series与DataFrame创建excel

s1 = pd.Series([1,2,3],index=[1,2,3],name='A')
s2 = pd.Series([10,20,30],index=[1,2,3],name='B')
s3 = pd.Series([100,200,300],index=[2,3,4],name='C')#索引并集,填充nan
df = pd.DataFrame({s1.name:s1,s2.name:s2,s3.name:s3})
# df = pd.DataFrame([s1,s2,s3])#反转了
df.to_excel('filename')

3.3.数据填充

  • skiprows:跳过几行
  • usecols:使用哪几列
  • index_col:设置索引
  • dtype:转换数据类型,当空值时默认是float64型,要转换下才可操作
  • .at[] :定位到索引值所在行
import pandas as pd
from  datetime import date,timedelta
def add_month(d,md):
yd = md//12
m = d.month + md%12
if m != 12:
yd += m//12
m = m % 12
return date(d.year+yd,m,d.day)
books = pd.read_excel('Books.xlsx',skiprows=3,usecols='C:F',index_col=None,dtype={'ID':str,'InStore':str,'Date':str})
start = date(2018,1,1)
for i in books.index:
# books.at[i,'ID'] = i + 1 	#根据DataFrame直接修改
books['ID'].at[i]= i +1   	#变成Series修改
books['InStore'].at[i]= 'Yes' if i%2==0 else 'No'
# books['Date'].at[i] = start + timedelta(days=i)
books['Date'].at[i] = add_month(start,i)
print(books)

3.4.计算填充

  • 将列进行运算填充到另一列
import pandas as pd
books = pd.read_excel('books.xlsx',index_col="ID")
books['Price'] = books['ListPrice']*books['Discount']
# books['Price'] = books['Price'].apply(lambda x:x+2) #列自运算
books.to_excel('booksop.xlsx')
print(books)

3.5数据排序

  • sort_valuse:默认ascendin=False是从大到小
import pandas as pd
s = pd.read_excel('List.xlsx',index_col='ID')
s.sort_values(by=['Worthy','Price'],inplace=True,ascending=[True,False])
print(s)

3.6数据筛选

def age_range(a):
return 18<=a<30
def level_a(g):
return 85<=g<+100
import pandas as pd
s = pd.read_excel('Students.xlsx',index_col='ID')
s = s.loc[s['Age'].apply(age_range)].loc[s['Score'].apply(level_a)]
# s = s.loc[s['Age'].apply(lambda a:18<=a<30)]\
#     .loc[s['Score'].apply(lambda g:85<=g<+100)]
print(s)
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: