python csv

2016-06-18 20:45

Reference:

csv - CSV File Reading and Writing: https://docs.python.org/2.7/library/csv.html?highlight=csv#module-csv

#################################################################

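CSV (Comma Separated Values) is the most common import and export format for spreadsheets and databases. The delimiter is a comma by default, but there is no single standard, so other characters (a period, an underscore and so on) are also allowed; the choice of delimiter is up to whoever produces the file.

Because there is no standard format, handling CSV files from different sources can be tricky. The csv module hides the details of reading and writing the data, which makes such files much easier to work with. It can also read and write differently formatted CSV files, provided you configure the format yourself.

Import the csv module:

import csv

Test data (the file test.csv):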

123,456,789
asd,fgh,jkl

##################################

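We start with the reader function of the csv module:

csv.reader(csvfile, dialect='excel', **fmtparams)

Parameters:

csvfile: required. Any object that supports the iterator protocol and returns a string on each iteration, for example a file object or a list. If it is a file object, the file must be opened with the 'b' flag (binary mode) on platforms where that makes a difference.

dialect / fmtparams: optional, covered later.

What it does: returns a reader object, an iterator that yields one row of csvfile at a time as a list of strings. No automatic data type conversion is performed.

Test 1: read a CSV file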

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open('test.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print row

Result: each row of the CSV file is printed in turn, as a list of strings

Test 2: read a list

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

csvfile = ["123, 456, 789", "asd, fgh, jkl"]

reader = csv.reader(csvfile)
for row in reader:
    print row

Result: the csv module splits each string on the default delimiter, the comma, and does not strip the spaces that follow it
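The effect of those leading spaces, and one way to get rid of them, can be seen in a small sketch. It is not one of the original tests: it assumes the skipinitialspace format parameter (described with the Dialect attributes later on), and it calls print() as a function so that it should run on Python 3 as well as Python 2.7.

```python
import csv

# Any iterable of strings can stand in for a file.
lines = ["123, 456, 789", "asd, fgh, jkl"]

# Default behaviour: split on commas only, so the leading spaces stay in the fields.
rows_default = list(csv.reader(lines))
print(rows_default)
# [['123', ' 456', ' 789'], ['asd', ' fgh', ' jkl']]

# skipinitialspace=True drops whitespace that directly follows a delimiter.
rows_stripped = list(csv.reader(lines, skipinitialspace=True))
print(rows_stripped)
# [['123', '456', '789'], ['asd', 'fgh', 'jkl']]

# Fields always come back as strings; convert explicitly when numbers are needed.
numbers = [int(field) for field in rows_stripped[0]]
print(numbers)
# [123, 456, 789]
```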

Some files use a different delimiter. In that case, set it with the delimiter parameter:

Test data separated by periods:

123.456.789
asd.fgh.jkl

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open('test.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter='.')
    for row in reader:
        print row

Test data separated by spaces:

123 456 789
asd fgh jkl

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open('test.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:
        print row

#############################################

Next comes the writer function of the csv module:

csv.writer(csvfile, dialect='excel', **fmtparams)

Parameters:

csvfile: required. Any object with a write() method. If it is a file object, the file must be opened with the 'b' flag on platforms where that makes a difference.

dialect / fmtparams: optional, covered later.

What it does: returns a writer object. The writer converts all non-string data to strings as it writes.

The writer has two methods for writing data:

1. writerow: writes one row at a time

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

se = [['asd','fgh','jkl'], [123,456,789]]

with open("test2.csv", 'wb') as csvfile:
    writer = csv.writer(csvfile)
    for row in se:
        writer.writerow(row)

Result:

asd,fgh,jkl
123,456,789

2. writerows: writes all rows in one call

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

se = [['asd','fgh','jkl'], [123,456,789]]

with open("test2.csv", 'wb') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(se)

Result:

asd,fgh,jkl
123,456,789

To append data to a file instead of truncating its existing contents first, open it in append mode:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open("test2.csv", 'ab') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ')
    writer.writerow([2,3,5])

Original contents of test2.csv:

asd fgh jkl
123 456 789

Result:

asd fgh jkl
123 456 789
2 3 5

To write some data more than once, build the repeated row directly in the call:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

se = [['asd','fgh','jkl'], [123,456,789]]

with open("test2.csv", 'wb') as csvfile:
    writer = csv.writer(csvfile)
    for row in se:
        writer.writerow(row*2)

The fields of each row are repeated once

Result:

asd,fgh,jkl,asd,fgh,jkl
123,456,789,123,456,789

To repeat only a particular value at the end of each row:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

se = [['asd','fgh','jkl'], [123,456,789]]

with open("test2.csv", 'wb') as csvfile:
    writer = csv.writer(csvfile)
    for row in se:
        writer.writerow(row+["adf"]*2)

Result:

asd,fgh,jkl,adf,adf
123,456,789,adf,adf

To change the delimiter, for example to a space, use the delimiter parameter:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

se = [['asd','fgh','jkl'], [123,456,789]]

with open("test2.csv", 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ')
    for row in se:
        writer.writerow(row)

Result:

asd fgh jkl
123 456 789

The reader and writer functions only work with sequence data (lists, tuples and so on)
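One consequence of working on sequences is easy to overlook: a plain string is itself a sequence of characters. The sketch below illustrates this with an in-memory round trip. It is an illustration rather than one of the original tests, and it assumes Python 3 (io.StringIO as a text buffer, print() as a function) instead of the Python 2 style of the listings above.

```python
import csv
import io

buffer = io.StringIO()  # in-memory text file, so nothing is written to disk
writer = csv.writer(buffer)

writer.writerow(['asd', 'fgh', 'jkl'])  # a list
writer.writerow((123, 456, 789))        # a tuple; the numbers are converted to strings
writer.writerow('abc')                  # a string: each character becomes a field

written = buffer.getvalue()
print(repr(written))
# 'asd,fgh,jkl\r\n123,456,789\r\na,b,c\r\n'

# Reading the text back gives lists of strings, one list per row.
rows = list(csv.reader(io.StringIO(written, newline='')))
print(rows)
# [['asd', 'fgh', 'jkl'], ['123', '456', '789'], ['a', 'b', 'c']]
```

To write a single string as one field, wrap it in a list first, for example writer.writerow(['abc']).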

#########################################################

The reader and writer functions covered so far should meet most of your CSV needs. For more advanced usage, read on.

########################################################

We start with DictReader:

class csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

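Parameters:

csvfile: required. Same requirements as for reader. The examples below open the file without the 'b' flag, but DictReader wraps an ordinary reader, so the same advice about binary mode applies on platforms where it makes a difference.

fieldnames: optional. A sequence whose elements are used as keys; they are paired in order with the values of each row to form key-value pairs. The default is None, in which case the values in the first row of the file are used as the fieldnames.

restkey: optional. A hashable value (a string, a number and so on). When a row has more values than there are keys in fieldnames, the keys are paired in order first and the remaining values are stored as a list under restkey. The default is None, so the key is None.

restval: optional. It can be a sequence or a single string, number and so on. When a row has fewer values than there are keys in fieldnames, the values are paired in order first and each remaining key is given the value restval. The default is None, so the value is None.

dialect='excel', *args, **kwds: not discussed here

What it does: reads data from csvfile and returns each row as a dictionary

Test data:

111,222
one,two
three,four
five,six

1. The simplest use of DictReader: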

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open("dict.csv") as csvfile:
    dictReader = csv.DictReader(csvfile)
    for row in dictReader:
        print row

2. Specifying fieldnames:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

names = ['first', 'second']

with open("dict.csv") as csvfile:
    dictReader = csv.DictReader(csvfile, fieldnames=names)
    for row in dictReader:
        print row

3. Fewer keys than values

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

names = ['first']

with open("dict.csv") as csvfile:
    dictReader = csv.DictReader(csvfile, fieldnames=names)
    for row in dictReader:
        print row

4. More keys than values

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

names = ['first', 'second', 'third']

with open("dict.csv") as csvfile:
    dictReader = csv.DictReader(csvfile, fieldnames=names)
    for row in dictReader:
        print row

5. Specifying restkey

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

names = ['first']
restk = 'restkey'

with open("dict.csv") as csvfile:
    dictReader = csv.DictReader(csvfile, fieldnames=names, restkey=restk)
    for row in dictReader:
        print row

6. Specifying restval

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

names = ['first', 'second', 'third']
restv = 'restval'

with open("dict.csv") as csvfile:
    dictReader = csv.DictReader(csvfile, fieldnames=names, restval=restv)
    for row in dictReader:
        print row

7. A delimiter can be specified as well

Test data:

111 222
one two
three four
five six

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open("dict.csv") as csvfile:
    dictReader = csv.DictReader(csvfile, delimiter=' ')
    for row in dictReader:
        print row['111'], row['222']
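The restkey and restval behaviour described in the parameter list is easiest to see with rows of uneven length. The following is a minimal sketch, not one of the original tests: the input rows and the names 'extra' and 'missing' are made up for illustration, and a list of strings stands in for the file. Each row is passed through dict() so that it prints the same way on any recent Python 3 version.

```python
import csv

# Three rows: exactly two values, one value too few, two values too many.
lines = ["one,two", "three", "five,six,seven,eight"]
names = ['first', 'second']

reader = csv.DictReader(lines, fieldnames=names,
                        restkey='extra', restval='missing')
rows = [dict(row) for row in reader]
for row in rows:
    print(row)
# {'first': 'one', 'second': 'two'}
# {'first': 'three', 'second': 'missing'}
# {'first': 'five', 'second': 'six', 'extra': ['seven', 'eight']}
```

Note that the surplus values are collected into a list under the restkey, while every missing key receives the same restval.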

##################################################

After the reading class comes the writing class, DictWriter:

class csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)

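Parameters:

csvfile: required. Same kind of object as for writer. The examples below open the file without the 'b' flag, but DictWriter wraps an ordinary writer, so the same advice about binary mode applies on platforms where it makes a difference.

fieldnames: required. A sequence of keys that determines which values of each dictionary are written to csvfile, and in what order.

restval: optional. When a dictionary lacks a key that is listed in fieldnames, restval is written as the value for that key. Defaults to the empty string.

extrasaction: optional. Determines what happens when a dictionary contains a key that is not in fieldnames. With 'raise', a ValueError is raised; with 'ignore', the extra key and its value are skipped without an error.

dialect='excel', *args, **kwds: not discussed here

What it does: writes the values of dictionaries to csvfile

1. The simplest use of DictWriter: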

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

dic = {11:"first", 22:'second', 'third':33, "four":44}

with open("dict2.csv", 'w') as csvfile:
    ke = dic.keys()
    va = dic.values()
    print "keys:",ke
    print "values:",va
    dictWriter = csv.DictWriter(csvfile, fieldnames=ke)
    dictWriter.writerow(dic)

Contents of dict2.csv:

44,first,33,second

Like writer, DictWriter provides the writerow and writerows methods, and its delimiter can be set with the delimiter parameter.

2. Raising ValueError

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

dic = {11:"first", 22:'second', 'third':33, "four":44}

with open("dict2.csv", 'w') as csvfile:
    ke = dic.keys()
    va = dic.values()
    print "keys:",ke
    print "values:",va
    ke.pop(0)
    dictWriter = csv.DictWriter(csvfile, fieldnames=ke)
    dictWriter.writerow(dic)

3. Using 'ignore'

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

dic = {11:"first", 22:'second', 'third':33, "four":44}

with open("dict2.csv", 'w') as csvfile:
    ke = dic.keys()
    ke.pop(0)
    va = dic.values()
    print "keys:",ke
    print "values:",va
    dictWriter = csv.DictWriter(csvfile, fieldnames=ke, extrasaction='ignore')
    dictWriter.writerow(dic)

Contents of dict2.csv:

first,33,second

4. Using restval

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

dic = {11:"first", 22:'second', 'third':33, "four":44}

with open("dict2.csv", 'w') as csvfile:
    ke = dic.keys()
    ke.append('adf')
    va = dic.values()
    print "keys:",ke
    print "values:",va
    dictWriter = csv.DictWriter(csvfile, fieldnames=ke, restval="hello")
    dictWriter.writerow(dic)

Contents of dict2.csv:

44,first,33,second,hello
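The four DictWriter cases above can be combined into one in-memory sketch with a fixed column order, which avoids depending on the order of dic.keys(). This is an illustration under stated assumptions: it targets Python 3 (io.StringIO, print() as a function), the column names and records are invented, and it also calls writeheader(), a DictWriter method that the tests above do not use.

```python
import csv
import io

fieldnames = ['name', 'age', 'city']  # hypothetical column names
buffer = io.StringIO()

writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                        restval='n/a', extrasaction='ignore')
writer.writeheader()                                   # header row from fieldnames
writer.writerow({'name': 'Ann', 'age': 30, 'city': 'Oslo'})
writer.writerow({'name': 'Bob'})                       # missing keys get restval
writer.writerow({'name': 'Cy', 'age': 41, 'zip': 1})   # extra key 'zip' is ignored

output = buffer.getvalue()
print(output)
# name,age,city
# Ann,30,Oslo
# Bob,n/a,n/a
# Cy,41,n/a

# With the default extrasaction='raise', an unknown key is an error.
strict = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)
try:
    strict.writerow({'name': 'Dee', 'zip': 2})
    raised = False
except ValueError:
    raised = True
print(raised)
# True
```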

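######################################

Next, some important classes in the csv module:

Class csv.Dialect:

A container class whose attributes define the parameters of a particular reader or writer instance. It is normally used through one of its subclasses:

Class csv.excel:

A subclass of Dialect that defines the usual properties of an Excel-generated CSV file. It is registered under the name "excel", which is why reader, writer, DictReader and DictWriter all default to dialect='excel'.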

#########################################

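The commonly used attributes of Dialect are explained below:

Dialect.delimiter:

A one-character string used to separate fields. Defaults to ',' (comma).

Dialect.quotechar:

A one-character string used to quote fields that contain special characters, such as the delimiter or the quotechar, or that contain new-line characters. Defaults to '"' (double quote).

Dialect.skipinitialspace:

When True, whitespace immediately following the delimiter is ignored. Defaults to False.

Dialect.quoting:

Controls when the writer generates quotes and when the reader recognises them. Its value is one of the constants defined by the csv module (listed below); the default is QUOTE_MINIMAL.

Dialect.doublequote:

Defaults to True, which means a quotechar that appears inside a field is doubled. When False, the escapechar is used as a prefix to the quotechar instead.

On output, if doublequote is False and no escapechar is set, an Error is raised when a quotechar is found in a field.

Dialect.escapechar:

A one-character string. When quoting is set to QUOTE_NONE, the writer puts the escapechar in front of any delimiter or quotechar that appears inside a field. When quoting has any other value and doublequote is False, it is used only for the quotechar inside a field. Defaults to None, which means no escapechar is set.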
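How doublequote and escapechar interact can be sketched with a row that contains both a quote and a delimiter. The code below is an illustration, assuming Python 3 (io.StringIO, print() as a function); write_row is a hypothetical helper defined here, not part of the csv module, and the backslash is an arbitrary choice of escapechar.

```python
import csv
import io

row = ['say "hi"', 'a,b']

def write_row(**fmtparams):
    """Hypothetical helper: write `row` with the given format parameters."""
    buffer = io.StringIO()
    csv.writer(buffer, **fmtparams).writerow(row)
    return buffer.getvalue()

# Default (doublequote=True): embedded quotes are doubled inside a quoted field.
doubled = write_row()
print(repr(doubled))
# '"say ""hi""","a,b"\r\n'

# doublequote=False: each embedded quote is prefixed with the escapechar instead.
escaped = write_row(doublequote=False, escapechar='\\')
print(repr(escaped))

# Reading with the same parameters recovers the original fields.
round_trip = next(csv.reader(io.StringIO(escaped, newline=''),
                             doublequote=False, escapechar='\\'))
print(round_trip == row)
# True

# doublequote=False without an escapechar cannot represent the quote.
try:
    write_row(doublequote=False)
    error_raised = False
except csv.Error:
    error_raised = True
print(error_raised)
# True
```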

###############################################

Constants of the csv module:

csv.QUOTE_ALL:

Instructs writer objects to quote all fields

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open("test.csv", 'wb') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    writer.writerow(['adsf', 'dd'])
    writer.writerow([23, 33])

Contents of test.csv:

"adsf","dd"
"23","33"

csv.QUOTE_MINIMAL:

Quotes only those fields that contain special characters such as the delimiter, the quotechar or any of the characters in lineterminator.

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open("test.csv", 'wb') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['adsf', 'dd,asdf'])
    writer.writerow([23, 33])

Contents of test.csv:

adsf,"dd,asdf"
23,33

csv.QUOTE_NONNUMERIC:

Instructs writer objects to quote all non-numeric fields

Instructs reader objects to convert all non-quoted fields to type float

Writing a file:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open("test.csv", 'wb') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(['adsf', 'dd asdf'])
    writer.writerow([23, 33])

Contents of test.csv:

"adsf","dd asdf"
23,33

Reading a file:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open("test.csv", 'rb') as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)
    for row in reader:
        print row
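What the reader returns in this mode can be checked without a file. The sketch below feeds the two lines of test.csv shown above to the reader as a list of strings; the extra line 'abc,1' is a made-up input added to show the failure case. It uses print() as a function so that it should run on Python 3 as well as Python 2.7.

```python
import csv

# The same two lines that the writer produced in test.csv.
lines = ['"adsf","dd asdf"', '23,33']

rows = list(csv.reader(lines, quoting=csv.QUOTE_NONNUMERIC))
print(rows)
# [['adsf', 'dd asdf'], [23.0, 33.0]]

# An unquoted field that is not a number cannot be converted to float.
try:
    list(csv.reader(['abc,1'], quoting=csv.QUOTE_NONNUMERIC))
    conversion_failed = False
except ValueError:
    conversion_failed = True
print(conversion_failed)
# True
```

Quoted fields stay strings, while unquoted fields come back as floats, so 23 is read as 23.0 rather than as an int.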

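csv.QUOTE_NONE:

Instructs writer objects never to quote fields. When the delimiter or the quotechar occurs in a field, it is preceded by the escapechar. If escapechar is not set, the writer raises Error.

Instructs reader objects to perform no special processing of quote characters.

Writing a file: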

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open("test.csv", 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ', escapechar='_', quoting=csv.QUOTE_NONE)
    writer.writerow(['adf"ad"da sd', 'dd'])
    writer.writerow([23, 32])

Contents of test.csv:

adf_"ad_"da_ sd dd
23 32

Reading a file:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import csv

with open("test.csv", 'rb') as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONE)
    for row in reader:
        print row
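The difference QUOTE_NONE makes on the reading side shows up as soon as a line contains quote characters. Here is a minimal sketch with a made-up input line (not the test.csv content above), parsed once with the default quoting and once with QUOTE_NONE; it should run on both Python 2.7 and Python 3.

```python
import csv

line = ['"a,b",c']  # hypothetical input: a quoted field containing a comma

# Default quoting: the quotes group "a,b" into a single field and are removed.
default_rows = list(csv.reader(line))
print(default_rows)
# [['a,b', 'c']]

# QUOTE_NONE: quotes are ordinary characters, so the line splits at every comma.
raw_rows = list(csv.reader(line, quoting=csv.QUOTE_NONE))
print(raw_rows)
# [['"a', 'b"', 'c']]
```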

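Each of these constants corresponds to a fixed integer: QUOTE_MINIMAL is 0, QUOTE_ALL is 1, QUOTE_NONNUMERIC is 2 and QUOTE_NONE is 3.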

#####################################################

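You can also register a dialect of your own and then use it by name

csv.register_dialect(name, [dialect, ]**fmtparams)

The format can be given directly as keyword arguments

csv.register_dialect("hello", delimiter=':', quoting=csv.QUOTE_NONE)

There is a matching function to unregister a dialect:

csv.unregister_dialect(name)

To see which dialects are currently registered, call:

csv.list_dialects()

In Python 2.7 this returns 'excel' and 'excel-tab' when nothing else has been registered, which shows that the csv module registers two Dialect subclasses by default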
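Put together, the life cycle of a custom dialect might look like the following sketch. It reuses the "hello" dialect from the call above and assumes Python 3 (io.StringIO, print() as a function); the sample rows are invented for illustration.

```python
import csv
import io

# Register a dialect under the name "hello", using keyword arguments only.
csv.register_dialect("hello", delimiter=':', quoting=csv.QUOTE_NONE)
registered = "hello" in csv.list_dialects()
print(registered)
# True

# Writers and readers can now refer to the dialect by its name.
buffer = io.StringIO()
csv.writer(buffer, dialect="hello").writerow(['a', 'b', 'c'])
written = buffer.getvalue()
print(repr(written))
# 'a:b:c\r\n'

rows = list(csv.reader(['1:2:3'], dialect="hello"))
print(rows)
# [['1', '2', '3']]

# Remove the dialect again once it is no longer needed.
csv.unregister_dialect("hello")
still_registered = "hello" in csv.list_dialects()
print(still_registered)
# False
```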

################################################

The csv module has one more use, parsing the numbers in a string:
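The original example for this point is not available, so the following is only a guess at what was meant: a single string is wrapped in a list, handed to csv.reader, and the resulting fields are converted to numbers. The input string is made up, and the second variant relies on the QUOTE_NONNUMERIC behaviour described earlier. The code should run on both Python 2.7 and Python 3.

```python
import csv

text = "1,2,3.5,-4"  # hypothetical input string

# csv.reader expects an iterable of lines, so wrap the single string in a list.
fields = next(csv.reader([text]))
print(fields)
# ['1', '2', '3.5', '-4']

# Variant 1: convert the string fields explicitly.
numbers = [float(field) for field in fields]
print(numbers)
# [1.0, 2.0, 3.5, -4.0]

# Variant 2: let the reader convert every unquoted field to float.
auto = next(csv.reader([text], quoting=csv.QUOTE_NONNUMERIC))
print(auto)
# [1.0, 2.0, 3.5, -4.0]
```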
