Common Python file-operation APIs (using the open function)
2019-03-13 21:24
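Opening a file and getting a file object
    fp = open(file, mode, encoding)
    # file: path of the file to operate on; take care to join directories correctly
    # mode: how the file is opened
    # encoding: the text encoding to use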
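- About the first parameter, file:
This is the path of the file. Take care to join the directory components correctly; in projects, people often forget to build the full path and get errors as a result.
For path handling, see os.path.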
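As a minimal sketch of the path joining mentioned above: the helper name data_path and the "data" folder are hypothetical, chosen only for illustration, and the base directory is assumed to be the folder containing the script.

```python
import os

# Assumption: data files live relative to this script's folder.
# Fall back to the current working directory when __file__ is not defined (e.g. in a REPL).
if "__file__" in globals():
    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
else:
    BASE_DIR = os.getcwd()


def data_path(*parts):
    """Join path components onto BASE_DIR with the OS-specific separator."""
    return os.path.join(BASE_DIR, *parts)


path = data_path("data", "flydb.json")  # hypothetical location
print(path)
print(os.path.exists(path))  # worth checking before opening in 'r' mode
```

Building the path from a fixed base directory keeps the result independent of the directory the program happens to be started from.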
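- About the second parameter, mode:
It selects how the file is opened (reading, writing, appending, text or binary). The full table of modes is in the open API documentation quoted at the end of this article.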
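The common modes can be tried out with a short sketch like the following. It works in a temporary directory, and the file name demo.txt is made up for the example.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

with open(path, "w", encoding="utf-8") as f:  # 'w' creates or truncates
    f.write("first\n")

with open(path, "a", encoding="utf-8") as f:  # 'a' appends to the end
    f.write("second\n")

with open(path, "r", encoding="utf-8") as f:  # 'r' reads (the default)
    content = f.read()

try:
    open(path, "x", encoding="utf-8")  # 'x' refuses to overwrite an existing file
    x_failed = False
except FileExistsError:
    x_failed = True

with open(path, "w", encoding="utf-8"):  # reopening with 'w' empties the file
    pass
size_after_truncate = os.path.getsize(path)

print(repr(content), x_failed, size_after_truncate)
```

The point to remember is that 'w' silently discards existing content, while 'x' fails instead and 'a' preserves it.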
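- About the third parameter, encoding:
This parameter specifies the text encoding. It is often omitted, in which case Python does not detect the file's encoding; it falls back to a platform-dependent default (locale.getpreferredencoding(False)). The cases below, which I ran into myself, illustrate the problem:
The default encoding differs between operating systems, and this needs attention. For example, on a Chinese-locale Windows system the default is gbk, while on Linux it is usually UTF-8.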
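To see which default applies on a given machine, a small check along these lines may help. It relies on locale.getpreferredencoding(False), the call named in the open docstring quoted at the end of this article; the printed value depends on the platform and is not predictable here.

```python
import codecs
import locale

# The encoding open() falls back to in text mode when encoding is omitted.
default_encoding = locale.getpreferredencoding(False)

# Normalise the name through the codec registry, e.g. 'UTF-8' -> 'utf-8'.
codec_name = codecs.lookup(default_encoding).name

print(default_encoding, codec_name)
```

Passing encoding explicitly to open makes a script behave the same regardless of what this prints.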
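If the data being written needs a specific encoding, this parameter must be given. For example, when I crawl a web page with the requests library and save the response content to a file, the write can fail. Since the response is an HTML document, the saved file may also show garbled text when opened in a browser.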
    import requests

    response = requests.get('https://www.baidu.com')
    with open('baidu.html', 'w+') as f:
        f.write(response.text)
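Running the file raises the following error:
UnicodeEncodeError: 'gbk' codec can't encode character '\xe7' in position 318: illegal multibyte sequence
The text being written contains characters that the system's default file encoding (gbk here) cannot represent. A first fix is to pass the encoding explicitly: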
    import requests

    response = requests.get('https://www.baidu.com')
    with open('baidu.html', 'w+', encoding='utf-8') as f:
        f.write(response.text)
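After this change the error is gone, but opening the saved HTML file in a browser shows garbled text. The root cause is still a character-set problem. The browser has its own default encoding for opening files, and the HTML returned in the response declares its own charset (Baidu's page uses UTF-8). A likely explanation is that response.text was decoded with an encoding that requests guessed and that differs from the page's real one. Encoding itself is not the main topic of this article, so here are two solutions based on how open is used: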
    import requests

    response = requests.get('https://www.baidu.com')

    with open('baidu.html', 'w+', encoding='utf-8') as f:
        # response.content is bytes; decode it to str before writing in text mode
        f.write(response.content.decode('utf-8'))

    # Write the raw bytes to the file in binary mode
    with open('baiduBytes.html', 'wb+') as f:
        f.write(response.content)
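The failure and both fixes can be reproduced without network access. In this sketch the bytes in content are made-up sample data standing in for response.content, and the file names are hypothetical.

```python
import os
import tempfile

# Stand-in for response.content: UTF-8 bytes of a tiny HTML page (made-up sample data).
html = "<html><head><meta charset='utf-8'></head><body>百度 \xe7</body></html>"
content = html.encode("utf-8")

tmp_dir = tempfile.mkdtemp()
gbk_path = os.path.join(tmp_dir, "page_gbk.html")
text_path = os.path.join(tmp_dir, "page_text.html")
bytes_path = os.path.join(tmp_dir, "page_bytes.html")

# Writing text through a gbk-encoded file fails for characters gbk cannot represent.
try:
    with open(gbk_path, "w", encoding="gbk") as f:
        f.write(content.decode("utf-8"))
    gbk_error = None
except UnicodeEncodeError as exc:
    gbk_error = exc

# Option 1: decode the bytes yourself, then write text with an explicit encoding.
with open(text_path, "w", encoding="utf-8") as f:
    f.write(content.decode("utf-8"))

# Option 2: write the raw bytes in binary mode; no encoding argument is involved.
with open(bytes_path, "wb") as f:
    f.write(content)

# Both options should leave identical bytes on disk.
with open(text_path, "rb") as f1, open(bytes_path, "rb") as f2:
    same_bytes = f1.read() == f2.read()

print(type(gbk_error).__name__, same_bytes)
```

Binary mode is the simpler choice when the data only needs to be stored unchanged, since no decoding or re-encoding step can go wrong.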
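Opening and closing files is usually handled with a context manager: the with statement.
If you are interested, see the blog post about with that I wrote on this site.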
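As a rough illustration of what with does for a file object, the sketch below compares it with an explicit try/finally. The file name is hypothetical, and the second form is only an approximation of the context-manager protocol.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical file name

# Form 1: with closes the file when the block exits, even if an exception is raised.
with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")
closed_after_with = f.closed

# Form 2: approximately the same guarantee, written out by hand.
g = open(path, "r", encoding="utf-8")
try:
    text = g.read()
finally:
    g.close()

print(closed_after_with, g.closed, repr(text))
```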
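- About the return value: the file object
open returns a file object. It is an iterator, so its contents can be traversed with a for ... in statement or stepped through with the next() function.
Prepare a file named flydb.json with one record per line (the three lines visible in the output below), then read it: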
    with open('flydb.json', 'r+') as f:
        for line in f:
            print(repr(line))

Output:
    '{"name":"marsen", "age":18},\n'
    '{"name":"marsen", "age":17},\n'
    '{"name":"marsen", "age":16},\n'

Each line (a str) still ends with the newline character \n, so calling print(line) directly leaves a blank line between lines.

    with open('flydb.json', 'r+') as f:
        lines = f.readlines()
        print(lines)

Output:
    ['{"name":"marsen", "age":18},\n', '{"name":"marsen", "age":17},\n', '{"name":"marsen", "age":16},\n']
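Because every line carries its trailing \n, it is usually stripped before further processing. The sketch below recreates the sample file in a temporary directory and, as an assumption about how the records might be used, also drops the trailing comma so each line parses as JSON.

```python
import json
import os
import tempfile

# Recreate the sample file from the output above in a temporary directory.
sample = (
    '{"name":"marsen", "age":18},\n'
    '{"name":"marsen", "age":17},\n'
    '{"name":"marsen", "age":16},\n'
)
path = os.path.join(tempfile.mkdtemp(), "flydb.json")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

records = []
with open(path, "r", encoding="utf-8") as f:
    for line in f:
        # Drop the newline first, then the trailing comma (an assumption about the format).
        cleaned = line.rstrip("\n").rstrip(",")
        if cleaned:
            records.append(json.loads(cleaned))

ages = [record["age"] for record in records]
print(ages)
```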
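In most cases a plain for loop is all that is needed. Sometimes, though, development work calls for finer control over iteration, and then it helps to understand the iteration mechanism in more detail. The following uses the file object to look at the basics of iteration: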
    # Traversing with the next() function
    with open('flydb.json', 'r+') as f:
        print(next(f))
        print(next(f))
        print(next(f))

Output (print adds its own newline after the one already in each line, so a blank line follows each record):
    {"name":"marsen", "age":18},

    {"name":"marsen", "age":17},

    {"name":"marsen", "age":16},

Note that f is an iterator: once the last line has been consumed, one more call to next(f) raises StopIteration, which signals the end of the iteration. The code can be revised in either of the following two ways.

Catch the exception:
    with open('flydb.json', 'r+') as f:
        try:
            while True:
                line = next(f)
                print(line, end='')
        except StopIteration:
            pass

Output:
    {"name":"marsen", "age":18},
    {"name":"marsen", "age":17},
    {"name":"marsen", "age":16},

Use the second argument of next(). When the iterator is exhausted, next() returns this default value instead of raising, so a value of our own choosing can mark the end:
    with open('flydb.json', 'r+') as f:
        while True:
            line = next(f, None)
            if line is None:
                break
            print(line, end='')

Output:
    {"name":"marsen", "age":18},
    {"name":"marsen", "age":17},
    {"name":"marsen", "age":16},
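One place where this finer control is handy is consuming a header line with next() and then letting a for loop continue from the second line. The CSV-like sample data below is made up for the illustration.

```python
import os
import tempfile

# Made-up sample data: one header line followed by two records.
path = os.path.join(tempfile.mkdtemp(), "people.csv")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write("name,age\nmarsen,18\nmarsen,17\n")

with open(path, "r", encoding="utf-8") as f:
    header = next(f, None)  # consume the first line; None if the file is empty
    # The loop resumes where next() stopped, i.e. at the second line.
    rows = [line.rstrip("\n").split(",") for line in f]
    sentinel = next(f, "EOF")  # the iterator is exhausted, so the default comes back

print(repr(header), rows, sentinel)
```

The for loop and next() share the same position in the file, which is why the header is not seen again by the loop.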
Attached below is the detailed API documentation of the open function:
    def open(file, mode='r', buffering=None, encoding=None, errors=None, newline=None, closefd=True): # known special case of open
        """
        Open file and return a stream.  Raise IOError upon failure.

        file is either a text or byte string giving the name (and the path
        if the file isn't in the current working directory) of the file to
        be opened or an integer file descriptor of the file to be
        wrapped. (If a file descriptor is given, it is closed when the
        returned I/O object is closed, unless closefd is set to False.)

        mode is an optional string that specifies the mode in which the file
        is opened. It defaults to 'r' which means open for reading in text
        mode.  Other common values are 'w' for writing (truncating the file if
        it already exists), 'x' for creating and writing to a new file, and
        'a' for appending (which on some Unix systems, means that all writes
        append to the end of the file regardless of the current seek position).
        In text mode, if encoding is not specified the encoding used is platform
        dependent: locale.getpreferredencoding(False) is called to get the
        current locale encoding. (For reading and writing raw bytes use binary
        mode and leave encoding unspecified.) The available modes are:

        ========= ===============================================================
        Character Meaning
        --------- ---------------------------------------------------------------
        'r'       open for reading (default)
        'w'       open for writing, truncating the file first
        'x'       create a new file and open it for writing
        'a'       open for writing, appending to the end of the file if it exists
        'b'       binary mode
        't'       text mode (default)
        '+'       open a disk file for updating (reading and writing)
        'U'       universal newline mode (deprecated)
        ========= ===============================================================

        The default mode is 'rt' (open for reading text). For binary random
        access, the mode 'w+b' opens and truncates the file to 0 bytes, while
        'r+b' opens the file without truncation. The 'x' mode implies 'w' and
        raises an `FileExistsError` if the file already exists.

        Python distinguishes between files opened in binary and text modes,
        even when the underlying operating system doesn't. Files opened in
        binary mode (appending 'b' to the mode argument) return contents as
        bytes objects without any decoding. In text mode (the default, or when
        't' is appended to the mode argument), the contents of the file are
        returned as strings, the bytes having been first decoded using a
        platform-dependent encoding or using the specified encoding if given.

        'U' mode is deprecated and will raise an exception in future versions
        of Python.  It has no effect in Python 3.  Use newline to control
        universal newlines mode.

        buffering is an optional integer used to set the buffering policy.
        Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
        line buffering (only usable in text mode), and an integer > 1 to indicate
        the size of a fixed-size chunk buffer.  When no buffering argument is
        given, the default buffering policy works as follows:

        * Binary files are buffered in fixed-size chunks; the size of the buffer
          is chosen using a heuristic trying to determine the underlying device's
          "block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
          On many systems, the buffer will typically be 4096 or 8192 bytes long.

        * "Interactive" text files (files for which isatty() returns True)
          use line buffering.  Other text files use the policy described above
          for binary files.

        encoding is the name of the encoding used to decode or encode the
        file. This should only be used in text mode. The default encoding is
        platform dependent, but any encoding supported by Python can be
        passed.  See the codecs module for the list of supported encodings.

        errors is an optional string that specifies how encoding errors are to
        be handled---this argument should not be used in binary mode. Pass
        'strict' to raise a ValueError exception if there is an encoding error
        (the default of None has the same effect), or pass 'ignore' to ignore
        errors. (Note that ignoring encoding errors can lead to data loss.)
        See the documentation for codecs.register or run 'help(codecs.Codec)'
        for a list of the permitted encoding error strings.

        newline controls how universal newlines works (it only applies to text
        mode). It can be None, '', '\n', '\r', and '\r\n'.  It works as
        follows:

        * On input, if newline is None, universal newlines mode is
          enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
          these are translated into '\n' before being returned to the
          caller. If it is '', universal newline mode is enabled, but line
          endings are returned to the caller untranslated. If it has any of
          the other legal values, input lines are only terminated by the given
          string, and the line ending is returned to the caller untranslated.

        * On output, if newline is None, any '\n' characters written are
          translated to the system default line separator, os.linesep. If
          newline is '' or '\n', no translation takes place. If newline is any
          of the other legal values, any '\n' characters written are translated
          to the given string.

        If closefd is False, the underlying file descriptor will be kept open
        when the file is closed. This does not work when a file name is given
        and must be True in that case.

        A custom opener can be used by passing a callable as *opener*. The
        underlying file descriptor for the file object is then obtained by
        calling *opener* with (*file*, *flags*). *opener* must return an open
        file descriptor (passing os.open as *opener* results in functionality
        similar to passing None).

        open() returns a file object whose type depends on the mode, and
        through which the standard file operations such as reading and writing
        are performed. When open() is used to open a file in a text mode ('w',
        'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
        a file in a binary mode, the returned class varies: in read binary
        mode, it returns a BufferedReader; in write binary and append binary
        modes, it returns a BufferedWriter, and in read/write mode, it returns
        a BufferedRandom.

        It is also possible to use a string or bytearray as a file for both
        reading and writing. For strings StringIO can be used like a file
        opened in a text mode, and for bytes a BytesIO can be used like a file
        opened in a binary mode.
        """
        pass
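The last part of the documentation, on how the returned type depends on the mode, can be checked with a short sketch. The file name is hypothetical and the file lives in a temporary directory.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "types_demo.bin")  # hypothetical file name

with open(path, "wb") as f:  # write binary
    write_type = type(f)
    f.write(b"abc")

with open(path, "r", encoding="utf-8") as f:  # text mode
    text_type = type(f)

with open(path, "rb") as f:  # read binary
    read_type = type(f)

with open(path, "r+b") as f:  # binary read/write
    update_type = type(f)

with open(path, "rb", buffering=0) as f:  # buffering switched off (binary only)
    raw_type = type(f)

for name, cls in [("wb", write_type), ("r", text_type), ("rb", read_type),
                  ("r+b", update_type), ("rb, buffering=0", raw_type)]:
    print(name, cls.__name__)
```

With buffering=0 the raw io.FileIO object is returned directly, which is why that setting is only allowed in binary mode.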