
Common Python File-Handling APIs (Using the open Function)

2019-03-13 21:24

Opening a file and getting a file object

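fp = open(file, mode, encoding=encoding)
# file: path of the file to operate on; take care when joining directories
# mode: how the file is opened
# encoding: the text encoding; pass it as a keyword argument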
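The third positional parameter of open() is actually buffering, not encoding, so passing the encoding positionally does not work. Here is a small sketch of the difference (the temporary file is only for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Correct: encoding is passed by keyword
with open(path, "w", encoding="utf-8") as f:
    f.write("hello")

# Wrong: "utf-8" lands in the buffering slot, which must be an int
try:
    open(path, "r", "utf-8")
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None

print(type(positional_error).__name__)
```

You can avoid this kind of mix-up by passing every optional argument (encoding=..., errors=..., newline=...) by keyword.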
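  • About the first parameter, file:
    It is the path of the file. A relative path is resolved against the current working directory, not the script's own directory. In real projects, people often forget to join the directory onto the file name and get errors.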

For path handling, see os.path.
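A common pattern is to build an absolute path with os.path.join and os.path.abspath so that the result does not depend on the working directory. The sketch below assumes a hypothetical data/flydb.json layout. A temporary directory stands in for the script's directory, which in a real script would usually be os.path.dirname(os.path.abspath(__file__)):

```python
import os
import tempfile

def resolve(base_dir, *parts):
    """Join parts onto base_dir and return an absolute path."""
    return os.path.abspath(os.path.join(base_dir, *parts))

# Stand-in for the script's directory (hypothetical layout: data/flydb.json)
base_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(base_dir, "data"))
target = resolve(base_dir, "data", "flydb.json")

with open(target, "w", encoding="utf-8") as f:
    f.write('{"name":"marsen", "age":18},\n')

# Changing the working directory does not affect an absolute path
old_cwd = os.getcwd()
os.chdir(tempfile.gettempdir())
try:
    with open(target, encoding="utf-8") as f:
        first_line = f.readline()
finally:
    os.chdir(old_cwd)  # restore the original working directory
```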

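  • About the second parameter, mode:
    It selects read, write, append or exclusive-create access, text or binary mode, and whether the file is opened for updating. The individual characters are listed in the mode table of the open() docstring at the end of this post.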

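  • About the third parameter, encoding:
    It specifies the text encoding. It is often left out, but Python does not then detect the file's encoding. Instead, it falls back to the platform default returned by locale.getpreferredencoding(False).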

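    Here is a case I ran into that illustrates the problem.
    The default encoding differs between operating systems, so watch out for it. On a Chinese-locale Windows system, new text files default to GBK, while on most Linux systems the default is UTF-8.
    If the data you write needs a specific encoding, you have to say so. For example, I used the requests library to fetch a web page and saved the response to a file. Writing the file raised an error, and because the response is HTML, the saved page could also show garbled text when opened in a browser. In that case you need the encoding parameter.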

import requests

response = requests.get('https://www.baidu.com')

# No encoding given: the platform default (GBK on a Chinese-locale Windows) is used
with open('baidu.html', 'w+') as f:
    f.write(response.text)

Running it raises the following error:

UnicodeEncodeError: 'gbk' codec can't encode character '\xe7' in position 318: illegal multibyte sequence

The text contains a character that the default file encoding (GBK here) cannot represent. The first fix passes an explicit encoding:

import requests

response = requests.get('https://www.baidu.com')

with open('baidu.html', 'w+', encoding='utf-8') as f:
    f.write(response.text)
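You can reproduce the failure and the fix without network access. This sketch forces encoding='gbk' to imitate a Chinese-locale Windows default; the real default comes from the locale, so this is an assumption. It writes an emoji, which GBK cannot represent:

```python
import locale
import os
import tempfile

print(locale.getpreferredencoding(False))  # default text encoding on this machine

text = "百度一下 \U0001F600"  # Chinese text plus an emoji that GBK cannot encode
path = os.path.join(tempfile.mkdtemp(), "page.html")

# Imitate the GBK default: the write fails on the emoji
try:
    with open(path, "w", encoding="gbk") as f:
        f.write(text)
except UnicodeEncodeError as exc:
    gbk_error = exc
else:
    gbk_error = None

# UTF-8 can encode every Unicode character, so this succeeds
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```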

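After this change the error is gone, but opening the saved HTML file in a browser may still show garbled text. The root cause is a character-set mismatch, and there are two places to look:
  • Browsers have their own default encoding for local files, so check that if you see garbled text.
  • requests decodes response.text using the charset declared in the response headers, and falls back to ISO-8859-1 for text/* responses that declare none. This happens even though the Baidu page itself uses UTF-8.
Encoding in general is not the topic of this post, so here are two ways to solve the problem with open():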

import requests

response = requests.get('https://www.baidu.com')

with open('baidu.html', 'w+', encoding='utf-8') as f:
    # response.content is bytes; decode it as UTF-8 to get a str
    f.write(response.content.decode('utf-8'))

# Write the raw bytes unchanged in binary mode
with open('baiduBytes.html', 'wb+') as f:
    f.write(response.content)
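You can check both approaches offline. In this sketch a hand-made byte string stands in for response.content, and decoding it as ISO-8859-1 imitates a response.text decoded with the wrong charset. Both saved files end up holding the original UTF-8 bytes:

```python
import os
import tempfile

# Stand-in for response.content: the raw UTF-8 bytes of a page
raw = '<meta charset="utf-8"><title>百度一下</title>'.encode("utf-8")

# Stand-in for a response.text decoded with the wrong charset
wrong_text = raw.decode("iso-8859-1")

out_dir = tempfile.mkdtemp()

# Approach 1: decode the bytes as UTF-8, then write text with encoding='utf-8'
with open(os.path.join(out_dir, "baidu.html"), "w", encoding="utf-8") as f:
    f.write(raw.decode("utf-8"))

# Approach 2: write the bytes unchanged in binary mode
with open(os.path.join(out_dir, "baiduBytes.html"), "wb") as f:
    f.write(raw)

# Read both files back as bytes to compare them with the original
with open(os.path.join(out_dir, "baidu.html"), "rb") as f:
    saved_text_bytes = f.read()
with open(os.path.join(out_dir, "baiduBytes.html"), "rb") as f:
    saved_raw_bytes = f.read()
```

With requests itself, setting response.encoding = 'utf-8' before reading response.text should have a similar effect to decoding response.content by hand.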

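Files are usually opened and closed with a context manager, the with statement, which closes the file automatically when the block ends.
If you are interested, see my post about with on this site.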
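Here is a minimal sketch of what the with statement guarantees, next to a roughly equivalent try/finally version. The file name is hypothetical:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# The file is closed as soon as the with block exits
with open(path, "w", encoding="utf-8") as f:
    f.write("line 1\n")
closed_after_with = f.closed

# It is also closed when the block exits because of an exception
try:
    with open(path, encoding="utf-8") as f3:
        raise RuntimeError("boom")
except RuntimeError:
    pass

# Roughly what with does for you, written out by hand
f2 = open(path, encoding="utf-8")
try:
    content = f2.read()
finally:
    f2.close()
```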

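  • About the return value, a file object:
    open() returns a file object. It is an iterator, so you can loop over it line by line with for ... in, or advance it with next().
    Prepare a file named flydb.json; its three lines appear in the output below: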
with open('flydb.json', 'r+') as f:
    for line in f:
        print(repr(line))

Output:
'{"name":"marsen", "age":18},\n'
'{"name":"marsen", "age":17},\n'
'{"name":"marsen", "age":16},\n'

# Each line (a str) ends with \n; print(line) adds its own newline, so printing lines directly leaves a blank line between them.
with open('flydb.json', 'r+') as f:
    lines = f.readlines()
    print(lines)
Output:
['{"name":"marsen", "age":18},\n', '{"name":"marsen", "age":17},\n', '{"name":"marsen", "age":16},\n']
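Each line of flydb.json holds one JSON object followed by a comma, so the file as a whole is not valid JSON. The sketch below recreates the sample file in a temporary directory. It then parses the file line by line, stripping the newline and the trailing comma first:

```python
import json
import os
import tempfile

# Recreate the sample file shown above
path = os.path.join(tempfile.mkdtemp(), "flydb.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"name":"marsen", "age":18},\n'
            '{"name":"marsen", "age":17},\n'
            '{"name":"marsen", "age":16},\n')

records = []
with open(path, encoding="utf-8") as f:
    for line in f:
        line = line.rstrip("\n").rstrip(",")  # drop the newline, then the trailing comma
        if line:                              # skip blank lines
            records.append(json.loads(line))

print([r["age"] for r in records])  # [18, 17, 16]
```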

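In most cases a plain for loop is all you need. Sometimes, though, you need finer control over iteration, and then it helps to understand how iteration works in more detail. Using the file object as an example, here are the basics: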
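The key detail is that a file object is its own iterator: iter(f) returns f itself, so next() and a for loop share the same position in the file. Here is a sketch with a throwaway file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\nthird\n")

with open(path, encoding="utf-8") as f:
    is_own_iterator = iter(f) is f  # iter() on a file returns the file itself
    first = next(f)                 # consumes "first\n"
    rest = list(f)                  # iteration resumes after the consumed line
```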

# Step through the file with next()
with open('flydb.json', 'r+') as f:
    print(next(f))
    print(next(f))
    print(next(f))
Output:
{"name":"marsen", "age":18},

{"name":"marsen", "age":17},

{"name":"marsen", "age":16},

Note that f is an iterator, so calling next() after the last line raises StopIteration, which is how an iterator signals that it is exhausted. There are two ways to handle this:

with open('flydb.json', 'r+') as f:
    try:
        while True:
            line = next(f)
            print(line, end='')
    except StopIteration:
        pass
Output:
{"name":"marsen", "age":18},
{"name":"marsen", "age":17},
{"name":"marsen", "age":16},

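The second way uses next()'s second argument. Once the iterator is exhausted, next() returns this default instead of raising, so we can pass a sentinel value of our own to mark the end: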
with open('flydb.json', 'r+') as f:
    while True:
        line = next(f, None)
        if line is None:
            break
        print(line, end='')
Output:
{"name":"marsen", "age":18},
{"name":"marsen", "age":17},
{"name":"marsen", "age":16},
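The standard library offers two related tools for this kind of control:
  • iter(callable, sentinel) keeps calling f.readline until it returns the empty string, which readline does only at end of file.
  • itertools.islice takes just the first n lines.
Here is a sketch on a recreated copy of the sample file:

```python
import itertools
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "flydb.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"name":"marsen", "age":18},\n'
            '{"name":"marsen", "age":17},\n'
            '{"name":"marsen", "age":16},\n')

# readline() returns '' only at end of file, so '' works as the sentinel
with open(path, encoding="utf-8") as f:
    all_lines = [line for line in iter(f.readline, "")]

# Take only the first two lines
with open(path, encoding="utf-8") as f:
    first_two = list(itertools.islice(f, 2))
```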

For reference, here is the full docstring of open():

def open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None):
    """
    Open file and return a stream.  Raise OSError upon failure.

file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file to be
wrapped. (If a file descriptor is given, it is closed when the
returned I/O object is closed, unless closefd is set to False.)

mode is an optional string that specifies the mode in which the file
is opened. It defaults to 'r' which means open for reading in text
mode.  Other common values are 'w' for writing (truncating the file if
it already exists), 'x' for creating and writing to a new file, and
'a' for appending (which on some Unix systems, means that all writes
append to the end of the file regardless of the current seek position).
In text mode, if encoding is not specified the encoding used is platform
dependent: locale.getpreferredencoding(False) is called to get the
current locale encoding. (For reading and writing raw bytes use binary
mode and leave encoding unspecified.) The available modes are:

========= ===============================================================
Character Meaning
--------- ---------------------------------------------------------------
'r'       open for reading (default)
'w'       open for writing, truncating the file first
'x'       create a new file and open it for writing
'a'       open for writing, appending to the end of the file if it exists
'b'       binary mode
't'       text mode (default)
'+'       open a disk file for updating (reading and writing)
'U'       universal newline mode (deprecated)
========= ===============================================================

The default mode is 'rt' (open for reading text). For binary random
access, the mode 'w+b' opens and truncates the file to 0 bytes, while
'r+b' opens the file without truncation. The 'x' mode implies 'w' and
raises an `FileExistsError` if the file already exists.

Python distinguishes between files opened in binary and text modes,
even when the underlying operating system doesn't. Files opened in
binary mode (appending 'b' to the mode argument) return contents as
bytes objects without any decoding. In text mode (the default, or when
't' is appended to the mode argument), the contents of the file are
returned as strings, the bytes having been first decoded using a
platform-dependent encoding or using the specified encoding if given.

'U' mode is deprecated and will raise an exception in future versions
of Python.  It has no effect in Python 3.  Use newline to control
universal newlines mode.

buffering is an optional integer used to set the buffering policy.
Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
line buffering (only usable in text mode), and an integer > 1 to indicate
the size of a fixed-size chunk buffer.  When no buffering argument is
given, the default buffering policy works as follows:

* Binary files are buffered in fixed-size chunks; the size of the buffer
is chosen using a heuristic trying to determine the underlying device's
"block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
On many systems, the buffer will typically be 4096 or 8192 bytes long.

* "Interactive" text files (files for which isatty() returns True)
use line buffering.  Other text files use the policy described above
for binary files.

encoding is the name of the encoding used to decode or encode the
file. This should only be used in text mode. The default encoding is
platform dependent, but any encoding supported by Python can be
passed.  See the codecs module for the list of supported encodings.

errors is an optional string that specifies how encoding errors are to
be handled---this argument should not be used in binary mode. Pass
'strict' to raise a ValueError exception if there is an encoding error
(the default of None has the same effect), or pass 'ignore' to ignore
errors. (Note that ignoring encoding errors can lead to data loss.)
See the documentation for codecs.register or run 'help(codecs.Codec)'
for a list of the permitted encoding error strings.

newline controls how universal newlines works (it only applies to text
mode). It can be None, '', '\n', '\r', and '\r\n'.  It works as
follows:

* On input, if newline is None, universal newlines mode is
enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
these are translated into '\n' before being returned to the
caller. If it is '', universal newline mode is enabled, but line
endings are returned to the caller untranslated. If it has any of
the other legal values, input lines are only terminated by the given
string, and the line ending is returned to the caller untranslated.

* On output, if newline is None, any '\n' characters written are
translated to the system default line separator, os.linesep. If
newline is '' or '\n', no translation takes place. If newline is any
of the other legal values, any '\n' characters written are translated
to the given string.

If closefd is False, the underlying file descriptor will be kept open
when the file is closed. This does not work when a file name is given
and must be True in that case.

A custom opener can be used by passing a callable as *opener*. The
underlying file descriptor for the file object is then obtained by
calling *opener* with (*file*, *flags*). *opener* must return an open
file descriptor (passing os.open as *opener* results in functionality
similar to passing None).

open() returns a file object whose type depends on the mode, and
through which the standard file operations such as reading and writing
are performed. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
a file in a binary mode, the returned class varies: in read binary
mode, it returns a BufferedReader; in write binary and append binary
modes, it returns a BufferedWriter, and in read/write mode, it returns
a BufferedRandom.

It is also possible to use a string or bytearray as a file for both
reading and writing. For strings StringIO can be used like a file
opened in a text mode, and for bytes a BytesIO can be used like a file
opened in a binary mode.
    """
    pass
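To make the mode table concrete, here is a sketch that tries 'w', 'a', 'r+' and 'x' on a temporary file. It also checks the object types described at the end of the docstring:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w", encoding="utf-8") as f:   # 'w': create, or truncate if it exists
    f.write("one\n")
with open(path, "a", encoding="utf-8") as f:   # 'a': append to the end
    f.write("two\n")
with open(path, "r+", encoding="utf-8") as f:  # 'r+': read/write, no truncation
    after_append = f.read()

try:
    open(path, "x")                             # 'x': fails because the file exists
except FileExistsError:
    x_failed = True
else:
    x_failed = False

# The returned class depends on the mode
with open(path, "r", encoding="utf-8") as f:
    text_type = type(f)
with open(path, "rb") as f:
    read_binary_type = type(f)
with open(path, "rb+") as f:
    random_type = type(f)
```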