Code share: using md5sum in Python to find identical files in a directory
2015-02-02 00:00
"""This module contains code from
Think Python by Allen B. Downey
http://thinkpython.com

Copyright 2012 Allen B. Downey
License: GNU GPLv3 http://www.gnu.org/licenses/gpl.html

"""

import os


def walk(dirname):
    """Finds the names of all files in dirname and its subdirectories.

    dirname: string name of directory
    """
    names = []
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):
            names.append(path)
        else:
            names.extend(walk(path))
    return names


def compute_checksum(filename):
    """Computes the MD5 checksum of the contents of a file.

    filename: string
    """
    cmd = 'md5sum ' + filename
    return pipe(cmd)


def check_diff(name1, name2):
    """Computes the difference between the contents of two files.

    name1, name2: string filenames
    """
    cmd = 'diff %s %s' % (name1, name2)
    return pipe(cmd)


def pipe(cmd):
    """Runs a command in a subprocess.

    cmd: string Unix command

    Returns (res, stat), the output of the subprocess and the exit status.
    """
    fp = os.popen(cmd)
    res = fp.read()
    stat = fp.close()
    assert stat is None
    return res, stat


def compute_checksums(dirname, suffix):
    """Computes checksums for all files with the given suffix.

    dirname: string name of directory to search
    suffix: string suffix to match

    Returns: map from checksum to list of files with that checksum
    """
    names = walk(dirname)

    d = {}
    for name in names:
        if name.endswith(suffix):
            res, stat = compute_checksum(name)
            checksum, _ = res.split()

            if checksum in d:
                d[checksum].append(name)
            else:
                d[checksum] = [name]

    return d


def check_pairs(names):
    """Checks whether any in a list of files differs from the others.

    names: list of string filenames
    """
    for name1 in names:
        for name2 in names:
            if name1 < name2:
                res, stat = check_diff(name1, name2)
                if res:
                    return False
    return True


def print_duplicates(d):
    """Checks for duplicate files.

    Reports any files with the same checksum and checks whether they
    are, in fact, identical.

    d: map from checksum to list of files with that checksum
    """
    # items() and print() work on both Python 2 and Python 3
    for key, names in d.items():
        if len(names) > 1:
            print('The following files have the same checksum:')
            for name in names:
                print(name)

            if check_pairs(names):
                print('And they are identical.')


if __name__ == '__main__':
    d = compute_checksums(dirname='.', suffix='.py')
    print_duplicates(d)
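The listing above shells out to the Unix commands md5sum and diff, so it needs both on the PATH and will trip over filenames that contain spaces. As a rough sketch of the same idea using only the standard library, the version below hashes files with hashlib and compares them with filecmp. The function names md5_of_file, find_duplicates and all_identical are made up for this illustration and are not part of the Think Python code.

```python
import filecmp
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as fp:
        # iter() with a sentinel keeps reading until read() returns b''
        for chunk in iter(lambda: fp.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(dirname, suffix=''):
    """Map each shared checksum to the sorted list of files that have it."""
    by_checksum = {}
    for root, _dirs, files in os.walk(dirname):
        for name in files:
            if name.endswith(suffix):
                path = os.path.join(root, name)
                by_checksum.setdefault(md5_of_file(path), []).append(path)
    # Keep only checksums shared by more than one file.
    return {k: sorted(v) for k, v in by_checksum.items() if len(v) > 1}


def all_identical(paths):
    """Byte-compare every file against the first one (stands in for diff)."""
    first = paths[0]
    return all(filecmp.cmp(first, other, shallow=False)
               for other in paths[1:])


if __name__ == '__main__':
    for checksum, paths in find_duplicates('.', '.py').items():
        print('The following files have the same checksum:')
        for path in paths:
            print(path)
        if all_identical(paths):
            print('And they are identical.')
```

Reading in chunks keeps memory use flat for large files, and comparing each file against the first needs fewer comparisons than the pairwise loop in check_pairs while giving the same answer, since equality is transitive.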