A Python File De-duplication Tool
2016-01-15 17:46
All it takes is a quick look at how to traverse files with os.walk() and how to compute a file's MD5, followed by a few small modifications.
[code]
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import argparse
import hashlib
import os
import sys
import tempfile


def report(line):
    # Echo one scan-result line to stdout and to the output file.
    print(line)
    output_fifo.write(line + "\n")


def print_dedu():
    # List every group of files that share the same MD5.
    for file_md5, filelist in dedu_dict.items():
        if len(filelist) == 1:
            continue
        report("md5:{}".format(file_md5))
        for filename in filelist:
            report(filename)
        report("")


def exec_dedu():
    # Keep one file per group (the last one found) and delete the rest.
    for file_md5, filelist in dedu_dict.items():
        if len(filelist) == 1:
            continue
        print("md5:{}".format(file_md5))
        filelist.pop()
        for filename in filelist:
            print("rm {}".format(filename))
            os.remove(filename)
        print("")


parser = argparse.ArgumentParser(description="This is a de-duplicate tool")
# nargs="?" makes the positional argument optional so the default applies.
parser.add_argument("dir", nargs="?", default=".", help="target directory")
parser.add_argument("-s", "--safe", "--scan", action="store_true",
                    dest="not_delete",
                    help="scan directory only, don't delete files")
parser.add_argument("-o", "--output", type=argparse.FileType("w"),
                    default=None, help="output of scan result")
args = parser.parse_args()

print("[INFO]dir:{}".format(args.dir))
if args.not_delete:
    print("[INFO]we are in safe mode.")
not_delete = args.not_delete

# Write the scan result to the given file, or to a temporary log file.
if args.output:
    output_fifo = args.output
    output_filename = args.output.name
else:
    fd, output_filename = tempfile.mkstemp(prefix="dedu-", suffix=".log")
    output_fifo = os.fdopen(fd, "w")
print("[INFO]output filename:{}".format(output_filename))

# check parameter
if not os.path.isdir(args.dir):
    print("dir {} does not exist!".format(args.dir))
    sys.exit(1)
target_dir = args.dir


# let's traverse the directory
def md5(fname):
    """
    from http://stackoverflow.com/questions/3431825/generating-a-md5-checksum-of-a-file
    """
    digest = hashlib.md5()
    with open(fname, "rb") as f:
        # Read in 4 KB chunks so large files are not loaded into memory.
        for chunk in iter(lambda: f.read(4096), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Map each MD5 digest to the list of files that have it.
dedu_dict = {}
for dirpath, subdirList, subfileList in os.walk(target_dir):
    for filename in subfileList:
        full_filename = os.path.join(dirpath, filename)
        file_md5 = md5(full_filename)
        if file_md5 not in dedu_dict:
            dedu_dict[file_md5] = []
        dedu_dict[file_md5].append(full_filename)

print_dedu()
output_fifo.close()
if not not_delete:
    exec_dedu()
[/code]
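The grouping step at the heart of the script can also be written as a small importable function, which makes it easier to try out on a test directory before deleting anything. The following is only a minimal sketch under that assumption: the names `file_md5` and `find_duplicates` are hypothetical and do not appear in the original script, and nothing here removes files.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=4096):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each MD5 digest to the sorted paths under root that share it.

    Only digests shared by two or more files are returned.
    (Hypothetical helper, not part of the original script.)
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_md5(path)].append(path)
    return {md5: sorted(paths) for md5, paths in groups.items()
            if len(paths) > 1}
```

Calling `find_duplicates(".")` returns a dictionary that can be inspected or printed. Deleting all but one path per group would then be a separate, explicit step, which is roughly what the `-s` safe mode of the script is for.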