Python scripting: batch-comparing text files and finding differences by specific fields
2017-04-28 15:39
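Sometimes you run into a requirement like this: there are two files, each holding tabular data, with some fields in common and some not, and you need to compare them, find the records that differ, and extract those records.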
![](https://oscdn.geek-share.com/Uploads/Images/Content/201704/1c1d58669fb90fb800d248d87e05a31c)
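File examples
Format of each record in file A: 03090000 00049993 9222100502392220106000000020000029000170124500019054 20170124 12:30:01622908347435512917 00049996
Format of each record in file B:
01006530 00096900 000480 0124174505 6228480478369552177 000000004066 000000000000 00000000000 0200 000000 5411 00000021 100504754110404 003081009289 00 000000 01030000 000000 00 071 000000000005 000000000000 D00000000001 1 000 6 0 0124174510 01030000 0 03 00000000000 00010111001
File A contains a number of records, and so does file B. Some records in file B carry an index number that does not appear in file A, and those are the records we need to find. For example, the field 0124174510 corresponds to the last 12 characters of the field 9222100502392220106000000020000029000170124500019054 in file A. The missing records are matched in batch by slicing these strings.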
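To make the slicing concrete, here is a small sketch that applies it to the two sample records shown above. The field positions (the third whitespace-separated field of an A record, and fields 12 and 13, counting from zero, of a B record) follow the script in this post. The helper names `parse_a_record` and `parse_b_record` are made up for illustration.

```python
# Sample records copied from the file examples in this post.
A_RECORD = ("03090000 00049993 "
            "9222100502392220106000000020000029000170124500019054 "
            "20170124 12:30:01622908347435512917 00049996")
B_RECORD = ("01006530 00096900 000480 0124174505 6228480478369552177 "
            "000000004066 000000000000 00000000000 0200 000000 5411 "
            "00000021 100504754110404 003081009289 00 000000 01030000 "
            "000000 00 071 000000000005 000000000000 D00000000001 1 000 "
            "6 0 0124174510 01030000 0 03 00000000000 00010111001")


def parse_a_record(line):
    """Return (merchant_id, index) from an A-file record (hypothetical helper)."""
    combined = line.split()[2]
    # characters 4..18 hold the merchant ID, the last 12 hold the index
    return combined[4:19], combined[-12:]


def parse_b_record(line):
    """Return (merchant_id, index) from a B-file record (hypothetical helper)."""
    fields = line.split()
    return fields[12], fields[13]


a_merchant, a_index = parse_a_record(A_RECORD)
b_merchant, b_index = parse_b_record(B_RECORD)
print(a_merchant, a_index)
print(b_merchant, b_index)
```

Note that the sample B record belongs to a different merchant than the sample A record, so the script would skip it when filtering by merchant ID.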
Code

    import os

    # dates to be compared
    dateArr = ["170124", "170125", "170130", "170206", "170211", "170228",
               "170304", "170309", "170314", "170321", "170325"]

    # local paths: source data and comparison results
    src_dir = "./src_data"
    res_dir = "./res_data"

    # the exact merchant ID to be concerned
    gMchtId = "100502392220106"

    # make sure the result folder exists before writing to it
    if not os.path.isdir(res_dir):
        os.makedirs(res_dir)

    # read files and compare, then write the missing records
    print("start to compare file...")
    for dateStr in dateArr:
        print("comparing " + dateStr + " files")
        mic_file_name = "M_IC" + dateStr + "OTRAD100502392220106"
        acom_file_name = "no_chongzhengIND" + dateStr + "01ACOM"

        # index keys found in the mic file (file A) for this date
        micIndexSet = set()

        # read mic file and create index keys
        print("reading " + dateStr + " mic file")
        with open(os.path.join(src_dir, mic_file_name), 'r') as micFileStream:
            for micLineStr in micFileStream:
                micLineDataArray = micLineStr.split()
                # skip empty or truncated lines
                if len(micLineDataArray) < 3:
                    continue
                combinedInfo = micLineDataArray[2]
                micMchtId = combinedInfo[4:19]
                # skip records of other merchants
                if micMchtId != gMchtId:
                    continue
                # the query index is the last 12 characters
                micIndexSet.add(combinedInfo[-12:])

        # acom (file B) lines whose index is missing from the mic file
        resultLineStr = list()

        # read acom file and compare index keys
        print("reading " + dateStr + " acom file")
        with open(os.path.join(src_dir, acom_file_name), 'r') as acomFileStream:
            for acomLineStr in acomFileStream:
                acomLineDataArray = acomLineStr.split()
                # skip empty or truncated lines
                if len(acomLineDataArray) < 14:
                    continue
                acomMchtId = acomLineDataArray[12]
                if acomMchtId != gMchtId:
                    continue
                acomIndex = acomLineDataArray[13]
                # save the diffed lines, without the trailing newline
                if acomIndex not in micIndexSet:
                    resultLineStr.append(acomLineStr.rstrip('\n'))

        # write the result lines to file
        print("write " + dateStr + " result file")
        with open(os.path.join(res_dir, dateStr + "_result"), 'w') as resultFileStream:
            for line in resultLineStr:
                resultFileStream.write(line + '\n')

    print("compare over")
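The comparison at the heart of the script can also be pulled out into a function that works on lists of lines, which makes it easy to try without the real data files. This is only a sketch: `find_missing_records`, `make_b_line`, and the synthetic records below are hypothetical, while the field positions mirror the script above.

```python
def find_missing_records(a_lines, b_lines, merchant_id):
    """Return B records of merchant_id whose index is absent from A."""
    # Collect the index keys of the A records that belong to the merchant.
    a_indexes = set()
    for line in a_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # empty or truncated line
        combined = fields[2]
        if combined[4:19] == merchant_id:
            a_indexes.add(combined[-12:])

    # Keep the B records of the merchant whose index was not seen in A.
    missing = []
    for line in b_lines:
        fields = line.split()
        if len(fields) < 14:
            continue  # empty or truncated line
        if fields[12] == merchant_id and fields[13] not in a_indexes:
            missing.append(line.rstrip("\n"))
    return missing


def make_b_line(merchant_id, index):
    """Build a synthetic B record: 12 filler fields, then merchant and index."""
    return " ".join(["x"] * 12 + [merchant_id, index]) + "\n"


MERCHANT = "100502392220106"
# One synthetic A record (index 000000000001) plus a blank line.
a_lines = ["f0 f1 9222" + MERCHANT + "0" * 21 + "000000000001\n", "\n"]
b_lines = [
    make_b_line(MERCHANT, "000000000001"),           # present in A
    make_b_line(MERCHANT, "000000000002"),           # missing from A
    make_b_line("999999999999999", "000000000003"),  # other merchant
]
missing = find_missing_records(a_lines, b_lines, MERCHANT)
print(missing)
```

Because membership tests on a `set` take constant time on average, each file only needs to be read once, regardless of how many records it holds.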
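Screenshot
The file names are assembled in batch from the dates of the files in the folder, and the results are written to a separate folder. Python handles this reasonably fast.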
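The script above lists its dates by hand. If the date list should instead be derived from whatever files are present in the source folder, something like the following could work. It assumes the naming pattern seen in the script (`M_IC`, a six-digit date, `OTRAD`, then the merchant ID); the function names `discover_dates` and `build_file_names` are hypothetical.

```python
import os
import re


def discover_dates(src_dir, merchant_id):
    """Return the sorted six-digit dates for which a mic file exists in src_dir."""
    pattern = re.compile(r"^M_IC(\d{6})OTRAD" + re.escape(merchant_id) + r"$")
    dates = []
    for name in os.listdir(src_dir):
        match = pattern.match(name)
        if match:
            dates.append(match.group(1))
    return sorted(dates)


def build_file_names(date_str, merchant_id):
    """Build the (mic, acom) file names for one date, as the script does."""
    mic = "M_IC" + date_str + "OTRAD" + merchant_id
    acom = "no_chongzhengIND" + date_str + "01ACOM"
    return mic, acom


# Only runs when the source folder from the script is actually present.
if os.path.isdir("./src_data"):
    for date_str in discover_dates("./src_data", "100502392220106"):
        print(build_file_names(date_str, "100502392220106"))
```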