
Python scripting: batch-comparing text files and finding differences by specific fields

2017-04-28 15:39
Sometimes you have two files that hold tabular data, with some fields in common and some not, and you need to compare them and pull out the records that differ.

Sample files

Format of each record in file A:

03090000   00049993   9222100502392220106000000020000029000170124500019054                 20170124 12:30:01622908347435512917       00049996


Format of each record in file B:

01006530    00096900    000480 0124174505 6228480478369552177 000000004066 000000000000  00000000000 0200 000000 5411 00000021 100504754110404 003081009289 00 000000 01030000    000000 00 071 000000000005 000000000000 D00000000001 1 000 6 0 0124174510 01030000    0 03     00000000000  00010111001


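File A and file B each contain many records. Some records in file B have an index key that does not appear in file A, and those are the records to find. For example, the field 0124174510 corresponds to the last 12 digits of the field 9222100502392220106000000020000029000170124500019054 in file A. The missing records are matched in batch by splitting and slicing the strings. Note that the script below takes the file-A key from the last 12 characters of the third whitespace-separated field, and the file-B key from the field at index 13 (003081009289 in the sample record), with the merchant ID at index 12.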
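To make the field positions concrete, here is a minimal sketch that applies the same slicing to the two sample records above. The helper names `a_record_keys` and `b_record_keys` are hypothetical, introduced only for illustration; the positions (third field, characters 4 to 19, last 12 characters, and fields 12 and 13) are the ones used by the script in this article.

```python
# Sample records copied from the article (whitespace between fields varies).
A_LINE = ("03090000   00049993   "
          "9222100502392220106000000020000029000170124500019054"
          "                 20170124 12:30:01622908347435512917       00049996")
# The file-B sample is cut off after the fields that are used here.
B_LINE = ("01006530    00096900    000480 0124174505 6228480478369552177 "
          "000000004066 000000000000  00000000000 0200 000000 5411 00000021 "
          "100504754110404 003081009289 00 000000 01030000")


def a_record_keys(line):
    """Return (merchant_id, index_key) for a file-A record (hypothetical helper)."""
    combined = line.split()[2]            # third whitespace-separated field
    return combined[4:19], combined[-12:]


def b_record_keys(line):
    """Return (merchant_id, index_key) for a file-B record (hypothetical helper)."""
    fields = line.split()
    return fields[12], fields[13]


print(a_record_keys(A_LINE))  # ('100502392220106', '124500019054')
print(b_record_keys(B_LINE))  # ('100504754110404', '003081009289')
```

The two sample records belong to different merchants, so they would not be compared against each other; the example only shows where each value is read from.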

Code

import os

# dates to be compared (YYMMDD)
dateArr = ["170124",
           "170125",
           "170130",
           "170206",
           "170211",
           "170228",
           "170304",
           "170309",
           "170314",
           "170321",
           "170325"]
# local paths: input data and output results
src_dir = "./src_data"
res_dir = "./res_data"

# the merchant ID of interest
gMchtId = "100502392220106"

# make sure the output folder exists before writing to it
if not os.path.isdir(res_dir):
    os.makedirs(res_dir)

# read files and compare, then write the differing records
print("start to compare file...")

for dateStr in dateArr:
    print("comparing " + dateStr + " files")
    mic_file_name = "M_IC" + dateStr + "OTRAD" + gMchtId
    acom_file_name = "no_chongzhengIND" + dateStr + "01ACOM"

    # set of index keys found in the mic file for this date
    micIndexSet = set()
    # read mic file and create index keys
    print("reading " + dateStr + " mic file")
    with open(os.path.join(src_dir, mic_file_name), 'r') as micFileStream:
        # process file line by line
        for micLineStr in micFileStream:
            # skip blank lines (a line read from a file still carries its '\n',
            # so its length is never 0)
            if not micLineStr.strip():
                print("empty mic line")
                continue
            # slice strings
            micLineDataArray = micLineStr.split()
            combinedInfo = micLineDataArray[2]
            micMchtId = combinedInfo[4:19]
            # skip other merchants' records
            if micMchtId != gMchtId:
                continue
            # the query index is the last 12 characters
            micIndex = combinedInfo[-12:]
            # add to mic index set
            micIndexSet.add(micIndex)

    # list of acom lines whose index is missing from the mic file
    resultLineStr = list()
    # read acom file and compare index keys
    print("reading " + dateStr + " acom file")
    with open(os.path.join(src_dir, acom_file_name), 'r') as acomFileStream:
        # process file line by line
        for acomLineStr in acomFileStream:
            # skip blank lines
            if not acomLineStr.strip():
                print("empty acom line")
                continue
            acomLineDataArray = acomLineStr.split()
            acomMchtId = acomLineDataArray[12]
            if acomMchtId != gMchtId:
                continue
            acomIndex = acomLineDataArray[13]
            # save the differing lines, without their line ending
            if acomIndex not in micIndexSet:
                resultLineStr.append(acomLineStr.rstrip('\r\n'))

    # write the result lines to file, one record per line
    print("write " + dateStr + " result file")
    with open(os.path.join(res_dir, dateStr + "_result"), 'w') as resultFileStream:
        for line in resultLineStr:
            resultFileStream.write(line + '\n')

print("compare over")
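
The core of the comparison is a set-membership test: collect the keys from one file into a set, then keep the lines of the other file whose key is not in it. The sketch below isolates that step in a function that works on any iterable of lines, so it can be tried without files on disk. The name `find_missing` and the test records are made up for illustration; only the field positions come from the script above.

```python
def find_missing(mic_lines, acom_lines, mcht_id):
    """Return acom (file B) lines whose key has no match in the mic (file A) lines."""
    mic_keys = set()
    for line in mic_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or too-short line
        combined = fields[2]
        if combined[4:19] == mcht_id:
            mic_keys.add(combined[-12:])

    missing = []
    for line in acom_lines:
        fields = line.split()
        if len(fields) < 14:
            continue  # blank or too-short line
        if fields[12] == mcht_id and fields[13] not in mic_keys:
            missing.append(line.rstrip("\r\n"))
    return missing


# Tiny made-up records, only to exercise the function (not real data).
MCHT = "100502392220106"
mic = ["h1 h2 9222" + MCHT + "X" * 21 + "000000000001 tail"]
acom = [
    " ".join(["f"] * 12 + [MCHT, "000000000001", "rest"]),       # key present in mic
    " ".join(["f"] * 12 + [MCHT, "000000000002", "rest"]),       # key missing from mic
    " ".join(["f"] * 12 + ["999999999999999", "000000000003"]),  # other merchant
]
print(find_missing(mic, acom, MCHT))  # only the record with key 000000000002
```

Because set lookups take roughly constant time on average, the cost grows with the total number of lines rather than with the product of the two file sizes.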


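The file names are built in batch from the dates of the files in the folder, and the results are written to a separate folder. Python handles this kind of processing reasonably fast.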
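
The script hardcodes its date list. As a possible alternative, the dates could be derived from the file names themselves. The sketch below is an assumption about how that might look, not part of the original script: `dates_with_both_files` is a hypothetical helper that takes a list of file names and returns the dates for which both input files exist, following the naming pattern used in the script.

```python
import re


def dates_with_both_files(file_names, mcht_id):
    """Return sorted YYMMDD dates that have both a mic and an acom file."""
    mic_re = re.compile(r"^M_IC(\d{6})OTRAD" + re.escape(mcht_id) + r"$")
    acom_re = re.compile(r"^no_chongzhengIND(\d{6})01ACOM$")
    mic_dates, acom_dates = set(), set()
    for name in file_names:
        m = mic_re.match(name)
        if m:
            mic_dates.add(m.group(1))
        m = acom_re.match(name)
        if m:
            acom_dates.add(m.group(1))
    # keep only dates present on both sides
    return sorted(mic_dates & acom_dates)


names = [
    "M_IC170124OTRAD100502392220106",
    "no_chongzhengIND17012401ACOM",
    "M_IC170125OTRAD100502392220106",  # no matching acom file
    "notes.txt",                       # unrelated file, ignored
]
print(dates_with_both_files(names, "100502392220106"))  # ['170124']
```

In practice the list of names would come from something like `os.listdir("./src_data")`, and the returned dates could replace the hardcoded list.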