
Scrapy + pymysql.err.IntegrityError: (1062, "Duplicate entry '1' for key 'PRIMARY'")

2019-01-12 15:28

Problem description:

Python: 3.6

Ubuntu: 5.4.0-6ubuntu1~16.04.4

With Scrapy as the crawling framework, the scraped data is saved via pymysql to a MySQL server running in a virtual machine. The scraping itself works without issue, but the insert step fails with the following error:

[code]Traceback (most recent call last):
File "e:\anaconda3\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "E:\Scrapy\Jianshu\Jianshu\pipelines.py", line 36, in process_item
self.update(item)
File "E:\Scrapy\Jianshu\Jianshu\pipelines.py", line 31, in update
self.cursor.execute(update_time)
File "e:\anaconda3\lib\site-packages\pymysql\cursors.py", line 170, in execute
result = self._query(query)
File "e:\anaconda3\lib\site-packages\pymysql\cursors.py", line 328, in _query
conn.query(q)
File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 516, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 727, in _read_query_result
result.read()
File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 1066, in read
first_packet = self.connection._read_packet()
File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 683, in _read_packet
packet.check_error()
File "e:\anaconda3\lib\site-packages\pymysql\protocol.py", line 220, in check_error
err.raise_mysql_exception(self._data)
File "e:\anaconda3\lib\site-packages\pymysql\err.py", line 109, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.IntegrityError: (1062, "Duplicate entry '1' for key 'PRIMARY'")

Table definition:
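A sketch of what the table presumably looks like, inferred from the INSERT statement in the pipeline below; the column names and types are assumptions, the essential point being that the primary key is a plain INT without AUTO_INCREMENT:

[code]-- Hypothetical reconstruction of the target table; actual names and
-- types may differ. id is the PRIMARY KEY and is not AUTO_INCREMENT,
-- so the pipeline has to supply its value explicitly.
CREATE TABLE XXXX_save (
    id         INT NOT NULL,
    title      VARCHAR(255),
    author     VARCHAR(255),
    author_img VARCHAR(255),
    artical_id VARCHAR(64),
    pub_time   VARCHAR(64),
    content    TEXT,
    PRIMARY KEY (id)
);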

Code:

[code]# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import pymysql


class XXXXPipeline(object):
    def __init__(self):
        # host, user, password, database of the MySQL server in the VM
        self.conn = pymysql.connect("192.168.0.124", "root", "123456", "a")
        self.cursor = self.conn.cursor()
        self._sql = None
        self.num = 1  # value written into the primary-key column; restarts at 1 on every run

    @property
    def get_sql(self):
        # Build and cache the INSERT template on first use
        if not self._sql:
            self._sql = """insert into XXXX_save values({},'{}','{}','{}','{}','{}','{}')"""
        return self._sql

    def update(self, item):
        print("item['content']:")
        print(repr(item['content']))
        # Fill the template; single quotes are stripped from the content so the
        # generated statement does not break (no parameterized query is used here)
        update_time = self.get_sql.format(self.num, item['title'],
                                          item['author'], item['author_img'],
                                          item['artical_id'],
                                          item['pub_time'], item['content'].replace("'", ""))
        print("update_time:", update_time)
        self.cursor.execute(update_time)
        self.conn.commit()
        self.num += 1

    def process_item(self, item, spider):
        self.update(item)
        return item

Cause analysis:

The cause: the first run of the spider already inserted rows into the MySQL database on the virtual machine. When the program is stopped and then run again, MySQL still holds the data from the previous crawl. The ID column is defined as the primary key, but it was not set to auto-increment; instead its value is supplied from the spider side and starts at 1 on every run (this is a flaw, to be corrected later). In other words, before every run of the spider the table in the database has to be emptied first, for example as sketched below.
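A minimal way to do the emptying from the MySQL client, assuming the table name XXXX_save used in the pipeline above:

[code]-- Remove the rows left over from the previous crawl, so the ids the
-- pipeline writes (starting again from 1) no longer collide.
TRUNCATE TABLE XXXX_save;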

Then run the spider file again.

It runs successfully, and the data has been inserted into the database.
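The flaw mentioned above can later be corrected by letting MySQL generate the primary key itself, so repeated runs can never collide on it. A sketch, assuming the primary-key column is named id (the name is not shown in the original):

[code]-- Make the primary key auto-generated.
ALTER TABLE XXXX_save MODIFY id INT NOT NULL AUTO_INCREMENT;

The pipeline would then no longer pass self.num; the INSERT would list only the six data columns (or pass NULL for id), and clearing the table before each run would no longer be necessary.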

 
