
Scrapy + pymysql.err.IntegrityError: (1062, "Duplicate entry '1' for key 'PRIMARY'")

2019-01-12 15:28

Problem description:

Python: 3.6

Ubuntu: 5.4.0-6ubuntu1~16.04.4

With Scrapy as the crawling framework, the scraped data is saved via pymysql to a MySQL server running in a virtual machine. The scraping itself works without issue, but the insert step fails with the following error:

[code]Traceback (most recent call last):
File "e:\anaconda3\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "E:\Scrapy\Jianshu\Jianshu\pipelines.py", line 36, in process_item
self.update(item)
File "E:\Scrapy\Jianshu\Jianshu\pipelines.py", line 31, in update
self.cursor.execute(update_time)
File "e:\anaconda3\lib\site-packages\pymysql\cursors.py", line 170, in execute
result = self._query(query)
File "e:\anaconda3\lib\site-packages\pymysql\cursors.py", line 328, in _query
conn.query(q)
File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 516, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 727, in _read_query_result
result.read()
File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 1066, in read
first_packet = self.connection._read_packet()
File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 683, in _read_packet
packet.check_error()
File "e:\anaconda3\lib\site-packages\pymysql\protocol.py", line 220, in check_error
err.raise_mysql_exception(self._data)
File "e:\anaconda3\lib\site-packages\pymysql\err.py", line 109, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.IntegrityError: (1062, "Duplicate entry '1' for key 'PRIMARY'")

Table definition:
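A sketch of what the table presumably looks like, inferred from the INSERT statement in the pipeline below; the column names and types are assumptions, the essential point being that the primary key is a plain INT without AUTO_INCREMENT:

[code]-- Hypothetical reconstruction of the target table; actual names and
-- types may differ. id is the PRIMARY KEY and is not AUTO_INCREMENT,
-- so the pipeline has to supply its value explicitly.
CREATE TABLE XXXX_save (
    id         INT NOT NULL,
    title      VARCHAR(255),
    author     VARCHAR(255),
    author_img VARCHAR(255),
    artical_id VARCHAR(64),
    pub_time   VARCHAR(64),
    content    TEXT,
    PRIMARY KEY (id)
);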

Code:

[code]# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import pymysql


class XXXXPipeline(object):
    def __init__(self):
        # host, user, password, database of the MySQL server in the VM
        self.conn = pymysql.connect("192.168.0.124", "root", "123456", "a")
        self.cursor = self.conn.cursor()
        self._sql = None
        self.num = 1  # value written into the primary-key column; restarts at 1 on every run

    @property
    def get_sql(self):
        # Build and cache the INSERT template on first use
        if not self._sql:
            self._sql = """insert into XXXX_save values({},'{}','{}','{}','{}','{}','{}')"""
        return self._sql

    def update(self, item):
        print("item['content']:")
        print(repr(item['content']))
        # Fill the template; single quotes are stripped from the content so the
        # generated statement does not break (no parameterized query is used here)
        update_time = self.get_sql.format(self.num, item['title'],
                                          item['author'], item['author_img'],
                                          item['artical_id'],
                                          item['pub_time'], item['content'].replace("'", ""))
        print("update_time:", update_time)
        self.cursor.execute(update_time)
        self.conn.commit()
        self.num += 1

    def process_item(self, item, spider):
        self.update(item)
        return item

Cause analysis:

The cause: the first run of the spider already inserted rows into the MySQL database on the virtual machine. When the program is stopped and then run again, MySQL still holds the data from the previous crawl. The ID column is defined as the primary key, but it was not set to auto-increment; instead its value is supplied from the spider side and starts at 1 on every run (this is a flaw, to be corrected later). In other words, before every run of the spider the table in the database has to be emptied first, for example as sketched below.
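A minimal way to do the emptying from the MySQL client, assuming the table name XXXX_save used in the pipeline above:

[code]-- Remove the rows left over from the previous crawl, so the ids the
-- pipeline writes (starting again from 1) no longer collide.
TRUNCATE TABLE XXXX_save;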

Then run the spider file again.

It runs successfully, and the data has been inserted into the database.
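The flaw mentioned above can later be corrected by letting MySQL generate the primary key itself, so repeated runs can never collide on it. A sketch, assuming the primary-key column is named id (the name is not shown in the original):

[code]-- Make the primary key auto-generated.
ALTER TABLE XXXX_save MODIFY id INT NOT NULL AUTO_INCREMENT;

The pipeline would then no longer pass self.num; the INSERT would list only the six data columns (or pass NULL for id), and clearing the table before each run would no longer be necessary.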

 
