Fixing the "Cookie rejected" warning in a Sina Weibo crawler
2014-06-05 10:31
I recently wrote a Sina Weibo crawler using httpclient-4.3.3. The program runs fine, but it keeps emitting "Cookie rejected" warnings. The log looks like this:
2014-06-05 10:27:17.417 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.ef542aa2.538fd58b.ec8a8e2c", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:23 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:17.422 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.ef632aa2.538fd58b.c6dd669e", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
登录成功,昵称:佩佩菜_52350
2014-06-05 10:27:20.019 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.75d37a79.538fd58d.077976a4", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:25 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:20.019 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.75e37a79.538fd58d.575a338c", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
登录成功,昵称:通吃一条街呵呵
2014-06-05 10:27:29.119 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.9fcc12df.538fd597.fcf0e3af", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:35 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:29.120 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.9fd812df.538fd597.e804e263", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
登录成功,昵称:dxedf
log4j:WARN No appenders could be found for logger (com.mchange.v2.log.MLog).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - 读取系统配置:D:\Workspaces\eurlanda\DAP_EurlandaSpider\WebRoot\WEB-INF\classes\config.properties
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.weibo.dely=12
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.task.saveDely=1
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.task.dely=168
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.retryCount=3
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.work_thread_num=10
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.readTimeout=5
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.serverPort=7077
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.connectTimeout=5
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.work.schedule=* * 18-9 ? * 1-5|* * * ? * 1,7|* * * * * ?
2014-06-05 10:27:30.254 [Thread-0] INFO c.e.s.c.sina_weibo.SinaWeiBoCrawler - ----------- 抓取日期2010-02-23 00:00:00的数据-----------
2014-06-05 10:27:30.869 [18721437752] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.ae2f61ad.538fd599.2711e9ab", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.870 [18721437752] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.ae3b61ad.538fd599.cec3bfae", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.881 [zjweii@qq.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.18d93dd.538fd599.add86b40", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.882 [zjweii@qq.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.18ee3dd.538fd599.d7522db2", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:31.089 [18721437752] INFO c.e.s.c.sina_weibo.SinaWeiBoClient - 搜索无结果。
2014-06-05 10:27:31.280 [pbz201402@126.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.39486d50.538fd599.66e98262", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:31.280 [pbz201402@126.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.395a6d50.538fd599.84218ee8", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
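Today I finally got tired of looking at these warnings. I went through a lot of material online, most of it outdated or written for other HttpClient versions, and after some trial and error I found a fix. The cause is the cookie policy: passport.weibo.com and s.weibo.com set cookies for the domain .sina.com.cn, and the default cookie spec rejects a cookie whose domain does not match the host it came from. Overriding the spec's validation so that it accepts these cookies makes the warnings go away: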
// Classes used here come from httpclient 4.3.x and httpcore (org.apache.http.*).

// Cookie spec that behaves like BrowserCompatSpec but skips validation,
// so cookies whose domain does not match the origin host are accepted.
CookieSpecProvider easySpecProvider = new CookieSpecProvider() {
    public CookieSpec create(HttpContext context) {
        return new BrowserCompatSpec() {
            @Override
            public void validate(Cookie cookie, CookieOrigin origin)
                    throws MalformedCookieException {
                // Oh, I am easy
            }
        };
    }
};

// Register the relaxed spec under the name "mySpec" next to the standard ones.
Registry<CookieSpecProvider> reg = RegistryBuilder.<CookieSpecProvider>create()
        .register(CookieSpecs.BEST_MATCH, new BestMatchSpecFactory())
        .register(CookieSpecs.BROWSER_COMPATIBILITY, new BrowserCompatSpecFactory())
        .register("mySpec", easySpecProvider)
        .build();

// Make "mySpec" the default cookie spec for every request.
RequestConfig requestConfig = RequestConfig.custom()
        .setCookieSpec("mySpec")
        .build();

CloseableHttpClient httpclient = HttpClients.custom()
        .setDefaultCookieSpecRegistry(reg)
        .setDefaultRequestConfig(requestConfig)
        .build();
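For context on why the warning shows up at all, here is a minimal sketch of the kind of domain check a cookie spec performs. It is a simplified illustration written for this post, not HttpClient's actual code; the class name DomainMatchSketch, the method domainMatches, and the host login.sina.com.cn are made up for the example. The other values are taken from the log above.

```java
import java.util.Locale;

// Simplified illustration of a cookie domain check; NOT HttpClient's real implementation.
final class DomainMatchSketch {

    /** Returns true if a cookie with this Domain attribute would be accepted from originHost. */
    static boolean domainMatches(String cookieDomain, String originHost) {
        String host = originHost.toLowerCase(Locale.ROOT);
        String domain = cookieDomain.toLowerCase(Locale.ROOT);
        // Drop the leading dot so ".sina.com.cn" and "sina.com.cn" are treated alike.
        if (domain.startsWith(".")) {
            domain = domain.substring(1);
        }
        // Accept an exact match or a true subdomain of the cookie domain.
        return host.equals(domain) || host.endsWith("." + domain);
    }

    public static void main(String[] args) {
        // Domain and origin hosts as they appear in the log.
        System.out.println(domainMatches(".sina.com.cn", "passport.weibo.com")); // false
        System.out.println(domainMatches(".sina.com.cn", "s.weibo.com"));        // false
        // Hypothetical host under sina.com.cn, for contrast.
        System.out.println(domainMatches(".sina.com.cn", "login.sina.com.cn"));  // true
    }
}
```

Under a check like this, a response from passport.weibo.com cannot set a cookie for .sina.com.cn, which matches the "Illegal domain attribute" message in the log. Note that an empty validate() accepts every cookie regardless of domain, so the relaxed spec is probably best limited to a client that only talks to hosts you trust.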