Handling timeouts when fetching web pages with Jsoup
2018-01-26 10:56
While fetching a web page with Jsoup, I ran into a timeout:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.HttpURLConnection.getResponseCode(Unknown Source)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:655)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:628)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:260)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:249)
at com.accord.jsoup.JsoupUtils.getHtmlByUrl2(JsoupUtils.java:70)
at com.accord.jsoup.JsoupMain.getListDatas(JsoupMain.java:63)
at com.accord.jsoup.JsoupMain.main(JsoupMain.java:32)
Exception in thread "main" java.lang.NullPointerException
at com.accord.jsoup.JsoupMain.parserHtml(JsoupMain.java:169)
at com.accord.jsoup.JsoupMain.getListDatas(JsoupMain.java:64)
at com.accord.jsoup.JsoupMain.main(JsoupMain.java:32)
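My original code fetched the page like this:
Jsoup.connect(url).cookie("JSESSIONID", sessionId).get();
This call relies on the default timeout, which was 3 seconds in the Jsoup version I was using.
Because the server was slow to return data, the read timed out. Setting an explicit timeout (in milliseconds) fixed it:
Jsoup.connect(url).cookie("JSESSIONID", sessionId).timeout(50000).get(); // 50 seconds
Pick a value that suits how slow your target site is.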
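A fixed 50-second timeout works, but it also means every hung request blocks for that long. One alternative is to start with a short timeout and only lengthen it when a read actually times out. The sketch below is my own illustration of that idea, not part of the original fix: TimeoutRetry, Fetcher, and fetchWithRetry are hypothetical names, and the "slow server" in main is a stand-in so the example runs without a network. It also rethrows the last timeout instead of returning null; the NullPointerException in the trace above suggests the original helper returned null after the timeout, though the source does not show that code.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class TimeoutRetry {

    /** Hypothetical callback: performs one fetch with the given timeout in milliseconds. */
    @FunctionalInterface
    public interface Fetcher<T> {
        T fetch(int timeoutMillis) throws IOException;
    }

    /**
     * Tries the fetch up to maxAttempts times, doubling the timeout after each
     * SocketTimeoutException. If every attempt times out, the last exception is
     * rethrown so the caller never receives null.
     */
    public static <T> T fetchWithRetry(Fetcher<T> fetcher, int initialTimeoutMillis, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        int timeout = initialTimeoutMillis;
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetcher.fetch(timeout);
            } catch (SocketTimeoutException e) {
                last = e;
                timeout *= 2; // give the slow server more time on the next attempt
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a slow server: it only answers when given at least 10 seconds.
        Fetcher<String> slowServer = timeoutMillis -> {
            if (timeoutMillis < 10_000) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "<html>ok</html>";
        };
        // Attempts use 3000, 6000, then 12000 ms; the third attempt gets a response.
        System.out.println(fetchWithRetry(slowServer, 3_000, 3));
    }
}
```

With Jsoup on the classpath, the fetcher would be something like: timeout -> Jsoup.connect(url).cookie("JSESSIONID", sessionId).timeout(timeout).get(), with the result type being org.jsoup.nodes.Document. Whether retrying is appropriate depends on the site; for a page that is always slow, a single generous timeout as shown above may be simpler.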