Web Crawler
2016-04-22 21:37
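This is my first time writing a crawler by hand, and I have run into quite a few problems. Programming was never going to be easy, though, so I hope to work through them one step at a time. Below is the problem I am stuck on at the moment.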
The program throws an exception when it runs. The code:
package zhuawang;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;

public class zhuawang {
    private static HttpClient httpClient = new HttpClient();

    // Configure the proxy server
    static {
        // Set the proxy server's IP address and port
        httpClient.getHostConfiguration().setProxy("172.27.35.1", 8080);
    }

    public static boolean downloadPage(String path) throws HttpException, IOException {
        // Create the POST method
        PostMethod postMethod = new PostMethod(path);
        // Test parameters for the POST request
        NameValuePair[] postData = new NameValuePair[2];
        postData[0] = new NameValuePair("name", "baidu");
        postData[1] = new NameValuePair("password", "123456");
        postMethod.addParameters(postData);
        try {
            // Execute the request; the return value is the HTTP status code
            int statusCode = httpClient.executeMethod(postMethod);
            // For simplicity, only status code 200 is handled
            if (statusCode != HttpStatus.SC_OK) {
                return false;
            }
            // Derive the file name from the last path segment, without the query string
            String filename = path.substring(path.lastIndexOf('/') + 1);
            int query = filename.indexOf('?');
            if (query >= 0) {
                filename = filename.substring(0, query);
            }
            // Copy the response body to the file; both streams are closed automatically
            try (InputStream input = postMethod.getResponseBodyAsStream();
                 OutputStream output = new FileOutputStream(filename)) {
                int tempByte;
                // read() returns -1 at end of stream; 0 is a valid byte value
                while (input != null && (tempByte = input.read()) != -1) {
                    output.write(tempByte);
                }
            }
            return true;
        } finally {
            // Return the connection to the connection manager
            postMethod.releaseConnection();
        }
    }

    // Test code
    public static void main(String[] args) {
        // Fetch a test page from the local server
        try {
            zhuawang.downloadPage("http://localhost:8080/firstTest.htm?method=test");
        } catch (HttpException e) {
            e.printStackTrace();
            // System.out.println("Program exception");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
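A note on the copy loop in downloadPage: InputStream.read() returns -1 at end of stream, and 0 is a legitimate byte value, so a copy loop should stop only on -1. The following is a minimal sketch of such a loop using a buffer. It is not part of the original program, and the class name StreamCopy and its methods copy and roundTrip are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class, for illustration only
public class StreamCopy {
    /** Copies everything from in to out and returns the number of bytes copied. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // Stop only when read() reports end of stream (-1)
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Copies data through in-memory streams and returns what arrived. */
    public static byte[] roundTrip(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            copy(new ByteArrayInputStream(data), sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // The middle byte is 0; it must survive the copy
        System.out.println(roundTrip(new byte[] {'a', 0, 'b'}).length); // prints 3
    }
}
```

A loop that tests for a value greater than 0 would stop at the zero byte in this input and write only one byte.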
Output when the program runs:
Apr 22, 2016 9:38:30 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
Apr 22, 2016 9:38:30 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
Apr 22, 2016 9:38:51 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
Apr 22, 2016 9:38:51 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
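The log reports java.net.ConnectException: Connection timed out, which means the TCP connection itself is never established. One possible cause (an assumption, not something the log proves) is that the proxy 172.27.35.1:8080 set in the static block cannot be reached from this machine; a request to localhost would not normally need a proxy in any case. A quick way to check is a plain socket connect with a timeout. This is only a sketch, and the class name ProxyCheck and the method canConnect are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic class, for illustration only
public class ProxyCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections and unresolvable host names
            return false;
        }
    }

    public static void main(String[] args) {
        // Proxy address as configured in the program's static block
        System.out.println("proxy reachable:  " + canConnect("172.27.35.1", 8080, 3000));
        // Target server from the URL passed to downloadPage
        System.out.println("target reachable: " + canConnect("localhost", 8080, 3000));
    }
}
```

If the proxy line prints false while the target line prints true, removing the setProxy call would be the first thing to try.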