crawler_Java application collection 9: common HttpClient 4.2.2 methods, accessing pages after login, downloading files, setting a proxy
2015-03-24 21:35
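At work I needed to make network requests on Android and decided to use HttpClient.
This post summarizes some basic HttpClient usage.
The version is 4.2.2.
While working with this version I searched Baidu a lot, and most of the results used the package name org.apache.commons.httpclient rather than the org.apache.http.client.HttpClient used here. The former belongs to Commons HttpClient 3.x, not the newer HttpClient 4.x.
From the official site:
Commons HttpClient 3.x codeline is at the end of life. All users of Commons HttpClient 3.x are strongly encouraged to upgrade to HttpClient 4.1.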
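1. Basic GET
Java code
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public void getUrl(String url, String encoding)
        throws ClientProtocolException, IOException {
    // The default client implementation.
    HttpClient client = new DefaultHttpClient();
    try {
        // Build a GET request for the URL.
        HttpGet get = new HttpGet(url);
        // Execute the request and get the response.
        HttpResponse response = client.execute(get);
        // Get the entity (body) carried by the response.
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            System.out.println("Content encoding: " + entity.getContentEncoding());
            System.out.println("Content type: " + entity.getContentType());
            // Get the body as a stream.
            InputStream instream = entity.getContent();
            try {
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(instream, encoding));
                // Only the first line of the body is printed here.
                System.out.println(reader.readLine());
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                instream.close();
            }
        }
    } finally {
        // Shut down the connection manager, even if the request failed.
        client.getConnectionManager().shutdown();
    }
}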
2. Basic POST
The params argument below holds the parameters submitted in the form.
Java code
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;

public void postUrlWithParams(String url, Map<String, String> params, String encoding)
        throws Exception {
    DefaultHttpClient httpclient = new DefaultHttpClient();
    try {
        HttpPost httpost = new HttpPost(url);
        // Add the form parameters.
        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        if (params != null) {
            for (Map.Entry<String, String> entry : params.entrySet()) {
                nvps.add(new BasicNameValuePair(entry.getKey(), entry.getValue()));
            }
        }
        httpost.setEntity(new UrlEncodedFormEntity(nvps, Consts.UTF_8));
        HttpResponse response = httpclient.execute(httpost);
        HttpEntity entity = response.getEntity();
        // Print only the status line; the content stream is read once, in dump().
        System.out.println("Login form get: " + response.getStatusLine());
        if (entity != null) {
            dump(entity, encoding);
        }
        System.out.println("Post logon cookies:");
        List<Cookie> cookies = httpclient.getCookieStore().getCookies();
        if (cookies.isEmpty()) {
            System.out.println("None");
        } else {
            for (Cookie cookie : cookies) {
                System.out.println("- " + cookie.toString());
            }
        }
    } finally {
        // Shut down the connection manager.
        httpclient.getConnectionManager().shutdown();
    }
}
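The form parameters above are sent as an application/x-www-form-urlencoded body. As a rough illustration of what such a body looks like, here is a minimal JDK-only sketch. It is not how UrlEncodedFormEntity is implemented; the class name FormEncoder and the sample parameters are made up for this example, and it assumes keys and values are non-null.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HttpClient.
class FormEncoder {
    // Joins the entries as key=value pairs separated by '&',
    // URL-encoding each key and value with the given charset.
    static String encode(Map<String, String> params, String charset)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), charset));
            sb.append('=');
            sb.append(URLEncoder.encode(e.getValue(), charset));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // LinkedHashMap keeps insertion order, so the output is stable.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("user", "tom");
        params.put("pwd", "a b&c");
        System.out.println(encode(params, "UTF-8")); // prints user=tom&pwd=a+b%26c
    }
}
```

The space becomes + and the & inside the value becomes %26, so the server can still split the body on the separators.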
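3. A small snippet that prints the page output
Java code
private static void dump(HttpEntity entity, String encoding)
        throws IOException {
    BufferedReader br = new BufferedReader(new InputStreamReader(
            entity.getContent(), encoding));
    try {
        // Prints only the first line of the response body.
        System.out.println(br.readLine());
    } finally {
        // Closing the reader also closes the entity's content stream.
        br.close();
    }
}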
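The dump method prints only the first line of the body. If the whole page is needed, the stream can be read until it is exhausted. The following is a minimal JDK-only sketch under that assumption; BodyReader and readAll are hypothetical names, and the stream passed in could be the one returned by entity.getContent().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical helper, not part of HttpClient.
class BodyReader {
    // Reads every line from the stream using the given charset name.
    // Line terminators are normalized to '\n', and the stream is closed afterwards.
    static String readAll(InputStream in, String encoding) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, encoding));
        StringBuilder sb = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A byte array stands in for a response body here.
        byte[] body = "first line\nsecond line".getBytes("UTF-8");
        System.out.print(readAll(new ByteArrayInputStream(body), "UTF-8"));
    }
}
```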
4. The common login session problem. Requirement: log in with a user name and password, then access other pages without errors.
Pay special attention: the same httpclient object must be used for both requests. Do not create a new one for the second request, because the session cookies obtained at login live in that client. There are many ways to keep the authenticated httpclient around: a singleton, an application-wide variable (Application on Android), a ThreadLocal variable, and so on.
Also, the httpClient created below should use a ThreadSafeClientConnManager.
Java code
public String getSessionId(String url, Map<String, String> params, String encoding,
        String url2) throws Exception {
    DefaultHttpClient httpclient = new DefaultHttpClient(
            new ThreadSafeClientConnManager());
    try {
        HttpPost httpost = new HttpPost(url);
        // Add the form parameters.
        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        if (params != null) {
            for (Map.Entry<String, String> entry : params.entrySet()) {
                nvps.add(new BasicNameValuePair(entry.getKey(), entry.getValue()));
            }
        }
        // Set the form entity and its encoding.
        httpost.setEntity(new UrlEncodedFormEntity(nvps, Consts.UTF_8));
        // Log in first.
        HttpResponse loginResponse = httpclient.execute(httpost);
        // Consume the login response so its connection is released back to the manager.
        EntityUtils.consume(loginResponse.getEntity());
        // Then request the ordinary URL with the same client.
        httpost = new HttpPost(url2);
        BasicResponseHandler responseHandler = new BasicResponseHandler();
        System.out.println(httpclient.execute(httpost, responseHandler));
    } finally {
        // Shut down the connection manager.
        httpclient.getConnectionManager().shutdown();
    }
    return "";
}
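5. Downloading a file, for example an mp3.
Java code
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

// First parameter: the URL to download; second parameter: the local file path to save to.
public void getFile(String url, String fileName) {
    HttpClient httpClient = new DefaultHttpClient();
    HttpGet get = new HttpGet(url);
    try {
        ResponseHandler<byte[]> handler = new ResponseHandler<byte[]>() {
            public byte[] handleResponse(HttpResponse response)
                    throws ClientProtocolException, IOException {
                HttpEntity entity = response.getEntity();
                // Return an empty array instead of null so the caller can write it safely.
                return entity != null ? EntityUtils.toByteArray(entity) : new byte[0];
            }
        };
        byte[] bytes = httpClient.execute(get, handler);
        FileOutputStream out = new FileOutputStream(fileName);
        try {
            out.write(bytes);
        } finally {
            // Close the file even if the write fails.
            out.close();
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        httpClient.getConnectionManager().shutdown();
    }
}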
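The method above loads the whole response into a byte array before writing it, which is fine for small files but uses a lot of memory for large ones. A possible alternative is to copy the stream to disk in fixed-size chunks. The sketch below uses only the JDK; StreamToFile and copy are hypothetical names, and the input stream is assumed to come from something like entity.getContent().

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of HttpClient.
class StreamToFile {
    // Copies the stream to the target file in 8 KB chunks and
    // returns the number of bytes written. The caller closes 'in'.
    static long copy(InputStream in, File target) throws IOException {
        OutputStream out = new FileOutputStream(target);
        try {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // A byte array stands in for a downloaded body here.
        File target = File.createTempFile("download", ".bin");
        long written = copy(new ByteArrayInputStream(new byte[20000]), target);
        System.out.println(written + " bytes written to " + target.getPath());
        target.delete();
    }
}
```

Only one buffer of 8 KB is held in memory at a time, regardless of the file size.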
6. Creating an httpClient that can be used in a multithreaded environment
(Original: http://blog.csdn.net/jiaoshi0531/article/details/6459468)
Java code
HttpParams params = new BasicHttpParams();
// Maximum total number of connections allowed.
ConnManagerParams.setMaxTotalConnections(params, 200);
// Timeout in milliseconds when waiting for a connection from the connection manager.
ConnManagerParams.setTimeout(params, 10000);
// Default maximum number of connections per route: 20.
ConnPerRouteBean connPerRoute = new ConnPerRouteBean(20);
// Maximum number of connections for the route to this specific host: 50.
HttpHost localhost = new HttpHost("127.0.0.1", 80);
connPerRoute.setMaxForRoute(new HttpRoute(localhost), 50);
ConnManagerParams.setMaxConnectionsPerRoute(params, connPerRoute);
// HTTP protocol version to use.
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
// Charset used for content.
HttpProtocolParams.setContentCharset(params,
        HTTP.DEFAULT_CONTENT_CHARSET);
// Whether to use the "Expect: 100-continue" handshake.
HttpProtocolParams.setUseExpectContinue(params, true);
SchemeRegistry schemeRegistry = new SchemeRegistry();
schemeRegistry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
schemeRegistry.register(new Scheme("https", SSLSocketFactory.getSocketFactory(), 443));
ClientConnectionManager cm = new ThreadSafeClientConnManager(params, schemeRegistry);
HttpClient httpClient = new DefaultHttpClient(cm, params);
7. A useful object: the HTTP context. Information about a single request can be read from it, such as the request, the response, and the proxy host.
Java code
// Note: httpclient here is a shared client instance created elsewhere.
public static void getUrl(String url, String encoding)
        throws ClientProtocolException, IOException {
    // Build a GET request for the URL.
    HttpGet get = new HttpGet(url);
    HttpContext localContext = new BasicHttpContext();
    // Execute the request. The second argument is the context, which gets filled in during execution.
    httpclient.execute(get, localContext);
    // Get the HttpConnection from the context.
    HttpConnection con = (HttpConnection) localContext
            .getAttribute(ExecutionContext.HTTP_CONNECTION);
    System.out.println("Socket timeout: " + con.getSocketTimeout());
    // Get the target HttpHost from the context.
    HttpHost target = (HttpHost) localContext
            .getAttribute(ExecutionContext.HTTP_TARGET_HOST);
    System.out.println("Final request target: " + target.getHostName() + ":"
            + target.getPort());
    // Get the proxy information from the context.
    HttpHost proxy = (HttpHost) localContext
            .getAttribute(ExecutionContext.HTTP_PROXY_HOST);
    if (proxy != null) {
        System.out.println("Proxy host: " + proxy.getHostName() + ":"
                + proxy.getPort());
    }
    System.out.println("Request fully sent: "
            + localContext.getAttribute(ExecutionContext.HTTP_REQ_SENT));
    // Get the HttpRequest from the context.
    HttpRequest request = (HttpRequest) localContext
            .getAttribute(ExecutionContext.HTTP_REQUEST);
    System.out.println("Request protocol version: " + request.getProtocolVersion());
    Header[] headers = request.getAllHeaders();
    System.out.println("Request headers: ");
    for (Header h : headers) {
        System.out.println(h.getName() + "--" + h.getValue());
    }
    System.out.println("Request URI: " + request.getRequestLine().getUri());
    // Get the HttpResponse from the context.
    HttpResponse response = (HttpResponse) localContext
            .getAttribute(ExecutionContext.HTTP_RESPONSE);
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        System.out.println("Response content encoding: " + entity.getContentEncoding());
        System.out.println("Response content type: " + entity.getContentType());
        dump(entity, encoding);
    }
}
The output looks roughly like this:
Text output
Socket timeout: 0
Final request target: money.finance.sina.com.cn:-1
Request fully sent: true
Request protocol version: HTTP/1.1
Request headers:
Host--money.finance.sina.com.cn
Connection--Keep-Alive
User-Agent--Apache-HttpClient/4.2.2 (java 1.5)
Request URI: /corp/go.php/vFD_BalanceSheet/stockid/600031/ctrl/part/displaytype/4.phtml
Response content encoding: null
Response content type: Content-Type: text/html
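The target port in this output is -1 because the URL did not specify a port, and HttpHost.getPort() reports -1 in that case. If the effective port is needed for logging, it can be derived from the scheme. The sketch below is a hypothetical helper (PortResolver is not part of HttpClient) that assumes the default ports 80 and 443 registered for http and https in section 6.

```java
// Hypothetical helper, not part of HttpClient.
class PortResolver {
    // Returns the explicit port if one was given, otherwise the
    // conventional default for the scheme, or -1 if the scheme is unknown.
    static int effectivePort(String scheme, int port) {
        if (port != -1) {
            return port;
        }
        if ("http".equalsIgnoreCase(scheme)) {
            return 80;
        }
        if ("https".equalsIgnoreCase(scheme)) {
            return 443;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("http", -1));   // prints 80
        System.out.println(effectivePort("http", 8080)); // prints 8080
    }
}
```

With an HttpHost named target, a call could look like effectivePort(target.getSchemeName(), target.getPort()).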
8. Setting a proxy
Java code
// String hostIp: the proxy host IP; int port: the proxy port.
HttpHost proxy = new HttpHost(hostIp, port);
// Set the default proxy host on the client.
httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY,
        proxy);
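9. Setting the keep-alive duration
Java code
// Install a custom keep-alive strategy on the client.
// HTTP servers are usually configured to drop persistent connections that have been idle for some time, in order to save resources.
httpClient.setKeepAliveStrategy(new ConnectionKeepAliveStrategy() {
    public long getKeepAliveDuration(HttpResponse response,
            HttpContext context) {
        // Honor the "timeout" value of the Keep-Alive response header if present.
        HeaderElementIterator it = new BasicHeaderElementIterator(
                response.headerIterator(HTTP.CONN_KEEP_ALIVE));
        while (it.hasNext()) {
            HeaderElement he = it.nextElement();
            String param = he.getName();
            String value = he.getValue();
            if (value != null && param.equalsIgnoreCase("timeout")) {
                try {
                    // The header value is in seconds; the strategy returns milliseconds.
                    return Long.parseLong(value) * 1000;
                } catch (NumberFormatException e) {
                    // Ignore a malformed value and fall back to the defaults below.
                }
            }
        }
        HttpHost target = (HttpHost) context.getAttribute(ExecutionContext.HTTP_TARGET_HOST);
        if ("www.baidu.com".equalsIgnoreCase(target.getHostName())) {
            // Keep connections to this host alive for 5 seconds only.
            return 5 * 1000;
        } else {
            // Otherwise keep connections alive for 30 seconds.
            return 30 * 1000;
        }
    }
});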
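The core of this strategy is reading the timeout value from a header such as "Keep-Alive: timeout=5, max=100" and converting it from seconds to milliseconds. To make that step easier to follow in isolation, here is a minimal JDK-only sketch of the same idea. KeepAliveParser and its fallback parameter are hypothetical; the real code above relies on HttpClient's BasicHeaderElementIterator instead of manual string splitting.

```java
// Hypothetical helper, not part of HttpClient.
class KeepAliveParser {
    // Returns the "timeout" element of a Keep-Alive header value in
    // milliseconds, or fallbackMillis if it is absent or malformed.
    static long keepAliveMillis(String headerValue, long fallbackMillis) {
        if (headerValue == null) {
            return fallbackMillis;
        }
        for (String element : headerValue.split(",")) {
            String[] kv = element.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    // The header carries seconds; callers expect milliseconds.
                    return Long.parseLong(kv[1].trim()) * 1000L;
                } catch (NumberFormatException ignored) {
                    // Malformed number: keep looking, then use the fallback.
                }
            }
        }
        return fallbackMillis;
    }

    public static void main(String[] args) {
        System.out.println(keepAliveMillis("timeout=5, max=100", 30000L)); // prints 5000
        System.out.println(keepAliveMillis("max=100", 30000L));            // prints 30000
    }
}
```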