
Logging in to an HTTPS website with HttpClient and fetching information from its pages (POST + GET)

2016-04-17 00:00
Summary: Simulating a login to http://jx.122.gov.cn/ to query traffic violation records. Source code: http://download.csdn.net/detail/dpc761218914/9493818

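This post records a painful day spent scraping information from a website. I had previously simulated the login with node.js, but that did not work well. The simulated login is meant to end up in an Android client, so I switched to Java. I ran into many problems along the way, roughly as follows: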

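(1) GET requests made with HttpClient failed at first.

(2) A GET request to my school's official site succeeded, but the traffic police site could not be reached, because it is served over HTTPS and needs SSL handling.

(3) Once the home page was reachable, I used Fiddler to capture a real login and see which URL the login posts to and how the username and password are submitted.

(4) The captcha image has to be downloaded, typed in at the command line, and added to the parameter list of the simulated login.

(5) After a successful login, HttpClient manages the session by itself, so any information you need can be fetched with a GET or POST to the corresponding URL.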

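1. First, get to know HttpClient: it lets a Java program access websites the way a browser would. Let's start by using it to fetch the home page of my school, NCHU.

First add httpclient-4.3.6.jar and httpcore-4.4.4.jar to the project, then write the program: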

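import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class getTest {
    public static void main(String[] args) throws IOException {
        String url = "http://www.nchu.edu.cn/";
        // DefaultHttpClient is deprecated since 4.3 but still works
        HttpClient httpclient = new DefaultHttpClient();
        HttpResponse response = httpclient.execute(new HttpGet(url));
        // Read the response body line by line as UTF-8
        BufferedReader rd = new BufferedReader(new InputStreamReader(
                response.getEntity().getContent(), "UTF-8"));
        StringBuilder result = new StringBuilder();
        String line;
        while ((line = rd.readLine()) != null) {
            result.append(line).append("\n");
        }
        rd.close();
        System.out.println(result);
    }
}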

I expected this to print the page's HTML, but it failed with an error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at org.apache.http.impl.client.CloseableHttpClient.<init>(CloseableHttpClient.java:60)
    at org.apache.http.impl.client.AbstractHttpClient.<init>(AbstractHttpClient.java:271)
    at org.apache.http.impl.client.DefaultHttpClient.<init>(DefaultHttpClient.java:146)
    at getTest.main(getTest.java:22)

A quick search showed that one more jar, commons-logging, is required. After adding it, the page was fetched successfully.
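A NoClassDefFoundError like this one means a class that HttpClient needs could not be loaded at run time. As a minimal sketch (the `ClasspathCheck` class and its `isOnClasspath` method are names made up for illustration), a few lines of plain JDK code can confirm whether a class is loadable before you go looking for jars:

```java
final class ClasspathCheck {
    // Returns true if the named class can be found by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class named in the stack trace
        String name = "org.apache.commons.logging.LogFactory";
        System.out.println(name + " available: " + isOnClasspath(name));
    }
}
```

If this prints `false`, the commons-logging jar is not on the classpath.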



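2. So the first step, fetching the school site, works. Trying https://jx.122.gov.cn/ next produced an error about an SSL security certificate, so I went searching again. This blog post solved the problem, and the traffic police home page became reachable: http://blog.csdn.net/rongyongfeikai2/article/details/41659353/. (While writing this post I found that http://jx.122.gov.cn/ also works. Damn, I had not noticed that the first time and kept fighting the HTTPS problem. Afterwards I simply changed all the URLs to http.)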

Exception in thread "main" javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
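The fix from the linked post is not reproduced here. One common workaround for this PKIX error, shown only as a sketch built from JDK classes, is an `SSLContext` whose trust manager accepts every certificate. Whether the linked post does exactly this is an assumption, and `TrustAllSsl` is a name made up for illustration. Because it turns certificate verification off completely, it is only suitable for experiments like this one.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

final class TrustAllSsl {
    // Builds a TLS context that skips certificate validation (INSECURE, testing only).
    static SSLContext create() {
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // accept everything
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // accept everything: this is what suppresses the PKIX failure
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        } };
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, trustAll, new SecureRandom());
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build SSL context", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Protocol: " + create().getProtocol());
    }
}
```

With the JDK's own `HttpsURLConnection`, the context can be installed through `HttpsURLConnection.setDefaultSSLSocketFactory(TrustAllSsl.create().getSocketFactory())`. A safer long-term fix is to import the site's certificate into a trust store instead.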

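3. The home page is now fetched. In a browser, I would enter the username, password, and captcha and then click the login button, which sends a POST request. So I first used Fiddler to watch the login process and inspect the submitted parameters. The captcha is just an image, so we can download it to a local file and type its text in by hand.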
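Saving the captcha comes down to copying the bytes of the HTTP response body into a file. As a rough sketch using only the JDK (the `CaptchaSaver` class, its `save` method, and the temp-file name are hypothetical), `Files.copy` handles the copy loop and overwrites a previously saved image:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class CaptchaSaver {
    // Copies the image stream to target, replacing any earlier file; returns the bytes written.
    static long save(InputStream image, Path target) {
        try (InputStream in = image) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in bytes; in the real flow the stream would be the captcha response body
        byte[] fakeImage = {1, 2, 3};
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "captcha-sketch.png");
        long written = save(new ByteArrayInputStream(fakeImage), target);
        System.out.println(written + " bytes written to " + target);
    }
}
```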



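For convenience, the GET, POST, and captcha-download logic all go into one utility class. When we first visit the home page, the server allocates a session, which it uses to recognize later requests as coming from the same user. (I do not fully understand this part myself.)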

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;

public class HttpUtil {
    // One shared client, so its built-in cookie store keeps the session across requests
    public static HttpClient httpclient = new DefaultHttpClient();
    // Value of the most recent Set-Cookie response header
    private static String cookies;

    /**
     * GET request: returns the response body as a string.
     */
    public static String GetPageContent(String url, String cookie) throws ClientProtocolException, IOException {
        HttpGet request = new HttpGet(url);
        // Only send an explicit Cookie header when one is supplied;
        // otherwise rely on the client's own cookie store
        if (cookie != null && !cookie.isEmpty()) {
            request.setHeader("Cookie", cookie);
        }
        HttpResponse response = httpclient.execute(request);
        BufferedReader rd = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
        StringBuilder result = new StringBuilder();
        String line;
        while ((line = rd.readLine()) != null) {
            result.append(line).append("\n");
        }
        rd.close();
        // Remember the Set-Cookie value (getValue() omits the "Set-Cookie:" prefix)
        Header setCookie = response.getFirstHeader("Set-Cookie");
        setCookies(setCookie == null ? "" : setCookie.getValue());
        return result.toString();
    }

    // Download the captcha image
    public static void GetPhotoContent(String url, String cookie) throws ClientProtocolException, IOException {
        HttpGet request = new HttpGet(url);
        if (cookie != null && !cookie.isEmpty()) {
            request.setHeader("Cookie", cookie);
        }
        HttpResponse response = httpclient.execute(request);

        // Write the image bytes to a local file
        OutputStream out = new FileOutputStream(new File("c:\\newfile2.png"));
        try {
            response.getEntity().writeTo(out);
            out.flush();
        } finally {
            out.close();
        }
        System.out.println("Check file c:\\newfile2.png");
        // Remember the Set-Cookie value (getValue() omits the "Set-Cookie:" prefix)
        Header setCookie = response.getFirstHeader("Set-Cookie");
        setCookies(setCookie == null ? "" : setCookie.getValue());
    }

    // POST request that submits form parameters
    public static String postWithParameters(Map<String, String> map, String postUrl, String cookie) throws IOException {
        HttpPost httpost = new HttpPost(postUrl);
        if (cookie != null && !cookie.isEmpty()) {
            httpost.setHeader("Cookie", cookie);
        }

        // Build the form parameter list
        List<NameValuePair> list = new ArrayList<NameValuePair>();
        for (Entry<String, String> elem : map.entrySet()) {
            list.add(new BasicNameValuePair(elem.getKey(), elem.getValue()));
        }
        if (!list.isEmpty()) {
            httpost.setEntity(new UrlEncodedFormEntity(list, "UTF-8"));
        }

        HttpResponse response = httpclient.execute(httpost);

        // Print the response headers for debugging
        for (Header header : response.getAllHeaders()) {
            System.out.println(header);
        }

        BufferedReader rd = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
        StringBuilder result = new StringBuilder();
        String line;
        while ((line = rd.readLine()) != null) {
            result.append(line).append("\n");
        }
        rd.close();
        System.out.println(result);
        // Remember the Set-Cookie value (getValue() omits the "Set-Cookie:" prefix)
        Header setCookie = response.getFirstHeader("Set-Cookie");
        setCookies(setCookie == null ? "" : setCookie.getValue());

        return result.toString();
    }

    /**
     * Accessors for the most recent Set-Cookie value.
     */
    public static String getCookies() {
        return cookies;
    }

    public static void setCookies(String cookies) {
        HttpUtil.cookies = cookies;
    }
}
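The session the server allocates is normally tracked through a cookie: the server sends a `Set-Cookie` header once, and the client sends the value back in a `Cookie` header on every later request. HttpClient does this bookkeeping internally. The sketch below illustrates the same round trip with the JDK's `java.net.CookieManager`, without touching the network. The cookie name `JSESSIONID`, the value `abc123`, and the class name `SessionCookieDemo` are assumptions made for illustration, since the post does not show the site's real cookie.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class SessionCookieDemo {
    // Stores one Set-Cookie value as if it came from responseUri, then returns
    // the Cookie header a later request to requestUri would carry.
    static String cookieHeaderFor(String setCookieValue, URI responseUri, URI requestUri) {
        CookieManager manager = new CookieManager();
        try {
            manager.put(responseUri,
                    Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookieValue)));
            Map<String, List<String>> headers =
                    manager.get(requestUri, Collections.<String, List<String>>emptyMap());
            List<String> cookies = headers.get("Cookie");
            return cookies == null ? "" : String.join("; ", cookies);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI home = URI.create("http://jx.122.gov.cn/");
        URI member = URI.create("http://jx.122.gov.cn/views/member");
        // Hypothetical session cookie set by the home page
        System.out.println(cookieHeaderFor("JSESSIONID=abc123; Path=/", home, member));
    }
}
```

This is why the test code can pass an empty cookie string to every call: as long as all requests go through the same client instance, the session cookie travels with them.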

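With the utility class done: the captcha URL needs a timestamp appended, for which a random number will do. The captcha is then downloaded to a local file. For the POST, pay attention to the request parameters.

Now write a test class: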
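The two details mentioned here, the extra value on the captcha URL and the encoding of the POST parameters, can be sketched with JDK classes alone. `LoginRequestSketch`, `captchaUrl`, and `formEncode` are hypothetical names. `formEncode` only approximates what HttpClient's `UrlEncodedFormEntity` produces for a login form and is not its actual implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class LoginRequestSketch {
    // Appends a changing value so each captcha request has a distinct URL.
    static String captchaUrl(String base, long nonce) {
        return base + "?nocache=" + nonce;
    }

    // Encodes parameters as an application/x-www-form-urlencoded body.
    static String formEncode(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> entry : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(entry.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(entry.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.println(captchaUrl("http://jx.122.gov.cn/captcha1", System.currentTimeMillis()));
        Map<String, String> form = new LinkedHashMap<String, String>();
        form.put("usertype", "1");
        form.put("captcha", "a b&c"); // made-up value containing characters that need escaping
        System.out.println(formEncode(form));
    }
}
```

A `LinkedHashMap` keeps the parameters in insertion order, which makes the encoded body easier to compare with what Fiddler shows.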

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.Scanner;

public class getTest {
    public static void main(String[] args) throws IOException {
        // Home page
        String mainurl = "http://jx.122.gov.cn/";
        // Captcha image URL, with a random nocache value appended
        String yanzhengmaurl = "http://jx.122.gov.cn/captcha1?nocache=" + new Random().nextInt(1000);
        // URL that the username and password are POSTed to
        String postloginurl = "http://jx.122.gov.cn/user/m/login";
        // Page shown after a successful login
        String loginsucess = "http://jx.122.gov.cn/views/member";

        // Visit the home page first so the server allocates a session
        String mainurlStr = HttpUtil.GetPageContent(mainurl, "");

        // Download the captcha image
        HttpUtil.GetPhotoContent(yanzhengmaurl, "");

        // Read the captcha text typed in by the user
        System.out.print("Enter captcha: ");
        Scanner scan = new Scanner(System.in);
        String read = scan.nextLine();
        System.out.println("Input: " + read);

        // Login form parameters
        Map<String, String> createMap = new HashMap<String, String>();
        createMap.put("usertype", "1");
        createMap.put("systemid", "main");
        // Username (fill in your own)
        createMap.put("username", "");
        // Password (fill in your own)
        createMap.put("password", "");
        createMap.put("captcha", read);

        // POST the login, then fetch the member page with the same session
        String mypostresult = HttpUtil.postWithParameters(createMap, postloginurl, "");
        String loginsucessstr = HttpUtil.GetPageContent(loginsucess, "");
        System.out.println(loginsucessstr);
    }
}


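Run it, enter the captcha, and the result is printed:



Find the captcha image on the C: drive and type in its text.

Then view the page returned after a successful login:



Suppose we now want our own violation history. We only need to POST to the violation history URL:
// violation history URL
String breakLowHistory="https://jx.122.gov.cn/user/m/uservio/vehssuris";
The JSON that comes back matches what Fiddler shows.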


