
How to Parse an HTML Table in Java with HttpClient

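2015-07-17 00:00
Summary: During development you sometimes need to extract data from a static web page. HttpClient makes it quick to fetch the page so that its contents can be parsed.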

1. Open MyEclipse and create a new Java Project. Enter a project name (httpClientTest in this example).



2. Go to http://hc.apache.org/downloads.cgi and download the corresponding HttpClient JAR files.



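3. In the new project, create a lib folder and copy the downloaded JARs into it. Then right-click the project and choose Build Path > Configure Build Path > Libraries > Add JARs, and add the JARs from the lib folder.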
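To confirm that the JARs really ended up on the build path, a small check like the one below can help. It is only a sketch: the ClasspathCheck class is a hypothetical helper that is not part of the original article, and the two class names it looks for are taken from the imports used by the code later in this article (the jsoup JAR is added in the next step).

```java
// Hypothetical helper, not part of the original article.
class ClasspathCheck {

    // Returns true if the named class can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.http.impl.client.HttpClientBuilder", // from the HttpClient JAR
            "org.jsoup.Jsoup"                                // from the jsoup JAR
        };
        for (String name : required) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it as a Java application inside the project. A class reported as MISSING usually means the corresponding JAR was not added under Libraries.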



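4. Create the ClientTest and ClientPojo classes. Part of the code is shown below. (Parsing the HTML requires jsoup. Download it yourself and add its JAR to the project the same way as in the previous step.)

The test URL used here is http://www.live.chinacourt.org/fygg/index/kindid/5.shtml; change it to suit your own project.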









package com.gsoft.getnotice;

import java.io.IOException;
import java.util.ArrayList;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HttpClient {

    /**
     * Fetches the page at the given URL and returns the text of the table
     * cells found in elements with the CSS class "xian".
     * Each inner list holds one table row; the first cell of every row is skipped.
     */
    public ArrayList<ArrayList<String>> getNoticeList(String url) {
        // Example URL: http://www.live.chinacourt.org/fygg/index/kindid/5.shtml
        HttpGet httpGet = new HttpGet(url);
        ArrayList<ArrayList<String>> lists = new ArrayList<ArrayList<String>>();

        // try-with-resources closes the response and the client, even on error
        try (CloseableHttpClient closeableHttpClient = HttpClientBuilder.create().build();
             CloseableHttpResponse httpResponse = closeableHttpClient.execute(httpGet)) {
            HttpEntity entity = httpResponse.getEntity();
            if (entity != null) {
                // Decode the body as GBK when the response does not declare a charset
                String response = EntityUtils.toString(entity, "GBK");
                // Parse the HTML text into a jsoup Document
                Document doc = Jsoup.parse(response);
                // Select the elements by class name
                Elements tables = doc.getElementsByClass("xian");
                Elements rows = tables.select("tr");
                for (int i = 0; i < rows.size(); i++) {
                    Element row = rows.get(i);
                    Elements cells = row.select("td");
                    // Collect the text of every cell except the first one
                    ArrayList<String> list = new ArrayList<String>();
                    for (int j = 1; j < cells.size(); j++) {
                        list.add(cells.get(j).text());
                    }
                    lists.add(list);
                }
            }
        } catch (IOException e) {
            // Covers ClientProtocolException as well, which is a subclass of IOException
            e.printStackTrace();
        }

        return lists;
    }

}
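Because getNoticeList adds one list per tr element and skips the first td of each row, a row that has no td cells at all (for example a header row built from th cells), or only one, comes back as an empty list. The sketch below shows one way a caller might drop such rows before using the result. The NoticeRows class, the dropEmptyRows method, and the sample values are all hypothetical and only stand in for real data returned by getNoticeList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original article.
class NoticeRows {

    // Returns a copy of the rows with the empty ones removed.
    static List<List<String>> dropEmptyRows(List<? extends List<String>> rows) {
        List<List<String>> kept = new ArrayList<List<String>>();
        for (List<String> row : rows) {
            if (!row.isEmpty()) {
                kept.add(new ArrayList<String>(row));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by getNoticeList(url); the data is made up.
        ArrayList<ArrayList<String>> lists = new ArrayList<ArrayList<String>>();
        lists.add(new ArrayList<String>()); // e.g. a header row without <td> cells
        lists.add(new ArrayList<String>(Arrays.asList("cell 2", "cell 3")));

        for (List<String> row : dropEmptyRows(lists)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

In a real project the lists variable would be filled by calling getNoticeList with the page URL instead of being built by hand.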
http://jingyan.baidu.com/article/22fe7ced2741043002617f1c.html