
Simple Java Crawler

2015-06-27 20:31
Download httpcomponents-client-4.4 from the Apache website and add its jars to the project.

Main classes: HttpClient (sends requests), HttpGet (a GET request), HttpResponse (the server's reply) and HttpEntity (the response body).

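import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.ParseException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class SimpleCrawler {
    // Fetches url every 2 seconds until the thread is interrupted
    public static void getContentFromUrl(String url) {
        // HttpClients.createDefault() replaces DefaultHttpClient, deprecated since 4.3
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpGet getHttp = new HttpGet(url);
            while (true) {
                try (CloseableHttpResponse response = client.execute(getHttp)) {
                    HttpEntity entity = response.getEntity();
                    if (entity != null) {
                        // Use the charset declared by the server, or UTF-8 if none is declared
                        String str = EntityUtils.toString(entity, "UTF-8");
                        // Match str against regular expressions here to extract the needed information
                    }
                } catch (IOException | ParseException e) {
                    e.printStackTrace(); // log the failed request and keep polling
                }
                Thread.sleep(2000); // wait 2 s between requests
            }
        } catch (IOException e) {
            e.printStackTrace(); // thrown if closing the client fails
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag and stop
        }
    }
}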
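Why the charset matters: in HttpClient 4.x, EntityUtils.toString(entity) falls back to ISO-8859-1 when a text/html response declares no charset, so a UTF-8 page comes out garbled. The following stdlib-only sketch uses a hypothetical class CharsetDemo and does not call HttpClient. It reproduces the garbling and shows two things. First, re-encoding to ISO-8859-1 and decoding as UTF-8 recovers the text. Second, the same round trip corrupts text that was already decoded correctly. Passing a default charset such as "UTF-8" to toString avoids both problems.

```java
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Mimics decoding a UTF-8 body with the ISO-8859-1 fallback charset
    static String misdecode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps each byte 0x00-0xFF to exactly one char, so re-encoding
    // recovers the original bytes, which are then decoded correctly as UTF-8
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u722c\u866b"; // the two characters of "crawler" in Chinese
        byte[] body = original.getBytes(StandardCharsets.UTF_8); // bytes as sent by the server
        String garbled = misdecode(body);
        System.out.println(garbled.equals(original));          // false
        System.out.println(repair(garbled).equals(original));  // true
        // On text that was already decoded correctly the trick is lossy:
        // characters outside ISO-8859-1 become '?' during re-encoding
        System.out.println(repair(original).equals(original)); // false
    }
}
```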
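The comment inside the loop leaves the extraction step open. One common choice for a crawler is to collect the links to follow next. The sketch below is a hypothetical illustration using java.util.regex. The class LinkExtractor and its pattern are assumptions, not part of the original code. It collects double-quoted href values of `<a>` tags. A regular expression is not an HTML parser: it will miss single-quoted or unquoted attributes and can match inside comments or scripts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Hypothetical pattern: <a ...href="..."> with a double-quoted value, case-insensitive
    private static final Pattern HREF =
            Pattern.compile("<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every matched href value in document order
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the text between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/a.html\">A</a> <A class=\"x\" HREF=\"http://example.com/b\">B</A></p>";
        System.out.println(extractLinks(html)); // [/a.html, http://example.com/b]
    }
}
```

Inside the crawler loop, str would be passed to extractLinks in place of the placeholder comment.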