
Java Web Crawler - A Simple Crawler Example

2015-09-24 19:34
WikiScraper.java

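package master.haku.scrape;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class WikiScraper {
    public static void main(String[] args) {
        scrapeTopic("/wiki/Python");
    }

    // Fetches a Wikipedia article and prints the text of its first paragraph.
    public static void scrapeTopic(String url) {
        String html = getUrl("https://en.wikipedia.org" + url);
        Document doc = Jsoup.parse(html);
        // first() returns null when nothing matches, e.g. after a failed fetch
        // or if the page markup no longer fits this selector.
        Element firstParagraph = doc.select("#mw-content-text > p").first();
        if (firstParagraph == null) {
            System.out.println("No matching paragraph found");
            return;
        }
        System.out.println(firstParagraph.text());
    }

    // Downloads the page at the given URL; returns "" on any error.
    public static String getUrl(String url) {
        URL urlObj;
        try {
            urlObj = new URL(url);
        } catch (MalformedURLException e) {
            System.out.println("The url was malformed!");
            return "";
        }

        StringBuilder outputText = new StringBuilder();
        try {
            URLConnection urlCon = urlObj.openConnection();
            // try-with-resources closes the reader even if reading fails.
            // Assumption: the page is served as UTF-8.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(urlCon.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // readLine() strips the line terminator, so add it back.
                    outputText.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            System.out.println("There was an error connecting to the URL");
            return "";
        }

        return outputText.toString();
    }
}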


Output:

A python is a constricting snake belonging to the Python (genus), or, more generally, any snake in the family Pythonidae (containing the Python genus).
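
scrapeTopic builds the full address by concatenating the host with an article path, which only works for links that start with "/". If the crawler is later extended to follow links found on a page, those links may be absolute or relative. The following is a minimal sketch, not part of the original example, of resolving a link against the page it came from using the standard java.net.URI class. The class name LinkResolver, the helper resolve, and the sample links are hypothetical and chosen only for illustration.

```java
import java.net.URI;

final class LinkResolver {
    // Hypothetical helper: resolves a link found on a page against that page's address.
    // URI.create throws IllegalArgumentException if either string is not a valid URI.
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "https://en.wikipedia.org/wiki/Python";
        // A path starting with "/" replaces the path of the page URL.
        System.out.println(resolve(page, "/wiki/Monty_Python"));
        // A link that is already absolute is returned unchanged.
        System.out.println(resolve(page, "https://www.python.org/"));
    }
}
```

With this approach the result of resolve could be passed straight to getUrl, rather than prepending the host by hand in scrapeTopic.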