Java Web Crawler - A Simple Crawler Example
2015-09-24 19:34
WikiScraper.java
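package master.haku.scrape;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class WikiScraper {

    public static void main(String[] args) {
        scrapeTopic("/wiki/Python");
    }

    // Downloads a Wikipedia article and prints the text of its first paragraph.
    public static void scrapeTopic(String url) {
        String html = getUrl("https://en.wikipedia.org" + url);
        Document doc = Jsoup.parse(html);
        // first() returns null when the selector matches nothing,
        // for example when the download failed and html is empty.
        Element firstParagraph = doc.select("#mw-content-text > p").first();
        if (firstParagraph == null) {
            System.out.println("No matching paragraph was found");
            return;
        }
        System.out.println(firstParagraph.text());
    }

    // Returns the body of the page at the given URL, or "" on any error.
    public static String getUrl(String url) {
        URL urlObj;
        try {
            urlObj = new URL(url);
        } catch (MalformedURLException e) {
            System.out.println("The url was malformed!");
            return "";
        }
        StringBuilder outputText = new StringBuilder();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                urlObj.openConnection().getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Keep the line break so words on adjacent lines do not run together.
                outputText.append(line).append('\n');
            }
        } catch (IOException e) {
            System.out.println("There was an error connecting to the URL");
            return "";
        }
        return outputText.toString();
    }
}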
Output:
A python is a constricting snake belonging to the Python (genus), or, more generally, any snake in the family Pythonidae (containing the Python genus).
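scrapeTopic builds the full address by concatenating "https://en.wikipedia.org" with a root-relative path. A crawler that goes on to follow links found in a page would also meet absolute and document-relative hrefs, which plain concatenation does not handle. The following is a minimal sketch of one way to cover those cases with the standard java.net.URI class. LinkResolver and its resolve method are hypothetical names introduced for illustration; they are not part of the original WikiScraper.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (an assumption, not from the original post):
// turns an href found on a page into an absolute URL.
class LinkResolver {

    // Resolves href against base; returns "" if either string is not a valid URI,
    // mirroring the empty-string error convention used by getUrl.
    static String resolve(String base, String href) {
        try {
            return new URI(base).resolve(href).toString();
        } catch (URISyntaxException | IllegalArgumentException e) {
            // URI.resolve(String) reports a malformed href as IllegalArgumentException.
            return "";
        }
    }

    public static void main(String[] args) {
        // prints https://en.wikipedia.org/wiki/Python
        System.out.println(resolve("https://en.wikipedia.org", "/wiki/Python"));
    }
}
```

With such a helper, scrapeTopic could pass resolve(currentPageUrl, href) to getUrl instead of a hard-coded prefix, where currentPageUrl is whatever page the link was found on.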