Using XPATH and HTML Cleaner to Parse HTML/XML
2015-01-08 13:58
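太阳火神的美丽人生 (http://blog.csdn.net/opengl_es). This article is published under the "Attribution-NonCommercial-ShareAlike" Creative Commons license. When reposting, please keep this sentence: 太阳火神的美丽人生 - this blog focuses on agile development and research on mobile and IoT devices: iOS, Android, Html5, Arduino, pcDuino. Otherwise, reposting or re-reposting articles from this blog is not permitted. Thank you for your cooperation.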
January 5, 2010
tags: android, examples, HTML, parse, scraping, XML, XPATH

Hey everyone,

So something that I've found to be extremely useful (especially in web-related applications) is the ability to retrieve HTML from websites and parse that HTML for data or whatever else you may be looking for (in my case it is almost always data).
I actually use this technique to do the real-time stock/option imports for my Black-Scholes/Implied Volatility applications, so if you're looking for an example of how to retrieve and parse HTML and run "queries" over it using, say, XPATH, then this post is for you.

Now, before we begin: in order to do this you will have to reference an external JAR in your project's build path. The JAR that I use comes from HtmlCleaner, which even provides its own usage example (HtmlCleaner Example), but in addition to that I'll show you an example of how I use it.
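A minimal sketch of the idea might look like the following. It is not the original listing from this post: the class name `PriceScraper`, the helper `firstText`, and the sample markup are all made up for illustration, and the page is assumed to have already been downloaded into a String. Only `HtmlCleaner.clean`, `TagNode.evaluateXPath`, and `TagNode.getText` come from the HtmlCleaner JAR, which must be on the build path.

```java
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;

// Hypothetical helper class, for illustration only.
public class PriceScraper {

    // Cleans the given HTML and returns the trimmed text of the first
    // XPATH match, or null when nothing matches.
    static String firstText(String html, String xpathExpr) throws XPatherException {
        HtmlCleaner cleaner = new HtmlCleaner();
        // clean() repairs the markup and returns the root of the node tree.
        TagNode root = cleaner.clean(html);

        // evaluateXPath returns an untyped array: elements come back as
        // TagNode objects, attribute values as plain strings.
        Object[] matches = root.evaluateXPath(xpathExpr);
        if (matches.length == 0) {
            return null;
        }
        Object first = matches[0];
        if (first instanceof TagNode) {
            return ((TagNode) first).getText().toString().trim();
        }
        return first.toString().trim();
    }

    public static void main(String[] args) throws XPatherException {
        // Made-up sample markup standing in for a downloaded quote page.
        String html = "<table><tr>"
                + "<td class=\"symbol\">ABC</td>"
                + "<td class=\"price\">12.34</td>"
                + "</tr></table>";
        System.out.println(firstText(html, "//td[@class='price']"));
    }
}
```

Returning `null` for "no match" keeps the sketch short; in a real application you would probably want to tell "page layout changed" apart from "value is empty".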
So that's it! Once you include the JAR in your build path, everything else is pretty easy. It's a great tool to use. However, it does require knowledge of XPATH, but XPATH isn't too hard to pick up and is useful to know, so if you don't know it then take a look at the link.

Now, a warning to everyone. It's documented that the set of XPATH expressions recognized by HtmlCleaner is not complete, in the sense that only "basic" XPATH is recognized. What's excluded? For instance, you can't use any of the "axes" operators (e.g. parent, ancestor, following, following-sibling, etc.), but in my experience everything else is fair game. Yes, it sucks, and many times it can make your life a little bit harder, but usually it just requires you to be a tad more clever with your XPATH expressions before you can pull the desired information.

And of course, this technique works for XML documents as well!

Hope this was helpful to everyone. Let me know if you're confused anywhere.

- jwei
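As a footnote to the warning about axes above, here is a rough sketch of what "being a tad more clever" can mean: an expression that uses the `parent` axis can often be rewritten with a predicate on the parent element instead. To keep it runnable without any extra JAR, this sketch uses the JDK's own `javax.xml.xpath` (a full XPATH 1.0 engine) rather than HtmlCleaner, purely to show that the two expressions select the same node. The class name `AxisFreeXPath` and the sample XML are made up, and whether HtmlCleaner accepts a particular rewritten expression should still be checked against its documentation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
public class AxisFreeXPath {

    // Made-up sample data: two quotes keyed by a symbol attribute.
    static final String XML =
            "<quotes>"
          + "<quote symbol=\"ABC\"><price>12.34</price></quote>"
          + "<quote symbol=\"XYZ\"><price>56.78</price></quote>"
          + "</quotes>";

    // Parses the sample XML and returns the string value of the
    // first node selected by the expression.
    static String eval(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Uses the parent axis: start at <price>, look up at its parent.
        String withAxis = eval("//price[parent::quote/@symbol='XYZ']");
        // Same node without any axis: filter the parent first, then step down.
        String withoutAxis = eval("//quote[@symbol='XYZ']/price");
        System.out.println(withAxis + " " + withoutAxis); // prints "56.78 56.78"
    }
}
```

The general trick is to move the condition onto the ancestor and then walk downward, since downward steps and attribute predicates are the "basic" part of XPATH.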