
perl HTML::TreeBuilder::XPath

2016-02-24 12:45
HTML::TreeBuilder::XPath - add XPath support to HTML::TreeBuilder

use HTML::TreeBuilder::XPath;
my $tree= HTML::TreeBuilder::XPath->new;
$tree->parse_file( "mypage.html");

my $nb=$tree->findvalue( '/html/body//p[@class="section_title"]/span[@class="nb"]');
my $id=$tree->findvalue( '/html/body//p[@class="section_title"]/@id');

my $p= $tree->findnodes( '//p[@id="toto"]')->[0];
my $link_texts= $p->findvalue( './a'); # the texts of all a elements in $p
$tree->delete; # to avoid memory leaks, if you parse many HTML documents
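The last line of the synopsis matters when many documents are parsed in one run. A minimal sketch of that pattern, assuming the pages are already available as strings (the `@pages` contents below are hypothetical): each tree is built, queried, its results copied into plain Perl strings, and then deleted before the next page is handled.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical inputs standing in for many downloaded pages.
my @pages = (
    '<html><body><p id="toto"><a>one</a><a>two</a></p></body></html>',
    '<html><body><p id="toto"><a>three</a></p></body></html>',
);

my @results;
for my $html (@pages) {
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content($html);

    # Copy what is needed into plain strings before the tree is destroyed.
    push @results, join ',', $tree->findnodes_as_strings('//p[@id="toto"]/a');

    # Release the tree explicitly, as the synopsis recommends.
    $tree->delete;
}

print "$_\n" for @results;
```

Copying values out before calling `delete` is the important part: `delete` empties the element objects, so any node kept around afterwards is no longer usable.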

Description:

This module adds typical XPath methods to HTML::TreeBuilder, to make it easy to query a document.

Methods:

Extra methods are added both to the tree object and to each element:

findnodes ($path)

Returns the list of nodes found by $path. In scalar context, returns a Tree::XPathEngine::NodeSet object.

findnodes_as_string ($path)

Returns the text values of the nodes, as one string.

findnodes_as_strings ($path)

Returns a list of the values of the result nodes.

findvalue ($path)

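Returns either a Tree::XPathEngine::Literal, a Tree::XPathEngine::Boolean or a Tree::XPathEngine::Number object.

If the path returns a node set, $nodeset->xpath_to_literal is called automatically (and thus a Tree::XPathEngine::Literal is returned).

Note that stringification is overloaded for each of these objects, so you can just print the value found, or manipulate it as you would a normal Perl value (e.g. with regular expressions).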

findvalues ($path)

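Returns the values of the matching nodes as a list. This is mostly the same as findnodes_as_strings, except that the elements of the list are objects (with overloaded stringification) instead of plain strings.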

exists ($path)

Returns true if the given path exists, i.e. matches at least one node.

matches($path)

Returns true if the element matches the path.
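The methods above can be compared side by side on a small document. This is a sketch under stated assumptions: the HTML is hypothetical, and `matches` is only given absolute paths, so its result does not depend on which node is used as the evaluation context.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical inline document, so the sketch runs without files or network access.
my $html = '<html><body>'
         . '<ul><li>one</li><li>two</li><li>three</li></ul>'
         . '<p id="toto"><a href="/a">first</a> <a href="/b">second</a></p>'
         . '</body></html>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

# findnodes in list context: the matching HTML::Element objects.
my @pairs = map { [ $_->as_text, $_->attr('href') ] } $tree->findnodes('//a');

# findnodes in scalar context: a Tree::XPathEngine::NodeSet object.
my $count = $tree->findnodes('//li')->size;

# findvalue converts the node set to one literal: the texts are concatenated.
my $literal = $tree->findvalue('//li');
my $joined  = "$literal";

# findvalues (objects) and findnodes_as_strings (plain strings): one value per node.
my @values  = map { "$_" } $tree->findvalues('//li');
my @strings = $tree->findnodes_as_strings('//li');

# exists: does the path match anything at all?
my $has_list  = $tree->exists('//ul')    ? 1 : 0;
my $has_table = $tree->exists('//table') ? 1 : 0;

# matches: is this particular element among the nodes the path selects?
my ($p) = $tree->findnodes('//p[@id="toto"]');
my $p_matches = $p->matches('//p[@id="toto"]') ? 1 : 0;
my $p_in_list = $p->matches('//ul//p')         ? 1 : 0;

$tree->delete;   # all values above are plain copies, so the tree can go now

print "links: ", join(', ', map { "$_->[0] -> $_->[1]" } @pairs), "\n";
print "li count: $count, findvalue: $joined\n";
print "findvalues: @values; findnodes_as_strings: @strings\n";
print "exists //ul: $has_list, exists //table: $has_table\n";
print "p matches its path: $p_matches, p inside ul: $p_in_list\n";
```

The `findvalue`/`findvalues` difference is the one that bites most often: the first collapses every match into a single string, the second keeps one value per node, which is what a `foreach` over the results usually expects.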

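Example: download a product-list page with LWP::UserAgent, save a copy, and extract the product names with XPath.

use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
$ua->agent("Mozilla/8.0");

my $response = $ua->get('https://licai.yingyinglicai.com/product/list.htm');
die $response->status_line unless $response->is_success;

# Save a copy of the page (overwrite, not append, so runs don't pile up).
# "open ... || die" never dies because || binds to the file name; use "or".
open(my $fh, '>:encoding(UTF-8)', 'data.html') or die "open data file failed: $!";
print $fh $response->decoded_content;
close($fh) or die "close data file failed: $!";   # flush before the file is used

# Parse the already-decoded content directly instead of re-reading the file.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($response->decoded_content);

## Look for the product entries in the body, e.g.: <td><div class="fresh"><p class="text-ellipsis-2"><i class="fresh-icon"></i><a href="/detail/11156-261-500-856-0544.htm">变现宝4275号</a></p></div></td>
# findvalue would return all names concatenated into one string;
# findvalues returns one value per matching div.
my @nb = $tree->findvalues('/html/body//div[@class="fresh"]');

binmode STDOUT, ':encoding(UTF-8)';
foreach (@nb) { print "Product is $_\n" }

$tree->delete;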
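The script above prints only the text of each `div.fresh` block. If the link target is needed as well, the `<a>` element shown in the sample comment can be queried directly. A sketch, assuming the page keeps that structure: the inline HTML is the sample from the script's comment wrapped in a table, and resolving relative links against the list-page URL with the `URI` module (installed alongside LWP) is an addition that is not in the original.

```perl
use strict;
use warnings;
use utf8;                      # the sample HTML contains non-ASCII text
use URI;
use HTML::TreeBuilder::XPath;

binmode STDOUT, ':encoding(UTF-8)';

# Sample markup from the script's comment, wrapped so the <td> has a table.
my $html = <<'HTML';
<html><body><table><tr>
<td><div class="fresh"><p class="text-ellipsis-2"><i class="fresh-icon"></i><a href="/detail/11156-261-500-856-0544.htm">变现宝4275号</a></p></div></td>
</tr></table></body></html>
HTML

my $base = 'https://licai.yingyinglicai.com/product/list.htm';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

# One record per product link inside a div.fresh block.
my @products;
for my $link ($tree->findnodes('//div[@class="fresh"]//a')) {
    push @products, {
        name => $link->as_text,
        # Turn the relative href into an absolute URL based on the list page.
        url  => URI->new_abs($link->attr('href'), $base)->as_string,
    };
}
$tree->delete;

printf "Product is %s (%s)\n", $_->{name}, $_->{url} for @products;
```

In the real script, the same loop would run on the tree built from `$response->decoded_content`, with `$base` set to the URL that was fetched.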
