Parsing HTML with XPath

2014-03-07 23:11
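Download the project from https://github.com/topfunky/hpple and add TFHpple.h, TFHpple.m, TFHppleElement.h, TFHppleElement.m, XPathQuery.h, and XPathQuery.m to your own project. Then add libxml2.x under Frameworks.

In the project's build settings, find Other Linker Flags and add -lxml2.

In the project's build settings, find Header Search Paths and add /usr/include/libxml2.

The code is as follows: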
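NSString *urlString = @"http://www.xiyou.edu.cn/new/lm.jsp?urltype=tree.TreeTempUrl&wbtreeid=724";

// Synchronous download; this blocks the calling thread until the page arrives.
NSData *htmlData = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:urlString]];

// Convert the page to UTF-8 before parsing (toUTF8: is defined below).
NSData *toHtmlData = [self toUTF8:htmlData];

TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:toHtmlData];

// Select every <a> element in the document.
NSArray *aArray = [xpathParser searchWithXPathQuery:@"//a"];

if ([aArray count] > 0) {
    // Read 15 <a> elements, indices 87 through 101; the extra bound keeps the loop inside shorter result arrays.
    for (NSUInteger i = 87; i < 102 && i < [aArray count]; i++) {
        TFHppleElement *aElement = [aArray objectAtIndex:i];

        // Walk three levels down to the text node. This assumes the nesting used on the target page.
        NSArray *aArr = [aElement children];
        TFHppleElement *aEle = [aArr objectAtIndex:0];
        NSArray *aChild = [aEle children];
        TFHppleElement *aChildEle = [aChild objectAtIndex:0];
        NSArray *aChildren = [aChildEle children];
        NSString *aStr = [[aChildren objectAtIndex:0] content];
        NSLog(@"aStr:%@", aStr);

        // Attributes of the <a> element.
        NSDictionary *aAttributeDict = [aElement attributes];
        NSLog(@"aAttributeDict:%@", aAttributeDict);

        // Build the absolute link from the href attribute.
        NSString *hrefStr = [NSString stringWithFormat:@"http://www.xiyou.edu.cn%@", [aAttributeDict objectForKey:@"href"]];
        NSLog(@"hrefStr:%@", hrefStr);

        // currentNewsArr and currentHrefArr are assumed to be NSMutableArray objects declared elsewhere.
        [currentNewsArr addObject:aStr];
        [currentHrefArr addObject:hrefStr];
    }
}

// Manual reference counting: release what was alloc'ed above, whether or not any links were found.
[htmlData release];
[xpathParser release];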
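
The loop above builds each link by prepending the host name to the href, which only gives a valid URL when the href is root-relative (starts with "/"). As a minimal sketch, assuming ARC, Foundation's NSURL can resolve absolute, root-relative, and page-relative hrefs against the page URL. The helper name AbsoluteURLString is hypothetical; it is not part of hpple or of the original code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of hpple): resolves an href taken from an
// <a> element against the URL of the page it was found on.
// Returns nil when either argument is nil or the href cannot be parsed.
static NSString *AbsoluteURLString(NSString *href, NSString *pageURLString) {
    if (href == nil || pageURLString == nil) {
        return nil;
    }
    NSURL *pageURL = [NSURL URLWithString:pageURLString];
    // An href that is already absolute ignores the base URL;
    // a relative one is resolved against it.
    NSURL *resolved = [NSURL URLWithString:href relativeToURL:pageURL];
    return [[resolved absoluteURL] absoluteString];
}
```

Inside the loop, this could stand in for the stringWithFormat: call, for example AbsoluteURLString([aAttributeDict objectForKey:@"href"], urlString). A nil result should be skipped before calling addObject:, because NSMutableArray raises on nil.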

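// If the page is not UTF-8 encoded (GBK, for example), convert it to UTF-8 before parsing.

- (NSData *)toUTF8:(NSData *)sourceData {
    // Decode the raw bytes as GB 18030, a superset of GBK.
    CFStringRef gbkStr = CFStringCreateWithBytes(NULL, [sourceData bytes], [sourceData length], kCFStringEncodingGB_18030_2000, false);

    if (gbkStr == NULL) {
        return nil;
    } else {
        // The Create call returns an owned string; hand it to the autorelease pool (manual reference counting).
        NSString *gbkString = [(NSString *)gbkStr autorelease];

        // Adjust this to the encoding declared in the page source; here the declaration is changed from GBK to UTF-8.
        NSString *utf8_String = [gbkString stringByReplacingOccurrencesOfString:@"META http-equiv=\"Content-Type\" content=\"text/html; charset=GBK\""
                                                                     withString:@"META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\""];

        return [utf8_String dataUsingEncoding:NSUTF8StringEncoding];
    }
}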
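
The same conversion can also be sketched with Foundation alone. The version below assumes ARC and a page that declares its encoding as charset=GBK in any letter case; the function name UTF8DataFromGB18030Data is hypothetical. It decodes the bytes as GB 18030, rewrites the declared charset with a case-insensitive match (the method above only matches one exact spelling of the META tag), and re-encodes the result as UTF-8.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: converts GBK / GB 18030 encoded HTML into UTF-8 data
// and updates the charset declaration to match.
// Returns nil when the input is nil or cannot be decoded.
static NSData *UTF8DataFromGB18030Data(NSData *sourceData) {
    if (sourceData == nil) {
        return nil;
    }
    // Map the Core Foundation encoding constant to an NSStringEncoding.
    NSStringEncoding gbEncoding =
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);
    NSString *decoded = [[NSString alloc] initWithData:sourceData
                                              encoding:gbEncoding];
    if (decoded == nil) {
        return nil;
    }
    // Assumption: the page declares "charset=GBK" without quotes around GBK.
    NSString *rewritten =
        [decoded stringByReplacingOccurrencesOfString:@"charset=GBK"
                                           withString:@"charset=UTF-8"
                                              options:NSCaseInsensitiveSearch
                                                range:NSMakeRange(0, [decoded length])];
    return [rewritten dataUsingEncoding:NSUTF8StringEncoding];
}
```

Pages that declare a different label, such as gb2312, would need their own search string; the decoding step stays the same because GB 18030 covers both.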