
iOS File Parsing: Basic Usage of the Third-Party Framework Hpple

2014-11-10 21:23

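I. Introduction to Hpple
Hpple is a lightweight wrapper framework. Used together with libxml2, it makes parsing HTML or XML quick and convenient, and combined with NSURLConnection it can parse while downloading. It uses XPath to locate and parse nodes in HTML or XML.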

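II. Background

Most companies now serve data as JSON, but some older websites provide neither an API nor JSON files, so the HTML itself has to be parsed. This article briefly introduces the framework and then walks through examples showing how to use the Hpple library.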

III. Importing Hpple

To use Hpple in your project, follow these steps:

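1. Create a Single View Application project (with Storyboard and ARC).
2. Add the libxml2 library to the project.
(1) Select the project root node in the left-hand navigator, then select the node under "TARGETS" in the area that appears next to it.
(2) Select the "Build Phases" tab, expand "Link Binary With Libraries", and click the "+" button.
(3) Search for "libxml2", select "libxml2.dylib", and click the "Add" button. libxml2.dylib is now added to the project; to keep things organized, drag and drop it into the "Frameworks" folder.
(4) Repeat sub-step (1), then select the "Build Settings" tab, search for "Header Search Paths", and expand it. For both the "Debug" and "Release" nodes, click the "+" button to add an item whose value is "${SDK_DIR}"/usr/include/libxml2 (note: the value includes the double quotes).
(5) Quick test that libxml2 was added successfully: add the following line to your view controller's .m file and check that the project compiles. If it does, the libxml2 library is ready to use.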

#import <libxml/HTMLparser.h>

3. Add the hpple source code to the project.
(1) Download hpple from https://github.com/topfunky/hpple
(2) In your project, create a group (that is, a folder) named "hpple" (purely to keep things organized), then drag the following 6 files into it. Check the option "Copy items into destination group's folder", choose the option "Create groups for any added folders", check "Add to Targets", and click the Finish button.

TFHpple.h
TFHpple.m
TFHppleElement.h
TFHppleElement.m
XPathQuery.h
XPathQuery.m
(3) If you build the project now, you will get many ARC compile errors. Select the project root node in the left-hand navigator, select the node under "TARGETS", select the "Build Phases" tab, expand "Compile Sources", double-click each of the 3 hpple .m files, and add "-fno-objc-arc". Build the project again and it compiles successfully.
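The import test in step 2 only proves that the header search path is right. As a slightly stronger check that the library also links and runs, a minimal sketch like the following could be called from the view controller. RootTagNameOfHTML is a hypothetical helper written for this illustration; it belongs to neither libxml2 nor Hpple.

```objc
#import <Foundation/Foundation.h>
#import <libxml/HTMLparser.h>
#import <string.h>

// Hypothetical helper (not part of libxml2 or Hpple): parses an HTML string
// with libxml2 and returns the tag name of the root element, or nil on failure.
static NSString *RootTagNameOfHTML(const char *html) {
    // Parse the in-memory buffer as UTF-8 HTML, suppressing parser chatter.
    htmlDocPtr doc = htmlReadMemory(html, (int)strlen(html), NULL, "UTF-8",
                                    HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING);
    if (doc == NULL) {
        return nil;
    }
    xmlNodePtr root = xmlDocGetRootElement(doc);
    NSString *name = nil;
    if (root != NULL && root->name != NULL) {
        name = [NSString stringWithUTF8String:(const char *)root->name];
    }
    // libxml2 documents are plain C structures and must be freed manually.
    xmlFreeDoc(doc);
    return name;
}
```

Calling RootTagNameOfHTML("<html><body><p>hello</p></body></html>") and logging the result should print html if both the header path and the linked library are set up correctly.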

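IV. Ways to Use Hpple

1. Parsing a web page with Hpple

(1) Build the URL
(2) Load the URL's contents as NSData
(3) Create the TFHpple parser object
(4) Use XPath to locate the target nodes and fetch the data
(5) Use the data
Code walkthrough: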

//----------------------------- Simplest usage -----------------------------
- (void)testHtml
{
    // 1. Build the URL
    NSURL *url = [NSURL URLWithString:@"http://www.weiphone.com/apple/news/index_1.shtml"];
    // 2. Load the page contents as NSData (synchronous: blocks the calling thread)
    NSData *urlData = [[NSData alloc] initWithContentsOfURL:url];

    // 3. Create the parser object
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:urlData];
    // 4. Use XPath to locate the target nodes and fetch the data
    NSArray *elements = [xpathParser searchWithXPathQuery:@"//div[@id='news']//div//div[2]//h3//a[1]"];
    // 5. Use the data (firstObject is nil, not a crash, when nothing matched)
    NSLog(@"elements=%@", elements.firstObject);
}

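2. Parsing a UTF-8 web page with Hpple [see Figure 4-2]

(1) Build the URL
(2) Load the URL's contents as a UTF-8 string
(3) Convert the string to UTF-8 encoded NSData
(4) Create the TFHpple parser object
(5) Use XPath to locate the target nodes and fetch the data
(6) Use the data

Code walkthrough: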

//----------------------------- Page encoded as UTF-8 -----------------------------
- (void)testUTF8
{
    // 1. Build the URL and load the page as a UTF-8 string
    NSURL *url = [NSURL URLWithString:@"http://www.baidu.com/"];
    NSString *htmlString = [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:nil];
    // 2. Convert to NSData
    NSData *htmlData = [htmlString dataUsingEncoding:NSUTF8StringEncoding];
    // 3. Create the TFHpple object
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlData];
    // 4. Use XPath to locate the target nodes. The tricky part: the path must
    //    point at the level closest to the data.
    //    Watch out for escaping: a path such as //*[@id="ctl00_Head1"]/title
    //    needs its inner double quotes escaped inside an Objective-C string literal.
    NSArray *elements = [xpathParser searchWithXPathQuery:@"/html/head/title"];

    // 5. Use the data
    NSLog(@"elements=%@", elements);
    for (TFHppleElement *element in elements)
    {
        NSLog(@"result = %@", element.content);
    }
    // firstObject is nil when nothing matched, so this cannot crash
    TFHppleElement *element = elements.firstObject;
    NSString *title = element.content;
    NSLog(@"result = %@", title);
}

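3. Parsing a GB2312 web page with Hpple [see Figure 4-3]

(1) Build the URL
(2) Load the URL's contents as NSData
(3) Transcode the data
3-1 Manual transcoding
3.1.1 Pick the source data encoding
3.1.2 Convert the data to a string
3.1.3 Change the charset declared in the string from GB2312 to UTF-8
3.1.4 Convert the string to NSData
3-2 Automatic transcoding
3.2.1 Call the conversion method, passing in the NSData
(4) Create the TFHpple parser object
(5) Use XPath to locate the target nodes and fetch the data
(6) Use the data

Code walkthrough: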

//----------------------------- Page encoded as GB2312 -----------------------------
- (void)changeGB2312
{
    // 1. Build the URL
    NSURL *url = [NSURL URLWithString:@"http://202.196.166.138/default2.aspx"];
    // 2. Load the page contents as NSData
    NSData *urlData = [[NSData alloc] initWithContentsOfURL:url];
    // 3. ---------------------- Transcode the data ----------------
    // Option 1: manual transcoding
    // Decode the bytes as GB 18030 (a superset of GB2312)
    NSStringEncoding enc = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);
    NSString *transStr = [[NSString alloc] initWithData:urlData encoding:enc];
    // Change the declared charset from gb2312 to utf-8
    NSString *utf8HtmlStr = [transStr stringByReplacingOccurrencesOfString:@"<meta http-equiv=\"Content-Type\" content=\"text/html;charset=gb2312\">" withString:@"<meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\">"];
    NSData *htmlDataUTF8 = [utf8HtmlStr dataUsingEncoding:NSUTF8StringEncoding];
    // Option 2: call the conversion method
    NSData *htmlDataUTF82 = [self toUTF8:urlData];
    NSLog(@"%@", htmlDataUTF82);
    // 4. Create the parser object
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlDataUTF8];
    // 5. Fetch the data
    NSArray *elements = [xpathParser searchWithXPathQuery:@"//input[@name='__VIEWSTATE']"];
    // 6. Use the data
    NSLog(@"elements=%@", elements.firstObject);
    // Walk every matched element and read its "value" attribute. This relies on
    // only one of the matched elements actually carrying a value attribute.
    for (TFHppleElement *element in elements)
    {
        NSLog(@"value:%@", [element objectForKey:@"value"]);
    }
}
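The manual path in changeGB2312 assumes in advance that the page is GB-encoded. When that is not known, one possible heuristic is to try UTF-8 first and fall back to GB 18030 only if the bytes are not valid UTF-8. The sketch below uses only Foundation; DecodeHTMLData is a hypothetical helper name, and the try-then-fall-back order is an assumption of this example, not something Hpple does.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes downloaded HTML bytes into an NSString.
// Tries UTF-8 first, then falls back to GB 18030 (a superset of GB2312).
// Returns nil if neither encoding can decode the data.
static NSString *DecodeHTMLData(NSData *data) {
    if (data == nil) {
        return nil;
    }
    // initWithData:encoding: returns nil when the bytes are invalid for the encoding.
    NSString *utf8 = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
    if (utf8 != nil) {
        return utf8;
    }
    NSStringEncoding gb =
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);
    return [[NSString alloc] initWithData:data encoding:gb];
}
```

This is only a heuristic: a short GB-encoded byte sequence can occasionally also be valid UTF-8, so a page whose encoding is known is better decoded with that encoding directly.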

The encoding-conversion method

//----------------------------- Convert the data encoding -----------------------------
- (NSData *)toUTF8:(NSData *)sourceData
{
    // Decode the bytes as GB 18030 (a superset of GB2312)
    CFStringRef gbkStr = CFStringCreateWithBytes(NULL, sourceData.bytes, sourceData.length, kCFStringEncodingGB_18030_2000, false);
    if (gbkStr == NULL)
    {
        return nil;
    }
    else
    {
        // __bridge_transfer hands ownership to ARC; a plain __bridge cast would
        // leak the string returned by the Create function.
        NSString *gbkString = (__bridge_transfer NSString *)gbkStr;
        // Change the declared charset from gb2312 to utf-8
        NSString *utf8HtmlStr = [gbkString stringByReplacingOccurrencesOfString:@"<meta content=\"text/html;charset=gb2312\" http-equiv=\"Content-Type\">" withString:@"<meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\">"];

        return [utf8HtmlStr dataUsingEncoding:NSUTF8StringEncoding];
    }
}
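Both conversion paths replace one exact <meta> string, so they silently do nothing when a page orders its attributes differently, uses different casing, or adds a space. A more tolerant variant could rewrite only the charset value with a regular expression. This is a sketch under assumptions: RewriteCharsetToUTF8 is a hypothetical helper, and the list of GB charset names it looks for (gb2312, gbk, gb18030) is chosen for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: rewrites a charset=gb2312 / gbk / gb18030 declaration in
// an HTML string to charset=utf-8, regardless of case or attribute order.
static NSString *RewriteCharsetToUTF8(NSString *html) {
    if (html == nil) {
        return nil;
    }
    NSError *error = nil;
    // Group 1 keeps "charset=" plus an optional opening quote; the charset
    // name that follows it is what gets replaced.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(charset\\s*=\\s*[\"']?)(gb2312|gbk|gb18030)"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return html; // pattern failed to compile; leave the input untouched
    }
    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:@"$1utf-8"];
}
```

The result can then be passed through dataUsingEncoding:NSUTF8StringEncoding and handed to TFHpple exactly as in toUTF8: above.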

Figure 4-2: A web page encoded in UTF-8
Figure 4-3: A web page encoded in GB2312

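V. How to Obtain an XPath

You can write the path yourself following XPath syntax, or use the browser to get the XPath of the data you need. With Chrome: open the web page, press F12, select a tag in the panel that opens, right-click it, and choose "Copy XPath".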
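To try a hand-written or copied XPath without touching the network, it can be run against a small inline HTML string first. The sketch below assumes the hpple sources from section III are in the project; HrefsMatchingXPath is a hypothetical helper written for this illustration, built only on initWithHTMLData:, searchWithXPathQuery: and objectForKey: from Hpple.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"
#import "TFHppleElement.h"

// Hypothetical helper: runs `xpath` against an HTML string and returns the
// href attribute of every matching element that has one.
static NSArray *HrefsMatchingXPath(NSString *html, NSString *xpath) {
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:data];
    NSMutableArray *hrefs = [NSMutableArray array];
    for (TFHppleElement *element in [parser searchWithXPathQuery:xpath]) {
        // objectForKey: looks up an attribute of the element by name.
        NSString *href = [element objectForKey:@"href"];
        if (href != nil) {
            [hrefs addObject:href];
        }
    }
    return hrefs;
}

// Sample markup for experimenting (made up for this example):
//   <div id='news'><h3><a href='/a'>A</a></h3><h3><a href='/b'>B</a></h3></div>
//   <a href='/c'>C</a>
// Queries worth comparing:
//   //a                           every link in the document
//   //div[@id='news']//a          only links inside the div with id "news"
//   //div[@id='news']/h3[1]/a     the link in the first h3 child of that div
```

Comparing the three queries on the sample markup shows how "//" searches at any depth, how [@id='...'] filters by attribute, and how [1] selects by position (XPath positions start at 1).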

The above is what I have learned about Hpple from reading up on it and testing it myself. If anything is incorrect, please point it out.
References

/article/4680311.html

http://nzx-214.lofter.com/post/1cc33600_3511767h

http://www.w3school.com.cn/xpath/xpath_nodes.asp [XPath syntax tutorial]

http://blog.sina.com.cn/s/blog_6dce99b10101layi.html

https://github.com/topfunky/hpple [Hpple download]

/article/2905597.htmlh

http://www.2cto.com/kf/201312/267860.html [How to get the XPath of page elements in various browsers]

Note: all of the code in this introduction has been collected and organized, so you can use it directly.
