iOS Development: Parsing HTML
2015-08-07 17:43
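Parsing HTML with XPath
Download the project from https://github.com/topfunky/hpple and add TFHpple.h, TFHpple.m, TFHppleElement.h, TFHppleElement.m, XPathQuery.h, and XPathQuery.m to your own project, then add libxml2.x under Frameworks.
In the project's build settings, find Other Linker Flags and add -lxml2.
In the project's build settings, find Header Search Paths and add /usr/include/libxml2.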
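The two build settings above can also be kept in a configuration file instead of being typed into the Xcode UI. The fragment below is only a sketch: the file name is hypothetical, and the `$(SDKROOT)` prefix is an assumption that the libxml2 headers are taken from the active SDK rather than from the host's /usr/include.

```xcconfig
// hpple.xcconfig (hypothetical file name)
// Link against libxml2, which hpple's XPathQuery.m calls into.
OTHER_LDFLAGS = $(inherited) -lxml2
// Let the compiler find <libxml/...> headers; SDK-relative path is an assumption.
HEADER_SEARCH_PATHS = $(inherited) $(SDKROOT)/usr/include/libxml2
```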
The code is as follows:
NSString *urlString = @"http://www.xiyou.edu.cn/new/lm.jsp?urltype=tree.TreeTempUrl&wbtreeid=724";
// Synchronous download; call this off the main thread.
NSData *htmlData = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:urlString]];
// Convert the page to UTF-8 before parsing (see toUTF8: below).
NSData *toHtmlData = [self toUTF8:htmlData];
TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:toHtmlData];
// Select every <a> element in the document.
NSArray *aArray = [xpathParser searchWithXPathQuery:@"//a"];
// Read 15 links, at indexes 87 through 101, stopping early if the page has fewer.
NSUInteger end = MIN((NSUInteger)102, [aArray count]);
for (NSUInteger i = 87; i < end; i++) {
    TFHppleElement *aElement = [aArray objectAtIndex:i];
    // The link text is the content of the node three levels below <a>.
    NSString *aStr = [[[[aElement firstChild] firstChild] firstChild] content];
    NSLog(@"aStr:%@", aStr);
    // Read the attributes of the <a> element.
    NSDictionary *aAttributeDict = [aElement attributes];
    NSLog(@"aAttributeDict:%@", aAttributeDict);
    NSString *href = [aAttributeDict objectForKey:@"href"];
    if (aStr == nil || href == nil) {
        continue; // skip links without text or href instead of crashing in addObject:
    }
    NSString *hrefStr = [NSString stringWithFormat:@"http://www.xiyou.edu.cn%@", href];
    NSLog(@"hrefStr:%@", hrefStr);
    // currentNewsArr and currentHrefArr are NSMutableArray instances declared elsewhere.
    [currentNewsArr addObject:aStr];
    [currentHrefArr addObject:hrefStr];
}
// Manual reference counting: release what was alloc'ed, whether or not anything matched.
[htmlData release];
[xpathParser release];
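The loop above builds each link by prefixing the host name to the href value, which only works when the href starts with a slash. As a rough sketch of a more general approach, Foundation's NSURL can resolve an href against the page URL. The helper name AbsoluteURLString is hypothetical and not part of hpple; the block assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Resolves an href found in a page against that page's own URL.
// Returns nil when either input is missing or is not a valid URL string.
// AbsoluteURLString is an illustrative helper, not an hpple API.
static NSString *AbsoluteURLString(NSString *href, NSString *pageURLString) {
    if (href == nil || pageURLString == nil) {
        return nil;
    }
    NSURL *pageURL = [NSURL URLWithString:pageURLString];
    if (pageURL == nil) {
        return nil;
    }
    // Relative hrefs are resolved against pageURL; absolute hrefs pass through unchanged.
    NSURL *resolved = [NSURL URLWithString:href relativeToURL:pageURL];
    return [[resolved absoluteURL] absoluteString];
}
```

Inside the loop this could replace the stringWithFormat: call, for example `AbsoluteURLString(href, urlString)`, with a nil check before the result is added to the array.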
// If the page is not UTF-8 encoded (GBK, for example), convert it to UTF-8 before parsing.
- (NSData *)toUTF8:(NSData *)sourceData {
    if (sourceData == nil) {
        return nil;
    }
    CFStringRef gbkStr = CFStringCreateWithBytes(NULL, [sourceData bytes], [sourceData length], kCFStringEncodingGB_18030_2000, false);
    if (gbkStr == NULL) {
        return nil;
    }
    // Hand ownership to the autorelease pool so the created string is not leaked (manual reference counting).
    NSString *gbkString = [(NSString *)gbkStr autorelease];
    // Adjust this to match the encoding declared in the page source; here the declaration changes from GBK to UTF-8.
    NSString *utf8_String = [gbkString stringByReplacingOccurrencesOfString:@"META http-equiv=\"Content-Type\" content=\"text/html; charset=GBK\""
        withString:@"META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\""];
    return [utf8_String dataUsingEncoding:NSUTF8StringEncoding];
}
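The conversion above matches one exact META string, so a page that spells the declaration differently keeps its GBK charset label. The following is a minimal sketch of the same idea using only Foundation, with a case-insensitive pattern for the charset value. The function name UTF8DataFromGBKData and the pattern are assumptions for illustration, and the block assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Decodes GB 18030 bytes (a superset of GBK) and re-encodes them as UTF-8,
// rewriting a declared gbk/gb2312/gb18030 charset to UTF-8 along the way.
// UTF8DataFromGBKData is an illustrative helper, not part of hpple.
static NSData *UTF8DataFromGBKData(NSData *sourceData) {
    if (sourceData == nil) {
        return nil;
    }
    NSStringEncoding gbk =
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);
    NSString *html = [[NSString alloc] initWithData:sourceData encoding:gbk];
    if (html == nil) {
        return nil; // the bytes could not be decoded as GB 18030
    }
    // $1 keeps "charset=" and an optional opening quote; only the value is replaced.
    NSString *fixed =
        [html stringByReplacingOccurrencesOfString:@"(charset=[\"']?)(gbk|gb2312|gb18030)"
                                        withString:@"$1UTF-8"
                                           options:NSRegularExpressionSearch | NSCaseInsensitiveSearch
                                             range:NSMakeRange(0, [html length])];
    return [fixed dataUsingEncoding:NSUTF8StringEncoding];
}
```

The returned data can be passed to initWithHTMLData: in the same way as the result of toUTF8:.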
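hpple makes it fairly convenient to parse HTML with XPath.
How-to: http://lwxshow.com/ios-iphone-development-teaches-you-how-to-use-the-objective-c-parsing-html-lwxshow-com
(Related:
http://stackoverflow.com/questions/405749/parsing-html-on-the-iphone
http://stackoverflow.com/questions/9746745/xpath-attributes-selection
)
That article explains it in detail: reference the hpple library from https://github.com/topfunky/hpple, combine it with libxml, and you can search HTML with XPath.
For XPath itself, see the w3school tutorial: http://www.w3school.com.cn/xpath/index.asp
Once everything is configured, the library can be used directly. The source of TFHppleElement.m is listed here for reference: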
//
// TFHppleElement.m
// Hpple
//
// Created by Geoffrey Grosenbach on 1/31/09.
//
// Copyright (c) 2009 Topfunky Corporation, http://topfunky.com
//
// MIT LICENSE
//
// Permission is hereby granted, free of charge, to any person obtaining
// a copy of this software and associated documentation files (the
// "Software"), to deal in the Software without restriction, including
// without limitation the rights to use, copy, modify, merge, publish,
// distribute, sublicense, and/or sell copies of the Software, and to
// permit persons to whom the Software is furnished to do so, subject to
// the following conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
// LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
// OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
// WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#import "TFHppleElement.h"

static NSString * const TFHppleNodeContentKey = @"nodeContent";
static NSString * const TFHppleNodeNameKey = @"nodeName";
static NSString * const TFHppleNodeChildrenKey = @"nodeChildArray";
static NSString * const TFHppleNodeAttributeArrayKey = @"nodeAttributeArray";
static NSString * const TFHppleNodeAttributeNameKey = @"attributeName";

@interface TFHppleElement ()
@property (nonatomic, retain, readwrite) TFHppleElement *parent;
@end

@implementation TFHppleElement
@synthesize parent;

- (void) dealloc
{
    [node release];
    [parent release];
    [super dealloc];
}

- (id) initWithNode:(NSDictionary *) theNode
{
    if (!(self = [super init]))
        return nil;
    [theNode retain];
    node = theNode;
    return self;
}

+ (TFHppleElement *) hppleElementWithNode:(NSDictionary *) theNode {
    return [[[[self class] alloc] initWithNode:theNode] autorelease];
}

#pragma mark -

- (NSString *) content
{
    return [node objectForKey:TFHppleNodeContentKey];
}

- (NSString *) tagName
{
    return [node objectForKey:TFHppleNodeNameKey];
}

- (NSArray *) children
{
    NSMutableArray *children = [NSMutableArray array];
    for (NSDictionary *child in [node objectForKey:TFHppleNodeChildrenKey]) {
        TFHppleElement *element = [TFHppleElement hppleElementWithNode:child];
        element.parent = self;
        [children addObject:element];
    }
    return children;
}

- (TFHppleElement *) firstChild
{
    NSArray * children = self.children;
    if (children.count)
        return [children objectAtIndex:0];
    return nil;
}

- (NSDictionary *) attributes
{
    NSMutableDictionary * translatedAttributes = [NSMutableDictionary dictionary];
    for (NSDictionary * attributeDict in [node objectForKey:TFHppleNodeAttributeArrayKey]) {
        [translatedAttributes setObject:[attributeDict objectForKey:TFHppleNodeContentKey]
                                 forKey:[attributeDict objectForKey:TFHppleNodeAttributeNameKey]];
    }
    return translatedAttributes;
}

- (NSString *) objectForKey:(NSString *) theKey
{
    return [[self attributes] objectForKey:theKey];
}

- (id) description
{
    return [node description];
}

@end
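The listing shows that a TFHppleElement is a thin wrapper around an NSDictionary with the keys nodeName, nodeContent, nodeChildArray, and nodeAttributeArray. To make that shape concrete, here is a small sketch that builds such a dictionary by hand and flattens its attributes with the same loop as the attributes method. The sample values and both function names are made up for illustration; in the library the dictionaries are presumably produced by XPathQuery.m. The block assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Flattens a node's attribute array into a name -> value dictionary,
// mirroring the loop in -[TFHppleElement attributes]. Unlike the original,
// this sketch skips entries that lack a name or a value instead of raising.
static NSDictionary *AttributesFromNode(NSDictionary *node) {
    NSMutableDictionary *result = [NSMutableDictionary dictionary];
    for (NSDictionary *attribute in [node objectForKey:@"nodeAttributeArray"]) {
        NSString *name = [attribute objectForKey:@"attributeName"];
        NSString *value = [attribute objectForKey:@"nodeContent"];
        if (name != nil && value != nil) {
            [result setObject:value forKey:name];
        }
    }
    return result;
}

// Hand-built sample in the shape TFHppleElement reads; all values are illustrative.
static NSDictionary *SampleLinkNode(void) {
    return @{
        @"nodeName": @"a",
        @"nodeAttributeArray": @[
            @{ @"attributeName": @"href", @"nodeContent": @"/news/1.htm" }
        ],
        @"nodeChildArray": @[
            @{ @"nodeName": @"text", @"nodeContent": @"Headline" }
        ]
    };
}
```

Calling `AttributesFromNode(SampleLinkNode())` yields a dictionary whose only entry maps href to /news/1.htm, which is the value objectForKey:@"href" would return on the corresponding element.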
iPhone: A First Try of hpple, a Third-Party Library for Parsing HTML
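Once everything is configured, the library can be used directly:
- (void)viewDidLoad {
    [super viewDidLoad];
    NSError *error = nil;
    NSURL *url = [NSURL URLWithString:@"http://dict.youdao.com/m/search?keyfrom=dict.mindex&vendor=&q=apple"];
    // Load the raw bytes; decoding the page as ASCII first can lose non-ASCII text.
    // This is a synchronous download, so move it off the main thread in a real app.
    NSData *htmlData = [NSData dataWithContentsOfURL:url options:0 error:&error];
    if (htmlData == nil) {
        NSLog(@"download failed: %@", error);
        return;
    }
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlData];
    NSArray *elements = [xpathParser searchWithXPathQuery:@"//title"]; // get the title
    NSLog(@"%lu", (unsigned long)[elements count]);
    if ([elements count] > 0) {
        TFHppleElement *element = [elements objectAtIndex:0];
        NSString *content = [element content];
        NSString *tagname = [element tagName];
        NSString *attr = [element objectForKey:@"href"]; // nil unless the element has an href attribute
        NSLog(@"content = %@", content);
        NSLog(@"tagname = %@", tagname);
        NSLog(@"attr is = %@", attr);
    }
    [xpathParser release]; // manual reference counting
}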
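The example above queries //title and then asks for an href attribute the title element does not have. As a sketch of attribute-based selection, the XPath predicate [@href] can restrict the result to elements that carry the attribute. The function name HrefValuesInHTML is hypothetical; the block assumes ARC and that the hpple sources are already part of the project, and it uses only the calls shown in this post (initWithHTMLData:, searchWithXPathQuery:, objectForKey:).

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"          // from the hpple project
#import "TFHppleElement.h"   // from the hpple project

// Collects the href value of every <a> element that has one.
// HrefValuesInHTML is an illustrative helper, not an hpple API.
static NSArray *HrefValuesInHTML(NSData *htmlData) {
    NSMutableArray *hrefs = [NSMutableArray array];
    if (htmlData == nil) {
        return hrefs;
    }
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData];
    // The predicate [@href] keeps only anchors that carry an href attribute.
    for (TFHppleElement *element in [parser searchWithXPathQuery:@"//a[@href]"]) {
        NSString *href = [element objectForKey:@"href"];
        if (href != nil) {
            [hrefs addObject:href];
        }
    }
    return hrefs;
}
```

Narrowing the query this way avoids relying on fixed positions in the full list of <a> elements, which shift whenever the page layout changes.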