
Fetching a web page in PHP and converting link and image paths to absolute URLs

2013-05-28 09:08
Fetching the page content:

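$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$text = curl_exec($ch);                         // false on failure
curl_close($ch);                                // has no effect since PHP 8.0
if ($text !== false) {
    $text = relative_to_absolute($text, $url);
}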
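
The snippet above uses cURL's defaults, which means no timeout and no redirect handling. A minimal sketch of a wrapper is shown here; the function name `fetch_page`, the 10-second default and the 5-redirect limit are assumptions made for illustration and are not part of the original post.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body,
// or null if the transfer fails or the server answers with status >= 400.
function fetch_page(string $url, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER  => true,      // return the body as a string
        CURLOPT_FOLLOWLOCATION  => true,      // follow HTTP redirects
        CURLOPT_MAXREDIRS       => 5,         // assumed limit
        CURLOPT_CONNECTTIMEOUT  => $timeout,  // seconds to wait for the connection
        CURLOPT_TIMEOUT         => $timeout,  // seconds for the whole transfer
        // Only allow http and https, both initially and after redirects.
        CURLOPT_PROTOCOLS       => CURLPROTO_HTTP | CURLPROTO_HTTPS,
        CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

A caller would then check for null before post-processing, for example `$text = fetch_page($url); if ($text !== null) { $text = relative_to_absolute($text, $url); }`.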


Converting relative paths to absolute paths:

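function relative_to_absolute($content, $feed_url) {
    // Scheme of the page URL, e.g. "http://".
    preg_match('/^(https?|ftp):\/\//i', $feed_url, $protocol);
    // Host part: drop the scheme, then everything from the first "/", "?" or "#" on.
    $server_url = preg_replace('/^(https?|ftp):\/\//i', '', $feed_url);
    $server_url = preg_replace('/[\/?#].*/', '', $server_url);

    if ($server_url == '' || !isset($protocol[0])) {
        return $content;
    }

    // Prefix root-relative paths (href="/..." and src="/...") with scheme and host.
    // The (?!\/) lookahead leaves protocol-relative URLs such as src="//cdn.example.com/a.js" alone.
    // Paths without a leading slash ("images/a.png") and single-quoted attributes are not handled.
    $base = $protocol[0] . $server_url;
    $new_content = preg_replace('/\b(href|src)="\/(?!\/)/i', '$1="' . $base . '/', $content);
    return $new_content;
}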
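
relative_to_absolute only rewrites paths that start with a slash. Document-relative paths such as `images/a.png` or `../css/site.css` have to be resolved against the directory of the page URL. The following is a rough sketch of that resolution for a single URL, loosely following the reference-resolution steps of RFC 3986; the function name `resolve_url` is hypothetical, and user info in the base URL is ignored for brevity.

```php
<?php
// Hypothetical helper: resolve $href against the absolute URL $base_url.
function resolve_url(string $base_url, string $href): string
{
    // Empty, or already absolute (has a scheme such as http: or mailto:).
    if ($href === '' || preg_match('/^[a-z][a-z0-9+.\-]*:/i', $href)) {
        return $href;
    }
    $base = parse_url($base_url);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return $href; // the base is not an absolute URL, so nothing can be resolved
    }
    if (strpos($href, '//') === 0) {
        return $base['scheme'] . ':' . $href; // protocol-relative URL
    }
    $root = $base['scheme'] . '://' . $base['host']
          . (isset($base['port']) ? ':' . $base['port'] : '');
    $base_path = $base['path'] ?? '/';

    // Split off the query/fragment so that dots inside them are left untouched.
    $cut  = strcspn($href, '?#');
    $path = substr($href, 0, $cut);
    $tail = substr($href, $cut);

    if ($path === '') {
        // "?page=2" or "#top": keep the base path (and the base query for a bare fragment).
        $path = $base_path;
        if ($tail[0] === '#' && isset($base['query'])) {
            $tail = '?' . $base['query'] . $tail;
        }
    } elseif ($path[0] !== '/') {
        // Document-relative: append to the directory of the base path.
        $path = substr($base_path, 0, strrpos($base_path, '/') + 1) . $path;
    }

    // Remove "." and ".." segments without climbing above the root.
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = ''; // keep the trailing slash, e.g. "/a/b/.." becomes "/a/"
    }
    return $root . implode('/', $out) . $tail;
}
```

For example, with the base `http://example.com/blog/post.html`, the path `../css/site.css` resolves to `http://example.com/css/site.css`.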


Getting all hyperlinks:

function get_links($content) {
    // Capture the href value of every <a ... href="..."> opening tag.
    $pattern = '/<a\s[^>]*?href="([^"]*)"/i';
    preg_match_all($pattern, $content, $m);
    $output = array(); // always return an array, even when nothing matches
    foreach (array_unique($m[1]) as $value) {
        // Keep only absolute URLs that start with a known scheme.
        if (preg_match('/^(https?|ftp|telnet|news):/i', $value)) {
            $output[] = $value;
        }
    }
    return $output;
}
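
get_links returns only the href values and expects double-quoted attributes. When the anchor text is needed as well, a variant along the following lines may help. It is a sketch only: the name `get_links_with_text` and the returned array shape are assumptions, and a regular expression will still miss unusual markup that a real HTML parser would handle.

```php
<?php
// Hypothetical helper: return a list of ['href' => ..., 'text' => ...] pairs,
// one per distinct absolute http/https/ftp link found in $content.
function get_links_with_text(string $content): array
{
    // Group 1: the quote character, group 2: the href value, group 3: the anchor body.
    $pattern = '/<a\b[^>]*?\shref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';
    preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        // Attribute values are HTML-escaped (&amp; etc.), so decode them first.
        $href = html_entity_decode(trim($m[2]), ENT_QUOTES | ENT_HTML5, 'UTF-8');
        if (!preg_match('/^(https?|ftp):\/\//i', $href)) {
            continue; // skip relative links, mailto:, javascript:, ...
        }
        if (!isset($links[$href])) { // keep the first occurrence of each URL
            $text = trim(preg_replace('/\s+/', ' ', strip_tags($m[3])));
            $links[$href] = ['href' => $href, 'text' => $text];
        }
    }
    return array_values($links);
}
```

Running relative_to_absolute on the page first means that root-relative links have already been turned into absolute ones by the time this filter sees them.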


Getting all image links:

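function get_pic($str)
{
    // Finds absolute URLs anywhere in the text that end in one of the listed extensions.
    // Despite the name, .swf, .rar and .zip files are matched as well.
    $imgs = array();
    preg_match_all('/((http|https|ftp|telnet|news):\/\/[a-z0-9\/\-_+=.~!%@?#%&;:$\\()|]+\.(jpg|gif|png|bmp|swf|rar|zip))/isU', $str, $imgs);
    return array_unique($imgs[0]);
}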
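
get_pic scans the raw text for absolute URLs with a known extension, so it skips images referenced by relative paths and can pick up URLs that are not images on the page at all. An alternative is to read the src attribute of each `<img>` tag. The sketch below does this with a regular expression; the name `get_img_srcs` is hypothetical, and lazy-loading attributes such as data-src are deliberately ignored.

```php
<?php
// Hypothetical helper: return the distinct, non-empty src values of all <img> tags.
function get_img_srcs(string $html): array
{
    // \ssrc requires whitespace before "src", so "data-src" is not matched.
    preg_match_all('/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is', $html, $m);
    $srcs = array_map('trim', $m[2]);
    $srcs = array_filter($srcs, function ($s) {
        return $s !== '';
    });
    // array_unique keeps the original keys, so reindex the result from 0.
    return array_values(array_unique($srcs));
}
```

The returned values may still be relative paths, so they need to be made absolute against the page URL before they are downloaded.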