
Scraping Web Page Information with PHP

2017-06-14 14:39
<?php
/* Simpler variant (disabled): fetch the page with file_get_contents()
   and convert it from GBK to UTF-8.
header("Content-Type: text/html; charset=utf-8");
set_time_limit(0);
$url = "http://china.lottedfs.com/handler/ProductDetail-Start?productId=10000039734";
$str = file_get_contents($url);
$str = mb_convert_encoding($str, "UTF-8", "GBK");
print_r($str); */

// Fetch a URL with cURL and return the response body, or false on failure.
function getPage($url) {
    $useragent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36';
    $timeout = 120; // seconds

    // One cookie file per client IP; use a fixed name when run from the command line.
    $dir = dirname(__FILE__) . '/cookies';
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }
    $client = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : 'cli';
    $cookie_file = $dir . '/' . md5($client) . '.txt';

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);      // treat HTTP status >= 400 as an error
    curl_setopt($ch, CURLOPT_HEADER, false);          // do not include response headers in the body
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_ENCODING, "");           // accept every encoding cURL supports
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/');

    $content = curl_exec($ch);
    if (curl_errno($ch)) {
        echo 'error:' . curl_error($ch);
        $content = false;
    }
    // Release the handle on both the success and the error path.
    curl_close($ch);
    return $content;
}

$url = "http://china.lottedfs.com/handler/ProductDetail-Start?productId=06042903112&viewCategoryId=5000110003&tracking=";
$getdata = getPage($url);
var_dump($getdata);

// Alternative pattern: read the name from the product <span> instead.
//$preg = '/<span\s*id="id_product_nm">([\s\S]*?)<\/span>/si';

// Capture the content attribute of the rb:itemName <meta> tag.
$preg = '/<meta\s*property="rb:itemName"\s*content="(.*?)" \/>/si';
$matches = array();
if ($getdata !== false) {
    // $found is 1 when the pattern matched, 0 when it did not.
    $found = preg_match($preg, $getdata, $matches);
}
print_r($matches);
?>
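
The regular expression above only matches when the attributes appear in exactly that order and with that spacing. As a rough alternative sketch (not part of the original script), the same value can be read with PHP's bundled DOM extension. The helper name `getMetaProperty` and the sample markup are made up for illustration, and the sketch assumes the page has already been downloaded into a string, for example by `getPage()`. Pages containing non-ASCII text may also need their encoding declared or converted before parsing.

```php
<?php
// Hypothetical helper (not in the original script): return the content attribute
// of the first <meta property="..."> tag, or null when no such tag exists.
function getMetaProperty($html, $property) {
    if (!is_string($html) || $html === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: $property contains no double quote, since it is placed
    // inside a double-quoted XPath string literal.
    $nodes = $xpath->query('//meta[@property="' . $property . '"]/@content');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// Sample markup standing in for the downloaded page (made up for illustration).
$sample = '<html><head><meta property="rb:itemName" content="Sample Item" /></head><body></body></html>';
echo getMetaProperty($sample, 'rb:itemName'), "\n";
```

In the script above, `getMetaProperty($getdata, 'rb:itemName')` would take the place of the `preg_match()` call. Because the lookup goes through a parser, it does not depend on attribute order or on the exact form of the closing `/>`.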