您的位置:首页 > 运维架构 > 网站架构

php 实现从其他网站拷贝的富文本内容并将里面的图片抓取到本地

2014-06-13 15:39 609 查看
</pre><pre code_snippet_id="391050" snippet_file_name="blog_20140613_1_8092354" name="code" class="php">
<p><pre name="code" class="php">public function saveImgFromList($content, $dist, $url)
{
$list = $this->getImgByReg($content);
$accessUrl  = $this->getServiceLocator()->get('access_upload_message');

foreach ($list as $key => $val) {
if ( strpos($val['src'], $accessUrl) !== false ) {
$arr = explode('/', $val['src']);
$name = array_pop($arr);
$list[$key]['src'] = $name;
continue;
}
$arr = explode('.', $val['src']);
$ext = array_pop($arr);
if (!$ext || !in_array($ext, self::$imgExt)) {
$ext = 'jpg';
}
$name = md5(uniqid()) . '.' . $ext;
$list[$key]['src'] = $name;

$file = file_get_contents($val['src']);
file_put_contents($dist . $name, $file);
}

$newImgInfo = $this->replaceImg($list, $url);
$newImgTags = $newImgInfo['newImgTags'];
$newImgUrls = $newImgInfo['newImgUrls'];

$patterns = array('/<img\s.*?>/');
$callback = function( $matches ) use ( &$newImgTags ) {
$matches[0] = array_shift($newImgTags);
return $matches[0];
};

$res = array();
$res['content'] = preg_replace_callback($patterns, $callback, $content);
$res['image_urls'] = $newImgUrls;

return $res;
}


<span style="white-space:pre"> </span>function getImgByReg($str)

{$list = array();$c1 = preg_match_all('/<img\s.*?>/', $str, $m1);for($i = 0; $i < $c1; $i++) {$c2 = preg_match_all('/(\w+)\s*=\s*(?:(?:(["\'])(.*?)(?=\2))|([^\/\s]*))/', $m1[0][$i], $m2); for($j = 0; $j < $c2; $j++) {$list[$i][$m2[1][$j]] = !empty($m2[4][$j])
? $m2[4][$j] : $m2[3][$j];}}return $list;}

<span style="white-space:pre">	</span>function replaceImg($list, $url)
{
$newImgTags = array();
$newImgUrls = array();

foreach ($list as $key => $val) {
$imgTag = '<img ';
foreach ($val as $attr => $v) {
if ($attr === 'src') {
$imgTag .= $attr . '="' . $url . $v . '" ';
$newImgUrls[] = $url . $v;
} else {
$imgTag .= $attr . '="' . $v . '" ';
}
}
$imgTag .= ' >';

$newImgTags[$key] = $imgTag;
}

return array('newImgTags' => $newImgTags, 'newImgUrls' => $newImgUrls);
}


// 模拟使用
</pre><pre code_snippet_id="391050" snippet_file_name="blog_20140613_5_3752909" name="code" class="php">//你想要保存图片的目录
$dist = '/User/www/img/' . date('/Y/m/d');
!is_dir($dist) && mkdir($dist, 0777, true);
</pre><pre code_snippet_id="391050" snippet_file_name="blog_20140613_9_3033752" name="code" class="php">define('<span style="font-family: Arial, Helvetica, sans-serif;">URLHOLDER', </span><span style="font-family: Arial, Helvetica, sans-serif;">'{{urlholer}}');</span><span style="font-family: Arial, Helvetica, sans-serif;">
</span>// 你的图片服务器或目录地址
$url = URLHOLDER . '/img/' . date('/Y/m/');
<p class="p1"><span class="s1">// 这是模拟你需要替换的用户提交的富文本内容 里面包含图片地址</span></p><p class="p1"><span class="s1">$content = </span>'<p>Push Pop Pressè‡´åŠ›äºŽåˆ›é€ ä¸€ä¸ªé€¼çœŸçš„ã€å……æ»¡ç‰©ç†æ•ˆå<img class="mm" src="http://cms.csdnimg.cn/article/201406/04/538edd87c1b25.jpg">º”的体验。POP就是在这个理念下催生出来的新一代成果。</p><p>POP使用Objective-C++编写。Objective-C++是对C++的扩展,就像Objective-C是Cçš„æ‰©å±•ä¸€æ ·ã€‚è€Œè‡³äºŽä¸ºä»€ä¹ˆä»–ä»¬ç”¨Objective-C++而不是纯粹的Objective-Cï¼ŒåŽŸå› åœ¨äºŽä»–ä»¬æ›´å–œæ¬¢Objective-C++的语法特性所提供的便利。</p><p><strong>POP的架构</strong></p><p>POP目前由四个部分组成(如图1所示),即Animations、Engine、Utility、WebCore。</p><p>图1  POP架构图</p><p><img src="http://cms.csdnimg.cn/article/201406/04/538edd54d9240.jpg" />POP动画极为流畅,其秘密就在于这个引擎中的POPAnimator。POP通过CADisplayLink让动画实现了60 FPSçš„æµç•…æ•ˆæžœï¼Œæ‰“é€ äº†ä¸€ä¸ªæ¸¸æˆçº§çš„åŠ¨ç”»å¼•æ“Žã€‚</p><p>CADisplayLink是类似NSTimer的定时器,不同之处在于,NSTimer用于我们定义任务的执</p>'<span class="s1">;</span></p>
$res = saveImgFromList($content, $dist, $url);
$param = array();
</pre><pre code_snippet_id="391050" snippet_file_name="blog_20140613_13_5760940" name="code" class="php">// 你想要的内容
$param['content'] = $res['content'];
// 内容里面图片url组成的数组
$param['image_urls'] = $res['image_urls'];


这时你的内容就可以入库了
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: