
Website CSS Background Image Scraper

2010-03-25 14:20
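About the tool:
  1) Parses a web page and collects the images it references.
  2) Parses the CSS files the page references and collects their background images.
  3) Downloads the collected images in batch.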
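
The batch download in step 3 is not shown in this post. A minimal sketch of how it might look, assuming the collected URLs arrive as a list of Uri values; the class name BatchDownloader, the helpers LocalPathFor and DownloadAll, and the "index" fallback file name are hypothetical and do not come from the original tool:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class BatchDownloader
{
    // Maps a URL to a file path inside targetDir, using the last path segment
    // as the file name. The query string is ignored.
    public static string LocalPathFor(Uri uri, string targetDir)
    {
        string name = Path.GetFileName(uri.AbsolutePath);
        // Hypothetical fallback for URLs whose path ends with a slash.
        if (string.IsNullOrEmpty(name)) name = "index";
        return Path.Combine(targetDir, name);
    }

    // Downloads every URL into targetDir and returns the number of files saved.
    // Files with the same name overwrite each other in this sketch.
    public static int DownloadAll(IEnumerable<Uri> uris, string targetDir)
    {
        Directory.CreateDirectory(targetDir);
        int saved = 0;
        using (WebClient client = new WebClient())
        {
            foreach (Uri uri in uris)
            {
                try
                {
                    client.DownloadFile(uri, LocalPathFor(uri, targetDir));
                    saved++;
                }
                catch (WebException)
                {
                    // Skip files that cannot be fetched and continue with the rest.
                }
            }
        }
        return saved;
    }
}
```

A single failed request is swallowed here so that one broken link does not abort the whole batch; a real tool would probably record the failure in its log instead.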
Key points:
  1) Regular expressions

    LINK_PATTERN: matches every link in the page

    BACKGROUND_IMAGE_PATTERN: matches background image URLs in the CSS
    CHECK_URL_PATTERN: checks whether a URL is valid
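
The post names the three patterns but does not show them. The sketch below is one plausible set of definitions, not the author's: only the named group "link" is confirmed by the code in this post, while the group name "url", the class name ScraperPatterns, and the helper IsValidUrl are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScraperPatterns
{
    // Assumed: captures the value of any href="..." or src="..." attribute
    // in the named group "link".
    public const string LINK_PATTERN =
        @"(?:href|src)\s*=\s*[""'](?<link>[^""'>]+)[""']";

    // Assumed: captures the first url(...) argument of a background or
    // background-image declaration in the named group "url".
    public const string BACKGROUND_IMAGE_PATTERN =
        @"background(?:-image)?\s*:[^;}]*?url\(\s*[""']?(?<url>[^""')]+?)\s*[""']?\s*\)";

    // Assumed: accepts absolute http and https URLs only.
    public const string CHECK_URL_PATTERN =
        @"^https?://[^\s/?#]+[^\s]*$";

    // Returns true when the text looks like an absolute http(s) URL.
    public static bool IsValidUrl(string text)
    {
        return Regex.IsMatch(text, CHECK_URL_PATTERN, RegexOptions.IgnoreCase);
    }
}
```

Regular expressions are a pragmatic choice for a small tool like this, but they will miss unquoted attribute values and other unusual markup; an HTML parser would be more robust.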

Code

// Requires: System, System.Collections.Generic, System.IO, System.Net,
// System.Text, System.Text.RegularExpressions, System.Windows.Forms.

/// <summary>
/// Downloads the page at the given URL, collects the image links it contains,
/// downloads the CSS files it references, and extracts background image URLs
/// from the combined source.
/// </summary>
/// <param name="url">Absolute URL of the page to analyze.</param>
/// <returns>URLs of the images found in the page and its CSS.</returns>
protected List<Uri> FetchCSSWithSite(string url)
{
    StringBuilder sourceCSS = new StringBuilder();
    List<Uri> list = new List<Uri>();
    using (WebClient client = new WebClient())
    {
        _basicUri = new Uri(url);
        string sourceHtml = client.DownloadString(_basicUri);
        // The HTML is appended too, so background images in inline styles are also picked up.
        sourceCSS.Append(sourceHtml);
        Regex regex = new Regex(LINK_PATTERN, RegexOptions.IgnoreCase);
        // Regex.Matches never returns null, so no null check is needed.
        foreach (Match match in regex.Matches(sourceHtml))
        {
            string link = match.Groups["link"].Value;
            Uri absolute;
            // Skip links that cannot be resolved against the page URL.
            if (!Uri.TryCreate(_basicUri, link, out absolute)) continue;
            // Take the extension from the path only, so "site.css?v=2" still yields ".css".
            string extension = Path.GetExtension(absolute.AbsolutePath);
            lvLog.Items.Add(new ListViewItem(new string[] { absolute.AbsoluteUri, DateTime.Now.ToString(TIME_FORMAT), STATUS_ANALYSIS, string.Empty, extension }));

            switch (extension.ToUpperInvariant())
            {
                case ".CSS":
                    sourceCSS.Append(client.DownloadString(absolute));
                    break;
                case ".GIF":
                case ".PNG":
                case ".JPG":
                case ".JPEG":
                    list.Add(absolute);
                    break;
            }
        }
    }
    list.AddRange(FetchBGImageUrlsWithCSS(sourceCSS.ToString()));

    return list;
}
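
FetchBGImageUrlsWithCSS is called above but not shown in this post. The following is a sketch of what such a helper might do, not the author's implementation: the regular expression is an assumed stand-in for BACKGROUND_IMAGE_PATTERN, the class name BackgroundImageExtractor is hypothetical, and the extra baseUri parameter replaces the _basicUri field so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BackgroundImageExtractor
{
    // Assumed pattern: the first url(...) argument of a background or
    // background-image declaration, captured in the group "url".
    private static readonly Regex BackgroundImage = new Regex(
        @"background(?:-image)?\s*:[^;}]*?url\(\s*[""']?(?<url>[^""')]+?)\s*[""']?\s*\)",
        RegexOptions.IgnoreCase);

    // Returns the distinct absolute URLs of the background images declared in css.
    // Relative URLs are resolved against baseUri.
    public static List<Uri> FetchBGImageUrlsWithCSS(string css, Uri baseUri)
    {
        HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);
        List<Uri> result = new List<Uri>();
        foreach (Match match in BackgroundImage.Matches(css))
        {
            string raw = match.Groups["url"].Value.Trim();
            // Embedded data: URIs carry the image inline, so there is nothing to download.
            if (raw.StartsWith("data:", StringComparison.OrdinalIgnoreCase)) continue;
            Uri absolute;
            if (!Uri.TryCreate(baseUri, raw, out absolute)) continue;
            // Keep each image once, even if several rules reference it.
            if (seen.Add(absolute.AbsoluteUri)) result.Add(absolute);
        }
        return result;
    }
}
```

In CSS, a relative URL inside an external style sheet is resolved against the style sheet's own location rather than the page's. Because the method above concatenates the page and all style sheets into one string, a helper that resolves everything against the page URL may build wrong addresses for style sheets stored in another directory. Calling a helper like this once per style sheet, with that style sheet's URL as baseUri, would avoid the problem.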

  2) Screenshot



Download: /Files/olartan/BID.zip