
Scraping Web Pages with ASP.NET

2015-11-13 09:33
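// Requires: using System.Net; using System.Text; using System.Text.RegularExpressions;
// pageSize is the total number of pages to fetch (for example, parsed from txtPageSize.Text by the caller).
protected void GetHtml(string url, int pageSize)
{
    // Validate the base URL before any page URL is built from it.
    if (string.IsNullOrWhiteSpace(url))
    {
        return;
    }
    string baseUrl = url.Trim() + "?pn=";

    using (WebClient wc = new WebClient())
    {
        wc.Encoding = Encoding.Default; // system default encoding; adjust if the site uses another one
        for (int i = 1; i <= pageSize; i++)
        {
            // Build each page URL from the base so "?pn=" is not appended again on every pass.
            string html = wc.DownloadString(baseUrl + i); // HTML of the current page

            // General e-mail pattern, kept for reference:
            //MatchCollection mc = Regex.Matches(html, @"\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+");
            // QQ mail addresses: a 5 to 12 digit QQ number followed by @qq.com (dot escaped).
            MatchCollection mc = Regex.Matches(html, @"[1-9][0-9]{4,11}@(qq|QQ)\.com");

            StringBuilder sb = new StringBuilder();
            foreach (Match m in mc)
            {
                sb.AppendLine(m.Value + ";");
            }
            string s = sb.ToString();
            // Uncomment to append this page's matches to a file:
            //File.AppendAllText(@"h:\1.txt", s, Encoding.Default);
        }
    }
}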
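The extraction step can be tried on its own, without any network access. The following is a minimal sketch, not part of the original post: the class name QqMailExtractor, the Extract helper, and the sample HTML string are all made up for illustration. It uses the QQ-mail pattern with the dot escaped, and it also drops duplicate addresses, which the listing above does not do.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class QqMailExtractor
{
    // QQ-mail pattern with the dot escaped, so only a literal "." matches before "com".
    private static readonly Regex QqMail =
        new Regex(@"[1-9][0-9]{4,11}@(qq|QQ)\.com", RegexOptions.Compiled);

    // Returns each distinct address found in html, in order of first appearance.
    // Addresses that differ only in letter case count as duplicates.
    public static List<string> Extract(string html)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (Match m in QqMail.Matches(html))
        {
            if (seen.Add(m.Value))
            {
                result.Add(m.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample input, not real scraped data.
        string html = "<p>12345678@qq.com</p><p>12345678@QQ.com</p><p>987654@qqxcom</p>";
        foreach (string address in Extract(html))
        {
            Console.WriteLine(address); // prints 12345678@qq.com once
        }
    }
}
```

With an unescaped dot, the third sample string (987654@qqxcom) would also match, because "." in a regular expression matches any character.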