
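Our site has had quite a lot of content to update recently, so I decided to build a content-collection (scraping) system. Here is the material I gathered.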

2005-11-22 15:39
<%@ Page language="C#" Trace="True" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>

<html>
<head>

<SCRIPT runat="server">
</SCRIPT>
</head>

<body>

 <form method="post" runat="server">

 <asp:Label runat=server ID="lblHTML" Rows="30" Cols="80" EnableViewState="false" Wrap="True"></asp:Label>
 </form>

</body>
</html>

using System;
using System.Collections;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Web;
using System.Web.SessionState;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.HtmlControls;
using System.Text;
using System.IO;
using System.Net;

namespace myclass.test
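Use this method to extract the content. It takes two parameters: start_string is the marker where the search begins, and end_string is the marker where the search ends.
In the program, these two parameters are best given as English letters. If they are Chinese characters, they need to be converted first, for example:
byte[] startCN = System.Text.Encoding.Default.GetBytes("这里写开始标记"); // placeholder literal: put the start marker here
string startUTF8 = System.Text.Encoding.UTF8.GetString(startCN);

public string Get_Data(string start_string,string end_string)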
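// Download the raw bytes of the page and decode them into a string.
string PageUrl = "http://pachong.cn";
string result;
using (WebClient wc = new WebClient())
{
    wc.Credentials = CredentialCache.DefaultCredentials;
    byte[] pageData = wc.DownloadData(PageUrl);
    // Encoding.Default is the system ANSI code page; use Encoding.UTF8 if the page is served as UTF-8.
    result = Encoding.Default.GetString(pageData);
}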
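The listing for Get_Data stops at its signature, so the extraction step itself is not shown. The following is a minimal sketch of the marker-based extraction described above, assuming the page has already been downloaded into a string. The names MarkerExtractor and ExtractBetween are hypothetical and do not come from the original code, and returning an empty string when a marker is missing is an assumed convention.

```csharp
using System;

public static class MarkerExtractor
{
    // Hypothetical helper (not from the original listing): returns the text
    // between the first occurrence of startMarker and the next endMarker.
    // Returns an empty string when an argument is empty or a marker is
    // not found (assumed convention).
    public static string ExtractBetween(string html, string startMarker, string endMarker)
    {
        if (string.IsNullOrEmpty(html) ||
            string.IsNullOrEmpty(startMarker) ||
            string.IsNullOrEmpty(endMarker))
        {
            return string.Empty;
        }

        // Ordinal comparison matches the markers character by character.
        int start = html.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0)
        {
            return string.Empty;
        }
        start += startMarker.Length;

        // Search for the end marker only after the start marker.
        int end = html.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0)
        {
            return string.Empty;
        }

        return html.Substring(start, end - start);
    }
}
```

With the `result` string from the download step, a call such as `MarkerExtractor.ExtractBetween(result, "<title>", "</title>")` would return the page title. If the page bytes are decoded with the encoding the page actually uses, Chinese markers should be usable directly as ordinary string literals.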

GetPageHTML.aspx
<%@ Page language="c#" validateRequest = "false" Codebehind="GetPageHtml.aspx.cs"
 AutoEventWireup="false" Inherits="eMeng.Exam.GetPageHtml" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<HTML>
 <HEAD>
 <title>Get Web Page Source</title>
 <meta name="GENERATOR" Content="Microsoft Visual Studio 7.0">
 <meta name="CODE_LANGUAGE" Content="C#">
 <meta name="vs_defaultClientScript" content="JavaScript">
 <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
 </HEAD>
 <body MS_POSITIONING="GridLayout">
 <form id="aspNetBuffer" method="post" runat="server">
 <div align="center" style="FONT-WEIGHT: bold">Get the source of any web page</div>
 <asp:TextBox id="UrlText" runat="server" Width="400px">http://dotnet.aspx.cc/content.aspx
 </asp:TextBox>
 <asp:Button id="WebClientButton" Runat="server" Text="Fetch with WebClient"></asp:Button>
 <asp:Button id="WebRequestButton" runat="server" Text="Fetch with WebRequest"></asp:Button>
 <br>
 <asp:TextBox id="ContentHtml" runat="server" Width="100%" Height="360px" TextMode="MultiLine">
 </asp:TextBox>
 </form>
 </body>
</HTML>

using System;
using System.Collections;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Web;
using System.Web.SessionState;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.HtmlControls;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
namespace eMeng.Exam
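The body of the code-behind file is not included in the listing. As a rough sketch of what the two buttons on the page might do, the helpers below fetch a URL once with WebClient and once with WebRequest. The class name PageFetcher and both method names are hypothetical, and passing the encoding as a parameter is an assumption rather than something the original shows.

```csharp
using System.IO;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Hypothetical helper: download the raw bytes with WebClient,
    // then decode them with the given encoding.
    public static string FetchWithWebClient(string url, Encoding encoding)
    {
        using (WebClient wc = new WebClient())
        {
            wc.Credentials = CredentialCache.DefaultCredentials;
            byte[] data = wc.DownloadData(url);
            return encoding.GetString(data);
        }
    }

    // Hypothetical helper: open the response stream of a WebRequest
    // and read it to the end as text in the given encoding.
    public static string FetchWithWebRequest(string url, Encoding encoding)
    {
        WebRequest request = WebRequest.Create(url);
        request.Credentials = CredentialCache.DefaultCredentials;
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A click handler for WebRequestButton could then assign `PageFetcher.FetchWithWebRequest(UrlText.Text, Encoding.Default)` to `ContentHtml.Text`, and the WebClientButton handler could do the same with `FetchWithWebClient`.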

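This part is for those who access the Internet through an ISA Server proxy.
Modify the WebRequest method as follows:
PageUrl = UrlText.Text;
WebRequest request = WebRequest.Create(PageUrl);

// Build the proxy explicitly. request.Proxy is typed as IWebProxy and is not
// guaranteed to be a WebProxy instance, so casting it can throw InvalidCastException.
WebProxy myProxy = new WebProxy();

// Placeholders: replace with the real proxy host, port, user name, password, and domain.
myProxy.Address = new Uri("http://proxy-server:port");
myProxy.Credentials = new NetworkCredential("username", "password", "domain");
request.Proxy = myProxy;

WebResponse response = request.GetResponse();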