A C# Web Crawler
Here is a C# web crawler that downloads web pages and attachments from a single website. Crawlers of this kind were widely used by some online stores in their early stages, under names such as network spider, robot, crawler, or radar. Unlike the general-purpose crawlers behind Google or Baidu, they are tuned to retrieve only specific kinds of information, and they serve their own small, specialized group of customers.
The logic behind the web crawler is simple: given a URL, the crawler downloads that page, parses it, and finds all the URLs it contains. It then repeats the whole process for each newly found URL until every URL has been visited. The full source code can be downloaded from here.
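The loop described above can be sketched as follows. This is not the downloadable source code, only a minimal illustration under a few assumptions: the names CrawlSketch, ExtractLinks, Crawl, fetch and maxPages are hypothetical, links are found with a naive regular expression rather than a real HTML parser, and the crawl stays on the start URL's host because the crawler targets one website.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CrawlSketch
{
    // Naive href extractor (an assumption for brevity); a real crawler
    // would more likely use a proper HTML parser.
    static readonly Regex HrefPattern = new Regex(
        "href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOptions.IgnoreCase);

    // Returns the absolute http(s) links found in html, resolved against pageUrl.
    public static List<Uri> ExtractLinks(Uri pageUrl, string html)
    {
        var links = new List<Uri>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            Uri link;
            if (Uri.TryCreate(pageUrl, m.Groups[1].Value, out link)
                && (link.Scheme == Uri.UriSchemeHttp || link.Scheme == Uri.UriSchemeHttps))
            {
                links.Add(link);
            }
        }
        return links;
    }

    // Breadth-first crawl restricted to the start URL's host.
    // fetch is a caller-supplied download function that returns the page
    // HTML, or null when the download fails.
    public static List<Uri> Crawl(Uri start, Func<Uri, string> fetch, int maxPages = 100)
    {
        var visited = new HashSet<Uri>();   // URLs already handled
        var order = new List<Uri>();        // URLs in the order they were visited
        var pending = new Queue<Uri>();     // URLs still waiting to be visited
        pending.Enqueue(start);

        while (pending.Count > 0 && order.Count < maxPages)
        {
            Uri current = pending.Dequeue();
            if (!visited.Add(current)) continue;   // skip URLs seen before

            order.Add(current);
            string html = fetch(current);
            if (html == null) continue;            // download failed, nothing to parse

            foreach (Uri link in ExtractLinks(current, html))
            {
                if (link.Host == start.Host && !visited.Contains(link))
                    pending.Enqueue(link);
            }
        }
        return order;
    }
}
```

Passing the download step in as a function keeps the loop independent of any particular HTTP API, so the same sketch can be driven by a real downloader or by an in-memory dictionary of pages while experimenting.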
Something that needs fixing
This crawler does not work well with some websites. I can open those sites in Firefox or another web browser, but the crawler cannot access the same web servers and always reports a '404 Not Found' error. If anyone has a working solution, please let me know, thanks! Maybe I could use an external tool such as 'wget' to download the web content instead of using the native C# web API.
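One thing that may be worth trying, although it is not confirmed as the cause for the sites in question, is sending browser-like request headers, since some servers answer differently when headers such as User-Agent are missing. The sketch below uses HttpClient to do that; the class name BrowserLikeFetch, the method names and the header values are all hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class BrowserLikeFetch
{
    static readonly HttpClient Client = new HttpClient();

    // Hypothetical User-Agent value; any browser-like string could be substituted.
    public const string UserAgent = "Mozilla/5.0 (compatible; CrawlerSketch/0.1)";

    // Builds a GET request carrying headers similar to what a browser sends.
    public static HttpRequestMessage BuildRequest(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.TryAddWithoutValidation("User-Agent", UserAgent);
        request.Headers.TryAddWithoutValidation(
            "Accept", "text/html,application/xhtml+xml,*/*;q=0.8");
        return request;
    }

    // Returns the page body, or null when the server answers with a
    // non-success status such as 404. The status is written to stderr
    // so that failing URLs can be inspected.
    public static async Task<string> DownloadAsync(string url)
    {
        using (HttpRequestMessage request = BuildRequest(url))
        using (HttpResponseMessage response = await Client.SendAsync(request))
        {
            if (!response.IsSuccessStatusCode)
            {
                Console.Error.WriteLine(
                    (int)response.StatusCode + " " + response.ReasonPhrase + ": " + url);
                return null;
            }
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

If the error persists with these headers, comparing the exact request a browser sends (headers, cookies, redirects) against the crawler's request would be the next thing to check.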
Reference
http://www.digitalcoding.com/Code-Snippets/C-Sharp/C-Code-Snippet-Download-HTML-Web-Page.html
Reposted from: https://www.cnblogs.com/open-coder/archive/2013/01/21/2870468.html