
ASP: page content fetched via xmlhttp is incomplete

2008-05-23 21:26

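I wrote a bulk link-exchange checker, http://www.linkhelper.cn, and it has worked well so far. Recently, though, some users reported that after entering their URL the tool said it could not find any exchanged links. One of the URLs reported was http://www.nipei.com/index.php
As usual, I debugged it locally and confirmed that the links really were not found. Looking closely at the content returned through xmlhttp, I saw that everything after a certain point in the page was missing. I went over the xmlhttp code that fetches the page and the function that converts the binary response to a string, but could not find the cause. Searching Baidu and Google turned up no similar reports either, which was extremely frustrating.
I reduced the problem to the simplest possible code: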
<%
url="http://www.nipei.com/index.php"
Set Http=server.createobject("msxml2.serverxmlhttp.3.0")

Http.setTimeouts 10000, 10000, 10000, 10000
Http.open "GET",url,False
Http.Send()

If Http.Readystate<>4 Then

Else

If Http.status=200 Then
response.write BytesToBstr(http.responseBody,"gb2312")
End If
End If

Function BytesToBstr(Body,Cset)
Dim Objstream
Set Objstream = Server.CreateObject("adodb.stream")
objstream.Type = 1
objstream.Mode =3
objstream.Open
objstream.Write body
objstream.Position = 0
objstream.Type = 2
objstream.Charset = Cset
BytesToBstr = objstream.ReadText
objstream.Close
set objstream = nothing
End Function
%>

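No matter how I changed this code, the content it returned was always incomplete. Feel free to copy it and debug it yourself. I also tried other components, such as asphttp and inet, and none of them returned the whole page.
In the end I asked for help on CSDN, the programmers' forum, and finally got an answer: this is a bug in adodb.stream. The page content contains Chr(0), which is treated as the end of the file, so everything after it is dropped and the content obtained through xmlhttp comes back incomplete. After changing the code to replace Chr(0) with "", the program runs correctly and fetches the entire page. The modified code: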
<%
url="http://www.nipei.com/index.php"
Set Http=server.createobject("msxml2.serverxmlhttp.3.0")

Http.setTimeouts 10000, 10000, 10000, 10000
Http.open "GET",url,False
Http.Send()

If Http.Readystate<>4 Then

Else

If Http.status=200 Then
response.write replace(BytesToBstr(http.responseBody,"gb2312"),chr(0),"")
End If
End If

Function BytesToBstr(Body,Cset)
Dim Objstream
Set Objstream = Server.CreateObject("adodb.stream")
objstream.Type = 1
objstream.Mode =3
objstream.Open
objstream.Write body
objstream.Position = 0
objstream.Type = 2
objstream.Charset = Cset
BytesToBstr = objstream.ReadText
objstream.Close
set objstream = nothing
End Function
%>
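
As a rough sketch, the Chr(0) handling described above could be pulled out into two small helpers, one to strip the character and one to check whether a fetched page contains it at all. The names StripNulls and FirstNullPos are made up for this illustration and are not part of the original tool; the code is plain VBScript and assumes it is pasted inside the same <% %> block as BytesToBstr.

```vbscript
' Hypothetical helper: remove every NUL character (Chr(0)) from a string.
Function StripNulls(text)
    StripNulls = Replace(text, Chr(0), "")
End Function

' Hypothetical helper: return the 1-based position of the first NUL
' character in the string, or 0 if the string contains none.
Function FirstNullPos(text)
    FirstNullPos = InStr(1, text, Chr(0), vbBinaryCompare)
End Function

' Made-up sample standing in for a page that carries a stray NUL.
Dim sample, cleaned
sample = "<a href=""x"">" & Chr(0) & "</a>"
cleaned = StripNulls(sample)
```

In the page above, the call would then read response.write StripNulls(BytesToBstr(http.responseBody,"gb2312")), and FirstNullPos could be used while debugging to confirm that the truncation point matches the position of the first Chr(0).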
