
Why should I use url.openStream instead of url.getContent?

2014-10-18 14:09 · 190 views
I would like to retrieve the content of a URL, similar to Python's:

html_content = urllib.urlopen("http://www.test.com/test.html").read()


In examples (java2s.com) you very often see the following code:

URL url = new URL("http://www.test.com/test.html");
String foo = (String) url.getContent();


The documentation for getContent says the following:

Gets the contents of this URL. This method is a shorthand for: openConnection().getContent()
Returns: the contents of this URL.


In my opinion that should work perfectly fine. But this code doesn't work; it throws an exception:

Exception in thread "main" java.lang.ClassCastException: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream cannot be cast to java.lang.String


So it actually returns an InputStream.

So I ask myself: what is the purpose of this method if it doesn't do what it seems to do? Why is there no hint about this quirk in the documentation? And why have I seen it in several examples?

Or am I getting this wrong?

The suggested solution (on Stack Overflow) is to use url.openStream() and then read the stream.

As you said, the documentation says that URL.getContent() is a shortcut for openConnection().getContent(), so we need to look at the documentation for URLConnection.getContent().

We can see that this returns an Object, the type of which is determined by the content-type header field of the response. This type determines the ContentHandler that will be used. So a ContentHandler converts data, based on its MIME type, to the appropriate class of Java Object.

In other words, the type of Object you get will depend on the content served. For example, it wouldn't make sense to return a String if the MIME type was image/png.

This is why in the example code you link to at java2s.com they check the class of the returned Object:

try {
    URL u = new URL("http://www.java2s.com");
    Object o = u.getContent();
    // Print the runtime class of whatever the ContentHandler returned
    System.out.println("I got a " + o.getClass().getName());
} catch (Exception ex) {
    System.err.println(ex);
}


So you can say String foo = (String) url.getContent(); only if you know your ContentHandler will return a String.
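If you do not know what the handler will return, one defensive option is to check the runtime type before casting. The following is only a sketch: the class name ContentAsText and the helper toText are made up for illustration, and the charset is something you have to supply yourself, because getContent() does not tell you the encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentAsText {
    /**
     * Hypothetical helper: turns the Object returned by getContent() into text
     * when it is a String or an InputStream, and rejects anything else.
     */
    public static String toText(Object content, Charset charset) throws IOException {
        if (content instanceof String) {
            return (String) content;
        }
        if (content instanceof InputStream) {
            try (InputStream in = (InputStream) content) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                // Copy only the bytes that were actually read in each pass.
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return new String(out.toByteArray(), charset);
            }
        }
        throw new IllegalArgumentException("Unsupported content: "
                + (content == null ? "null" : content.getClass().getName()));
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for values a ContentHandler might return (no network needed).
        Object asString = "already a string";
        Object asStream = new ByteArrayInputStream(
                "from a stream".getBytes(StandardCharsets.UTF_8));
        System.out.println(toText(asString, StandardCharsets.UTF_8));
        System.out.println(toText(asStream, StandardCharsets.UTF_8));
    }
}
```

With a real URL you would call something like toText(url.getContent(), StandardCharsets.UTF_8), assuming the page is UTF-8 encoded.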

There are default content handlers defined in the sun.net.www.content package, but as you can see they return streams.

You could create your own ContentHandler that does return a String, but it will probably be easier just to read the stream as you suggest.

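import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.ContentHandler;
import java.net.ContentHandlerFactory;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class StringContentHandlerDemo {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.so.com");
        // Install a factory whose handler returns the body as a String for every MIME type.
        // Note: setContentHandlerFactory may be called at most once per JVM.
        URLConnection.setContentHandlerFactory(new ContentHandlerFactory() {
            @Override
            public ContentHandler createContentHandler(String mimetype) {
                return new ContentHandler() {
                    @Override
                    public Object getContent(URLConnection urlc) throws IOException {
                        try (InputStream input = urlc.getInputStream()) {
                            ByteArrayOutputStream out = new ByteArrayOutputStream();
                            byte[] bytes = new byte[1024];
                            int n;
                            // Append only the bytes actually read; the original loop
                            // dropped one byte per pass and appended the whole buffer.
                            while ((n = input.read(bytes)) != -1) {
                                out.write(bytes, 0, n);
                            }
                            // Assumption: the response is UTF-8 encoded.
                            return new String(out.toByteArray(), StandardCharsets.UTF_8);
                        }
                    }
                };
            }
        });
        String str = (String) url.getContent();
        System.out.println(str);
        /*
        // Alternative without a custom factory: cast to InputStream and read it yourself.
        InputStream input = (InputStream) url.getContent();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = new byte[1024];
        int n;
        while ((n = input.read(bytes)) != -1) {
            out.write(bytes, 0, n);
        }
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
        */
    }
}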
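For comparison, reading the stream directly with url.openStream(), as suggested in the question, needs no handler factory at all. This is a minimal sketch, not a definitive implementation: the class name UrlText and the helper fetch are made up for illustration, and the UTF-8 charset is an assumption about the page encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UrlText {
    /** Hypothetical helper: reads the whole stream of the URL and decodes it. */
    public static String fetch(URL url, Charset charset) throws IOException {
        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = url.openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            // read() returns the number of bytes read, or -1 at end of stream.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        // URL taken from the question; UTF-8 is an assumption about its encoding.
        URL url = new URL("http://www.test.com/test.html");
        System.out.println(fetch(url, StandardCharsets.UTF_8));
    }
}
```

Unlike getContent(), openStream() always gives you an InputStream regardless of the MIME type, so no cast is involved and the result does not depend on which ContentHandler happens to be installed.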