
Back to Basics Series --- Java Network Programming --- Using the Network Connection Components (URL and URI)

2018-09-12 17:31

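I. Definitions of URI and URL, and how they differ

URI: a Uniform Resource Identifier is a string that identifies a resource using a specific syntax.

URL: a Uniform Resource Locator is a URI that, besides identifying a resource, also gives its network location, so a client can use it to retrieve the resource.

1. Structure of a URI

a. The basic format is:

scheme:scheme-specific-part

b. Some common schemes:

data: Base64-encoded data

file: a file on the local disk

ftp: an FTP server

http: a World Wide Web server using the Hypertext Transfer Protocol

mailto: an email address

magnet: a resource that can be downloaded over a peer-to-peer network (for example, BitTorrent)

telnet: a telnet server

urn: a Uniform Resource Name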

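2. URI encoding

Encoding is the centerpiece of this chapter. Encoding problems in a link address are a frequent source of confusion, so this topic deserves close attention.

a. Characters allowed in the scheme:

letters

digits

plus sign (+)

period (.)

hyphen (-)

b. Characters allowed in the scheme-specific part:

ASCII letters and digits

the punctuation characters -, _, ., ! and ~

the delimiters /, ?, & and =

All other characters, including every non-ASCII character, must be escaped with a percent sign (%) followed by the hexadecimal value of each byte

"/" and "@" must also be escaped whenever they are not being used as delimiters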
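The escaping rules above can be observed without writing an encoder by hand. The following is a minimal sketch, assuming a hypothetical host www.example.com and a helper name (encodePath) invented for this illustration: it passes an unescaped path to one of the multi-argument java.net.URI constructors and prints the result of toASCIIString().

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PercentEncodingSketch {

    // Hypothetical helper: builds an http URI for a placeholder host and
    // lets the URI class escape whatever is not legal in the path.
    static String encodePath(String path) {
        try {
            // The multi-argument constructors quote illegal characters such
            // as spaces; toASCIIString() additionally escapes non-ASCII
            // characters as UTF-8 octets.
            return new URI("http", "www.example.com", path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The slashes act as delimiters and stay as they are
        System.out.println(encodePath("/docs/a b.html"));       // http://www.example.com/docs/a%20b.html
        // \u00e9 is a non-ASCII letter (e with acute accent)
        System.out.println(encodePath("/caf\u00e9/menu.html")); // http://www.example.com/caf%C3%A9/menu.html
    }
}
```

Only the characters that are illegal in a path are escaped here; a "/" that is meant as data would still have to be escaped by the caller.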

3. Structure of a URL

protocol://userInfo@host:port/path?query#fragment

In order, the parts are:

protocol

user info (userInfo)

host

port

path

query

fragment

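4. Understanding relative URLs

There are two forms of relative URL to understand. In both examples the current (base) URL is https://my.oschina.net/UBW/blog/2046461

a. First form: a path relative to the current document

<a href="javafaq.html">

Result: https://my.oschina.net/UBW/blog/javafaq.html

b. Second form: a path that starts with / and is relative to the site root

<a href="/projects/ipv6/">

Result: https://my.oschina.net/projects/ipv6/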
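The two resolutions above can be reproduced in code. This is a minimal sketch that uses java.net.URI.resolve() as one possible way to do it; the class and method names are made up for the illustration.

```java
import java.net.URI;

public class RelativeResolveSketch {

    // Resolves a relative reference against a base URI, the same way a
    // browser resolves an href against the address of the current page.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "https://my.oschina.net/UBW/blog/2046461";
        // Replaces only the last path segment of the base
        System.out.println(resolve(base, "javafaq.html"));
        // A leading slash replaces the whole path of the base
        System.out.println(resolve(base, "/projects/ipv6/"));
    }
}
```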

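II. The URL class

The java.net.URL class is an abstraction of a Uniform Resource Locator:

It is a final class that extends Object directly; its instances are immutable, which makes them thread-safe

Internally it uses the strategy design pattern: each protocol is handled by its own protocol handler

It can both identify a resource and retrieve it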

1. Creating a URL

public URL(String spec) throws MalformedURLException //①
public URL(String protocol, String hostname, String file) throws MalformedURLException //②
public URL(String protocol, String host, int port, String file) throws MalformedURLException //③
public URL(URL base, String relative) throws MalformedURLException //④

a. Points common to all constructors:

If the running JVM does not support the protocol, the constructor throws MalformedURLException

Apart from checking that the protocol is recognized, Java performs no correctness or validity checks on the URLs it constructs

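b. A snippet that checks whether a protocol is supported:

// requires: import java.net.MalformedURLException; import java.net.URL;
public static void testProtocol(String spec) {
    try {
        // The constructor throws if no handler exists for the protocol
        URL url = new URL(spec);
        System.out.println("supported");
    } catch (MalformedURLException e) {
        System.out.println("not supported");
    }
}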

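c. Constructors ② and ③

Constructor ② takes no port, so the port is set to -1; a port of -1 means the protocol's default port, for example 80 for http

The file argument contains the path, the file name and an optional fragment identifier

file should be a full path starting with /. Forgetting the leading slash does not throw MalformedURLException: the constructor silently joins host and file into a broken URL, which makes the mistake hard to spot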
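A small sketch of constructor ②, using the hypothetical host www.example.com, that prints what the resulting URL looks like with and without the leading slash in the file argument. The helper name build is invented here; newer JDK releases mark the URL constructors as deprecated, hence the annotation.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlConstructorSketch {

    // Hypothetical helper wrapping constructor (2) so callers need not
    // handle the checked exception.
    @SuppressWarnings("deprecation")
    static URL build(String protocol, String host, String file) {
        try {
            return new URL(protocol, host, file);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL ok = build("http", "www.example.com", "/index.html#intro");
        System.out.println(ok);                  // http://www.example.com/index.html#intro
        System.out.println(ok.getPort());        // -1, no port was given
        System.out.println(ok.getDefaultPort()); // 80, the default for http
        System.out.println(ok.getRef());         // intro

        // Leading slash forgotten: host and file are simply concatenated
        URL broken = build("http", "www.example.com", "index.html");
        System.out.println(broken);              // http://www.example.comindex.html
    }
}
```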

d. Using constructor ④

try {
    URL u1 = new URL("https://my.oschina.net/UBW/blog/2046461");
    // Resolve the relative URL against u1
    URL u2 = new URL(u1, "mailinglists.html");
    /**
     u2 now wraps the URL: https://my.oschina.net/UBW/blog/mailinglists.html
     */
} catch (MalformedURLException e) {
    System.out.println(e);
}

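2. Retrieving data from a URL

This subsection is the most important one: it covers the main methods for retrieving the data behind a URL, which are the foundation for the server-side code built later.

public final InputStream openStream() throws java.io.IOException //①
public URLConnection openConnection() throws java.io.IOException //②
public final Object getContent() throws java.io.IOException //③
public final Object getContent(Class[] classes) throws java.io.IOException //④

a. Method ①: simply fetches the raw content, without the HTTP headers or any other protocol information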

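import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class NetworkMain {

    public static void main(String[] args) {
        try {
            URL u = new URL("http://www.baidu.com");
            // try-with-resources closes the stream even if reading fails
            try (InputStream inputStream = u.openStream()) {
                int c;
                // Each byte is printed as one char, which is only correct for ASCII text
                while ((c = inputStream.read()) != -1) {
                    System.out.print((char) c);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

This is the simplest approach and also the least useful one. The stream delivers only the raw bytes of the body; the loop above prints each byte as a char, which works for plain ASCII text but garbles multi-byte encodings and is meaningless for binary content. The headers are not available either, so the content type and character encoding cannot be determined.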

b. Method ②: besides the raw content, it gives access to all the metadata and headers defined by the protocol

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

public class NetworkMain {

    public static void main(String[] args) {
        try {
            URL u = new URL("http://www.baidu.com");
            URLConnection urlConnection = u.openConnection();
            Map<String, List<String>> headerFields = urlConnection.getHeaderFields();
            // This is how the headers are obtained
            for (Map.Entry<String, List<String>> entry : headerFields.entrySet()) {
                System.out.println(entry.getKey() + ":" + entry.getValue().toString());
            }
            // The input stream delivers the body content
            try (InputStream inputStream = urlConnection.getInputStream()) {
                int c;
                while ((c = inputStream.read()) != -1) {
                    System.out.print((char) c);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
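Because method ② exposes the headers, the body can be decoded with the character set announced in Content-Type instead of being cast byte by byte. The following is only a sketch under stated assumptions: the class and the helpers charsetOf, fetch and roundTrip are invented for this illustration, UTF-8 is an assumed fallback, and the demo reads a temporary local file through a file: URL so that it runs without network access (InputStream.readAllBytes() needs Java 9 or later).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetAwareFetch {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumption
    // of this sketch) when the parameter is absent or unknown.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    try {
                        return Charset.forName(p.substring(8).replace("\"", "").trim());
                    } catch (IllegalArgumentException e) {
                        break; // malformed or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Reads the whole body and decodes it with the announced charset.
    static String fetch(URL url) {
        try {
            URLConnection connection = url.openConnection();
            Charset charset = charsetOf(connection.getContentType());
            try (InputStream in = connection.getInputStream()) {
                return new String(in.readAllBytes(), charset);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes text to a temporary file and reads it back through a file: URL.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("fetch-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return fetch(tmp.toUri().toURL());
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

The same fetch helper should work for an http URL; real pages may also declare their encoding inside the document, which this sketch does not inspect.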

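c. Method ③: returns an Object that can be downcast to different types depending on the resource

for example an image object, java.awt.image.ImageProducer

for example an audio object, sun.applet.AppletAudioClip

for example the familiar java.io.InputStream

This method reads the Content-type field of the header to identify the type. If the type is not recognized, a java.io.InputStream is returned instead

d. Method ④: similar to ③, but it returns the content as the first of the requested classes that can be supplied, or null if none of them is supported

// requires: import java.io.InputStream; import java.io.Reader; import java.net.URL;
URL u = new URL("http://www.nwu.org");
// The desired types, in order of preference
Class<?>[] types = new Class<?>[3];
types[0] = String.class;
types[1] = Reader.class;
types[2] = InputStream.class;
Object o = u.getContent(types);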

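3. Getting the parts of a URL

Here is a complete URL:

http://admin:123456@www.ibiblio.org:8080/java/books/jnp/index.html?isbn=123423423#toc

Scheme (http)

Authority: user info + host name + port (admin:123456@www.ibiblio.org:8080)

Path (/java/books/jnp/index.html)

Fragment identifier (toc)

Query string (isbn=123423423)

public String getProtocol(); // the protocol (for example http or https)
public String getHost(); // the host name
public int getPort(); // the port, or -1 if none was specified
public int getDefaultPort(); // the protocol's default port, or -1 if it has none; not affected by an explicitly set port
public String getFile(); // path plus query string (in the URL above: /java/books/jnp/index.html?isbn=123423423)
public String getPath(); // the path without the query string (in the URL above: /java/books/jnp/index.html)
public String getRef(); // the fragment (toc in the example above)
public String getQuery(); // the query string (for example isbn=123423423)
public String getUserInfo(); // the user info (for example admin:123456)
public String getAuthority(); // the authority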
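To see what each getter returns, the example URL can be parsed and printed. A minimal sketch (the class and the parse helper are made up for the illustration; no connection is opened, the URL is only parsed):

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlPartsSketch {

    // Hypothetical helper that hides the checked exception.
    @SuppressWarnings("deprecation")
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL u = parse("http://admin:123456@www.ibiblio.org:8080"
                + "/java/books/jnp/index.html?isbn=123423423#toc");
        System.out.println(u.getProtocol());    // http
        System.out.println(u.getAuthority());   // admin:123456@www.ibiblio.org:8080
        System.out.println(u.getUserInfo());    // admin:123456
        System.out.println(u.getHost());        // www.ibiblio.org
        System.out.println(u.getPort());        // 8080
        System.out.println(u.getDefaultPort()); // 80
        System.out.println(u.getPath());        // /java/books/jnp/index.html
        System.out.println(u.getFile());        // /java/books/jnp/index.html?isbn=123423423
        System.out.println(u.getQuery());       // isbn=123423423
        System.out.println(u.getRef());         // toc
    }
}
```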

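4. Comparing URLs for equality

equals() may block, because it can perform a DNS lookup; two hosts that resolve to the same address are treated as equal

Two URLs are equal if and only if they refer to the same resource on the same host, port and path, and also have the same fragment and query

URL does not implement the Comparable interface

http://www.url.com:80 equals http://www.url.com, because a missing port is replaced by the protocol's default port before the ports are compared

http://www.url.com/index.html is not equal to http://www.url.com

sameFile() ignores the fragment identifier

URL's equals() first compares the textual components (fragment, protocol, file and port); only when all of these match does it go on to resolve the hosts through DNS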

/***
 * Walkthrough of the equals source: nothing difficult, almost all of it is string comparison; only one place uses InetAddress
 */

//java.net.URLStreamHandler
protected boolean equals(URL u1, URL u2) {
    String ref1 = u1.getRef();
    String ref2 = u2.getRef();
    return (ref1 == ref2 || (ref1 != null && ref1.equals(ref2))) &&
           sameFile(u1, u2);
}

protected boolean sameFile(URL u1, URL u2) {
    // Compare the protocols.
    if (!((u1.getProtocol() == u2.getProtocol()) ||
          (u1.getProtocol() != null &&
           u1.getProtocol().equalsIgnoreCase(u2.getProtocol()))))
        return false;

    // Compare the files.
    if (!(u1.getFile() == u2.getFile() ||
          (u1.getFile() != null && u1.getFile().equals(u2.getFile()))))
        return false;

    // Compare the ports.
    int port1, port2;
    port1 = (u1.getPort() != -1) ? u1.getPort() : u1.handler.getDefaultPort();
    port2 = (u2.getPort() != -1) ? u2.getPort() : u2.handler.getDefaultPort();
    if (port1 != port2)
        return false;

    // Compare the hosts.
    // hostsEqual calls InetAddress.getByName(), which performs DNS resolution
    if (!hostsEqual(u1, u2))
        return false;

    return true;
}

protected boolean hostsEqual(URL u1, URL u2) {
    InetAddress a1 = getHostAddress(u1);
    InetAddress a2 = getHostAddress(u2);
    // if we have internet address for both, compare them
    if (a1 != null && a2 != null) {
        return a1.equals(a2);
    // else, if both have host names, compare them
    } else if (u1.getHost() != null && u2.getHost() != null)
        return u1.getHost().equalsIgnoreCase(u2.getHost());
    else
        return u1.getHost() == null && u2.getHost() == null;
}

protected synchronized InetAddress getHostAddress(URL u) {
    if (u.hostAddress != null)
        return u.hostAddress;

    String host = u.getHost();
    if (host == null || host.equals("")) {
        return null;
    } else {
        try {
            // The DNS lookup happens here
            u.hostAddress = InetAddress.getByName(host);
        } catch (UnknownHostException ex) {
            return null;
        } catch (SecurityException se) {
            return null;
        }
    }
    return u.hostAddress;
}
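The behavior of the source above can be checked with a few comparisons. This sketch deliberately uses the IP literal 127.0.0.1 as host (an assumption made so that InetAddress.getByName() has nothing to look up and the example does not depend on DNS); the class and helper names are invented.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlEqualsSketch {

    // Hypothetical helper that hides the checked exception.
    @SuppressWarnings("deprecation")
    static URL url(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL withPort = url("http://127.0.0.1:80/index.html#top");
        URL noPort = url("http://127.0.0.1/index.html#top");
        URL otherRef = url("http://127.0.0.1/index.html#bottom");

        // An explicit port 80 matches the default port of http
        System.out.println(withPort.equals(noPort));   // true
        // equals() takes the fragment into account
        System.out.println(noPort.equals(otherRef));   // false
        // sameFile() ignores the fragment
        System.out.println(noPort.sameFile(otherRef)); // true
    }
}
```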

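III. The URI class

Some key points:

The URI class is concerned purely with identifying resources and parsing URIs.

A URI cannot retrieve the data at the address it identifies

For plain comparison of address strings, prefer URI: URL's comparison methods may perform a DNS lookup, while URI's never do

URI has a toURL() method, and likewise URL has a toURI() method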

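1. Creating a URI

public URI(String uri) throws URISyntaxException; //①
public URI(String scheme, String schemeSpecificPart, String fragment)
        throws URISyntaxException; //②
public URI(String scheme, String host, String path, String fragment)
        throws URISyntaxException; //③
public URI(String scheme, String authority, String path, String query, String fragment)
        throws URISyntaxException; //④
public URI(String scheme, String userInfo, String host, int port, String path,
        String query, String fragment) throws URISyntaxException; //⑤

URI does not depend on an underlying protocol handler, so any syntactically correct URI can be created, including ones such as tel: and urn:

If the syntax is wrong, for example when the string starts with a colon, URISyntaxException is thrown

scheme means the protocol; it must consist of ASCII letters, digits and the three punctuation characters +, - and ., and it must start with a letter

The scheme argument may be null, in which case the result is a relative URI

If scheme is not null and the path argument does not start with /, URISyntaxException is thrown

All of the constructors above validate their arguments. If you are certain that the string is a legal URI, you can use

public static URI create(String str)

to create it directly; on a syntax error it throws the unchecked IllegalArgumentException instead of the checked URISyntaxException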
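The rules for creating a URI can be tried out as follows. A minimal sketch: the isValid helper and the sample strings are illustrative choices, and www.example.com is a placeholder host.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriCreateSketch {

    // Hypothetical helper: true if the string parses as a URI.
    static boolean isValid(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // No protocol handler is needed, only correct syntax
        System.out.println(isValid("tel:+1-800-9988-9938"));   // true
        System.out.println(isValid("urn:isbn:1-565-92870-9")); // true
        // A leading colon means the scheme name is missing
        System.out.println(isValid(":broken"));                // false

        // URI.create() reports the same problem as an unchecked exception
        try {
            URI.create(":broken");
        } catch (IllegalArgumentException e) {
            System.out.println("create() rejected it: " + e.getMessage());
        }

        // A scheme combined with a path that lacks the leading slash
        try {
            new URI("http", "www.example.com", "docs/index.html", null);
        } catch (URISyntaxException e) {
            System.out.println("constructor rejected it: " + e.getReason());
        }
    }
}
```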

2. The parts of a URI

Two concepts matter here, opaque and hierarchical URIs, and they take some effort to grasp

a. Hierarchical versus opaque

Here is the JDK source:

/**
 * Tells whether or not this URI is opaque.
 *
 * <p> A URI is opaque if, and only if, it is absolute and its
 * scheme-specific part does not begin with a slash character ('/').
 * An opaque URI has a scheme, a scheme-specific part, and possibly
 * a fragment; all other components are undefined. </p>
 *
 * @return  {@code true} if, and only if, this URI is opaque
 */
public boolean isOpaque() {
    return path == null;
}

In other words: a URI is opaque if and only if it is absolute (it has a scheme) and its scheme-specific part does not begin with /. An opaque URI has a scheme, a scheme-specific part and possibly a fragment; all its other parts are undefined.

A URI that is not opaque is the hierarchical kind we use every day; URLs are hierarchical

All other absolute URIs are opaque, and isOpaque() returns true for them

public static void main(String[] args) {
    try {
        URI opaque = new URI("mail:jicheng@163.com");
        System.out.println(opaque.isOpaque()); // true
        URI notOpaque = new URI("mail:/jicheng@163.com");
        System.out.println(notOpaque.isOpaque()); // false
    } catch (URISyntaxException e) {
        e.printStackTrace();
    }
}

b. Getting the individual parts

// The following methods return decoded strings
public String getAuthority();
public String getFragment();
public String getHost();
public String getPath();
public int getPort();
public String getQuery();
public String getUserInfo();
// The following methods return the raw, undecoded strings, that is, with the % escapes intact
public String getRawAuthority();
public String getRawFragment();
public String getRawPath();
public String getRawQuery();
public String getRawUserInfo();

For an opaque URI only getScheme, getSchemeSpecificPart and getFragment return anything useful
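The difference between the decoded and raw getters, and what an opaque URI reports, can be seen in a short sketch. The two sample URIs and the class name are assumptions chosen for the illustration.

```java
import java.net.URI;

public class UriPartsSketch {

    // Thin wrapper around URI.create() so the sample strings are parsed
    // in one place.
    static URI parse(String s) {
        return URI.create(s);
    }

    public static void main(String[] args) {
        URI hier = parse("http://admin@www.example.com:8080/a%20b/index.html?q=x%26y#toc");
        System.out.println(hier.getRawPath());  // /a%20b/index.html
        System.out.println(hier.getPath());     // /a b/index.html
        System.out.println(hier.getRawQuery()); // q=x%26y
        System.out.println(hier.getQuery());    // q=x&y
        System.out.println(hier.getUserInfo()); // admin
        System.out.println(hier.getHost());     // www.example.com
        System.out.println(hier.getPort());     // 8080

        URI opaque = parse("urn:isbn:1-565-92870-9");
        System.out.println(opaque.getScheme());             // urn
        System.out.println(opaque.getSchemeSpecificPart()); // isbn:1-565-92870-9
        // The hierarchical parts are undefined for an opaque URI
        System.out.println(opaque.getHost());               // null
        System.out.println(opaque.getPath());               // null
        System.out.println(opaque.getPort());               // -1
    }
}
```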

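c. Comparing URIs for equality

A few points to note:

The scheme and the host are compared without regard to case; the remaining parts are case-sensitive

Escaped characters are not decoded before comparison

hashCode() behaves consistently with equals()

URI also implements Comparable. compareTo() applies the following rules in order, from top to bottom:

If the schemes differ, the schemes are compared, ignoring case

If the schemes are equal, a hierarchical URI is considered less than an opaque URI with the same scheme

If both URIs are opaque, they are compared by their scheme-specific parts

If both are opaque and the scheme-specific parts are equal, they are compared by fragment

If both URIs are hierarchical, they are ordered by authority; hosts are compared ignoring case

If scheme and authority are equal, the path is used for ordering

If the paths are also equal, the query strings are compared

If the query strings are also equal, the fragments are compared

public static void main(String[] args) {
    try {
        URI uri1 = new URI("maIL://jicheng:jicheng@wwW.JICHENg.com");
        URI uri2 = new URI("mail://jicheng:jicheng@www.jicheng.com");
        System.out.println(uri1.equals(uri2)); // true
    } catch (URISyntaxException e) {
        e.printStackTrace();
    }
}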
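URI also implements Comparable, so its ordering rules can be observed by sorting a few URIs, and the point about escapes can be checked with equals(). This is a sketch with made-up sample URIs and helper names (sorted, sameUri), not an exhaustive test of every rule.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UriOrderingSketch {

    // Sorts the given URIs with URI.compareTo() and returns them as strings.
    static List<String> sorted(String... uris) {
        List<URI> list = new ArrayList<>();
        for (String s : uris) {
            list.add(URI.create(s));
        }
        Collections.sort(list);
        List<String> out = new ArrayList<>();
        for (URI u : list) {
            out.add(u.toString());
        }
        return out;
    }

    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Schemes are compared first; with equal schemes the hierarchical
        // URI (mail:/a) sorts before the opaque one (mail:b@x.com).
        System.out.println(sorted("mail:b@x.com", "mail:/a", "ftp://example.com/z"));
        // [ftp://example.com/z, mail:/a, mail:b@x.com]

        // Scheme and host ignore case
        System.out.println(sameUri("HTTP://EXAMPLE.com/a", "http://example.com/a"));         // true
        // %7E is not decoded to ~ before comparing
        System.out.println(sameUri("http://example.com/%7Efoo", "http://example.com/~foo")); // false
    }
}
```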

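IV. Encoding and decoding URLs

Encoding and decoding URL strings is an extremely common operation in practice, and web-related development in particular has to get it right.

1. Characters that may appear in a URL

uppercase letters

lowercase letters

digits

the punctuation characters -, _, ., !, ~, *, ' and ,

The characters /, &, ?, @, #, ;, $, +, = and % are reserved as delimiters; when one of them appears in a path or query string as ordinary data, it must be encoded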

2. URLEncoder

Applied to a whole URL, URLEncoder also encodes the delimiters /, &, = and :, which is not what we want. Encode only the names and values of the query string:

public static void main(String[] args) {
    String url = "https://www.jicheng.com/search?";
    try {
        // Encode the name and the value separately, then join them with =
        url += URLEncoder.encode("h1", "UTF-8");
        url += "=";
        url += URLEncoder.encode("I/O", "UTF-8");
        System.out.println(url); // https://www.jicheng.com/search?h1=I%2FO
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
}
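The same idea extends to several parameters: encode every name and value on its own, then join the pieces with the literal delimiters. A minimal sketch follows; the buildQuery helper and the sample parameters are invented, and the Charset overload of URLEncoder.encode used here requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryBuilderSketch {

    // Encodes each name and value separately so that the = and & added
    // here remain real delimiters.
    static String buildQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8));
            sb.append('=');
            sb.append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("h1", "I/O");
        params.put("q", "a b&c");
        // URLEncoder turns a space into + and escapes / and &
        System.out.println("https://www.jicheng.com/search?" + buildQuery(params));
        // https://www.jicheng.com/search?h1=I%2FO&q=a+b%26c
    }
}
```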

3. URLDecoder

URLDecoder can be applied to the whole string at once, since the unescaped delimiters pass through unchanged. Note that it also converts every + into a space

public static void main(String[] args) {
    String url = "https://www.jicheng.com/search?h1=I%2FO";
    try {
        String decode = URLDecoder.decode(url, "UTF-8");
        System.out.println(decode);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
}
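When the decoded values are to be processed further, it is usually safer to split the query string on its delimiters first and decode each piece afterwards; otherwise an escaped & or = inside a value becomes indistinguishable from a real delimiter. A sketch under these assumptions (the parseQuery helper is invented, repeated names simply overwrite each other, and the Charset overload of URLDecoder.decode requires Java 10 or later):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParserSketch {

    // Splits the raw query on & and =, then decodes names and values.
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> result = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return result;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            result.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) {
        // %26 is an escaped & that belongs to the value of q
        System.out.println(parseQuery("h1=I%2FO&q=a+b%26c")); // {h1=I/O, q=a b&c}
    }
}
```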
