
Java Generics Study Notes

2015-12-13 18:23
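I have recently been studying Java in depth and set aside time to work through the official tutorial. When I finally reached the Generics chapter, I found plenty of pits waiting for me to fall into. In truth I had stepped on these mines long ago; now I finally have a chance to think backwards from them and try to understand the principles underneath.

So I will take a different approach and talk about this maddening feature called generics.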

1. Why use generics

The official documentation explains it as follows:

1)Stronger type checks at compile time.

2)Elimination of casts.

3)Enabling programmers to implement generic algorithms.

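In other words, generics were not introduced only to add flexibility, for example by enabling generic algorithms. Just as important, they strengthen compile-time type checking in Java, a compiled language: the compiler becomes stricter.

So the essence of generics is a balance of loosening and tightening, not generalization alone. A large share of baffling errors and warnings come from these stronger type checks.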
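
The first two points can be seen in a small sketch. The class and method names below are made up for illustration; the raw-type version stands in for pre-generics code, and the commented-out line marks what the compiler would reject.

```java
import java.util.ArrayList;
import java.util.List;

class WhyGenerics {
    // Raw type: the list accepts anything, so every read needs a cast,
    // and a wrong element type would only surface at run time.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String firstRaw() {
        List names = new ArrayList();
        names.add("hello");
        return (String) names.get(0); // cast required
    }

    // Parameterized type: the compiler knows the element type.
    static String firstGeneric() {
        List<String> names = new ArrayList<>();
        names.add("hello");
        // names.add(42); // rejected at compile time (stronger type check)
        return names.get(0); // no cast needed
    }

    public static void main(String[] args) {
        System.out.println(firstRaw());
        System.out.println(firstGeneric());
    }
}
```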

2. Clarifying the basic concepts

The basic concepts here are mainly type parameter and type argument.


Type Parameter and Type Argument Terminology: Many developers use the terms "type parameter" and "type argument" interchangeably, but these terms are not the same. When coding, one provides type arguments in order to create a parameterized type. Therefore, the T in Foo<T> is a type parameter and the String in Foo<String> f is a type argument. This lesson observes this definition when using these terms.


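This states the difference very clearly: the T in Foo<T> is a type parameter, while the String in Foo<String> is a type argument.

Keep these two terms distinct, because the rest of these notes follows this definition strictly.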

3. Bounded type parameters

Besides a plain, unconstrained type parameter, generics also support upper bounded type parameters, for example:

public class NaturalNumber<T extends Integer>
public static <T extends Comparable<T>> int countGreaterThan(T[] anArray, T elem)
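
To show what the bound buys, here is a minimal sketch that fills in a body for the countGreaterThan signature above. The wrapper class name and the sample data are assumptions for illustration.

```java
class BoundedTypeDemo {
    // Because T is bounded by Comparable<T>, compareTo can be called on the elements.
    // With an unbounded T, only the methods of Object would be available.
    public static <T extends Comparable<T>> int countGreaterThan(T[] anArray, T elem) {
        int count = 0;
        for (T e : anArray) {
            if (e.compareTo(elem) > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Integer[] numbers = {1, 5, 8, 3};
        // 5 and 8 are greater than 4, so this prints 2.
        System.out.println(countGreaterThan(numbers, 4));
    }
}
```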


Note that type parameters have no lower bounded form. For the reasons, see this discussion:
http://stackoverflow.com/questions/4902723/why-cant-a-java-type-parameter-have-a-lower-bound

4. Inheritance relationships between generic types

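As everyone knows, a variable of a supertype can refer to an object of a subtype. A method can therefore declare a parameter with the supertype and receive a subtype object as the actual argument. This is one of the key expressions of polymorphism.

Inheritance between generic types is more interesting and follows some rules of its own. The two figures below illustrate these relationships well.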
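
As a rough sketch of those rules in code: subtyping follows the class hierarchy when the type argument stays the same, but not when only the type argument changes. The Box class here is a hypothetical container written for this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class GenericSubtypingDemo {
    // Hypothetical container used only for this illustration.
    static class Box<T> {
        private T value;
        void set(T value) { this.value = value; }
        T get() { return value; }
    }

    static int sizeOf(Collection<String> c) {
        return c.size();
    }

    public static void main(String[] args) {
        // Same type argument, related classes: ArrayList<String> is a List<String>,
        // which in turn is a Collection<String>.
        ArrayList<String> arrayList = new ArrayList<>();
        arrayList.add("a");
        List<String> list = arrayList;
        System.out.println(sizeOf(list));

        // Same class, related type arguments: no subtype relationship.
        Box<Integer> intBox = new Box<>();
        intBox.set(10);
        // Box<Number> numBox = intBox; // does not compile
        Object common = intBox;         // without wildcards, Object is the common supertype
        System.out.println(common == intBox);
    }
}
```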







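5. The birth of wildcards

The inheritance relationships above do not give types such as Box<Number> and Box<Integer> a useful common supertype. Using Object as the supertype is far too broad and defeats the purpose of stricter compile-time type checking. Wildcards were introduced for this reason. There are three kinds: the upper bounded wildcard (? extends someClass), the lower bounded wildcard (? super someClass), and the unbounded wildcard (?). The figure below shows how they relate: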





The precise definition also deserves careful thought; I have marked the key part in bold italics.


In generic code, the question mark (?), called the wildcard, represents an unknown type. The wildcard can be used in a variety of situations: as the type of a parameter, field, or local variable; sometimes as a return type (though it is better programming practice to be more specific). The wildcard is never used as a type argument for a generic method invocation, a generic class instance creation, or a supertype.


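In other words, a wildcard cannot be supplied as a type argument when invoking a generic method, creating an instance of a generic class, or naming a supertype.

Does that mean a wildcard can never be a type argument? No, it can, for instance in the declared type of a variable. The following code is fine.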

List<? extends Integer> intList = new ArrayList<>();
List<? extends Number>  numList = intList;


We will set wildcards aside for now. Most of the pitfalls later on are related to them, and we will come back to them then.

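6. Type erasure

During compilation, the Java compiler automatically erases type information and substitutes the type parameters in a defined way. At run time the JVM knows nothing about generics: the virtual machine cannot distinguish List<String> from List<Number>.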
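
A quick way to observe this is to compare the runtime classes of two differently parameterized lists. This is only a sketch; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Both lists are plain ArrayList objects at run time; the type arguments are gone.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Number> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints true
    }
}
```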

How is type erasure carried out? The documentation says:


Generics were introduced to the Java language to provide tighter type checks at compile time and to support generic programming. To implement generics, the Java compiler applies type erasure to:

Replace all type parameters in generic types with their bounds or Object if the type parameters are unbounded. The produced bytecode, therefore, contains only ordinary classes, interfaces, and methods.

Insert type casts if necessary to preserve type safety.

Generate bridge methods to preserve polymorphism in extended generic types.

Type erasure ensures that no new classes are created for parameterized types; consequently, generics incur no runtime overhead.


Here are some examples. The following transformations are all taken from the official documentation:

public static <T> int count(T[] anArray, T elem)    ----->    public static int count(Object[] anArray, Object elem)
public static <T extends Shape> void draw(T shape) { /* ... */ } ------>     public static void draw(Shape shape) { /* ... */ }


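Reading this far, type erasure seems to involve only ordinary type parameters such as T and bounded type parameters; wildcard types are not mentioned.

So how do wildcard types differ from ordinary type parameters? What should we watch out for when using them? Is type erasure handled the same way for both?

① Class definition

class A <T extends B>{}   // correct
class A <? extends B>{}  // error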


② Object initialization

A<? extends B> a = new A<>(); // correct
A<? extends B> a = new A<? extends B>(); // error
A<T extends B> a = new A<>(); // error
A<T extends B> a = new A<T extends B>(); // error


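③ Guidelines for using wildcards

Only the compiler's authors know every detail of the rules. When we find ourselves digging into how the compiler will handle some odd-looking code, we should first make sure we avoid writing such code.

Writing concise, sensible code is what we should be doing. Here are the points to keep in mind when applying wildcards: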


An "In" Variable

An "in" variable serves up data to the code. Imagine a copy method with two arguments: copy(src, dest). The src argument provides the data to be copied, so it is the "in" parameter.

An "Out" Variable

An "out" variable holds data for use elsewhere. In the copy example, copy(src, dest), the dest argument accepts data, so it is the "out" parameter.

Wildcard Guidelines:

An "in" variable is defined with an upper bounded wildcard, using the extends keyword.

An "out" variable is defined with a lower bounded wildcard, using the super keyword.

In the case where the "in" variable can be accessed using methods defined in the Object class, use an unbounded wildcard.

In the case where the code needs to access the variable as both an "in" and an "out" variable, do not use a wildcard.

These guidelines do not apply to a method's return type. Using a wildcard as a return type should be avoided because it forces programmers using the code to deal with wildcards.


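From these guidelines we can derive a rule of thumb: a wildcard type generally appears in the declaration of a reference variable (A<? extends B> a = new A<>()) or of a method parameter, not in a class definition or an instantiation. A class definition uses an ordinary type parameter T (class A<T extends B>{}). An instantiation either uses the diamond and lets the compiler infer the type (A<? extends B> a = new A<>()) or names the class explicitly (A<? extends Number> a = new A<Integer>()).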
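
The copy(src, dest) method from the quoted guidelines could be sketched as follows. The tutorial only names the method; the signature, body, and class name here are one possible reading, not the tutorial's own code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WildcardCopyDemo {
    // src is the "in" variable: it is only read, so it takes ? extends T.
    // dest is the "out" variable: it is only written, so it takes ? super T.
    static <T> void copy(List<? extends T> src, List<? super T> dest) {
        for (T item : src) {
            dest.add(item);
        }
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);

        List<Number> numbers = new ArrayList<>();
        copy(ints, numbers); // Integer values flow into a List<Number>

        List<Object> objects = new ArrayList<>();
        copy(ints, objects); // and just as well into a List<Object>

        System.out.println(numbers);
        System.out.println(objects);
    }
}
```

With plain List<T> for both parameters, the two calls in main would not compile, because List<Integer> is neither a List<Number> nor a List<Object>.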

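④ How does type erasure handle them?

The general case was described above. What happens when a wildcard is involved?

Before answering, let us introduce two concepts: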


A reifiable type is a type whose type information is fully available at runtime. This includes primitives, non-generic types, raw types, and invocations of unbound wildcards.

For example String, Integer, etc. A reifiable type essentially has the same type information at compile-time as it has at run-time.

Non-reifiable types are types where information has been removed at compile-time by type erasure — invocations of generic types that are not defined as unbounded wildcards. A non-reifiable type does not have all of its information available at runtime. Examples of non-reifiable types are List<String> and List<Number>; the JVM cannot tell the difference between these types at runtime. As shown in Restrictions on Generics, there are certain situations where non-reifiable types cannot be used: in an instanceof expression, for example, or as an element in an array.

For example List<String>, List<T>, and T. Non-reifiable types have less type information at run-time than at compile time. In fact, the run-time types of the above are List, List, and Object. During compilation, the generic type information is erased.


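We focus on type erasure in two cases: types with a wildcard, such as List<? extends Number> and List<?>, and types without one.

The case without wildcards was covered above and is not repeated here. What about types with wildcards? For example, is List<?> converted to List<Object> and then treated as List? First, the official documentation: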


It's important to note that List<Object> and List<?> are not the same. You can insert an Object, or any subtype of Object, into a List<Object>. But you can only insert null into a List<?>


Here is one answer:
http://stackoverflow.com/questions/31583697/how-type-erasure-works-for-wildcard-in-java

The only difference between ? extends Number and T extends Number is that in the second case, if T is encountered again, it should denote the same type. So void add(List<? extends Number> first, List<? extends Number> second) can be called with add(new List<Double>(), new List<Long>()) but <T> void add(List<T extends Number> first, List<T extends Number> second) can not.

Wildcards only differ from named type parameters at compile time as the compiler will try to enforce that types using the same named parameter are indeed the same.

? and T have different uses. Think T for generic Type (Classes, Interfaces) creation - which can then be referred to anywhere in the type.

Think ? as a way of limiting what types you can legally invoke a method with at Compile time.


The answer is fairly clear by now. Wildcard or not, there is no difference at run time; the real difference lies in the checks applied at compile time.
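
The compile-time difference described in the quoted answer can be sketched in valid Java as below. The method names sumAll and moveAll are invented for this example, and the named-parameter version is written as <T extends Number> with List<T> parameters, which is the legal way to spell what the answer intends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WildcardVsNamedDemo {
    // Each wildcard stands for its own unknown subtype of Number,
    // so the two lists may have different element types.
    static double sumAll(List<? extends Number> first, List<? extends Number> second) {
        double total = 0.0;
        for (Number n : first) total += n.doubleValue();
        for (Number n : second) total += n.doubleValue();
        return total;
    }

    // T is named once and reused, so both lists must share the same element type.
    static <T extends Number> void moveAll(List<T> from, List<T> to) {
        to.addAll(from);
    }

    public static void main(String[] args) {
        List<Double> doubles = new ArrayList<>(Arrays.asList(1.5, 2.5));
        List<Long> longs = new ArrayList<>(Arrays.asList(1L, 2L));

        System.out.println(sumAll(doubles, longs)); // fine: Double and Long lists together

        // moveAll(doubles, longs); // does not compile: no single T fits both lists

        List<Double> moreDoubles = new ArrayList<>();
        moveAll(doubles, moreDoubles); // fine: T is Double for both
        System.out.println(moreDoubles);
    }
}
```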

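Back to the earlier question: is List<?> converted to List<Object> and then treated as List?

The answer: in some respects ? behaves like Object, and in others it does not. How does it differ?

First, in terms of the result, both are just List at run time; there is no difference.

During compilation, however, the rules for accessing and modifying the list differ.

The official documentation gives this example: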

import java.util.List;

public class WildcardError {

    void foo(List<?> i) {
        i.set(0, i.get(0));
    }
}

//In this example, the compiler processes the i input parameter as being of type Object. When the foo method invokes List.set(int, E), the compiler is not able to confirm the type of object that is being inserted into the list, and an error is produced.


The official explanation is not very detailed; there is a good answer on Stack Overflow: http://stackoverflow.com/questions/12043874/java-generics-wildcard-capture-misunderstanding


The compiler doesn't know anything about the type of elements in List<?> i, by definition of ?. Wildcard does not mean "any type;" it means "some unknown type."

The compiler can only know – at compile time, remember – that i.get(0) returns an Object, which is the upper bound of ?. But there's no guarantee that ? is at runtime Object, so there is no way for the compiler to know that i.set(0, i.get(0)) is a safe call. It's like writing this:

List<Foo> fooz = /* init */;
Object foo = fooz.get(0);
fooz.set(0, foo); // won't compile because foo is an object, not a Foo

--------------------------------------------------------------------------------------------------------------------------------------------

Put differently, why does the compiler not know that the two usages of the wildcard type List<?> in

i.set(0, i.get(0));

refer to the same actual type?

Well, that would require the compiler to know that i contains the same instance for both evaluations of the expression. Since i isn't even final, the compiler would have to check whether i could possibly have been assigned in between evaluating the two expressions. Such an analysis is only simple for local variables (for who knows whether an invoked method will update a particular field of a particular object?). This is quite a bit of additional complexity in the compiler for rarely manifesting benefits. I suppose that's why the designers of the Java programming language kept things simple by specifying that different uses of the same wildcard type have different captures.


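This answer seems somewhat controversial, since it touches on implementation preferences inside the compiler. So I will take a middle road and share my own thinking.

Generic programming relies on a technique called type inference (http://docs.oracle.com/javase/tutorial/java/generics/genTypeInference.html).

An important precondition for type inference is that one side must state the concrete type of the type parameter explicitly.

With List<?>, a call to get returns a value typed as Object, so at that point you can "treat" the list as a List<Object>. The compiler really has no idea what came out, so it can only refer to it as Object for the time being. For a modification such as set, however, the compiler must check the inserted element strictly against the type argument. The list's element type is unknown and matches no other type (here, Object). Even when an element is taken out and put straight back in, the compiler will not guarantee that it has the same type before and after. So it keeps things simple: it cannot complete the inference, the check fails, and it reports an error.

To solve this, use the following approach: name the type parameter T explicitly so that the types before and after are forced to match, and the compile-time check passes.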

import java.util.List;

public class WildcardFixed {

    void foo(List<?> i) {
        fooHelper(i);
    }

    // Helper method created so that the wildcard can be captured
    // through type inference.
    private <T> void fooHelper(List<T> l) {
        l.set(0, l.get(0));
    }
}


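So T and ? really do differ. As noted earlier, T can be referred to again later inside the method, which guarantees that it denotes the same type each time.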
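
The same capture-helper idea can be applied to something slightly more useful than setting an element to itself. The sketch below swaps the first two elements of a List<?>; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WildcardCaptureDemo {
    // Public entry point: callers can pass a list of any element type.
    static void swapFirstTwo(List<?> list) {
        swapHelper(list); // the unknown type is captured as T here
    }

    // Inside the helper the element type has a name, so reads and writes agree.
    private static <T> void swapHelper(List<T> list) {
        T first = list.get(0);
        list.set(0, list.get(1));
        list.set(1, first);
    }

    public static void main(String[] args) {
        List<String> letters = new ArrayList<>(Arrays.asList("a", "b", "c"));
        swapFirstTwo(letters);
        System.out.println(letters); // prints [b, a, c]
    }
}
```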

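7. Why a type parameter cannot be instantiated, e.g. E e = new E()

Because the compiler has no idea what type E actually is, so how could it initialize one? It cannot know whether E has an accessible no-argument constructor, and after erasure the JVM has no class E to instantiate. You might object: isn't everything erased to Object anyway?

Wrong. Mind the definition: type erasure applies to type parameters. What is a type parameter? Only something like the T in Foo<T>.

In other words, erasure rewrites the types used in declarations and references; it does not rewrite object instantiation.

Even if erasure were allowed here, turning the statement into Object e = new Object(); would be pointless.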
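
One common workaround, shown here only as a sketch, is to let the caller pass in something that knows how to build an E. This example uses java.util.function.Supplier; the method and class names are assumptions, not something from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class CreateInstanceDemo {
    // static <E> void appendBroken(List<E> list) {
    //     E e = new E(); // compile-time error: E cannot be instantiated
    // }

    // The caller supplies a factory, so the code never needs to know E itself.
    static <E> E appendNew(List<E> list, Supplier<E> factory) {
        E e = factory.get();
        list.add(e);
        return e;
    }

    public static void main(String[] args) {
        List<StringBuilder> builders = new ArrayList<>();
        appendNew(builders, StringBuilder::new); // constructor reference as the factory
        System.out.println(builders.size());     // prints 1
    }
}
```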

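After all this discussion my own head is spinning. Honestly, my technical skill is limited, and feeling my way across the river stone by stone is not pleasant. I may not have reached the root principle in the end.

Amid the confusion, I came to appreciate more deeply a sentence from earlier: when we dig into how the compiler handles odd-looking code, we should first make sure we do not write such code ourselves.

As for how to use wildcards, the industry reached a consensus long ago.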



There are two scenarios where an unbounded wildcard is a useful approach:

If you are writing a method that can be implemented using functionality provided in the Object class.

When the code is using methods in the generic class that don't depend on the type parameter. For example, List.size or List.clear. In fact, Class<?> is so often used because most of the methods in Class<T> do not depend on T.

This, in Java, implies a read-only nature, namely, we are allowed to read items from the generic structure, but we are not allowed to put anything back in it, because we cannot be certain of the actual type of the elements in it.




The guidelines above are also discussed in Effective Java as the PECS rule: Producer Extends, Consumer Super.

The material below overlaps with the wildcard usage discussed above.


Wildcards

As we've seen in the previous post, the subtyping relation of generic types is invariant. Sometimes, though, we'd like to use generic types in the same way we can use ordinary types:

Narrowing a reference (covariance).

Widening a reference (contravariance)

Covariance

Let's suppose, for example, that we've got a set of boxes, each one of a different kind of fruit. We'd like to be able to write methods that could accept any of them. More formally, given a subtype A of a type B, we'd like to find a way to use a reference (or a method parameter) of type C<B> that could accept instances of C<A>.

To accomplish this task we can use a wildcard with extends, such as in the following example:

List<Apple> apples = new ArrayList<Apple>();
List<? extends Fruit> fruits = apples;

? extends reintroduces covariant subtyping for generic types: Apple is a subtype of Fruit and List<Apple> is a subtype of List<? extends Fruit>.

Contravariance

Let's now introduce another wildcard: ? super. Given a supertype B of a type A, then C<B> is a subtype of C<? super A>:

List<Fruit> fruitList = new ArrayList<Fruit>();
List<? super Apple> fruits = fruitList;

How Can Wildcards Be Used?

Enough theory for now: how can we take advantage of these new constructs?

? extends

Let's go back to the example we used in Part II when introducing Java array covariance:

Apple[] apples = new Apple[1];
Fruit[] fruits = apples;
fruits[0] = new Strawberry();

As we saw, this code compiles but results in a runtime exception when trying to add a Strawberry to an Apple array through a reference to a Fruit array.

Now we can use wildcards to translate this code to its generic counterpart: since Apple is a subtype of Fruit, we will use the ? extends wildcard to be able to assign a reference of a List<Apple> to a reference of a List<? extends Fruit>:

List<Apple> apples = new ArrayList<Apple>();
List<? extends Fruit> fruits = apples;
fruits.add(new Strawberry());

This time, the code won't compile! The Java compiler now prevents us from adding a strawberry to a list of fruits. We will detect the error at compile time and we won't even need any runtime check (such as in the case of array stores) to ensure that we're adding to the list a compatible type. The code won't compile even if we try to add a Fruit instance into the list:

fruits.add(new Fruit());

No way. It turns out that, indeed, you can't put anything into a structure whose type uses the ? extends wildcard.

The reason is pretty simple, if we think about it: the ? extends T wildcard tells the compiler that we're dealing with a subtype of the type T, but we cannot know which one. Since there's no way to tell, and we need to guarantee type safety, you won't be allowed to put anything inside such a structure. On the other hand, since we know that whichever type it might be, it will be a subtype of T, we can get data out of the structure with the guarantee that it will be a T instance:

Fruit get = fruits.get(0);

? super

What's the behavior of a type that's using the ? super wildcard? Let's start with this:

List<Fruit> fruitList = new ArrayList<Fruit>();
List<? super Apple> fruits = fruitList;

We know that fruits is a reference to a List of something that is a supertype of Apple. Again, we cannot know which supertype it is, but we know that Apple and any of its subtypes will be assignment compatible with it. Indeed, since such an unknown type will be both an Apple and a GreenApple supertype, we can write:

fruits.add(new Apple());

fruits.add(new GreenApple());

If we try to add whichever Apple supertype, the compiler will complain:

fruits.add(new Fruit());
fruits.add(new Object());

Since we cannot know which supertype it is, we aren't allowed to add instances of any.

What about getting data out of such a type? It turns out that the only thing you can get out of it will be Object instances: since we cannot know which supertype it is, the compiler can only guarantee that it will be a reference to an Object, since Object is the supertype of any Java type.

The Get and Put Principle or the PECS Rule

Summarizing the behavior of the ? extends and the ? super wildcards, we draw the following conclusion:

Use the ? extends wildcard if you need to retrieve objects from a data structure.
Use the ? super wildcard if you need to put objects in a data structure.
If you need to do both things, don't use any wildcard.

This is what Maurice Naftalin calls The Get and Put Principle in his Java Generics and Collections and what Joshua Bloch calls The PECS Rule in his Effective Java.

Bloch's mnemonic, PECS, comes from "Producer Extends, Consumer Super" and is probably easier to remember and use.
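
The rule can be tried out with a small sketch that pairs one producer method with one consumer method. The method names and the sample values are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class PecsDemo {
    // Producer Extends: the list only supplies values, and each one is at least a Number.
    static double sum(List<? extends Number> producer) {
        double total = 0.0;
        for (Number n : producer) {
            total += n.doubleValue();
        }
        return total;
    }

    // Consumer Super: the list only receives Integer values.
    static void addOneToThree(List<? super Integer> consumer) {
        for (int i = 1; i <= 3; i++) {
            consumer.add(i);
        }
        // Reading back is only safe as Object:
        Object first = consumer.get(0);
        System.out.println("first element: " + first);
    }

    public static void main(String[] args) {
        List<Number> numbers = new ArrayList<>();
        addOneToThree(numbers);           // a List<Number> can consume Integers
        System.out.println(sum(numbers)); // the same list can also act as a producer of Numbers
    }
}
```

Note that numbers itself is declared as a plain List<Number>, with no wildcard, because main both writes to it and reads from it.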
