
When and how to use a ThreadLocal

2015-03-12 13:24
As our readers might already have guessed, I deal with memory leaks on a daily basis. One particular type of OutOfMemoryError has recently started catching my attention: the issues triggered by misused ThreadLocals have become more and more frequent. Looking at the causes of such leaks, I am starting to believe that more than half of them are caused by developers who either have no clue what they are doing or who are trying to apply a solution to problems it is not meant to solve.

Instead of grinding my teeth, I decided to open up the topic by publishing two articles, the first of which you are currently reading. In this post I explain the motivation behind ThreadLocal usage. In the second post, currently in progress, I will open up the ThreadLocal bonnet and look at the implementation.

Let us start with an imaginary scenario in which ThreadLocal usage is indeed reasonable. For this, say hello to our hypothetical developer, Tim. Tim is developing a webapp with a lot of localized content. For example, a user from California would expect to be greeted with a date formatted using the familiar MM/dd/yy pattern, while one from Estonia would like to see a date formatted according to dd.MM.yyyy. So Tim starts writing code like this:

    public String formatCurrentDate() {
        DateFormat df = new SimpleDateFormat("MM/dd/yy");
        return df.format(new Date());
    }

    public String formatFirstOfJanyary1970() {
        DateFormat df = new SimpleDateFormat("MM/dd/yy");
        return df.format(new Date(0));
    }

After a while, Tim finds this boring and against good practice: the application code is polluted with such initializations. So he makes a seemingly reasonable move by extracting the DateFormat to an instance variable. After the change, his code looks like the following:

    private DateFormat df = new SimpleDateFormat("MM/dd/yy");

    public String formatCurrentDate() {
        return df.format(new Date());
    }

    public String formatFirstOfJanyary1970() {
        return df.format(new Date(0));
    }

Happy with the refactoring results, Tim tosses himself an imaginary high five, pushes the change to the repository and walks home. A few days later the users start complaining: some of them seem to get completely garbled strings instead of the formerly nicely formatted dates.

Investigating the issue, Tim discovers that the DateFormat implementation, SimpleDateFormat, is not thread-safe: it keeps intermediate results in an internal Calendar field shared by every caller. This means that in the scenario above, if two threads simultaneously use the formatCurrentDate() and formatFirstOfJanyary1970() methods, that shared state can get mangled and the displayed result can be messed up. So Tim fixes the issue by restricting access to the methods, making sure that only one thread at a time enters the formatting functionality. Now his code looks like the following:

    private DateFormat df = new SimpleDateFormat("MM/dd/yy");

    public synchronized String formatCurrentDate() {
        return df.format(new Date());
    }

    public synchronized String formatFirstOfJanyary1970() {
        return df.format(new Date(0));
    }

After giving himself another virtual high five, Tim commits the change and goes on a long-overdue vacation, only to start receiving phone calls the very next day complaining that the throughput of the application has dropped dramatically. Digging into the issue, he finds out that synchronizing the access has created an unexpected bottleneck. Because both methods synchronize on the same object, threads can no longer enter the formatting sections as they please; they now have to wait behind one another.

Reading further about the issue, Tim discovers a different kind of variable: ThreadLocal. These variables differ from their normal counterparts in that each thread that accesses one (via its get or set method) has its own, independently initialized copy of the variable. Happy with the newly discovered concept, Tim once again rewrites the code:

    // Typed as ThreadLocal<DateFormat> so that df.get() returns a DateFormat
    private static final ThreadLocal<DateFormat> df = new ThreadLocal<DateFormat>() {
        @Override
        protected DateFormat initialValue() {
            return new SimpleDateFormat("MM/dd/yy");
        }
    };

    public String formatCurrentDate() {
        return df.get().format(new Date());
    }

    public String formatFirstOfJanyary1970() {
        return df.get().format(new Date(0));
    }

Through this process, Tim has learned a powerful concept the hard way. Applied as in the last example, it shows the benefits well: every thread formats with its own DateFormat, so there is neither shared state to corrupt nor a lock to wait on.
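The "independently initialized copy per thread" behaviour can be observed in isolation. The following is a minimal sketch, not part of Tim's code: the class PerThreadCopies and its helper formatterSeenByNewThread are hypothetical, and ThreadLocal.withInitial assumes Java 8 or later. It checks which DateFormat instance the calling thread and a freshly started thread receive:

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class (not from the article); requires Java 8+ for ThreadLocal.withInitial.
class PerThreadCopies {
    // Every thread that calls FORMAT.get() lazily receives its own SimpleDateFormat.
    static final ThreadLocal<DateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("MM/dd/yy"));

    // Calls FORMAT.get() on a freshly started thread and returns the instance that thread saw.
    static DateFormat formatterSeenByNewThread() {
        AtomicReference<DateFormat> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> seen.set(FORMAT.get()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }
}
```

Calling FORMAT.get() twice on the same thread returns the same SimpleDateFormat object, while formatterSeenByNewThread() returns a different one. Since no instance is ever shared between threads, no locking is needed.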

But the newly found concept is a dangerous one. Had Tim stored an instance of one of his application classes instead of the JDK's DateFormat classes, which are loaded by the bootstrap classloader, he would already be in the danger zone. If he simply forgets to remove the value after the task at hand is completed, a copy of that object remains attached to the thread, which typically belongs to a thread pool. Since the lifespan of a pooled thread surpasses that of the application, the lingering value prevents the object, and through it the ClassLoader that loaded the application, from being garbage collected. We have created a leak, which has a good chance of surfacing as the good old java.lang.OutOfMemoryError: PermGen space (or, since Java 8 replaced PermGen with Metaspace, as java.lang.OutOfMemoryError: Metaspace).
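The mechanics of this leak can be reproduced on a tiny scale. In the sketch below, the class and method names are hypothetical, a String stands in for an instance of an application class, and a single-thread executor stands in for a container's thread pool. A first task sets a value and a second task, run on the same pooled thread, reads it. Only when the first task calls remove() in a finally block does the value disappear from the thread:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: a single pooled thread runs two tasks one after the other.
class PooledThreadLeak {
    // A String stands in for an instance of an application class.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Task 1 sets CONTEXT; task 2 reads it on the same pooled thread.
    // With cleanUp == false, task 1 "forgets" to call remove().
    static String valueSeenByNextTask(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> {
                CONTEXT.set("leftover from task 1");
                try {
                    // ... the actual work of task 1 would happen here ...
                } finally {
                    if (cleanUp) {
                        CONTEXT.remove(); // detach the value from the pooled thread
                    }
                }
            }).get(); // wait for task 1 to finish
            return pool.submit(() -> CONTEXT.get()).get(); // task 2, same thread
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

valueSeenByNextTask(false) returns the leftover string, while valueSeenByNextTask(true) returns null. In a real application, the same try/finally with remove() around the unit of work is the usual way to keep pooled threads from pinning the application's classes.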

Another way to start abusing the concept is to use ThreadLocal as a hack for getting a global context within your application. Going down this rabbit hole is a sure way to tangle your application code in all kinds of unimaginable dependencies, coupling your whole code base into an unmaintainable mess.

This is a follow-up to last week's post, where I explained the motivation behind ThreadLocal usage. From that post we recall that ThreadLocal is indeed a cool concept if you wish to have an independently initialized copy of a variable for each thread. Now, the curious ones might have already started asking: "How could I implement such a concept in Java?"

Or you might feel that it will not be an interesting topic; after all, all you need here is a Map, right? When dealing with a ThreadLocal<T>, it seems to make all the sense in the world to implement the solution as a HashMap<Thread, T> with Thread.currentThread() as the key. Actually, it is not that simple. So if you have five minutes, bear with me and I will guide you through a beautiful design concept.

The first obvious problem with the simple HashMap solution is thread safety. As HashMap is not built for concurrent use, we cannot safely use it in a multi-threaded environment. Fortunately, we do not need to look far for the fix: ConcurrentHashMap<Thread, T> looks like a match made in heaven. Full concurrency of retrievals and adjustable expected concurrency for updates is exactly what we need in the first place.
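For illustration, the map-based idea could be sketched as follows. NaiveThreadLocal is a hypothetical class, not a JDK type; it deliberately contains the flaws discussed next, and it assumes Java 8 or later for computeIfAbsent and Supplier:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical, deliberately naive ThreadLocal replacement: one shared map keyed by Thread.
// Note: ConcurrentHashMap rejects null keys and values, so null values are not supported here.
class NaiveThreadLocal<T> {
    private final Map<Thread, T> values = new ConcurrentHashMap<>();
    private final Supplier<? extends T> initial;

    NaiveThreadLocal(Supplier<? extends T> initial) {
        this.initial = initial;
    }

    T get() {
        // Initializes the current thread's copy on first access.
        return values.computeIfAbsent(Thread.currentThread(), thread -> initial.get());
    }

    void set(T value) {
        values.put(Thread.currentThread(), value);
    }

    void remove() {
        values.remove(Thread.currentThread());
    }

    // Number of threads that currently have an entry in the shared map.
    int size() {
        return values.size();
    }
}
```

Functionally this behaves like a ThreadLocal for a single thread, but every thread that ever calls get() or set() ends up as a strongly referenced key in one shared, concurrently accessed map.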

Now, if you were to apply a ConcurrentHashMap-based solution to the ThreadLocal implementation in the JDK, you would introduce two serious problems.

First and foremost, you have Threads as keys in the Map. As long as the map itself is reachable (and a ThreadLocal typically lives in a static field), you keep a reference to every Thread that ever used it, preventing those threads from being GC'd even after they have finished. Unwittingly, you have designed a massive memory leak into the structure.

The second problem might take longer to surface: even though ConcurrentHashMap's clever internal lock splitting (segments, in Java 7 and earlier) reduces the chance of lock contention, it still carries a synchronization overhead. With the synchronization requirement still in place, the structure remains a potential bottleneck.
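The first problem is easy to demonstrate with a plain ConcurrentHashMap<Thread, String>. In this hypothetical sketch, a short-lived thread stores a value keyed by itself and then terminates; afterwards the map still holds the dead Thread object, so neither the thread nor its value can be collected while the map is reachable:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo of the first problem: a map keyed by Thread keeps dead threads reachable.
class StrongThreadKeys {
    static final Map<Thread, String> VALUES = new ConcurrentHashMap<>();

    // Starts a short-lived thread that stores a value keyed by itself, waits for it
    // to finish, and returns that (now terminated) thread.
    static Thread runShortLivedWorker() {
        Thread worker = new Thread(() -> VALUES.put(Thread.currentThread(), "per-thread value"));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker;
    }
}
```

After runShortLivedWorker() returns, the thread is no longer alive, yet VALUES still maps it to "per-thread value". Nothing removes the entry unless the code explicitly does so.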

But let us solve the biggest issue first. Our data structure needs to allow threads to be garbage collected if our reference is the last one pointing to the thread in question. Again, the first possible solution is staring right at us: instead of our usual strong references to the object, why not use WeakReferences? The implementation would now look similar to the following:

    Collections.synchronizedMap(new WeakHashMap<Thread, T>())
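Wrapped into a class, the one-liner above might look like the following sketch. WeakKeyedThreadLocal is a hypothetical name. Every call goes through the single lock of the synchronized wrapper, which is exactly the concurrency problem that remains:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: Thread keys are held weakly, so an entry can be cleared once its
// thread is otherwise unreachable. Values are still held strongly, so a value that refers
// back to its own thread would keep the entry alive.
class WeakKeyedThreadLocal<T> {
    private final Map<Thread, T> values =
            Collections.synchronizedMap(new WeakHashMap<Thread, T>());
    private final Supplier<? extends T> initial;

    WeakKeyedThreadLocal(Supplier<? extends T> initial) {
        this.initial = initial;
    }

    T get() {
        Thread current = Thread.currentThread();
        // Only the current thread ever writes its own key, so containsKey/put cannot race
        // on that key; each call still takes the map's single shared lock.
        if (!values.containsKey(current)) {
            values.put(current, initial.get());
        }
        return values.get(current);
    }

    void set(T value) {
        values.put(Thread.currentThread(), value);
    }

    void remove() {
        values.remove(Thread.currentThread());
    }
}
```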
Now we have gotten rid of the leak: if nobody besides us refers to the Thread, it can be finalized and garbage collected. But we still have not sorted out the concurrency issues. The solution to this is a real example of thinking outside the box. So far we have thought of ThreadLocal variables as a mapping from Threads to values. But what if we reverse the thinking and instead envision the solution as a mapping from ThreadLocal objects to values inside each Thread? If each thread stores its own mapping, and ThreadLocal is just an interface into that mapping, we avoid the synchronization issues, because only the owning thread ever touches its map. Better yet, we also escape the problems with GC!
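Before looking at the JDK, the inverted idea can be sketched from scratch. Since we cannot add a field to java.lang.Thread ourselves, this hypothetical sketch uses a Thread subclass (OwnerThread) that owns a plain HashMap, and a key class (InvertedLocal) that only looks up the current thread's map. It works only for code running on an OwnerThread, a limitation the real JDK implementation does not have:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of the inverted design. We cannot add a field to java.lang.Thread,
// so this thread subclass owns the map; InvertedLocal objects serve only as keys into it.
class OwnerThread extends Thread {
    // Only this thread reads or writes the map, so a plain HashMap needs no locking.
    final Map<InvertedLocal<?>, Object> locals = new HashMap<>();

    OwnerThread(Runnable task) {
        super(task);
    }

    // Runs the task on a new OwnerThread, waits for it, and returns the task's result.
    static <R> R callOnNewThread(Supplier<R> task) {
        AtomicReference<R> result = new AtomicReference<>();
        OwnerThread thread = new OwnerThread(() -> result.set(task.get()));
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result.get();
    }
}

// Works only when called from an OwnerThread (the cast fails on any other thread).
class InvertedLocal<T> {
    private final Supplier<? extends T> initial;

    InvertedLocal(Supplier<? extends T> initial) {
        this.initial = initial;
    }

    private static Map<InvertedLocal<?>, Object> currentMap() {
        return ((OwnerThread) Thread.currentThread()).locals;
    }

    @SuppressWarnings("unchecked")
    T get() {
        // InvertedLocal keeps Object's identity-based equals/hashCode, so each instance is its own key.
        return (T) currentMap().computeIfAbsent(this, key -> initial.get());
    }

    void set(T value) {
        currentMap().put(this, value);
    }
}
```

Two OwnerThreads using the same InvertedLocal each see only their own value, and no lock is involved. When an OwnerThread becomes unreachable, its map goes with it.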

And indeed, when we open up the source code of the ThreadLocal and Thread classes, we see that this is exactly how the solution is implemented in the JDK:

    public class Thread implements Runnable {
        ThreadLocal.ThreadLocalMap threadLocals = null;
        // cut for brevity
    }

    public class ThreadLocal<T> {
        static class ThreadLocalMap {
            // cut for brevity
        }

        ThreadLocalMap getMap(Thread t) {
            return t.threadLocals;
        }

        public T get() {
            Thread t = Thread.currentThread();
            ThreadLocalMap map = getMap(t);
            if (map != null) {
                ThreadLocalMap.Entry e = map.getEntry(this);
                if (e != null)
                    return (T) e.value;
            }
            return setInitialValue();
        }

        private T setInitialValue() {
            T value = initialValue();
            Thread t = Thread.currentThread();
            ThreadLocalMap map = getMap(t);
            if (map != null)
                map.set(this, value);
            else
                createMap(t, value);
            return value;
        }
        // cut for brevity
    }

So here we have it. The Thread class keeps a reference to a ThreadLocal.ThreadLocalMap instance, which holds its keys (the ThreadLocal objects) through weak references. By building the structure in reverse, we have avoided thread contention issues altogether, as a ThreadLocal can only access the value in the current thread. Also, when the Thread has finished its work, the map can be garbage collected, so we have also avoided the memory leak issue.
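The lazy path through get() and setInitialValue() can be observed from the outside. In this hypothetical sketch, initialValue() increments a counter. Each thread calls get() twice, yet initialValue() runs only once per thread, because the second call finds the entry in that thread's ThreadLocalMap:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: initialValue() runs lazily, once per thread, on that thread's first get().
class LazyInitCounter {
    static final AtomicInteger INIT_CALLS = new AtomicInteger();

    static final ThreadLocal<Integer> LOCAL = new ThreadLocal<Integer>() {
        @Override
        protected Integer initialValue() {
            return INIT_CALLS.incrementAndGet();
        }
    };

    // Starts the given number of threads; each calls LOCAL.get() twice. Waits for all of them.
    static void runThreads(int count) {
        Thread[] workers = new Thread[count];
        for (int i = 0; i < count; i++) {
            workers[i] = new Thread(() -> {
                LOCAL.get(); // no entry yet: get() falls back to setInitialValue() -> initialValue()
                LOCAL.get(); // entry now found in this thread's ThreadLocalMap
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

After runThreads(3), INIT_CALLS holds 3: one initialization per thread, none for the repeated calls.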

I hope you felt enlightened looking into the design, as it is indeed an elegant solution to a complex problem. I feel that reading source code is a perfect way to learn new concepts. And if you are a Java developer, what better place to gain that knowledge than the source code by Joshua Bloch and Doug Lea integrated into the JDK?