
Netty source code analysis (4.0)-28 ByteBuf memory pool: PooledByteBufAllocator - putting it all together

2019-11-12 14:15

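PooledByteBufAllocator is responsible for initializing PoolArena (PA) and PoolThreadCache (PTC). It provides a set of methods for creating PooledByteBuf objects backed by heap memory or direct memory. These methods are only a thin shell: internally they rely entirely on PA and PTC. Initialization happens in two steps: first a set of default parameters is initialized, then the PTC objects and the PA arrays.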

Default parameters and their values

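DEFAULT_PAGE_SIZE: the size of a page in a PoolChunk (pageSize). Set with -Dio.netty.allocator.pageSize; default: 8192.

DEFAULT_MAX_ORDER: the height of the binary tree in a PoolChunk (maxOrder). Set with -Dio.netty.allocator.maxOrder; default: 11. Together with pageSize it determines the chunk size, chunkSize = pageSize << maxOrder, which is 8192 << 11 = 16 MiB by default.

DEFAULT_NUM_HEAP_ARENA: the length of the array of PAs that use heap memory. Set with -Dio.netty.allocator.numHeapArenas; default: number of CPU cores * 2, capped so that, assuming each arena holds 3 chunks, the arenas use at most half of the maximum heap memory.

DEFAULT_NUM_DIRECT_ARENA: the length of the array of PAs that use direct memory. Set with -Dio.netty.allocator.numDirectArenas; default: number of CPU cores * 2, capped the same way against the maximum direct memory.

DEFAULT_TINY_CACHE_SIZE: the queue length of each MemoryRegionCache a PTC uses to cache Tiny memory. Set with -Dio.netty.allocator.tinyCacheSize; default: 512.

DEFAULT_SMALL_CACHE_SIZE: the queue length of each MemoryRegionCache a PTC uses to cache Small memory. Set with -Dio.netty.allocator.smallCacheSize; default: 256.

DEFAULT_NORMAL_CACHE_SIZE: the queue length of each MemoryRegionCache a PTC uses to cache Normal memory. Set with -Dio.netty.allocator.normalCacheSize; default: 64.

DEFAULT_MAX_CACHED_BUFFER_CAPACITY: the largest Normal buffer a PTC will cache. Set with -Dio.netty.allocator.maxCachedBufferCapacity; default: 32 * 1024 (32 KiB).

DEFAULT_CACHE_TRIM_INTERVAL: the allocation-count threshold for trimming a PTC's cache. Once the number of allocations served by the PTC reaches this value, the PTC releases cached memory. Set with -Dio.netty.allocator.cacheTrimInterval; default: 8192.

DEFAULT_USE_CACHE_FOR_ALL_THREADS: whether every thread uses a cache. Set with -Dio.netty.allocator.useCacheForAllThreads; default: true.

DEFAULT_DIRECT_MEMORY_CACHE_ALIGNMENT: the alignment for direct memory; the size of allocated direct memory must be a multiple of it. Set with -Dio.netty.allocator.directMemoryCacheAlignment; default: 0, meaning no alignment.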
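All of these defaults are read from system properties. The sketch below shows, under stated assumptions, how such defaults can be resolved with plain JDK calls (Integer.getInteger, System.getProperty). The class AllocatorDefaults and its method numArenas are hypothetical names for illustration; Netty reads these values through its own utility classes, not through this code. numArenas applies the "CPU cores * 2, capped by memory" rule described above.

```java
// Hypothetical helper, not Netty's code: resolves the allocator defaults listed above from
// -D system properties with plain JDK calls, falling back to the documented default values.
final class AllocatorDefaults {
    static final int PAGE_SIZE = Integer.getInteger("io.netty.allocator.pageSize", 8192);
    static final int MAX_ORDER = Integer.getInteger("io.netty.allocator.maxOrder", 11);
    static final int TINY_CACHE_SIZE = Integer.getInteger("io.netty.allocator.tinyCacheSize", 512);
    static final int SMALL_CACHE_SIZE = Integer.getInteger("io.netty.allocator.smallCacheSize", 256);
    static final int NORMAL_CACHE_SIZE = Integer.getInteger("io.netty.allocator.normalCacheSize", 64);
    static final int MAX_CACHED_BUFFER_CAPACITY =
            Integer.getInteger("io.netty.allocator.maxCachedBufferCapacity", 32 * 1024);
    static final int CACHE_TRIM_INTERVAL =
            Integer.getInteger("io.netty.allocator.cacheTrimInterval", 8192);
    // Boolean.getBoolean would default to false, so parse the property with an explicit "true" default.
    static final boolean USE_CACHE_FOR_ALL_THREADS = Boolean.parseBoolean(
            System.getProperty("io.netty.allocator.useCacheForAllThreads", "true"));
    static final int DIRECT_MEMORY_CACHE_ALIGNMENT =
            Integer.getInteger("io.netty.allocator.directMemoryCacheAlignment", 0);

    // Arena count: the property if it is set, otherwise 2 * cores, capped so that arenas holding
    // 3 chunks each use at most half of maxMemory (heap or direct, supplied by the caller).
    static int numArenas(String property, long maxMemory) {
        long chunkSize = (long) PAGE_SIZE << MAX_ORDER;
        long cap = maxMemory / chunkSize / 2 / 3;
        int fallback = (int) Math.min(Runtime.getRuntime().availableProcessors() * 2L, cap);
        return Math.max(0, Integer.getInteger(property, fallback));
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("pageSize=" + PAGE_SIZE + ", maxOrder=" + MAX_ORDER
                + ", numHeapArenas=" + numArenas("io.netty.allocator.numHeapArenas", maxHeap));
    }
}
```

Starting the JVM with -Dio.netty.allocator.numDirectArenas=4, for example, makes numArenas("io.netty.allocator.numDirectArenas", limit) return 4 for any limit, because in this sketch an explicit property bypasses the computed fallback.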

Initializing the PoolArena arrays

PooledByteBufAllocator maintains two arrays:

PoolArena<byte[]>[] heapArenas;
PoolArena<ByteBuffer>[] directArenas;

heapArenas manages heap memory and directArenas manages direct memory. Both arrays are initialized in the constructor, which is declared as:

public PooledByteBufAllocator(boolean preferDirect, int nHeapArena, int nDirectArena, int pageSize, int maxOrder,
int tinyCacheSize, int smallCacheSize, int normalCacheSize,
boolean useCacheForAllThreads, int directMemoryCacheAlignment)

preferDirect: whether to prefer direct memory when creating a PooledByteBuf.

nHeapArena: defaults to DEFAULT_NUM_HEAP_ARENA.

nDirectArena: defaults to DEFAULT_NUM_DIRECT_ARENA.

pageSize: defaults to DEFAULT_PAGE_SIZE.

maxOrder: defaults to DEFAULT_MAX_ORDER.

tinyCacheSize: defaults to DEFAULT_TINY_CACHE_SIZE.

smallCacheSize: defaults to DEFAULT_SMALL_CACHE_SIZE.

normalCacheSize: defaults to DEFAULT_NORMAL_CACHE_SIZE.

useCacheForAllThreads: defaults to DEFAULT_USE_CACHE_FOR_ALL_THREADS.

directMemoryCacheAlignment: defaults to DEFAULT_DIRECT_MEMORY_CACHE_ALIGNMENT.

The two arrays are initialized as follows:

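// chunkSize is a field assigned earlier in the constructor: chunkSize = validateAndCalculateChunkSize(pageSize, maxOrder);
int pageShifts = validateAndCalculatePageShifts(pageSize);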

if (nHeapArena > 0) {
heapArenas = newArenaArray(nHeapArena);
List<PoolArenaMetric> metrics = new ArrayList<PoolArenaMetric>(heapArenas.length);
for (int i = 0; i < heapArenas.length; i ++) {
PoolArena.HeapArena arena = new PoolArena.HeapArena(this,
pageSize, maxOrder, pageShifts, chunkSize,
directMemoryCacheAlignment);
heapArenas[i] = arena;
metrics.add(arena);
}
heapArenaMetrics = Collections.unmodifiableList(metrics);
} else {
heapArenas = null;
heapArenaMetrics = Collections.emptyList();
}

if (nDirectArena > 0) {
directArenas = newArenaArray(nDirectArena);
List<PoolArenaMetric> metrics = new ArrayList<PoolArenaMetric>(directArenas.length);
for (int i = 0; i < directArenas.length; i ++) {
PoolArena.DirectArena arena = new PoolArena.DirectArena(
this, pageSize, maxOrder, pageShifts, chunkSize, directMemoryCacheAlignment);
directArenas[i] = arena;
metrics.add(arena);
}
directArenaMetrics = Collections.unmodifiableList(metrics);
} else {
directArenas = null;
directArenaMetrics = Collections.emptyList();
}

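validateAndCalculatePageShifts computes pageShifts as Integer.SIZE - 1 - Integer.numberOfLeadingZeros(pageSize), i.e. 31 - Integer.numberOfLeadingZeros(pageSize). Integer.numberOfLeadingZeros(pageSize) counts the consecutive zero bits of the 32-bit integer pageSize, starting from the most significant bit. Because pageSize must be a power of two, its only set bit sits at index 31 - numberOfLeadingZeros(pageSize), so pageShifts = log2(pageSize). For the default pageSize of 8192 (2^13), pageShifts = 13.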

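newArenaArray(nHeapArena) and newArenaArray(nDirectArena) create the arrays, which amounts to new PoolArena[size].

The two for loops then fill the arrays with PoolArena objects, using PoolArena's two inner classes HeapArena and DirectArena respectively. Each arena is also added to a metrics list, exposed as the unmodifiable heapArenaMetrics / directArenaMetrics. If an arena count is 0, the corresponding array is null and its metrics list is empty.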
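The arithmetic behind pageShifts, together with chunkSize (the other size every arena receives), can be checked in isolation. This is a sketch that mirrors the rules described above, not Netty's source; the bounds MIN_PAGE_SIZE = 4096, 0 <= maxOrder <= 14 and the 1 GiB chunk limit are assumptions based on Netty 4.1 and are marked as such in the code.

```java
// Sketch (not Netty's source) of how pageShifts and chunkSize follow from pageSize and maxOrder.
// Assumed bounds: pageSize >= 4096 and a power of two, 0 <= maxOrder <= 14, chunkSize <= 1 GiB.
final class PageMath {
    static final int MIN_PAGE_SIZE = 4096;    // assumption
    static final int MAX_CHUNK_SIZE = 1 << 30; // assumption: 1 GiB

    static int pageShifts(int pageSize) {
        if (pageSize < MIN_PAGE_SIZE) {
            throw new IllegalArgumentException("pageSize: " + pageSize + " (expected: >= " + MIN_PAGE_SIZE + ")");
        }
        if ((pageSize & (pageSize - 1)) != 0) {
            throw new IllegalArgumentException("pageSize: " + pageSize + " (expected: power of 2)");
        }
        // A power of two has a single set bit at index 31 - numberOfLeadingZeros, which is log2(pageSize).
        return Integer.SIZE - 1 - Integer.numberOfLeadingZeros(pageSize);
    }

    static int chunkSize(int pageSize, int maxOrder) {
        if (maxOrder < 0 || maxOrder > 14) {
            throw new IllegalArgumentException("maxOrder: " + maxOrder + " (expected: 0-14)");
        }
        // Double pageSize maxOrder times (pageSize << maxOrder), refusing to grow past MAX_CHUNK_SIZE.
        int chunkSize = pageSize;
        for (int i = maxOrder; i > 0; i--) {
            if (chunkSize > MAX_CHUNK_SIZE / 2) {
                throw new IllegalArgumentException("pageSize (" + pageSize + ") << maxOrder (" + maxOrder
                        + ") must not exceed " + MAX_CHUNK_SIZE);
            }
            chunkSize <<= 1;
        }
        return chunkSize;
    }

    public static void main(String[] args) {
        System.out.println("pageShifts=" + pageShifts(8192) + ", chunkSize=" + chunkSize(8192, 11));
    }
}
```

For the defaults, main prints pageShifts=13, chunkSize=16777216. Doubling step by step instead of shifting once lets the sketch detect overflow before it happens.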

Initializing PoolThreadCache

PoolThreadCache is initialized indirectly through PoolThreadLocalCache (PTLC), an inner class of PooledByteBufAllocator. It is declared as:

final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache>

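This class extends io.netty.util.concurrent.FastThreadLocal<T>, which provides thread-local storage (TLS) just like java.lang.ThreadLocal<T>, but with faster access. PTLC overrides the parent's initialValue method, which initializes the thread-local PoolThreadCache object. initialValue is called the first time the PTLC's get method is called on a given thread.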

@Override
protected synchronized PoolThreadCache initialValue() {
final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);

if (useCacheForAllThreads || Thread.currentThread() instanceof FastThreadLocalThread) {
return new PoolThreadCache(
heapArena, directArena, tinyCacheSize, smallCacheSize, normalCacheSize,
DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
}
// No caching for non FastThreadLocalThreads.
return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0, 0);
}

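The two leastUsedArena calls pick, from heapArenas and directArenas respectively, the PoolArena with the fewest users. PoolArena has a numThreadCaches field, an AtomicInteger that counts how many PoolThreadCache objects are using the arena. A PoolThreadCache calls getAndIncrement on it in its constructor, and getAndDecrement in its free0 method when it is released. Because initialValue is synchronized, threads make this selection one at a time.

If every thread is allowed to use a cache (useCacheForAllThreads == true), or the current thread is a FastThreadLocalThread, a thread-specific PTC with the configured cache sizes is created. Otherwise a PTC with all cache sizes set to 0 is returned: the thread is still bound to an arena, but nothing is cached.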
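The arena selection can be modeled with java.lang.ThreadLocal.withInitial standing in for FastThreadLocal. In this simplified sketch (Arena, ThreadCache and ArenaPicker are hypothetical names), a thread's first get() picks the arena with the smallest numThreadCaches, ties going to the lower index, and a lock serializes the selection the way the synchronized initialValue does. The sketch omits the useCacheForAllThreads branch and never calls free(), so the counters only grow.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model, not Netty code: Arena stands in for PoolArena, ThreadCache for PoolThreadCache.
final class Arena {
    final String name;
    final AtomicInteger numThreadCaches = new AtomicInteger();

    Arena(String name) {
        this.name = name;
    }
}

final class ThreadCache {
    final Arena arena;

    ThreadCache(Arena arena) {
        this.arena = arena;
        arena.numThreadCaches.getAndIncrement(); // one more cache bound to this arena
    }

    void free() {
        arena.numThreadCaches.getAndDecrement(); // would run when the thread's cache is released
    }
}

final class ArenaPicker {
    final Arena[] arenas;
    final ThreadLocal<ThreadCache> threadCache;

    ArenaPicker(int numArenas) {
        arenas = new Arena[numArenas];
        for (int i = 0; i < numArenas; i++) {
            arenas[i] = new Arena("arena-" + i);
        }
        // Runs on the first get() in each thread; the lock makes threads pick one at a time.
        threadCache = ThreadLocal.withInitial(() -> {
            synchronized (this) {
                return new ThreadCache(leastUsed(arenas));
            }
        });
    }

    // Returns the arena with the fewest thread caches; on a tie the earlier arena wins.
    static Arena leastUsed(Arena[] arenas) {
        if (arenas == null || arenas.length == 0) {
            return null;
        }
        Arena min = arenas[0];
        for (int i = 1; i < arenas.length; i++) {
            if (arenas[i].numThreadCaches.get() < min.numThreadCaches.get()) {
                min = arenas[i];
            }
        }
        return min;
    }

    // Calls get() on a fresh thread and returns the arena that thread was bound to.
    Arena bindNewThread() {
        Arena[] result = new Arena[1];
        Thread t = new Thread(() -> result[0] = threadCache.get().arena);
        t.start();
        try {
            t.join(); // join() makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        ArenaPicker picker = new ArenaPicker(2);
        for (int i = 0; i < 4; i++) {
            System.out.println("thread " + i + " -> " + picker.bindNewThread().name);
        }
    }
}
```

Binding four threads to two arenas alternates arena-0, arena-1, arena-0, arena-1, so each arena ends up serving two thread caches and contention is spread evenly.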

PoolChunkList (PCKL)

Key fields

PoolChunkList<T> nextList

PoolChunkList<T> prevList

These two fields show that a PCKL object is a node in a doubly linked list.

PoolChunk<T> head

A PCKL object also maintains a linked list of PoolChunk (PCK) objects, and head points to the first PCK in that list.

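int minUsage;

int maxUsage;

int maxCapacity;

minUsage is the minimum and maxUsage the maximum memory usage of each PCK in the list. Both are percentages: for example, minUsage=10 and maxUsage=50 mean that the list only holds PCKs whose usage is in [10%, 50%). maxCapacity is the largest single allocation that a PCK in this list can possibly serve, computed as maxCapacity = (int)(chunkSize * (100L - minUsage) / 100L): a PCK that is at least minUsage% used has at most (100 - minUsage)% of chunkSize free.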
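Evaluating the maxCapacity formula for each of the six nodes PoolArena creates (their minUsage values are given in the next part) makes its meaning concrete. This sketch assumes the default 16 MiB chunk, and it clamps minUsage to at least 1 before applying the formula; the clamp is an assumption about how Netty 4.1 treats qInit's Integer.MIN_VALUE, and without some clamp the formula would produce a meaningless value for qInit.

```java
// Sketch of the maxCapacity formula above for the six PCKL nodes.
// Assumption: minUsage is clamped to at least 1 first, so qInit's Integer.MIN_VALUE counts as 1%.
final class ChunkListCapacity {
    static int maxCapacity(int minUsage, int chunkSize) {
        int m = Math.max(1, minUsage);
        // A chunk in this list is at least m% used, so at most (100 - m)% of it is free.
        return (int) (chunkSize * (100L - m) / 100L);
    }

    public static void main(String[] args) {
        int chunkSize = 8192 << 11; // 16 MiB with the default pageSize and maxOrder
        String[] names = {"qInit", "q000", "q025", "q050", "q075", "q100"};
        int[] minUsages = {Integer.MIN_VALUE, 1, 25, 50, 75, 100};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": maxCapacity = " + maxCapacity(minUsages[i], chunkSize));
        }
    }
}
```

With a 16 MiB chunk this gives 16609443 bytes for qInit and q000, 12582912 for q025, 8388608 for q050, 4194304 for q075 and 0 for q100, so PoolChunkList#allocate can reject a request that is too large before looking at any chunk.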

Initializing the PCKL linked list

The PCKL list is maintained by PoolArena and initialized in PoolArena's constructor:

// io.netty.buffer.PoolArena#PoolArena(PooledByteBufAllocator parent, int pageSize,
//          int maxOrder, int pageShifts, int chunkSize, int cacheAlignment)

q100 = new PoolChunkList<T>(this, null, 100, Integer.MAX_VALUE, chunkSize);
q075 = new PoolChunkList<T>(this, q100, 75, 100, chunkSize);
q050 = new PoolChunkList<T>(this, q075, 50, 100, chunkSize);
q025 = new PoolChunkList<T>(this, q050, 25, 75, chunkSize);
q000 = new PoolChunkList<T>(this, q025, 1, 50, chunkSize);
qInit = new PoolChunkList<T>(this, q000, Integer.MIN_VALUE, 25, chunkSize);

q100.prevList(q075);
q075.prevList(q050);
q050.prevList(q025);
q025.prevList(q000);
q000.prevList(null);
qInit.prevList(qInit);

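The six constructor calls create the PCKL nodes. Each node is named q{num}, where num is its minimum usage minUsage; for example, q075 has minUsage=75%. The second constructor argument is the node's nextList.

The prevList calls then link the nodes into a doubly linked list. Note the two ends: q000's prevList is null, so a PCK whose usage drops below q000's minUsage is destroyed; qInit's prevList is qInit itself, and because qInit's minUsage is Integer.MIN_VALUE, a PCK in qInit is never moved down or destroyed.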

Writing a node as q(minUsage, maxUsage):

qInit = q(Integer.MIN_VALUE, 25%)

q000 = q(1%, 50%)

q025 = q(25%, 75%)

q050 = q(50%, 100%)

q075 = q(75%, 100%)

q100 = q(100%, Integer.MAX_VALUE)

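The resulting list structure is:

qInit -> q000 <-> q025 <-> q050 <-> q075 <-> q100

(qInit.nextList is q000, but q000.prevList is null; qInit.prevList is qInit itself.)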

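Moving a PoolChunk (PCK) between PoolChunkList (PCKL) nodes

A newly created PCK has a usage of 0% and is put into the qInit node. Every allocation from the PCK increases its usage; when usage >= 25%, i.e. reaches qInit's maxUsage, the PCK is moved to q000. As allocations continue and usage keeps rising, whenever usage reaches the maxUsage of the PCKL holding it, the PCK moves to the next node in the PCKL list, up to q100. This is the code in which an allocation moves a PCK: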

//io.netty.buffer.PoolChunkList#allocate
boolean allocate(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
if (head == null || normCapacity > maxCapacity) {
// Either this PoolChunkList is empty or the requested capacity is larger then the capacity which can
// be handled by the PoolChunks that are contained in this PoolChunkList.
return false;
}

for (PoolChunk<T> cur = head;;) {
long handle = cur.allocate(normCapacity);
if (handle < 0) {
cur = cur.next;
if (cur == null) {
return false;
}
} else {
cur.initBuf(buf, handle, reqCapacity);
if (cur.usage() >= maxUsage) {
remove(cur);
nextList.add(cur);
}
return true;
}
}
}

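The first if returns false immediately when the list is empty or normCapacity exceeds maxCapacity, since no PCK in this list could serve the request.

The for loop tries each PCK in the list in turn until one can allocate the requested memory (cur.allocate returns a non-negative handle).

If cur.next becomes null, no PCK in the list can serve the request and the method returns false.

Once cur has allocated the memory, cur.initBuf initializes the PooledByteBuf with it.

If cur's usage is now greater than or equal to this PCKL's maxUsage, remove takes cur out of this node's PCK list and nextList.add moves it into the next PCKL node.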
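The upward movement can be traced with a toy model. In this sketch (UpChunk and UpList are hypothetical classes, not Netty code), a chunk is reduced to an integer usage percentage, an allocation just adds to it, and each node only knows its maxUsage and nextList. Its add method forwards a chunk that is already too full, as PCKL's add does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Netty code: a chunk is just a usage percentage; allocation adds to it.
final class UpChunk {
    int usage;
}

final class UpList {
    final String name;
    final int maxUsage;
    UpList nextList;
    final List<UpChunk> chunks = new ArrayList<>();

    UpList(String name, int maxUsage) {
        this.name = name;
        this.maxUsage = maxUsage;
    }

    // Like PoolChunkList#add: a chunk already too full for this node is passed on.
    // q100's maxUsage is Integer.MAX_VALUE, so forwarding always stops there.
    void add(UpChunk c) {
        if (c.usage >= maxUsage) {
            nextList.add(c);
            return;
        }
        chunks.add(c);
    }

    // Like the end of PoolChunkList#allocate: after allocating, move the chunk up if needed.
    void allocate(UpChunk c, int percent) {
        c.usage = Math.min(100, c.usage + percent);
        if (c.usage >= maxUsage) {
            chunks.remove(c);
            nextList.add(c);
        }
    }

    // Allocates `percent` of a fresh chunk `steps` times; records the node holding it after each step.
    static List<String> simulate(int percent, int steps) {
        UpList q100 = new UpList("q100", Integer.MAX_VALUE);
        UpList q075 = new UpList("q075", 100);
        UpList q050 = new UpList("q050", 100);
        UpList q025 = new UpList("q025", 75);
        UpList q000 = new UpList("q000", 50);
        UpList qInit = new UpList("qInit", 25);
        qInit.nextList = q000;
        q000.nextList = q025;
        q025.nextList = q050;
        q050.nextList = q075;
        q075.nextList = q100;
        UpList[] all = {qInit, q000, q025, q050, q075, q100};

        UpChunk chunk = new UpChunk();
        qInit.add(chunk);
        List<String> trace = new ArrayList<>();
        for (int i = 0; i < steps; i++) {
            ownerOf(all, chunk).allocate(chunk, percent);
            trace.add(ownerOf(all, chunk).name);
        }
        return trace;
    }

    private static UpList ownerOf(UpList[] lists, UpChunk c) {
        for (UpList l : lists) {
            if (l.chunks.contains(c)) {
                return l;
            }
        }
        throw new IllegalStateException("chunk is in no list");
    }

    public static void main(String[] args) {
        System.out.println(simulate(10, 10));
    }
}
```

simulate(10, 10) returns [qInit, qInit, q000, q000, q025, q025, q025, q050, q050, q100]. In this model q075 is skipped on the way up, because q050's maxUsage is already 100%.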

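Conversely, as memory is freed and returned to a PCK, its usage keeps falling; when usage drops below the minUsage of the PCKL holding it, the PCK moves to the previous node in the PCKL list, down to q000. When freeing memory brings the usage of a PCK in q000 down to 0%, the PCK is destroyed and the memory of the whole chunk is released. This is the code in which freeing memory moves a PCK: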

//io.netty.buffer.PoolChunkList#free
boolean free(PoolChunk<T> chunk, long handle) {
chunk.free(handle);
if (chunk.usage() < minUsage) {
remove(chunk);
// Move the PoolChunk down the PoolChunkList linked-list.
return move0(chunk);
}
return true;
}

//io.netty.buffer.PoolChunkList#move0
private boolean move0(PoolChunk<T> chunk) {
if (prevList == null) {
// There is no previous PoolChunkList so return false which result in having the PoolChunk destroyed and
// all memory associated with the PoolChunk will be released.
assert chunk.usage() == 0;
return false;
}
return prevList.move(chunk);
}

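chunk.free(handle) frees the memory, returning it to the PCK.

If the PCK's usage is now below this PCKL's minUsage, remove takes it out of the current PCKL and move0 moves it to the previous PCKL node.

move0 hands the PCK to the previous PCKL through prevList.move, which keeps moving it down while its usage is still below that node's minUsage. If there is no previous node (prevList == null, which is only the case for q000), move0 returns false; the PCK is then destroyed and all of its memory is released.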
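The downward path, including the destroy case, can be traced the same way. This toy model (DownChunk and DownList are hypothetical classes, not Netty code) keeps only minUsage and prevList per node and links the nodes with the same prevList calls as PoolArena. The move method is not shown in the excerpt above; here it is assumed to keep moving the chunk down while its usage is below the node's minUsage.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Netty code: a chunk is just a usage percentage; freeing subtracts from it.
final class DownChunk {
    int usage;
}

final class DownList {
    final String name;
    final int minUsage;
    DownList prevList;
    final List<DownChunk> chunks = new ArrayList<>();

    DownList(String name, int minUsage) {
        this.name = name;
        this.minUsage = minUsage;
    }

    // Like PoolChunkList#free: returns false when the chunk should be destroyed.
    boolean free(DownChunk c, int percent) {
        c.usage = Math.max(0, c.usage - percent);
        if (c.usage < minUsage) {
            chunks.remove(c);
            return move0(c);
        }
        return true;
    }

    // Keeps moving the chunk down while its usage is below this node's minUsage.
    private boolean move(DownChunk c) {
        if (c.usage < minUsage) {
            return move0(c);
        }
        chunks.add(c);
        return true;
    }

    private boolean move0(DownChunk c) {
        if (prevList == null) {
            return false; // only q000 has no prevList: the chunk is empty and gets destroyed
        }
        return prevList.move(c);
    }

    // Puts a chunk with the given usage into node `start` (0 = qInit ... 5 = q100), frees
    // `percent` per step and records where the chunk is after each step.
    static List<String> drain(int start, int usage, int percent, int steps) {
        DownList q100 = new DownList("q100", 100);
        DownList q075 = new DownList("q075", 75);
        DownList q050 = new DownList("q050", 50);
        DownList q025 = new DownList("q025", 25);
        DownList q000 = new DownList("q000", 1);
        DownList qInit = new DownList("qInit", Integer.MIN_VALUE);
        q100.prevList = q075;
        q075.prevList = q050;
        q050.prevList = q025;
        q025.prevList = q000;
        q000.prevList = null;
        qInit.prevList = qInit;
        DownList[] all = {qInit, q000, q025, q050, q075, q100};

        DownChunk chunk = new DownChunk();
        chunk.usage = usage;
        all[start].chunks.add(chunk);
        DownList owner = all[start];
        List<String> trace = new ArrayList<>();
        for (int i = 0; i < steps; i++) {
            if (!owner.free(chunk, percent)) {
                trace.add("destroyed");
                break;
            }
            for (DownList l : all) {
                if (l.chunks.contains(chunk)) {
                    owner = l;
                }
            }
            trace.add(owner.name);
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(drain(5, 100, 25, 4));
    }
}
```

drain(5, 100, 25, 4) returns [q075, q050, q025, destroyed]: the chunk moves down one node per step and is destroyed once it is emptied while in q000. A single large free can skip several nodes, as in drain(5, 100, 60, 2), which returns [q025, destroyed]. A chunk that stays in qInit is never destroyed, because no usage is below Integer.MIN_VALUE.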

The complete allocation and release flow

Memory allocation

Entry methods:

io.netty.buffer.AbstractByteBufAllocator#heapBuffer(int, int) creates a ByteBuf backed by heap memory by calling newHeapBuffer.

io.netty.buffer.AbstractByteBufAllocator#directBuffer(int, int) creates a ByteBuf backed by direct memory by calling newDirectBuffer.

Implementations:

io.netty.buffer.PooledByteBufAllocator#newHeapBuffer(int initialCapacity, int maxCapacity)

io.netty.buffer.PooledByteBufAllocator#newDirectBuffer(int initialCapacity, int maxCapacity)

Both methods take the current thread's PoolArena from its PoolThreadCache and call the PoolArena's allocate method to create a PooledByteBuf.

PoolArena entry method:

io.netty.buffer.PoolArena#allocate(io.netty.buffer.PoolThreadCache, int, int) is PoolArena's entry method for allocating memory and creating a PooledByteBuf. It first calls newByteBuf, implemented by the subclasses, to create a PooledByteBuf. There are two implementations:

io.netty.buffer.PoolArena.HeapArena#newByteBuf(int maxCapacity) creates a PooledByteBuf backed by heap memory.

io.netty.buffer.PoolArena.DirectArena#newByteBuf(int maxCapacity) creates a PooledByteBuf backed by direct memory.

It then calls io.netty.buffer.PoolArena#allocate(io.netty.buffer.PoolThreadCache, io.netty.buffer.PooledByteBuf<T>, int) to allocate memory for the PooledByteBuf. This is the core allocation method, so let's look at its code closely:

private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
final int normCapacity = normalizeCapacity(reqCapacity);
if (isTinyOrSmall(normCapacity)) { // capacity < pageSize
int tableIdx;
PoolSubpage<T>[] table;
boolean tiny = isTiny(normCapacity);
if (tiny) { // < 512
if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {
// was able to allocate out of the cache so move on
return;
}
tableIdx = tinyIdx(normCapacity);
table = tinySubpagePools;
} else {
if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) {
// was able to allocate out of the cache so move on
return;
}
tableIdx = smallIdx(normCapacity);
table = smallSubpagePools;
}

final PoolSubpage<T> head = table[tableIdx];

/**
* Synchronize on the head. This is needed as {@link PoolChunk#allocateSubpage(int)} and
* {@link PoolChunk#free(long)} may modify the doubly linked list as well.
*/
synchronized (head) {
final PoolSubpage<T> s = head.next;
if (s != head) {
assert s.doNotDestroy && s.elemSize == normCapacity;
long handle = s.allocate();
assert handle >= 0;
s.chunk.initBufWithSubpage(buf, handle, reqCapacity);
incTinySmallAllocation(tiny);
return;
}
}
synchronized (this) {
allocateNormal(buf, reqCapacity, normCapacity);
}

incTinySmallAllocation(tiny);
return;
}
if (normCapacity <= chunkSize) {
if (cache.allocateNormal(this, buf, reqCapacity, normCapacity)) {
// was able to allocate out of the cache so move on
return;
}
synchronized (this) {
allocateNormal(buf, reqCapacity, normCapacity);
++allocationsNormal;
}
} else {
// Huge allocations are never served via the cache so just call allocateHuge
allocateHuge(buf, reqCapacity);
}
}

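normalizeCapacity turns the requested size reqCapacity into the normalized size normCapacity that is actually allocated. normCapacity must satisfy (1) normCapacity >= reqCapacity and (2) if directMemoryCacheAlignment > 0, normCapacity is a multiple of directMemoryCacheAlignment. Beyond that, there are 3 cases depending on reqCapacity:

reqCapacity >= chunkSize: normCapacity is the smallest value satisfying (1) and (2).

512 <= reqCapacity < chunkSize: with (3) normCapacity = 512 * 2^k for some k >= 0, i.e. a power of two, and (4) normCapacity <= chunkSize, normCapacity is the smallest value satisfying (1), (2), (3) and (4).

reqCapacity < 512: with (5) normCapacity <= 512 and (6) normCapacity is a multiple of 16, normCapacity is the smallest value satisfying (1), (2), (5) and (6).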
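The three cases can be written out as a small function. This is a sketch of the rules just described, not Netty's source; the class SizeNormalizer is hypothetical, and it assumes alignment is either 0 or a power of two between 16 and 512, so that every branch also satisfies condition (2). It rounds up to a power of two with the usual bit-smearing trick.

```java
// Sketch of the three normalization cases, not Netty's source. Assumptions: alignment is 0 (none)
// or a power of two in [16, 512], and chunkSize <= 1 GiB so power-of-two rounding cannot overflow.
final class SizeNormalizer {
    final int chunkSize;
    final int alignment; // plays the role of directMemoryCacheAlignment

    SizeNormalizer(int chunkSize, int alignment) {
        this.chunkSize = chunkSize;
        this.alignment = alignment;
    }

    int normalizeCapacity(int reqCapacity) {
        if (reqCapacity < 0) {
            throw new IllegalArgumentException("reqCapacity: " + reqCapacity + " (expected: >= 0)");
        }
        if (reqCapacity >= chunkSize) {
            // Huge: only conditions (1) and (2) apply.
            return alignment == 0 ? reqCapacity : align(reqCapacity);
        }
        if (reqCapacity >= 512) {
            // Small/Normal: round up to the next power of two by smearing the highest set bit down.
            int n = reqCapacity - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return n + 1;
        }
        if (alignment > 0) {
            return align(reqCapacity);
        }
        // Tiny: round up to the next multiple of 16.
        return (reqCapacity & 15) == 0 ? reqCapacity : (reqCapacity & ~15) + 16;
    }

    // Rounds up to a multiple of the (power-of-two) alignment.
    private int align(int reqCapacity) {
        int mask = alignment - 1;
        return (reqCapacity + mask) & ~mask;
    }

    public static void main(String[] args) {
        SizeNormalizer n = new SizeNormalizer(16 * 1024 * 1024, 0);
        for (int req : new int[] {1, 17, 500, 513, 5000, 20_000_000}) {
            System.out.println(req + " -> " + n.normalizeCapacity(req));
        }
    }
}
```

With a 16 MiB chunk and no alignment, 17 becomes 32, 500 becomes 512 (so it is served as Small, not Tiny), 5000 becomes 8192, and 20000000 is returned unchanged as a Huge request.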

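If normCapacity is Tiny (< 512), the method first tries cache.allocateTiny; if the PoolThreadCache can serve the request, allocation ends there. Otherwise it computes tableIdx = tinyIdx(normCapacity) and selects the Tiny subpage pool (tinySubpagePools) to allocate from.

Small memory (>= 512 and < pageSize) follows the same logic, using cache.allocateSmall, smallIdx and smallSubpagePools.

Next, the Tiny or Small index obtained above selects the head of a subpage list in the pool, and the synchronized (head) block tries to allocate from it: if head.next is not head itself, a subpage of this size with free elements exists, s.allocate() takes an element from it, and s.chunk.initBufWithSubpage initializes the PooledByteBuf with that memory. If execution reaches this point, allocation ends.

If the subpage pool has no memory available yet, allocateNormal is called (under synchronized (this)) to allocate a new subpage from a PoolChunk, and the requested memory is then allocated from that subpage.

Normal memory (>= pageSize and <= chunkSize) is handled next: cache.allocateNormal tries the cache first, and if it succeeds, allocation ends. If the cache has nothing available, allocateNormal allocates the memory from a PoolChunk.

Finally, memory larger than chunkSize is allocated by allocateHuge. Huge allocations are never served from the cache, and this memory never enters the PCKL list.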
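The routing by size class, and the table index used for the Tiny and Small subpage pools, can be sketched as follows. The names tinyIdx and smallIdx come from the code above, but their bodies here are an assumption: tinyIdx gives one slot per multiple of 16, and smallIdx maps the power-of-two sizes 512, 1024, 2048, 4096 to 0, 1, 2, 3. The class SizeRouting and its Route enum are hypothetical.

```java
// Sketch of the size-class routing in PoolArena#allocate; class, enum and method bodies are hypothetical.
final class SizeRouting {
    enum Route { TINY, SMALL, NORMAL, HUGE }

    static Route route(int normCapacity, int pageSize, int chunkSize) {
        if (normCapacity < 512) {
            return Route.TINY;   // tiny subpage pools
        }
        if (normCapacity < pageSize) {
            return Route.SMALL;  // small subpage pools
        }
        if (normCapacity <= chunkSize) {
            return Route.NORMAL; // pages from a PoolChunk
        }
        return Route.HUGE;       // unpooled chunk, never cached
    }

    // One slot per multiple of 16: 16 -> 1, 32 -> 2, ..., 496 -> 31.
    static int tinyIdx(int normCapacity) {
        return normCapacity >>> 4;
    }

    // For a power-of-two normCapacity >= 512: 512 -> 0, 1024 -> 1, 2048 -> 2, 4096 -> 3.
    static int smallIdx(int normCapacity) {
        int tableIdx = 0;
        int i = normCapacity >>> 10;
        while (i != 0) {
            i >>>= 1;
            tableIdx++;
        }
        return tableIdx;
    }

    public static void main(String[] args) {
        for (int size : new int[] {16, 496, 512, 4096, 8192, 1 << 24, (1 << 24) + 1}) {
            System.out.println(size + " -> " + route(size, 8192, 1 << 24));
        }
    }
}
```

With pageSize 8192 and a 16 MiB chunk, 496 is Tiny (index 31), 4096 is Small (index 3), a request of exactly chunkSize is still Normal, and anything above it is Huge.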

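The allocateNormal method used above wraps the logic of creating a PCK, allocating memory from it, and putting the PCK into the PCKL list, so it is another very important piece of code.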

private void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
if (q050.allocate(buf, reqCapacity, normCapacity) || q025.allocate(buf, reqCapacity, normCapacity) ||
q000.allocate(buf, reqCapacity, normCapacity) || qInit.allocate(buf, reqCapacity, normCapacity) ||
q075.allocate(buf, reqCapacity, normCapacity)) {
return;
}

// Add a new chunk.
PoolChunk<T> c = newChunk(pageSize, maxOrder, pageShifts, chunkSize);
long handle = c.allocate(normCapacity);
assert handle > 0;
c.initBuf(buf, handle, reqCapacity);
qInit.add(c);
}

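The chained allocate calls try the PCKL nodes in turn, in the order q050, q025, q000, qInit, q075; as soon as one succeeds, allocation ends. q100 is not tried, since its PCKs are fully used.

Otherwise newChunk creates a new PCK, memory is allocated from it, the PooledByteBuf is initialized with that memory, and finally the PCK is added to qInit, the head node of the PCKL list. Like allocate, PCKL's add method moves the PCK to the appropriate node of the list according to its usage.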
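The search order of allocateNormal and its fallback to a new chunk can be illustrated with a toy allocator. Assumptions: each chunk is reduced to a free-byte counter (a real PoolChunk allocates from a buddy tree, so a chunk with enough free bytes can still fail), and chunks are never moved between lists here. NormalAllocator and its allocate method, which returns the name of the list that served the request or "new", are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of allocateNormal's search order; not Netty code.
final class NormalAllocator {
    static final class Chunk {
        int freeBytes;

        Chunk(int freeBytes) {
            this.freeBytes = freeBytes;
        }
    }

    // The order in which allocateNormal tries the lists; q100 holds full chunks and is skipped.
    static final String[] SEARCH_ORDER = {"q050", "q025", "q000", "qInit", "q075"};

    final int chunkSize;
    final Map<String, List<Chunk>> lists = new HashMap<>();

    NormalAllocator(int chunkSize) {
        this.chunkSize = chunkSize;
        for (String name : new String[] {"qInit", "q000", "q025", "q050", "q075", "q100"}) {
            lists.put(name, new ArrayList<>());
        }
    }

    // Returns the list whose chunk served the request, or "new" if a fresh chunk was created.
    String allocate(int normCapacity) {
        for (String name : SEARCH_ORDER) {
            for (Chunk c : lists.get(name)) {
                if (c.freeBytes >= normCapacity) { // toy check; real chunks use a buddy tree
                    c.freeBytes -= normCapacity;
                    return name;
                }
            }
        }
        Chunk fresh = new Chunk(chunkSize - normCapacity); // like newChunk + c.allocate
        lists.get("qInit").add(fresh);                      // like qInit.add(c), without forwarding
        return "new";
    }

    public static void main(String[] args) {
        NormalAllocator a = new NormalAllocator(1 << 24);
        a.lists.get("q025").add(new Chunk(1 << 22));
        System.out.println(a.allocate(1 << 20) + ", " + a.allocate(1 << 23) + ", " + a.allocate(1 << 23));
    }
}
```

With one 4 MiB-free chunk in q025, a 1 MiB request is served by q025, an 8 MiB request finds no room and creates a new chunk, and a second 8 MiB request is then served by that new chunk in qInit, so main prints "q025, new, qInit".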

Memory release

io.netty.buffer.PooledByteBuf#deallocate calls io.netty.buffer.PoolArena#free, which handles the whole release process.

void free(PoolChunk<T> chunk, long handle, int normCapacity, PoolThreadCache cache) {
if (chunk.unpooled) {
int size = chunk.chunkSize();
destroyChunk(chunk);
activeBytesHuge.add(-size);
deallocationsHuge.increment();
} else {
SizeClass sizeClass = sizeClass(normCapacity);
if (cache != null && cache.add(this, chunk, handle, normCapacity, sizeClass)) {
// cached so not free it.
return;
}

freeChunk(chunk, handle, sizeClass);
}
}

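If the chunk is unpooled (a Huge allocation), the whole chunk is destroyed right away. Otherwise the key part is the else branch: cache.add first tries to put the memory into the cache, so that the next allocation of the same size can take it straight from the cache; only if it cannot be cached does freeChunk return the memory to the PCK.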
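The cache-first policy of free can be sketched with a bounded queue per size class. In this single-threaded toy (CachedFree is a hypothetical class), cacheAdd stands in for cache.add and returns false when the queue is full or has zero capacity, similar to a thread whose cache sizes are 0; only then is the handle counted as returned to the chunk.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of "cache first, chunk second" for one size class; not Netty code.
final class CachedFree {
    final int capacity;
    final Queue<Long> cachedHandles = new ArrayDeque<>();
    int returnedToChunk; // frees that fell through to freeChunk

    CachedFree(int capacity) {
        this.capacity = capacity;
    }

    // Stands in for cache.add(...): false when the cache is full (or has zero capacity).
    boolean cacheAdd(long handle) {
        if (cachedHandles.size() >= capacity) {
            return false;
        }
        cachedHandles.add(handle);
        return true;
    }

    void free(long handle) {
        if (cacheAdd(handle)) {
            return; // cached, so not freed to the chunk
        }
        returnedToChunk++; // stands in for freeChunk(chunk, handle, sizeClass)
    }

    // A later allocation of the same size class can reuse a cached handle without the arena.
    Long allocateFromCache() {
        return cachedHandles.poll();
    }

    public static void main(String[] args) {
        CachedFree cache = new CachedFree(2);
        for (long h = 1; h <= 3; h++) {
            cache.free(h);
        }
        System.out.println("cached=" + cache.cachedHandles + ", returnedToChunk=" + cache.returnedToChunk);
    }
}
```

With capacity 2, freeing three handles caches the first two and sends the third to the chunk; in this sketch the next allocation of that size class then reuses handle 1 from the cache.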

void freeChunk(PoolChunk<T> chunk, long handle, SizeClass sizeClass) {
final boolean destroyChunk;
synchronized (this) {
switch (sizeClass) {
case Normal:
++deallocationsNormal;
break;
case Small:
++deallocationsSmall;
break;
case Tiny:
++deallocationsTiny;
break;
default:
throw new Error();
}
destroyChunk = !chunk.parent.free(chunk, handle);
}
if (destroyChunk) {
// destroyChunk not need to be called while holding the synchronized lock.
destroyChunk(chunk);
}
}

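Inside the synchronized block, after the deallocation counter for the size class is updated, chunk.parent.free is called. chunk.parent is the PCKL holding the chunk, and its free method returns the memory to the PCK and moves the PCK within the PCKL list. If the PCK's usage has dropped to 0 as a result, free returns false and destroyChunk becomes true.

In that case destroyChunk(chunk) destroys the PCK. As the code comment notes, this happens after leaving the synchronized block, so the lock is not held while the chunk's memory is released.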
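The locking pattern in freeChunk, deciding under the lock and releasing after it, is a general technique. This sketch (ArenaLockPattern is a hypothetical class) records with Thread.holdsLock whether the lock is still held when the chunk is destroyed; usageAfterFree is a made-up input standing in for the outcome of chunk.parent.free.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the decide-under-lock, release-outside-lock pattern; not Netty code.
final class ArenaLockPattern {
    private final Object lock = new Object();
    private long deallocations;
    final List<String> destroyed = Collections.synchronizedList(new ArrayList<>());
    volatile boolean lockHeldDuringDestroy;

    // usageAfterFree is hypothetical: 0 means the chunk became empty and must be destroyed.
    void freeChunk(String chunkId, int usageAfterFree) {
        final boolean destroyChunk;
        synchronized (lock) {
            deallocations++;                     // bookkeeping under the lock
            destroyChunk = usageAfterFree == 0;  // decision under the lock
        }
        if (destroyChunk) {
            destroyChunk(chunkId);               // lock already released
        }
    }

    private void destroyChunk(String chunkId) {
        if (Thread.holdsLock(lock)) {
            lockHeldDuringDestroy = true;
        }
        destroyed.add(chunkId); // stands in for releasing the chunk's memory
    }

    long deallocations() {
        synchronized (lock) {
            return deallocations;
        }
    }

    public static void main(String[] args) {
        ArenaLockPattern arena = new ArenaLockPattern();
        arena.freeChunk("chunk-1", 40);
        arena.freeChunk("chunk-2", 0);
        System.out.println("destroyed=" + arena.destroyed + ", lockHeldDuringDestroy=" + arena.lockHeldDuringDestroy);
    }
}
```

Keeping the release outside the lock means other threads allocating from the same arena are not blocked while a chunk's memory is being freed; only the cheap bookkeeping is serialized.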
