
Internal Mechanics and Optimization of Erlang Binary Construction (Part 1)

2013-07-26 08:02
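This article builds on "An Introduction to the Internal Structure and Types of Erlang Binaries". It looks at the conditions under which binary construction can take full advantage of the optimizations that the Erlang runtime system applies when building binaries.

The example below is taken from the official documentation. The C source is then used to explain the internal mechanics of binary construction in more detail.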

Bin0 = <<0>>,                    %% 1
Bin1 = <<Bin0/binary,1,2,3>>,    %% 2
Bin2 = <<Bin1/binary,4,5,6>>,    %% 3
Bin3 = <<Bin2/binary,7,8,9>>,    %% 4
Bin4 = <<Bin1/binary,17>>,       %% 5 !!!
{Bin4,Bin3}                      %% 6


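On line 1, the system creates a heap binary.

As described in "An Introduction to the Internal Structure and Types of Erlang Binaries", a heap binary is stored directly on the process heap and holds at most 64 bytes. For anything larger than 64 bytes, a reference-counted binary (refc binary) is created instead.

Line 2 is a binary append operation, which calls the erts_bs_append function in erl_bits.c. The C source, with annotations, follows: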
Eterm
erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm build_size_term,
               Uint extra_words, Uint unit)
{
    Eterm bin;                  /* Given binary */
    Eterm* ptr;
    Eterm hdr;
    ErlSubBin* sb;
    ProcBin* pb;
    Binary* binp;
    Uint heap_need;
    Uint build_size_in_bits;
    Uint used_size_in_bits;
    Uint unsigned_bits;
    ERL_BITS_DEFINE_STATEP(c_p);

    // Number of bits to be built (the data being appended): build_size_in_bits
    if (is_small(build_size_term)) {
        Sint signed_bits = signed_val(build_size_term);
        if (signed_bits < 0) {
            goto badarg;
        }
        build_size_in_bits = (Uint) signed_bits;
    } else if (term_to_Uint(build_size_term, &unsigned_bits)) {
        build_size_in_bits = unsigned_bits;
    } else {
        c_p->freason = unsigned_bits;
        return THE_NON_VALUE;
    }

    bin = reg[live];
    if (!is_boxed(bin)) {
    badarg:
        c_p->freason = BADARG;
        return THE_NON_VALUE;
    }
    ptr = boxed_val(bin);
    // Fetch the header word of the binary
    hdr = *ptr;
    if (!is_binary_header(hdr)) {
        goto badarg;
    }
    // #MARK_A
    if (hdr != HEADER_SUB_BIN) {
        // Not a sub binary, so not writable
        goto not_writable;
    }
    sb = (ErlSubBin *) ptr;
    if (!sb->is_writable) {
        // is_writable == 0, so not writable
        goto not_writable;
    }
    pb = (ProcBin *) boxed_val(sb->orig);

    // The original must be a refc binary
    ASSERT(pb->thing_word == HEADER_PROC_BIN);
    if ((pb->flags & PB_IS_WRITABLE) == 0) {
        // Flagged as not writable
        goto not_writable;
    }

    /*
     * OK, the binary is writable.
     */

    erts_bin_offset = 8*sb->size + sb->bitsize;
    if (unit > 1) {
        if ((unit == 8 && (erts_bin_offset & 7) != 0) ||
            (erts_bin_offset % unit) != 0) {
            goto badarg;
        }
    }
    used_size_in_bits = erts_bin_offset + build_size_in_bits;
    // Mark the old sub binary as no longer writable, because the space
    // after it is about to be written
    // #MARK_B
    sb->is_writable = 0;        /* Make sure that no one else can write. */
    // Grow to the required size
    pb->size = NBYTES(used_size_in_bits);
    pb->flags |= PB_ACTIVE_WRITER;

    /*
     * Reallocate the binary if it is too small.
     */
    binp = pb->val;
    // If the container is too small, reallocate it to twice the required size
    if (binp->orig_size < pb->size) {
        Uint new_size = 2*pb->size;
        binp = erts_bin_realloc(binp, new_size);
        binp->orig_size = new_size;
        // Note: reallocation may move the Binary, which changes pb->val,
        // so nothing outside may hold a reference to this binary here
        pb->val = binp;
        pb->bytes = (byte *) binp->orig_bytes;
    }
    erts_current_bin = pb->bytes;

    /*
     * Allocate heap space and build a new sub binary.
     */

    reg[live] = sb->orig;
    heap_need = ERL_SUB_BIN_SIZE + extra_words;
    if (c_p->stop - c_p->htop < heap_need) {
        (void) erts_garbage_collect(c_p, heap_need, reg, live+1);
    }
    // Create a new sub binary that points at the start of the original binary;
    // compared with the old sub binary, only the size grows to the required value
    sb = (ErlSubBin *) c_p->htop; // written at the heap top
    // Advance the heap top by ERL_SUB_BIN_SIZE words
    // (20 bytes on a 32-bit emulator)
    c_p->htop += ERL_SUB_BIN_SIZE;
    sb->thing_word = HEADER_SUB_BIN;
    sb->size = BYTE_OFFSET(used_size_in_bits);
    sb->bitsize = BIT_OFFSET(used_size_in_bits);
    sb->offs = 0;
    sb->bitoffs = 0;
    // The newest sub binary is marked writable. In other words, across a
    // series of appends, only the last sub binary is writable
    sb->is_writable = 1;
    sb->orig = reg[live];

    return make_binary(sb);

    /*
     * The binary is not writable. We must create a new writable binary and
     * copy the old contents of the binary.
     */
 not_writable:
    {
        Uint used_size_in_bytes; /* Size of old binary + data to be built */
        Uint bin_size;
        Binary* bptr;
        byte* src_bytes;
        Uint bitoffs;
        Uint bitsize;
        Eterm* hp;

        /*
         * Allocate heap space.
         */
        heap_need = PROC_BIN_SIZE + ERL_SUB_BIN_SIZE + extra_words;
        if (c_p->stop - c_p->htop < heap_need) {
            (void) erts_garbage_collect(c_p, heap_need, reg, live+1);
            bin = reg[live];
        }
        hp = c_p->htop;

        /*
         * Calculate sizes. The size of the new binary, is the sum of the
         * build size and the size of the old binary. Allow some room
         * for growing.
         */
        ERTS_GET_BINARY_BYTES(bin, src_bytes, bitoffs, bitsize);
        erts_bin_offset = 8*binary_size(bin) + bitsize;
        if (unit > 1) {
            if ((unit == 8 && (erts_bin_offset & 7) != 0) ||
                (erts_bin_offset % unit) != 0) {
                goto badarg;
            }
        }
        used_size_in_bits = erts_bin_offset + build_size_in_bits;
        used_size_in_bytes = NBYTES(used_size_in_bits);
        bin_size = 2*used_size_in_bytes;

        // At least 256 bytes
        bin_size = (bin_size < 256) ? 256 : bin_size;

        /*
         * Allocate the binary data struct itself.
         */

        // Allocate a Binary twice the required size (minimum 256 bytes).
        // It acts as a container and lives outside the process heap;
        // the process heap only holds the refc binary that references it
        bptr = erts_bin_nrml_alloc(bin_size);
        bptr->flags = 0;
        bptr->orig_size = bin_size;
        erts_refc_init(&bptr->refc, 1);
        erts_current_bin = (byte *) bptr->orig_bytes;

        /*
         * Now allocate the ProcBin on the heap.
         */

        // Create the refc binary that references the Binary above,
        // and store it on the process heap
        pb = (ProcBin *) hp;
        hp += PROC_BIN_SIZE;
        pb->thing_word = HEADER_PROC_BIN;
        // Set to the size actually needed for now; later appends can extend it
        pb->size = used_size_in_bytes;
        pb->next = MSO(c_p).first;
        MSO(c_p).first = (struct erl_off_heap_header*)pb;
        pb->val = bptr;
        pb->bytes = (byte*) bptr->orig_bytes;
        pb->flags = PB_IS_WRITABLE | PB_ACTIVE_WRITER;
        OH_OVERHEAD(&(MSO(c_p)), pb->size / sizeof(Eterm));

        /*
         * Now allocate the sub binary and set its size to include the
         * data about to be built.
         */

        // Create the sub binary that references the refc binary above,
        // sized to the required size
        sb = (ErlSubBin *) hp;
        hp += ERL_SUB_BIN_SIZE;
        sb->thing_word = HEADER_SUB_BIN;
        sb->size = BYTE_OFFSET(used_size_in_bits);
        sb->bitsize = BIT_OFFSET(used_size_in_bits);
        sb->offs = 0;
        sb->bitoffs = 0;
        sb->is_writable = 1;
        sb->orig = make_binary(pb);

        c_p->htop = hp;

        /*
         * Now copy the data into the binary.
         */

        copy_binary_to_buffer(erts_current_bin, 0, src_bytes, bitoffs, erts_bin_offset);
        return make_binary(sb);
    }
}


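As the code at #MARK_A shows, a binary that is not a sub binary sends execution to not_writable. Bin0 is a heap binary, so line 2 takes this path: the runtime allocates the container it needs, creates a refc binary and a sub binary, and copies the contents of Bin0 (see the comments in the not_writable section for details) in preparation for the append.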
Bin0 = <<0>>,                    %% 1
Bin1 = <<Bin0/binary,1,2,3>>,    %% 2
Bin2 = <<Bin1/binary,4,5,6>>,    %% 3
Bin3 = <<Bin2/binary,7,8,9>>,    %% 4
Bin4 = <<Bin1/binary,17>>,       %% 5 !!!
{Bin4,Bin3}                      %% 6
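On line 3, Bin1 is the result of the most recent append, so the space after it in the container is free and can be extended. Bin1 itself can never change, so it is not copied: 4, 5, 6 are simply written after Bin1's data.

Line 4 executes in the same way as line 3.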

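On line 5, data is appended to Bin1, not to Bin3. Bin1 is no longer the result of the most recent append: the space after it is already occupied (4,5,6,7,8,9 are stored there). This line therefore cannot proceed the way the previous two did. Instead, the not_writable path runs again: a new container, refc binary and sub binary are created, Bin1 is copied, and 17 is appended after the copy.

How does the runtime know that no more data may be written after Bin1? The documentation raises the same question: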

We will not explain here how the run-time system can know that it is not allowed to write into Bin1; it is left as an exercise to the curious reader to figure out how it is done by reading the emulator sources, primarily erl_bits.c.

The answer can be found in the append function above. When line 3 executed, the sub binary for Bin1 was already marked as not writable (see #MARK_B).

Internal Mechanics and Optimization of Erlang Binary Construction (Part 2)