
Implementing a Garbage Collector

2015-03-12 21:59

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program. Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp.

This section presents the mark-and-sweep garbage collection algorithm. The mark-and-sweep algorithm was the first garbage collection algorithm to be developed that is able to reclaim cyclic data structures. Variations of the mark-and-sweep algorithm continue to be among the most commonly used garbage collection techniques.

When using mark-and-sweep, unreferenced objects are not reclaimed immediately. Instead, garbage is allowed to accumulate until all available memory has been exhausted. When that happens, the execution of the program is suspended temporarily while the mark-and-sweep algorithm collects all the garbage. Once all unreferenced objects have been reclaimed, the normal execution of the program can resume.

The mark-and-sweep algorithm is called a tracing garbage collector because it traces out the entire collection of objects that are directly or indirectly accessible by the program. The objects that a program can access directly are those referenced by local variables on the processor stack as well as by any static variables that refer to objects. In the context of garbage collection, these variables are called the roots. An object is indirectly accessible if it is referenced by a field in some other (directly or indirectly) accessible object. An accessible object is said to be live. Conversely, an object which is not live is garbage.

The mark-and-sweep algorithm consists of two phases. In the first phase, it finds and marks all accessible objects. The first phase is called the mark phase. In the second phase, the garbage collection algorithm scans through the heap and reclaims all the unmarked objects. The second phase is called the sweep phase. The algorithm can be expressed as follows:

for each root variable r
    mark (r);
sweep ();

In order to distinguish the live objects from garbage, we record the state of an object in each object. That is, we add a special boolean field to each object called, say, marked. By default, all objects are unmarked when they are created. Thus, the marked field is initially false.

An object p and all the objects indirectly accessible from p can be marked by using the following recursive mark method:

void mark (Object p)
{
    if (!p.marked)
    {
        p.marked = true;
        for each Object q referenced by p
            mark (q);
    }
}

Notice that this recursive mark algorithm does nothing when it encounters an object that has already been marked. Consequently, the algorithm is guaranteed to terminate, and it terminates only when all accessible objects have been marked.
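The recursive formulation uses one call frame per level of nesting, so a very deep object graph could exhaust the C call stack. The following is a minimal sketch of the same marking idea driven by an explicit worklist instead of recursion. The `Node` type, `mark_iterative`, and the fixed `WORKLIST_MAX` bound are hypothetical names chosen for illustration and are not part of the implementation later in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 2
#define WORKLIST_MAX 64   /* assumed upper bound on pending nodes */

/* Hypothetical object layout: a mark bit plus up to two references. */
typedef struct Node {
    bool marked;
    struct Node *children[MAX_CHILDREN];
} Node;

/* Marks every node reachable from root without recursion.
 * Returns the number of nodes newly marked by this call. */
int mark_iterative(Node *root)
{
    Node *worklist[WORKLIST_MAX];
    int top = 0;
    int count = 0;

    if (root == NULL || root->marked)
        return 0;

    root->marked = true;
    worklist[top++] = root;
    count = 1;

    while (top > 0) {
        Node *n = worklist[--top];
        for (int i = 0; i < MAX_CHILDREN; i++) {
            Node *c = n->children[i];
            if (c != NULL && !c->marked) {
                c->marked = true;            /* mark on push: each node is pushed once */
                assert(top < WORKLIST_MAX);  /* the sketch has a fixed-size worklist */
                worklist[top++] = c;
                count++;
            }
        }
    }
    return count;
}
```

Because a node is marked at the moment it is pushed, a cycle cannot put the same node on the worklist twice, which is the same termination argument as in the recursive version.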

In its second phase, the mark-and-sweep algorithm scans through all the objects in the heap in order to locate all the unmarked objects. The storage allocated to the unmarked objects is reclaimed during the scan. At the same time, the marked field on every live object is set back to false in preparation for the next invocation of the mark-and-sweep garbage collection algorithm:

void sweep ()
{
    for each Object p in the heap
    {
        if (p.marked)
            p.marked = false;
        else
            heap.release (p);
    }
}
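As a rough illustration of the sweep phase on its own, here is a sketch that treats the heap as a fixed array of slots rather than the linked list used by the C implementation later in this post. The `Slot` type, `sweep_slots`, and `HEAP_SLOTS` are assumptions made for this example only.

```c
#include <assert.h>
#include <stdbool.h>

#define HEAP_SLOTS 8

/* Hypothetical heap slot: an allocation flag plus the mark bit from the text. */
typedef struct {
    bool in_use;
    bool marked;
} Slot;

/* Releases every allocated but unmarked slot and clears the mark bit
 * on the survivors. Returns the number of slots reclaimed. */
int sweep_slots(Slot heap[], int n)
{
    int reclaimed = 0;
    for (int i = 0; i < n; i++) {
        if (!heap[i].in_use)
            continue;                /* slot is already free */
        if (heap[i].marked) {
            heap[i].marked = false;  /* survivor: reset for the next collection */
        } else {
            heap[i].in_use = false;  /* garbage: hand the slot back */
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Note that a second sweep with no mark phase in between would reclaim every remaining slot, since all mark bits have been cleared. This is why each collection must run the mark phase first.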

The figure illustrates the operation of the mark-and-sweep garbage collection algorithm. Part (a) shows the conditions before garbage collection begins. In this example, there is a single root variable. Part (b) shows the effect of the mark phase of the algorithm. At this point, all live objects have been marked. Finally, part (c) shows the objects left after the sweep phase has been completed. Only live objects remain in memory and the marked fields have all been set to false again.

  



Figure: Mark-and-sweep garbage collection.

Because the mark-and-sweep garbage collection algorithm traces out the set of objects accessible from the roots, it is able to correctly identify and collect garbage even in the presence of reference cycles. This is the main advantage of mark-and-sweep over the reference counting technique. A secondary benefit of the mark-and-sweep approach is that the normal manipulations of reference variables incur no overhead.
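To make the contrast with reference counting concrete, the sketch below builds two cells that refer to each other and then drops both external references. Neither count reaches zero, so neither cell is freed. The `RcCell` type and the `rc_*` helpers are hypothetical and exist only for this illustration, which leaks the two cells on purpose.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted cell with a single outgoing reference. */
typedef struct RcCell {
    int refcount;
    struct RcCell *ref;
} RcCell;

static int live_cells = 0;   /* cells allocated and not yet freed */

RcCell *rc_new(void)
{
    RcCell *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->refcount = 1;         /* the caller holds the first reference */
    c->ref = NULL;
    live_cells++;
    return c;
}

/* Drops one reference; frees the cell when the count reaches zero. */
void rc_release(RcCell *c)
{
    if (c == NULL)
        return;
    if (--c->refcount == 0) {
        rc_release(c->ref);
        free(c);
        live_cells--;
    }
}

/* Stores target in from->ref, adjusting both reference counts. */
void rc_set_ref(RcCell *from, RcCell *target)
{
    if (target != NULL)
        target->refcount++;
    rc_release(from->ref);
    from->ref = target;
}

int rc_live_cells(void)
{
    return live_cells;
}

/* Builds a two-cell cycle, drops both external references, and
 * returns how many cells are still allocated afterwards. */
int rc_cycle_demo(void)
{
    RcCell *a = rc_new();
    RcCell *b = rc_new();
    rc_set_ref(a, b);        /* b: 1 -> 2 */
    rc_set_ref(b, a);        /* a: 1 -> 2 */
    rc_release(a);           /* a: 2 -> 1, never reaches 0 */
    rc_release(b);           /* b: 2 -> 1, never reaches 0 */
    return live_cells;
}
```

A tracing collector would start from the roots, find neither cell, and reclaim both, which is the behavior the fourth test in the implementation below exercises.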

The main disadvantage of the mark-and-sweep approach is that normal program execution is suspended while the garbage collection algorithm runs. In particular, this can be a problem in a program that interacts with a human user or that must satisfy real-time execution constraints. For example, an interactive application that uses mark-and-sweep garbage collection becomes unresponsive periodically.

This post implements, in C, the mark-sweep algorithm proposed by John McCarthy.
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <assert.h>

#define STACK_MAX 256
#define INITIAL_GC_THRESHOLD 8

typedef enum {
    OBJ_INT,
    OBJ_PAIR
} ObjectType;

typedef struct object {
    char marked;            /* set during the mark phase, cleared by sweep */
    struct object *next;    /* links every allocated object into one list */
    ObjectType type;
    union {
        /* OBJ_INT */
        int value;
        /* OBJ_PAIR */
        struct {
            struct object *head;
            struct object *tail;
        };
    };
} object;

typedef struct {
    int num_objects;            /* objects currently allocated */
    int max_objects;            /* allocation count that triggers the next gc */
    object *firstobject;        /* head of the list of all allocated objects */
    object *stack[STACK_MAX];   /* the VM stack; its entries are the roots */
    int stacksize;
} VM;

VM* newVM();
object *newObject(VM *vm, ObjectType type);
bool isEmpty(VM *vm);
bool isFull(VM *vm);

void push(VM *vm, object *ref);
object *pop(VM *vm);

object *pushPair(VM *vm);
void pushInt(VM *vm, int value);

void mark(object *obj);
void markAll(VM *vm);
void sweep(VM *vm);

void gc(VM *vm);
void freeVM(VM *vm);

VM* newVM()
{
    VM* vm = malloc(sizeof(VM));
    if(vm == NULL)
    {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    vm->stacksize = 0;
    vm->firstobject = NULL;
    vm->num_objects = 0;
    vm->max_objects = INITIAL_GC_THRESHOLD;
    return vm;
}

bool isEmpty(VM *vm)
{
    return vm->stacksize == 0;
}

bool isFull(VM *vm)
{
    return vm->stacksize == STACK_MAX;
}

void push(VM *vm, object *ref)
{
    if(isFull(vm))
    {
        /* perror() would append an unrelated errno message here */
        fprintf(stderr, "Stack overflow\n");
        exit(EXIT_FAILURE);
    }
    vm->stack[vm->stacksize++] = ref;
}

object *pop(VM *vm)
{
    if(isEmpty(vm))
    {
        fprintf(stderr, "Stack underflow\n");
        exit(EXIT_FAILURE);
    }
    return vm->stack[--vm->stacksize];
}

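object *newObject(VM *vm, ObjectType type)
{
    /* Collect before allocating once the threshold is reached. */
    if(vm->num_objects >= vm->max_objects)
        gc(vm);

    object *obj = malloc(sizeof(object));
    if(obj == NULL)
    {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    obj->type = type;
    obj->marked = false;

    /* Link the new object at the head of the list of all objects. */
    obj->next = vm->firstobject;
    vm->firstobject = obj;
    vm->num_objects++;
    return obj;
}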

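void pushInt(VM *vm, int value)
{
    object *obj = newObject(vm, OBJ_INT);
    obj->value = value;
    push(vm, obj);
}

/* Pops the top two objects, pushes a pair holding them, and returns the pair.
   The pair is allocated before popping, so both operands are still on the
   stack (and therefore still roots) if newObject triggers a collection. */
object *pushPair(VM *vm)
{
    object *obj = newObject(vm, OBJ_PAIR);
    obj->tail = pop(vm);
    obj->head = pop(vm);
    push(vm, obj);
    return obj;
}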

/* Mark phase: every object on the VM stack is a root. */
void markAll(VM *vm)
{
    int i;
    for(i = 0; i < vm->stacksize; i++)
        mark(vm->stack[i]);
}

void mark(object *obj)
{
    /* Stop at already-marked objects so reference cycles terminate. */
    if(obj->marked)
        return;
    obj->marked = true;
    if(obj->type == OBJ_PAIR)
    {
        mark(obj->head);
        mark(obj->tail);
    }
}

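/* Sweep phase: free unmarked objects and unmark the survivors. */
void sweep(VM *vm)
{
    object *prev = NULL;
    object *cur = vm->firstobject;
    while(cur)
    {
        object *next = cur->next;
        if(!cur->marked)
        {
            /* Unreachable: unlink from the object list and free. */
            if(prev)
                prev->next = next;
            else
                vm->firstobject = next;
            free(cur);
            vm->num_objects--;
        }
        else
        {
            /* Reachable: keep it and reset the mark for the next gc. */
            prev = cur;
            cur->marked = false;
        }
        cur = next;
    }
}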

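void gc(VM *vm)
{
    int num_object = vm->num_objects;
    markAll(vm);
    sweep(vm);
    /* Next threshold is twice the live count; fall back to the initial
       threshold when nothing survives, otherwise it would drop to 0. */
    vm->max_objects = vm->num_objects == 0 ? INITIAL_GC_THRESHOLD : vm->num_objects * 2;
    printf("collected %d objects, %d objects remain\n", num_object - vm->num_objects, vm->num_objects);
}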

void freeVM(VM *vm)
{
    /* With an empty stack there are no roots, so gc frees every object. */
    vm->stacksize = 0;
    gc(vm);
    free(vm);
}

/* Objects still on the stack are preserved. */
void test1()
{
    printf("test1:\n");
    VM *vm = newVM();
    pushInt(vm, 1);
    pushInt(vm, 2);
    gc(vm);
    assert(vm->num_objects == 2);
    freeVM(vm);
}

/* Objects popped off the stack are collected. */
void test2()
{
    printf("test2:\n");
    VM *vm = newVM();
    pushInt(vm, 1);
    pushInt(vm, 2);
    pop(vm);
    pop(vm);

    gc(vm);
    assert(vm->num_objects == 0);
    freeVM(vm);
}

/* Nested pairs are reached through the single root on the stack. */
void test3()
{
    printf("test3:\n");
    VM *vm = newVM();
    pushInt(vm, 1);
    pushInt(vm, 2);
    pushPair(vm);
    pushInt(vm, 3);
    pushInt(vm, 4);
    pushPair(vm);
    pushPair(vm);

    gc(vm);
    assert(vm->num_objects == 7);
    freeVM(vm);
}

/* Reference cycles are handled. */
void test4()
{
    printf("test4:\n");
    VM *vm = newVM();
    pushInt(vm, 1);
    pushInt(vm, 2);
    object *obj1 = pushPair(vm);

    pushInt(vm, 3);
    pushInt(vm, 4);
    object *obj2 = pushPair(vm);

    /* Build a cycle; this also makes the ints 2 and 4 unreachable. */
    obj1->tail = obj2;
    obj2->tail = obj1;
    gc(vm);
    assert(vm->num_objects == 4);
    freeVM(vm);
}

int main(void)
{
    test1();
    test2();
    test3();
    test4();
    return 0;
}