
Virtual Memory in Operating Systems, with Implementation Code

2015-06-03 12:27
Virtual memory is a technique used by virtually all modern operating systems.

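The basic idea of virtual memory is that every process has its own independent logical address space. That address space is divided into equal-sized blocks called pages, and each page covers a contiguous range of addresses. Physical memory is divided into blocks called page frames, which are normally the same size as a page. From the process's point of view there appears to be a large amount of memory: some pages are mapped to page frames in physical memory, while others are not loaded and are kept on disk. Introducing logical addresses separates a process's address space from the actual storage space, which makes memory management more flexible.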
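
To make the page/frame split concrete, here is a minimal sketch of how a 32-bit address breaks down into a page number and an in-page offset. It assumes 4 KB pages (the size used later in this article); the helper names `page_number`, `page_offset` and `make_paddr` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   4096u             /* assumed page size: 4 KB */
#define PAGE_SHIFT  12u               /* log2(PAGE_SIZE) */
#define OFFSET_MASK (PAGE_SIZE - 1u)  /* low 12 bits of an address */

/* Virtual page number: the address with the in-page offset shifted out. */
static uint32_t page_number(uint32_t vaddr)
{
    return vaddr >> PAGE_SHIFT;
}

/* Byte offset of the address inside its page. */
static uint32_t page_offset(uint32_t vaddr)
{
    return vaddr & OFFSET_MASK;
}

/* Rebuild a physical address from a frame number and an offset. */
static uint32_t make_paddr(uint32_t frame, uint32_t offset)
{
    assert(offset < PAGE_SIZE);
    return (frame << PAGE_SHIFT) | offset;
}
```

Translation only ever replaces the page number with a frame number; the offset is carried over unchanged, which is why pages and frames need to be the same size.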

The two basic concepts, address space and storage space, are defined as follows:

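Address space: the target program produced by compiling a source program lives within a bounded range of addresses, and this range is called the address space. The address space is the set of logical addresses.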

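Storage space: the set of physical units in main memory that store information. The numbers of these units are called physical addresses, so the storage space is the set of physical addresses.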

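Three management schemes derive from this:
paged memory management, segmented memory management, and segmented-paged memory management. This article focuses on paging.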

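In a paged system, when a process is created the operating system allocates page frames for all of its pages, and when the process is destroyed it reclaims all the frames that were allocated to it. If processes are allowed to request memory dynamically while running, the operating system must also allocate physical frames for the requested space. To do all this, the operating system has to keep track of which page frames in memory are actually in use. It must also switch the mapping from process address space to physical memory correctly whenever it switches between two processes. To understand how the operating system meets these requirements, we first need to understand page tables. First, a figure, reproduced from 51CTO: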



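An entry in the page table is called a page table entry (PTE). Each PTE records the mapping from one range of virtual addresses (one page) to physical addresses.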

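Since the page table is itself stored in memory, every memory read by the program now needs at least two memory accesses: one to fetch the page table entry and one for the data. Compared with the single access needed without an MMU, efficiency drops considerably, and the slower the memory, the more noticeable the drop. (MMU stands for Memory Management Unit. It is a hardware unit integrated in the CPU whose main job is to translate virtual addresses to physical addresses, and it gives software a hardware-backed memory protection mechanism.) How to keep the MMU's advantages while minimizing this overhead therefore became a pressing problem.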

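This is where the TLB comes in. TLB stands for translation lookaside buffer. It is a small set of registers inside the MMU that temporarily holds address translations, in effect a cache of page table entries. Because the TLB is built from registers, accessing it is far faster than accessing the page table in memory. If the entire page table could be held in the TLB, the efficiency problem would be solved.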

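However, because of manufacturing cost and other constraints, keeping every page table entry in the TLB is practically impossible. We can only keep the most frequently used entries in a TLB of limited capacity, which improves the MMU's efficiency to a certain degree.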

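The theoretical basis for why this works is the principle of locality of memory references: during execution, a program is more likely to access code and data close to the locations it has accessed recently. In theory, then, the TLB holds most of the page table entries needed in the current period of time, so it can improve the MMU's efficiency considerably.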
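
The benefit can be estimated with the usual effective access time calculation. The symbols are assumptions introduced here, not taken from the OS161 code: $h$ is the TLB hit ratio, $t_{\mathrm{TLB}}$ the TLB lookup time, and $t_{\mathrm{mem}}$ the time of one memory access. A single-level page table is assumed, so a miss costs exactly one extra memory access.

```latex
\[
\mathrm{EAT}
  = h\,(t_{\mathrm{TLB}} + t_{\mathrm{mem}})
  + (1-h)\,(t_{\mathrm{TLB}} + 2\,t_{\mathrm{mem}})
  = t_{\mathrm{TLB}} + (2-h)\,t_{\mathrm{mem}}
\]
```

With purely illustrative numbers $h = 0.98$, $t_{\mathrm{TLB}} = 1\,\mathrm{ns}$ and $t_{\mathrm{mem}} = 100\,\mathrm{ns}$, this gives $1 + 1.02 \times 100 = 103\,\mathrm{ns}$, compared with $2\,t_{\mathrm{mem}} = 200\,\mathrm{ns}$ when every access has to walk the page table.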

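Here we use a two-level page table. Translation becomes a two-step lookup: part of the virtual address indexes the first table, the entry found there locates a second table, and only after indexing the second table is the physical address known. The first table is called the first-level page table and the second is called the second-level page table. The main purpose of the two-level scheme is to reduce the memory that the page tables themselves occupy; the drawback is that each lookup needs one more memory access, which further slows address translation.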
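
The two-step lookup can be sketched in user space as follows. This is not the OS161 code: it assumes a 10/10/12 split of a 32-bit address, and the types `l1_table` and `l2_table`, the functions `pt_map` and `pt_translate`, and the position of the valid bit are all hypothetical choices made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define PT_ENTRIES 1024u        /* 2^10 entries per table */
#define PTE_VALID  0x1u         /* hypothetical valid bit inside the low 12 bits */
#define FRAME_MASK 0xfffff000u  /* upper 20 bits: frame address */

typedef struct { uint32_t pte[PT_ENTRIES]; } l2_table;
typedef struct { l2_table *l2[PT_ENTRIES]; } l1_table;

/* Top 10 bits select the first-level entry, the next 10 the second-level entry. */
static uint32_t l1_index(uint32_t va) { return va >> 22; }
static uint32_t l2_index(uint32_t va) { return (va >> 12) & 0x3ffu; }

/* Map the page containing va to the frame at pa. Returns 0, or -1 if out of memory. */
static int pt_map(l1_table *pt, uint32_t va, uint32_t pa)
{
    l2_table **slot = &pt->l2[l1_index(va)];

    if (*slot == NULL) {
        /* Second-level tables are created only when first needed. */
        *slot = calloc(1, sizeof **slot);
        if (*slot == NULL)
            return -1;
    }
    (*slot)->pte[l2_index(va)] = (pa & FRAME_MASK) | PTE_VALID;
    return 0;
}

/* Translate va. Returns 0 and stores the result in *pa, or -1 if va is unmapped. */
static int pt_translate(const l1_table *pt, uint32_t va, uint32_t *pa)
{
    const l2_table *l2 = pt->l2[l1_index(va)];
    uint32_t pte;

    if (l2 == NULL)
        return -1;
    pte = l2->pte[l2_index(va)];
    if (!(pte & PTE_VALID))
        return -1;
    *pa = (pte & FRAME_MASK) | (va & 0xfffu);
    return 0;
}
```

The memory saving comes from `pt_map` allocating a second-level table only for the 4 MB ranges that are actually used; a sparse address space needs only a handful of them.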

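That's it for the background. Now for the practical part: we implement VM on OS161, a teaching operating system developed at Harvard University. OS161 is based on MIPS-I hardware.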

The code is on GitHub: https://github.com/tian-jiang/OS161-VirtualMemory

First, a piece of code from kern/arch/mips/include/vm.h, which defines how memory is laid out:

/*
* MIPS-I hardwired memory layout:
*    0xc0000000 - 0xffffffff   kseg2 (kernel, tlb-mapped)
*    0xa0000000 - 0xbfffffff   kseg1 (kernel, unmapped, uncached)
*    0x80000000 - 0x9fffffff   kseg0 (kernel, unmapped, cached)
*    0x00000000 - 0x7fffffff   kuseg (user, tlb-mapped)
*
* (mips32 is a little different)
*/

#define MIPS_KUSEG  0x00000000
#define MIPS_KSEG0  0x80000000
#define MIPS_KSEG1  0xa0000000
#define MIPS_KSEG2  0xc0000000
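
As a quick illustration of this layout, the sketch below classifies an address by segment and converts between kseg0 addresses and physical addresses. The `MIPS_*` constants repeat the values above; the enum `mips_seg` and the functions `segment_of`, `kseg0_to_paddr` and `paddr_to_kseg0` are hypothetical helpers, not part of OS161.

```c
#include <stdint.h>

#define MIPS_KUSEG 0x00000000u
#define MIPS_KSEG0 0x80000000u
#define MIPS_KSEG1 0xa0000000u
#define MIPS_KSEG2 0xc0000000u

typedef enum { SEG_KUSEG, SEG_KSEG0, SEG_KSEG1, SEG_KSEG2 } mips_seg;

/* Decide which hardwired segment a virtual address falls into. */
static mips_seg segment_of(uint32_t vaddr)
{
    if (vaddr >= MIPS_KSEG2) return SEG_KSEG2;  /* kernel, TLB-mapped */
    if (vaddr >= MIPS_KSEG1) return SEG_KSEG1;  /* kernel, unmapped, uncached */
    if (vaddr >= MIPS_KSEG0) return SEG_KSEG0;  /* kernel, unmapped, cached */
    return SEG_KUSEG;                           /* user, TLB-mapped */
}

/* kseg0 is a direct window onto physical memory: translation is one subtraction. */
static uint32_t kseg0_to_paddr(uint32_t kvaddr)
{
    return kvaddr - MIPS_KSEG0;
}

static uint32_t paddr_to_kseg0(uint32_t paddr)
{
    return paddr + MIPS_KSEG0;
}
```

This direct mapping is why the kernel can reach any physical frame without going through the TLB, and why the code later in this article converts a first free virtual address into a physical one by subtracting MIPS_KSEG0.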


The memory layout is shown in the following figure:





This figure shows how physical memory is allocated in OS161.

Let's start from the beginning: kern/startup/main.c

/* Early initialization. */
ram_bootstrap();
.......

/* Late phase of initialization. */
vm_bootstrap();
........


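At boot, the operating system calls ram_bootstrap() and vm_bootstrap() to start the VM management module. Where are these two functions declared and defined? Let's look at the following code.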

kern/include/vm.h and kern/arch/mips/include/vm.h

/* Initialization function */
void vm_bootstrap(void);
......

/* Allocate/free kernel heap pages (called by kmalloc/kfree) */

void frametable_bootstrap(void);

/*
* Interface to the low-level module that looks after the amount of
* physical memory we have.
*
* ram_getsize returns the lowest valid physical address, and one past
* the highest valid physical address. (Both are page-aligned.) This
* is the memory that is available for use during operation, and
* excludes the memory the kernel is loaded into and memory that is
* grabbed in the very early stages of bootup.
*
* ram_stealmem can be used before ram_getsize is called to allocate
* memory that cannot be freed later. This is intended for use early
* in bootup before VM initialization is complete.
*/

void ram_bootstrap(void);
paddr_t ram_stealmem(unsigned long npages);
void ram_getsize(paddr_t *lo, paddr_t *hi);


The two functions are declared here. So what do they actually do?

kern/arch/mips/vm/ram.c, kern/arch/mips/vm/vm.c, kern/vm/frametable.c

vaddr_t firstfree;   /* first free virtual address; set by start.S */

static paddr_t firstpaddr;  /* address of first free physical page */
static paddr_t lastpaddr;   /* one past end of last free physical page */

/*
* Called very early in system boot to figure out how much physical
* RAM is available.
*/
void
ram_bootstrap(void)
{
size_t ramsize;

/* Get size of RAM. */
ramsize = mainbus_ramsize();

/*
* This is the same as the last physical address, as long as
* we have less than 508 megabytes of memory. If we had more,
* various annoying properties of the MIPS architecture would
* force the RAM to be discontiguous. This is not a case we
* are going to worry about.
*/
if (ramsize > 508*1024*1024) {
ramsize = 508*1024*1024;
}

lastpaddr = ramsize;

/*
* Get first free virtual address from where start.S saved it.
* Convert to physical address.
*/
firstpaddr = firstfree - MIPS_KSEG0;

kprintf("%uk physical memory available\n",
(lastpaddr-firstpaddr)/1024);
}


/*
* Initialise the frame table
*/
void
vm_bootstrap(void)
{
frametable_bootstrap();
}


/*
* Keep these variables static so that other files cannot access them
*/
static struct frame_table_entry *frame_table;
static paddr_t frametop, freeframe;

/*
* initialise frame table
*/
void
frametable_bootstrap(void)
{
struct frame_table_entry *p;
paddr_t firsta, lasta, paddr;
unsigned long framenum, entry_num, frame_table_size, i;

// get the useable range of physical memory
ram_getsize(&firsta, &lasta);
KASSERT((firsta & PAGE_FRAME) == firsta);
KASSERT((lasta & PAGE_FRAME) == lasta);

framenum = (lasta - firsta) / PAGE_SIZE;

// calculate the size of the whole framemap
frame_table_size = framenum * sizeof(struct frame_table_entry);
frame_table_size = ROUNDUP(frame_table_size, PAGE_SIZE);
entry_num = frame_table_size / PAGE_SIZE;
KASSERT((frame_table_size & PAGE_FRAME) == frame_table_size);

frametop = firsta;
freeframe = firsta + frame_table_size;

if (freeframe >= lasta) {
// This is impossible for most of the time
panic("vm: framemap consume physical memory?\n");
}

// keep the frame table at the bottom of the usable range of physical memory;
// the free frames start right after the end of the frame table
frame_table = (struct frame_table_entry *) PADDR_TO_KVADDR(firsta);

// Initialise the frame list. Each entry corresponds to one frame and
// stores the address of the next free frame. A value of zero means the
// frame is allocated (or, for the last free frame, marks the end of the list).
p = frame_table;
for (i = 0; i < framenum-1; i++) {
if (i < entry_num) {
// frames occupied by the frame table itself are never free
p->next_freeframe = 0;
p += 1;
continue;
}
paddr = frametop + (i+1) * PAGE_SIZE;
p->next_freeframe = paddr;
p += 1;
}
// the loop stops one entry short: terminate the free list at the last frame
p->next_freeframe = 0;
}


kern/include/vm.h
struct frame_table_entry {
// address of next free frame
size_t          next_freeframe;
};


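ram_bootstrap is called during system initialization to find out how much physical memory is available. vm_bootstrap simply calls frametable_bootstrap(), which divides the usable physical memory into frames of 4 KB each and keeps a linked list of free frames in memory. The list is stored at the start of the free physical memory, so before storing it the code has to work out how much space the frame table itself needs. The first half of the function therefore computes the size of the frame table, and the second half initializes the frame table as a linked list. Since every frame beyond the table is free at initialization, each entry simply points to the address of the next frame.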
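
The size calculation can be checked with a small stand-alone sketch. It mirrors the arithmetic in frametable_bootstrap() but is not the kernel code: `struct ft_layout` and `frametable_layout` are hypothetical names, `ROUNDUP` is defined locally, and the entry is assumed to be 4 bytes wide (as size_t is on 32-bit MIPS).

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
/* Local definition for this sketch: round a up to a multiple of b. */
#define ROUNDUP(a, b) ((((a) + (b) - 1) / (b)) * (b))

/* One entry per frame; assumed to be 4 bytes, as on 32-bit MIPS. */
struct frame_table_entry {
    uint32_t next_freeframe;
};

struct ft_layout {
    unsigned long framenum;     /* number of frames in [lo, hi) */
    unsigned long table_pages;  /* pages occupied by the frame table itself */
    uint32_t first_free;        /* physical address of the first free frame */
};

/* Compute where the frame table ends and the free frames begin. */
static struct ft_layout frametable_layout(uint32_t lo, uint32_t hi)
{
    struct ft_layout l;
    unsigned long bytes;

    l.framenum = (hi - lo) / PAGE_SIZE;
    bytes = ROUNDUP(l.framenum * sizeof(struct frame_table_entry), PAGE_SIZE);
    l.table_pages = bytes / PAGE_SIZE;
    l.first_free = lo + (uint32_t)bytes;
    return l;
}
```

For example, with usable memory from 0x00031000 to 0x00400000 there are 975 frames, the table needs 3900 bytes, which rounds up to one page, and the first free frame is at 0x00032000.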

That completes the operating system's VM initialization. So how is the VM actually used? See below.

kern/include/vm.h

/* Fault handling function called by trap code */
int vm_fault(int faulttype, vaddr_t faultaddress);

vaddr_t alloc_kpages(int npages);
void free_kpages(vaddr_t addr);


kern/include/addrspace.h, implemented in kern/vm/addrspace.c

/*
* Address space - data structure associated with the virtual memory
* space of a process.
*
* You write this.
*/

/*
* A linked list that stores the information for each region (text, data, bss...)
*/
struct as_region {
vaddr_t as_vbase;    /* the starting virtual address of the region */
size_t as_npages;    /* how many pages the region occupies, counted from as_vbase */
unsigned int as_permissions;    /* is the region readable? writable? executable? */
struct as_region *as_next_region;    /* address of the following region */
};

struct addrspace {
#if OPT_DUMBVM
vaddr_t as_vbase1;
paddr_t as_pbase1;
size_t as_npages1;
vaddr_t as_vbase2;
paddr_t as_pbase2;
size_t as_npages2;
paddr_t as_stackpbase;
#else
/* Put stuff here for your VM system */
struct as_region *as_regions_start;    /* header of the regions linked list */
vaddr_t as_pagetable;               /* address of the first-level page table */
#endif
};

/*
* The structure of a PTE in the page table:
* |           address             |    PTE_VALID    |      PF_W      |     PF_R      |      PF_X
*   virtual address of the frame  | valid indicator | writeable flag | readable flag | executable flag
* I don't use a structure to represent a PTE, just the type vaddr_t. Because the last 12 bits
* of the virtual address of a frame are free, some of them can be used for the flags.
*/

/*
* Functions in addrspace.c:
*
*    as_create - create a new empty address space. You need to make
*                sure this gets called in all the right places. You
*                may find you want to change the argument list. May
*                return NULL on out-of-memory error.
*
*    as_copy   - create a new address space that is an exact copy of
*                an old one. Probably calls as_create to get a new
*                empty address space and fill it in, but that's up to
*                you.
*
*    as_activate - make the specified address space the one currently
*                "seen" by the processor. Argument might be NULL,
*                meaning "no particular address space".
*
*    as_destroy - dispose of an address space. You may need to change
*                the way this works if implementing user-level threads.
*
*    as_define_region - set up a region of memory within the address
*                space.
*
*    as_prepare_load - this is called before actually loading from an
*                executable into the address space.
*
*    as_complete_load - this is called when loading from an executable
*                is complete.
*
*    as_define_stack - set up the stack region in the address space.
*                (Normally called *after* as_complete_load().) Hands
*                back the initial stack pointer for the new process.
*
*    as_zero_region - zero out newly allocated pages.
*
*    as_destroy_regions - free all the space allocated for region storage.
*/

struct addrspace *as_create(void);
int               as_copy(struct addrspace *src, struct addrspace **ret);
void              as_activate(struct addrspace *);
void              as_destroy(struct addrspace *);

int               as_define_region(struct addrspace *as,
vaddr_t vaddr, size_t sz,
int readable,
int writeable,
int executable);
int               as_prepare_load(struct addrspace *as);
int               as_complete_load(struct addrspace *as);
int               as_define_stack(struct addrspace *as, vaddr_t *initstackptr);
void          as_zero_region(vaddr_t vaddr, unsigned npages);
void          as_destroy_regions(struct as_region *ar);
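
The PTE layout described in the comment above can be sketched as bit packing. This is an illustration only: the `PF_*` values are the standard ELF segment flags, but the value of `PTE_VALID` is an assumption (the article does not show it), and `pte_t`, `pte_make`, `pte_frame`, `pte_is_valid` and `pte_is_writable` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_FRAME 0xfffff000u  /* mask selecting the page-aligned frame address */
#define PF_X       0x1u         /* ELF segment flag: executable */
#define PF_W       0x2u         /* ELF segment flag: writable */
#define PF_R       0x4u         /* ELF segment flag: readable */
#define PTE_VALID  0x8u         /* assumed value; any free low bit would do */

typedef uint32_t pte_t;

/* Pack a page-aligned frame address and permission flags into one word. */
static pte_t pte_make(uint32_t frame_kvaddr, uint32_t perms)
{
    assert((frame_kvaddr & ~PAGE_FRAME) == 0);       /* low 12 bits must be free */
    assert((perms & ~(PF_R | PF_W | PF_X)) == 0);    /* only known flags allowed */
    return frame_kvaddr | perms | PTE_VALID;
}

static uint32_t pte_frame(pte_t pte)   { return pte & PAGE_FRAME; }
static int pte_is_valid(pte_t pte)     { return (pte & PTE_VALID) != 0; }
static int pte_is_writable(pte_t pte)  { return (pte & PF_W) != 0; }
```

Packing the flags into the low bits keeps a PTE at one machine word, so a 4 KB second-level table holds exactly 1024 entries.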


kern/vm/frametable.c

/*
* Allocate n pages.
* Before frame table initialisation, using ram_stealmem
*/
static
paddr_t
getppages(int npages)
{
paddr_t paddr;
struct frame_table_entry *p;
int i;

spinlock_acquire(&frametable_lock);
if (frame_table == 0)
paddr = ram_stealmem(npages);
else
{
if (npages > 1){
spinlock_release(&frametable_lock);
return 0;
}

// Freeframe equals zero means all the frames have been allocated
// and there is no frame to use.
if (freeframe == 0){
spinlock_release(&frametable_lock);
return 0;
}

// Get the current free frame's entry id
// and retrieve the next free frame
paddr = freeframe;
i = (freeframe - frametop) / PAGE_SIZE;
p = frame_table + i;

freeframe = p->next_freeframe;
p->next_freeframe = 0;
}
spinlock_release(&frametable_lock);

return paddr;
}

/*
* Allocation function for public accessing
* Returning virtual address of frame
*/
vaddr_t
alloc_kpages(int npages)
{
paddr_t paddr = getppages(npages);

if(paddr == 0)
return 0;

return PADDR_TO_KVADDR(paddr);
}

/*
* Free page
* Stores the address of the current freeframe into the entry of the frame to be freed
* and update the address of the freeframe.
*/
static
void
freeppages(paddr_t paddr)
{
struct frame_table_entry *p;
int i;
spinlock_acquire(&frametable_lock);
i = (paddr - frametop) / PAGE_SIZE;
p = frame_table + i;
p->next_freeframe = freeframe;
freeframe = paddr;
spinlock_release(&frametable_lock);
}

/*
* Free page function for public accessing
*/
void
free_kpages(vaddr_t addr)
{
KASSERT(addr >= MIPS_KSEG0);

paddr_t paddr = KVADDR_TO_PADDR(addr);
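if (paddr <= frametop) {
// this page lies at or below the start of the frame table (for example,
// memory taken with ram_stealmem during early boot), so it cannot be
// returned to the free list and is leaked
}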
else {
freeppages(paddr);
}
}
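
The allocate and free paths above can be exercised in user space with a small simulation. It follows the same free-list idea but is only a sketch: the pool size `NFRAMES`, the base address `BASE`, and the functions `ft_init`, `ft_alloc` and `ft_free` are hypothetical, and the locking is left out.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NFRAMES   8u           /* tiny pool, for illustration only */
#define BASE      0x00100000u  /* hypothetical physical address of frame 0 */

static uint32_t next_free[NFRAMES];  /* 0 = allocated, or end of the free list */
static uint32_t freeframe;           /* head of the free list, 0 = no free frame */

/* Build the free list; the first `reserved` frames stand in for the frame table. */
static void ft_init(unsigned reserved)
{
    unsigned i;

    assert(reserved < NFRAMES);
    for (i = 0; i < NFRAMES; i++) {
        if (i < reserved || i == NFRAMES - 1)
            next_free[i] = 0;
        else
            next_free[i] = BASE + (i + 1) * PAGE_SIZE;
    }
    freeframe = BASE + reserved * PAGE_SIZE;
}

/* Pop one frame off the head of the free list; returns 0 when none is left. */
static uint32_t ft_alloc(void)
{
    uint32_t paddr = freeframe;
    unsigned i;

    if (paddr == 0)
        return 0;
    i = (paddr - BASE) / PAGE_SIZE;
    freeframe = next_free[i];
    next_free[i] = 0;
    return paddr;
}

/* Push a frame back onto the head of the free list. */
static void ft_free(uint32_t paddr)
{
    unsigned i = (paddr - BASE) / PAGE_SIZE;

    assert(i < NFRAMES);
    next_free[i] = freeframe;
    freeframe = paddr;
}
```

Both operations touch only the head of the list, so allocation and release take constant time; the price is that the list cannot hand out several physically contiguous frames at once.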


kern/arch/mips/vm

This is the most important function: it decides what to do when the virtual page a user program needs cannot be found in the TLB.

/*
* When a TLB miss happens, a page fault is triggered.
* It is handled as follows:
* 1. Check what kind of fault it is. If it is a READONLY fault,
*    do nothing except return an error, which kills the process.
* 2. If it is a read fault or a write fault:
*    1. First check whether the virtual address lies within any of the regions
*       or the stack of the current addrspace. If not, return an error and
*       kill the process; if it does, go on.
*    2. Then try to find the mapping in the page table.
*       If a page table entry exists for this virtual address, insert it into the TLB.
*    3. If this virtual address is not mapped yet, map it,
*       update the page table, then insert the mapping into the TLB.
*/
int
vm_fault(int faulttype, vaddr_t faultaddress)
{
vaddr_t *vaddr1, *vaddr2, vaddr, vbase, vtop, faultadd = 0;
paddr_t paddr;
struct addrspace *as;
struct as_region *s;
uint32_t ehi, elo;
int i, index1, index2, spl;
unsigned int permis = 0;

switch (faulttype) {
case VM_FAULT_READONLY:
return EFAULT;
case VM_FAULT_READ:
case VM_FAULT_WRITE:
break;
default:
return EINVAL;
}

as = curthread -> t_addrspace;
if (as == NULL) {
return EFAULT;
}

// Align faultaddress
faultaddress &= PAGE_FRAME;

// Walk the linked list of regions
// and check that faultaddress is valid
KASSERT(as->as_regions_start != 0);
s = as->as_regions_start;
while (s != 0) {
KASSERT(s->as_vbase != 0);
KASSERT(s->as_npages != 0);
KASSERT((s->as_vbase & PAGE_FRAME) == s->as_vbase);
vbase = s->as_vbase;
vtop = vbase + s->as_npages * PAGE_SIZE;
if (faultaddress >= vbase && faultaddress < vtop) {
faultadd = faultaddress;
permis = s->as_permissions;
break;
}
s = s->as_next_region;
}

if (faultadd == 0) {
vtop = USERSTACK;
vbase = vtop - VM_STACKPAGES * PAGE_SIZE;
if (faultaddress >= vbase && faultaddress < vtop) {
faultadd = faultaddress;
// Stack is readable, writable but not executable
permis |= (PF_W | PF_R);
}

// faultaddress is not within any range of the regions and stack
if (faultadd == 0) {
return EFAULT;
}
}

index1 = (faultaddress & TOP_TEN) >> 22;
index2 = (faultaddress & MID_TEN) >> 12;

vaddr1 = (vaddr_t *)(as->as_pagetable + index1 * 4);
if (*vaddr1) {
vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
// If the mapping exists in the page table,
// get the address stored in the PTE,
// translate it into a physical address,
// check the writeable flag,
// and prepare the physical address for TLBLO
if (*vaddr2 & PTE_VALID) {
vaddr = *vaddr2 & PAGE_FRAME;
paddr = KVADDR_TO_PADDR(vaddr);
if (permis & PF_W) {
paddr |= TLBLO_DIRTY;
}
}
// If it does not exist, do the mapping,
// update the PTE in the second-level page table,
// check the writeable flag,
// and prepare the physical address for TLBLO
else {
vaddr = alloc_kpages(1);
KASSERT(vaddr != 0);

as_zero_region(vaddr, 1);
*vaddr2 |= (vaddr | PTE_VALID);

paddr = KVADDR_TO_PADDR(vaddr);
if (permis & PF_W) {
paddr |= TLBLO_DIRTY;
}
}
}
// If even the second-level page table does not exist,
// create the second-level page table,
// do the mapping,
// update the PTE,
// and prepare the physical address.
else {
*vaddr1 = alloc_kpages(1);
KASSERT(*vaddr1 != 0);
as_zero_region(*vaddr1, 1);

vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
vaddr = alloc_kpages(1);
KASSERT(vaddr != 0);
as_zero_region(vaddr, 1);
*vaddr2 |= (vaddr | PTE_VALID);

paddr = KVADDR_TO_PADDR(vaddr);
if (permis & PF_W) {
paddr |= TLBLO_DIRTY;
}
}

spl = splhigh();

// update the TLB
// if there is still an empty TLB entry, insert the new mapping there
// if not, pick an entry at random, evict it, and insert the new mapping
for (i=0; i<NUM_TLB; i++) {
tlb_read(&ehi, &elo, i);
if (elo & TLBLO_VALID) {
continue;
}
ehi = faultaddress;
elo = paddr | TLBLO_VALID;
tlb_write(ehi, elo, i);
splx(spl);
return 0;
}

// FIXME, TLB replacement algo.
ehi = faultaddress;
elo = paddr | TLBLO_VALID;
tlb_random(ehi, elo);
splx(spl);
return 0;
}
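
The region check at the start of vm_fault() can be isolated as a small function. This is a user-space sketch with a simplified structure: `struct region` and `region_find` are hypothetical names, and the stack check is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Simplified stand-in for struct as_region. */
struct region {
    uint32_t vbase;        /* page-aligned start address */
    size_t npages;         /* length of the region in pages */
    unsigned perms;        /* permission flags */
    struct region *next;   /* next region in the list, or NULL */
};

/* Return the region that contains vaddr, or NULL if no region does. */
static const struct region *region_find(const struct region *head, uint32_t vaddr)
{
    for (; head != NULL; head = head->next) {
        uint32_t vtop = head->vbase + (uint32_t)(head->npages * PAGE_SIZE);

        /* The interval is half-open: vbase is inside, vtop is not. */
        if (vaddr >= head->vbase && vaddr < vtop)
            return head;
    }
    return NULL;
}
```

Returning the region itself rather than a flag gives the caller the permissions in the same step, which is what the fault handler needs in order to decide whether the TLB entry may be writable.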


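While the system runs, page faults occur constantly. Even though the system has allocated pages to the running program (the allocation functions are in kern/vm/frametable.c), a page cannot be used as long as the TLB holds no record of its mapping from virtual address to physical address. So when the program actually needs the page, the access goes to the TLB first to obtain the corresponding physical address; if the mapping is missing, vm_fault() finds or creates it in the page table and loads it into the TLB.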
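
The TLB refill step at the end of vm_fault() can be simulated in user space. This is a sketch under stated assumptions: the TLB is modelled as a plain array, `tlb_install` and `tlb_lookup` are hypothetical names, and a round-robin victim is used in place of the hardware's random replacement (tlb_random) so that the behaviour is repeatable.

```c
#include <stdint.h>

#define NUM_TLB     64u          /* number of TLB entries on MIPS-I */
#define TLBLO_VALID 0x00000200u  /* valid bit in the low word of an entry */

struct tlb_entry {
    uint32_t hi;  /* virtual page address */
    uint32_t lo;  /* physical page address plus flag bits */
};

static struct tlb_entry tlb[NUM_TLB];  /* zero-initialised: every entry invalid */
static unsigned next_victim;           /* round-robin pointer, replaces tlb_random */

/* Install a mapping and return the slot that was used. */
static unsigned tlb_install(uint32_t vpage, uint32_t ppage_flags)
{
    unsigned i, slot = NUM_TLB;

    /* Prefer an empty slot, as vm_fault() does. */
    for (i = 0; i < NUM_TLB; i++) {
        if (!(tlb[i].lo & TLBLO_VALID)) {
            slot = i;
            break;
        }
    }
    /* TLB full: evict a victim. */
    if (slot == NUM_TLB) {
        slot = next_victim;
        next_victim = (next_victim + 1) % NUM_TLB;
    }
    tlb[slot].hi = vpage;
    tlb[slot].lo = ppage_flags | TLBLO_VALID;
    return slot;
}

/* Return the slot holding vpage, or -1 on a TLB miss. */
static int tlb_lookup(uint32_t vpage)
{
    unsigned i;

    for (i = 0; i < NUM_TLB; i++) {
        if ((tlb[i].lo & TLBLO_VALID) && tlb[i].hi == vpage)
            return (int)i;
    }
    return -1;
}
```

Once all 64 slots are valid, every new mapping evicts an old one, and the evicted page faults again on its next use. The page table keeps the mapping, so that second fault only has to reload the TLB.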