
Linux Development Notes 14 - Linux Scenario Analysis: Memory Management


2.1 The basic framework of Linux memory management

Linux's paging mechanism has three levels: the page global directory (PGD), the page middle directory (PMD), and the page table (PT). An entry in a PT is called a page table entry (PTE). Keep these English abbreviations in mind, because the names of functions and variables throughout the Linux source are built from them.
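To make the three levels concrete, here is a minimal kernel-side sketch of my own (not taken from the kernel or from any book) of resolving one user virtual address with the standard helpers. Note that 2.6-era kernels actually insert a PUD level between PGD and PMD; like the PMD on ARM, it is folded away. Locking and huge pages are ignored here.

#include <linux/mm.h>
#include <asm/pgtable.h>

/* Sketch: walk PGD -> (PUD) -> PMD -> PTE for one user virtual address
 * and return the physical page frame number it maps to (0 on failure). */
static unsigned long sketch_virt_to_pfn(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *pte;
	unsigned long pfn = 0;

	pgd = pgd_offset(mm, addr);		/* index the PGD with the top address bits */
	if (pgd_none(*pgd) || pgd_bad(*pgd))
		return 0;

	pud = pud_offset(pgd, addr);		/* folded on ARM: effectively the same entry */
	if (pud_none(*pud) || pud_bad(*pud))
		return 0;

	pmd = pmd_offset(pud, addr);		/* also folded when PGDIR_SHIFT == PMD_SHIFT */
	if (pmd_none(*pmd) || pmd_bad(*pmd))
		return 0;

	pte = pte_offset_map(pmd, addr);	/* index into the PTE table */
	if (pte_present(*pte))
		pfn = pte_pfn(*pte);		/* high bits of the PTE: the page frame number */
	pte_unmap(pte);

	return pfn;
}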

The three-level mapping flow in Linux is shown in the figure:



The ARM MMU, however, only supports two levels of mapping in hardware, so the software skips the PMD table: the PGD entries hold the base addresses of the PTs directly. In the Linux sources this looks like:


#define PMD_SHIFT		21
#define PGDIR_SHIFT		21	/* making PMD_SHIFT and PGDIR_SHIFT equal is all that is needed */


/*
* Hardware-wise, we have a two level page table structure, where the first
* level has 4096 entries, and the second level has 256 entries.  Each entry
* is one 32-bit word.  Most of the bits in the second level entry are used
* by hardware, and there aren't any "accessed" and "dirty" bits.
*
* Linux on the other hand has a three level page table structure, which can
* be wrapped to fit a two level page table structure easily - using the PGD
* and PTE only.  However, Linux also expects one "PTE" table per page, and
* at least a "dirty" bit.
*
* Therefore, we tweak the implementation slightly - we tell Linux that we
* have 2048 entries in the first level, each of which is 8 bytes (iow, two
* hardware pointers to the second level.)  The second level contains two
* hardware PTE tables arranged contiguously, preceded by Linux versions
* which contain the state information Linux needs.  We, therefore, end up
* with 512 entries in the "PTE" level.
*
* This leads to the page tables having the following layout:
*
*    pgd             pte
* |        |
* +--------+
* |        |       +------------+ +0
* +- - - - +       | Linux pt 0 |
* |        |       +------------+ +1024
* +--------+ +0    | Linux pt 1 |
* |        |-----> +------------+ +2048
* +- - - - + +4    |  h/w pt 0  |
* |        |-----> +------------+ +3072
* +--------+ +8    |  h/w pt 1  |
* |        |       +------------+ +4096
*
* See L_PTE_xxx below for definitions of bits in the "Linux pt", and
* PTE_xxx for definitions of bits appearing in the "h/w pt".
*
* PMD_xxx definitions refer to bits in the first level page table.
*
* The "dirty" bit is emulated by only granting hardware write permission
* iff the page is marked "writable" and "dirty" in the Linux PTE.  This
* means that a write to a clean page will cause a permission fault, and
* the Linux MM layer will mark the page dirty via handle_pte_fault().
* For the hardware to notice the permission change, the TLB entry must
* be flushed, and ptep_set_access_flags() does that for us.
*
* The "accessed" or "young" bit is emulated by a similar method; we only
* allow accesses to the page if the "young" bit is set.  Accesses to the
* page will cause a fault, and handle_pte_fault() will set the young bit
* for us as long as the page is marked present in the corresponding Linux
* PTE entry.  Again, ptep_set_access_flags() will ensure that the TLB is
* up to date.
*
* However, when the "young" bit is cleared, we deny access to the page
* by clearing the hardware PTE.  Currently Linux does not flush the TLB
* for us in this case, which means the TLB will retain the transation
* until either the TLB entry is evicted under pressure, or a context
* switch which changes the user space mapping occurs.
*/



#define PTRS_PER_PTE		512	/* number of PTEs per PTE table */
#define PTRS_PER_PMD		1
#define PTRS_PER_PGD		2048

#define PTE_HWTABLE_PTRS	(PTRS_PER_PTE)
#define PTE_HWTABLE_OFF		(PTE_HWTABLE_PTRS * sizeof(pte_t))
#define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u32))

/*
 * PMD_SHIFT determines the size of the area a second-level page table can map
 * PGDIR_SHIFT determines what a third-level page table entry can map
 */
#define PMD_SHIFT		21
#define PGDIR_SHIFT		21	/* make PMD_SHIFT equal to PGDIR_SHIFT to skip the PMD level */


/*
* These are used to make use of C type-checking..
*/
typedef struct { pteval_t pte; } pte_t;
typedef struct { unsigned long pmd; } pmd_t;
typedef struct { unsigned long pgd[2]; } pgd_t; /* a two-element array, matching the description above: each PGD entry is 8 bytes, i.e. two longs (two hardware pointers) */
typedef struct { unsigned long pgprot; } pgprot_t;

These structs define the types of the PTE and its relatives. pgprot_t stands for "page protection". As the figure at the top shows, during an actual translation the physical address is formed from the upper 20 bits of the PTE plus the lower 12 bits of the linear address. Only the upper 20 bits of a PTE therefore matter for address mapping, so its lower 12 bits are free to hold protection data such as writable, read-only, and so on. Some pgprot_t values are listed below:


#define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
#define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
#define __PAGE_SHARED_EXEC	__pgprot(_L_PTE_DEFAULT | L_PTE_USER)
#define __PAGE_COPY		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_RDONLY | L_PTE_XN)
#define __PAGE_COPY_EXEC	__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_RDONLY)
#define __PAGE_READONLY		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_RDONLY | L_PTE_XN)
#define __PAGE_READONLY_EXEC	__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_RDONLY)

#define mk_pte(page,prot)	pfn_pte(page_to_pfn(page), prot)

This macro combines the upper 20 bits obtained from page_to_pfn(page) with the prot value; prot is a pgprot_t, the lower 12 bits, describing the page's attributes.

Once such a PTE has been built, it must be written into the corresponding page-table slot for it to take effect.

set_pte(pteptr, pteval) used to do this, but that code no longer appears in 2.6.39; the procedure should be much the same.
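In 2.6.39 the job of the old set_pte() is done by set_pte_at() (on ARM it ends up in set_pte_ext()). Below is a small hedged sketch of how a new PTE could be built from a struct page and installed; the function name is mine, and I assume the caller has already located ptep and holds the page table lock.

#include <linux/mm.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>

/* Sketch: build a PTE (pfn in the high bits, protections in the low bits)
 * and write it into the page table slot ptep for address addr. */
static void sketch_install_pte(struct vm_area_struct *vma, unsigned long addr,
			       pte_t *ptep, struct page *page)
{
	pte_t entry = mk_pte(page, vma->vm_page_prot);	/* page frame + protection bits */

	entry = pte_mkyoung(pte_mkdirty(entry));	/* mark it accessed and dirty */
	set_pte_at(vma->vm_mm, addr, ptep, entry);	/* the 2.6.39 counterpart of set_pte() */
	update_mmu_cache(vma, addr, ptep);		/* let the MMU/TLB see the new entry */
}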

In addition, by setting the low 12 bits of a PTE we let the MMU (and the fault handler) tell whether a mapping was never established at all, or was established but has since been swapped out.

There are many more helper functions and macros; see the memory addressing chapter of Understanding the Linux Kernel for the details.

All of the above concerns the low 12 bits of the PTE. In 2.6.39, however, there is clearly an extra "Linux pt"; its exact purpose is not obvious at first, but arch/arm/include/asm/pgtable.h carries the following comment:


/*
 * "Linux" PTE definitions.
 *
 * We keep two sets of PTEs - the hardware and the linux version.
 * This allows greater flexibility in the way we map the Linux bits
 * onto the hardware tables, and allows us to have YOUNG and DIRTY
 * bits.
 *
 * The PTE table pointer refers to the hardware entries; the "Linux"
 * entries are stored 1024 bytes below.
 */


/*
 * Each physical page in the system has a struct page associated with
 * it to keep track of whatever it is we are using the page for at the
 * moment. Note that we have no way to track which tasks are using
 * a page, though if it is a pagecache page, rmap structures can tell us
 * who is mapping it.
 */
struct page {
	unsigned long flags;		/* Atomic flags, some possibly
					 * updated asynchronously */
	atomic_t _count;		/* Usage count, see below. */
	union {
		atomic_t _mapcount;	/* Count of ptes mapped in mms,
					 * to show when page is mapped
					 * & limit reverse map searches.
					 */
		struct {		/* SLUB */
			u16 inuse;
			u16 objects;
		};
	};
	union {
	    struct {
		unsigned long private;		/* Mapping-private opaque data:
						 * usually used for buffer_heads
						 * if PagePrivate set; used for
						 * swp_entry_t if PageSwapCache;
						 * indicates order in the buddy
						 * system if PG_buddy is set.
						 */
		struct address_space *mapping;	/* If low bit clear, points to
						 * inode address_space, or NULL.
						 * If page mapped as anonymous
						 * memory, low bit is set, and
						 * it points to anon_vma object:
						 * see PAGE_MAPPING_ANON below.
						 */
	    };
#if USE_SPLIT_PTLOCKS
	    spinlock_t ptl;
#endif
	    struct kmem_cache *slab;	/* SLUB: Pointer to slab */
	    struct page *first_page;	/* Compound tail pages */
	};
	union {
		pgoff_t index;		/* Our offset within mapping. */
		void *freelist;		/* SLUB: freelist req. slab lock */
	};
	struct list_head lru;		/* Pageout list, eg. active_list
					 * protected by zone->lru_lock !
					 */
	/*
	 * On machines where all RAM is mapped into kernel address space,
	 * we can simply calculate the virtual address. On machines with
	 * highmem some memory is mapped into kernel virtual memory
	 * dynamically, so we need a place to store that address.
	 * Note that this field could be 16 bits on x86 ... ;)
	 *
	 * Architectures with slow multiplication can define
	 * WANT_PAGE_VIRTUAL in asm/page.h
	 */
#if defined(WANT_PAGE_VIRTUAL)
	void *virtual;			/* Kernel virtual address (NULL if
					   not kmapped, ie. highmem) */
#endif /* WANT_PAGE_VIRTUAL */
#ifdef CONFIG_WANT_PAGE_DEBUG_FLAGS
	unsigned long debug_flags;	/* Use atomic bitops on this */
#endif

#ifdef CONFIG_KMEMCHECK
	/*
	 * kmemcheck wants to track the status of each byte in a page; this
	 * is a pointer to such a status block. NULL if not tracked.
	 */
	void *shadow;
#endif
};

The page structures live in the mem_map array, so finding the struct page for a physical address is trivial: it is simply mem_map[pfn].
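In other words, struct page, page frame number and kernel virtual address are interchangeable handles for the same frame. A quick hedged sketch of the conversions (my own helper, flat-memory model, lowmem pages only):

#include <linux/kernel.h>
#include <linux/mm.h>

/* Sketch: converting between page frame number, struct page and kernel
 * virtual address; pfn_to_page() boils down to &mem_map[pfn] here. */
static void sketch_page_conversions(struct page *page)
{
	unsigned long pfn  = page_to_pfn(page);   /* mem_map index of this page   */
	struct page *same  = pfn_to_page(pfn);    /* back again: &mem_map[pfn]    */
	void *vaddr        = page_address(page);  /* lowmem: direct-mapped vaddr  */
	struct page *again = virt_to_page(vaddr); /* and back from the address    */

	if (same != page || again != page)
		printk(KERN_WARNING "sketch: conversion mismatch\n");
}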

Because of hardware constraints, memory is split into different zones: ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM. The reason is that some devices can only perform DMA to a fixed range of addresses, so that range is set aside as ZONE_DMA to keep other allocations out of it. ZONE_NORMAL is ordinary memory. High memory cannot all be mapped permanently, since kernel space is only 1 GB, so it needs special handling: it has to be mapped on demand before it can be used.
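The zone split is exactly what the allocator's GFP flags select between. A hedged sketch of my own of allocating from each zone; note that a highmem page must be temporarily mapped with kmap() before the kernel can touch it:

#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/string.h>

/* Sketch: zone selection through GFP flags (error handling trimmed). */
static void sketch_zone_alloc(void)
{
	struct page *dma_page  = alloc_pages(GFP_DMA | GFP_KERNEL, 0);	/* ZONE_DMA        */
	struct page *norm_page = alloc_pages(GFP_KERNEL, 0);		/* ZONE_NORMAL     */
	struct page *high_page = alloc_pages(GFP_HIGHUSER, 0);		/* may be highmem  */

	if (high_page) {
		void *va = kmap(high_page);	/* highmem needs a temporary mapping */
		memset(va, 0, PAGE_SIZE);
		kunmap(high_page);
	}

	if (dma_page)  __free_pages(dma_page, 0);
	if (norm_page) __free_pages(norm_page, 0);
	if (high_page) __free_pages(high_page, 0);
}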

The descriptor used for these zones is:


struct zone {
	/* Fields commonly accessed by the page allocator */

	/* zone watermarks, access with *_wmark_pages(zone) macros */
	unsigned long watermark[NR_WMARK];

	/*
	 * When free pages are below this point, additional steps are taken
	 * when reading the number of free pages to avoid per-cpu counter
	 * drift allowing watermarks to be breached
	 */
	unsigned long percpu_drift_mark;

	/*
	 * We don't know if the memory that we're going to allocate will be freeable
	 * or/and it will be released eventually, so to avoid totally wasting several
	 * GB of ram we must reserve some of the lower zone memory (otherwise we risk
	 * to run OOM on the lower zones despite there's tons of freeable ram
	 * on the higher zones). This array is recalculated at runtime if the
	 * sysctl_lowmem_reserve_ratio sysctl changes.
	 */
	unsigned long lowmem_reserve[MAX_NR_ZONES];

#ifdef CONFIG_NUMA
	int node;
	/*
	 * zone reclaim becomes active if more unmapped pages exist.
	 */
	unsigned long min_unmapped_pages;
	unsigned long min_slab_pages;
#endif
	struct per_cpu_pageset __percpu *pageset;
	/*
	 * free areas of different sizes
	 */
	spinlock_t lock;
	int all_unreclaimable;		/* All pages pinned */
#ifdef CONFIG_MEMORY_HOTPLUG
	/* see spanned/present_pages for more description */
	seqlock_t span_seqlock;
#endif
	struct free_area free_area[MAX_ORDER];

#ifndef CONFIG_SPARSEMEM
	/*
	 * Flags for a pageblock_nr_pages block. See pageblock-flags.h.
	 * In SPARSEMEM, this map is stored in struct mem_section
	 */
	unsigned long *pageblock_flags;
#endif /* CONFIG_SPARSEMEM */

#ifdef CONFIG_COMPACTION
	/*
	 * On compaction failure, 1<<compact_defer_shift compactions
	 * are skipped before trying again. The number attempted since
	 * last failure is tracked with compact_considered.
	 */
	unsigned int compact_considered;
	unsigned int compact_defer_shift;
#endif

	ZONE_PADDING(_pad1_)

	/* Fields commonly accessed by the page reclaim scanner */
	spinlock_t lru_lock;
	struct zone_lru {
		struct list_head list;
	} lru[NR_LRU_LISTS];

	struct zone_reclaim_stat reclaim_stat;

	unsigned long pages_scanned;	/* since last reclaim */
	unsigned long flags;		/* zone flags, see below */

	/* Zone statistics */
	atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];

	/*
	 * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
	 * this zone's LRU. Maintained by the pageout code.
	 */
	unsigned int inactive_ratio;

	ZONE_PADDING(_pad2_)
	/* Rarely used or read-mostly fields */

	/*
	 * wait_table -- the array holding the hash table
	 * wait_table_hash_nr_entries -- the size of the hash table array
	 * wait_table_bits -- wait_table_size == (1 << wait_table_bits)
	 *
	 * The purpose of all these is to keep track of the people
	 * waiting for a page to become available and make them
	 * runnable again when possible. The trouble is that this
	 * consumes a lot of space, especially when so few things
	 * wait on pages at a given time. So instead of using
	 * per-page waitqueues, we use a waitqueue hash table.
	 *
	 * The bucket discipline is to sleep on the same queue when
	 * colliding and wake all in that wait queue when removing.
	 * When something wakes, it must check to be sure its page is
	 * truly available, a la thundering herd. The cost of a
	 * collision is great, but given the expected load of the
	 * table, they should be so rare as to be outweighed by the
	 * benefits from the saved space.
	 *
	 * __wait_on_page_locked() and unlock_page() in mm/filemap.c, are the
	 * primary users of these fields, and in mm/page_alloc.c
	 * free_area_init_core() performs the initialization of them.
	 */
	wait_queue_head_t *wait_table;
	unsigned long wait_table_hash_nr_entries;
	unsigned long wait_table_bits;

	/*
	 * Discontig memory support fields.
	 */
	struct pglist_data *zone_pgdat;
	/* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
	unsigned long zone_start_pfn;

	/*
	 * zone_start_pfn, spanned_pages and present_pages are all
	 * protected by span_seqlock. It is a seqlock because it has
	 * to be read outside of zone->lock, and it is done in the main
	 * allocator path. But, it is written quite infrequently.
	 *
	 * The lock is declared along with zone->lock because it is
	 * frequently read in proximity to zone->lock. It's good to
	 * give them a chance of being in the same cacheline.
	 */
	unsigned long spanned_pages;	/* total size, including holes */
	unsigned long present_pages;	/* amount of memory (excluding holes) */

	/*
	 * rarely used fields:
	 */
	const char *name;
} ____cacheline_internodealigned_in_smp;


struct zone_lru {
struct list_head list;
} lru[NR_LRU_LISTS];



/*
 * We do arithmetic on the LRU lists in various places in the code,
 * so it is important to keep the active lists LRU_ACTIVE higher in
 * the array than the corresponding inactive lists, and to keep
 * the *_FILE lists LRU_FILE higher than the corresponding _ANON lists.
 *
 * This has to be kept in sync with the statistics in zone_stat_item
 * above and the descriptions in vmstat_text in mm/vmstat.c
 */
#define LRU_BASE 0
#define LRU_ACTIVE 1
#define LRU_FILE 2

enum lru_list {
	LRU_INACTIVE_ANON = LRU_BASE,
	LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
	LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
	LRU_UNEVICTABLE,
	NR_LRU_LISTS
};


typedef struct pglist_data {
	struct zone node_zones[MAX_NR_ZONES];
	struct zonelist node_zonelists[MAX_ZONELISTS];
	int nr_zones;
#ifdef CONFIG_FLAT_NODE_MEM_MAP	/* means !SPARSEMEM */
	struct page *node_mem_map;
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
	struct page_cgroup *node_page_cgroup;
#endif
#endif
#ifndef CONFIG_NO_BOOTMEM
	struct bootmem_data *bdata;
#endif
#ifdef CONFIG_MEMORY_HOTPLUG
	/*
	 * Must be held any time you expect node_start_pfn, node_present_pages
	 * or node_spanned_pages stay constant.  Holding this will also
	 * guarantee that any pfn_valid() stays that way.
	 *
	 * Nests above zone->lock and zone->size_seqlock.
	 */
	spinlock_t node_size_lock;
#endif
	unsigned long node_start_pfn;
	unsigned long node_present_pages; /* total number of physical pages */
	unsigned long node_spanned_pages; /* total size of physical page
					     range, including holes */
	int node_id;
	wait_queue_head_t kswapd_wait;
	struct task_struct *kswapd;
	int kswapd_max_order;
	enum zone_type classzone_idx;
} pg_data_t;

In pglist_data, the member node_zones[] is an array holding all the zones that belong to this node.

Another member is node_zonelists[], an array of lists; each list links together the zones of all nodes, and the zones are ordered differently in each list. When the kernel needs to allocate pages and the current node does not have enough, it falls back through the zones in the order given by one of these lists. Since the kernel needs different fallback orders in different situations, several lists are required, and together they make up this array.

Each pglist_data structure corresponds to one node, so above the zone level there is a node level. The page-management hierarchy is therefore:

node: memory is divided into nodes according to UMA/NUMA (on ARM systems that are not SMP there is normally just one node).

zone: within each node, memory is further divided into zones.

page: within each zone, individual pages are managed and placed on the various lists.

For example, if you expand the for_each_zone macro you will find that it first walks every node and then scans the zones within each node.
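A hedged sketch of that: walking every zone of every node and printing one line per zone, roughly what /proc/zoneinfo does (the function is mine, for illustration only).

#include <linux/kernel.h>
#include <linux/mmzone.h>

/* Sketch: for_each_zone() expands into a walk over every node and then
 * every zone inside that node - exactly the node->zone->page hierarchy
 * described above. */
static void sketch_dump_zones(void)
{
	struct zone *zone;

	for_each_zone(zone)
		printk(KERN_INFO "node %d zone %-8s spans %lu pages\n",
		       zone_to_nid(zone), zone->name, zone->spanned_pages);
}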

Everything above describes physical memory: page, zone and node are all management structures for physical space. The structures below describe virtual space.

Seen from the physical side, physical space is the "supply": it really exists, and its purpose is to provide memory to the OS.

Seen from the virtual side, virtual space is the "demand": it is virtual, and its purpose is to issue requests.

***** When virtual space makes a demand that physical space cannot satisfy, swap-out takes place.

The vm_area_struct structure describes one region of a process's memory space.


/*
 * This struct defines a memory VMM memory area. There is one of these
 * per VM-area/task. A VM area is any part of the process virtual memory
 * space that has a special rule for the page-fault handlers (ie a shared
 * library, the executable area etc).
 */
struct vm_area_struct {
	struct mm_struct *vm_mm;	/* The address space we belong to. */
	unsigned long vm_start;		/* Our start address within vm_mm. */
	unsigned long vm_end;		/* The first byte after our end address
					   within vm_mm. */

	/* linked list of VM areas per task, sorted by address */
	struct vm_area_struct *vm_next, *vm_prev;

	pgprot_t vm_page_prot;		/* Access permissions of this VMA. */
	unsigned long vm_flags;		/* Flags, see mm.h. */

	struct rb_node vm_rb;

	/*
	 * For areas with an address space and backing store,
	 * linkage into the address_space->i_mmap prio tree, or
	 * linkage to the list of like vmas hanging off its node, or
	 * linkage of vma in the address_space->i_mmap_nonlinear list.
	 */
	union {
		struct {
			struct list_head list;
			void *parent;	/* aligns with prio_tree_node parent */
			struct vm_area_struct *head;
		} vm_set;

		struct raw_prio_tree_node prio_tree_node;
	} shared;

	/*
	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
	 * list, after a COW of one of the file pages. A MAP_SHARED vma
	 * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
	 * or brk vma (with NULL file) can only be in an anon_vma list.
	 */
	struct list_head anon_vma_chain; /* Serialized by mmap_sem &
					  * page_table_lock */
	struct anon_vma *anon_vma;	/* Serialized by page_table_lock */

	/* Function pointers to deal with this struct. */
	const struct vm_operations_struct *vm_ops;

	/* Information about our backing store: */
	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
					   units, *not* PAGE_CACHE_SIZE */
	struct file *vm_file;		/* File we map to (can be NULL). */
	void *vm_private_data;		/* was vm_pte (shared mem) */
	unsigned long vm_truncate_count;/* truncate_count or restart_addr */

#ifndef CONFIG_MMU
	struct vm_region *vm_region;	/* NOMMU mapping region */
#endif
#ifdef CONFIG_NUMA
	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
#endif
};


pgprot_t vm_page_prot;  /* Access permissions of this VMA. */
unsigned long vm_flags;  /* Flags, see mm.h. */

These two members express the region's access permissions. They should look familiar by now: vm_page_prot is the very same pgprot_t type we met earlier in the page-table discussion.
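As a quick hedged illustration of how these two fields get used (the real work happens in the fault path further below), here is a small helper of my own that asks whether an address in the current process is writable:

#include <linux/mm.h>
#include <linux/sched.h>

/* Sketch: locate the VMA containing addr and test its vm_flags. */
static int sketch_addr_is_writable(unsigned long addr)
{
	struct mm_struct *mm = current->mm;
	struct vm_area_struct *vma;
	int writable = 0;

	down_read(&mm->mmap_sem);
	vma = find_vma(mm, addr);		/* first VMA with vm_end > addr */
	if (vma && vma->vm_start <= addr)	/* addr really lies inside it   */
		writable = !!(vma->vm_flags & VM_WRITE);
	up_read(&mm->mmap_sem);

	return writable;
}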

All of a process's vm_area_struct structures are linked into a per-task list through


/* linked list of VM areas per task, sorted by address */

struct vm_area_struct *vm_next, *vm_prev;


	/*
	 * For areas with an address space and backing store,
	 * linkage into the address_space->i_mmap prio tree, or
	 * linkage to the list of like vmas hanging off its node, or
	 * linkage of vma in the address_space->i_mmap_nonlinear list.
	 */
	union {
		struct {
			struct list_head list;
			void *parent;	/* aligns with prio_tree_node parent */
			struct vm_area_struct *head;
		} vm_set;

		struct raw_prio_tree_node prio_tree_node;
	} shared;


There are two situations in which virtual space interacts with disk/flash storage.

1. When memory allocation fails because there is not enough memory, swapping occurs.

2. When Linux services the mmap system call, it maps a file on disk into user space, so the file's contents can be accessed directly, just like memory.
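A hedged user-space illustration of case 2 (the file name is just a placeholder): no read() is ever issued; the kernel faults the pages in from the page cache when the mapped memory is first touched.

#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/etc/hostname", O_RDONLY);	/* placeholder file */
	struct stat st;
	char *buf;

	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;

	/* Map the file; its pages are brought in on demand by page faults. */
	buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (buf == MAP_FAILED)
		return 1;

	fwrite(buf, 1, st.st_size, stdout);	/* reading the file through memory */

	munmap(buf, st.st_size);
	close(fd);
	return 0;
}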

To support case 1, older kernels kept fields such as mapping, vm_next_share, vm_pprev_share and vm_file in vm_area_struct, but most of those are no longer found in 2.6.39; what remains is only the following (the page structure also carries swap-related information):


	/* Information about our backing store: */
	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
					   units, *not* PAGE_CACHE_SIZE */
	struct file *vm_file;		/* File we map to (can be NULL). */
	void *vm_private_data;		/* was vm_pte (shared mem) */
	unsigned long vm_truncate_count;/* truncate_count or restart_addr */


/* Function pointers to deal with this struct. */
const struct vm_operations_struct *vm_ops;

These are the fields through which such operations are carried out.

vm_area_struct contains a pointer to an mm_struct, and the comment says the VMA "belongs to" it; mm_struct is therefore the layer above vm_area_struct.
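Since mm_struct sits at the top of this hierarchy, here is a hedged sketch of my own of walking all VMAs of the current task through mm->mmap and vm_next, roughly what /proc/<pid>/maps prints:

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/sched.h>

/* Sketch: walk every VMA of the current task, mm_struct -> vm_area_struct. */
static void sketch_dump_vmas(void)
{
	struct mm_struct *mm = current->mm;
	struct vm_area_struct *vma;

	if (!mm)		/* kernel threads have no user address space */
		return;

	down_read(&mm->mmap_sem);
	for (vma = mm->mmap; vma; vma = vma->vm_next)
		printk(KERN_INFO "%08lx-%08lx %c%c%c\n",
		       vma->vm_start, vma->vm_end,
		       (vma->vm_flags & VM_READ)  ? 'r' : '-',
		       (vma->vm_flags & VM_WRITE) ? 'w' : '-',
		       (vma->vm_flags & VM_EXEC)  ? 'x' : '-');
	up_read(&mm->mmap_sem);
}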


struct mm_struct {
	struct vm_area_struct *mmap;		/* list of VMAs */
	struct rb_root mm_rb;
	struct vm_area_struct *mmap_cache;	/* last find_vma result */
#ifdef CONFIG_MMU
	unsigned long (*get_unmapped_area) (struct file *filp,
				unsigned long addr, unsigned long len,
				unsigned long pgoff, unsigned long flags);
	void (*unmap_area) (struct mm_struct *mm, unsigned long addr);
#endif
	unsigned long mmap_base;		/* base of mmap area */
	unsigned long task_size;		/* size of task vm space */
	unsigned long cached_hole_size;		/* if non-zero, the largest hole below free_area_cache */
	unsigned long free_area_cache;		/* first hole of size cached_hole_size or larger */
	pgd_t *pgd;
	atomic_t mm_users;			/* How many users with user space? */
	atomic_t mm_count;			/* How many references to "struct mm_struct" (users count as 1) */
	int map_count;				/* number of VMAs */

	spinlock_t page_table_lock;		/* Protects page tables and some counters */
	struct rw_semaphore mmap_sem;

	struct list_head mmlist;		/* List of maybe swapped mm's. These are globally strung
						 * together off init_mm.mmlist, and are protected
						 * by mmlist_lock
						 */

	unsigned long hiwater_rss;	/* High-watermark of RSS usage */
	unsigned long hiwater_vm;	/* High-water virtual memory usage */

	unsigned long total_vm, locked_vm, shared_vm, exec_vm;
	unsigned long stack_vm, reserved_vm, def_flags, nr_ptes;
	unsigned long start_code, end_code, start_data, end_data;
	unsigned long start_brk, brk, start_stack;
	unsigned long arg_start, arg_end, env_start, env_end;

	unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */

	/*
	 * Special counters, in some configurations protected by the
	 * page_table_lock, in other configurations by being atomic.
	 */
	struct mm_rss_stat rss_stat;

	struct linux_binfmt *binfmt;

	cpumask_t cpu_vm_mask;

	/* Architecture-specific MM context */
	mm_context_t context;

	/* Swap token stuff */
	/*
	 * Last value of global fault stamp as seen by this process.
	 * In other words, this value gives an indication of how long
	 * it has been since this task got the token.
	 * Look at mm/thrash.c
	 */
	unsigned int faultstamp;
	unsigned int token_priority;
	unsigned int last_interval;

	/* How many tasks sharing this mm are OOM_DISABLE */
	atomic_t oom_disable_count;

	unsigned long flags;		/* Must use atomic bitops to access the bits */

	struct core_state *core_state;	/* coredumping support */
#ifdef CONFIG_AIO
	spinlock_t ioctx_lock;
	struct hlist_head ioctx_list;
#endif
#ifdef CONFIG_MM_OWNER
	/*
	 * "owner" points to a task that is regarded as the canonical
	 * user/owner of this mm. All of the following must be true in
	 * order for it to be changed:
	 *
	 * current == mm->owner
	 * current->mm != mm
	 * new_owner->mm == mm
	 * new_owner->alloc_lock is held
	 */
	struct task_struct __rcu *owner;
#endif

#ifdef CONFIG_PROC_FS
	/* store ref to file /proc/<pid>/exe symlink points to */
	struct file *exe_file;
	unsigned long num_exe_file_vmas;
#endif
#ifdef CONFIG_MMU_NOTIFIER
	struct mmu_notifier_mm *mmu_notifier_mm;
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	pgtable_t pmd_huge_pte;		/* protected by page_table_lock */
#endif
};

Now for how a page fault is actually handled on ARM. The path starts at the data abort exception entry:


__dabt_usr:
	usr_entry
	kuser_cmpxchg_check

	@
	@ Call the processor-specific abort handler:
	@
	@  r2 - aborted context pc
	@  r3 - aborted context cpsr
	@
	@ The abort handler must return the aborted address in r0, and
	@ the fault status register in r1.	@ these become the arguments later passed to do_DataAbort: r0 and r1
	@
#ifdef MULTI_DABORT
	ldr	r4, .LCprocfns
	mov	lr, pc
	ldr	pc, [r4, #PROCESSOR_DABT_FUNC]
#else
	bl	CPU_DABORT_HANDLER
#endif

	@
	@ IRQs on, then call the main handler
	@
	enable_irq
	mov	r2, sp				@ the third argument, r2
	adr	lr, ret_from_exception
	b	do_DataAbort			@ call do_DataAbort
ENDPROC(__dabt_usr)


do_DataAbort uses the parameters passed in to call inf->fn, which in this case is do_page_fault.


asmlinkage void __exception
do_DataAbort(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
{
	const struct fsr_info *inf = fsr_info + (fsr & 15) + ((fsr & (1 << 10)) >> 6);
	struct siginfo info;

	if (!inf->fn(addr, fsr, regs))
		return;

	printk(KERN_ALERT "Unhandled fault: %s (0x%03x) at 0x%08lx\n",
		inf->name, fsr, addr);

	info.si_signo = inf->sig;
	info.si_errno = 0;
	info.si_code  = inf->code;
	info.si_addr  = (void __user *)addr;
	arm_notify_die("", regs, &info, fsr, 0);
}


static int __kprobes		/* the __kprobes marker belongs to the kernel's probing/debugging machinery; worth a separate study later */
do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
{
	struct task_struct *tsk;
	struct mm_struct *mm;
	int fault, sig, code;

	if (notify_page_fault(regs, fsr))	/* paired with __kprobes above: a debugging hook */
		return 0;

	tsk = current;
	mm  = tsk->mm;		/* tsk is the descriptor of the faulting process, mm its memory descriptor */

	/*
	 * If we're in an interrupt or have no user
	 * context, we must not take the fault..
	 */
	if (in_atomic() || !mm)		/* in_atomic(): are we in an atomic context - an interrupt handler,
					 * a deferrable function, or a preemption-disabled critical section?
					 * !mm: is this a kernel thread (see the scheduling article on this blog)? */
		goto no_context;	/* such contexts have no user address space, so the fault needs special handling */

	/*
	 * As per x86, we may deadlock here.  However, since the kernel only
	 * validly references user space from well defined areas of the code,
	 * we can bug out early if this is from code which shouldn't.
	 */
	if (!down_read_trylock(&mm->mmap_sem)) {
		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
			goto no_context;
		down_read(&mm->mmap_sem);
	}					/* take mmap_sem for reading, nothing special */

	fault = __do_page_fault(mm, addr, fsr, tsk);	/* analysed below */
	up_read(&mm->mmap_sem);

	/*
	 * Handle the "normal" case first - VM_FAULT_MAJOR / VM_FAULT_MINOR
	 */
	if (likely(!(fault & (VM_FAULT_ERROR | VM_FAULT_BADMAP | VM_FAULT_BADACCESS))))
		return 0;	/* none of the error bits set means MAJOR/MINOR: the fault was resolved, so return;
				 * otherwise carry on below */

	/*
	 * If we are in kernel mode at this point, we
	 * have no context to handle this fault with.
	 */
	if (!user_mode(regs))
		goto no_context;	/* a kernel-space fault that __do_page_fault could not fix: go to no_context */

	if (fault & VM_FAULT_OOM) {
		/*
		 * We ran out of memory, or some other thing
		 * happened to us that made us unable to handle
		 * the page fault gracefully.
		 */
		printk("VM: killing process %s\n", tsk->comm);
		do_group_exit(SIGKILL);
		return 0;
	}
	if (fault & VM_FAULT_SIGBUS) {
		/*
		 * We had some memory, but were unable to
		 * successfully fix up this page fault.
		 */
		sig = SIGBUS;
		code = BUS_ADRERR;
	} else {
		/*
		 * Something tried to access memory that
		 * isn't in our memory map..
		 */
		sig = SIGSEGV;
		code = fault == VM_FAULT_BADACCESS ?
			SEGV_ACCERR : SEGV_MAPERR;
	}			/* the kernel comments above describe these cases clearly */

	__do_user_fault(tsk, addr, fsr, sig, code, regs);	/* user-mode error: simply send a signal and kill the process */
	return 0;

no_context:
	__do_kernel_fault(mm, addr, fsr, regs);		/* kernel error: emit an oops - the message that has ended many a debugging session */
	return 0;
}



static int
__do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
		struct task_struct *tsk)
{
	struct vm_area_struct *vma;
	int fault, mask;

	vma = find_vma(mm, addr);	/* find_vma() returns the first vm_area_struct in mm whose vm_end > addr;
					 * vm_start may be above or below addr, so addr may or may not actually
					 * lie inside the returned VMA, but find_vma tries to find one that contains it. */
	fault = VM_FAULT_BADMAP;
	if (!vma)			/* no VMA has vm_end > addr, so the faulting address lies beyond every VMA
					 * (above the 3 GB user-space limit in the figure): clearly an invalid address. */
		goto out;
	if (vma->vm_start > addr)	/* addr falls into the hole below this VMA; it may be a legitimate
					 * user-stack access, so go and check/expand the stack. */
		goto check_stack;

	/*
	 * Ok, we have a good vm_area for this
	 * memory access, so we can handle it.
	 */
	/* None of the above, so the VMA really contains addr. Typically this is a malloc()'d
	 * region: user-space malloc only creates the vm_area_struct and returns a virtual
	 * address, and the physical page is allocated here, on first touch. It can also be
	 * something else, such as a write to a read-only mapping. */
good_area:
	if (fsr & (1 << 11))	/* write? - fsr is the fault status register handed over in r1 by the
				 * assembly entry code above; this bit says whether the access was a write,
				 * and the result is recorded in mask. */
		mask = VM_WRITE;
	else
		mask = VM_READ|VM_EXEC|VM_WRITE;

	fault = VM_FAULT_BADACCESS;
	if (!(vma->vm_flags & mask))	/* e.g. a write while VM_WRITE is not set in vm_flags:
					 * a genuine permission problem, so set fault and bail out. */
		goto out;

	/*
	 * If for any reason at all we couldn't handle
	 * the fault, make sure we exit gracefully rather
	 * than endlessly redo the fault.
	 */
survive:
	/* None of the above applied: allocate a new page frame for the process. On success this
	 * returns VM_FAULT_MAJOR (handle_mm_fault had to sleep while getting a frame) or
	 * VM_FAULT_MINOR (no sleep was needed). */
	fault = handle_mm_fault(mm, vma, addr & PAGE_MASK, fsr & (1 << 11));
	if (unlikely(fault & VM_FAULT_ERROR)) {
		if (fault & VM_FAULT_OOM)	/* OOM: not enough memory - go sleep a while and retry */
			goto out_of_memory;
		else if (fault & VM_FAULT_SIGBUS)
			return fault;
		BUG();
	}
	if (fault & VM_FAULT_MAJOR)
		tsk->maj_flt++;
	else
		tsk->min_flt++;
	return fault;

out_of_memory:
	if (!is_global_init(tsk))
		goto out;

	/*
	 * If we are out of memory for pid1, sleep for a while and retry
	 */
	up_read(&mm->mmap_sem);
	yield();
	down_read(&mm->mmap_sem);
	goto survive;

check_stack:
	if (vma->vm_flags & VM_GROWSDOWN && !expand_stack(vma, addr))	/* check the stack and expand the
					 * existing stack VMA to cover addr, then go to good_area to get a real page. */
		goto good_area;
out:
	return fault;
}


/* Fields commonly accessed by the page reclaim scanner */
spinlock_t		lru_lock;
struct zone_lru {
struct list_head list;
} lru[NR_LRU_LISTS];


These lists link together the pages of the zone according to their state.


enum lru_list {
LRU_INACTIVE_ANON = LRU_BASE,
LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
LRU_UNEVICTABLE,
NR_LRU_LISTS
};

LRU_INACTIVE_ANON and LRU_ACTIVE_ANON hold anonymous pages, the swappable pages mentioned above. LRU_INACTIVE_FILE and LRU_ACTIVE_FILE hold file-backed pages that can be synced back to disk. LRU_UNEVICTABLE presumably refers to pages that cannot be reclaimed.

Some notes on the figure above:

First, the two boxes represent two lists, one active and one inactive. Besides walking the zone's lists, you can also tell a page's state from its struct page: when the flags field has PG_active set, the page is active; when PG_active is clear, it is inactive.
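A hedged sketch of reading that state from a struct page with the standard page-flag helpers (the function is mine, purely for illustration):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/swap.h>		/* mark_page_accessed() */

/* Sketch: inspect the LRU-related state bits of a page. */
static void sketch_touch_and_inspect(struct page *page)
{
	mark_page_accessed(page);	/* sets PG_referenced, may promote the page to active */

	if (PageActive(page))
		printk(KERN_INFO "page is on an active LRU list\n");
	else
		printk(KERN_INFO "page is on an inactive LRU list\n");

	if (PageReferenced(page))
		printk(KERN_INFO "page was referenced recently (Ref1)\n");
}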

Each box in turn has two parts, Ref0 and Ref1, representing whether the page has just been accessed. When a process touches a page, mark_page_accessed() is called; it sets PG_referenced in the page's flags (flag |= PG_referenced), marking the page as recently accessed. The kswapd scanner, on the other hand, walks the pages and clears the bit (flag &= ~PG_referenced), marking pages that have not been accessed for a while. If a page stays in Ref0 for too long, shrink_active_list() moves it onto the inactive list.

Next, the reclaim flow:



The figure above shows the two ways memory gets shrunk: direct page reclaim, triggered when alloc_pages cannot find enough memory, and the swap daemon (kswapd).

They shrink with different intensity: direct page reclaim is the emergency path and must get its pages back, whereas the swap daemon reclaims opportunistically to keep more memory free. try_to_free_pages therefore passes a priority value into shrink_zones; the priority represents the reclaim intensity and is decremented step by step, making reclaim ever more aggressive, until enough space has been freed for alloc_pages. If priority reaches 0 and there is still not enough space, out_of_memory() is called to kill a process and free its memory.

Inside shrink_page_list, different settings are applied depending on the kind of page. For example:

When a page from LRU_INACTIVE_ANON is reclaimed, the present and dirty bits of every PTE mapping it are cleared, but the other bits are not: they record where in the swap area the page now lives (at this point the PTE no longer holds mapping information but the page's location on disk; see the swapped-out page identifier on page 710 of Understanding the Linux Kernel).

When a page from LRU_INACTIVE_FILE is reclaimed, the present bit of every PTE mapping it is cleared and the dirty bit is set. Since these pages already have a backing file on disk, there is no need to record where they went; Linux can find them again.

This is where the three page-reading functions, and the checks that go with them, mentioned in the page-fault discussion above come from.



As mentioned above, all PTEs mapping the page have to be updated; since more than one PTE may map the same page, how do we find all of them starting from the struct page? That mechanism is called reverse mapping; see "reverse mapping" under page frame reclaiming, page 674 of Understanding the Linux Kernel.