
Containers In Memory: How Big Is Big?


This article appeared in C/C++ Users Journal, 19(1), January 2001.

This column addresses two topics: an update on what's going on in the C++ standards process, and a technical question. For information about the former, see the accompanying "Standards Update" sidebar for breaking news about the 1998 C++ standard's first official update, whose ink should be drying as you read this, and how it affects you.

The technical question is this: How much memory do the various standard containers use to store the same number of objects of the same type T? To answer this question, we have to consider two major items:


- the internal data structures used by containers like vector, deque, list, set/multiset, and map/multimap; and


- the way dynamic memory allocation works.

Let's begin with a brief recap of dynamic memory allocation and then work our way back up to what it means for the standard library.

Memory Managers and Their Strategies: A Brief Survey

To understand the total memory cost of using various containers, it's important to understand the basics of how the underlying dynamic memory allocation works - after all, the container has to get its memory from some memory manager somewhere, and
that manager in turn has to figure out how to parcel out available memory by applying some memory management strategy.

Here, in brief, are two popular memory management strategies. Further details are beyond the scope of this article; consult your favorite operating systems text for more information:


General-purpose allocation can provide any size of memory block that a caller might request (the request size, or block size). General-purpose allocation is very flexible, but has
several drawbacks, two of which are: a) performance, because it has to do more work; and b) fragmentation, because as blocks are allocated and freed we can end up with lots of little noncontiguous areas of unallocated memory.


Fixed-size allocation always returns a block of the same fixed size. This is obviously less flexible than general-purpose allocation, but it can be done much faster and doesn't result in the same kind
of fragmentation.

In practice, you'll often see combinations of the above. For example, perhaps your memory manager uses a general-purpose allocation scheme for all requests over some size S, and as an optimization provides a fixed-size allocation scheme for all requests
up to size S. It's usually unwieldy to have a separate arena for requests of size 1, another for requests of size 2, and so on; what normally happens is that the manager has a separate arena for requests of multiples of a certain size, say 16 bytes. If you
request 16 bytes, great, you only use 16 bytes; if you request 17 bytes, the request is allocated from the 32-byte arena, and 15 bytes are wasted. This is a source of possible overhead, but more about that in a moment.
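To make that rounding concrete, here is a minimal sketch of the arena-size computation such a manager might perform; the function name is mine, and I assume the granularity is a power of two:

#include <cstddef>

// Round a request of n bytes up to the next multiple of the arena
// granularity m (assumed here to be a power of two, such as 16).
std::size_t arena_block_size(std::size_t n, std::size_t m = 16)
{
    return (n + m - 1) & ~(m - 1);
}

// arena_block_size(16) == 16 - a perfect fit, nothing wasted
// arena_block_size(17) == 32 - served from the 32-byte arena, 15 bytes wasted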

The obvious next question is, Who selects the memory management strategy? There are several possible layers of memory manager involved, each of which may override the previous (lower-level) one:


The operating system kernel provides the most basic memory allocation services. This underlying allocation strategy, and its characteristics, can vary from one operating system platform to another,
and this level is the most likely to be affected by hardware considerations.


The compiler's default runtime library builds its allocation services, such as C++'s operator new and C's malloc, upon the native allocation services. The compiler's services might just be a thin wrapper
around the native ones and inherit their characteristics, or the compiler's services might override the native strategies by buying larger chunks from the native services and then parceling those out according to their own methods.


The standard containers and allocators in turn use the compiler's services, and possibly further override them to implement their own strategies and optimizations.


Finally, user-defined containers and/or user-defined allocators can further reuse any of the lower-level services (for example, they may want to directly access native services if portability
doesn't matter) and do pretty much whatever they please.

These levels are summarized in Figure 1.



Thus memory allocators come in various flavors, and can or will vary from operating system to operating system, from compiler to compiler on the same operating system, from container to container - and even from object to object: say, in the case of a vector<int>
object which uses the strategy implemented by allocator<int>, and a vector<int, MyAllocator> object which could express-mail memory blocks from Taiwan unless it's a weekday night and the Mets are playing, or implement whatever other strategy you like.
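As a concrete (if much tamer) illustration, here is a minimal sketch of a user-defined allocator that simply forwards to malloc and free. MyAllocator is of course a hypothetical name, and I'm using the slimmer allocator interface of modern C++ (C++11 and later); a conforming C++98 allocator needs considerably more boilerplate:

#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

template<typename T>
struct MyAllocator
{
    using value_type = T;

    MyAllocator() = default;
    template<typename U> MyAllocator(const MyAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        // Any strategy you like could go here; we just forward to malloc.
        if (void* p = std::malloc(n * sizeof(T)))
            return static_cast<T*>(p);
        throw std::bad_alloc();
    }

    void deallocate(T* p, std::size_t) { std::free(p); }
};

template<typename T, typename U>
bool operator==(const MyAllocator<T>&, const MyAllocator<U>&) { return true; }
template<typename T, typename U>
bool operator!=(const MyAllocator<T>&, const MyAllocator<U>&) { return false; }

int main()
{
    std::vector<int, MyAllocator<int>> v = {1, 2, 3}; // uses our strategy
}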

"I'll Take 'Operator New' For 200 Bytes, Alex"

When you ask for n bytes of memory using new or malloc, you actually use up at least n bytes of memory because typically the memory manager must add some overhead to your request. Two common considerations that affect this overhead are:

1. Housekeeping overhead.

In a general-purpose (i.e., not fixed-size) allocation scheme, the memory manager will have to somehow remember how big each block is so that it later knows how much memory to release when you call delete or free. Typically the manager remembers the block
size by storing that value at the beginning of the actual block it allocates, and then giving you a pointer to "your" memory that's offset past the housekeeping information. (See Figure 2.) Of course, this means it has to allocate extra space for that value,
which could be a number as big as the largest possible valid allocation and so is typically the same size as a pointer. When freeing the block, the memory manager will just take the pointer you give it, subtract the number of housekeeping bytes and read the
size, then perform the deallocation.
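A minimal sketch of this size-prefix technique might look like the following; the function names are hypothetical, I use C++11's alignof for brevity, and a real memory manager would be considerably more careful:

#include <cstddef>
#include <cstdlib>
#include <new>

// Reserve a header in front of each block to remember the request size,
// padded to the strictest alignment so the caller's bytes stay aligned.
const std::size_t kHeader = alignof(std::max_align_t);

void* toy_alloc(std::size_t n)
{
    void* block = std::malloc(kHeader + n);
    if (block == nullptr) throw std::bad_alloc();
    *static_cast<std::size_t*>(block) = n;      // housekeeping: the block size
    return static_cast<char*>(block) + kHeader; // pointer to "your" memory
}

void toy_free(void* p)
{
    if (p == nullptr) return;
    char* block = static_cast<char*>(p) - kHeader; // step back to the header
    // std::size_t n = *reinterpret_cast<std::size_t*>(block); // size, if needed
    std::free(block);
}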



Of course, fixed-size allocation schemes (i.e., ones that return blocks of a given known size) don't need to store such overhead because they always know how big the block will be.

2. Chunk size overhead.

Even when you don't need to store extra information, a memory manager will often reserve more bytes than you asked for because memory is often allocated in certain-sized chunks.

For one thing, some platforms require certain types of data to appear on certain byte boundaries (e.g., some require pointers to be stored on 4-byte boundaries) and either break or perform more slowly if they're not. This is called alignment, and
it calls for extra padding within, and possibly at the end of, the object's data. Even plain old built-in C-style arrays are affected by this need for alignment because it contributes to sizeof(struct). See Figure 3, where I distinguish between internal padding
bytes and at-the-end padding bytes, although both contribute to sizeof(struct).



For example:

// Example 1: Assume sizeof(long) == 4 and longs have a 4-byte
// alignment requirement.
struct X1
{
    char c1;  // at offset 0, 1 byte
              // bytes 1-3: 3 padding bytes
    long l;   // bytes 4-7: 4 bytes, aligned on 4-byte boundary
    char c2;  // byte 8: 1 byte
              // bytes 9-11: 3 padding bytes (see narrative)
};            // sizeof(X1) == 12

In Figure 3's terms, n == 1 + 3 + 4 + 1 == 9, and m == sizeof(X1) == 12.[1] Note that all the padding contributes to sizeof(X1). The
at-the-end padding bytes may seem odd, but are needed so that when you build an array of X1's one after the other in memory, the long data is always 4-byte aligned. This at-the-end padding is the padding that's the most noticeable, and the most often surprising
for folks examining object data layout for the first time. It can be particularly surprising in this rearranged struct:

struct X2
{
    long l;   // bytes 0-3
    char c1;  // byte 4
    char c2;  // byte 5
              // bytes 6-7: 2 padding bytes
};            // sizeof(X2) == 8

Now the data members really are all contiguous in memory (n == 6),[2] yet there's still extra space at the end that counts toward m ==
sizeof(X2) == 8 and that padding is most noticeable when you build an array of X2's. Bytes 6-7 are the padding highlighted in Figure 3.
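You can verify these numbers on your own compiler with a couple of lines like the following; the values 12, 8, and 4 assume the 4-byte longs described above, and a platform with, say, 8-byte longs will print something different:

#include <cstddef>   // for offsetof
#include <iostream>

struct X1 { char c1; long l; char c2; };
struct X2 { long l; char c1; char c2; };

int main()
{
    std::cout << "sizeof(X1) == " << sizeof(X1) << '\n'   // 12 with 4-byte longs
              << "sizeof(X2) == " << sizeof(X2) << '\n'   // 8 with 4-byte longs
              << "X1::l is at offset " << offsetof(X1, l) << '\n'; // 4
}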

Incidentally, this is why when writing the standard it's surprisingly tricky to wordsmith the requirement that "vectors must be contiguous" in the same sense as arrays - in Figure 3, the memory is considered contiguous even though there are "gaps" of dead
space, so what is "contiguous," really? Essentially, the individual sizeof(struct) chunks of memory are contiguous, and that definition works because sizeof(struct) already includes padding overhead. See also the "Standards Update" sidebar accompanying this
article for more about contiguous vectors.

The C++ standard guarantees that all memory allocated via operator new or malloc will be suitably aligned for any possible kind of object you might want to store in it, which means that operator new and malloc have to respect the strictest possible type
alignment requirement of the native platform.
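Incidentally, modern C++ (C++11 and later, which postdates this article) lets you query that strictest alignment directly; a quick check, for the curious:

#include <cstddef>
#include <iostream>

int main()
{
    // Every block returned by malloc or (non-overaligned) operator new
    // is aligned at least this strictly.
    std::cout << alignof(std::max_align_t) << '\n'; // commonly 8 or 16
}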

Alternatively, as described earlier, a fixed-size allocation scheme might maintain memory arenas for blocks of certain sizes that are multiples of some basic size m, and a request for n bytes will get rounded up to the next multiple of m.

Memory and the Standard Containers: The Basic Story

Now we can address the original question: How much memory do the various standard containers use to store the same number of elements of the same type T?

Each standard container uses a different underlying memory structure and therefore imposes different overhead per contained object:


A vector<T> internally stores a contiguous C-style array of T objects, and so has no extra per-element overhead at all (besides padding for alignment, of course; note that here "contiguous" has the same meaning
as it does for C-style arrays, as shown in Figure 3).


A deque<T> can be thought of as a vector<T> whose internal storage is broken up into chunks. A deque<T> stores chunks, or "pages," of objects; the actual page size isn't specified by the standard, and depends
mainly on how big T objects are and on the size choices made by your standard library implementer. This paging requires the deque to store one extra pointer of management information per page, which usually works out to a mere fraction of a bit per contained
object. For example, on a system with 8-bit bytes and 4-byte ints and pointers, a 4K page holds 1024 ints, so the page's single 32-bit pointer amortizes to 32/1024 == 0.03125 bits per int - just 1/32 of a bit. There's no other per-element overhead because deque<T> doesn't store any extra
pointers or other information for individual T objects. There is no requirement that a deque's pages be C-style arrays, but that's the usual implementation.


A list<T> is a doubly-linked list of nodes that hold T elements. This means that for each T element, list<T> also stores two pointers, which point to the previous and next nodes in the list. Every time we insert
a new T element, we also create two more pointers, so a list<T> requires at least two pointers' worth of overhead per element.


A set<T> (and, for that matter, a multiset<T>, map<Key,T>, or multimap<Key,T>) also stores nodes that hold T (or pair<const Key,T>) elements. The usual implementation of a set is as a tree with three extra
pointers per node. Often people see this and think, "Why three pointers? Isn't two enough, one for the left child and one for the right child?" The reason three are needed is that we also need an "up" pointer to the parent node; otherwise, determining the "next"
element starting from some arbitrary iterator can't be done efficiently enough. (Besides trees, other internal implementations of set are possible; for example, an alternating skip list can be used, which still requires at least three pointers per element
in the set.)[3]

Table 1 summarizes this basic overhead for each container.

Container        Typical housekeeping data overhead per contained object

vector           No overhead per T.

deque            Nearly no overhead per T - typically just a fraction of a bit.

list             Two pointers per T.

set, multiset    Three pointers per T.

map, multimap    Three pointers per pair<const Key, T>.

Table 1: Basic overhead per contained object for various containers

Memory and the Standard Containers: The Real World

Now we get to the interesting part: Don't be too quick to draw conclusions from Table 1. For example, judging from just the housekeeping data required for list and set, you might conclude that list requires
less overhead per contained object than set - after all, list only stores two extra pointers, whereas set stores three. The interesting thing is that this may not be true once you take into consideration the runtime memory allocation policies.

To dig a little deeper, consider Table 2, which shows the node layouts typically used internally by list, set/multiset, and map/multimap.

Container        Typical dynamic memory block used per contained object

vector           None; objects are not allocated individually. (See sidebar.)

deque            None; objects are allocated in pages, and nearly always
                 each page will store many objects.

list             struct LNode {
                     LNode* prev;
                     LNode* next;
                     T object;
                 };

set, multiset    struct SNode {
                     SNode* prev;
                     SNode* next;
                     SNode* parent;
                     T object;
                 }; // or equivalent

map, multimap    struct MNode {
                     MNode* prev;
                     MNode* next;
                     MNode* parent;
                     std::pair<const Key, T> data;
                 }; // or equivalent

Table 2: Dynamic memory blocks used per contained object for various containers
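If you want to see your own compiler's version of this arithmetic, you can mock up the node layouts and ask sizeof. The layouts here are, again, only typical, not required, and the names are mine; note too that sizeof already includes internal alignment padding, so the results will generally be a little larger than the raw "basic node data size" figures in Table 3 below:

#include <iostream>
#include <string>

// Hypothetical node layouts patterned after Table 2.
template<typename T> struct LNode { LNode* prev; LNode* next; T object; };
template<typename T> struct SNode { SNode* prev; SNode* next; SNode* parent; T object; };

int main()
{
    // With 4-byte pointers, LNode<char> reports 12 (9 bytes of data
    // padded for pointer alignment), SNode<char> reports 16, and so on.
    std::cout << "list<char> node:   " << sizeof(LNode<char>) << " bytes\n"
              << "set<char> node:    " << sizeof(SNode<char>) << " bytes\n"
              << "list<int> node:    " << sizeof(LNode<int>) << " bytes\n"
              << "set<int> node:     " << sizeof(SNode<int>) << " bytes\n"
              << "list<string> node: " << sizeof(LNode<std::string>) << " bytes\n"
              << "set<string> node:  " << sizeof(SNode<std::string>) << " bytes\n";
}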
Next, consider what happens in the real world under the following assumptions, which happen to be drawn from a popular platform:


Pointers and ints are 4 bytes long. (Typical for 32-bit platforms.)


sizeof(string) is 16. Note that this is just the size of the immediate string object and ignores any data buffers the string may itself allocate; the number and size of string's internal buffers will vary from
implementation to implementation, but doesn't affect the comparative results below. (This sizeof(string) is the actual value of one popular implementation.)


The default memory allocation strategy is to use fixed-size allocation where the block sizes are multiples of 16 bytes. (Typical for Microsoft Visual C++.)

Table 3 contains a sample analysis with these numbers. You can try this at home; just plug in the appropriate numbers for your platform to see how this kind of analysis applies to your own current environment. To see how to write a program that figures out
what the actual block overhead is for allocations of specific sizes on your platform, see Appendix 3 of Jon Bentley's classic Programming Pearls, 2nd edition.[4]
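In that spirit, here is one rough, admittedly platform-dependent way to probe it: allocate neighboring blocks of each size and look at their address spacing. The spacing only hints at the true block size, since an allocator needn't place successive blocks adjacently, so treat this as an experiment, not a guarantee:

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main()
{
    for (std::size_t n = 1; n <= 48; ++n)
    {
        // Allocate two blocks of the same size back to back; on many
        // allocators their address spacing reveals the real block size,
        // housekeeping and rounding included.
        char* a = static_cast<char*>(std::malloc(n));
        char* b = static_cast<char*>(std::malloc(n));
        std::uintptr_t pa = reinterpret_cast<std::uintptr_t>(a);
        std::uintptr_t pb = reinterpret_cast<std::uintptr_t>(b);
        std::printf("request %2zu -> spacing %zu bytes\n",
                    n, static_cast<std::size_t>(pb > pa ? pb - pa : pa - pb));
        std::free(a);
        std::free(b);
    }
}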

Container                        Basic node     Actual size of allocation block
                                 data size      for node, including internal node
                                                data alignment and block
                                                allocation overhead

list<char>                       9 bytes        16 bytes

set<char>, multiset<char>        13 bytes       16 bytes

list<int>                        12 bytes       16 bytes

set<int>, multiset<int>          16 bytes       16 bytes

list<string>                     24 bytes       32 bytes

set<string>, multiset<string>    28 bytes       32 bytes

Table 3: Same actual overhead per contained object
(implementation-dependent assumptions: sizeof(string) == 16,
4-byte pointers and ints, and 16-byte fixed-size allocation blocks)

Looking at Table 3, we immediately spy one interesting result: For many cases - that is, for about 75% of possible sizes of the contained type T - list and set/multiset actually incur the same memory overhead
in this particular environment. What's more, even list<char> and set<int> have the same actual overhead here, even though the latter stores both more object data and more housekeeping data in each node.

If memory footprint is an important consideration for your choice of data structure in specific situations, take a few minutes to do this kind of analysis and see what the difference really is in your own environment
- sometimes it's less than you might think!

Summary

Each kind of container chooses a different space/performance tradeoff. You can do things efficiently with vector and set that you can't do with list, such as O(log N) searching;[5] you
can do things efficiently with vector that you can't do with list or set, such as random element access; you can do things efficiently with list, less so with set, and more slowly still with vector, such as insertion in the middle; and so on. More flexibility
often requires more storage overhead inside the container, but after you account for data alignment and memory allocation strategies, the difference in overhead may be quite different from what you'd expect! For related discussion about data alignment and
space optimizations, see also Item 30 in Exceptional C++.[6]

Acknowledgment

Thanks to Pete Becker for the discussion that got me thinking about this topic.

Standards Update [SIDEBAR BOX]
Breaking news at press time: On Friday, October 27, 2000 at the conclusion of the Toronto meeting, the C++ standards committee approved two important milestones:

1. Approved the contents of the C++ Standard's first Technical Corrigendum (TC). The vote passed unanimously, and this material will become the first official update to the ISO/ANSI C++ Standard pending only a few more months of grinding through
the routine publication mechanics and paperwork that the standards bodies require.

One of the interesting changes in the TC is that the standard will now guarantee that vector storage is contiguous (except of course for the specialization vector<bool>), where "contiguous" means to be stored in the same way as a C-style array; see Figure
3. One reason that it's important that vector be stored contiguously is so that it can be used easily as a complete drop-in replacement for C-style arrays, even when calling legacy facilities designed to operate on plain arrays; for more details, see my July/August
1999 column in C++ Report.[7] If you're wondering why vector<bool> would get a seemingly surprising exception from this rule, see also my May 1999 column
in C++ Report for the scoop on that juicy little eccentricity.[8]

2. Approved initiation of work toward a Library Extensions Technical Report. The Library Working Group (LWG) and the full committee agree that it's time we start considering extensions to the standard library; of course, any such extensions won't
appear in an official standard for probably at least three years yet, if not more, but the point is that there are things that the community wants/needs and that we ought to start working into the standard library. Commonly requested items include things like
hash-based containers, regular expressions, smart pointers, and expression templates, among other facilities.

Between now and the next meeting (Copenhagen, April 29 - May 4, 2001) we will be drafting an official ISO request for a New Work Item, which essentially translates to "a request for ISO's blessing/authorization to do this work." I fully expect this request
to be drafted and approved by/at the Copenhagen meeting; and after a few more months' worth of bureaucratic machinery we should be officially in business by summer. Of course, some people have already been starting to work on such facilities in anticipation
of this approval; if you haven't checked out Boost yet, be sure to do so at www.boost.org.

Notes

1. Only a perverse implementation would add more than the minimum padding.

2. The compiler isn't allowed to do this rearrangement itself, though. The standard requires that all data members within the same public:, protected:, or private: section be laid out by the compiler in declaration order. If you intersperse
your data with access specifiers, though, the compiler is allowed to rearrange the access-specifier-delimited blocks of data to improve the layout, which is why some people like putting an access specifier in front of every data member.

3. Laurence Marrie. "Algorithm Alley: Alternating Skip Lists" (Dr. Dobb's Journal, 25(8), August 2000).

4. Jon Bentley. Programming Pearls, 2nd edition (Addison-Wesley, 2000).

5. If the vector's contents are sorted.

6. Herb Sutter. Exceptional C++ (Addison-Wesley, 2000).

7. Herb Sutter. "Standard Library News, Part 1: Vectors and Deques" (C++ Report, 11(7), July/August 1999).

8. Herb Sutter. "When Is a Container Not a Container?" (C++ Report, 11(5), May 1999).

Copyright © 2009 Herb Sutter
