Memory Allocation Strategies and Calculations
1. General-purpose allocation can provide any size of memory block that a caller might request (the request size, or block size). General-purpose allocation is very flexible, but has several drawbacks, two of which are: a) performance, because it has to do more work; and b) fragmentation, because as blocks are allocated and freed we can end up with lots of little noncontiguous areas of unallocated memory.
 
2. Fixed-size allocation always returns a block of the same fixed size. This is obviously less flexible than general-purpose allocation, but it can be done much faster and doesn't result in the same kind of fragmentation.
  
In practice, you'll often see combinations of the above. For example, perhaps your memory manager uses a general-purpose allocation scheme for all requests over some size S, and as an optimization provides a fixed-size allocation scheme for all requests up to size S. It's usually unwieldy to have a separate arena for requests of size 1, another for requests of size 2, and so on; what normally happens is that the manager has a separate arena for requests of multiples of a certain size, say 16 bytes. If you request 16 bytes, great, you only use 16 bytes; if you request 17 bytes, the request is allocated from the 32-byte arena, and 15 bytes are wasted. This is a source of possible overhead, but more about that in a moment.
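The 16/17-byte arithmetic above can be written down directly; a minimal sketch (the function names are made up for illustration, and the 16-byte arena size matches the example):

```cpp
#include <cstddef>

// Round a request up to the next multiple of the fixed arena size,
// and measure how many bytes the rounding wastes.
std::size_t rounded_size(std::size_t request, std::size_t arena = 16) {
    return (request + arena - 1) / arena * arena;
}

std::size_t wasted_bytes(std::size_t request, std::size_t arena = 16) {
    return rounded_size(request, arena) - request;
}
```

rounded_size(16) is 16 with no waste; rounded_size(17) is 32, wasting 15 bytes, as in the text.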

Who selects the memory management strategy? There are several possible layers of memory manager involved, each of which may override the previous (lower-level) one:

1. The operating system kernel provides the most basic memory allocation services. This underlying allocation strategy, and its characteristics, can vary from one operating system platform to another, and this level is the most likely to be affected by hardware considerations.

2. The compiler's default runtime library builds its allocation services, such as C++'s operator new and C's malloc, upon the native allocation services. The compiler's services might just be a thin wrapper around the native ones and inherit their characteristics, or the compiler's services might override the native strategies by buying larger chunks from the native services and then parceling those out according to their own methods.

3. The standard containers and allocators in turn use the compiler's services, and possibly further override them to implement their own strategies and optimizations.

4. Finally, user-defined containers and/or user-defined allocators can further reuse any of the lower-level services (for example, they may want to directly access native services if portability doesn't matter) and do pretty much whatever they please.
         Containment relationship (each layer builds on the one below):
              +------------------------------------------------+
              | user-defined containers and/or allocators      |
              +------------------------------------------------+
              | standard containers and allocators             |
              +------------------------------------------------+
              | operator new and malloc (runtime library)      |
              +------------------------------------------------+
              | operating system kernel                        |
              +------------------------------------------------+
When you ask for n bytes of memory using new or malloc, you actually use up at least n bytes of memory because typically the memory manager must add some overhead to your request. Two common considerations that affect this overhead are:

1. Housekeeping overhead.
In a general-purpose (i.e., not fixed-size) allocation scheme, the memory manager will have to somehow remember how big each block is so that it later knows how much memory to release when you call delete or free. Typically the manager remembers the block size by storing that value at the beginning of the actual block it allocates, and then giving you a pointer to "your" memory that's offset past the housekeeping information. (See Figure 2.) Of course, this means it has to allocate extra space for that value, which could be a number as big as the largest possible valid allocation and so is typically the same size as a pointer.
When freeing the block, the memory manager will just take the pointer you give it, subtract the number of housekeeping bytes and read the size, then perform the deallocation.
                     +---------------------+----------------+
                     | size                |    n bytes     |   (general-purpose allocation)
                     | (housekeeping info) | (your memory)  |
                     +---------------------+----------------+
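A minimal sketch of this header trick, built on plain malloc/free (my_alloc, my_free, and my_size are hypothetical names, not a real allocator's API):

```cpp
#include <cstdlib>
#include <cstddef>

// Store the request size just before the pointer handed to the caller.
void* my_alloc(std::size_t n) {
    std::size_t* block = static_cast<std::size_t*>(
        std::malloc(sizeof(std::size_t) + n));   // header + caller's n bytes
    if (!block) return nullptr;
    *block = n;                                  // remember the request size
    return block + 1;                            // pointer past the header
}

std::size_t my_size(void* p) {                   // read back the stored size
    return static_cast<std::size_t*>(p)[-1];
}

void my_free(void* p) {
    if (!p) return;
    std::size_t* block = static_cast<std::size_t*>(p) - 1; // step back to header
    std::free(block);                            // release the whole block
}
```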
                    
2. Chunk size overhead.
Even when you don't need to store extra information, a memory manager will often reserve more bytes than you asked for because memory is often allocated in certain-sized chunks.

For one thing, some platforms require certain types of data to appear on certain byte boundaries (e.g., some require pointers to be stored on 4-byte boundaries) and either break or perform more slowly if they're not. This is called alignment, and it calls for extra padding within, and possibly at the end of, the object's data. Even plain old built-in C-style arrays are affected by this need for alignment because it contributes to sizeof(struct). (Padding bytes must be inserted to round each member, and the object as a whole, up to its alignment.)

For example:

// Example 1: Assume sizeof(long) == 4 and longs have a 4-byte
//            alignment requirement. (Typical for 32-bit platforms;
//            other platforms differ.)
struct X1
{
  char c1; // at offset 0, 1 byte
           // bytes 1-3: 3 padding bytes
  long l;  // bytes 4-7: 4 bytes, aligned on 4-byte boundary
  char c2; // byte 8: 1 byte
           // bytes 9-11: 3 trailing padding bytes, so that in an
           // array of X1 the next element's l stays aligned
}; // sizeof(X1) == 12
Bytes through the last member: n == 1 + 3 + 4 + 1 == 9; with the 3 trailing padding bytes, m == sizeof(X1) == 12.

struct X2
{
  long l;  // bytes 0-3
  char c1; // byte 4
  char c2; // byte 5
           // bytes 6-7: 2 padding bytes
}; // sizeof(X2) == 8
 
Memory allocation in the containers
Each standard container uses a different underlying memory structure and therefore imposes different overhead per contained object:
1. vector<T> internally stores a contiguous C-style array of T objects, and so has no extra per-element overhead at all (besides padding for alignment, of course; note that here "contiguous" has the same meaning as it does for C-style arrays, as shown in Figure 3).
    
2. deque<T> can be thought of as a vector<T> whose internal storage is broken up into chunks. A deque<T> stores chunks, or "pages," of objects; the actual page size isn't specified by the standard, and depends mainly on how big T objects are and on the size choices made by your standard library implementer. This paging requires the deque to store one extra pointer of management information per page, which usually works out to a mere fraction of a bit per contained object; for example, on a system with 8-bit bytes and 4-byte ints and pointers, a deque<int> with a 4K page size incurs an overhead per int of 0.03125 bits - just 1/32 of a bit. There's no other per-element overhead because deque<T> doesn't store any extra pointers or other information for individual T objects. There is no requirement that a deque's pages be C-style arrays, but that's the usual implementation.
 
3. list<T> is a doubly-linked list of nodes that hold T elements. This means that for each T element, list<T> also stores two pointers, which point to the previous and next nodes in the list. Every time we insert a new T element, we also create two more pointers, so a list<T> requires at least two pointers' worth of overhead per element.
 
4. set<T> (and, for that matter, a multiset<T>, map<Key,T>, or multimap<Key,T>) also stores nodes that hold T (or pair<const Key,T>) elements. The usual implementation of a set is as a tree with three extra pointers per node. Often people see this and think, "why three pointers? isn't two enough, one for the left child and one for the right child?" The reason three are needed is that we also need an "up" pointer to the parent node, otherwise determining the "next" element starting from some arbitrary iterator can't be done efficiently enough. (Besides trees, other internal implementations of set are possible; for example, an alternating skip list can be used, which still requires at least three pointers per element in the set.)
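The deque figure from item 2 is just arithmetic; a sketch under the stated assumptions (8-bit bytes, 4-byte ints and pointers, 4K pages):

```cpp
// One page pointer, amortized over every int in a page.
double deque_bits_per_int() {
    const double ints_per_page = 4096.0 / 4.0;  // 1024 ints per 4K page
    const double pointer_bits  = 4.0 * 8.0;     // one 4-byte pointer = 32 bits
    return pointer_bits / ints_per_page;        // 32 / 1024 = 0.03125 bits
}
```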
 
     Container      Typical housekeeping data overhead per contained object
     ----------    ---------------------------------------------------------
      vector         No overhead per T.
      deque          Nearly no overhead per T — typically just a fraction of a bit.
      list           Two pointers per T.
      set, multiset  Three pointers per T.
      map, multimap  Three pointers per pair<const Key, T>.
 
 Node structures allocated per element:
     vector      None; objects are not allocated individually. (See sidebar.)
     deque       None; objects are allocated in pages, and nearly always each
                 page will store many objects.
      list               | set, multiset          | map, multimap
      struct LNode {     | struct SNode {         | struct MNode {
        LNode* prev;     |   SNode* prev;         |   MNode* prev;
        LNode* next;     |   SNode* next;         |   MNode* next;
        T object;        |   SNode* parent;       |   MNode* parent;
      };                 |   T object;            |   std::pair<const Key, T> data;
                         | }; // or equivalent    | }; // or equivalent
  Under the following assumptions:
 1. Pointers and ints are 4 bytes long. (Typical for 32-bit platforms.)
 2. sizeof(string) is 16. Note that this is just the size of the immediate string object
    and ignores any data buffers the string may itself allocate; the number and size of
    string's internal buffers will vary from implementation to implementation, but doesn't
    affect the comparative results below. (This sizeof(string) is the actual value of one
    popular implementation.)
 3. The default memory allocation strategy is to use fixed-size allocation where the block
    sizes are multiples of 16 bytes. (Typical for Microsoft Visual C++.)
   
        Container                       Basic node    Actual size of allocation block for node,
                                        data size     including internal node data alignment
                                                      and block allocation overhead
        ----------------------   ------------------   -----------------------------------------
        list<char>                     9 bytes         16 bytes        
        set<char>, multiset<char>      13 bytes        16 bytes        
        list<int>                      12 bytes        16 bytes        
        set<int>, multiset<int>        16 bytes        16 bytes        
        list<string>                   24 bytes        32 bytes
        set<string>, multiset<string>  28 bytes        32 bytes
       -----------------------------------------------------------------------------------------
       Table: same actual overhead per contained object
        (implementation-dependent assumptions: sizeof(string) == 16,
        4-byte pointers and ints, and 16-byte fixed-size allocation blocks)
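The table's numbers can be reproduced arithmetically from the node layouts above (helper names are made up; this uses the table's assumed 4-byte pointers and 16-byte blocks, not your actual platform's sizes):

```cpp
#include <cstddef>

const std::size_t kPtr = 4;   // assumed pointer size

// Node data size: list nodes carry two pointers, set/map nodes three,
// plus the payload (ignoring padding, as the table does).
std::size_t list_node_size(std::size_t payload) { return 2 * kPtr + payload; } // prev + next
std::size_t set_node_size(std::size_t payload)  { return 3 * kPtr + payload; } // + parent

// Round a node up to the next 16-byte fixed-size allocation block.
std::size_t block_size(std::size_t n) { return (n + 15) / 16 * 16; }
```

list_node_size(1) gives the 9-byte list<char> node, which lands in a 16-byte block; set_node_size(16) gives the 28-byte set<string> node, which lands in a 32-byte block.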
    
 Memory algorithms:
 first-fit:
  - The first block in the heap that is big enough is allocated
  - Each free block keeps a pointer to the next free block
  - These pointers may be adjusted as memory is allocated

    We can keep track of free space in a separate list called the free list

 Other policies for picking the free block: best-fit, nearest-fit, worst-fit, next-fit ......
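A first-fit search over such a free list might look like this (a sketch; a real allocator would also split oversized blocks and coalesce neighbors on free):

```cpp
#include <cstddef>

// A free block records its size and links to the next free block.
struct FreeBlock {
    std::size_t size;
    FreeBlock* next;
};

// Return the first free block big enough for n bytes, unlinking it
// from the list; nullptr if nothing fits.
FreeBlock* first_fit(FreeBlock*& head, std::size_t n) {
    for (FreeBlock** link = &head; *link; link = &(*link)->next) {
        if ((*link)->size >= n) {
            FreeBlock* found = *link;
            *link = found->next;   // adjust the list around the hole
            return found;
        }
    }
    return nullptr;
}
```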
 
 memory leak: If the last pointer to a dynamically allocated block goes out
              of scope before the block is freed, the memory can never be
              recycled, and the program will gradually drain away memory
              until it fails.
 dangling pointer: A pointer that still refers to a block of memory after
                   that block has been freed is known as a dangling pointer.
    |-----------|------------------------------------|------------------------------|
    |           | Manual Memory                      |   Automatic Memory           |
    |           | Management                         |    Management                |
    |-----------|------------------------------------|------------------------------|
    | Benefits  |   size (smaller)                   |  constrains complexity       |
    |           |   speed (faster)                   |                              |
    |           |   control (you decide when to free)|                              |
    |-----------|------------------------------------|------------------------------|        
    | Costs     |    complexity                      |larger total memory footprint |    
    |           |    memory leaks                    |“comparable” performance      |     
    |           |    dangling pointers               |                              |
    |-----------|------------------------------------|------------------------------|
  If you rewrite all of the functions in a program              
  as inline macros, you can increase the speed at which the  
  program executes. This is because the processor doesn’t have to  
  waste time jumping around to different memory locations. In addition,   
  because execution will not frequently jump to a nonlocal spot           
  in memory, the processor will be able to spend much of its time executing
  code in the cache.                                                      
 
  Likewise, you can make a program smaller by isolating every bit     
  of redundant code and placing it in its own function. While this will
  decrease the total number of machine instructions, this tactic will 
  make a program slower because not only does the processor spend     
  most of its time jumping around memory, but the processor’s cache  
  will also need to be constantly refreshed.                          
 
  Memory allocation mechanisms:
  1. Bitmapped Allocation
     A bitmap records which fixed-size allocation units are in use (one bit
     per unit); an auxiliary structure such as a BST records the size of
     each allocated run.
  2. Sequential Fit
     The sequential fit technique organizes memory into a linear linked
     list of free and reserved regions. When an allocation request occurs,
     the memory manager moves sequentially through the list until it finds
     a free block of memory that can service/fit the request (hence the
     name "sequential fit").
     The list must support allocation, freeing, and coalescing of adjacent
     free blocks.
     Candidate policies for picking the matching free block:
     [ best-fit
       nearest-fit
       worst-fit
       next-fit ......
      ]
  3. Segregated Lists
     A separate free list is maintained for each block size class, so an
     allocation request is served directly from the list for its size class.
   Types of Garbage Collection
          
   Reference counting collectors: identify garbage by maintaining a    
   running tally of the number of pointers that reference each block of
   allocated memory. When the number of references to a particular    
   block of memory reaches zero, the memory is viewed as garbage      
   and reclaimed. There are a number of types of reference counting  
   algorithms, each one implementing its own variation of the counting
   mechanism (i.e., simple reference counting, deferred reference    
   counting, 1-bit reference counting, etc.).    
   
     A reference counting collector follows these rules:
     - When an object is created, create a reference count (RC) field
     - Set the reference count to 1 upon creation, to account for
       the pointer which initially points to this object
     - When an object with pointers is created, increment the RC of
       all of the objects pointed to by this object by 1
     - When an object with pointers is destroyed (goes out of scope, etc.),
       decrement the RC of all objects pointed to by this object by 1
     - When a pointer is modified, decrement the RC of the old target
       by 1 and increment the RC of the new target by 1
     - When an object's RC reaches 0, reclaim the object as free space
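The rules above can be sketched as a minimal intrusive count (RefCounted, acquire, and release are made-up names; a simple, non-thread-safe sketch, not a full collector):

```cpp
// Each object carries its own reference count.
template <typename T>
struct RefCounted {
    T value;
    int rc = 1;   // rule: start at 1 for the pointer that created it
};

template <typename T>
void acquire(RefCounted<T>* p) { ++p->rc; }   // a new pointer now targets p

template <typename T>
void release(RefCounted<T>* p) {              // a pointer to p goes away
    if (--p->rc == 0) delete p;               // RC hit 0: reclaim the object
}
```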
    
   Tracing garbage collectors traverse the application run-time environment
   (i.e., registers, stack, heap, data section) in search of              
   pointers to memory in the heap. Think of tracing collectors as         
   pointer hunter-gatherers. If a pointer is found somewhere in the       
   run-time environment, the heap memory that is pointed to is            
   assumed to be “alive” and is not recycled. Otherwise, the allocated  
   memory is reclaimed. There are several subspecies of tracing garbage   
   collectors, including mark-sweep, mark-compact, and copying            
   garbage collectors.                                                    
  
suballocator: a special-purpose application component, implemented by the
              programmer and built on existing services provided by
              application libraries (like malloc() and free()).

Suballocators are user code that reserves a large region from the
system-provided heap and then manages that memory itself.
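A minimal suballocator sketch along those lines: one up-front malloc(), then bump-style carving of 16-byte-aligned blocks (Pool is a hypothetical name; real suballocators usually support freeing individual blocks too):

```cpp
#include <cstdlib>
#include <cstddef>

class Pool {
    char* base_;
    std::size_t capacity_;
    std::size_t used_ = 0;
public:
    explicit Pool(std::size_t capacity)
        : base_(static_cast<char*>(std::malloc(capacity))),
          capacity_(capacity) {}
    ~Pool() { std::free(base_); }       // one free() releases everything

    void* allocate(std::size_t n) {
        n = (n + 15) / 16 * 16;         // keep 16-byte granularity
        if (!base_ || used_ + n > capacity_) return nullptr;
        void* p = base_ + used_;        // carve the next slice
        used_ += n;
        return p;
    }
};
```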