Memory layout

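Structs

The C++ standard applies the same simple layout rule to structs as C does: non-static data members are laid out in the order in which they are declared, each aligned according to the implementation’s alignment rules.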

struct S {
    char a;     // memory location #1
    int b : 5;  // memory location #2
    int c : 11, // memory location #2 (continued)
          : 0,  // unnamed zero-width bit-field: forces a new memory location
        d : 8;  // memory location #3
    struct {
        int ee : 8; // memory location #4
    } e;
} obj; // The object 'obj' consists of 4 separate memory locations
  • Static members of a class take up no space in objects of the class; they are stored in the program’s data segment.
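As a small sketch of both points (declaration order and static members), the following compares two hypothetical structs. Point and PlainPoint are names invented for this illustration, and the checks rely only on offsetof and sizeof, so they are evaluated at compile time.

```cpp
#include <cstddef>  // offsetof

// Hypothetical struct with one static member.
struct Point {
    int x;
    int y;
    static int instances;  // stored once, outside every Point object
};
int Point::instances = 0;

// The same non-static members, without the static one.
struct PlainPoint {
    int x;
    int y;
};

// Members appear in declaration order: x comes before y.
static_assert(offsetof(Point, x) < offsetof(Point, y), "x precedes y");
// The static member does not contribute to the object size.
static_assert(sizeof(Point) == sizeof(PlainPoint), "static member adds no size");
```

If either assumption were wrong on a given compiler, the translation unit would simply fail to compile.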

Bit-field structs

bit-field-struct

Alignment

A class is meant to store data that is somehow connected.
For example, an array and the length of the array. These two should never go out of sync, and keeping them in sync is what the class is responsible for.
If all you want to do is store some variables, a struct is the better choice. A struct is technically almost identical to a class. Their differences are mostly conventional.
Since the members in these examples (x and a, or a, b and c) are unrelated, let’s use a struct.

struct A
{
  int x{};
  char a{};
};

Next, the sizes of int and char are implementation-defined: char is at least 8 bits and int at least 16 bits, but both can be wider. That makes it difficult to talk about these examples. C++ has fixed-width integer types, declared in <cstdint>. They are optional, so it’s up to implementations to support them, and their use is generally discouraged. But because the size of types is important in this example, let’s use them.

#include <cstdint>
 
struct A
{
  std::uint32_t x{}; // 32 bits = 4 bytes
  std::uint8_t a{}; // 8 bits = 1 byte
};

I’ve used unsigned types, because signed types, although irrelevant to this example, are more complicated at the bit level.

Now, to gather more information about what might be happening, we can play with more structs:

struct A1 {
  std::uint8_t a{};
  std::uint8_t b{};
  std::uint8_t c{};
}; // sizeof(A1) = 3
 
struct A2 {
  std::uint16_t a{};
  std::uint8_t b{};
}; // sizeof(A2) = 4
 
struct B1 {
  std::uint32_t a{};
  std::uint8_t b{};
}; // sizeof(B1) = 8
 
struct B2 {
  std::uint16_t a{};
  std::uint16_t b{};
  std::uint16_t c{};
}; // sizeof(B2) = 6

Just looking at the members of these structs, you might expect sizeof(A1) == sizeof(A2) and sizeof(B1) == sizeof(B2).
That’s not the case though, so member variable size cannot be all that matters.

Introducing alignment.

Every type wants to be placed at a multiple of some value.

  • std::uint8_t wants to be placed at a multiple of 1 (i.e. any address).
  • std::uint16_t wants to be placed at a multiple of 2 (0, 2, 4, 6, etc.).
  • std::uint32_t wants to be placed at a multiple of 4 (0, 4, 8, etc.).

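The address of every object must be a multiple of the alignment of its type. For the fundamental types this is typically the size of the type.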

Alignment, like everything in my reply, is implementation-defined. You might see different values.
You can get the alignment of a type using alignof(T), just like you used sizeof(T).
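A minimal sketch of querying alignments with alignof. The constant names and the is_power_of_two helper are made up for this example; only the alignment of the one-byte type is asserted as an exact value, because the others are implementation-defined.

```cpp
#include <cstddef>
#include <cstdint>

// Alignment requirements, queried the same way as sizes.
constexpr std::size_t kAlign8  = alignof(std::uint8_t);
constexpr std::size_t kAlign16 = alignof(std::uint16_t);
constexpr std::size_t kAlign32 = alignof(std::uint32_t);

// Hypothetical helper: every valid alignment is a power of two.
constexpr bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

static_assert(kAlign8 == 1, "a one-byte type can live at any address");
static_assert(is_power_of_two(kAlign16), "alignments are powers of two");
static_assert(is_power_of_two(kAlign32), "alignments are powers of two");
// The size of a type is always a multiple of its alignment.
static_assert(sizeof(std::uint32_t) % kAlign32 == 0, "size is a multiple of alignment");
```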

I’m using fixed-width types, but the same goes for char, int, etc.

For example, a std::uint32_t cannot be at address 0xab421, because that’s not a multiple of 4. If 0xab420 is already occupied, the std::uint32_t will be placed at 0xab424, the next address with fitting alignment.
The compiler does so because the CPU can read aligned values faster. An unaligned read would make the CPU read memory from before and after the variable and discard the excess later.
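The address arithmetic in this example can be sketched with two small helpers. is_aligned and align_up are hypothetical names, and both assume the alignment is a power of two.

```cpp
#include <cstdint>

// True if addr is a multiple of alignment (alignment must be a power of two).
constexpr bool is_aligned(std::uintptr_t addr, std::uintptr_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Rounds addr up to the next multiple of alignment (a power of two).
constexpr std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t alignment) {
    return (addr + alignment - 1) & ~(alignment - 1);
}

// 0xab421 is not a multiple of 4; the next suitable address is 0xab424.
static_assert(!is_aligned(0xab421, 4), "0xab421 is misaligned for a 4-byte type");
static_assert(align_up(0xab421, 4) == 0xab424, "next aligned address");
// An address that is already aligned stays where it is.
static_assert(align_up(0xab420, 4) == 0xab420, "already aligned");
```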

Consider this new struct:

struct C
{
  std::uint8_t a{};
  std::uint32_t b{};
};

and suppose an instance of C is created at address 0.
a is placed at address 0. That’s fine, std::uint8_t can be placed at any address.
b has to be placed after a, which would be address 1. But that won’t work, because 1 is not a multiple of 4 (alignof(std::uint32_t)). To fix it, the compiler inserts padding bytes between a and b.

struct C_padded
{
  std::uint8_t a{};
  std::uint8_t padding[3]; // This is an array. It means 3 variables of type std::uint8_t.
  std::uint32_t b{};
};

Now if a is at address 0, padding occupies addresses 1, 2, and 3. That places b at address 4, which is fine.
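The padding can be observed without looking at raw memory. The sketch below uses offsetof on the struct C from above; kPaddingAfterA is a name invented for this example, and the exact number of padding bytes is implementation-defined.

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <cstdint>

struct C {
    std::uint8_t a{};
    std::uint32_t b{};
};

// b starts at a multiple of its alignment, so padding must follow a.
static_assert(offsetof(C, b) % alignof(std::uint32_t) == 0, "b is aligned");

// Number of bytes between the end of a and the start of b.
constexpr std::size_t kPaddingAfterA =
    offsetof(C, b) - (offsetof(C, a) + sizeof(std::uint8_t));
```

On an implementation where alignof(std::uint32_t) is 4, kPaddingAfterA is 3, matching the padding[3] array in C_padded.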

Back to A2:

struct A2 {
  std::uint16_t a{};
  std::uint8_t b{};
}; // sizeof(A2) = 4

This doesn’t need padding, does it? a can be placed at address 0 and b can be placed at address 2, so where does the 1 additional byte come from to cause sizeof(A2) to be 4?

Let’s think about what alignof(A2) must be. All member variables of A2 must be properly aligned.
alignof(A2) cannot be 1, because that would allow a to end up at an odd address. But 2 works. In fact, using the greatest alignment of all member variables as the alignment of the struct always works.

Cool, so how is this related to the 1 extra byte we observe in A2?
Say we have two successive instances of A2:

struct Flowers {
  A2 cauliflower{};
  A2 mayflower{};
};

Suppose sizeof(A2) were 3, i.e. sizeof(a) + sizeof(b).
The compiler again has to consider the alignment of the member variables. cauliflower is fine at address 0, because everything is fine at address 0. mayflower would follow at address 3, but A2 has to be aligned to 2, so it cannot be placed there.
1 padding byte would have to be inserted between cauliflower and mayflower to move mayflower to address 4. Although that’s what would happen according to what I wrote so far, the compiler chooses a different approach for structs.

Regardless of how A2 is used, the compiler inserts 1 padding byte at the end of A2:

struct A2_padded {
  std::uint16_t a{};
  std::uint8_t b{};
  std::uint8_t padding[1];
};

That’s where the 1 extra byte in A2 came from. It automatically aligns successive instances of A2 without having to insert padding. We don’t need padding between cauliflower and mayflower, because the padding is built into A2.
Why does the compiler add the padding inside the struct rather than outside? Because array elements sit back to back, exactly sizeof(A2) bytes apart, so every element is properly aligned only if sizeof(A2) is a multiple of alignof(A2).
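One way to check the reasoning about A2 and Flowers is with a few compile-time assertions. This is only a sketch: the concrete sizes are implementation-defined, so the assertions are written in terms of sizeof and alignof rather than fixed numbers.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>

struct A2 {
    std::uint16_t a{};
    std::uint8_t b{};
};

struct Flowers {
    A2 cauliflower{};
    A2 mayflower{};
};

// The struct is at least as strictly aligned as its most demanding member.
static_assert(alignof(A2) >= alignof(std::uint16_t), "A2 is aligned for a");
// Trailing padding makes the size a multiple of the alignment...
static_assert(sizeof(A2) % alignof(A2) == 0, "size is a multiple of alignment");
// ...so the second instance starts right after the first, with no gap.
static_assert(offsetof(Flowers, mayflower) == sizeof(A2), "no padding between instances");
// Array elements are contiguous, sizeof(A2) bytes apart.
static_assert(sizeof(A2[2]) == 2 * sizeof(A2), "array stride equals sizeof(A2)");
```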

Why alignment?

When your CPU wants to read from memory, all it can do is read a whole word. For this example, assume a word is 2 bytes of memory (the actual word size depends on the architecture).

Say you have a std::uint16_t with value 0x1234.
If the compiler placed the std::uint16_t at address 0x10, the memory might look like this:

address  value
...
000c     83 21 // some other variable
000e     ab cd // some other variable
0010     12 34
0012     ef 01 // some other variable
...

When the CPU wants to read the std::uint16_t, it can do so with a single read operation on address 0x10.

If the std::uint16_t were misaligned, e.g. placed at address 0x11,

address  value
...
000c     83 21 // some other variable
000e     ab cd // some other variable
0010     44 12
0012     34 ef
...

the CPU would have to read the two words at 0x10 and 0x12, then discard the 0x44 and 0xef it didn’t want, before it can give you your value.
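The two-reads-and-discard step can be simulated with plain integer arithmetic. This is an illustration of the idea, not of how any particular CPU works: read_misaligned is a hypothetical function, and each 2-byte word is written as in the dumps above, with the byte at the lower address in the high half.

```cpp
#include <cstdint>

// Combines the wanted bytes of two neighbouring 2-byte words.
// first_word holds the bytes at 0x10..0x11, second_word those at 0x12..0x13.
constexpr std::uint16_t read_misaligned(std::uint16_t first_word,
                                        std::uint16_t second_word) {
    // Keep the low byte of the first word and the high byte of the second;
    // everything else is discarded.
    return static_cast<std::uint16_t>(((first_word & 0x00ffu) << 8) |
                                      (second_word >> 8));
}

// Words "44 12" and "34 ef" from the dump yield the stored value 0x1234.
static_assert(read_misaligned(0x4412, 0x34ef) == 0x1234, "value reassembled");
```

The aligned case needs none of this: the value is simply the word at 0x10.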

Cf.

  1. https://www.learncpp.com/cpp-tutorial/object-sizes-and-the-sizeof-operator/#comment-563585
  2. http://www.catb.org/esr/structure-packing/

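Memory alignment

Memory layout under inheritance

In typical implementations, the space occupied by an instance of a class consists of its non-static data members, a pointer to the virtual function table (if the class has virtual functions), and any padding bytes added for alignment. Why does each instance store a pointer to a table rather than the addresses of all its virtual functions? Because the instances then take up less space in total: the function addresses are the same for every instance of the class, so there is no need for each instance to carry its own copy.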
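The effect of the table pointer on object size can be sketched as follows. The struct names are invented for this example, and the virtual table is an implementation technique rather than something the C++ standard requires, so the assertions describe what common compilers do, not a guarantee.

```cpp
// No virtual functions: the object holds only its data member.
struct Plain {
    int value;
};

// One virtual function: common implementations add one hidden table pointer.
struct OneVirtual {
    int value;
    virtual void f() {}
};

// A second virtual function adds an entry to the shared table,
// not another pointer to each object.
struct TwoVirtuals {
    int value;
    virtual void f() {}
    virtual void g() {}
};

static_assert(sizeof(OneVirtual) > sizeof(Plain), "table pointer adds to the size");
static_assert(sizeof(OneVirtual) == sizeof(TwoVirtuals), "one pointer per object");
```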

A derived-class object inherits the members of its base class together with their memory layout. For example:

struct A {  // sizeof(A)=8
    int a;
    int b;
};

struct B : A {  // sizeof(B)=12
    int c;
};

This case is simple: B inherits A’s layout in full and no alignment issue arises. A occupies 8 bytes and B occupies 12.
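A short compile-time check of this first case, assuming a typical implementation where int members need no padding between them:

```cpp
struct A {
    int a;
    int b;
};

struct B : A {
    int c;
};

// B is A's layout followed by its own member, with nothing in between.
static_assert(sizeof(A) == 2 * sizeof(int), "A holds two ints");
static_assert(sizeof(B) == sizeof(A) + sizeof(int), "B adds one int to A");
```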

Now consider this example:

struct B {  // sizeof(B)=8
    int a;
    short b;
    short c;
};

Here b and c fit together into 4 bytes, so B occupies 8 bytes in total.

And this one:

struct A {  // sizeof(A)=8
    int a;
    short b;
};

struct B : A {  // sizeof(B)=12
    short c;
};

In this example B occupies 12 bytes, not 8.

Let’s draw the memory layouts of A and B:

A:
xxxx xx00
  a   b padding

B:
xxxx xx00 xx00
  a   b    c

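A is a class in its own right and must be aligned to 4 bytes, so member b is followed by 2 bytes of padding. B inherits from A and therefore inherits its memory layout, so B’s layout contains two runs of padding, which quietly adds 4 bytes.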

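If saving memory were the only goal, this inheritance could use an 8-byte layout with c stored in A’s trailing padding, but here the compiler uses 12 bytes. Consider the following:

A *pa, *pb;
pa = new A();
pb = new B();  // pb points at the A part of a B object
*pb = *pa;     // copy an A into that part

For a simple struct like A, the assignment may copy all sizeof(A) = 8 bytes, padding included. With an 8-byte layout for B, c would live in those padding bytes and the assignment would overwrite it, corrupting the B object. To avoid this, B keeps A’s complete layout, padding and all, so copying an A into the base part of a B never touches c.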
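The sketch below checks that assigning through a base pointer leaves the derived member alone, and contrasts the plain struct with a hypothetical variant whose base has a user-provided constructor. All names are invented for this example. Whether the second derived class reuses the base’s trailing padding depends on the ABI (GCC and Clang following the Itanium C++ ABI are expected to, MSVC is not), so no exact size is asserted for it.

```cpp
#include <cassert>

// Same shape as A and B in the example above.
struct PodBase { int a; short b; };
struct PodDerived : PodBase { short c; };

// Hypothetical variant: the constructor means the base is no longer a plain
// C-style struct, which some ABIs lay out differently in derived classes.
struct CtorBase {
    CtorBase() : a(0), b(0) {}
    int a;
    short b;
};
struct CtorDerived : CtorBase { short c; };

// Copies an A-like object into the base part of a derived object
// and reports the derived member afterwards.
inline int c_after_base_assignment() {
    PodDerived d{};
    d.c = 7;
    PodBase src{1, 2};
    PodBase* base_part = &d;
    *base_part = src;              // may copy all sizeof(PodBase) bytes
    assert(d.a == 1 && d.b == 2);  // the base part was updated
    return d.c;                    // unchanged
}
```

Printing sizeof(PodDerived) and sizeof(CtorDerived) on a given compiler shows which layout it picked for each.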

Member ordering and alignment

struct A {
    int a;
    short b;
    int c;
    short d;
};

struct B {
    int a;
    int c;
    short b;
    short d;
};
 
sizeof(A) // 16
sizeof(B) // 12

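A occupies 16 bytes while B occupies 12. So when designing a class, keep members of the same type together (more generally, order members from the largest alignment to the smallest). This reduces the memory each object occupies. The difference is negligible for one or two objects, but it adds up once there are hundreds of thousands of them.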
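The saving can be expressed as padding per object. In this sketch Interleaved and Grouped are hypothetical names for the two orderings above, and kMemberBytes is the space the members need on their own. The exact sizes are implementation-defined.

```cpp
#include <cstddef>

struct Interleaved {  // int, short, int, short
    int a;
    short b;
    int c;
    short d;
};

struct Grouped {  // ints first, then shorts
    int a;
    int c;
    short b;
    short d;
};

// Bytes the four members need without any padding.
constexpr std::size_t kMemberBytes = 2 * sizeof(int) + 2 * sizeof(short);

// Padding each ordering ends up carrying.
constexpr std::size_t kInterleavedPadding = sizeof(Interleaved) - kMemberBytes;
constexpr std::size_t kGroupedPadding = sizeof(Grouped) - kMemberBytes;

static_assert(kGroupedPadding <= kInterleavedPadding, "grouping never adds padding here");
```

With a 4-byte int and a 2-byte short this gives 4 bytes of padding for Interleaved and none for Grouped.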

Memory regions of a program

The memory that a program uses is typically divided into a few different areas, called segments:

  • The code segment (also called a text segment), where the compiled program sits in memory. The code segment is typically read-only.
  • The bss segment (also called the uninitialized data segment), where zero-initialized global and static variables are stored.
  • The data segment (also called the initialized data segment), where initialized global and static variables are stored.
  • The heap, where dynamically allocated variables are allocated from.
  • The call stack, where function parameters, local variables, and other function-related information are stored.
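As a rough illustration of the list above, the following annotates where each object typically ends up. The names are invented for this sketch, and the actual placement is decided by the compiler, linker, and operating system, so the comments describe the usual case only.

```cpp
#include <cassert>
#include <memory>

int initialized_global = 42;  // initialized global: typically the data segment
int zeroed_global;            // zero-initialized global: typically the bss segment

// The compiled code of this function typically sits in the code (text) segment.
int touch_every_region() {
    int local = 1;                             // local variable: call stack
    std::unique_ptr<int> on_heap(new int(7));  // allocated with new: heap
    assert(zeroed_global == 0);                // zero-initialized before main runs
    return initialized_global + zeroed_global + local + *on_heap;  // 42 + 0 + 1 + 7
}   // on_heap releases its heap memory here; local leaves the stack
```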

The heap has advantages and disadvantages:

  • Allocating memory on the heap is comparatively slow.
  • Allocated memory stays allocated until it is specifically deallocated (beware memory leaks) or the application ends (at which point the OS should clean it up).
  • Dynamically allocated memory must be accessed through a pointer. Dereferencing a pointer is slower than accessing a variable directly.
  • Because the heap is a big pool of memory, large arrays, structures, or classes can be allocated here.

The stack has advantages and disadvantages:

  • Allocating memory on the stack is comparatively fast.
  • Memory allocated on the stack stays in scope as long as it is on the stack. It is destroyed when it is popped off the stack.
  • All memory allocated on the stack is known at compile time. Consequently, this memory can be accessed directly through a variable.
  • Because the stack is relatively small, it is generally not a good idea to do anything that eats up lots of stack space. This includes passing by value or creating local variables of large arrays or other memory-intensive structures.
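The trade-off can be sketched with two small functions, both hypothetical: one keeps a large buffer on the heap through std::vector, the other keeps a small fixed-size array on the stack. The element count passed to the first is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums n ones stored in a heap-allocated buffer; n may be large.
long long sum_large_buffer(std::size_t n) {
    std::vector<int> buffer(n, 1);  // the elements live on the heap
    assert(buffer.size() == n);
    long long total = 0;
    for (int v : buffer) total += v;
    return total;
}   // the vector's destructor frees the heap memory here

// Sums a small array whose size is known at compile time.
int sum_small_buffer() {
    int buffer[4] = {1, 2, 3, 4};  // lives on the call stack
    int total = 0;
    for (int v : buffer) total += v;
    return total;
}   // the stack memory is released when the function returns
```

A call such as sum_large_buffer(1000000) uses a few megabytes of heap, an amount that could overflow a small stack if declared as a local array instead.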

References

  1. 12.2 — The stack and the heap