4.1.4 Internal Data Representation and Areas

This section explains the data type and the internal data representation. The internal data representation is determined according to the following four items:

-	Size

Shows the memory size necessary to store the data.

-	Boundary alignment

Restricts the addresses to which data is allocated. There are three types of alignment: 1-byte alignment, in which data can be allocated to any address; 2-byte alignment, in which data is allocated to even addresses; and 4-byte alignment, in which data is allocated to addresses that are multiples of four.

-	Data range

Shows the range of data of scalar type (C) or basic type (C++).

-	Data allocation example

Shows an example of assignment of element data of compound type (C) or class type (C++).
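The boundary alignment item above can be illustrated with a small sketch. The helper names is_aligned and align_up are hypothetical and are not part of the compiler or its libraries; the code assumes alignments that are powers of two (1, 2, or 4 bytes here).

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of alignment (1, 2, or 4 bytes here). */
int is_aligned(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0); /* power of two */
    return (addr & (alignment - 1)) == 0;
}

/* Rounds addr up to the next address that satisfies the alignment. */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

With these helpers, address 0x101 is acceptable for 1-byte aligned data only, and the next address usable for 4-byte aligned data is 0x104.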

(1)	Scalar Type (C), Basic Type (C++)

Table 4.17 shows the internal representation of scalar type data in C and basic type data in C++.

Table 4.17

Internal Representation of Scalar-Type and Basic-Type Data

No.	Data Type	Size (bytes)	Alignment (bytes)	Signed/Unsigned	Minimum Value	Maximum Value
1	char *1	1	1	Unsigned	0	2^8 – 1 (255)
2	signed char	1	1	Signed	–2^7 (–128)	2^7 – 1 (127)
3	unsigned char	1	1	Unsigned	0	2^8 – 1 (255)
4	short	2	2	Signed	–2^15 (–32768)	2^15 – 1 (32767)
5	signed short	2	2	Signed	–2^15 (–32768)	2^15 – 1 (32767)
6	unsigned short	2	2	Unsigned	0	2^16 – 1 (65535)
7	int *2	4	4	Signed	–2^31 (–2147483648)	2^31 – 1 (2147483647)
8	signed int *2	4	4	Signed	–2^31 (–2147483648)	2^31 – 1 (2147483647)
9	unsigned int *2	4	4	Unsigned	0	2^32 – 1 (4294967295)
10	long	4	4	Signed	–2^31 (–2147483648)	2^31 – 1 (2147483647)
11	signed long	4	4	Signed	–2^31 (–2147483648)	2^31 – 1 (2147483647)
12	unsigned long	4	4	Unsigned	0	2^32 – 1 (4294967295)
13	long long	8	4	Signed	–2^63 (–9223372036854775808)	2^63 – 1 (9223372036854775807)
14	signed long long	8	4	Signed	–2^63 (–9223372036854775808)	2^63 – 1 (9223372036854775807)
15	unsigned long long	8	4	Unsigned	0	2^64 – 1 (18446744073709551615)
16	float	4	4	Signed	–∞	+∞
17	double, long double	4 *4	4	Signed	–∞	+∞
18	size_t	4	4	Unsigned	0	2^32 – 1 (4294967295)
19	ptrdiff_t	4	4	Signed	–2^31 (–2147483648)	2^31 – 1 (2147483647)
20	enum *3	4	4	Signed	–2^31 (–2147483648)	2^31 – 1 (2147483647)
21	Pointer	4	4	Unsigned	0	2^32 – 1 (4294967295)
22	bool *5, _Bool *8	1 *9	1 *9	- *9	-	-
23	Reference *6	4	4	Unsigned	0	2^32 – 1 (4294967295)
24	Pointer to a data member *6	4	4	Unsigned	0	2^32 – 1 (4294967295)
25	Pointer to a function member *6 *7	12	4	- *10	-	-

Notes 1.

When the signed_char option is specified, the char type has the same value range as the signed char type.

Notes 2.

When the int_to_short option is specified, the int type has the same value range as the short type, the signed int type has the same value range as the signed short type, and the unsigned int type has the same value range as the unsigned short type.

Notes 3.

When the auto_enum option is specified, the smallest type that holds enumeration values is selected.

Notes 4.

When dbl_size=8 is specified, the size of the double type and long double type is 8 bytes.

Notes 5.

This data type is only valid for compilation of C++ programs or C programs including stdbool.h.

Notes 6.

These data types are only valid for compilation of C++ programs.

Notes 7.

Pointers to function and virtual function members are represented in the following data structure.

Notes 8.

This data type is only valid when compiling a C99 program or C program in which stdbool.h has been included.

Notes 9.

When C89 is used for compiling, the size, number of bytes for alignment, and sign are the same as for the unsigned long type.

Notes 10.

This data type does not include a concept of sign.
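The minimum and maximum values of the integer types in Table 4.17 follow directly from the size in bits. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and the code computes with 64-bit host types rather than querying the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of an unsigned type that is `bits` wide: 2^bits - 1. */
uint64_t unsigned_max(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* Largest value of a signed type that is `bits` wide: 2^(bits-1) - 1. */
int64_t signed_max(unsigned bits)
{
    return (int64_t)(unsigned_max(bits) >> 1);
}

/* Smallest value of a signed type that is `bits` wide: -2^(bits-1). */
int64_t signed_min(unsigned bits)
{
    return -signed_max(bits) - 1;
}
```

For example, a 2-byte type gives signed_min(16) = –32768 and unsigned_max(16) = 65535, matching the short and unsigned short rows.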

(2)	Compound Type (C), Class Type (C++)

This section explains internal representation of array type, structure type, and union type data in C and class type data in C++.

Table 4.18 shows internal representation of compound type and class type data.

Table 4.18

Internal Representation of Compound Type and Class Type Data

Data Type	Alignment (bytes)	Size (bytes)	Data Allocation Example
Array	Array element alignment	Number of array elements × element size	char a[10]; Alignment: 1 byte Size: 10 bytes
Structure	Maximum structure member alignment	Total size of members. Refer to (a) Structure Data Allocation, below.	struct { char a,b; }; Alignment: 1 byte Size: 2 bytes
Union	Maximum union member alignment	Maximum size of member. Refer to (b) Union Data Allocation, below.	union { char a,b; }; Alignment: 1 byte Size: 1 byte
Class	1. Always 4 if a virtual function is included 2. Other than 1 above: maximum member alignment	Sum of data members, pointer to the virtual function table, and pointer to the virtual base class. Refer to (c) Class Data Allocation, below.	class B:public A { virtual void f(); }; Alignment: 4 bytes Size: 8 bytes class A { char a; }; Alignment: 1 byte Size: 1 byte

In the following examples, a rectangle indicates four bytes. A diagonal line represents an unused area for alignment. The address increments from right to left (the left side is located at a higher address).

(a)	Structure Data Allocation

When structure members are allocated, an unused area may be generated between structure members to align them to boundaries.

Example

struct {

 char a;

 int b;

} obj;

If a structure has 4-byte alignment and its last member ends one, two, or three bytes past a 4-byte boundary, the following three, two, or one bytes, respectively, are included in the structure as an unused area.

Example

struct {

 int a;

 char b;

} obj;
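The two structure rules above (unused areas between members and after the last member) can be modelled with a short calculation. This is a sketch of the rules as described in this section, not the compiler's implementation; member_info, round_up, and struct_layout are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t size;   /* member size in bytes */
    size_t align;  /* member alignment in bytes (1, 2, or 4) */
} member_info;

static size_t round_up(size_t value, size_t align)
{
    return (value + align - 1) / align * align;
}

/* Fills offsets[i] for each member and returns the structure size. */
size_t struct_layout(const member_info *m, size_t count, size_t *offsets)
{
    size_t offset = 0;
    size_t max_align = 1;

    assert(m != NULL && offsets != NULL);
    for (size_t i = 0; i < count; i++) {
        offset = round_up(offset, m[i].align);  /* unused area before the member */
        offsets[i] = offset;
        offset += m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    return round_up(offset, max_align);         /* unused area after the last member */
}
```

Under this model, both examples above occupy 8 bytes: { char a; int b; } has three unused bytes before b, and { int a; char b; } has three unused bytes after b.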

(b)	Union Data Allocation

When a union has 4-byte alignment and its maximum member size is not a multiple of four, the remaining bytes up to the next multiple of four are included in the union.

Example

union {

 int a;

 char b[7];

} o;
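The union rule can be sketched the same way: take the largest member size and round it up to the largest member alignment. The names member_info and union_size are hypothetical, and the code models only the rule stated above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t size;   /* member size in bytes */
    size_t align;  /* member alignment in bytes (1, 2, or 4) */
} member_info;

/* Returns the union size: the largest member, padded to the largest alignment. */
size_t union_size(const member_info *m, size_t count)
{
    size_t max_size = 0;
    size_t max_align = 1;

    assert(m != NULL);
    for (size_t i = 0; i < count; i++) {
        if (m[i].size > max_size)
            max_size = m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    return (max_size + max_align - 1) / max_align * max_align;
}
```

For the example above, the members are int a (size 4, alignment 4) and char b[7] (size 7, alignment 1), so the model yields 8 bytes with one unused byte.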

(c)	Class Data Allocation

For classes having no base class or virtual functions, data members are allocated according to the allocation rules of structure data.

Example

class A{

  char data1;

  int data2;

public:

  A();

  int getData1(){return data1;}

}obj;

If a class is derived from a base class of 1-byte alignment and the first member of the derived class is 1-byte data, data members are allocated without unused areas.

Example

class A{

  char data1;

};

class B:public A{

  char data2;

  short data3;

}obj;

For a class having a virtual base class, a pointer to the virtual base class is allocated.

Example

class A{

   short data1;

};

class B: virtual protected A{

   char data2;

}obj;

For a class having virtual functions, the compiler creates a virtual function table and allocates a pointer to the virtual function table.

Example

class A{

    char data1;

  public:

    virtual int getData1();

}obj;

The following example shows a class that has a virtual base class, a base class, and virtual functions.

Example

class A{

  char data1;

  virtual short getData1();

};

class B:virtual public A{

  char data2;

  char getData2();

  short getData1();

};

class C:virtual protected A{

  int data3;

};

class D:virtual public A,public B,public C{

  public:

  int data4;

  short getData1();

}obj;

For an empty class, a 1-byte dummy area is assigned.

Example

class A{

  void fun();

}obj;

For an empty class having an empty class as its base class, the dummy area is one byte.

Example

class A{

  void fun();

};

class B: A{

  void sub();

}obj;

Dummy areas shown in the above two examples are allocated only when the class size is 0. No dummy area is allocated if a base class or a derived class has a data member or has a virtual function.

Example

class A{

  void fun();

};

class B: A{

  char data1;

}obj;

(3)	Bit Fields

A bit field is a member allocated with a specified size in a structure, a union, or a class. This section explains how bit fields are allocated.

(a)	Bit Field Members

Table 4.19 shows the specifications of bit field members.

Table 4.19

Bit Field Member Specifications

No.	Item	Specifications
1	Type specifier allowed for bit fields	(unsigned) char, signed char, bool *1, _Bool *5, (unsigned) short, signed short, enum, (unsigned) int, signed int, (unsigned) long, signed long, (unsigned) long long, signed long long
2	How to treat a sign when data is extended to the declared type *2	Unsigned: Zero extension *3; Signed: Sign extension *4
3	Sign type for the type without sign specification	Unsigned. When the signed_bitfield option is specified, the signed type is selected.
4	Sign type for enum type	Signed. When the auto_enum option is specified, the resultant type is selected.

Notes 1.

The bool type is only valid for compilation of C++ programs or C programs including stdbool.h.

Notes 2.

To use a bit field member, data in the bit field is extended to the declared type. One-bit field data declared with a sign is interpreted as the sign, and can only indicate 0 and −1.

Notes 3.

Zero extension: Zeros are written to the upper bits to extend data.

Notes 4.

Sign extension: The most significant bit of a bit field is used as a sign and the sign is written to the upper bits to extend data.

Notes 5.

This data type is only valid for programs in C99. The _Bool type is treated as the bool type in compilation.
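Notes 2 to 4 describe how a bit field value is extended to its declared type. The following sketch shows both extensions on a 32-bit area; the function names and the pos/width parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension: the bits above the field are filled with 0. */
uint32_t field_zero_extend(uint32_t area, unsigned pos, unsigned width)
{
    assert(width >= 1 && width <= 32 && pos + width <= 32);
    uint32_t mask = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return (area >> pos) & mask;
}

/* Sign extension: the field's most significant bit is copied into the upper bits. */
int32_t field_sign_extend(uint32_t area, unsigned pos, unsigned width)
{
    uint32_t value = field_zero_extend(area, pos, width);
    uint32_t sign_bit = UINT32_C(1) << (width - 1);
    /* Flipping the sign bit and subtracting it again removes 2^width
       from the value exactly when the sign bit was set. */
    return (int32_t)((int64_t)(value ^ sign_bit) - (int64_t)sign_bit);
}
```

A signed 1-bit field whose bit is set yields –1 from field_sign_extend, which is why such a field can only hold 0 and –1.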

(b)	Bit Field Allocation

Bit field members are allocated according to the following five rules:

-	Bit field members are placed in an area beginning from the right, that is, the least significant bit.

Example

struct b1 {

  int a:2;

  int b:3;

} x;

-	Consecutive bit field members having type specifiers of the same size are placed in the same area as much as possible.

Example

struct b1 {

  long           a:2;

  unsigned int   b:3;

} y;

-	Bit field members having type specifiers with different sizes are allocated to separate areas.

Example

struct b1 {

  int    a:5;

  char   b:4;

} z;

-	If the number of remaining bits in an area is less than the next bit field size, even though the type specifiers indicate the same size, the remaining area is not used and the next bit field is allocated to the next area.

Example

struct b2 {

  char   a:5;

  char   b:4;

} v;

-	If a bit field member with a bit field size of 0 is declared, the next member is allocated to the next area.

Example

struct b2 {

  char   a:5;

  char    :0;

  char   c:3;

} w;

Note	It is also possible to place bit field members from the upper bit. For details, refer to the description on the bit_order option in Compiler Options, and the description on #pragma bit_order in 4.2 Extended Language Specifications.
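The five allocation rules above can be modelled as a small placement routine. This is only a sketch of the rules as stated, for the default allocation from the least significant bit; it is not the compiler's algorithm. All names are hypothetical, and the alignment of a new area (the type size, capped at 4 bytes) is an assumption based on Table 4.17.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned type_size;  /* size in bytes of the declared type (1, 2, 4, or 8) */
    unsigned width;      /* field width in bits; 0 closes the current area */
} bitfield_decl;

typedef struct {
    size_t   area_offset;  /* byte offset of the area that holds the field */
    unsigned bit_pos;      /* position of the field's lowest bit within the area */
} bitfield_place;

/* Places each field and returns the byte offset just past the last area
   (trailing padding of the enclosing structure is not included). */
size_t bitfield_allocate(const bitfield_decl *decl, size_t count, bitfield_place *out)
{
    size_t   end = 0;          /* first byte after the most recent area */
    size_t   area_offset = 0;
    unsigned area_size = 0;    /* 0 means no area is open */
    unsigned used = 0;         /* bits already taken in the open area */

    assert(decl != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        unsigned size = decl[i].type_size;
        unsigned width = decl[i].width;

        assert(width <= size * 8);
        if (width == 0) {                /* rule 5: ":0" closes the area */
            area_size = 0;
            out[i].area_offset = end;
            out[i].bit_pos = 0;
            continue;
        }
        if (area_size != size            /* rule 3: different type size, or no open area */
            || used + width > size * 8)  /* rule 4: not enough bits left */
        {
            /* Assumption: alignment equals the type size, capped at 4 bytes. */
            unsigned align = size > 4 ? 4 : size;
            area_offset = (end + align - 1) / align * align;
            area_size = size;
            used = 0;
            end = area_offset + size;
        }
        out[i].area_offset = area_offset;  /* rule 2: share the open area */
        out[i].bit_pos = used;             /* rule 1: fill from the least significant bit */
        used += width;
    }
    return end;
}
```

Applied to the examples above, the model places b of { int a:2; int b:3; } at bit 2 of the same area as a, and places b of { char a:5; char b:4; } at bit 0 of a new area one byte later.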

(4)	Memory Allocation in Big Endian

In big endian, data are allocated in the memory as follows:

(a)	One-Byte Data (char, signed char, unsigned char, bool *1, and _Bool *1 types)

The order of bits in one-byte data for the little endian and the big endian is the same.

Notes 1.

When C89 is used for compiling, the size and the number of bytes for alignment are 4.

(b)	Two-Byte Data ((signed) short and unsigned short types)

In two-byte data, the upper byte and the lower byte are stored in opposite order in little endian and big endian.

Example

When two-byte data 0x1234 is allocated at address 0x100:

Address	Little Endian	Big Endian
0x100	0x34	0x12
0x101	0x12	0x34

(c)	Four-Byte Data ((signed) int *2, unsigned int *2, (signed) long, unsigned long, and float types)

In four-byte data, the bytes are stored in opposite order in little endian and big endian.

Notes 2.

When the int_to_short option is specified, the signed int and unsigned int types have the same size and number of bytes for alignment as the signed short and unsigned short types, respectively.

Example

When four-byte data 0x12345678 is allocated at address 0x100:

Address	Little Endian	Big Endian
0x100	0x78	0x12
0x101	0x56	0x34
0x102	0x34	0x56
0x103	0x12	0x78

(d)	Eight-Byte Data ((signed) long long, unsigned long long, and double types)

In eight-byte data, the bytes are stored in opposite order in little endian and big endian.

Example

When eight-byte data 0x0123456789abcdef is allocated at address 0x100:

Address	Little Endian	Big Endian
0x100	0xef	0x01
0x101	0xcd	0x23
0x102	0xab	0x45
0x103	0x89	0x67
0x104	0x67	0x89
0x105	0x45	0xab
0x106	0x23	0xcd
0x107	0x01	0xef
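The byte orders in the two-, four-, and eight-byte examples above can be reproduced with a short routine. This is an illustrative sketch; store_bytes and its parameters are hypothetical names, and out[0] stands for the lowest address (0x100 in the examples).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the low `size` bytes of value to out[0..size-1] in the given byte order. */
void store_bytes(uint64_t value, size_t size, int big_endian, uint8_t *out)
{
    assert(out != NULL && size >= 1 && size <= 8);
    for (size_t i = 0; i < size; i++) {
        /* i counts bytes from the least significant end of the value. */
        uint8_t byte = (uint8_t)(value >> (8 * i));
        /* Little endian puts the least significant byte at the lowest address;
           big endian puts it at the highest. */
        size_t index = big_endian ? size - 1 - i : i;
        out[index] = byte;
    }
}
```

Calling store_bytes(0x12345678, 4, 1, buf) leaves 0x12 in buf[0] and 0x78 in buf[3], as in the big endian column of the four-byte example.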

(e)	Compound-Type and Class-Type Data

Members of compound-type and class-type data are allocated in the same way as in little endian. However, the byte order within each member is reversed according to the rule for that member's data size.

Example

When the following structure is allocated at address 0x100:

struct {

	   short a;

	   int b;

	}z= {0x1234,  0x56789abc};

Address	Little Endian	Big Endian
0x100	0x34	0x12
0x101	0x12	0x34
0x102	Unused area	Unused area
0x103	Unused area	Unused area
0x104	0xbc	0x56
0x105	0x9a	0x78
0x106	0x78	0x9a
0x107	0x56	0xbc

(f)	Bit Field

Bit fields are allocated in the same way as in little endian. However, the byte order within each area is reversed according to the rule for that area's data size.

Example

When the following structure is allocated at address 0x100:

struct {

	   long a:16;

	   unsigned int b:15;

	   short c:5;

	}y= {1,1,1};

Address	Little Endian	Big Endian
0x100	0x01	0x00
0x101	0x00	0x01
0x102	0x01	0x00
0x103	0x00	0x01
0x104	0x01	0x00
0x105	0x00	0x01
0x106	Unused area	Unused area
0x107	Unused area	Unused area

(5)	Floating-Point Number Specifications

(a)	Internal Representation of Floating-Point Numbers

Floating-point numbers handled by this compiler are internally represented in the standard IEEE format. This section outlines the internal representation of floating-point numbers in the IEEE format.

This section assumes that the dbl_size=8 option is specified. When the dbl_size=4 option is specified, the internal representation of the double type and long double type is the same as that of the float type.

(b)	Format for Internal Representation

float types are represented in the IEEE single-precision (32-bit) format, while double types and long double types are represented in the IEEE double-precision (64-bit) format.

(c)	Structure of Internal Representation

Figure 4.1 shows the structure of the internal representation of float, double, and long double types.

Figure 4.1

Structure of Internal Representation of Floating-Point Numbers

The internal representation format consists of the following parts:

i. Sign

Shows the sign of the floating-point number. 0 is positive, and 1 is negative.

ii. Exponent

Shows the exponent of the floating-point number as a power of 2.

iii. Mantissa

Shows the data corresponding to the significant digits (fraction) of the floating-point number.

(d)	Types of Values Represented as Floating-Point Numbers

In addition to the normal real numbers, floating-point numbers can also represent values such as infinity. The following describes the types of values represented by floating-point numbers.

i. Normalized number

Represents a normal real value; the exponent is neither 0 nor all 1s.

ii. Denormalized number

Represents a real value having a small absolute number; the exponent is 0 and the mantissa is other than 0.

iii. Zero

Represents the value 0.0; the exponent and mantissa are 0.

iv. Infinity

Represents infinity; all bits of the exponent are 1 and the mantissa is 0.

v. Not-a-number

Represents the result of an operation such as "0.0/0.0", "∞/∞", or "∞–∞", which does not correspond to a number or infinity; all bits of the exponent are 1 and the mantissa is other than 0.

Table 4.20 shows the types of values represented as floating-point numbers.

Table 4.20

Types of Values Represented as Floating-Point Numbers

Mantissa	Exponent: 0	Exponent: Neither 0 Nor All Bits 1	Exponent: All Bits 1
0	Zero	Normalized number	Infinity
Other than 0	Denormalized number	Normalized number	Not-a-number

Note

Denormalized numbers are floating-point numbers of small absolute values that are outside the range represented by normalized numbers. There are fewer valid digits in a denormalized number than in a normalized number. Therefore, if the result or intermediate result of a calculation is a denormalized number, the number of valid digits in the result cannot be guaranteed.
When denormalize=off is specified, denormalized numbers are processed as 0.
When denormalize=on is specified, denormalized numbers are processed as denormalized numbers.
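The classification in Table 4.20 can be expressed as a small function. The sketch below works on a 32-bit single-precision bit pattern (1-bit sign, 8-bit exponent, 23-bit mantissa); the enumeration and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    KIND_ZERO,
    KIND_DENORMALIZED,
    KIND_NORMALIZED,
    KIND_INFINITY,
    KIND_NAN
} fp_kind;

/* Classifies a single-precision bit pattern by its exponent and mantissa fields. */
fp_kind classify_float_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFF;  /* 8-bit exponent field */
    uint32_t mantissa = bits & 0x7FFFFF;      /* 23-bit mantissa field */

    if (exponent == 0)                        /* all exponent bits are 0 */
        return mantissa == 0 ? KIND_ZERO : KIND_DENORMALIZED;
    if (exponent == 0xFF)                     /* all exponent bits are 1 */
        return mantissa == 0 ? KIND_INFINITY : KIND_NAN;
    return KIND_NORMALIZED;
}
```

The sign bit does not take part in the classification, so 0x00000000 and 0x80000000 are both classified as zero.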

(e)	float Type

The float type is internally represented by a 1-bit sign, an 8-bit exponent, and a 23-bit mantissa.

i. Normalized numbers

The sign indicates the sign of the value, either 0 (positive) or 1 (negative). The exponent is between 1 and 254 (2^8 – 2). The actual exponent is obtained by subtracting 127 from this value, giving a range of –126 to 127. The mantissa is between 0 and 2^23 – 1. The actual mantissa is interpreted as a value whose 2^23 bit is 1, with the binary point immediately following that bit. Values of normalized numbers are as follows:

(–1)^sign × 2^(exponent – 127) × (1 + mantissa × 2^(–23))

Example

Sign: –

Exponent: 10000000(2) – 127 = 1, where (2) indicates binary

Mantissa: 1.11(2) = 1.75

Value: –1.75 × 2^1 = –3.5

ii. Denormalized numbers

The sign indicates the sign of the value, either 0 (positive) or 1 (negative). The exponent is 0 and the actual exponent is –126. The mantissa is between 1 and 2^23 – 1, and the actual mantissa is interpreted as a value whose 2^23 bit is 0, with the binary point immediately following that bit. Values of denormalized numbers are as follows:

(–1)^sign × 2^(–126) × (mantissa × 2^(–23))

Example

Sign: +

Exponent: –126

Mantissa: 0.11(2) = 0.75, where (2) indicates binary

Value: 0.75 × 2^(–126)
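The two formulas for the float type can be checked with a short decoder. This is a sketch under the field layout given above (1-bit sign, 8-bit exponent, 23-bit mantissa); decode_float_bits and two_to_the are hypothetical names, and infinity and not-a-number are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* 2^e computed by repeated doubling or halving (exact in double for the range used here). */
static double two_to_the(int e)
{
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* Decodes a single-precision bit pattern that is a normalized number,
   a denormalized number, or zero. */
double decode_float_bits(uint32_t bits)
{
    int      sign     = (int)(bits >> 31);
    int      exponent = (int)((bits >> 23) & 0xFF);
    uint32_t mantissa = bits & 0x7FFFFF;
    double   fraction = (double)mantissa * two_to_the(-23);
    double   magnitude;

    assert(exponent != 255);  /* infinity and not-a-number are not handled */
    if (exponent == 0)
        magnitude = two_to_the(-126) * fraction;                   /* denormalized or zero */
    else
        magnitude = two_to_the(exponent - 127) * (1.0 + fraction); /* normalized */
    return sign ? -magnitude : magnitude;
}
```

The normalized example above corresponds to the bit pattern 0xC0600000, which decodes to –3.5; the denormalized example corresponds to 0x00600000.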

iii. Zero

The sign is 0 (positive) or 1 (negative), indicating +0.0 or –0.0, respectively. The exponent and mantissa are both 0.

+0.0 and –0.0 are both the value 0.0.

iv. Infinity

The sign is 0 (positive) or 1 (negative), indicating +∞ or –∞, respectively.

The exponent is 255 (2^8 – 1).

The mantissa is 0.

v. Not-a-number

The exponent is 255 (2^8 – 1).

The mantissa is a value other than 0.

Note	A not-a-number is called a quiet NaN when the MSB of the mantissa is 1, or a signaling NaN when the MSB of the mantissa is 0. There are no stipulations regarding the values of the rest of the mantissa and of the sign.
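The distinction in the note above depends only on the exponent field and the most significant bit of the mantissa (bit 22 in the 32-bit pattern). A minimal sketch, with hypothetical function names:

```c
#include <stdint.h>

/* Returns 1 if the single-precision bit pattern is a not-a-number. */
static int is_nan_bits(uint32_t bits)
{
    return ((bits >> 23) & 0xFF) == 0xFF && (bits & 0x7FFFFF) != 0;
}

/* Quiet NaN: not-a-number whose mantissa MSB (bit 22) is 1. */
int is_quiet_nan_bits(uint32_t bits)
{
    return is_nan_bits(bits) && (bits & 0x400000) != 0;
}

/* Signaling NaN: not-a-number whose mantissa MSB (bit 22) is 0. */
int is_signaling_nan_bits(uint32_t bits)
{
    return is_nan_bits(bits) && (bits & 0x400000) == 0;
}
```

For example, 0x7FC00000 is a quiet NaN and 0x7F800001 is a signaling NaN, while 0x7F800000 is infinity and matches neither test.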

(f)	double Types and long double Types

The double and long double types are internally represented by a 1-bit sign, an 11-bit exponent, and a 52-bit mantissa.

i. Normalized numbers

The sign indicates the sign of the value, either 0 (positive) or 1 (negative). The exponent is between 1 and 2046 (2^11 – 2). The actual exponent is obtained by subtracting 1023 from this value, giving a range of –1022 to 1023. The mantissa is between 0 and 2^52 – 1. The actual mantissa is interpreted as a value whose 2^52 bit is 1, with the binary point immediately following that bit. Values of normalized numbers are as follows:

(–1)^sign × 2^(exponent – 1023) × (1 + mantissa × 2^(–52))

Example

Sign: +

Exponent: 1111111111(2) –1023 = 0, where (2) indicates binary

Mantissa: 1.111(2) = 1.875

Value: 1.875 × 2^0 = 1.875

ii. Denormalized numbers

The sign indicates the sign of the value, either 0 (positive) or 1 (negative). The exponent is 0 and the actual exponent is –1022. The mantissa is between 1 and 2^52 – 1, and the actual mantissa is interpreted as a value whose 2^52 bit is 0, with the binary point immediately following that bit. Values of denormalized numbers are as follows:

(–1)^sign × 2^(–1022) × (mantissa × 2^(–52))

Example

Sign: –

Exponent: –1022

Mantissa: 0.111(2) = 0.875, where (2) indicates binary

Value: 0.875 × 2^(–1022)
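The double-precision formulas can be checked in the same way as the single-precision ones, with an 11-bit exponent, a 52-bit mantissa, and a bias of 1023. The names below are hypothetical, and the sketch assumes the host double type can hold the results exactly, which is the case for IEEE double-precision hosts.

```c
#include <assert.h>
#include <stdint.h>

/* 2^e computed by repeated doubling or halving. */
static double two_to_the(int e)
{
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* Decodes a double-precision bit pattern that is a normalized number,
   a denormalized number, or zero. */
double decode_double_bits(uint64_t bits)
{
    int      sign     = (int)(bits >> 63);
    int      exponent = (int)((bits >> 52) & 0x7FF);
    uint64_t mantissa = bits & UINT64_C(0xFFFFFFFFFFFFF);
    double   fraction = (double)mantissa * two_to_the(-52);
    double   magnitude;

    assert(exponent != 2047);  /* infinity and not-a-number are not handled */
    if (exponent == 0)
        magnitude = two_to_the(-1022) * fraction;                   /* denormalized or zero */
    else
        magnitude = two_to_the(exponent - 1023) * (1.0 + fraction); /* normalized */
    return sign ? -magnitude : magnitude;
}
```

The normalized example above corresponds to the bit pattern 0x3FFE000000000000, which decodes to 1.875; the denormalized example corresponds to 0x800E000000000000.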

iii. Zero

The sign is 0 (positive) or 1 (negative), indicating +0.0 or –0.0, respectively. The exponent and mantissa are both 0.

+0.0 and –0.0 are both the value 0.0.

iv. Infinity

The sign is 0 (positive) or 1 (negative), indicating +∞ or –∞, respectively. The exponent is 2047 (2^11 – 1).

The mantissa is 0.

v. Not-a-number

The exponent is 2047 (2^11 – 1).

The mantissa is a value other than 0.

Note	A not-a-number is called a quiet NaN when the MSB of the mantissa is 1, or signaling NaN when the MSB of the mantissa is 0. There are no specifications regarding the values of other mantissa fields or the sign.