Accessing Unaligned Data

Technical Note 210127

Architectures:

Arm

Component:

compiler

Updated:

3/8/2021 11:26 AM

Introduction

Sometimes you want to access unaligned data. Perhaps the data is in a buffer received from a network or serial link. Accessing the unaligned data in a safe and portable way can be tricky—the result can depend on the CPU architecture, the compiler optimization level, or even which memory region you are working with.

This Technical Note shows how to inform the compiler about the unaligned data, and thereby avoid trouble.

Discussion

According to the C language standard ISO/IEC 9899 (C99, section 6.3.2.3):

“A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined”.

Basic data types

This is a pointer to uint32_t:

uint32_t *data_p;

Because the 32-bit data type uint32_t has an alignment requirement, we know that the uint32_t pointer is correctly aligned. (If not, the behavior would be undefined.) With the following function definition, we tell the compiler that out_p is a pointer to an aligned uint32_t variable:

void set_data(uint32_t *out_p, uint32_t val)
{
    *out_p = val;
}

Now, assume that we have a byte array, like this:

uint8_t network_data[] = {0,1,2,3,4,5,6,7,8,9};

Fooling the compiler

If we take the address of an "odd" byte in our byte array and convert it to a uint32_t pointer with a cast, the behavior is undefined, because the resulting pointer is not correctly aligned for the pointed-to type:

data_p = (uint32_t*) &network_data[1];

Now, using the resulting data_p pointer gives undefined behavior.
For example, calling our set_data function with data_p results in undefined behavior:

set_data(data_p, 1200);

With the cast, we have tried to "fool" the compiler by saying that the uint8_t byte pointer &network_data[1] really is an aligned uint32_t pointer, which is not true.

With a Cortex-M0, the result of set_data is a HardFault exception (the Armv6-M architecture has no UsageFault exception). With a Cortex-M3, the same function call works fine.

On a Cortex-M0, the STR instruction used by set_data requires an aligned address.
On a Cortex-M3, the STR instruction accepts unaligned addresses.

For Cortex-M0, M0+, and M1, the Armv6-M Architecture Reference Manual informs us:

A3.2.1 Alignment behavior
The following data accesses always generate an alignment fault:
* Non word-aligned LDR and STR
* [...]

For Cortex-M3, M4, and M7, the Armv7-M Architecture Reference Manual informs us:

A3.2.1 Alignment behavior
The following data accesses support unaligned addressing, and only 
generate alignment faults when the CCR.UNALIGN_TRP bit is set to 1:
* Non word-aligned LDR and STR
* [...]

So, with a Cortex-M0, the code will always generate a HardFault.
With a Cortex-M3, the code might generate a UsageFault (escalated to a HardFault if the UsageFault exception is not enabled), depending on whether CCR.UNALIGN_TRP is set to 1 or not.

As we can see, the behavior of the unaligned access is undefined and depends on the processor architecture.

This is from the Cortex-M3 Devices Generic User Guide:

The Cortex-M3 processor supports unaligned access only for the following 
instructions: LDR, LDRT, LDRH, LDRHT, LDRSH, LDRSHT, STR, STRT, STRH, STRHT

Unaligned accesses are usually slower than aligned accesses. 
In addition, some memory regions might not support unaligned accesses. 
Therefore, ARM recommends that programmers ensure that accesses are aligned. 
To trap accidental generation of unaligned accesses, use the UNALIGN_TRP bit 
in the Configuration and Control Register.
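
As a rough sketch of what setting that bit can look like: in the Armv7-M System Control Block, the Configuration and Control Register (CCR) is located at address 0xE000ED14, and UNALIGN_TRP is bit 3. The helper name enable_unaligned_trap and the choice to pass the register address as a parameter are illustrative assumptions, not taken from the IAR or Arm material; CMSIS-based projects would more likely write to SCB->CCR using SCB_CCR_UNALIGN_TRP_Msk.

```c
#include <stdint.h>

/* Armv7-M Configuration and Control Register (SCB->CCR in CMSIS). */
#define CCR_ADDRESS          0xE000ED14u
#define CCR_UNALIGN_TRP_BIT  (1u << 3)

/* Hypothetical helper: set UNALIGN_TRP in the register that ccr points to.
 * Taking the address as a parameter is an assumption made for this sketch;
 * it allows the function to be exercised off-target with an ordinary variable. */
static void enable_unaligned_trap(volatile uint32_t *ccr)
{
    *ccr |= CCR_UNALIGN_TRP_BIT;
}
```

On a Cortex-M3 target, the call would be enable_unaligned_trap((volatile uint32_t *)CCR_ADDRESS), made early in startup so that accidental unaligned accesses fault instead of passing silently.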

Unaligned memory accesses are also described in the Linux Kernel documentation:

The effects of performing an unaligned memory access vary 
from architecture to architecture.

- Some architectures are able to perform unaligned memory accesses
  transparently, but there is usually a significant performance cost.
- Some architectures raise processor exceptions when unaligned accesses
  happen. 
- Some architectures are not capable of unaligned memory access, but will
  silently perform a different memory access

As we can see, to create a portable application that can be used on many architectures, you should avoid unaligned accesses, and avoid relying on undefined behavior.
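
When it is unclear whether a given address is suitably aligned, a small runtime check can help during debugging. The following is a minimal sketch; the helper name is_aligned is hypothetical, and the code assumes that converting a pointer to uintptr_t yields the numeric address, which holds for the flat address space of Cortex-M devices but is implementation-defined in general.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the address held in p is a
 * multiple of alignment (in bytes). alignment must be nonzero. */
static bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

For example, is_aligned(&network_data[1], _Alignof(uint32_t)) can be evaluated before deciding whether a buffer position may be accessed as a uint32_t.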

Getting help from the compiler

So, how can the compiler help us to avoid unaligned accesses? We need to inform the compiler that the data is unaligned. This can be done by using #pragma pack or __packed. Alternatively, as described in the IAR C/C++ Development Guide, you can "write your own customized functions for packing and unpacking structures".

Using __packed

Modifying our earlier example, we can inform the compiler that the data might be unaligned by using the __packed data type attribute. Like this:

void set_unaligned_data(uint32_t __packed *out_p, uint32_t val)
{
    *out_p = val;
}

With the code above, you have informed the compiler that the uint32_t out_p pointer might be unaligned, and the compiler will adjust accordingly.

Note that if you try to call the original set_data function with an unaligned __packed pointer, the compiler will produce a helpful error message:

Error[Pe167]: argument of type "uint32_t __packed *" is incompatible with parameter of type "uint32_t *"

The set_unaligned_data function now works fine on both a Cortex-M0 and a Cortex-M3. On a Cortex-M0, the STR instruction is no longer used. With a Cortex-M3 however, you might be surprised to see that the STR instruction still is used, even when you have informed the compiler that the data is unaligned. This is because the compiler "knows" that certain unaligned accesses are supported by the Cortex-M3 hardware. To avoid these hardware-supported unaligned accesses, use the --no_unaligned_access compiler option.

As the Arm documentation says, to find and “trap accidental generation of [any] unaligned accesses, use the UNALIGN_TRP bit”.

Structures

Using #pragma pack

With structures, you can use #pragma pack for a tighter layout of the structure. This directive also informs the compiler that the structure potentially contains unaligned data. When you use the packed structure type, the compiler knows that the data might be unaligned and will adjust accordingly. For example:

#pragma pack(1)
typedef struct my_packed_struct_s {
    uint8_t byte1;
    uint32_t val1;
    uint8_t byte2;
    uint32_t val2;
} my_packed_struct_t;
#pragma pack()

my_packed_struct_t *struct_p = (my_packed_struct_t*) &network_data[0];

void set_s_data(my_packed_struct_t *out_p, uint32_t v1, uint32_t v2)
{
    out_p->val1 = v1;
    out_p->val2 = v2;
}

Because the set_s_data function above uses the my_packed_struct_t type, you have informed the compiler that the data might be unaligned (with the #pragma pack directive on my_packed_struct_t). The compiler will adjust accordingly.
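
To see why the members can end up unaligned, it helps to check the layout that #pragma pack(1) produces. The sketch below repeats the structure definition and verifies its size and member offsets at compile time. It assumes a compiler with C11 _Static_assert support; the assertion messages are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(1)
typedef struct my_packed_struct_s {
    uint8_t byte1;
    uint32_t val1;
    uint8_t byte2;
    uint32_t val2;
} my_packed_struct_t;
#pragma pack()

/* With 1-byte packing there is no padding: 1 + 4 + 1 + 4 = 10 bytes. */
_Static_assert(sizeof(my_packed_struct_t) == 10, "unexpected padding");

/* val1 and val2 start at offsets that are not multiples of 4, so they are
 * not 4-byte aligned even if the structure itself starts at an aligned address. */
_Static_assert(offsetof(my_packed_struct_t, val1) == 1, "val1 offset");
_Static_assert(offsetof(my_packed_struct_t, byte2) == 5, "byte2 offset");
_Static_assert(offsetof(my_packed_struct_t, val2) == 6, "val2 offset");
```

Without the packing directive, a compiler that aligns uint32_t to 4 bytes would typically insert padding after each uint8_t member, giving a 16-byte structure.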

Note that if you try to create a pointer to potentially unaligned data in the packed structure, the compiler will produce a helpful warning:

uint32_t *p = &out_p->val1;
Warning[Pa039]: use of address of unaligned structure member
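
One way to avoid taking the address of a packed member is to copy the member to an ordinary local variable, pass the address of the local, and write the result back. The following is a minimal sketch of that pattern; add_offset and add_offset_to_val1 are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

#pragma pack(1)
typedef struct my_packed_struct_s {
    uint8_t byte1;
    uint32_t val1;
    uint8_t byte2;
    uint32_t val2;
} my_packed_struct_t;
#pragma pack()

/* Hypothetical routine that expects a correctly aligned uint32_t pointer. */
static void add_offset(uint32_t *value_p, uint32_t offset)
{
    *value_p += offset;
}

/* Hypothetical wrapper: never exposes a pointer to the packed member. */
static void add_offset_to_val1(my_packed_struct_t *out_p, uint32_t offset)
{
    uint32_t tmp = out_p->val1;  /* member read through the packed type */
    add_offset(&tmp, offset);    /* tmp is an ordinary, aligned local */
    out_p->val1 = tmp;           /* member written through the packed type */
}
```

Because both member accesses go through the packed structure type, the compiler still knows that val1 might be unaligned, while add_offset only ever sees an aligned pointer.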

Portability

If the application is meant to be truly portable across different architectures and compilers, consider this IAR-specific list of supported pragma directives and data type attributes:

[Table from the IAR C/C++ Development Guide (January 2021), not reproduced in this text.]

The above list shows that the support for __packed and #pragma pack varies, even between IAR compilers.
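
One possible way to cope with this variation is to hide the unaligned store behind a small helper, so that the compiler-specific part is confined to one place. The sketch below is an assumption-laden illustration, not an IAR recommendation: the helper name store_u32_unaligned is hypothetical, the IAR branch relies on the predefined symbol __ICCARM__ and has not been verified against a specific compiler version, and the fallback uses memcpy, which is a standard C function rather than something described in this Technical Note.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ICCARM__)
/* IAR C/C++ Compiler for Arm: tell the compiler that the destination
 * might be unaligned and let it select suitable instructions. */
static void store_u32_unaligned(void *dst_p, uint32_t val)
{
    *(uint32_t __packed *)dst_p = val;
}
#else
/* Other compilers (assumed fallback): copy the bytes with memcpy,
 * which has no alignment requirement on its arguments. */
static void store_u32_unaligned(void *dst_p, uint32_t val)
{
    memcpy(dst_p, &val, sizeof val);
}
#endif
```

With this helper, the earlier example becomes store_u32_unaligned(&network_data[1], 1200), with no pointer cast at the call site. Both branches store the value in the byte order of the target CPU.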

Performance

A drawback with using __packed and #pragma pack is that each access to an unaligned element in the structure will use more code. From the IAR C/C++ Development Guide:

“Note: Accessing an object that is not correctly aligned requires code that is both larger and slower. If such structure members are accessed many times, it is usually better to construct the correct values in a struct that is not packed, and access this struct instead”.

Packing and unpacking

As the IAR C/C++ Development Guide says, you can also "write your own customized functions for packing and unpacking structures".

This is the most portable and safe way. The drawback with packing and unpacking is the need for two views on the structure data: packed and unpacked.

To continue with the examples above, the packing and unpacking functions might look something like this (where my_struct_t is a normal structure with aligned data):

void unpack_data(const uint8_t *unaligned_data_p, my_struct_t *struct_p);
void pack_data(uint8_t *unaligned_data_p, const my_struct_t *struct_p);
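
The bodies of these functions are not given here. A minimal sketch of how they might be written follows. It assumes that the byte buffer uses the same field order as my_packed_struct_t (byte1, val1, byte2, val2, with no padding) and that the 32-bit values are stored in little-endian byte order; the helper names get_u32_le and put_u32_le are hypothetical. Because the code only accesses individual bytes, it needs no unaligned accesses and no compiler extensions.

```c
#include <stdint.h>

/* Normal structure with naturally aligned members (the unpacked view). */
typedef struct my_struct_s {
    uint8_t byte1;
    uint32_t val1;
    uint8_t byte2;
    uint32_t val2;
} my_struct_t;

/* Hypothetical helper: assemble a little-endian 32-bit value from bytes. */
static uint32_t get_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: split a 32-bit value into little-endian bytes. */
static void put_u32_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Assumed buffer layout: byte1 at offset 0, val1 at 1, byte2 at 5, val2 at 6. */
void unpack_data(const uint8_t *unaligned_data_p, my_struct_t *struct_p)
{
    struct_p->byte1 = unaligned_data_p[0];
    struct_p->val1 = get_u32_le(&unaligned_data_p[1]);
    struct_p->byte2 = unaligned_data_p[5];
    struct_p->val2 = get_u32_le(&unaligned_data_p[6]);
}

void pack_data(uint8_t *unaligned_data_p, const my_struct_t *struct_p)
{
    unaligned_data_p[0] = struct_p->byte1;
    put_u32_le(&unaligned_data_p[1], struct_p->val1);
    unaligned_data_p[5] = struct_p->byte2;
    put_u32_le(&unaligned_data_p[6], struct_p->val2);
}
```

Writing the byte order explicitly also makes the result independent of the endianness of the CPU, which the packed-structure approach is not. If the protocol is big-endian, only the two helpers need to change.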

Example project

The example project 2021-01-22_unaligned_8509.zip shows examples of __packed, #pragma pack, and custom packing and unpacking. It also shows how an unaligned access ends up in a fault handler on Cortex-M0 and M3 (on Cortex-M0, this is the HardFault handler, because Armv6-M has no UsageFault exception).

With the example project, use the C-SPY simulator debugger driver and the View>Memory window to study the network_data variable. Note that the C-SPY simulator can also be a useful tool for detecting unaligned accesses. These helpful debugger warnings are shown when you run the example code on different architectures:

MSP430: Warning: A word access on odd address
RL78: Word write access at odd address
RISC-V: Misaligned word data access

Conclusion

If an address must be unaligned, its type must reflect this, by using #pragma pack or __packed. This is not advisable unless it is absolutely needed: use aligned addresses whenever possible.

For portability and performance reasons, try to avoid unaligned memory accesses:

  • To get help from the compiler, inform it about the unaligned data, using the #pragma pack directive or the __packed data type attribute.
  • Alternatively, write your own customized functions for packing and unpacking structures.
  • Always use correct data types, and avoid converting pointers to different data types by casting.

All product names are trademarks or registered trademarks of their respective owners.