Accessing Unaligned Data
Technical Note 210127
Architectures: Arm
Component: compiler
Updated: 3/8/2021 11:26 AM
Introduction
Sometimes you want to access unaligned data. Perhaps the data is in a buffer received from a network or serial link. Accessing the unaligned data in a safe and portable way can be tricky—the result can depend on the CPU architecture, the compiler optimization level, or even which memory region you are working with.
This Technical Note shows how to inform the compiler about the unaligned data, and thereby avoid trouble.
Discussion
According to the C language standard ISO/IEC 9899:
“A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined”.
Basic data types
This is a pointer to uint32_t:
uint32_t *data_p;
Because the 32-bit data type uint32_t has an alignment requirement, we know that the uint32_t pointer is correctly aligned. (If not, the behavior would be undefined.) With the following function definition, we tell the compiler that out_p is a pointer to an aligned uint32_t variable:
void set_data(uint32_t *out_p, uint32_t val)
{
    *out_p = val;
}
Now, assume that we have a byte array, like this:
uint8_t network_data[] = {0,1,2,3,4,5,6,7,8,9};
Fooling the compiler
If we refer to an "odd" byte in our byte array and convert its address to a uint32_t pointer with a cast, the behavior is undefined, because the resulting pointer is not correctly aligned for the pointed-to type:
data_p = (uint32_t*) &network_data[1];
Any use of the resulting data_p pointer now gives undefined behavior. For example, calling our set_data function with data_p results in undefined behavior:
set_data(data_p, 1200);
With the cast, we have tried to "fool" the compiler by saying that the uint8_t byte pointer &network_data[1] really is an aligned uint32_t pointer, which is not true.
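Whether an address really is suitable for a uint32_t access can be checked at run time. The following is a minimal sketch, not part of the original note: the names is_aligned_u32 and count_aligned_positions are hypothetical, and it assumes a C11 compiler and a target where converting a pointer to uintptr_t yields the plain address (this conversion is implementation-defined).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p is suitably aligned for a uint32_t.
   Assumption: the pointer-to-uintptr_t conversion yields the address. */
bool is_aligned_u32(const void *p)
{
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}

/* Counts how many of the first four byte positions of a word-aligned
   buffer could legally be accessed through a uint32_t pointer. */
int count_aligned_positions(void)
{
    static _Alignas(uint32_t) uint8_t buffer[8];
    int count = 0;

    for (int i = 0; i < 4; i++) {
        if (is_aligned_u32(&buffer[i])) {
            count++;
        }
    }
    return count;
}
```

On a target where uint32_t has 4-byte alignment, only the first position qualifies, which is why a cast of &network_data[1] cannot be trusted.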
With a Cortex-M0, the result of set_data is a HardFault exception (Armv6-M has no separate UsageFault exception). With a Cortex-M3, the same function call works fine.
On a Cortex-M0, the STR instruction used by set_data requires an aligned address.
On a Cortex-M3, the STR instruction accepts unaligned addresses.
For Cortex-M0, M0+, and M1, the Armv6-M Architecture Reference Manual informs us:
A3.2.1 Alignment behavior
The following data accesses always generate an alignment fault:
* Non word-aligned LDR and STR
* [...]
For Cortex-M3, M4, and M7, the Armv7-M Architecture Reference Manual informs us:
A3.2.1 Alignment behavior
The following data accesses support unaligned addressing, and only
generate alignment faults when the CCR.UNALIGN_TRP bit is set to 1:
* Non word-aligned LDR and STR
* [...]
So, with a Cortex-M0, the code will always generate a HardFault.
With a Cortex-M3, the code might generate a UsageFault, depending on whether CCR.UNALIGN_TRP is set to 1 or not.
As we can see, the behavior of the unaligned access is undefined and depends on the processor architecture.
This is from the Cortex-M3 Devices Generic User Guide:
The Cortex-M3 processor supports unaligned access only for the following
instructions: LDR, LDRT, LDRH, LDRHT, LDRSH, LDRSHT, STR, STRT, STRH, STRHT
Unaligned accesses are usually slower than aligned accesses.
In addition, some memory regions might not support unaligned accesses.
Therefore, ARM recommends that programmers ensure that accesses are aligned.
To trap accidental generation of unaligned accesses, use the UNALIGN_TRP bit
in the Configuration and Control Register.
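Setting the UNALIGN_TRP bit mentioned in the quote above can be sketched as follows. This is an illustration under stated assumptions, not code from the example project: the helper names are hypothetical, and the register is passed in as a pointer so that the logic can be tried on a host. UNALIGN_TRP is bit 3 of the CCR on Armv7-M; in a CMSIS-based project the same effect is usually written as SCB->CCR |= SCB_CCR_UNALIGN_TRP_Msk.

```c
#include <stdint.h>

/* UNALIGN_TRP is bit 3 of the Configuration and Control Register (CCR). */
#define CCR_UNALIGN_TRP_BIT (1u << 3)

/* Hypothetical helper: sets UNALIGN_TRP in the register ccr points to.
   On a target, ccr would point to the CCR in the System Control Block. */
void enable_unaligned_trap(volatile uint32_t *ccr)
{
    *ccr |= CCR_UNALIGN_TRP_BIT;
}

/* Host-side demonstration using a stand-in variable, not real hardware. */
uint32_t unaligned_trap_demo(void)
{
    volatile uint32_t fake_ccr = 0x200u; /* pretend bit 9 is already set */

    enable_unaligned_trap(&fake_ccr);
    return fake_ccr;
}
```

The read-modify-write keeps the other CCR bits unchanged, which matters because the register also controls unrelated behavior.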
Unaligned memory accesses are also described in the Linux Kernel documentation:
The effects of performing an unaligned memory access vary
from architecture to architecture.
- Some architectures are able to perform unaligned memory accesses
transparently, but there is usually a significant performance cost.
- Some architectures raise processor exceptions when unaligned accesses
happen.
- Some architectures are not capable of unaligned memory access, but will
silently perform a different memory access
As we can see, to create a portable application that can be used on many architectures, you should avoid unaligned accesses, and avoid relying on undefined behavior.
Getting help from the compiler
So, how can the compiler help us to avoid unaligned accesses? The answer is that we need to inform the compiler that the data is unaligned. This can be done by using #pragma pack, __packed, or as described in the IAR C/C++ Development Guide: "Alternatively, write your own customized functions for packing and unpacking structures".
Using __packed
Modifying our earlier example, we can inform the compiler that the data might be unaligned by using the __packed data type attribute, like this:
void set_unaligned_data(uint32_t __packed *out_p, uint32_t val)
{
    *out_p = val;
}
With the code above, you have informed the compiler that the uint32_t pointer out_p might be unaligned, and the compiler will adjust accordingly.
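For compilers that do not support __packed, a similar effect can be approximated in standard C by never promising alignment in the first place. The sketch below is an assumption-labeled alternative, not what the IAR compiler generates: the function names are hypothetical, the destination is a plain byte pointer, and memcpy moves the value in the byte order of the machine it runs on.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable counterpart of set_unaligned_data: out_p is a
   byte pointer, so no alignment is promised to the compiler. */
void set_unaligned_data_portable(uint8_t *out_p, uint32_t val)
{
    memcpy(out_p, &val, sizeof val);
}

/* Reads the value back, again without assuming alignment. */
uint32_t get_unaligned_data_portable(const uint8_t *in_p)
{
    uint32_t val;

    memcpy(&val, in_p, sizeof val);
    return val;
}

/* Stores 1200 at the "odd" address used earlier and reads it back. */
uint32_t unaligned_store_demo(void)
{
    uint8_t network_data[] = {0,1,2,3,4,5,6,7,8,9};

    set_unaligned_data_portable(&network_data[1], 1200);
    return get_unaligned_data_portable(&network_data[1]);
}
```

Compilers commonly turn such a fixed-size memcpy into whatever load and store sequence the target allows, but that is an optimization and not a guarantee.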
Note that if you try to call the original set_data function with an unaligned __packed pointer, the compiler will produce a helpful error message:
Error[Pe167]: argument of type "uint32_t __packed *" is incompatible with parameter of type "uint32_t *"
The set_unaligned_data function now works fine on both a Cortex-M0 and a Cortex-M3. On a Cortex-M0, the STR instruction is no longer used. With a Cortex-M3, however, you might be surprised to see that the STR instruction is still used, even though you have informed the compiler that the data is unaligned. This is because the compiler "knows" that certain unaligned accesses are supported by the Cortex-M3 hardware. To avoid these hardware-supported unaligned accesses, use the --no_unaligned_access compiler option.
As the Arm documentation says, to find and “trap accidental generation of [any] unaligned accesses, use the UNALIGN_TRP bit”.
Structures
Using #pragma pack
With structures, you can use #pragma pack for a tighter layout of the structure. This pragma directive also informs the compiler that the structure potentially contains unaligned data. When you use the packed structure type, the compiler knows that the data might be unaligned and will adjust accordingly. For example:
#pragma pack(1)
typedef struct my_packed_struct_s {
    uint8_t byte1;
    uint32_t val1;
    uint8_t byte2;
    uint32_t val2;
} my_packed_struct_t;
#pragma pack()
my_packed_struct_t *struct_p = (my_packed_struct_t*) &network_data[0];
void set_s_data(my_packed_struct_t *out_p, uint32_t v1, uint32_t v2)
{
    out_p->val1 = v1;
    out_p->val2 = v2;
}
Because the set_s_data function above uses the my_packed_struct_t type, you have informed the compiler that the data might be unaligned (with the #pragma pack directive on my_packed_struct_t). The compiler will adjust accordingly.
Note that if you try to create a pointer to potentially unaligned data in the packed structure, the compiler will produce a helpful warning:
uint32_t *p = &out_p->val1;
Warning[Pa039]: use of address of unaligned structure member
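Instead of taking the address of a packed member, the value can be copied out through the packed type, so that the compiler still knows the access might be unaligned. The sketch below also checks the packed layout at compile time. It assumes a C11 compiler that accepts #pragma pack(1) and #pragma pack() in this form; get_val1 and packed_member_demo are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(1)
typedef struct my_packed_struct_s {
    uint8_t  byte1;
    uint32_t val1;
    uint8_t  byte2;
    uint32_t val2;
} my_packed_struct_t;
#pragma pack()

/* With packing in effect there is no padding, so val1 sits at offset 1. */
_Static_assert(sizeof(my_packed_struct_t) == 10, "unexpected padding");
_Static_assert(offsetof(my_packed_struct_t, val1) == 1, "val1 offset");
_Static_assert(offsetof(my_packed_struct_t, val2) == 6, "val2 offset");

/* Copies the member by value; no uint32_t pointer to val1 is created. */
uint32_t get_val1(const my_packed_struct_t *in_p)
{
    uint32_t copy = in_p->val1;

    return copy;
}

/* Writes and reads the unaligned member through the packed type only. */
uint32_t packed_member_demo(void)
{
    my_packed_struct_t s = {0};

    s.val1 = 1200;
    return get_val1(&s);
}
```

The static assertions fail the build if the directive is ignored or spelled differently on another compiler, which makes a silent layout change harder to miss.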
Portability
If the application is meant to be truly portable across different architectures and compilers, consider this IAR-specific list of supported pragma directives and data type attributes:
From IAR C/C++ Development Guide (January 2021):
The above list shows that the support for __packed and #pragma pack varies, even between IAR compilers.
Performance
A drawback with using __packed and #pragma pack is that each access to an unaligned element in the structure will use more code. From the IAR C/C++ Development Guide:
“Note: Accessing an object that is not correctly aligned requires code that is both larger and slower. If such structure members are accessed many times, it is usually better to construct the correct values in a struct that is not packed, and access this struct instead”.
Packing and unpacking
As the IAR C/C++ Development Guide says, you can also "write your own customized functions for packing and unpacking structures".
This is the most portable and safe way. The drawback with packing and unpacking is the need for two views on the structure data: packed and unpacked.
To continue with the examples above, the packing and unpacking functions might look something like this (where my_struct_t is a normal structure with aligned data):
void unpack_data(const uint8_t *unaligned_data_p, my_struct_t *struct_p);
void pack_data(uint8_t *unaligned_data_p, const my_struct_t *struct_p);
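One possible implementation of these two prototypes is sketched below. It is not taken from the example project: the field layout of my_struct_t mirrors my_packed_struct_t as an assumption, the wire format is assumed to be little-endian, and load_le32, store_le32, and pack_roundtrip_demo are hypothetical helpers.

```c
#include <stdint.h>
#include <string.h>

/* Assumed unpacked view: same members as my_packed_struct_t, but with
   normal (aligned) layout. */
typedef struct my_struct_s {
    uint8_t  byte1;
    uint32_t val1;
    uint8_t  byte2;
    uint32_t val2;
} my_struct_t;

/* Assembles a 32-bit value from four bytes, least significant first. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Splits a 32-bit value into four bytes, least significant first. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

void unpack_data(const uint8_t *unaligned_data_p, my_struct_t *struct_p)
{
    struct_p->byte1 = unaligned_data_p[0];
    struct_p->val1  = load_le32(&unaligned_data_p[1]);
    struct_p->byte2 = unaligned_data_p[5];
    struct_p->val2  = load_le32(&unaligned_data_p[6]);
}

void pack_data(uint8_t *unaligned_data_p, const my_struct_t *struct_p)
{
    unaligned_data_p[0] = struct_p->byte1;
    store_le32(&unaligned_data_p[1], struct_p->val1);
    unaligned_data_p[5] = struct_p->byte2;
    store_le32(&unaligned_data_p[6], struct_p->val2);
}

/* Unpacks the sample buffer, packs it again, and returns 1 if the
   values and the round-tripped bytes are as expected. */
int pack_roundtrip_demo(void)
{
    const uint8_t network_data[] = {0,1,2,3,4,5,6,7,8,9};
    uint8_t out[10] = {0};
    my_struct_t s;

    unpack_data(network_data, &s);
    pack_data(out, &s);
    return s.val1 == 0x04030201u && s.val2 == 0x09080706u &&
           memcmp(network_data, out, sizeof out) == 0;
}
```

Because only byte accesses and shifts are used, the result does not depend on the alignment rules or the byte order of the processor that runs the code.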
Example project
The example project 2021-01-22_unaligned_8509.zip shows examples of __packed, #pragma pack, and custom packing and unpacking. It also shows how an unaligned access ends up in the HardFault_Handler for Cortex-M0 and M3.
With the example project, use the C-SPY simulator debugger driver and the View>Memory window to study the network_data variable. Note that the C-SPY simulator can also be a useful tool for detecting unaligned accesses. These helpful debugger warnings are shown when you run the example code on different architectures:
MSP430: Warning: A word access on odd address
RL78: Word write access at odd address
RISC-V: Misaligned word data access
Conclusion
If an address must be unaligned, its type must reflect this, by using #pragma pack or __packed. This is not advisable unless it is absolutely needed: use aligned addresses whenever possible.
For portability and performance reasons, try to avoid unaligned memory accesses:
- To get help from the compiler, inform it about the unaligned data, using the #pragma pack directive or the __packed data type attribute.
- Alternatively, write your own customized functions for packing and unpacking structures.
- Always use correct data types, and avoid converting pointers to different data types by casting.