Checksum generation

Technical Note 26457

Targets:
All

Component:
General

Updated:
6/12/2015 9:04 AM

Introduction

This is an overview of checksum generation and a discussion about weighing the size of the CRC implementation against the speed.

If you are new to crcs you probably want to start by reading "A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS" by Ross N. Williams. The text provides a rather thorough introduction to crcs. He begins with a simple byte sum (byte1 + byte2 + byte3 + ...) and gradually improves it. In the end it becomes the table driven polynomial version that is pretty much standard these days.

The text is over 20 years old, but crcs have not changed since then and the text is considered a classic in the crc field.

This text will use the CRC16 polynomial (0x1021) and a 2-byte checksum. Generalizing it to larger polynomials and checksum sizes is rather straightforward and sample code for CRC32 () is supplied in appendix A.

Basic version

In the most basic form a crc-implementation in C might look something like this:

typedef unsigned char * ptr;
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned long uint32_t;

uint16_t
crc_impl(uint16_t sum, ptr p, uint32_t len)
{
while (len--)
{
uint8_t byte = *p++;

for (int i = 0; i < 8; ++i)
{
uint16_t osum = sum;

sum <<= 1;

if (byte & 0x80)
sum |= 1 ;

if (osum & 0x8000)
sum ^= 0x1021; // the polynomial

byte <<= 1;
}
}
return sum;
}

This certainly works, but if you want to compare the checksum returned from crc_impl to the one computed by crc tools (like ielftool) you need to feed 2 additional zeroes into it after the byte sequence has been processed. Why is this so? Williams explains it the text linked above. The short version is that the final answer is still "in the pipeline" and needs to be "flushed".

static const uint8_t zeroes[] = { 0, 0 };

uint16_t
crc(uint16_t sum, ptr p, uint32_t len)
{
return crc_impl(crc_impl(sum, p, len), (ptr)zeroes, 2);
}

On a Cortex-M3 the above code compiles to 72 bytes (high balanced optimization) and it requires about 125 cycles for each checksummed byte.

A note on cycle counts

All cycle counts in this text are obtained from the Simulator. The Simulator is not cycle accurate and the timing of some instructions can vary on different real world processors. The actual cycle count can be higher or lower depending on the circumstances.

Checking an ielftool generated checksum

  • Setup the linker to fill and generate a checksum.
  • Use the fill string 0xFF (the exact choice of fill string does not matter but 0xFF is the most common fill value). Fill from 0x0 to 0x1FFF. Again the exact values does not matter much but typically the entire ROM should be filled and checksummed.
  • Generate a 2-byte crc-16 checksum with the initial value 0x0.
  • No complement, MSB first (no mirroring), 8-bit checksum unit size.

Now it's time to write a program that verifies the checksum. The checksum itself must be excluded from the checksum process. The reason for this is that the producer of __checksum cannot use the checksum's bytes to produce the checksum's bytes. They must be excluded from the process entirely.

The first approach is to make sure that the checksum is not checksummed. Checksum all bytes from the linker defined label __checksum_begin to the last byte before the checksum and then from the first byte after the checksum to the linker defined label __checksum_end.

extern uint16_t __checksum;          // linker defined
extern int __checksum_begin; // linker defined
extern int __checksum_end; // linker defined

int main()
{
// where checksumming starts
ptr p0 = (ptr)&__checksum_begin;

// the address of the checksum
ptr p1 = (ptr)&__checksum;

// number of bytes between the start address and the checksum
uint32_t len1 = (p1-p0),

// number of bytes after the checksum
len2 = ((ptr)&__checksum_end-p1)-1;

// the initial value of the checksum
uint16_t sum = 0;

// if they exist, checksum the bytes between the checksum start
// and the checksum

if (len1)
sum = crc(sum, p0, len1);

// ignore the bytes of the checksum itself
// if they exist, checksum the bytes between the first byte after
// the checksum and the end checksum end

if (len2)
sum = crc(sum, p1+2, len2);

// compare the computed checksum to the precomputed one
if (sum == __checksum)
return 1;

return 0;
}

This works regardless of .icf-file but handling the split range is tedious.

Another approach is to move the checksum to a known location. Use a custom .icf file and update it to include this line:

place at address mem:0x40 { section .checksum };

0x40 can of course be replaced by whatever address that is suitable (and available). Typically the checksum should be placed at the start or end of the range that is to be checksummed. Doing that results in a single checksum range and that is easier to handle in the checksumming code. The interrupt vector (0x0-0x3F) is usually occupied on an ARM device, so placing the checksum at the end of the memory is probably better. Fill 0x0-0x1FFD and place the checksum on 0x1FFE.

int main()
{
// where checksumming starts
ptr p0 = (ptr)&__checksum_begin;

// number of bytes to checksum
uint32_t len1 = ((ptr)&__checksum_end-p0)+1;

// the initial value of the checksum
uint16_t sum = 0;

// checksum the range
sum = crc(sum, p0, len1);

// compare the computed checksum to the precomputed one
if (sum == __checksum)
return 1;

return 0;
}

The code here is clearer, smaller and more readable than in the example above. The downside is that it requires changes to the .icf file.

ROM byte table version

Many crc-implementations today are byte oriented table driven ones. The implementations in ielftool and XLINK are table driven.

static const uint16_t t[256] = {
0x0000,0x1021,0x2042,0x3063,0x4084,0x50a5,0x60c6,0x70e7,0x8108,0x9129,0xa14a,
0xb16b,0xc18c,0xd1ad,0xe1ce,0xf1ef,0x1231,0x0210,0x3273,0x2252,0x52b5,0x4294,
0x72f7,0x62d6,0x9339,0x8318,0xb37b,0xa35a,0xd3bd,0xc39c,0xf3ff,0xe3de,0x2462,
0x3443,0x0420,0x1401,0x64e6,0x74c7,0x44a4,0x5485,0xa56a,0xb54b,0x8528,0x9509,
0xe5ee,0xf5cf,0xc5ac,0xd58d,0x3653,0x2672,0x1611,0x0630,0x76d7,0x66f6,0x5695,
0x46b4,0xb75b,0xa77a,0x9719,0x8738,0xf7df,0xe7fe,0xd79d,0xc7bc,0x48c4,0x58e5,
0x6886,0x78a7,0x0840,0x1861,0x2802,0x3823,0xc9cc,0xd9ed,0xe98e,0xf9af,0x8948,
0x9969,0xa90a,0xb92b,0x5af5,0x4ad4,0x7ab7,0x6a96,0x1a71,0x0a50,0x3a33,0x2a12,
0xdbfd,0xcbdc,0xfbbf,0xeb9e,0x9b79,0x8b58,0xbb3b,0xab1a,0x6ca6,0x7c87,0x4ce4,
0x5cc5,0x2c22,0x3c03,0x0c60,0x1c41,0xedae,0xfd8f,0xcdec,0xddcd,0xad2a,0xbd0b,
0x8d68,0x9d49,0x7e97,0x6eb6,0x5ed5,0x4ef4,0x3e13,0x2e32,0x1e51,0x0e70,0xff9f,
0xefbe,0xdfdd,0xcffc,0xbf1b,0xaf3a,0x9f59,0x8f78,0x9188,0x81a9,0xb1ca,0xa1eb,
0xd10c,0xc12d,0xf14e,0xe16f,0x1080,0x00a1,0x30c2,0x20e3,0x5004,0x4025,0x7046,
0x6067,0x83b9,0x9398,0xa3fb,0xb3da,0xc33d,0xd31c,0xe37f,0xf35e,0x02b1,0x1290,
0x22f3,0x32d2,0x4235,0x5214,0x6277,0x7256,0xb5ea,0xa5cb,0x95a8,0x8589,0xf56e,
0xe54f,0xd52c,0xc50d,0x34e2,0x24c3,0x14a0,0x0481,0x7466,0x6447,0x5424,0x4405,
0xa7db,0xb7fa,0x8799,0x97b8,0xe75f,0xf77e,0xc71d,0xd73c,0x26d3,0x36f2,0x0691,
0x16b0,0x6657,0x7676,0x4615,0x5634,0xd94c,0xc96d,0xf90e,0xe92f,0x99c8,0x89e9,
0xb98a,0xa9ab,0x5844,0x4865,0x7806,0x6827,0x18c0,0x08e1,0x3882,0x28a3,0xcb7d,
0xdb5c,0xeb3f,0xfb1e,0x8bf9,0x9bd8,0xabbb,0xbb9a,0x4a75,0x5a54,0x6a37,0x7a16,
0x0af1,0x1ad0,0x2ab3,0x3a92,0xfd2e,0xed0f,0xdd6c,0xcd4d,0xbdaa,0xad8b,0x9de8,
0x8dc9,0x7c26,0x6c07,0x5c64,0x4c45,0x3ca2,0x2c83,0x1ce0,0x0cc1,0xef1f,0xff3e,
0xcf5d,0xdf7c,0xaf9b,0xbfba,0x8fd9,0x9ff8,0x6e17,0x7e36,0x4e55,0x5e74,0x2e93,
0x3eb2,0x0ed1,0x1ef0
};

uint16_t crc(uint16_t sum, ptr p, uint32_t len)
{
while (len--)
sum = t[(sum >> 8) ^ *p++] ^ (sum << 8);
return sum;
}

Williams has a rather thorough discussion about how the basic crc algorithm is transformed into the table driven one. There are several intermediate steps that might be of interest for those that are interested in understanding how crcs work. This table driven version computes the exact same crc as the code above (after the 2 zeroes have been processed).

On a Cortex-M3 the table driven version compiles to 552 bytes (high balanced optimization). The crc table itself consumes 512 of those. About 15 cycles are required for each checksummed byte so the table driven version is significantly faster than the basic one, at the cost of being significantly larger.

RAM byte table version

It is of course quite possible to generate the table in the program. This uses less ROM space (because the code that builds the table is significantly smaller than the table itself) but more RAM space (because the table needs to be written at runtime).

void construct_crc_table(uint16_t poly, void * space)
{
uint16_t * crc_table = (uint16_t *)space;

for (uint16_t i = 0; i < 256 ; ++i)
{
uint16_t r = i << 8;

for (int j = 0 ; j < 8 ; ++j)
r = (r & 0x8000) ? (r << 1) ^ poly : (r << 1);

crc_table[i] = r;
}
}

uint16_t crc(void * table, uint16_t sum, ptr p, uint32_t len)
{
uint16_t * t = (uint16_t *)table;

while (len--)
sum = t[(sum >> 8) ^ *p++] ^ (sum << 8);

return sum;
}

On a Cortex-M3 the RAM-based version compiles to 74 bytes (high balanced optimization). The table needs to be built in RAM, this will require 512 consecutive bytes of RAM, something that is not necessarily readily available on all systems. It is possible to build the table on the stack or on the heap, or use a global variable. The choice depends on the situation.

A possible variant when RAM is scarce is to build the table at system startup, before the initialized variables are initialized, using addresses where other things, typically initialized variables, will reside later. The checksum routine can then do its job and then variable initialization will proceed normally, destroying the table. If this is used extreme care must be taken to make sure that the table is never constructed, or used, after initialization has been performed.

The RAM-version can use different tables for different checksum jobs. This is rather unlikely to happen, but different checksum polynomials can be used in the same program.

The table is constructed using about 96 cycles per entry. About 14 cycles are required for each checksummed byte, making it speed wise more or less equivalent with the ROM-table version.

ROM nibble table version

An observation one can make here is that it is a big problem that the table uses 256 entries. If the checksum entry size is 2 bytes that will inevitably result in usage of 512 bytes of table space. If the number of entries could be made smaller the overall space requirements could be made smaller, but more look up operations would be needed in the table.

A natural candidate for a smaller unit would be 4 bits, a nibble. The nibble oriented table only has 16 entries and thus requires 32 bytes if the checksum entry is 2 bytes. The price of using a nibble in place of a byte is that you need to look up two nibbles to process one byte.

static const uint16_t t[] = {
0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50a5, 0x60c6, 0x70e7,
0x8108, 0x9129, 0xa14a, 0xb16b, 0xc18c, 0xd1ad, 0xe1ce, 0xf1ef,
};

uint16_t crc_nibble_rom(uint16_t sum, ptr p, uint32_t len)
{
while (len--) {
// hi nibble
sum = t[(sum>>12)^(*p >> 4)]^(sum<<4);

// lo nibble
sum = t[(sum>>12)^(*p++ & 0xF)]^(sum<<4);
}

return sum;
}

On a Cortex-M3 the nibble oriented ROM-version compiles to 88 bytes (high balanced). About 19 cycles are required for each byte. No additional RAM is needed.

RAM nibble table version

There is of course a corresponding nibble oriented RAM-version.

void
construct_nibble_table(uint16_t poly, void * space)
{
uint16_t * crc_table = (uint16_t *)space;

for (uint16_t i = 0; i < 16 ; ++i) {
uint16_t r = i << 12;

for (int j = 0 ; j < 4 ; ++j)
r = (r & 0x8000) ? (r << 1) ^ poly : (r << 1);

crc_table[i] = r;
}
}

uint16_t crc_nibble_ram(void * table, uint16_t sum, ptr p, uint32_t len)
{
uint16_t * t = (uint16_t *)table;

while (len--) {
// hi nibble
sum = t[(sum>>12)^(*p >> 4)]^(sum<<4);

// lo nibble
sum = t[(sum>>12)^(*p++ & 0xF)]^(sum<<4);
}

return sum;
}

On a Cortex-M3 the nibble oriented RAM-version compiles to 96 bytes (high balanced). It requires 32 consecutive bytes of space for the table. Compared to the 256 bytes the byte oriented RAM-version needs that is a significant reduction. Constructing the RAM nibble table requires about 52 cycles / entry, roughly half of what the byte oriented version needs as each entry processes half as many bits. About 19 cycles are required to checksum each byte. That makes it even with the ROM-based nibble version but it is noticeably behind the byte oriented versions.

ROM use (size of code and const tables)

RAM use RAM-memory needed for lookup table (stack usage not included)

CpB    Cycles per Byte, how many cycles crc of one byte needs
CpTE Cycles per Table Entry, how many cycles one table entry needs to
be constructed

ROM use RAM use CpB CpTE 64k-cycles 64k-time@74MHz
basic crc 70 - 125 - 8.19M 110 ms
byte oriented ROM 552 - 15 - 983k 13.3 ms
byte oriented RAM 74 512 14 96 918k* 12.4 ms*
nibble oriented ROM 88 - 19 - 1.25M 16.8 ms
nibble oriented RAM 96 32 19 52 1.25M* 16.8 ms*

* excluding the construction of the table

The RAM-versions have a serious disadvantage compared to the ROM-versions, the lookup table has to be constructed.

Table type    Entries   Cycles/entry   Time@74MHz
Nibble 16 52 0.01 ms
Byte 256 96 0.33 ms

If the table can be used for more than one checksumming (assuming that the memory is not released between uses) the cost for the building the table can be amortized over several uses. If the memory is released, or if the checksumming is only performed once, the cost of building the table should be added to the checksum cost.

The RAM-versions have an advantage as well, you can have several tables (either create an additional table or overwrite an existing one) with different polynomials. It is rather rare that more than one polynomial is used in the same program, but it is not unheard of.

Larger index types

It is possible to use larger tables than nibble or byte indexed ones. Using larger tables reduces the cycle requirement as there are fewer table lookups but requires massive resources (a 16-bit indexed crc16 table requires 128k of RAM) and needs to handle endianess. Such approaches might be interesting in (very) specific situations but they are outside the scope of this text.

Picking a implementation

The "best" solution for a particular project will depend on what you want to checksum and how often the checksumming routine will run. A fairly common case is to checksum the ROM at startup or when the flash has been updated. When choosing a crc implementation two things matter, the size of the implementation and how quickly it processes the bytes. An implementation must be both small enough and quick enough. Once an implementation is quick enough it should be as small as possible. Once an implementation fits in the available space it should be as quick as possible.

The basic crc algorithm checksums 64k in about 110 ms (at 74 MHz), it requires 70 bytes of ROM and no RAM (not counting a few bytes of stack). If that is sufficient there is no real reason to look at the other options, it is fast enough and there is no smaller alternative.

On the other hand the nibble oriented ROM-version is only 18 bytes larger, has no other additional resource requirements and is basically 6 times faster. Is the 85% time reduction worth the additional 18 bytes of ROM? In many cases it probably is.

In much the same way the byte oriented ROM-version is 464 bytes larger than the nibble oriented ROM-version. It reduces the cycle requirement per byte by about 20%. Is that time reduction worth the additional 464 bytes of ROM? In many cases it probably is not.

If you want the byte oriented version but cannot fit the 552 bytes of ROM for the ROM-version, but you can fit the 74 bytes for the byte oriented RAM-version, and you have access to 512 contiguous bytes of RAM the RAM-version might be an option. Is the time reduction worth the (possibly temporal) usage of 512 bytes of RAM? In many cases it probably is not, relatively few systems have RAM to spare.

The nibble oriented ROM version is probably the best trade-off for many, but all. It is reasonably small and reasonably fast.

Appendix A

CRC32 versions of the crc16 examples in the main text:

CRC32 has the value 0x4C11DB7 (really 0x14C11DB7, but the most significant bit is usually dropped as it does not fit in a 32-bit value).

Basic version

uint32_t
crc_impl(uint32_t sum, ptr p, uint32_t len)
{
while (len--)
{
uint8_t byte = *p++;

for (int i = 0; i < 8; ++i)
{
uint32_t osum = sum;

sum <<= 1;

if (byte & 0x80)
sum |= 1 ;

if (osum & 0x80000000)
sum ^= 0x4C11DB7; // the polynominal

byte <<= 1;
}
}
return sum;
}

static const uint8_t zeroes[] = { 0, 0, 0, 0 };

uint32_t
crc(uint32_t sum, ptr p, uint32_t len)
{
return crc_impl(crc_impl(sum, p, len), (ptr)zeroes, 4);
}


ROM byte table version

static const uint32_t t[256] = {
0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9, 0x130476dc, 0x17c56b6b,
0x1a864db2, 0x1e475005, 0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61,
0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd, 0x4c11db70, 0x48d0c6c7,
0x4593e01e, 0x4152fda9, 0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75,
0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011, 0x791d4014, 0x7ddc5da3,
0x709f7b7a, 0x745e66cd, 0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039,
0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5, 0xbe2b5b58, 0xbaea46ef,
0xb7a96036, 0xb3687d81, 0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d,
0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49, 0xc7361b4c, 0xc3f706fb,
0xceb42022, 0xca753d95, 0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1,
0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d, 0x34867077, 0x30476dc0,
0x3d044b19, 0x39c556ae, 0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072,
0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16, 0x018aeb13, 0x054bf6a4,
0x0808d07d, 0x0cc9cdca, 0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde,
0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02, 0x5e9f46bf, 0x5a5e5b08,
0x571d7dd1, 0x53dc6066, 0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba,
0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e, 0xbfa1b04b, 0xbb60adfc,
0xb6238b25, 0xb2e29692, 0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6,
0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a, 0xe0b41de7, 0xe4750050,
0xe9362689, 0xedf73b3e, 0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2,
0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686, 0xd5b88683, 0xd1799b34,
0xdc3abded, 0xd8fba05a, 0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637,
0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb, 0x4f040d56, 0x4bc510e1,
0x46863638, 0x42472b8f, 0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53,
0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47, 0x36194d42, 0x32d850f5,
0x3f9b762c, 0x3b5a6b9b, 0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff,
0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623, 0xf12f560e, 0xf5ee4bb9,
0xf8ad6d60, 0xfc6c70d7, 0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b,
0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f, 0xc423cd6a, 0xc0e2d0dd,
0xcda1f604, 0xc960ebb3, 0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7,
0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b, 0x9b3660c6, 0x9ff77d71,
0x92b45ba8, 0x9675461f, 0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3,
0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640, 0x4e8ee645, 0x4a4ffbf2,
0x470cdd2b, 0x43cdc09c, 0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8,
0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24, 0x119b4be9, 0x155a565e,
0x18197087, 0x1cd86d30, 0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec,
0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088, 0x2497d08d, 0x2056cd3a,
0x2d15ebe3, 0x29d4f654, 0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0,
0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c, 0xe3a1cbc1, 0xe760d676,
0xea23f0af, 0xeee2ed18, 0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4,
0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0, 0x9abc8bd5, 0x9e7d9662,
0x933eb0bb, 0x97ffad0c, 0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668,
0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4,
};

uint32_t crc(uint32_t sum, ptr p, uint32_t len)
{
while (len--)
{
uint8_t i = (sum >> 24) ^ *p++;
sum = t[i] ^ (sum << 8);
}

return sum;
}


RAM byte table version

void construct_crc_table(uint32_t poly, void * space)
{
uint32_t * crc_table = (uint32_t *)space;

for (uint32_t i = 0; i < 256 ; ++i)
{
uint32_t r = i << 24;

for (int j = 0 ; j < 8 ; ++j)
r = (r & 0x80000000) ? (r << 1) ^ poly : (r << 1);

crc_table[i] = r;
}
}

uint32_t crc_ram_table(void * table, uint32_t sum, ptr p, uint32_t len)
{
uint32_t * t = (uint32_t *)table;

while (len--)
{
uint8_t i = (sum >> 24) ^ *p++;
sum = t[i] ^ (sum << 8);
}

return sum;
}


ROM nibble table version

static const uint32_t t[16] = {
0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9,
0x130476dc, 0x17c56b6b, 0x1a864db2, 0x1e475005,
0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61,
0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd,
};

uint32_t crc(uint32_t sum, ptr p, uint32_t len)
{
while (len--) {
// hi nibble
sum = t[(sum>>28)^(*p >> 4)]^(sum<<4);
// lo nibble
sum = t[(sum>>28)^(*p++ & 0xF)]^(sum<<4);
}
return sum;
}


RAM nibble table version

void
construct_nibble_table(uint32_t poly, void * space)
{
uint32_t * crc_table = (uint32_t *)space;
for (uint32_t i = 0; i < 16 ; ++i) {
uint32_t r = i << 28;
for (int j = 0 ; j < 4 ; ++j)
r = (r & 0x80000000) ? (r << 1) ^ poly : (r << 1);
crc_table[i] = r;
}
}

uint32_t crc(void * table, uint32_t sum, ptr p, uint32_t len)
{
uint32_t * t = (uint32_t *)table;
while (len--) {
// hi nibble
sum = t[(sum>>28)^(*p >> 4)]^(sum<<4);
// lo nibble
sum = t[(sum>>28)^(*p++ & 0xF)]^(sum<<4);
}
return sum;
}


Appendix B

Code for reflection/mirroring.

Sometimes you need to reflect bytes to make the crc match (perhaps because the crc was generated under the assumption that reflecting/mirroring crc hardware would be used and now it has to be verified with software).

static uint8_t nibb[] =
{
0, // 0 = 0000 = 0000 = 0
8, // 1 = 0001 = 1000 = 8
4, // 2 = 0010 = 0100 = 4
12, // 3 = 0011 = 1100 = C
2, // 4 = 0100 = 0010 = 2
10, // 5 = 0101 = 1010 = A
6, // 6 = 0110 = 0110 = 6
14, // 7 = 0111 = 1110 = E
1, // 8 = 1000 = 0001 = 1
9, // 9 = 1001 = 1001 = 9
5, // 10 = 1010 = 0101 = 5
13, // 11 = 1011 = 1101 = D
3, // 12 = 1100 = 0011 = 3
11, // 13 = 1101 = 1011 = E
7, // 14 = 1110 = 0111 = 7
15 // 15 = 1111 = 1111 = F
};

uint8_t mirror8(uint8_t byte)
{
return (nibb[byte & 0xF] << 4) | (nibb[byte >> 4]);
}

uint16_t mirror16(uint16_t hw)
{
return (mirror8(hw & 0xFF) << 8) | mirror8(hw >> 8);
}

void generate_mirror_byte_table(void * space)
{
uint8_t * mirror = (uint8_t *)space;

for (uint16_t i = 0 ; i < 256 ; ++i)
mirror[i] = (nib[i & 0xF] << 4) | nib[i >> 4];
}

If you can spare the 256 bytes for a mirroring table use that. If you cannot, use mirror8, no RAM requirements at a penalty of roughly 6 cycles / mirrored byte.

Appendix C

Code for initial values

Some crc algorithms use a non-zero initial value. The table driven and the basic algorithms use the initial value differently so the same initial value cannot be supplied to the initial call to the chosen crc routine.

// this is for indirect / prefixed values
// call this with the initial value and use
// the returned value instead of the initial value
// when calling a table driven (but not the basic!)
// crc routine for the first time
uint16_t fixup_table(uint16_t sum)
{
uint16_t crc = crc_table[(sum >> 8) & 0xFF];
crc = crc_table[(crc >> 8) ^ (sum & 0xFF)] ^ (crc << 8);
return crc;
}

// this is for direct / non prefixed values
// call this with the initial value and use
// the returned value instead of the initial value
// when calling the basic (but not the table driven!)
// crc routine for the first time
uint16_t fixup_basic(uint16_t init)
{
for (uint16_t i = 0 ; i < 16 ; ++i)
{
uint16_t bit = init & 1;

if (bit)
init ^= 0x1021; // poly, change as needed or make it a parameter

init >>= 1;

if (bit)
init |= 0x8000;
}
return init;
}


Appendix D

Code for outputting crc tables.

This code is not intended to be used in the program using crc. It is intended to run to generate output, a table declaration, that is then entered in the source code of the program using crc.

The program, as is, can output byte or nibble tables and it can use 16 or 32-bit polynomials. Adding a new crc-algorithm is rather straight forward;

  • add the name to the CrcType enum
  • add the enum to the switch that sets size and polynomial

The code has only been tested for 16 and 32-bit polynomial and 16 or 256 entry tables. Other configurations are possible but they probably require some extension of the code.

#include <stdio.h>
#include <stdlib.h>

typedef unsigned char * ptr;
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned long uint32_t;

typedef struct crc_config
{
uint32_t mCrcMask;
uint32_t mValueMask;
uint32_t mPoly;
uint16_t mElements;
uint8_t mSize;
uint8_t mInitialShiftCount;
uint8_t mBitLimit;
uint8_t mIndent;
} crc_config;

typedef enum { kByteTable, kNibbleTable } TableType;
typedef enum { kCrc16, kCrc32 } CrcType;

uint32_t entry(uint32_t i, crc_config const * cc)
{
uint32_t r = i << cc->mInitialShiftCount;
uint32_t m = cc->mCrcMask;
uint32_t p = cc->mPoly;
uint8_t l = cc->mBitLimit;

for (int j = 0 ; j < l ; ++j)
r = (r & m) ? ((r << 1) ^ p) : (r << 1);

r &= cc->mValueMask;

return r;
}

void fill_in_crc_object(TableType table, CrcType crc, crc_config * cc)
{
// setup table type related values
switch (table)
{
case kByteTable: cc->mBitLimit = 8; break;
case kNibbleTable: cc->mBitLimit = 4; break;
default:
printf("'%d' is not a valid table type\n", table);
abort();
}

cc->mElements = (1u << cc->mBitLimit);

// setup algorithm type related values
uint8_t size;
uint32_t poly;

switch (crc)
{
case kCrc16:
size = 2;
poly = 0x1021;
break;

case kCrc32:
size = 4;
poly = 0x4C11DB7;
break;

/* add additional crc types here */
default:
printf("'%d' is not a valid polynomial type\n", crc);
abort();
}

cc->mSize = size;
cc->mPoly = poly;
uint8_t bits = size*8;

// setup derived values
// most significant bit set
cc->mCrcMask = (1u << (bits-1));

// all bits set
cc->mValueMask = (1u << bits)-1;

// initial shift count
cc->mInitialShiftCount = bits - cc->mBitLimit;

// formatting values
cc->mIndent = 2;
}

void print_crc_table(crc_config * cc)
{
printf("static const uint%d_t t[%d] = {\n",
cc->mSize*8, cc->mElements);

char buf[30];

// create the formatting string for table entries
sprintf(buf, "0x%%0%d%s", cc->mSize*2, "lx, ");

uint8_t isNew = 1, count = 0;

// max number of crc values per row
uint8_t limit = (80 - cc->mIndent) / (cc->mSize*2 + 4);

for (uint16_t i = 0 ; i < cc->mElements ; ++i)
{
// fix indentation for first row entry
if (isNew) isNew = !printf("%*s", cc->mIndent, "");

printf(buf, entry(i, cc));

// possibly break line
if (++count >= limit) count = !(isNew = printf("\n"));
}

printf("\n};\n");
}

main()
{
crc_config conf, *cc = &conf;
fill_in_crc_object(kNibbleTable, kCrc16, cc);
print_crc_table(cc);
return 0;
}

 

All Product names are trademarks or registered trademarks of their respective owners.

Related Tech Notes

© IAR Systems 1995-2016 - All rights reserved.