 A simple system clock module for AVR

26 Jul 2021 - tsp
Last update 26 Jul 2021 13 mins

The following blog post describes the implementation and workings of a simple module for AVR microcontrollers that allows keeping track of elapsed time and a simple implementation of functions such as delay and micros that are commonly known from various libraries and runtimes (such as the C standard library, Arduino, etc.). It shows how one is able to implement such functions utilizing the timer 0 of AVRs and the simple equations that are required to keep track of time.

Initialization and timer 0 overflow ISR

In my module timer 0 is configured to run off the system frequency F_CPU, for example $16 MHz$ or $16000000 Hz$. Since the frequency is inversely proportional to the period one can directly calculate the duration of a single clock cycle:

[ \Delta t_{cpu} = \frac{1}{f_{cpu}} \\ \Delta t_{cpu} = \frac{1}{16 * 10^6 Hz} \\ \Delta t_{cpu} = 6.25 * 10^{-8} s \\ \Delta t_{cpu} = 62.5 * 10^{-9} s = 62.5 ns ]

As one can see each clock pulse takes $62.5$ nanoseconds. This is way too fast for an ISR to trigger (the ISR would only have a single instruction to finish), so one applies a prescaler. The larger the prescaler the lower the resolution of the clock but the less time is wasted on the timekeeping task. The period of a single timer tick scales linearly with the prescaler - a divide-by-64 prescaler for example yields a period of $\Delta t = 62.5 * 64 ns = 4000 ns$ and thus $\Delta t = 4 \mu s$. An ISR that is triggered on every tick would have only 64 clock cycles to handle its task before the next one is triggered. This would still be way too much work for the microcontroller - and even if possible a huge proportion of the available time would be used for timekeeping without providing any more useful functionality.

[ \Delta t_{tick} = \frac{1}{f_{cpu}} * n_{prescaler} \\ \Delta t_{tick} = \Delta t_{cpu} * n_{prescaler} \\ \Delta t_{tick} = 62.5 ns * 64 = 4000 ns = 4 \mu s ]

To divide the clock even further I’m only using the overflow interrupt of timer 0. Since timer 0 is an 8 bit timer it overflows every 256 ticks - the overflow interrupt will thus only be triggered once every

[ \Delta t_{overflow} = \frac{1}{f_{cpu}} * n_{prescaler} * 256 \\ \Delta t_{overflow} = 4 \mu s * 256 = 1024 \mu s ]

A period of $1024$ microseconds sounds way less problematic - this is a single overflow interrupt every 16384 clock cycles. This is the first definition that’s calculated in sysclk.h:

[ \Delta t_{overflow} = \frac{1}{f_{cpu}} * n_{prescaler} * 256 \text{seconds} \\ \Delta t_{overflow} = \frac{1}{f_{cpu}} * n_{prescaler} * 256 * 1000000 \text{microseconds} \\ \Delta t_{overflow} = \frac{1000000 * 256 * 64}{f_{cpu}} ]
#define SYSCLK_TIMER_OVERFLOW_MICROS	(64L * ((256L * 1000000L) / F_CPU))

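The parentheses are placed in a way that no intermediate result overflows a 32 bit long: $256 * 1000000$ still fits while $64 * 256 * 1000000$ would not. Keep in mind that the integer division truncates, so the value is exact only when F_CPU divides $256000000$ evenly (which is the case for 8 MHz and 16 MHz). From this value one can calculate the milliseconds per overflow, rounded down to the nearest integer millisecond: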

#define SYSCLK_MILLI_INCREMENT			(SYSCLK_TIMER_OVERFLOW_MICROS / 1000L)

To reduce the drift due to the rounding error the module also keeps track of the fractional part of a millisecond. This is done by adding the remaining microseconds whenever the timer overflows. As soon as these microseconds sum up to a full millisecond the millisecond counter is advanced by one additional step. Unfortunately the remainder of a division by $1000$ can be as large as $999$ and would require 10 bits of storage. To still fit into a single byte and allow useful addition only the most significant 7 bits of the remainder are stored, i.e. the value is shifted right by 3 bits (divided by 8). The threshold that represents a full millisecond has to be shifted by the same amount:

#define SYSCLK_MILLIFRACT_INCREMENT		((SYSCLK_TIMER_OVERFLOW_MICROS % 1000L) >> 3)
#define SYSCLK_MILLIFRACT_MAXIMUM		(1000 >> 3)
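
To check the arithmetic one can evaluate the definitions on a desktop machine. The following is only a sketch under the assumption of a 16 MHz clock; the function print_sysclk_constants is a made up helper for this illustration and not part of the module:

```c
#include <stdio.h>

/* Assumption for this sketch: 16 MHz system clock */
#define F_CPU 16000000L

#define SYSCLK_TIMER_OVERFLOW_MICROS    (64L * ((256L * 1000000L) / F_CPU))
#define SYSCLK_MILLI_INCREMENT          (SYSCLK_TIMER_OVERFLOW_MICROS / 1000L)
#define SYSCLK_MILLIFRACT_INCREMENT     ((SYSCLK_TIMER_OVERFLOW_MICROS % 1000L) >> 3)
#define SYSCLK_MILLIFRACT_MAXIMUM       (1000 >> 3)

/* Hypothetical helper: prints the derived constants */
void print_sysclk_constants(void) {
    printf("overflow period [us]: %ld\n", (long)SYSCLK_TIMER_OVERFLOW_MICROS);
    printf("milli increment:      %ld\n", (long)SYSCLK_MILLI_INCREMENT);
    printf("fract increment:      %ld\n", (long)SYSCLK_MILLIFRACT_INCREMENT);
    printf("fract maximum:        %d\n", SYSCLK_MILLIFRACT_MAXIMUM);
}
```

For 16 MHz the overflow period evaluates to 1024 microseconds, so every overflow adds one full millisecond plus 24 microseconds, which are stored as 3 units of 8 microseconds each; 125 of these units make up a millisecond.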

The ISR that’s then called on every timer overflow performs a pretty simple job:

• Increment the internal current millisecond counter by SYSCLK_MILLI_INCREMENT
• Increment the internal fractional millisecond counter by SYSCLK_MILLIFRACT_INCREMENT
• Whenever the fractional counter reaches or exceeds SYSCLK_MILLIFRACT_MAXIMUM the internal millisecond counter is incremented by one more and the consumed amount is subtracted from the fractional counter.
• An additional internal tick counter systemMonotonicOverflowCnt is incremented on every timer overflow.
volatile unsigned long int systemMillis					= 0;
volatile unsigned long int systemMilliFractional		= 0;
volatile unsigned long int systemMonotonicOverflowCnt	= 0;

ISR(TIMER0_OVF_vect) {
    unsigned long int m, f;

    /* Work on local copies of the volatile counters */
    m = systemMillis;
    f = systemMilliFractional;

    m = m + SYSCLK_MILLI_INCREMENT;
    f = f + SYSCLK_MILLIFRACT_INCREMENT;

    /* The fractional part reached a full millisecond */
    if(f >= SYSCLK_MILLIFRACT_MAXIMUM) {
        f = f - SYSCLK_MILLIFRACT_MAXIMUM;
        m = m + 1;
    }

    systemMonotonicOverflowCnt = systemMonotonicOverflowCnt + 1;

    systemMillis = m;
    systemMilliFractional = f;
}
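
To see that the fractional counter really compensates the rounding error one can run the same arithmetic on a desktop machine. This is a minimal sketch that assumes the 16 MHz constants (1, 3 and 125); SimClock, sim_overflow and sim_run are names invented for this illustration:

```c
#include <stdint.h>

/* Assumed constants for F_CPU = 16 MHz (overflow period 1024 us) */
#define MILLI_INCREMENT       1u
#define MILLIFRACT_INCREMENT  3u
#define MILLIFRACT_MAXIMUM    125u

typedef struct {
    uint32_t millis;     /* mirrors systemMillis */
    uint32_t fract;      /* mirrors systemMilliFractional */
    uint32_t overflows;  /* mirrors systemMonotonicOverflowCnt */
} SimClock;

/* Same steps as the overflow ISR, applied to a plain struct */
static void sim_overflow(SimClock *c) {
    c->millis += MILLI_INCREMENT;
    c->fract += MILLIFRACT_INCREMENT;
    if (c->fract >= MILLIFRACT_MAXIMUM) {
        c->fract -= MILLIFRACT_MAXIMUM;
        c->millis += 1;
    }
    c->overflows += 1;
}

/* Simulates n timer overflows starting from a zeroed clock */
SimClock sim_run(uint32_t n) {
    SimClock c = { 0, 0, 0 };
    for (uint32_t i = 0; i < n; i++) {
        sim_overflow(&c);
    }
    return c;
}
```

After 125 simulated overflows, which correspond to $125 * 1024 \mu s = 128 ms$, the millisecond counter holds 128 and the fractional counter is back at zero, so no drift remains.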

The initialization is pretty simple:

• Disable interrupts
• Configure the timer 0 without any output compare modes in normal operation mode (no PWM, no special waveform generation modes, etc.) by setting TCCR0A to 0x00
• Disable any forced output compare and set the clock prescaler to $\frac{1}{64}$ by setting TCCR0B to 0x03
• Enable the timer overflow interrupt by setting the timer overflow interrupt enable bit TOIE0 in TIMSK0
• Make sure timer 0 is not shut down by the power reduction feature by writing 0 to PRTIM0 in PRR
• Re-enable interrupts in case they’d been enabled before invocation of the init function.
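void systickInit() {
    uint8_t sregOld = SREG;     /* Remember the previous interrupt state */

    cli();

    TCCR0A = 0x00;              /* Normal mode, no output compare, no PWM */
    TCCR0B = 0x03;              /* /64 prescaler */
    TIMSK0 = 0x01;              /* Enable overflow interrupt */

    PRR = PRR & (~0x20);        /* Clear PRTIM0 so timer 0 stays powered */

    SREG = sregOld;             /* Restore the previous interrupt state */
}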
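
The magic numbers written to the registers can also be composed from named bit positions. On the target these names are provided by avr/io.h; the sketch below redefines them with a SIM_ prefix (an assumption made only so the example compiles on a desktop machine) using the bit positions of the ATmega328P:

```c
#include <stdint.h>

/* Bit positions on the ATmega328P; on the target use CS00, CS01,
   TOIE0 and PRTIM0 from <avr/io.h> instead of these stand-ins */
enum {
    SIM_CS00   = 0,  /* TCCR0B clock select bit 0 */
    SIM_CS01   = 1,  /* TCCR0B clock select bit 1 */
    SIM_TOIE0  = 0,  /* TIMSK0 overflow interrupt enable */
    SIM_PRTIM0 = 5   /* PRR power reduction bit of timer 0 */
};

/* CS01 and CS00 set selects the /64 prescaler */
uint8_t tccr0b_prescaler64(void) {
    return (uint8_t)((1u << SIM_CS01) | (1u << SIM_CS00));
}

/* Only the overflow interrupt is enabled */
uint8_t timsk0_overflow_enable(void) {
    return (uint8_t)(1u << SIM_TOIE0);
}

/* Mask that has to be cleared in PRR to keep timer 0 powered */
uint8_t prr_timer0_mask(void) {
    return (uint8_t)(1u << SIM_PRTIM0);
}
```

The three helpers evaluate to 0x03, 0x01 and 0x20, which are exactly the constants used in systickInit.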

A simple millis() function

The millis function should just return the current time in milliseconds modulo an implementation specific word size. Since we’re counting in the systemMillis variable this is pretty easy. The variable is four bytes wide and an 8 bit AVR reads it byte by byte, so interrupts are disabled during the copy to prevent the ISR from changing the value halfway through a read. Afterwards the previous interrupt state is restored:

unsigned long int millis() {
    unsigned long int m;

    uint8_t srOld = SREG;
    cli();

    m = systemMillis;       /* Atomic copy of the 32 bit counter */
    SREG = srOld;

    return m;
}
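
Since the returned value wraps around modulo the word size, callers should compare timestamps by subtraction and never by a direct less-than comparison. A minimal sketch of such a check, assuming a 32 bit counter as on the AVR (interval_elapsed is a hypothetical helper, not part of the module):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: returns true once at least `interval` milliseconds
   have passed since `start`. The unsigned subtraction is performed modulo
   2^32 and therefore stays correct when the counter wraps around. */
bool interval_elapsed(uint32_t now, uint32_t start, uint32_t interval) {
    return (uint32_t)(now - start) >= interval;
}
```

On the target one would pass the result of millis() as now, for example to toggle an LED every 500 ms without blocking the main loop.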

The more complex micros() function

The micros() function that should deliver the current time in microseconds modulo an implementation specific word size is a little bit more challenging. To implement that function I’m counting the number of overflows that have occurred (every 256’th timer tick). In addition one can use the current value of the TCNT0 register - i.e. the current number of elapsed timer ticks.

[ n_{tickstotal} = n_{TCNT0} + n_{overflow} * 256 ]

Using the duration of a single tick in microseconds one could now calculate the elapsed time:

[ t_{micros} = (n_{TCNT0} + n_{overflow} * 256) * \Delta t_{tick} \\ t_{micros} = (n_{TCNT0} + n_{overflow} * 256) * \frac{n_{prescaler} * 1000000}{f_{cpu}} ]

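There is one drawback though: the timer might overflow right at the moment the registers are read. Since interrupts are disabled the overflow ISR has not run yet, so TCNT0 has already wrapped to a small value while the overflow counter is still one too low. To circumvent that situation one checks whether the timer has a pending overflow by testing the TOV0 bit in the TIFR0 register. The additional check for a counter value below 255 covers the case in which TCNT0 has been read as 255 just before the overflow happened - there the old overflow count is still the correct one: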

unsigned long int micros() {
    uint8_t srOld = SREG;
    unsigned long int overflowCounter;
    unsigned long int timerCounter;

    cli();
    overflowCounter = systemMonotonicOverflowCnt;
    timerCounter = TCNT0;

    /* Overflow pending but not yet handled by the ISR */
    if(((TIFR0 & 0x01) != 0) && (timerCounter < 255)) {
        overflowCounter = overflowCounter + 1;
    }

    SREG = srOld;

    /* Total ticks multiplied by the microseconds per tick */
    return ((overflowCounter << 8) + timerCounter) * (64L / (F_CPU / 1000000L));
}

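Again the parentheses in the calculation have been chosen in a way that prevents overflow of intermediate results. Because of the integer divisions the result is exact only when F_CPU is a whole number of MHz that divides 64 evenly, for example 8 MHz or 16 MHz.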
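
The calculation itself can be tried out without any hardware by passing the register values as arguments. This is only a sketch that assumes a 16 MHz clock and the /64 prescaler; micros_from_counters and the SIM_ constants are invented for this illustration:

```c
#include <stdint.h>

/* Assumptions for this sketch: 16 MHz clock, /64 prescaler */
#define SIM_F_CPU      16000000UL
#define SIM_PRESCALER  64UL

/* Same arithmetic as micros(), with the hardware state passed in:
   overflowCounter - value of systemMonotonicOverflowCnt
   tcnt0           - value read from TCNT0
   overflowPending - nonzero if TOV0 is set in TIFR0 */
uint32_t micros_from_counters(uint32_t overflowCounter, uint8_t tcnt0,
                              int overflowPending) {
    uint32_t microsPerTick = (uint32_t)(SIM_PRESCALER / (SIM_F_CPU / 1000000UL));

    if (overflowPending && (tcnt0 < 255)) {
        overflowCounter = overflowCounter + 1;
    }

    return ((overflowCounter << 8) + tcnt0) * microsPerTick;
}
```

With these assumptions one tick equals 4 microseconds: a counter value of 250 without any overflow corresponds to 1000 microseconds, and one counted overflow plus a pending one with TCNT0 at 3 corresponds to $(2 * 256 + 3) * 4 \mu s = 2060 \mu s$.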

A busy waiting delay(period) function

Based on the monotonic clock and the micros() function one can also implement a simple delay function. It would also be possible to use millis() but that would result in an error of around $\pm 1 ms$. Using micros() reduces the possible error well into the sub millisecond range. In case this precision is not necessary millis() might be the more interesting choice. Keep in mind that busy-waiting is usually considered a bad idea anyway - consider interrupt driven designs instead and only use busy-waiting when it’s really acceptable.

The simplest implementation of the delay first queries the current time in microseconds. Inside the loop it then checks whether the difference between the current and the stored microsecond timestamp is equal to or larger than $1000 \mu s$, which equals $1 ms$. If so, an internal counter that contains the remaining time to wait in milliseconds is decremented and $1000$ is added to the stored timestamp. The timestamp is deliberately not reset to the current value: this way the time spent inside micros() and inside the loop of the delay function does not accumulate as drift, since the reference always stays the initially queried timestamp plus a whole number of milliseconds. Overflow is accounted for automatically by the wrap-around of the unsigned addition and subtraction.

void delay(unsigned long millisecs) {
    unsigned int lastMicro;

    /* Only the lower 16 bits are needed, differences wrap correctly */
    lastMicro = (unsigned int)micros();

    while(millisecs > 0) {
        unsigned int curMicro = (unsigned int)micros();
        if(curMicro - lastMicro >= 1000)  {
            lastMicro = lastMicro + 1000;
            millisecs = millisecs - 1;
        }
    }
    return;
}
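
The claim that the error does not accumulate can be checked with a simulated clock on a desktop machine. The sketch below makes a few assumptions: every call to the fake micros function costs 7 microseconds, uint16_t stands in for the 16 bit unsigned int of the AVR, and fake_micros and simulate_delay are names invented for this illustration:

```c
#include <stdint.h>

static uint32_t fakeNow;             /* simulated microsecond clock */
static const uint32_t fakeStep = 7;  /* assumed cost of one query in us */

/* Stand-in for micros(): every call advances the simulated time */
static uint32_t fake_micros(void) {
    fakeNow += fakeStep;
    return fakeNow;
}

/* Runs the delay loop against the simulated clock and returns the
   simulated time in microseconds that passed inside the loop */
uint32_t simulate_delay(uint32_t startMicros, uint32_t millisecs) {
    fakeNow = startMicros;

    uint16_t lastMicro = (uint16_t)fake_micros();
    uint32_t begin = fakeNow;

    while (millisecs > 0) {
        uint16_t curMicro = (uint16_t)fake_micros();
        /* The cast emulates 16 bit arithmetic on hosts with a wider int */
        if ((uint16_t)(curMicro - lastMicro) >= 1000) {
            lastMicro = (uint16_t)(lastMicro + 1000);
            millisecs = millisecs - 1;
        }
    }
    return fakeNow - begin;
}
```

Starting at 65000 microseconds, so that the 16 bit timestamp wraps during the wait, a delay of 5 ms ends after 5005 simulated microseconds. The overshoot stays below a single query step instead of growing with every millisecond.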

Waiting for microseconds

Implementing a delayMicros(period) function is way more challenging if one wants to be able to specify durations precisely. This function therefore contains some arcane magic in the form of inline assembly that burns a specific number of CPU cycles and pads the function invocation overhead to a known number of clock cycles. It also relies on some knowledge about the code generated by the compiler - so this is not portable and works only for a specific compiler with specific optimization settings. In case one discovers that the routine doesn’t work any more one has to inspect the assembly output and readjust the constants.

void delayMicros(unsigned int microDelay) {
    #if F_CPU == 20000000L
        __asm__ __volatile__ (
            "nop\n"
            "nop\n"
        );
        if((microDelay = microDelay - 1) == 0) {
            return;
        }

        /* 5 loop iterations per microsecond */
        microDelay = (microDelay << 2) + microDelay;
    #elif F_CPU == 16000000L
        if((microDelay = microDelay - 1) == 0) {
            return;
        }

        /* 4 loop iterations per microsecond minus call overhead */
        microDelay = (microDelay << 2) - 2;
    #elif F_CPU == 8000000L
        if((microDelay = microDelay - 1) == 0) {
            return;
        }
        if((microDelay = microDelay - 1) == 0) {
            return;
        }

        /* 2 loop iterations per microsecond minus call overhead */
        microDelay = (microDelay << 1) - 1;
    #else
        #error No known delay loop calibration available for this F_CPU
    #endif

    /* Busy loop; the numeric local label stays unique if the function is inlined */
    __asm__ __volatile__ (
        "1: sbiw %0, 1\n"
        "   brne 1b"
        : "=w" (microDelay)
        : "0" (microDelay)
    );
    return;
}
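
The shift expressions follow from the cycle count of the busy loop: sbiw takes two clock cycles and a taken brne another two, so one iteration burns four cycles. A small sketch that derives the iterations per microsecond from the clock frequency (loop_iterations_per_micro is a hypothetical helper used only for this illustration):

```c
#include <stdint.h>

/* sbiw (2 cycles) + taken brne (2 cycles) */
#define CYCLES_PER_ITERATION 4UL

/* Hypothetical helper: number of busy loop iterations that fit into
   one microsecond at the given CPU frequency in Hz */
uint32_t loop_iterations_per_micro(uint32_t fCpuHz) {
    return (uint32_t)(fCpuHz / 1000000UL / CYCLES_PER_ITERATION);
}
```

This yields 5 iterations at 20 MHz, 4 at 16 MHz and 2 at 8 MHz, which matches the multiplications by 5, 4 and 2 that the shifts in delayMicros perform. The small constants that are subtracted afterwards are the calibration for the call overhead and depend on the generated code.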

Clock source and system clock prescaler

Since this is a common mistake a short note about clock sources is useful: If you’re running the code above and measure that a delay of $1000 ms$ takes around 15 to 16 seconds (varying in exact time), this is most likely due to the fact that AVRs are usually shipped with the internal RC oscillator selected (CKSEL = 0010b), which oscillates at around 8 MHz. In addition the CKDIV8 fuse is programmed (0), so the clock is divided further by a factor of 8. This results in a 1 MHz master clock, 16 times slower than the 16 MHz assumed in F_CPU. (Side note: the startup delay is also maximized by SUT = 10b, which gives the oscillator the longest time available to stabilize before the reset handler is executed.) Since it’s an RC oscillator it’s not that stable, so timing might not be reliable. The specified 8 MHz is the nominal frequency at 25 degrees Celsius and a fixed supply voltage (3V according to the calibration conditions in the ATmega328P datasheet). The same instability also has to be taken into account for the other available internal oscillator, which runs at 128 kHz and is intended for special low power applications.

In case one wants to use an external full swing crystal oscillator (for example at 16 MHz) one has to set a proper clock selection in CKSEL and disable the clock division. The CKDIV8 bit could be programmed by flashing the low fuse byte - but it’s also possible to change the prescaler bits in CLKPR at runtime:

CLKPR = 0x80;
CLKPR = 0;

Changing the clock source away from the internal RC oscillator is not possible at runtime and requires programming the CKSEL and SUT bits in the low fuse byte beforehand. When one uses an external 16 MHz quartz crystal for example one should set CKSEL=1111 and SUT=11. In case one also wants to disable the clock division by 8 on startup the CKDIV8 fuse should be set to 1 (unprogrammed). The last feature that is controlled via the low fuse byte is CKOUT. If this fuse is programmed (0) the divided system clock is output on PB0 independent of any other settings - which is usually not desired. To disable this feature one should set CKOUT=1. Taken together all eight bits are 1, so for the mentioned external 16 MHz quartz the low fuse has to be programmed to 0xFF. This is done by adding the argument -U lfuse:w:0xff:m to the avrdude command.

For example:

avrdude -v -p atmega328p -c avrisp -P /dev/ttyU0 -b 57600 -U lfuse:w:0xff:m -U flash:w:example.hex:i

To revert the fuses to factory defaults one might use

avrdude -v -p atmega328p -c avrisp -P /dev/ttyU0 -b 57600 -U lfuse:w:0x62:m -U hfuse:w:0xd9:m -U efuse:w:0xff:m
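
To double check a low fuse value before flashing it one can decode it bit by bit. The following sketch assumes the low fuse layout of the ATmega328P (bit 7 CKDIV8, bit 6 CKOUT, bits 5 and 4 SUT, bits 3 to 0 CKSEL); LowFuse and decode_low_fuse are names made up for this illustration:

```c
#include <stdint.h>

/* Fields of the low fuse byte; a value of 0 means "programmed" */
typedef struct {
    uint8_t ckdiv8;  /* bit 7: divide clock by 8 when 0 */
    uint8_t ckout;   /* bit 6: clock output on PB0 when 0 */
    uint8_t sut;     /* bits 5..4: startup time selection */
    uint8_t cksel;   /* bits 3..0: clock source selection */
} LowFuse;

/* Splits a low fuse byte into its fields (ATmega328P layout assumed) */
LowFuse decode_low_fuse(uint8_t lfuse) {
    LowFuse f;
    f.ckdiv8 = (uint8_t)((lfuse >> 7) & 0x01);
    f.ckout  = (uint8_t)((lfuse >> 6) & 0x01);
    f.sut    = (uint8_t)((lfuse >> 4) & 0x03);
    f.cksel  = (uint8_t)(lfuse & 0x0F);
    return f;
}
```

Decoding 0xFF yields CKDIV8=1, CKOUT=1, SUT=11 and CKSEL=1111, the configuration for the external 16 MHz quartz. The shipped configuration described above (CKDIV8=0, CKOUT=1, SUT=10, CKSEL=0010) corresponds to 0x62.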

A last word of caution before one plays around with fuses: If one disables SPI program download in the hfuse (which is enabled by default) the usual cheap or self built programmers for AVRs (or an Arduino used as an ISP) won’t work any more. In this case one requires a programmer that supports the high voltage programming mode, which applies 12V to the AVR’s reset pin and then uploads the program using a parallel programming interface instead of the serial one used by SPI based programmers.

Source code

The source code for the whole simple sysclock module is available as a GitHub GIST