FreeRTOS on STM32: Tasks, Queues, Mutexes, and How to Organize Firmware into Modules

Building a FreeRTOS skeleton on STM32: tasks, queues, mutexes, and how to split firmware into independent modules.

May 25, 2026 Updated June 25, 2026 11 min read

1. Why you need an RTOS, and when you don’t

On the POS system I work on, firmware has to handle several things at once: receiving commands from the cashier terminal over UART, talking to the printer over a second UART, reading a card reader, updating a display, feeding the watchdog, and occasionally writing config to Flash. If everything runs in one big while(1) loop, sooner or later one of those things will block another.

FreeRTOS solves this by letting each thing run in its own task. The scheduler decides which task runs based on priority and each task’s current state.

But let’s be honest: an RTOS isn’t the answer for every project. If your firmware is a single linear flow (read a sensor, process it, send the result), a super loop with interrupts is enough and far simpler. An RTOS adds complexity, adds RAM overhead (a stack per task), and introduces new categories of bugs that this post will get into.

The entire architecture in this post has been implemented in a real skeleton project on a NUCLEO-G0B1RE: 5 tasks, 1 queue, 1 mutex, and counters measured directly from the board. See the details in STM32 FreeRTOS Firmware Skeleton.

2. Tasks, the basic unit of FreeRTOS

Each task is a C function running in an infinite loop, with its own stack and its own state:

void TaskWorker(void *argument)
{
    for (;;) {
        AppEvent_t event;
        if (xQueueReceive(xAppEventQueue, &event, portMAX_DELAY) == pdTRUE) {
            ProcessEvent(&event);
        }
    }
}

You create a task with xTaskCreate() or through CubeMX (using the CMSIS RTOS v2 wrapper):

/* Bare FreeRTOS API */
xTaskCreate(
    TaskWorker,         /* Task function */
    "Worker",           /* Debug name */
    256,                /* Stack size, in WORDS, not bytes */
    NULL,               /* Parameter */
    3,                  /* Priority: higher number = higher priority */
    &xWorkerTaskHandle);

/* Or CMSIS RTOS v2 (CubeMX-generated) */
workerTaskHandle = osThreadNew(TaskWorker, NULL, &workerTask_attributes);

Stack size, an easy thing to get wrong

With the native FreeRTOS API xTaskCreate(), stack depth is given in words (units of StackType_t), not bytes. On Cortex-M, 1 word = 4 bytes, so 256 works out to 1024 bytes.

But with CMSIS-RTOS v2’s osThreadNew(), the .stack_size field in osThreadAttr_t is passed in bytes, not words. So if you want the same 256-word stack as the example above, you need to write it explicitly:

osThreadAttr_t workerTask_attributes = {
    .name = "Worker",
    .stack_size = 256 * 4,  /* bytes, not words */
    .priority = (osPriority_t) osPriorityNormal,
};

This is a very easy mistake to make when you’re reading the original FreeRTOS docs while also using code generated by CubeMX/CMSIS-RTOS. You read “stack size 256,” write .stack_size = 256 straight into the osThreadAttr_t struct, and end up allocating only a quarter of the stack you intended (256 bytes, or 64 words), which often leads to a stack overflow you can’t explain.
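
One way to keep the two unit conventions from drifting apart is to state the stack size in words once and derive the byte value from it. The following is a minimal sketch, assuming a 4-byte stack word as on Cortex-M; the macro and function names are hypothetical and are not part of FreeRTOS or CMSIS-RTOS.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the stack word is 4 bytes wide, as on Cortex-M. */
#define STACK_WORD_BYTES  ((size_t)sizeof(uint32_t))

/* Hypothetical helper: words (xTaskCreate units) to bytes
 * (osThreadAttr_t.stack_size units). */
size_t StackWordsToBytes(size_t words)
{
    return words * STACK_WORD_BYTES;
}

/* Hypothetical helper: bytes to words, rounding down to whole words. */
size_t StackBytesToWords(size_t bytes)
{
    return bytes / STACK_WORD_BYTES;
}
```

Read this way, a bare .stack_size = 256 is StackBytesToWords(256), that is 64 words, while the intended 256-word stack is StackWordsToBytes(256), that is 1024 bytes.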

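Too small a stack leads to overflow and crashes with no obvious cause, and these are usually hard to debug because the crash happens far from where it actually started. FreeRTOS provides a high-water mark to measure real stack usage (uxTaskGetStackHighWaterMark(), available when INCLUDE_uxTaskGetStackHighWaterMark is set to 1 in FreeRTOSConfig.h):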

/* See how much stack is left; call while the system is running normally */
UBaseType_t remaining = uxTaskGetStackHighWaterMark(NULL); /* NULL = current task */
/* If remaining is small (under 20 words), increase the stack */
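
The watermark is easier to act on when it is turned into peak usage and a pass/fail margin. Below is a host-side sketch of that arithmetic, assuming the watermark and the stack depth are both in words; StackReport_t and StackReport() are hypothetical names, and on target the second argument would come from uxTaskGetStackHighWaterMark().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical report built from two numbers: the depth passed at task
 * creation and the value returned by uxTaskGetStackHighWaterMark(). */
typedef struct {
    uint32_t peak_used_words;  /* worst-case usage seen so far */
    uint32_t peak_used_pct;    /* same, as a percentage of the stack */
    bool     too_tight;        /* true if the free margin is under the limit */
} StackReport_t;

StackReport_t StackReport(uint32_t depth_words, uint32_t high_water_words,
                          uint32_t min_free_words)
{
    StackReport_t r = {0};

    if (depth_words == 0u || high_water_words > depth_words) {
        r.too_tight = true;    /* inconsistent input: treat as a failure */
        return r;
    }
    r.peak_used_words = depth_words - high_water_words;
    r.peak_used_pct   = (r.peak_used_words * 100u) / depth_words;
    r.too_tight       = (high_water_words < min_free_words);
    return r;
}
```

For a 256-word stack with 12 words left and a 20-word limit, the report shows 244 words used at peak and flags the stack as too tight.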

During development, enable configCHECK_FOR_STACK_OVERFLOW in FreeRTOSConfig.h:

#define configCHECK_FOR_STACK_OVERFLOW  2  /* Method 2, more thorough checking */

void vApplicationStackOverflowHook(TaskHandle_t xTask, char *pcTaskName)
{
    /* Whichever task overflows its stack ends up here */
    /* Blink an error LED or log the task name before hanging */
    Error_Handler();
}

3. Priority and the scheduler, understanding it so it doesn’t surprise you

FreeRTOS uses a preemptive priority scheduler: whichever task has the highest priority and is in the Ready state takes the CPU. A running task gets preempted as soon as a higher-priority task becomes ready.

High priority --- Task_CardReader (priority 5)  <- blocked waiting for a card
Mid priority  --- Task_Printer    (priority 3)  <- currently running
Low priority  --- Task_UI         (priority 1)  <- preempted
Low priority  --- Task_Logger     (priority 1)
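
The selection rule in the diagram can be modelled in a few lines of host-side C. This is a toy sketch, not the FreeRTOS scheduler: ModelTask_t and PickNextTask() are hypothetical names, and ties simply go to the first entry, whereas FreeRTOS by default time-slices between ready tasks of equal priority.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy task descriptor for a host-side model; not a FreeRTOS type. */
typedef struct {
    const char *name;
    unsigned    priority;  /* higher number = higher priority */
    bool        ready;     /* false = blocked */
} ModelTask_t;

/* Return the ready task with the highest priority, or NULL if none is
 * ready. Ties go to the first entry in the array. */
const ModelTask_t *PickNextTask(const ModelTask_t *tasks, size_t count)
{
    const ModelTask_t *best = NULL;

    for (size_t i = 0; i < count; i++) {
        if (tasks[i].ready &&
            (best == NULL || tasks[i].priority > best->priority)) {
            best = &tasks[i];
        }
    }
    return best;
}
```

With Task_CardReader blocked, the model picks Task_Printer; as soon as Task_CardReader is marked ready, it wins, which is the preemption described above.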

A reasonable approach to assigning priority in a multi-task firmware:

Priority level	Typical tasks	Note
High priority (4-5)	Time-critical: card reads, host commands	If this task is late, it directly affects the transaction.
Medium priority (3)	Output: printing, sending responses	Important, but can tolerate a few ms of delay.
Low priority (1-2)	Background: UI updates, logging, watchdog feeding	Runs whenever no important task needs the CPU.
Priority 0	FreeRTOS's idle task	Don't put application tasks here.

The skeleton project uses a simpler model than the table above (tasks don’t have a wide spread of priority levels), because its goal is to prove that task timing and queue communication work correctly, not to test priority inversion. Priority inheritance and priority inversion will get their own post once there’s a project with enough priority levels to actually measure that phenomenon.

4. Communication between tasks, the Queue is the default choice for data transfer

A queue passes structured data from a producer to a consumer. It’s thread-safe, ISR-safe (with the FromISR API), and can hold multiple items.

/* Create a queue holding up to 8 AppEvent_t */
xAppEventQueue = xQueueCreate(8, sizeof(AppEvent_t));

/* Producer: send an event into the queue */
AppEvent_t event = {
    .type = APP_EVENT_SENSOR_SAMPLE,
    .tick = xTaskGetTickCount(),
    .value = 42
};
if (xQueueSend(xAppEventQueue, &event, pdMS_TO_TICKS(100)) != pdTRUE) {
    dropped_count++; /* Queue full, event dropped */
}

/* Consumer: receive an event from the queue (blocks until one arrives) */
AppEvent_t received_event;
xQueueReceive(xAppEventQueue, &received_event, portMAX_DELAY);

A queue copies data; the receiver gets a copy, not a pointer into the sender’s memory. That is safer, but watch the item size: for queue items larger than around 32 bytes, it’s usually better to pass a pointer instead of copying, provided the buffer it points to stays valid until the consumer is done with it.
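
The copy semantics can be seen in a small host-side model. This is a single-threaded sketch with no locking, not the FreeRTOS queue implementation; the 12-byte AppEvent_t layout and the ModelQueue names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed 12-byte layout; the skeleton's real AppEvent_t may differ. */
typedef struct {
    uint32_t type;
    uint32_t tick;
    int32_t  value;
} AppEvent_t;

#define MODEL_QUEUE_SLOTS 8u

/* Host-side model of a copy-by-value queue (no locking). */
typedef struct {
    AppEvent_t slots[MODEL_QUEUE_SLOTS];
    uint32_t   head;   /* index of the oldest item */
    uint32_t   count;  /* items currently stored */
} ModelQueue_t;

bool ModelQueueSend(ModelQueue_t *q, const AppEvent_t *item)
{
    if (q->count == MODEL_QUEUE_SLOTS) {
        return false;                      /* full: caller counts the drop */
    }
    uint32_t tail = (q->head + q->count) % MODEL_QUEUE_SLOTS;
    memcpy(&q->slots[tail], item, sizeof(*item));  /* the copy happens here */
    q->count++;
    return true;
}

bool ModelQueueReceive(ModelQueue_t *q, AppEvent_t *out)
{
    if (q->count == 0u) {
        return false;                      /* empty */
    }
    memcpy(out, &q->slots[q->head], sizeof(*out));
    q->head = (q->head + 1u) % MODEL_QUEUE_SLOTS;
    q->count--;
    return true;
}
```

If the producer changes its local event after a successful send, the consumer still receives the values that were present at send time, because the queue holds its own copy.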

With a 12-byte event struct and an 8-slot queue, the queue holds only 96 bytes total, small enough to calculate exactly when it will fill up. This is how the skeleton project designed its Queue Stress test: generate a burst of 20 events while the queue only holds 8, so the dropped counter increases according to the math, not by chance.
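
The burst arithmetic can be written down as a small model. This sketch assumes the consumer does not drain the queue during the burst and the producer does not wait for space; BurstResult_t and SimulateBurst() are hypothetical names, not part of the skeleton project.

```c
#include <stdint.h>

/* Result of pushing one burst into a bounded queue. */
typedef struct {
    uint32_t accepted;
    uint32_t dropped;
} BurstResult_t;

/* Model assumption: nothing is consumed while the burst is being sent,
 * and a send to a full queue fails immediately. */
BurstResult_t SimulateBurst(uint32_t capacity, uint32_t already_queued,
                            uint32_t burst_size)
{
    uint32_t free_slots =
        (already_queued < capacity) ? (capacity - already_queued) : 0u;
    BurstResult_t r;

    r.accepted = (burst_size < free_slots) ? burst_size : free_slots;
    r.dropped  = burst_size - r.accepted;
    return r;
}
```

For an empty 8-slot queue and a 20-event burst, the model gives 8 accepted and 12 dropped. If the consumer does run during the burst, or the producer blocks with a timeout, more events can be accepted than this model predicts.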

Accounting check: making sure events don’t silently disappear

A much stronger way to verify a queue is to not just look at dropped_count, but to trace where every single event actually went.

In the hands-on project STM32 FreeRTOS Firmware Skeleton, the stress test generates 20 events per burst while the queue only has 8 slots. Each burst produces:

generated=20 accepted=8 dropped=12

In a longer snapshot:

generated=4220
processed=1680
dropped=2532
queue=8

You can verify this with a simple formula:

processed + dropped + queue = generated
1680 + 2532 + 8 = 4220

The key point is that the firmware never lets an event silently disappear. Every event generated must end up in one of three states: processed, dropped and counted, or still sitting in the queue. In real firmware, this kind of accounting check makes debugging overload conditions far clearer than simply watching whether the system “still runs.”
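
The same check is easy to automate. A minimal sketch, assuming the four counters are captured into one snapshot; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Counter snapshot as printed by the monitor; field names are assumptions. */
typedef struct {
    uint32_t generated;
    uint32_t processed;
    uint32_t dropped;
    uint32_t queued;     /* items still waiting in the queue */
} EventAccounting_t;

/* True when every generated event is accounted for. */
bool EventAccountingBalanced(const EventAccounting_t *s)
{
    /* 64-bit sum so the check itself cannot wrap around. */
    uint64_t accounted = (uint64_t)s->processed + s->dropped + s->queued;

    return accounted == (uint64_t)s->generated;
}
```

On a running target the counters are updated by different tasks, so a snapshot taken while an event is between "received" and "counted as processed" may be off by one; comparing several consecutive snapshots is probably more reliable than judging a single one.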

5. Mutex, mutual exclusion for a shared resource

A mutex protects a shared resource, ensuring only one task accesses it at a time. It has ownership (only the task that took it can give it back) and priority inheritance to reduce priority inversion.

/* Protecting a UART log shared across multiple tasks */
if (xSemaphoreTake(xUartLogMutex, pdMS_TO_TICKS(50)) == pdTRUE) {
    HAL_UART_Transmit(&huart2, (uint8_t*)buffer, len, 1000);
    xSemaphoreGive(xUartLogMutex);
}

This is exactly how the skeleton project uses its UART Log Mutex: the Worker Task and the Monitor Task both need to print logs, and without a mutex, two log lines could end up with their characters interleaved mid-print. The concrete test and the real log proving this live in the hands-on section.

6. ISRs: what you must never do

This is a list worth memorizing, not just skimming:

/* INSIDE AN ISR, DO NOT USE:
 *   xQueueSend()            -> use xQueueSendFromISR() instead
 *   vTaskDelay()            -> NEVER, an ISR must not block
 *   printf() / vsnprintf()  -> heavyweight, not ISR-safe
 *   HAL_UART_Transmit()     -> blocking, never inside an ISR
 *   malloc() / free()       -> not thread-safe inside an ISR
 */

/* INSIDE AN ISR, always pair with portYIELD_FROM_ISR */
void MY_IRQHandler(void)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    xQueueSendFromISR(xQueue, &data, &xHigherPriorityTaskWoken);
    /* If this send just woke a task with higher priority than the
     * one currently running, yield immediately so it runs on ISR exit */
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

Omitting portYIELD_FROM_ISR doesn’t crash anything right away, but the newly woken higher-priority task doesn’t run until the scheduler next runs, typically at the next tick interrupt, which adds unnecessary latency.

The skeleton project currently generates synthetic events from within a task (the Event Generator Task); there’s no real ISR sending into the queue yet. The case of a real ISR feeding a queue, along with the subtle bugs that arise when a signal gets lost between an ISR and a task, will get their own post once the project has a real interrupt source to measure against.

7. Splitting tasks into independent modules

This is the part I find most valuable when organizing code for a multi-task firmware.

The real problem: if code isn’t clearly separated, tasks tend to call each other’s functions directly, a change to one task can affect another without anyone noticing, and bugs show up in places nobody expected.

The fix: each task is its own independent module, communicating only through queues or mutexes, never calling another module’s functions directly.

firmware/
├─ Core/
│   ├─ Inc/
│   │   ├─ app_main.h
│   │   ├─ app_tasks.h
│   │   ├─ app_events.h
│   │   └─ uart_log.h
│   └─ Src/
│       ├─ app_main.c      (init, create RTOS objects, start the scheduler)
│       ├─ app_tasks.c     (task entry functions, counters, monitor output)
│       ├─ app_events.c    (AppEvent_t definition, event-building helpers)
│       └─ uart_log.c      (logging function using the mutex)

Principles for genuinely independent tasks

1. Tasks only communicate through a queue or mutex, never call another task’s function directly:

/* Correct: the Event Generator Task sends an event into the queue for the Worker Task */
AppEvent_t event = BuildSensorEvent(value);
xQueueSend(xAppEventQueue, &event, pdMS_TO_TICKS(100));

/* Wrong: calling the Worker Task's handler function directly */
ProcessEvent(&event);  /* Who guarantees Worker is ready, or what context it's in right now? */

2. Each counter is owned by exactly one task.

This is the Counter Ownership principle in the skeleton project: fast_count is only ever incremented by the Fast Task, heartbeat_count only by the Heartbeat Task, processed_count only by the Worker Task. The Monitor Task only reads and prints, never computing a counter from uptime on its own.

/* Correct: the Monitor Task only reads a value some other task incremented */
UartLog_Printf("[MONITOR] fast=%lu heartbeat=%lu\n", fast_count, heartbeat_count);

/* Wrong: the Monitor Task derives the number itself */
uint32_t estimated_heartbeat = uptime_ms / 500;  /* This is no longer measured evidence */

Why this principle matters: if the Monitor Task recomputed a counter itself, the number it prints would no longer prove the Heartbeat Task actually ran correctly; it would only prove the division was correct.
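
The difference shows up as soon as a task stops running. In the host-side sketch below, both function names are hypothetical and the 500 ms period is taken from the example above: a measured counter that stops advancing is detectable, while the derived estimate keeps growing regardless.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the measured counter did not advance
 * between two consecutive monitor periods. */
bool CounterStalled(uint32_t previous, uint32_t current)
{
    return current == previous;
}

/* The "wrong" approach from the text: derived from uptime, so it keeps
 * advancing even if the Heartbeat Task never runs again. */
uint32_t EstimatedHeartbeat(uint32_t uptime_ms)
{
    return uptime_ms / 500u;
}
```

If heartbeat_count reads 10 in two consecutive monitor periods, CounterStalled() reports the problem, whereas EstimatedHeartbeat() would move from 10 at 5000 ms to 12 at 6000 ms as if nothing had happened.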

3. uart_log.h is a shared API, not a place for tasks to call HAL directly:

/* uart_log.h */
void UartLog_Init(UART_HandleTypeDef *huart, osMutexId_t mutex);
void UartLog_Printf(const char *format, ...);

The mutex is passed in at init time, because uart_log.c needs to hold a reference to it for every subsequent call to UartLog_Printf(). Every task logs through UartLog_Printf(), never through HAL_UART_Transmit() directly. When you need to change UARTs or add a policy (rate-limiting logs, adding timestamps), there’s exactly one file to touch: uart_log.c.
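
To illustrate the single-entry-point idea, here is a host-testable variant of such a module. It is a sketch under stated assumptions, not the skeleton project's uart_log.c: instead of a UART handle and a mutex, it takes a hypothetical UartLogBackend_t of function pointers (UartLog_InitBackend() is also hypothetical), which on target would wrap the mutex take/give and the blocking UART transmit. The Stub functions only exist to exercise it off-target.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical backend hooks supplied at init time. */
typedef struct {
    bool (*lock)(void);                          /* false = timed out */
    void (*unlock)(void);
    void (*write)(const char *data, size_t len);
} UartLogBackend_t;

static UartLogBackend_t s_backend;

void UartLog_InitBackend(const UartLogBackend_t *backend)
{
    s_backend = *backend;
}

/* Assumes UartLog_InitBackend() was called first. */
void UartLog_Printf(const char *format, ...)
{
    char line[128];   /* lives on the caller's stack: budget for it */
    va_list args;

    /* Format outside the lock so it is held only while sending. */
    va_start(args, format);
    int n = vsnprintf(line, sizeof(line), format, args);
    va_end(args);
    if (n <= 0) {
        return;                                  /* nothing to send */
    }
    size_t len = ((size_t)n < sizeof(line)) ? (size_t)n : sizeof(line) - 1u;

    if (!s_backend.lock()) {
        return;                                  /* busy: this line is skipped */
    }
    s_backend.write(line, len);
    s_backend.unlock();
}

/* --- Host-side stubs, for off-target testing only --- */
char   g_captured[256];
size_t g_captured_len;
bool   g_lock_available = true;
int    g_lock_depth;

bool StubLock(void)
{
    if (!g_lock_available) {
        return false;
    }
    g_lock_depth++;
    return true;
}

void StubUnlock(void)
{
    g_lock_depth--;
}

void StubWrite(const char *data, size_t len)
{
    size_t room = sizeof(g_captured) - 1u - g_captured_len;

    if (len > room) {
        len = room;                              /* truncate to fit */
    }
    memcpy(&g_captured[g_captured_len], data, len);
    g_captured_len += len;
    g_captured[g_captured_len] = '\0';
}
```

Because every task goes through UartLog_Printf(), swapping the backend changes the transport or the locking policy without touching any task code. The 128-byte line buffer sits on the calling task's stack, so it counts against that task's stack budget.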

8. A FreeRTOS checklist for a firmware skeleton

Before deploying a FreeRTOS firmware skeleton

Has every task's stack size been checked with uxTaskGetStackHighWaterMark()?
Is configCHECK_FOR_STACK_OVERFLOW enabled in the development build?
Do ISRs only use the FromISR API (xQueueSendFromISR, etc.)?
Is portYIELD_FROM_ISR called at the end of every ISR that uses a FromISR API?
Does each counter have exactly one task allowed to increment it?
Does the Monitor Task only read and print counters, never recompute them from uptime?
Does UART logging use a mutex to prevent interleaved characters between tasks?
Does the queue have a clear full policy (drop and count)?
Does each task have its own module file, without calling another task's function directly?

9. Summary

FreeRTOS isn’t hard, but there are plenty of details where one mistake produces a bug that’s very hard to track down:

Task stacks tend to be set too small; use the watermark to measure
A Queue is the default choice for passing structured data between tasks
A Mutex protects shared resources like UART logging, with ownership and priority inheritance
An ISR must never use blocking or non-FromISR APIs
Each counter should have a single owner, so the measured evidence actually means something
Splitting tasks into independent modules makes debugging easier, whether you’re working alone or with a team
