SMM in UEFI: System Management Mode from a debug perspective
SMM is not a linear boot phase but an isolated execution mode. SMI triggers, SMRAM, handler timing, SmmReadyToLock, and debugging SetVariable that silently fails.
Why firmware engineers need to understand SMM
There’s a class of bug report I see fairly often when customizing BIOS for POS systems: the customer says “the terminal is acting strange — settings are right today but back to defaults by morning.” They suspect an auto-reset or some malware. It’s neither.
Most of those cases trace back to SMM — or more precisely, to the path from Setup UI to SPI flash that goes through SMM and gets blocked somewhere without a clear error message.
If you’re working on BIOS and you encounter any of these situations:
- BIOS setting is saved but reverts after reboot
- BIOS flash from OS reports success but nothing changes
- Some firmware services stop working once the OS has booted
- SetVariable() returns EFI_ACCESS_DENIED or EFI_WRITE_PROTECTED
There’s a good chance the problem is in SMM, System Management Mode.
SMM is not an obscure academic concept. It’s the layer many platforms use to protect NVRAM/SPI flash writes, handle certain hardware events, and enforce security policy. When it fails or is misconfigured, symptoms tend to be vague — only certain operations don’t work, with no clear error message.
SMM: a simple mental model
The easiest way to think about it: SMM is a completely isolated execution environment that is activated by a special signal called SMI#.
When the CPU receives SMI#:
- The CPU stops everything — including the running OS
- Saves the entire CPU state to a special memory area called the Save State
- Jumps into SMRAM, a memory region the OS cannot read or write after it’s properly locked
- Runs the SMM handler
- Returns via the RSM instruction; the CPU state is restored and the OS continues
1. OS running: CPU is executing normal OS or application code
2. SMI occurs: hardware event or software trigger generates a System Management Interrupt
3. CPU state saved: register state written to the Save State area
4. Enter SMM: CPU runs SMM Core, Dispatcher, and handlers in SMRAM
5. Return from SMM: CPU restores state, OS continues as if nothing happened
The OS never knows SMM ran. The entire process is invisible to the OS — unless SMM runs so slowly that it causes noticeable latency.
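The save, run, restore sequence can be modelled in plain C to build intuition. This is a toy simulation, not SMM code: `CpuState`, `g_save_state`, `smi_entry_and_rsm`, and `example_handler` are hypothetical names, and on real hardware the CPU itself performs the save and restore.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of a CPU register file (hypothetical and heavily simplified). */
typedef struct {
    uint64_t rip;
    uint64_t rax;
    uint64_t rflags;
} CpuState;

/* Stands in for the Save State area inside SMRAM. */
static CpuState g_save_state;

typedef void (*SmiHandler)(CpuState *cpu);

/* A handler is free to clobber live registers while in SMM. */
static void example_handler(CpuState *cpu)
{
    cpu->rax = 0xDEADBEEF;
    cpu->rip = 0x8000; /* pretend we are executing inside SMRAM */
}

/* Models: SMI# asserted -> state saved -> handler runs -> RSM restores. */
static void smi_entry_and_rsm(CpuState *cpu, SmiHandler handler)
{
    memcpy(&g_save_state, cpu, sizeof(*cpu)); /* hardware saves state */
    handler(cpu);                             /* SMM code runs */
    memcpy(cpu, &g_save_state, sizeof(*cpu)); /* RSM restores state */
}
```

After `smi_entry_and_rsm()` returns, the caller's `CpuState` is identical to what it was before the call, which is why the interrupted code cannot tell that anything ran.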
Two common SMI types
Hardware SMI
Generated by the chipset in response to a hardware event: power button, thermal event, GPIO trigger, and others. Platform and board specific.
Software SMI (SW SMI)
Generated by writing to the APM control port, typically 0xB2:
mov al, 0x42 ; command ID, platform-specific
out 0xB2, al ; trigger SW SMI
After the out instruction, the chipset asserts SMI#, the CPU enters SMM, and the dispatcher looks for a handler registered for command 0x42.
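// In firmware, register an SW SMI handler
EFI_SMM_SW_REGISTER_CONTEXT Context = { 0x42 };  // SwSmiInputValue = 0x42
EFI_HANDLE                  DispatchHandle;
Status = SmmSwDispatch2->Register (
                           SmmSwDispatch2,
                           MySwSmiHandler,
                           &Context,
                           &DispatchHandle
                           );
// Never ignore the result: a failed registration is otherwise silent
if (EFI_ERROR (Status)) {
  DEBUG ((DEBUG_ERROR, "SW SMI 0x42 registration failed: %r\n", Status));
}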
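To make the lookup step concrete, here is a minimal model of a SW SMI dispatcher in standalone C. It is a sketch under assumptions: `sw_smi_register`, `sw_smi_dispatch`, and the fixed-size table are hypothetical and do not reflect how any particular SMM Core stores its handlers.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SW_HANDLERS 8

typedef int (*SwSmiHandler)(uint8_t command);

typedef struct {
    uint8_t      command;  /* value written to the APM port, e.g. 0x42 */
    SwSmiHandler handler;
} SwSmiEntry;

static SwSmiEntry g_sw_table[MAX_SW_HANDLERS];
static size_t     g_sw_count;

/* Returns 0 on success, -1 if the table is full or the command is taken. */
static int sw_smi_register(uint8_t command, SwSmiHandler handler)
{
    if (handler == NULL || g_sw_count == MAX_SW_HANDLERS) return -1;
    for (size_t i = 0; i < g_sw_count; i++) {
        if (g_sw_table[i].command == command) return -1;
    }
    g_sw_table[g_sw_count].command = command;
    g_sw_table[g_sw_count].handler = handler;
    g_sw_count++;
    return 0;
}

/* Models the dispatcher: returns the handler's result, or -1 when no
 * handler is registered for this command. */
static int sw_smi_dispatch(uint8_t command)
{
    for (size_t i = 0; i < g_sw_count; i++) {
        if (g_sw_table[i].command == command) {
            return g_sw_table[i].handler(command);
        }
    }
    return -1;
}

/* Example handler: returns command + 1 so a caller can see that it ran. */
static int my_sw_smi_handler(uint8_t command)
{
    return command + 1;
}
```

Dispatching a command nobody registered for simply finds no match. On real hardware that case looks like "the SMI fired and nothing happened", which is worth remembering when a service appears dead.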
POST codes related to SMM
For AMI Aptio 5.x, SMM initialization happens in two phases:
| Code | Phase | Meaning |
|---|---|---|
| 0x36 | PEI | CPU post-memory initialization, SMM initialization |
| 0x6A | DXE | North Bridge DXE SMM initialization |
| 0x71 | DXE | South Bridge DXE SMM initialization |
If the POST code is stuck at 0x6A or 0x71, the problem is in DXE-phase SMM initialization: typically SMRAM was not allocated correctly, or the SMM Core failed to load.
On some boards with a debug build, you may also see messages like:
SMM Core: SMRAM not found
SMM Core: Failed to allocate SMRAM
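When triaging captured POST codes with a script or tool, the three codes from the table above fit in a small lookup. This is an illustrative sketch: `PostCodeInfo` and `lookup_smm_post_code` are hypothetical names, and the table holds only the codes listed in this article.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t     code;
    const char *phase;
    const char *meaning;
} PostCodeInfo;

/* Only the three SMM-related codes from the table above. */
static const PostCodeInfo k_smm_post_codes[] = {
    { 0x36, "PEI", "CPU post-memory initialization, SMM initialization" },
    { 0x6A, "DXE", "North Bridge DXE SMM initialization" },
    { 0x71, "DXE", "South Bridge DXE SMM initialization" },
};

/* Returns the matching entry, or NULL if the code is not in the table. */
static const PostCodeInfo *lookup_smm_post_code(uint8_t code)
{
    size_t n = sizeof(k_smm_post_codes) / sizeof(k_smm_post_codes[0]);
    for (size_t i = 0; i < n; i++) {
        if (k_smm_post_codes[i].code == code) return &k_smm_post_codes[i];
    }
    return NULL;
}
```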
SMRAM: the hidden memory region
SMRAM is a region of DRAM that firmware hides from the OS and DXE after a certain point. After SMRAM is properly closed and locked, only a CPU executing in SMM can read or write that region.
The SMRAM lifecycle:
1. Reserve SMRAM: SMRAM is discovered and reserved, but not yet fully locked
2. Load SMM Core: SMM Core and SMM drivers are loaded into SMRAM
3. Register handlers: SMM drivers register their handlers before the lock
4. SmmReadyToLock: SMRAM is closed/locked; security policy starts being enforced
5. Runtime use: SMRAM is locked; OS cannot access it as regular RAM
After SMRAM is locked, no new SMM modules can be loaded. This is why handlers must be registered before SmmReadyToLock.
A very common mistake: a developer adds a new SMM driver but it dispatches after SmmReadyToLock because of a wrong DEPEX or incorrect dispatch order. The result is that the handler is never registered and the service silently does nothing — no error, just silence.
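The ordering constraint is easy to see in a small model. The sketch below is standalone C with hypothetical names (`SmmModel`, `smm_model_register`, `smm_model_ready_to_lock`); it only illustrates that registration depends on when it happens relative to the lock, not how a real SMM Core implements it.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*Handler)(void);

typedef struct {
    bool    locked;       /* set when SmmReadyToLock fires */
    Handler handlers[4];
    size_t  count;
} SmmModel;

typedef enum { REG_OK = 0, REG_LOCKED, REG_FULL } RegStatus;

/* Registration only succeeds while the model is still unlocked. */
static RegStatus smm_model_register(SmmModel *smm, Handler h)
{
    if (smm->locked) return REG_LOCKED;  /* too late: SMRAM is closed */
    if (smm->count == 4) return REG_FULL;
    smm->handlers[smm->count++] = h;
    return REG_OK;
}

/* Models the SmmReadyToLock event. */
static void smm_model_ready_to_lock(SmmModel *smm)
{
    smm->locked = true;
}

static void noop_handler(void) {}
```

In this model the late caller at least gets `REG_LOCKED` back. On a real platform the late driver may never run at all, so the practical habit is to log every registration result and to confirm in the boot log that the driver dispatched before the lock.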
SmmReadyToLock: the point of no return
In EDK II/PI-style firmware, SmmReadyToLock is typically represented by a protocol or notification signaled in DXE to indicate that SMM is about to be locked. When this signal fires:
- SMRAM is closed and locked
- No new SMM modules can be installed
- Many security policies are enforced
1. DXE execution: SMM drivers can still dispatch and register handlers
2. Register handler: handler registered before lock, OK
3. Register handler: handler registered before lock, OK
4. SmmReadyToLock: SMRAM closed/locked, policy enforced
5. Dispatched late: too late to register handler; service silently does nothing
Timing bugs around SmmReadyToLock are some of the hardest bugs in BIOS development because:
- There’s no assertion or clear error message
- The handler was never registered; the service silently does nothing
- It only reproduces when some driver changes the dispatch order
SMM Communication: the safe path for talking to SMM
When a DXE driver or firmware component needs to request a service inside SMM, it typically goes through the SMM Communication Protocol. After the OS is running, many operations like SetVariable() go through UEFI Runtime Services, and the runtime variable service underneath may trigger an SMI to forward the request into SMM. The OS does not call the SMM Communication Protocol directly the way a DXE driver does.
Path 1: DXE driver or firmware component
1. Firmware component: DXE driver or firmware component creates CommBuffer outside SMRAM
2. SmmCommunication: calls SmmCommunication->Communicate()
3. Trigger SMI: firmware path triggers SMI to enter the SMM handler
4. Handler processes: handler validates input, processes request, writes result to CommBuffer
5. RSM: DXE driver reads the result after SMM returns
Path 2: OS runtime call
1. OS calls Runtime Service: for example, efibootmgr or an OS component calls SetVariable()
2. Runtime variable service: Runtime Services receive the request after ExitBootServices
3. Trigger SMI: runtime variable path may forward the request into SMM
4. SMM variable handler: handler applies policy and writes to NVRAM/SPI flash if valid
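The buffer passed to Communicate() starts with a small header: a GUID that selects the handler, the payload length, then the payload itself. The standalone sketch below mirrors that shape with its own types. `Guid`, `CommHeader`, and `comm_buffer_build` are hypothetical stand-ins, not the PI definitions, so treat the exact layout as an assumption and check the specification for the real structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t bytes[16]; } Guid;

/* Same shape as a communicate header: GUID, payload length, payload. */
typedef struct {
    Guid     header_guid;     /* identifies the target handler */
    uint64_t message_length;  /* size of data[] only, header excluded */
    uint8_t  data[];          /* payload follows the header */
} CommHeader;

/* Builds a request in buf (must be suitably aligned for CommHeader).
 * Returns the total bytes used, or 0 if the request does not fit. */
static size_t comm_buffer_build(void *buf, size_t buf_size, const Guid *guid,
                                const void *payload, size_t payload_len)
{
    size_t total = offsetof(CommHeader, data) + payload_len;
    if (total < payload_len || total > buf_size) return 0; /* overflow or too big */

    CommHeader *h = buf;
    h->header_guid = *guid;
    h->message_length = payload_len;
    memcpy(h->data, payload, payload_len);
    return total;
}
```

The handler on the SMM side sees the same bytes, which is why it must treat every field, including the length, as untrusted input.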
CommBuffer MUST be outside SMRAM. This validation is required in every SMM handler:
// Pseudo code — not production code
if (IsAddressInSmram(CommBuffer, CommBufferSize)) {
return EFI_SECURITY_VIOLATION;
}
Skipping this validation allows an OS-level attacker to create a CommBuffer pointing into SMRAM and read or write sensitive data — this is a well-known class of SMM vulnerability.
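The check itself is a range-overlap test with overflow-safe arithmetic. EDK II provides a library helper for this (`SmmIsBufferOutsideSmmValid` in SmmMemLib); the standalone sketch below only shows the underlying logic for a single SMRAM range. `SmramRange` and `buffer_is_outside_smram` are hypothetical names, and real platforms may have several SMRAM ranges to check.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t base;
    uint64_t size;
} SmramRange;

/* True only if [addr, addr + len) is non-empty, does not wrap around the
 * address space, and does not overlap the SMRAM range. */
static bool buffer_is_outside_smram(uint64_t addr, uint64_t len,
                                    const SmramRange *smram)
{
    if (len == 0) return false;
    if (addr > UINT64_MAX - len) return false;      /* addr + len would wrap */

    uint64_t buf_end   = addr + len;                /* exclusive end */
    uint64_t smram_end = smram->base + smram->size; /* assumed not to wrap */

    /* Half-open ranges are disjoint iff one ends before the other begins. */
    return buf_end <= smram->base || addr >= smram_end;
}
```

The wrap-around test matters as much as the overlap test: an attacker-chosen address near the top of the address space with a large length can otherwise pass a naive comparison.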
Serial log in SMM
Debugging in SMM is harder than in DXE because:
- SMM runs in an isolated environment; some infrastructure normally available in DXE is not accessible here
- The DEBUG() macro can still be used, but requires a DebugLib configured appropriately for SMM
// In an SMM driver, after setup
DEBUG ((DEBUG_INFO, "SMM Handler: received command 0x%x\n", Command));
// Assuming CommBufferSize is a UINTN value here: cast to UINT64 and print with %lu
DEBUG ((DEBUG_INFO, "SMM Handler: CommBuffer at 0x%p, size %lu\n",
  CommBuffer, (UINT64)CommBufferSize));
Patterns to look for when debugging SMM through serial log:
Log from the DXE driver calling Communicate:
VariableSmm: Communicate called, command = SetVariable
VariableSmm: Waiting for SMM...
VariableSmm: SMM returned EFI_SUCCESS
Log from the SMM handler (if serial log is configured for SMM):
SmmVar: Handler entered, GUID = ...
SmmVar: SetVariable 'BootOrder', size = 8
SmmVar: Writing to NVRAM...
SmmVar: Done
If you see the DXE side calling Communicate but no SMM handler log, there are three things to suspect: the handler was not registered, the SMI trigger failed, or the log level is wrong.
Debug diary: setting saved but has no effect
This is something I’ve encountered more than once. On a POS system, after changing some policies — boot timeout, USB policy, network stack — and flashing new firmware, the customer’s technician goes into Setup, changes settings, saves and exits. After reboot everything is back to defaults. They change them again. Reboot again. Same result.
The first instinct is to suspect aggressive default-restore logic somewhere. But after enabling the serial log, one line stands out: SetVariable() = EFI_WRITE_PROTECTED. The failure is silent: no popup and no error message in the UI.
Approach layer by layer:
Layer 1: Is HII/Setup UI calling SetVariable?
Log the DXE variable service. If you don’t see SetVariable being called after the user saves, the problem is in the HII callback or the Setup Browser.
Layer 2: SetVariable is being called but returning an error?
VariableSmm: SetVariable 'SetupData' = EFI_ACCESS_DENIED
EFI_ACCESS_DENIED here typically means: the variable attributes require an authenticated write, or an SMM policy is blocking writes from DXE or the OS.
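Some logs print the raw status value instead of its name. On 64-bit builds, UEFI error codes have the high bit set and a small code number in the low bits, so a handful of values cover most variable-write failures. The helper below is a small standalone sketch; `efi_status_name` is a hypothetical name and only a few statuses are included.

```c
#include <stdint.h>

#define EFI_ERROR_BIT 0x8000000000000000ULL /* high bit on 64-bit builds */

/* Names a few statuses relevant to variable writes, otherwise "UNKNOWN". */
static const char *efi_status_name(uint64_t status)
{
    if (status == 0) return "EFI_SUCCESS";
    if ((status & EFI_ERROR_BIT) == 0) return "UNKNOWN"; /* warnings not covered */

    switch (status & ~EFI_ERROR_BIT) {
    case 2:  return "EFI_INVALID_PARAMETER";
    case 8:  return "EFI_WRITE_PROTECTED";
    case 9:  return "EFI_OUT_OF_RESOURCES";
    case 14: return "EFI_NOT_FOUND";
    case 15: return "EFI_ACCESS_DENIED";
    case 26: return "EFI_SECURITY_VIOLATION";
    default: return "UNKNOWN";
    }
}
```

For example, a log line showing 0x800000000000000F decodes to EFI_ACCESS_DENIED, and 0x8000000000000008 to EFI_WRITE_PROTECTED.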
Layer 3: SetVariable succeeds but SPI flash isn’t written
VariableSmm: SetVariable success
SpiFlash: WriteBlock failed: write protection enabled
Firmware writes to the NVRAM cache in RAM successfully but flushing to SPI flash fails because flash write protection wasn’t unlocked correctly.
Layer 4: SPI flash is written but corrupted
Less common, but possible due to a full variable store or power loss during the write.
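The four layers amount to a first-failure-wins decision sequence. As a compact way to record what the logs show, the sketch below encodes that order in standalone C; `Observations`, `FaultLayer`, and `classify_fault` are hypothetical names used only for illustration.

```c
#include <stdbool.h>

typedef enum {
    LAYER_HII_SETUP = 1,    /* Layer 1: SetVariable never called */
    LAYER_VARIABLE_POLICY,  /* Layer 2: SetVariable returned an error */
    LAYER_FLASH_WRITE,      /* Layer 3: cache updated, SPI write failed */
    LAYER_STORE_CORRUPT,    /* Layer 4: written, but data invalid later */
    LAYER_NONE              /* nothing wrong found along this path */
} FaultLayer;

/* What the serial log and a reboot test told you. */
typedef struct {
    bool setvariable_called;
    bool setvariable_ok;
    bool flash_write_ok;
    bool data_valid_after_reboot;
} Observations;

/* Walks the layers top-down and reports the first one that failed. */
static FaultLayer classify_fault(const Observations *o)
{
    if (!o->setvariable_called)      return LAYER_HII_SETUP;
    if (!o->setvariable_ok)          return LAYER_VARIABLE_POLICY;
    if (!o->flash_write_ok)          return LAYER_FLASH_WRITE;
    if (!o->data_valid_after_reboot) return LAYER_STORE_CORRUPT;
    return LAYER_NONE;
}
```

The order matters: a later layer's observation is meaningless until every earlier layer has been confirmed to work.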
What SMM handlers can and cannot do
SMM is a high-privilege environment, but also a more constrained one than DXE. Things that look normal in DXE can cause crashes or undefined behavior in SMM.
Allowed
Validate and process the communication buffer:
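// OK: validate size and location before using (helper name is illustrative)
if (CommBufferSize < sizeof(MY_REQUEST) ||
    !IsBufferOutsideMmram(CommBuffer, CommBufferSize)) {
  return EFI_SECURITY_VIOLATION;
}
// Copy to a local first so the caller cannot change the data after validation
CopyMem(&LocalCopy, CommBuffer, sizeof(MY_REQUEST));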
Read/write hardware registers via MMIO:
// OK: accessing hardware registers in SMM
UINT32 Val = MmioRead32(ChipsetBase + REG_OFFSET);
MmioWrite32(ChipsetBase + REG_OFFSET, Val | ENABLE_BIT);
Use SMM-safe memory services, read CPU Save State, write log via serial (if DebugLib is configured for SMM).
Not allowed, and the consequences
1. Calling Boot Services after they’ve been invalidated
After ExitBootServices(), all Boot Services are gone. If an SMM handler still calls gBS->AllocatePool() or any other Boot Service, the result is undefined behavior, typically a CPU exception or a system hang.
The gBS pointer still exists in memory but the code it points to has been freed or is no longer valid.
2. Calling DXE protocols from an SMM handler
SMM handlers cannot locate and use DXE protocols directly. SMM runs in an isolated environment; the DXE handle database is not accessible from here in the normal way.
3. Accessing SMRAM from outside SMM mode
Once SMRAM is locked, any read/write to the SMRAM region from OS or DXE code is rejected by hardware. This is intentional. Hardware ensures the OS cannot spy on or patch SMM.
4. Triggering SMIs from within an SMM handler (nested SMI)
Nested SMI is highly platform-dependent. Many chipsets will mask or defer SMIs while the CPU is in SMM. Don’t design handlers that depend on nested SMI.
5. Using floating point or SIMD without saving/restoring state
By default, the CPU does not save FPU/SSE/AVX registers to the Save State when entering SMM. If a handler uses floating point or SIMD operations without saving and restoring them:
- OS/application loses FPU register data
- OS-level crash, not firmware-level
- Extremely hard to trace: no error in SMM
6. Long-running operations that block the CPU
On most platforms an SMI pulls every core into SMM, so the whole multi-core system is stalled while a handler runs. If a handler does something that takes milliseconds, such as waiting for a flash write without a timeout:
- OS scheduler stalls, watchdog may fire
- System hang or NMI
- On laptops: battery/thermal management interrupted
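One way to keep a handler bounded is to poll with a fixed budget and give up when it is exhausted. The sketch below is standalone C with a fake device for demonstration; `wait_flash_idle`, `FlashBusyFn`, and `fake_busy` are hypothetical names, and the right budget and delay are platform decisions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef bool (*FlashBusyFn)(void *ctx);

typedef enum { POLL_DONE = 0, POLL_TIMEOUT } PollStatus;

/* Polls until the device reports idle or the poll budget runs out. */
static PollStatus wait_flash_idle(FlashBusyFn is_busy, void *ctx,
                                  uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (!is_busy(ctx)) return POLL_DONE;
        /* real code would insert a short, calibrated delay here */
    }
    return POLL_TIMEOUT;
}

/* Fake device: reports busy for the first *remaining polls, then idle. */
static bool fake_busy(void *ctx)
{
    uint32_t *remaining = (uint32_t *)ctx;
    if (*remaining == 0) return false;
    (*remaining)--;
    return true;
}
```

On timeout the handler should return an error to its caller and leave the retry decision outside SMM, rather than spinning while every core waits.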
Summary: what not to forget about SMM
| Symptom / topic | Key point | What to check |
|---|---|---|
| Handler registration timing | Must be before SmmReadyToLock | After lock, handler won't register; service silently does nothing |
| CommBuffer location | Must be outside SMRAM | Missing validation can become an SMM vulnerability |
| POST 0x6A / 0x71 hang | SMM init failed in DXE | Check SMRAM allocation and SMM Core loading |
| POST 0x36 hang | PEI-phase SMM init failed | Related to CPU/chipset SMM configuration |
| SetVariable = EFI_ACCESS_DENIED | Policy or protection blocking | Variable attributes, auth policy, SMM policy, flash write protection |
| Setting reverts after reboot | Not just a UI bug | Debug from HII → SetVariable → SMM → SPI flash |
References
- AMI Aptio 5.x Status Codes, Public Document — POST code range for SMM init
- EDK II PiSmmCore, GitHub — SMM Core source, public reference
- EDK II SMM modules, GitHub — SMM infrastructure modules
- UEFI PI Specification — Volume 4: official SMM spec
- EDK II Debugging, TianoCore Wiki — debug config for SMM drivers
- Lauterbach UEFI H2O Awareness Manual — debug tool with SMM debug support