SMM in UEFI: System Management Mode from a debug perspective

SMM is not a linear boot phase but an isolated execution mode. This article covers SMI triggers, SMRAM, handler timing, SmmReadyToLock, and how to debug a SetVariable that silently fails.


Why firmware engineers need to understand SMM

There’s a class of bug report I see fairly often when customizing BIOS for POS systems: the customer says “the terminal is acting strange — settings are right today but back to defaults by morning.” They suspect an auto-reset or some malware. It’s neither.

Most of those cases trace back to SMM — or more precisely, to the path from Setup UI to SPI flash that goes through SMM and gets blocked somewhere without a clear error message.

If you work on BIOS and run into any of these situations, there’s a good chance the problem is in SMM (System Management Mode):

  • BIOS setting is saved but reverts after reboot
  • BIOS flash from OS reports success but nothing changes
  • Some firmware services stop working once the OS has booted
  • SetVariable() returns EFI_ACCESS_DENIED or EFI_WRITE_PROTECTED

SMM is not an obscure academic concept. It’s the layer many platforms use to protect NVRAM/SPI flash writes, handle certain hardware events, and enforce security policy. When it fails or is misconfigured, symptoms tend to be vague — only certain operations don’t work, with no clear error message.

SMM: a simple mental model

The easiest way to think about it: SMM is a completely isolated execution environment that is activated by a special signal called SMI#.

When the CPU receives SMI#:

  1. The CPU stops everything — including the running OS
  2. Saves the entire CPU state to a special memory area called the Save State
  3. Jumps into SMRAM, a memory region the OS cannot read or write after it’s properly locked
  4. Runs the SMM handler
  5. Returns via the RSM instruction; the CPU state is restored and the OS continues

  01 OS: OS running. The CPU is executing normal OS or application code.
  02 SMI#: SMI occurs. A hardware event or software trigger generates a System Management Interrupt.
  03 Save State: CPU state saved. Register state is written to the Save State area.
  04 SMRAM: Enter SMM. The CPU runs the SMM Core, Dispatcher, and handlers in SMRAM.
  05 RSM: Return from SMM. The CPU restores its state and the OS continues as if nothing happened.

SMM mental model: SMI puts the CPU into an isolated execution environment; RSM returns it to the previous context

The OS is never notified that SMM ran. The entire process is transparent to the OS unless SMM runs so slowly that it causes noticeable latency.
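The save/run/restore cycle described above can be modeled in a few lines of plain C. This is only a toy sketch: `ToyCpu`, `CpuContext`, `deliver_smi`, and `clobbering_handler` are hypothetical names invented for illustration, and three registers stand in for the full architectural state that real hardware saves.

```c
#include <stdint.h>

/* Toy CPU context: a few registers standing in for the full state. */
typedef struct {
    uint64_t rip;
    uint64_t rsp;
    uint64_t rax;
} CpuContext;

typedef struct {
    CpuContext cpu;         /* live register state */
    CpuContext save_state;  /* stand-in for the Save State area in SMRAM */
    int        in_smm;      /* 1 while executing in SMM */
} ToyCpu;

typedef void (*SmiHandler)(ToyCpu *cpu);

/* SMI entry: save the interrupted context, then jump to the SMM entry point. */
static void smi_enter(ToyCpu *c, uint64_t smm_entry) {
    c->save_state = c->cpu;
    c->in_smm = 1;
    c->cpu.rip = smm_entry;
}

/* RSM: restore the saved context and leave SMM. */
static void rsm(ToyCpu *c) {
    c->cpu = c->save_state;
    c->in_smm = 0;
}

/* Full round trip: enter SMM, run the handler, return with RSM. */
static void deliver_smi(ToyCpu *c, uint64_t smm_entry, SmiHandler handler) {
    smi_enter(c, smm_entry);
    handler(c);
    rsm(c);
}

/* Example handler that freely clobbers live registers while in SMM. */
static void clobbering_handler(ToyCpu *c) {
    c->cpu.rax = 0xDEADBEEF;
    c->cpu.rsp = 0;
}
```

After `deliver_smi()` returns, the interrupted context is exactly what it was before the SMI, which is why the OS sees nothing except lost time.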

Two common SMI types

Hardware SMI

Generated by the chipset in response to a hardware event: power button, thermal event, GPIO trigger, and others. Platform and board specific.

Software SMI (SW SMI)

Generated by writing to the APM control port, typically 0xB2:

mov al, 0x42     ; command ID, platform-specific
out 0xB2, al     ; trigger SW SMI

After the out instruction, the chipset asserts SMI#, the CPU enters SMM, and the dispatcher looks for a handler registered for command 0x42.

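// In firmware, register an SW SMI handler and check the result
EFI_SMM_SW_REGISTER_CONTEXT  Context;
EFI_HANDLE                   DispatchHandle;
EFI_STATUS                   Status;
Context.SwSmiInputValue = 0x42;   // same command ID written to port 0xB2
Status = SmmSwDispatch2->Register (
    SmmSwDispatch2,
    MySwSmiHandler,
    &Context,        // contains SwSmiInputValue = 0x42
    &DispatchHandle
);
if (EFI_ERROR (Status)) {
    DEBUG ((DEBUG_ERROR, "SW SMI 0x42 registration failed: %r\n", Status));
}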
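To make the dispatch step concrete, here is a minimal stand-alone model of a table that maps SW SMI command values to handlers. It is a sketch of the idea only, not the EDK II dispatcher: `SwSmiTable`, `sw_smi_register`, `sw_smi_dispatch`, and `counting_handler` are hypothetical names, and the fixed table size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SW_SMI_HANDLERS 8

typedef void (*SwSmiHandler)(uint8_t command, void *ctx);

typedef struct {
    uint8_t      command;  /* value written to the APM control port */
    SwSmiHandler handler;
    void        *ctx;
} SwSmiEntry;

typedef struct {
    SwSmiEntry entries[MAX_SW_SMI_HANDLERS];
    size_t     count;
} SwSmiTable;

/* Returns 0 on success, -1 if the table is full, the handler is NULL,
 * or the command value is already claimed by another handler. */
static int sw_smi_register(SwSmiTable *t, uint8_t command,
                           SwSmiHandler handler, void *ctx) {
    if (handler == NULL || t->count >= MAX_SW_SMI_HANDLERS) {
        return -1;
    }
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].command == command) {
            return -1;
        }
    }
    t->entries[t->count].command = command;
    t->entries[t->count].handler = handler;
    t->entries[t->count].ctx = ctx;
    t->count++;
    return 0;
}

/* Returns 1 if a handler ran, 0 if no handler is registered for command. */
static int sw_smi_dispatch(const SwSmiTable *t, uint8_t command) {
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].command == command) {
            t->entries[i].handler(command, t->entries[i].ctx);
            return 1;
        }
    }
    return 0;
}

/* Example handler: counts how many times it was invoked. */
static void counting_handler(uint8_t command, void *ctx) {
    (void)command;
    (*(int *)ctx)++;
}
```

The useful property to notice is the `0` return from `sw_smi_dispatch()`: an SMI with no matching handler is not an error anywhere in the system, it simply does nothing.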

POST codes related to SMM

For AMI Aptio 5.x, SMM initialization happens in two phases:

POST codes related to SMM init

Code | Phase | Meaning
-----|-------|--------------------------------------------------
0x36 | PEI   | CPU post-memory initialization, SMM initialization
0x6A | DXE   | North Bridge DXE SMM initialization
0x71 | DXE   | South Bridge DXE SMM initialization

If the POST code is stuck at 0x6A or 0x71, the problem is related to SMM initialization in DXE, typically SMRAM not being allocated correctly or the SMM Core failing to load.

On some boards with a debug build, you may also see messages like:

SMM Core: SMRAM not found
SMM Core: Failed to allocate SMRAM
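When annotating a captured POST code trace, a small lookup helper saves repeated trips to the table. The sketch below encodes only the three codes listed above; `PostCodeInfo` and `lookup_smm_post_code` are hypothetical names, and the values apply to the AMI Aptio 5.x case described here, not to other firmware.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t     code;
    const char *phase;
    const char *meaning;
} PostCodeInfo;

/* SMM-related POST codes from the table above (AMI Aptio 5.x). */
static const PostCodeInfo kSmmPostCodes[] = {
    { 0x36, "PEI", "CPU post-memory initialization, SMM initialization" },
    { 0x6A, "DXE", "North Bridge DXE SMM initialization" },
    { 0x71, "DXE", "South Bridge DXE SMM initialization" },
};

/* Returns the matching entry, or NULL if the code is not SMM-related. */
static const PostCodeInfo *lookup_smm_post_code(uint8_t code) {
    size_t n = sizeof(kSmmPostCodes) / sizeof(kSmmPostCodes[0]);
    for (size_t i = 0; i < n; i++) {
        if (kSmmPostCodes[i].code == code) {
            return &kSmmPostCodes[i];
        }
    }
    return NULL;
}
```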

SMRAM: the hidden memory region

SMRAM is a region of DRAM that firmware hides from the OS and DXE after a certain point. After SMRAM is properly closed and locked, only a CPU running in SMM can read or write that region.

The SMRAM lifecycle:

  01 PEI: Reserve SMRAM. SMRAM is discovered and reserved, but not yet fully locked.
  02 DXE: Load SMM Core. The SMM Core and SMM drivers are loaded into SMRAM.
  03 Register: Register handlers. SMM drivers register their handlers before the lock.
  04 Lock: SmmReadyToLock. SMRAM is closed/locked; security policy starts being enforced.
  05 BDS/OS: Runtime use. SMRAM is locked; the OS cannot access it as regular RAM.

SMRAM lifecycle: handlers must be registered before SMRAM is locked

After SMRAM is locked, no new SMM modules can be loaded. This is why handlers must be registered before SmmReadyToLock.

A very common mistake: a developer adds a new SMM driver but it dispatches after SmmReadyToLock because of a wrong DEPEX or incorrect dispatch order. The result is that the handler is never registered and the service silently does nothing — no error, just silence.

SmmReadyToLock: the point of no return

In EDK II/PI-style firmware, SmmReadyToLock is typically represented by a protocol or notification signaled in DXE to indicate that SMM is about to be locked. When this signal fires:

  1. SMRAM is closed and locked
  2. No new SMM modules can be installed
  3. Many security policies are enforced

  01 DXE: DXE execution. SMM drivers can still dispatch and register handlers.
  02 Driver A: Register handler. Handler registered before lock: OK.
  03 Driver B: Register handler. Handler registered before lock: OK.
  04 Lock: SmmReadyToLock. SMRAM closed/locked, policy enforced.
  05 Driver C: Dispatched late. Too late to register a handler; the service silently does nothing.

Timing bug around SmmReadyToLock: a driver dispatched late may fail to register its handler

Timing bugs around SmmReadyToLock are some of the hardest bugs in BIOS development because:

  • There’s no assertion or clear error message
  • The handler was never registered; the service silently does nothing
  • It only reproduces when some driver changes the dispatch order
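The timing problem in the list above is easy to reproduce in a toy model. The sketch below is not EDK II code: `SmmRegistry`, `smm_register`, and `smm_ready_to_lock` are hypothetical names. Its only purpose is to show that a late registration can be made visible if the caller checks the returned status instead of discarding it.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLERS 4

typedef enum { REG_OK = 0, REG_LOCKED, REG_FULL } RegStatus;
typedef void (*Handler)(void);

typedef struct {
    Handler handlers[MAX_HANDLERS];
    size_t  count;
    bool    locked;   /* set once the lock point has passed */
} SmmRegistry;

/* Registration succeeds only before the lock point. */
static RegStatus smm_register(SmmRegistry *r, Handler h) {
    if (r->locked) {
        return REG_LOCKED;   /* the "Driver C" case: dispatched too late */
    }
    if (r->count >= MAX_HANDLERS) {
        return REG_FULL;
    }
    r->handlers[r->count++] = h;
    return REG_OK;
}

/* Models the SmmReadyToLock point: nothing can be added afterwards. */
static void smm_ready_to_lock(SmmRegistry *r) {
    r->locked = true;
}

/* Placeholder handler used for the example. */
static void noop_handler(void) {
}
```

In a real driver the equivalent habit is to log any failed registration at error level, so that a dispatch-order change shows up in the serial log rather than as a missing feature weeks later.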

SMM Communication: the safe path for talking to SMM

When a DXE driver or firmware component needs to request a service inside SMM, it typically goes through the SMM Communication Protocol. After the OS is running, many operations like SetVariable() go through UEFI Runtime Services, and the runtime variable service underneath may trigger an SMI to forward the request into SMM. The OS does not call the SMM Communication Protocol directly the way a DXE driver does.

  01 DXE: Firmware component. A DXE driver or firmware component creates the CommBuffer outside SMRAM.
  02 Communicate: SmmCommunication. Calls SmmCommunication->Communicate().
  03 SMI: Trigger SMI. The firmware path triggers an SMI to enter the SMM handler.
  04 SMM: Handler processes. The handler validates input, processes the request, and writes the result to the CommBuffer.
  05 Return: RSM. The DXE driver reads the result after SMM returns.

Boot-time/DXE path: firmware component talks to SMM through SMM Communication Protocol

  01 OS: OS calls Runtime Service. For example, efibootmgr or an OS component calls SetVariable().
  02 RT: Runtime variable service. Runtime Services receive the request after ExitBootServices.
  03 SMI: Trigger SMI. The runtime variable path may forward the request into SMM.
  04 SMM: SMM variable handler. The handler applies policy and writes to NVRAM/SPI flash if valid.

OS/runtime path: OS calls through Runtime Services, not through SMM Communication Protocol directly
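Both paths hand SMM a buffer that starts with a small header: a GUID that selects the handler, a payload length, and then the payload. The sketch below models that layout in portable C so the length checks on each side are visible. It is an approximation, not the PI definition: `Guid`, `comm_build`, and `comm_parse` are hypothetical names, and the length is fixed at 64 bits here, whereas the real header uses the platform's native integer width.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t bytes[16]; } Guid;

/* Header = 16-byte GUID + 64-bit payload length (assumed layout). */
#define COMM_HEADER_SIZE (sizeof(Guid) + sizeof(uint64_t))

/* Caller side: packs a request. Returns bytes used, or 0 if buf is too small. */
static size_t comm_build(uint8_t *buf, size_t buf_size, const Guid *guid,
                         const void *payload, size_t payload_len) {
    uint64_t len = (uint64_t)payload_len;
    if (buf_size < COMM_HEADER_SIZE ||
        payload_len > buf_size - COMM_HEADER_SIZE) {
        return 0;
    }
    memcpy(buf, guid, sizeof(Guid));
    memcpy(buf + sizeof(Guid), &len, sizeof(len));
    if (payload_len > 0) {
        memcpy(buf + COMM_HEADER_SIZE, payload, payload_len);
    }
    return COMM_HEADER_SIZE + payload_len;
}

/* Handler side: returns the payload and its length, or NULL when the GUID
 * does not match or the declared length does not fit in the buffer. */
static const uint8_t *comm_parse(const uint8_t *buf, size_t buf_size,
                                 const Guid *expected, size_t *payload_len) {
    uint64_t len;
    if (buf_size < COMM_HEADER_SIZE) {
        return NULL;
    }
    if (memcmp(buf, expected, sizeof(Guid)) != 0) {
        return NULL;
    }
    memcpy(&len, buf + sizeof(Guid), sizeof(len));
    if (len > buf_size - COMM_HEADER_SIZE) {
        return NULL;   /* never trust a length field supplied by the caller */
    }
    *payload_len = (size_t)len;
    return buf + COMM_HEADER_SIZE;
}
```

The handler-side length check matters as much as the GUID match: the length field comes from outside SMM and has to be treated as untrusted input.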

CommBuffer MUST be outside SMRAM. This validation is required in every SMM handler:

// Pseudo code — not production code
if (IsAddressInSmram(CommBuffer, CommBufferSize)) {
    return EFI_SECURITY_VIOLATION;
}

Skipping this validation allows an OS-level attacker to create a CommBuffer pointing into SMRAM and read or write sensitive data — this is a well-known class of SMM vulnerability.
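As a rough illustration of what such a check has to cover, here is a stand-alone C sketch of an overlap test against a list of SMRAM ranges. It assumes the platform can supply those ranges; `SmramRange` and `buffer_outside_smram` are hypothetical names, and a production handler should use the validation helper provided by its firmware codebase rather than a hand-rolled one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t base;
    uint64_t size;
} SmramRange;

/* True when [addr, addr + size) would wrap past the top of the address space. */
static bool range_overflows(uint64_t addr, uint64_t size) {
    return size != 0 && addr > UINT64_MAX - (size - 1);
}

/* Returns true only if the buffer is non-empty, does not wrap, and overlaps
 * none of the SMRAM ranges. Ranges are assumed to be well-formed. */
static bool buffer_outside_smram(uint64_t addr, uint64_t size,
                                 const SmramRange *ranges, size_t count) {
    if (size == 0 || range_overflows(addr, size)) {
        return false;   /* reject empty or wrapping buffers outright */
    }
    uint64_t last = addr + (size - 1);
    for (size_t i = 0; i < count; i++) {
        if (ranges[i].size == 0) {
            continue;
        }
        uint64_t r_last = ranges[i].base + (ranges[i].size - 1);
        if (addr <= r_last && last >= ranges[i].base) {
            return false;   /* any overlap counts, including partial */
        }
    }
    return true;
}
```

Two details are easy to miss: a buffer that only partially overlaps SMRAM must be rejected, and `addr + size` must be checked for wraparound before it is used in any comparison.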

Serial log in SMM

Debugging in SMM is harder than in DXE because:

  1. SMM runs in an isolated environment; some infrastructure normally available in DXE is not accessible here
  2. The DEBUG() macro can still be used, but requires a DebugLib configured appropriately for SMM

// In an SMM driver, after setup
// Sizes are cast to UINT64 and printed with %lu; plain %d expects a 32-bit value
DEBUG ((DEBUG_INFO, "SMM Handler: received command 0x%x\n", Command));
DEBUG ((DEBUG_INFO, "SMM Handler: CommBuffer at 0x%p, size %lu\n",
        CommBuffer, (UINT64)CommBufferSize));

Patterns to look for when debugging SMM through serial log:

Log from the DXE driver calling Communicate:

VariableSmm: Communicate called, command = SetVariable
VariableSmm: Waiting for SMM...
VariableSmm: SMM returned EFI_SUCCESS

Log from the SMM handler (if serial log is configured for SMM):

SmmVar: Handler entered, GUID = ...
SmmVar: SetVariable 'BootOrder', size = 8
SmmVar: Writing to NVRAM...
SmmVar: Done

If you see the DXE side calling Communicate but no SMM handler log, there are three things to suspect: the handler was not registered, the SMI trigger failed, or the log level is wrong.

Debug diary: setting saved but has no effect

This is something I’ve encountered more than once. On a POS system, after changing some policies — boot timeout, USB policy, network stack — and flashing new firmware, the customer’s technician goes into Setup, changes settings, saves and exits. After reboot everything is back to defaults. They change them again. Reboot again. Same result.

The first instinct is to suspect aggressive default-restore logic somewhere. But after enabling the serial log, one line stands out: SetVariable() = EFI_WRITE_PROTECTED. The failure is silent: no popup, no error message in the UI.

Approach layer by layer:

Layer 1: Is HII/Setup UI calling SetVariable?

Log the DXE variable service. If you don’t see SetVariable being called after the user saves, the problem is in the HII callback or the Setup Browser.

Layer 2: SetVariable is being called but returning an error?

VariableSmm: SetVariable 'SetupData' = EFI_ACCESS_DENIED

EFI_ACCESS_DENIED here typically means: the variable attributes require an authenticated write, or an SMM policy is blocking writes from DXE or the OS.

Layer 3: SetVariable succeeds but SPI flash isn’t written

VariableSmm: SetVariable success
SpiFlash: WriteBlock failed: write protection enabled

Firmware writes to the NVRAM cache in RAM successfully but flushing to SPI flash fails because flash write protection wasn’t unlocked correctly.

Layer 4: SPI flash is written but corrupted

Less common, but possible due to a full variable store or power loss during the write.
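The four layers form an ordered checklist: stop at the first one that fails. The sketch below encodes that ordering as a small C function. The struct fields are hypothetical observations you would collect from the serial log and a flash readback; none of these names come from a real tool.

```c
#include <stdbool.h>

/* What was observed while reproducing the problem (assumed inputs). */
typedef struct {
    bool set_variable_called;  /* Layer 1: did Setup reach SetVariable()? */
    bool set_variable_ok;      /* Layer 2: did SetVariable() return success? */
    bool flash_write_ok;       /* Layer 3: did the SPI flash write succeed? */
    bool stored_data_intact;   /* Layer 4: does the stored data read back intact? */
} PersistObservation;

typedef enum {
    LAYER_NONE          = 0,   /* nothing failed in these four layers */
    LAYER_HII_SETUP     = 1,
    LAYER_SET_VARIABLE  = 2,
    LAYER_FLASH_WRITE   = 3,
    LAYER_FLASH_CONTENT = 4
} SuspectLayer;

/* Returns the first layer whose check failed, in top-down order. */
static SuspectLayer first_failing_layer(const PersistObservation *o) {
    if (!o->set_variable_called) {
        return LAYER_HII_SETUP;
    }
    if (!o->set_variable_ok) {
        return LAYER_SET_VARIABLE;
    }
    if (!o->flash_write_ok) {
        return LAYER_FLASH_WRITE;
    }
    if (!o->stored_data_intact) {
        return LAYER_FLASH_CONTENT;
    }
    return LAYER_NONE;
}
```

The ordering is the point: a flash write failure tells you nothing if SetVariable() was never called in the first place.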

What SMM handlers can and cannot do

SMM is a high-privilege environment, but also a more constrained one than DXE. Things that look normal in DXE can cause crashes or undefined behavior in SMM.

Allowed

Validate and process the communication buffer:

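// OK: validate location and size before using
// IsBufferOutsideMmram() is an illustrative name; if your tree has no such
// helper, EDK II's SmmMemLib offers SmmIsBufferOutsideSmmValid() for this check
if (CommBufferSize < sizeof(MY_REQUEST) ||
    !IsBufferOutsideMmram(CommBuffer, CommBufferSize)) {
    return EFI_SECURITY_VIOLATION;
}
// Copy once into SMRAM-local storage so later checks work on stable data
CopyMem(&LocalCopy, CommBuffer, sizeof(MY_REQUEST));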

Read/write hardware registers via MMIO:

// OK: accessing hardware registers in SMM
UINT32 Val = MmioRead32(ChipsetBase + REG_OFFSET);
MmioWrite32(ChipsetBase + REG_OFFSET, Val | ENABLE_BIT);

Use SMM-safe memory services, read CPU Save State, write log via serial (if DebugLib is configured for SMM).

Not allowed, and the consequences

1. Calling Boot Services after they’ve been invalidated

After ExitBootServices(), Boot Services are no longer available. If an SMM handler still calls gBS->AllocatePool() or any other Boot Service, the result is:

Undefined behavior, typically a CPU exception or system hang

The gBS pointer still exists in memory, but the code and data it points to may have been reclaimed by the OS and are no longer valid.

2. Calling DXE protocols from an SMM handler

SMM handlers cannot locate and use DXE protocols directly. SMM runs in an isolated environment; the DXE handle database is not accessible from here in the normal way.

3. Accessing SMRAM from outside SMM

Once SMRAM is locked, any read/write to the SMRAM region from OS or DXE code is rejected by hardware. This is intentional. Hardware ensures the OS cannot spy on or patch SMM.

4. Triggering SMIs from within an SMM handler (nested SMI)

Nested SMI is highly platform-dependent. Many chipsets will mask or defer SMIs while the CPU is in SMM. Don’t design handlers that depend on nested SMI.

5. Using floating point or SIMD without saving/restoring state

The CPU Save State when entering SMM does not by default save FPU/SSE/AVX registers. If a handler uses float or SIMD operations without saving and restoring them:

OS/application loses FPU register data
OS-level crash, not firmware-level
Extremely hard to trace — no error in SMM

6. Long-running operations that block the CPU

On most platforms an SMI pulls every core into SMM, so the whole multi-core system is stalled while the handler runs. If a handler does something that takes milliseconds, such as waiting for a flash write without a timeout:

OS scheduler stalls, watchdog may fire
System hang or NMI
On laptops: battery/thermal management interrupted
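One way to keep a handler's worst-case time bounded is to give every wait loop an explicit budget. The sketch below uses a simple poll count as that budget and a simulated device in place of a real status register. `wait_not_busy` and `fake_flash_busy` are hypothetical names, and real firmware would more likely bound the wait with a timer-based deadline than with an iteration count.

```c
#include <stdbool.h>
#include <stdint.h>

typedef bool (*BusyFn)(void *ctx);

typedef enum { POLL_DONE = 0, POLL_TIMEOUT } PollStatus;

/* Polls is_busy() at most max_polls times, then gives up with POLL_TIMEOUT
 * instead of spinning forever while every core is held in SMM. */
static PollStatus wait_not_busy(BusyFn is_busy, void *ctx, uint32_t max_polls) {
    for (uint32_t i = 0; i < max_polls; i++) {
        if (!is_busy(ctx)) {
            return POLL_DONE;
        }
    }
    return POLL_TIMEOUT;
}

/* Simulated flash controller: reports busy for *remaining more polls. */
static bool fake_flash_busy(void *ctx) {
    uint32_t *remaining = (uint32_t *)ctx;
    if (*remaining == 0) {
        return false;
    }
    (*remaining)--;
    return true;
}
```

On timeout, the handler should return an error to its caller and log it; a failed write that is reported is far easier to debug than a system that hangs inside SMM.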

Summary: what not to forget about SMM

SMM debug: key points

Symptom / topic | Key point | What to check
----------------|-----------|--------------
Handler registration timing | Must be before SmmReadyToLock | After lock, handler won't register; service silently does nothing
CommBuffer location | Must be outside SMRAM | Missing validation can become an SMM vulnerability
POST 0x6A / 0x71 hang | SMM init failed in DXE | Check SMRAM allocation and SMM Core loading
POST 0x36 hang | PEI-phase SMM init failed | Related to CPU/chipset SMM configuration
SetVariable = EFI_ACCESS_DENIED | Policy or protection blocking | Variable attributes, auth policy, SMM policy, flash write protection
Setting reverts after reboot | Not just a UI bug | Debug from HII → SetVariable → SMM → SPI flash
