vfio_assigned_device: unmap BARs on PCI PM state transitions#3609
Open
jstarks wants to merge 6 commits into
Open
vfio_assigned_device: unmap BARs on PCI PM state transitions#3609jstarks wants to merge 6 commits into
jstarks wants to merge 6 commits into
Conversation
When a guest transitions a VFIO-assigned device to D3hot via a PMCSR write, the VFIO driver puts the physical device into D3hot and tears down the userspace BAR mmap mappings. Any subsequent access to those pages faults with SIGBUS. Since OpenVMM maps these pages into KVM as guest physical address space for direct MMIO access, the stale mappings cause the VMM process to be killed on the next guest MMIO access to the device. Fix this by intercepting PMCSR writes and unmapping BARs from the guest address space whenever the device leaves D0, using the same pattern already used for the Memory Space Enable bit in the Command register. BARs are remapped when the device returns to D0. This is slightly more conservative than VFIO, which only tears down mappings on D3hot -- we also unmap on D1/D2 transitions per the PCI spec requirement that BARs not decode in non-D0 states. D1/D2 are extremely rare in practice so the conservatism has zero cost. The capability discovery code was refactored as part of this change. The three separate functions that each independently walked capability chains via raw file descriptor parameters are now unified into a single discover_capabilities function that walks both the standard and extended chains in one pass. The function takes a \`&dyn ConfigSpaceRead\` trait object instead of raw file parameters, allowing the capability parsing logic to be unit tested with an in-memory mock config space.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens VFIO PCI passthrough against guest-initiated PCI Power Management (PMCSR) D-state transitions by ensuring BAR-backed guest mappings are torn down when the device is not in D0, preventing SIGBUS crashes from stale VFIO mmaps. It also refactors PCI capability discovery into a single pass over standard + extended capability chains and adds unit tests via a mock config space reader.
Changes:
- Intercept PMCSR writes to toggle BAR mappings based on D0 vs non-D0 power state, similar to existing Command(MSE) handling.
- Unify capability discovery into
discover_capabilities(&dyn ConfigSpaceRead, ...)and expand discovery to include PM capability + config patch generation. - Add unit tests for capability discovery using an in-memory mock config space; extend PCI capability IDs to include Power Management.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| vm/devices/pci/vfio_assigned_device/src/lib.rs | Adds PMCSR interception + D-state tracking for BAR unmapping, refactors capability discovery, and introduces unit tests with a mock config-space reader. |
| vm/devices/pci/vfio_assigned_device/Cargo.toml | Adds Linux-only dev-dependencies needed for the new unit tests. |
| vm/devices/pci/pci_core/src/spec.rs | Adds POWER_MANAGEMENT to the PCI standard capability ID enum for capability parsing. |
| Cargo.lock | Records the new dev-dependency (test_with_tracing) in the lockfile. |
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
vm/devices/pci/vfio_assigned_device/src/lib.rs:141
bar_masksis no longer derived via the "write all 1s, read back" BAR probe cycle (it’s derived from the VFIO BAR region sizes below). The field doc comment is now misleading and should be updated to match the actual derivation, since this is security-/stability-adjacent code and incorrect comments can cause future regressions.
/// BAR masks as read from the physical device (write 0xFFFFFFFF, read back).
#[inspect(iter_by_index, hex)]
bar_masks: [u32; 6],
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When a guest transitions a VFIO-assigned device to D3hot via a PMCSR write, the VFIO driver puts the physical device into D3hot and tears down the userspace BAR mmap mappings. Any subsequent access to those pages faults with SIGBUS. Since OpenVMM maps these pages into KVM as guest physical address space for direct MMIO access, the stale mappings cause the VMM process to be killed on the next guest MMIO access to the device.
Fix this by intercepting PMCSR writes and unmapping BARs from the guest address space whenever the device leaves D0, using the same pattern already used for the Memory Space Enable bit in the Command register. BARs are remapped when the device returns to D0. This is slightly more conservative than VFIO, which only tears down mappings on D3hot -- we also unmap on D1/D2 transitions per the PCI spec requirement that BARs not decode in non-D0 states. D1/D2 are extremely rare in practice so the conservatism has zero cost.
The capability discovery code was refactored as part of this change. The three separate functions that each independently walked capability chains via raw file descriptor parameters are now unified into a single discover_capabilities function that walks both the standard and extended chains in one pass. The function takes a
&dyn ConfigSpaceReadtrait object instead of raw file parameters, allowing the capability parsing logic to be unit tested with an in-memory mock config space.