Today’s discussion: KVM/QEMU, IOMMU Passthrough, and AMD Bxxx chipsets (e.g. B450) vs AMD Xyyy (e.g. X470)
OK… so, you are running Linux. (Good for you!)
And, you have KVM/QEMU installed as a virtual hosting platform… this way you can host Windows VMs for members of your family, or other Linux VMs for your own sick Linux pleasure. (Again, good for you, buddy!)
…AND you bought either:
A) An SR-IOV capable Ethernet card, or
B) A dedicated GPU / graphics card
Which you want to assign directly to one of your virtual machines…not as a software-virtualized device, but as an actual PCI-connected hardware device! You want maximum performance, and you should be able to have it!!!
Enter IOMMU… (yay!)
And by extension IOMMU Groups (Ugh!)
IOMMU Groups require that ALL PCI devices in the group end up on the exact same driver: every device in the group has to be unbound from its normal host driver, bound to the passthrough driver (vfio-pci), and handed to the same VM as a unit.
…which is a problem if the group also contains hardware the host still needs.
How can ALL members of an IOMMU Group go to the same VM?
Either:
A) You only have 1 PCI device assigned to the group…this is the ideal case, or
B) Everything in the group happens to be stuff you're willing to give to that one VM (e.g. a GPU plus its built-in HDMI audio function). For a chipset-provided group full of USB/SATA/Ethernet devices, this is almost never the case. So, you want option A.
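If you want to see how your own board carves things up, a quick loop over sysfs will print each group and the devices in it (these are the standard kernel paths; run it on the host, and note it prints nothing if the IOMMU isn't enabled):

```shell
#!/bin/sh
# Print every IOMMU group and the PCI devices inside it.
# /sys/kernel/iommu_groups/ is only populated when the IOMMU is enabled.
for group in /sys/kernel/iommu_groups/*; do
  [ -d "$group" ] || continue            # no groups => IOMMU off/unsupported
  echo "IOMMU group ${group##*/}:"
  for dev in "$group"/devices/*; do
    [ -e "$dev" ] || continue
    # lspci -nns prints the slot, class, device name, and [vendor:device] IDs
    lspci -nns "${dev##*/}"
  done
done
```

A card you want to pass through should show up in a group containing only itself (plus, for a GPU, its companion HDMI-audio function).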
OK, so the short version is that if you have an AMD Bxxx chipset, as near as I can tell from my own testing + lots of posts on the Internet, … *by design, across motherboard manufacturers*, the CPU presents its 1 × 16-lane PCIe card slot as a single-device IOMMU Group, and presents its 1 × 4-lane PCIe NVMe/SSD slot as a single-device IOMMU Group, but ALL the other slots + device connections provided by the Bxxx chipset land in 1 giant group. This means you have USB, Ethernet, a possible 2nd NVMe/SSD slot, SATA connectors, and the sound card all in 1 Group. … And you *CANNOT* hand that group to a VM, because the VM would take your USB, disks, and network with it. duh?!
However, if you buy an AMD Xyyy chipset, as near as I can tell, all those other devices are more or less separated into their own groups?!
So, the moral of the story is that if you want to use IOMMU “Passthrough” to your KVM/QEMU virtual machine so that it has direct PCI control of add-in card hardware, you really need to buy an Xyyy motherboard for your AMD CPU.
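For the record, actually turning passthrough on is a boot-time affair. On an AMD board it usually looks something like this GRUB config fragment (the vendor:device IDs below are example values for a hypothetical GPU + its audio function; substitute whatever `lspci -nn` reports for your card):

```shell
# /etc/default/grub  (config fragment, not a runnable script)
# amd_iommu=on enables AMD-Vi; iommu=pt uses passthrough mode for host devices.
# vfio-pci.ids claims the listed devices at boot so host drivers never bind them.
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt vfio-pci.ids=10de:1b80,10de:10f0"

# Then regenerate the GRUB config and reboot:
#   sudo update-grub                                  (Debian/Ubuntu)
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg       (Fedora/RHEL-style)
```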
3 other notes:
1) IOMMU groups represent all the ports whose traffic the IOMMU can actually isolate from each other. If they ONLY connect to the Northbridge chipset (e.g. X470), their grouping is individual port per group. However, if they talk to a “PCI/PCIe switch” that then connects to the Northbridge, then 1 PCI device can talk to a 2nd PCI device directly (peer-to-peer DMA) without the Northbridge acting as a hardware communications firewall/enforcement point… in which case all the devices connected to the switch appear in 1 shared group. (The PCIe feature that blocks this peer-to-peer shortcut is called ACS, Access Control Services; switches and ports without it get the 1-big-group treatment.)
2) There is a software patch for the Linux kernel, commonly called the “ACS override” patch, that requires you to recompile your kernel in order to enable it… it tells the kernel to split PCI devices into their own IOMMU groups even when the hardware can’t guarantee their isolation (e.g. devices hanging off a Bxxx chipset instead of an Xyyy one). This will literally work for allowing you to assign the hardware to a VM, but it is not secure at a hardware level. This insecurity is only relevant if you plan to host untrusted people or potentially-malicious programs in the virtual machine (e.g. a Windows desktop for a normal user). Why? Because the passed-through PCI device can do peer-to-peer DMA to the other devices in its real group, bypassing the IOMMU, which can allow it to escape the virtualization boundary.
3) I haven’t worked out all the minutiae, but… if you have an AMD APU (a CPU with an on-chip/integrated GPU), Linux won’t even load the IOMMU driver in most cases. The same system in every other respect, but with a regular AMD CPU (not an APU), will load the IOMMU driver!
To be fair, the system I was testing this on had BOTH the APU graphics + a separate GPU in the PCIe x16 slot, which may have created some issue; but based on my experience, the AMD APU is its own problem: just getting the IOMMU driver to load comes before you even begin to fight with IOMMU Groups.
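Going back to note 2: on a kernel that actually carries the ACS-override patch (mainline does not; you need a patched build or a distro kernel that ships it), the split is activated with another boot parameter. Again, this is a config fragment, and it does nothing on an unpatched kernel:

```shell
# Kernel command line addition - ONLY has an effect on an ACS-override-patched kernel.
# "downstream" splits devices below downstream switch ports into their own groups;
# "multifunction" additionally splits the functions of multifunction devices.
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
```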
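And for note 3, you can check whether the kernel registered an IOMMU at all before fighting anything else. Registered IOMMUs show up under /sys/class/iommu (on AMD, as ivhd* entries):

```shell
#!/bin/sh
# Report whether the kernel registered any IOMMU (AMD-Vi units appear as ivhd*).
if [ -n "$(ls -A /sys/class/iommu 2>/dev/null)" ]; then
    echo "IOMMU driver loaded:"
    ls /sys/class/iommu
else
    echo "No IOMMU registered - check BIOS (IOMMU/SVM enabled?) and kernel command line"
fi
```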
To see a larger image of the favorites icon (favicon) for your favorite site (like this one), type in:
https://website.domain/favicon.ico
e.g. https://notashutin.com/favicon.ico