Unable to pass through GPU PCI devices in SCALE

Description

Attempting to pass through GPU components in SCALE. The system has two GPUs.

lspci:

"01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev ef)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde PRO [Radeon HD 7750/8740 / R7 250E]
02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series]"

Device 02:00.0 is used for console output.

Following the developer's notes for SCALE, I detached both functions of the first GPU (01:00.0 and 01:00.1) from the host:

"truenas# virsh nodedev-detach pci_0000_01_00_0

Device pci_0000_01_00_0 detached

truenas# virsh nodedev-detach pci_0000_01_00_1

Device pci_0000_01_00_1 detached"
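
The node device names given to virsh above mirror the lspci addresses, with the PCI domain prepended and the separators replaced by underscores. A minimal sketch of that mapping, assuming domain 0000 whenever lspci omits it; the helper name nodedev_name is hypothetical and is not part of virsh or the SCALE middleware:

```python
import re

# Matches lspci-style addresses such as "01:00.1" or "0000:01:00.1".
_PCI_ADDR = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<slot>[0-9a-fA-F]{2})\.(?P<func>[0-7])$"
)


def nodedev_name(addr: str) -> str:
    """Hypothetical helper: map an lspci address to a libvirt-style node device name."""
    m = _PCI_ADDR.match(addr.strip())
    if m is None:
        raise ValueError(f"not a PCI address: {addr!r}")
    # Assumption: lspci hides the domain when it is 0000.
    domain = m.group("domain") or "0000"
    parts = (domain, m.group("bus"), m.group("slot"), m.group("func"))
    return "pci_" + "_".join(p.lower() for p in parts)


if __name__ == "__main__":
    for addr in ("01:00.0", "01:00.1"):
        print(addr, "->", nodedev_name(addr))
```

This only builds the names; the detach itself is still done with virsh nodedev-detach as shown above.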

Both functions are now using the vfio-pci driver:

"01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev ef) (prog-if 00 [VGA controller])
    Subsystem: XFX Pine Group Inc. Radeon RX 570 [1682:c570]
    Physical Slot: 6
    Flags: fast devsel, IRQ 11, NUMA node 0
    Memory at c0000000 (64-bit, prefetchable) [disabled] [size=256M]
    Memory at d0000000 (64-bit, prefetchable) [disabled] [size=2M]
    I/O ports at e000 [disabled] [size=256]
    Memory at fbe00000 (32-bit, non-prefetchable) [disabled] [size=256K]
    Expansion ROM at fbe40000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150] Advanced Error Reporting
    Capabilities: [200] Resizable BAR <?>
    Capabilities: [270] Secondary PCI Express
    Capabilities: [2b0] Address Translation Service (ATS)
    Capabilities: [2c0] Page Request Interface (PRI)
    Capabilities: [2d0] Process Address Space ID (PASID)
    Capabilities: [320] Latency Tolerance Reporting
    Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [370] L1 PM Substates
    Kernel driver in use: vfio-pci
    Kernel modules: amdgpu

01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]
    Subsystem: XFX Pine Group Inc. Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1682:aaf0]
    Physical Slot: 6
    Flags: fast devsel, IRQ 82, NUMA node 0
    Memory at fbe60000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150] Advanced Error Reporting
    Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel"

Both the video and audio functions can be added to the VM in the SCALE GUI without error, but starting the VM then fails with:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 136, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self,
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1194, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 973, in nf
    return await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_lifecycle.py", line 46, in start
    await self.middleware.run_in_thread(self._start, vm['name'])
  File "/usr/lib/python3/dist-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread
    return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))
  File "/usr/lib/python3/dist-packages/middlewared/utils/io_thread_pool_executor.py", line 25, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_supervisor.py", line 61, in _start
    self.vms[vm_name].start(vm_data=self._vm_from_name(vm_name))
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor_base.py", line 152, in start
    raise CallError('\n'.join(errors))
middlewared.service_exception.CallError: [EFAULT] internal error: Device 0000:01:00.1 is already in use

GPU passthrough works on this same system under both Debian and Arch, but I am unable to pass the GPU through in SCALE.

truenas# midclt call vm.device.passthrough_device_choices

{"pci_0000_01_00_0": {"capability": {"class": "0x030000", "domain": "0", "bus": "1", "slot": "0", "function": "0", "product": "Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]", "vendor": "Advanced Micro Devices, Inc. [AMD/ATI]"}, "iommu_group": {"number": 27, "addresses": [{"domain": "0x0000", "bus": "0x01", "slot": "0x00", "function": "0x0"}, {"domain": "0x0000", "bus": "0x01", "slot": "0x00", "function": "0x1"}]}, "drivers": ["vfio-pci"]}, "pci_0000_01_00_1": {"capability": {"class": "0x040300", "domain": "0", "bus": "1", "slot": "0", "function": "1", "product": "Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]", "vendor": "Advanced Micro Devices, Inc. [AMD/ATI]"}, "iommu_group": {"number": 27, "addresses": [{"domain": "0x0000", "bus": "0x01", "slot": "0x00", "function": "0x0"}, {"domain": "0x0000", "bus": "0x01", "slot": "0x00", "function": "0x1"}]}, "drivers": ["vfio-pci"]}}
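
With VFIO in general, every device in an IOMMU group is expected to be released from its host driver before any of them is handed to a guest. The sketch below checks that condition against the output above. It embeds an abbreviated copy of that JSON (only the iommu_group and drivers fields), and the helper names are hypothetical, not middleware APIs:

```python
import json

# Abbreviated copy of the passthrough_device_choices output from this ticket.
CHOICES = json.loads("""
{
  "pci_0000_01_00_0": {
    "iommu_group": {"number": 27, "addresses": [
      {"domain": "0x0000", "bus": "0x01", "slot": "0x00", "function": "0x0"},
      {"domain": "0x0000", "bus": "0x01", "slot": "0x00", "function": "0x1"}]},
    "drivers": ["vfio-pci"]
  },
  "pci_0000_01_00_1": {
    "iommu_group": {"number": 27, "addresses": [
      {"domain": "0x0000", "bus": "0x01", "slot": "0x00", "function": "0x0"},
      {"domain": "0x0000", "bus": "0x01", "slot": "0x00", "function": "0x1"}]},
    "drivers": ["vfio-pci"]
  }
}
""")


def _strip_hex_prefix(value):
    """Turn "0x01" into "01"; leave values without a prefix untouched."""
    return value[2:] if value.startswith("0x") else value


def addr_to_name(addr):
    """Build a pci_DDDD_BB_SS_F name from one iommu_group address entry."""
    keys = ("domain", "bus", "slot", "function")
    return "pci_" + "_".join(_strip_hex_prefix(addr[k]) for k in keys)


def group_members_not_on_vfio(choices):
    """Map each device to the members of its IOMMU group not bound to vfio-pci."""
    problems = {}
    for name, info in choices.items():
        missing = []
        for addr in info["iommu_group"]["addresses"]:
            member = addr_to_name(addr)
            # A member absent from the choices is treated as not bound.
            drivers = choices.get(member, {}).get("drivers", [])
            if "vfio-pci" not in drivers:
                missing.append(member)
        if missing:
            problems[name] = missing
    return problems


if __name__ == "__main__":
    print(group_members_not_on_vfio(CHOICES))
```

For the output in this ticket the result is an empty dict: group 27 contains only 01:00.0 and 01:00.1, and both report vfio-pci.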

truenas# midclt call vm.query

[{"id": 1, "name": "win10", "description": "", "vcpus": 2, "memory": 4096, "autostart": false, "time": "LOCAL", "grubconfig": null, "bootloader": "UEFI", "cores": 1, "threads": 1, "shutdown_timeout": 90, "cpu_mode": "HOST-PASSTHROUGH", "cpu_model": null, "devices": [{"id": 1, "dtype": "NIC", "attributes": {"type": "E1000", "mac": "00:a0:98:2c:35:a8", "nic_attach": "enp0s25"}, "order": 1003, "vm": 1}, {"id": 2, "dtype": "DISK", "attributes": {"path": "/dev/zvol/jails/osimages/win10-v3n5qn", "type": "AHCI", "physical_sectorsize": null, "logical_sectorsize": null}, "order": 1001, "vm": 1}, {"id": 3, "dtype": "CDROM", "attributes": {"path": "/mnt/jails/osimages/Win10_2004_English_x64.iso"}, "order": 1000, "vm": 1}, {"id": 4, "dtype": "VNC", "attributes": {"vnc_bind": "192.168.20.31", "vnc_password": "", "vnc_web": true, "vnc_resolution": "1024x768", "vnc_port": 5900, "wait": false}, "order": 1004, "vm": 1}, {"id": 5, "dtype": "PCI", "attributes": {"pptdev": "pci_0000_01_00_0"}, "order": 1002, "vm": 1}, {"id": 6, "dtype": "PCI", "attributes": {"pptdev": "pci_0000_01_00_1"}, "order": 1005, "vm": 1}, {"id": 7, "dtype": "PCI", "attributes": {"pptdev": "pci_0000_04_00_0"}, "order": 1006, "vm": 1}, {"id": 8, "dtype": "PCI", "attributes": {"pptdev": "pci_0000_05_00_3"}, "order": 1007, "vm": 1}], "status": {"state": "STOPPED", "pid": null, "domain_state": "SHUTOFF"}}]
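
The vm.query output lists four PCI devices on the VM, while the passthrough_device_choices output in this ticket lists only two. Whether that difference is related to the error is not established here. As an illustration only, the comparison can be sketched as follows, using a trimmed copy of the devices array; the function name is hypothetical:

```python
# Trimmed copy of the "devices" array from the vm.query output in this ticket.
VM_DEVICES = [
    {"id": 1, "dtype": "NIC", "attributes": {"type": "E1000"}},
    {"id": 5, "dtype": "PCI", "attributes": {"pptdev": "pci_0000_01_00_0"}},
    {"id": 6, "dtype": "PCI", "attributes": {"pptdev": "pci_0000_01_00_1"}},
    {"id": 7, "dtype": "PCI", "attributes": {"pptdev": "pci_0000_04_00_0"}},
    {"id": 8, "dtype": "PCI", "attributes": {"pptdev": "pci_0000_05_00_3"}},
]

# Keys of the passthrough_device_choices output shown earlier in this ticket.
CHOICE_NAMES = {"pci_0000_01_00_0", "pci_0000_01_00_1"}


def pci_devices_not_in_choices(devices, choice_names):
    """Return pptdev names attached to the VM that are absent from the choices."""
    return [
        dev["attributes"]["pptdev"]
        for dev in devices
        if dev["dtype"] == "PCI" and dev["attributes"]["pptdev"] not in choice_names
    ]


if __name__ == "__main__":
    print(pci_devices_not_in_choices(VM_DEVICES, CHOICE_NAMES))
```

On the data above this reports pci_0000_04_00_0 and pci_0000_05_00_3, the two attached devices that do not appear among the listed choices.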

Problem/Justification

None

Impact

None

Activity

Waqar 
April 2, 2021 at 4:36 PM

Awesome, thanks for confirming!

Seth Shepard 
April 2, 2021 at 2:55 PM
(edited)

Edit: Brilliant!! Works as expected, thanks to all the devs.  

Waqar 
April 2, 2021 at 5:59 AM

Can you please confirm whether it works as desired for you now?

Waqar 
April 1, 2021 at 10:24 PM

Can you please confirm this after going through https://github.com/truenas/documentation/pull/819?

Also, please use the latest available nightlies.

Seth Shepard 
April 1, 2021 at 10:11 PM
(edited)

 

Edit: Using incorrect nightly:

 

Status: Complete

Created August 15, 2020 at 3:36 PM
Updated July 1, 2022 at 4:55 PM
Resolved March 31, 2021 at 9:02 PM