When the Intel 82599 ixgbe (PF) driver is loaded with VFs enabled (sriov_numvfs>0) and ixgbevf (VF) is loaded in a VM, encountering an AER on the PF results in a VF hang

Asked by: user22514921 Asked: 9/21/2023 Updated: 9/21/2023 Views: 39

Q:

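Why does the Intel 82599 driver (ixgbe, PF) behave as follows: when VFs are enabled (sriov_numvfs>0) and the ixgbevf driver (VF) is loaded in the VMs, an AER (Advanced Error Reporting) error on the PF causes the VF to hang, with no corresponding logs captured in the VF's dmesg? Is there a way to prevent the VF hang in this scenario (the PF itself remains fully operational)?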

The Linux kernel and OS I am referring to are as follows:

user@user-B460MDS3HV2:~$ uname -r

5.15.0

user@user-B460MDS3HV2:~$ lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:    20.04
Codename:   focal

The output of the lspci command on the PF is as follows:

user@user-B460MDS3HV2:~$ lspci

00:00.0 Host bridge: Intel Corporation Device 9b43 (rev 05)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 05)
00:12.0 Signal processing controller: Intel Corporation Comet Lake PCH Thermal Controller
00:14.0 USB controller: Intel Corporation Comet Lake USB 3.1 xHCI Host Controller
00:14.2 RAM memory: Intel Corporation Comet Lake PCH Shared SRAM
00:16.0 Communication controller: Intel Corporation Comet Lake HECI Controller
00:17.0 SATA controller: Intel Corporation Device 06d2
00:1b.0 PCI bridge: Intel Corporation Device 06c2 (rev f0)
00:1b.4 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #21 (rev f0)
00:1c.0 PCI bridge: Intel Corporation Device 06bc (rev f0)
00:1d.0 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #9 (rev f0)
00:1d.4 PCI bridge: Intel Corporation Device 06b4 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device 0684
00:1f.3 Audio device: Intel Corporation Comet Lake PCH cAVS
00:1f.4 SMBus: Intel Corporation Comet Lake PCH SMBus Controller
00:1f.5 Serial bus controller [0c80]: Intel Corporation Comet Lake PCH SPI Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (11) I219-V
01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
03:00.0 Non-Volatile memory controller: Realtek Semiconductor Co., Ltd. Device 5765 (rev 01)
**04:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)**
04:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
04:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

(Here, 04:00.0 and 04:00.1 are the interfaces of my Intel 10G NIC.)

Details of the interface:

user@user-B460MDS3HV2:~$ sudo lspci -s 04:00.0 -vvv

04:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
    Subsystem: Intel Corporation Ethernet Server Adapter X520-2
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at 4c500000 (32-bit, prefetchable) [size=1M]
    Region 2: I/O ports at 3020 [disabled] [size=32]
    Region 3: Memory at 4c700000 (32-bit, prefetchable) [size=16K]
    Expansion ROM at 4c280000 [disabled] [size=512K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Address: 0000000000000000  Data: 0000
        Masking: 00000000  Pending: 00000000
    Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Capabilities: [a0] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
        DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <1us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s (ok), Width x4 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR-
             10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS-, TPHComp-, ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
             AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000001 00000002 00000003
    Capabilities: [140 v1] Device Serial Number 00-00-00-ff-ff-00-00-00
    Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 1
        ARICtl: MFVC- ACS-, Function Group: 0
    Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
        IOVCap: Migration-, Interrupt Message Number: 000
        IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
        IOVSta: Migration-
        Initial VFs: 64, Total VFs: 64, Number of VFs: 1, Function Dependency Link: 00
        VF offset: 128, stride: 2, Device ID: 10ed
        Supported Page Size: 00000553, System Page Size: 00000001
        Region 0: Memory at 000000004c704000 (64-bit, prefetchable)
        Region 3: Memory at 000000004c804000 (64-bit, prefetchable)
        VF Migration: offset: 00000000, BIR: 0
    Kernel driver in use: ixgbe
    Kernel modules: ixgbe
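
The SR-IOV capability above reports `VF offset: 128, stride: 2`. As a sanity check, the sketch below applies the SR-IOV rule that the routing ID of VF n is the PF routing ID plus the first-VF offset plus n times the stride. It shows that the first VF of 04:00.0 lands on 04:10.0, the VF that appears in the host's lspci listing. The function names are illustrative and are not part of any tool.

```python
def parse_bdf(bdf):
    """Convert 'bb:dd.f' (hex fields) into a 16-bit PCI routing ID."""
    bus, devfn = bdf.split(":")
    dev, fn = devfn.split(".")
    return (int(bus, 16) << 8) | (int(dev, 16) << 3) | int(fn, 16)


def format_bdf(rid):
    """Convert a 16-bit routing ID back into 'bb:dd.f' notation."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1f:02x}.{rid & 0x7:x}"


def vf_bdf(pf_bdf, vf_index, offset, stride):
    """Routing ID of VF number vf_index (0-based) belonging to pf_bdf."""
    return format_bdf(parse_bdf(pf_bdf) + offset + vf_index * stride)


# Values taken from the lspci output of 04:00.0 above.
print(vf_bdf("04:00.0", 0, offset=128, stride=2))  # 04:10.0
```

With a stride of 2, the odd routing IDs in between belong to the VFs of the second port, 04:00.1.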

The VF BDF is attached to the VM via PCIe passthrough. The output of the lspci command inside the VM is as follows:

ubuntu@ubuntu-Standard-PC-Q35-ICH9-2009:~$ lspci

00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
00:02.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
02:00.0 Communication controller: Red Hat, Inc. Virtio console (rev 01)
03:00.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
04:00.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon (rev 01)
05:00.0 Unclassified device [00ff]: Red Hat, Inc. Virtio RNG (rev 01)
**06:00.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)**

(Here, 06:00.0 is the interface in the VM.)

Details of the interface:

ubuntu@ubuntu-Standard-PC-Q35-ICH9-2009:~$ sudo lspci -s 06:00.0 -vvv

06:00.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
    Subsystem: Intel Corporation 82599 Ethernet Controller Virtual Function
    Physical Slot: 0-5
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Region 0: Memory at fe000000 (64-bit, prefetchable) [size=16K]
    Region 3: Memory at fe004000 (64-bit, prefetchable) [size=16K]
    Capabilities: [70] MSI-X: Enable+ Count=3 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Capabilities: [a0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset- SlotPowerLimit 0.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed unknown, Width x0, ASPM not supported
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed unknown (ok), Width x0 (ok)
            TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Kernel driver in use: ixgbevf
    Kernel modules: ixgbevf

I have configured and compiled the kernel with AER injection support.

user@user-B460MDS3HV2:~$ cat /boot/config-5.15.0 |grep AER

CONFIG_ACPI_APEI_PCIEAER=y
CONFIG_PCIEAER=y
CONFIG_PCIEAER_INJECT=y

My GRUB command-line options are as follows:

GRUB_CMDLINE_LINUX="pcie_ports=native intel_iommu=on iommu=pt"

I am using this standard Linux tool to inject software-level errors:

[aer-inject](https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/)

After injecting a (non-fatal) error on the PF (BDF 04:00.0), the logs below show that error recovery succeeds on the PF, but the VF hangs.

PF logs:

...
...
(Added some debug prints)
[22464.094873] snd_hda_intel 0000:00:1f.3: saving config space at offset 0x30 (reading 0x0)
[22464.094877] snd_hda_intel 0000:00:1f.3: saving config space at offset 0x34 (reading 0x50)
[22464.094880] snd_hda_intel 0000:00:1f.3: saving config space at offset 0x38 (reading 0x0)
[22464.094883] snd_hda_intel 0000:00:1f.3: saving config space at offset 0x3c (reading 0x10b)
[22464.094905] snd_hda_intel 0000:00:1f.3: PME# enabled
[22464.111802] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D3hot
[22484.225113] pcieport 0000:00:1c.0: aer_inject: Injecting errors 00000000/00008000 into device 0000:04:00.0
[22484.225158] pcieport 0000:00:1c.0: AER: Uncorrected (Non-Fatal) error received: 0000:04:00.0
[22484.225163] ixgbe 0000:04:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Completer ID)
[22484.225165] ixgbe 0000:04:00.0:   device [8086:10fb] error status/mask=00008000/00000000
[22484.225166] ixgbe 0000:04:00.0:    [15] CmpltAbrt             
[22484.225168] ixgbe 0000:04:00.0: AER:   TLP Header: 00000000 00000001 00000002 00000003
[22484.225171] pcieport 0000:00:1c.0: AER: broadcast error_detected message
[22484.225172] -------------->>>>---ixgbe io err detected---->>>>-------
[22484.225174] ---------detach device--------
[22484.289962] ---------ixgbe close--------
[22484.289967] ---------disable dev--------
[22484.289986] -------------->>>>---ixgbe io err detected---->>>>-------
[22484.289994] ---------detach device--------
[22484.355658] ---------ixgbe close--------
[22484.355663] ---------disable dev--------
[22484.355750] pcieport 0000:00:1c.0: AER: broadcast slot_reset message
[22484.355759] -------------->>>>---ixgbe io slot reset---->>>>-------
[22484.355781] ixgbe 0000:04:00.0: enabling bus mastering
[22484.355869] ixgbe 0000:04:00.0: restoring config space at offset 0x4 (was 0x100406, writing 0x100006)
[22484.355929] ixgbe 0000:04:00.0: saving config space at offset 0x0 (reading 0x10fb8086)
[22484.355952] ixgbe 0000:04:00.0: saving config space at offset 0x4 (reading 0x100406)
[22484.355965] ixgbe 0000:04:00.0: saving config space at offset 0x8 (reading 0x2000001)
[22484.355975] ixgbe 0000:04:00.0: saving config space at offset 0xc (reading 0x800010)
[22484.355994] ixgbe 0000:04:00.0: saving config space at offset 0x10 (reading 0x4c500008)
[22484.356003] ixgbe 0000:04:00.0: saving config space at offset 0x14 (reading 0x0)
[22484.356011] ixgbe 0000:04:00.0: saving config space at offset 0x18 (reading 0x3021)
[22484.356020] ixgbe 0000:04:00.0: saving config space at offset 0x1c (reading 0x4c700008)
[22484.356028] ixgbe 0000:04:00.0: saving config space at offset 0x20 (reading 0x0)
[22484.356051] ixgbe 0000:04:00.0: saving config space at offset 0x24 (reading 0x0)
[22484.356064] ixgbe 0000:04:00.0: saving config space at offset 0x28 (reading 0x0)
[22484.356083] ixgbe 0000:04:00.0: saving config space at offset 0x2c (reading 0xc8086)
[22484.356091] ixgbe 0000:04:00.0: saving config space at offset 0x30 (reading 0x4c280000)
[22484.356098] ixgbe 0000:04:00.0: saving config space at offset 0x34 (reading 0x40)
[22484.356106] ixgbe 0000:04:00.0: saving config space at offset 0x38 (reading 0x0)
[22484.356115] ixgbe 0000:04:00.0: saving config space at offset 0x3c (reading 0x10b)
[22484.420988] -------------->>>>---ixgbe io slot reset----else--->>>>-------
[22484.420995] -------------->>>>---ixgbe io slot reset---->>>>-------
[22484.421017] ixgbe 0000:04:00.1: enabling bus mastering
[22484.421105] ixgbe 0000:04:00.1: restoring config space at offset 0x4 (was 0x100406, writing 0x100006)
[22484.421207] ixgbe 0000:04:00.1: saving config space at offset 0x0 (reading 0x10fb8086)
[22484.421214] ixgbe 0000:04:00.1: saving config space at offset 0x4 (reading 0x100406)
[22484.421220] ixgbe 0000:04:00.1: saving config space at offset 0x8 (reading 0x2000001)
[22484.421226] ixgbe 0000:04:00.1: saving config space at offset 0xc (reading 0x800010)
[22484.421231] ixgbe 0000:04:00.1: saving config space at offset 0x10 (reading 0x4c600008)
[22484.421237] ixgbe 0000:04:00.1: saving config space at offset 0x14 (reading 0x0)
[22484.421243] ixgbe 0000:04:00.1: saving config space at offset 0x18 (reading 0x3001)
[22484.421248] ixgbe 0000:04:00.1: saving config space at offset 0x1c (reading 0x4c904008)
[22484.421254] ixgbe 0000:04:00.1: saving config space at offset 0x20 (reading 0x0)
[22484.421259] ixgbe 0000:04:00.1: saving config space at offset 0x24 (reading 0x0)
[22484.421264] ixgbe 0000:04:00.1: saving config space at offset 0x28 (reading 0x0)
[22484.421269] ixgbe 0000:04:00.1: saving config space at offset 0x2c (reading 0xc8086)
[22484.421275] ixgbe 0000:04:00.1: saving config space at offset 0x30 (reading 0x4c200000)
[22484.421280] ixgbe 0000:04:00.1: saving config space at offset 0x34 (reading 0x40)
[22484.421286] ixgbe 0000:04:00.1: saving config space at offset 0x38 (reading 0x0)
[22484.421291] ixgbe 0000:04:00.1: saving config space at offset 0x3c (reading 0x20a)
[22484.484968] -------------->>>>---ixgbe io slot reset----else--->>>>-------
[22484.484978] pcieport 0000:00:1c.0: AER: broadcast resume message
[22484.484984] -------------->>>>---ixgbe io resume---->>>>-------
[22484.611466] ---------- ixgbe open----------
[22484.611498] ---------- dev attched----------
[22484.611553] -------------->>>>---ixgbe io resume---->>>>-------
[22484.677281] ixgbe 0000:04:00.0 enp4s0f0: detected SFP+: 5
[22484.795454] ---------- ixgbe open----------
[22484.795487] ---------- dev attched----------
[22484.795586] pcieport 0000:00:1c.0: AER: device recovery successful
[22484.939510] ixgbe 0000:04:00.0 enp4s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[22485.001449] ixgbe 0000:04:00.1 enp4s0f1: detected SFP+: 6
[22485.263514] ixgbe 0000:04:00.1 enp4s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
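
The `error status/mask=00008000/00000000` line in the PF log can be decoded by hand. A minimal sketch, assuming the bit positions of the AER Uncorrectable Error Status register and the short names that lspci prints; the helper names are made up for illustration. The severity mask `0x62010` is derived from the `UESvrt` line shown earlier for 04:00.0 (DLP+, FCP+, RxOF+, MalfTLP+), so it applies only to this device as configured.

```python
# Bit positions in the AER Uncorrectable Error Status register,
# labelled with the abbreviations lspci uses.
UNCOR_BITS = {
    4: "DLP", 5: "SDES", 12: "TLP", 13: "FCP", 14: "CmpltTO",
    15: "CmpltAbrt", 16: "UnxCmplt", 17: "RxOF", 18: "MalfTLP",
    19: "ECRC", 20: "UnsupReq", 21: "ACSViol",
}

# Bits marked '+' on the UESvrt line of 04:00.0: DLP, FCP, RxOF, MalfTLP.
UESVRT_04_00_0 = (1 << 4) | (1 << 13) | (1 << 17) | (1 << 18)  # 0x62010


def decode_uncor_status(status, mask=0):
    """Names of the unmasked error bits that are set in status."""
    active = status & ~mask
    return [name for bit, name in sorted(UNCOR_BITS.items()) if active & (1 << bit)]


def is_fatal(status, severity):
    """An uncorrectable error is fatal if its severity bit is set."""
    return bool(status & severity)


status, mask = 0x00008000, 0x00000000  # from the ixgbe log line above
print(decode_uncor_status(status, mask))   # ['CmpltAbrt']
print(is_fatal(status, UESVRT_04_00_0))    # False
```

Both results agree with the log: bit 15 is reported as `CmpltAbrt`, and the error is classified as Uncorrected (Non-Fatal).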

VF logs during this period:

...
...
...
[  322.159514] ixgbevf 0000:06:00.0: NIC Link is Down

The VM hangs.

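I am not a PCIe expert. Please give me some insight into why this happens, and into potential solutions for preventing the VF hang when an AER is triggered on the PF.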

error-handling virtual-machine virtual-functions pci-e


A: No answers yet