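Checking Whether an RDMA NIC and an NVMe SSD Are Attached to the Same CPU

To check whether an RDMA NIC and an NVMe SSD are attached to the same CPU, look at each device's NUMA (Non-Uniform Memory Access) affinity. The steps are as follows: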

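1. Install the required tools

First, make sure numactl and lspci are installed (lspci is provided by the pciutils package). Both are preinstalled on many Linux distributions; if they are missing, install them with:

sudo apt-get install numactl pciutils     # on Debian-based systems
sudo yum install numactl pciutils         # on Red Hat-based systems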

2. Find the devices' PCI addresses

Use lspci to find the PCI addresses of the RDMA NIC and the NVMe SSD. The first pattern matches every Ethernet or InfiniBand controller, so pick the RDMA-capable adapter from the list:

lspci | grep -i 'Ethernet\|Network\|Infiniband'    # find the RDMA NIC
lspci | grep -i 'Non-Volatile memory controller'   # find the NVMe SSD

Example output:

02:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
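
For scripting, it can help to extract only the addresses. The following is a minimal sketch, not a definitive tool: `pci_addrs` is a hypothetical helper name, and the sample input is the example output above with the `0000:` domain prefix that `lspci -D` prints.

```shell
# pci_addrs PATTERN: read lspci-style lines on stdin and print the PCI address
# (first field) of every line matching PATTERN (case-insensitive, extended regex).
# The function name is made up for this example.
pci_addrs() {
    grep -iE "$1" | awk '{print $1}'
}

# Sample input: the example output from this section, written in the
# domain-prefixed form produced by `lspci -D`.
sample='0000:02:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0000:03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981'

printf '%s\n' "$sample" | pci_addrs 'Ethernet|Network|Infiniband'
printf '%s\n' "$sample" | pci_addrs 'Non-Volatile memory controller'
```

On a real machine, pipe `lspci -D` into the function instead of the sample text, for example `lspci -D | pci_addrs 'Non-Volatile memory controller'`.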

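3. Check the devices' NUMA affinity

Read each device's NUMA node from sysfs:

cat /sys/bus/pci/devices/0000:02:00.0/numa_node    # NUMA node of the RDMA NIC
cat /sys/bus/pci/devices/0000:03:00.0/numa_node    # NUMA node of the NVMe SSD

In the commands above, replace 0000:02:00.0 and 0000:03:00.0 with the PCI addresses of your RDMA NIC and NVMe SSD. The leading 0000: is the PCI domain, which lspci does not print on machines that have only domain 0 (use lspci -D to always show it).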
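
The same lookup can be wrapped in a small function so it works for any number of devices. This is a sketch under stated assumptions: `numa_node_of` is a hypothetical name, and `SYSFS_ROOT` is a made-up override that exists only so the function can be pointed at a test directory; when it is unset, the real /sys is used.

```shell
# numa_node_of ADDR: print the NUMA node of the PCI device at ADDR.
# ADDR must be the full address including the domain, e.g. 0000:02:00.0.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
numa_node_of() {
    cat "${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/numa_node"
}

# Print "ADDR NODE" for every address passed on the command line.
# With no arguments the loop does nothing.
for addr in "$@"; do
    printf '%s %s\n' "$addr" "$(numa_node_of "$addr")"
done
```

Saved as a script, it could be called with the two addresses from step 2 as arguments.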

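4. Interpret the output

Compare the two values. If both devices report the same NUMA node number, they are attached to the same NUMA node and therefore to the same CPU. A value of -1 means the kernel has no NUMA information for the device, for example because the platform does not report it. For example:

cat /sys/bus/pci/devices/0000:02:00.0/numa_node
0
cat /sys/bus/pci/devices/0000:03:00.0/numa_node
0

If both results are 0, the RDMA NIC and the NVMe SSD are on the same NUMA node of the same CPU.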
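
The comparison can be written as a short function that also covers the case where sysfs reports -1 (no NUMA information). This is only an illustrative sketch; `compare_numa` and its three verdict strings are invented for this example.

```shell
# compare_numa NODE_A NODE_B: print a verdict for two numa_node values.
# The function name and the verdict strings are made up for this example.
compare_numa() {
    if [ "$1" -lt 0 ] || [ "$2" -lt 0 ]; then
        echo "unknown"      # -1: the kernel has no NUMA information for a device
    elif [ "$1" -eq "$2" ]; then
        echo "same"         # both devices hang off the same NUMA node
    else
        echo "different"    # traffic between the devices crosses NUMA nodes
    fi
}

compare_numa 0 0
compare_numa 0 1
compare_numa -1 0
```

The three calls cover the matching, non-matching, and unknown cases in that order.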

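5. Additional NUMA configuration and information

To confirm the NUMA topology, or as a starting point for configuring NUMA affinity, run:

numactl --hardware   # show the NUMA layout and per-node information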
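
When only the CPU list of one node is needed, for instance to pin a process by hand, the kernel also exposes it per node in sysfs. The sketch below assumes the standard /sys/devices/system/node layout; `node_cpus` is a hypothetical helper name and `SYSFS_ROOT` is a made-up override used only for testing.

```shell
# node_cpus NODE: print the CPU list of NUMA node NODE in the kernel's
# range format (comma-separated ranges).
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
node_cpus() {
    cat "${SYSFS_ROOT:-/sys}/devices/system/node/node$1/cpulist"
}

# Print "node N: CPULIST" for every node number passed on the command line.
# With no arguments the loop does nothing.
for node in "$@"; do
    printf 'node %s: %s\n' "$node" "$(node_cpus "$node")"
done
```

The printed list is in the format that `taskset -c` accepts, so it can serve as an alternative to binding by node number with numactl.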

6. Performance test: binding to the NUMA node's CPUs



```
[root@test-worker-0 ~]# mst status -v
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module is not loaded
PCI devices:
------------
DEVICE_TYPE             MST      PCI       RDMA            NET                       NUMA  
ConnectX6(rev:0)        NA       71:00.0   mlx5_4          net-ib0                   7     

ConnectX4LX(rev:0)      NA       21:00.1   mlx5_1          net-enp33s0f1             2     

ConnectX4LX(rev:0)      NA       61:00.1   mlx5_3          net-enp97s0f1             6     

ConnectX6(rev:0)        NA       71:00.1   mlx5_5          net-ib1                   7     

ConnectX4LX(rev:0)      NA       21:00.0   mlx5_0          net-enp33s0f0             2     

ConnectX4LX(rev:0)      NA       61:00.0   mlx5_2          net-enp97s0f0             6   

```

> ConnectX6(rev:0)        NA       71:00.1   mlx5_5          net-ib1                   7  
>
> Here mlx5_5 is on NUMA node 7.
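
If `mst` is not available, the same mapping can usually be read from sysfs by device name, because the class entry of a PCIe-attached device links to its PCI device directory. The sketch below rests on that assumption; `rdma_numa_node` and `nvme_numa_node` are hypothetical helper names, and `SYSFS_ROOT` is a made-up override used only for testing.

```shell
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.

# rdma_numa_node DEV: print the NUMA node of RDMA device DEV (e.g. mlx5_5).
rdma_numa_node() {
    cat "${SYSFS_ROOT:-/sys}/class/infiniband/$1/device/numa_node"
}

# nvme_numa_node CTRL: print the NUMA node of a PCIe-attached NVMe
# controller CTRL (e.g. nvme0).
nvme_numa_node() {
    cat "${SYSFS_ROOT:-/sys}/class/nvme/$1/device/numa_node"
}
```

On the host above, `rdma_numa_node mlx5_5` would be expected to agree with the NUMA column that `mst status -v` prints.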



**Test without NUMA CPU binding:**

```
[root@test-worker-0 ~]# ib_send_bw -d mlx5_5 -a 10.10.0.133
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF          Device         : mlx5_5
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x2f QPN 0x0149 PSN 0x67c98e
 remote address: LID 0x2e QPN 0x0149 PSN 0xb55f5
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          1000             4.25               4.17               2.188599
 4          1000             13.48              13.10              3.434255
 8          1000             26.49              25.93              3.398508
 16         1000             53.92              51.53              3.377363
 32         1000             108.03             107.56             3.524610
 64         1000             216.44             216.01             3.539131
 128        1000             431.34             426.79             3.496259
Conflicting CPU frequency values detected: 2478.675000 != 2372.897000. CPU Frequency is not max.
 256        1000             856.64             836.93             3.428075
 512        1000             1088.70            1075.92            2.203479
 1024       1000             3367.44            3223.17            3.300529
 2048       1000             6122.69            6112.92            3.129815
 4096       1000             6387.97            6369.01            1.630467
 8192       1000             6507.48            6496.60            0.831565
 16384      1000             6554.03            6492.23            0.415503
 32768      1000             6588.61            6587.97            0.210815
 65536      1000             6602.20            6602.08            0.105633
 131072     1000             6578.75            6177.12            0.049417
 262144     1000             6614.66            6614.66            0.026459
 524288     1000             6614.45            6614.44            0.013229
 1048576    1000             6616.61            6612.24            0.006612
 2097152    1000             6617.39            6617.27            0.003309
 4194304    1000             6614.28            6614.28            0.001654
Conflicting CPU frequency values detected: 2476.460000 != 2634.707000. CPU Frequency is not max.
 8388608    1000             6614.66            6612.28            0.000827
---------------------------------------------------------------------------------------
```



**Test with NUMA CPU binding:**

```
[root@test-worker-0 ~]# numactl --cpunodebind=7 ib_send_bw -d mlx5_5 -a 10.10.0.133
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF          Device         : mlx5_5
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x2f QPN 0x0157 PSN 0xb1fc91
 remote address: LID 0x2e QPN 0x0157 PSN 0x689c1f
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          1000             6.80               6.68               3.502466
 4          1000             13.48              13.00              3.408195
 8          1000             27.25              26.41              3.461800
 16         1000             54.40              54.24              3.554942
Conflicting CPU frequency values detected: 2477.559000 != 2422.987000. CPU Frequency is not max.
 32         1000             108.99             108.46             3.554164
Conflicting CPU frequency values detected: 2476.061000 != 2405.157000. CPU Frequency is not max.
 64         1000             217.98             217.43             3.562420
Conflicting CPU frequency values detected: 2475.611000 != 2392.043000. CPU Frequency is not max.
 128        1000             435.19             415.30             3.402171
Conflicting CPU frequency values detected: 2473.253000 != 2381.183000. CPU Frequency is not max.
 256        1000             852.15             843.46             3.454829
 512        1000             1728.44            1676.51            3.433494
Conflicting CPU frequency values detected: 2476.529000 != 2422.768000. CPU Frequency is not max.
 1024       1000             3481.51            3466.43            3.549625
 2048       1000             6938.36            6926.55            3.546392
 4096       1000             11034.66            11011.34                  2.818902
 8192       1000             11380.20            11374.89                  1.455986
 16384      1000             11442.73            11442.56                  0.732324
 32768      1000             11480.54            11479.83                  0.367354
Conflicting CPU frequency values detected: 2477.817000 != 2423.141000. CPU Frequency is not max.
 65536      1000             11497.33            11496.78                  0.183949
 131072     1000             11505.94            11505.50                  0.092044
 262144     1000             11508.63            11508.60                  0.046034
Conflicting CPU frequency values detected: 2482.059000 != 2417.385000. CPU Frequency is not max.
 524288     1000             11510.78            11510.77                  0.023022
 1048576    1000             11513.05            11513.03                  0.011513
 2097152    1000             11512.46            11512.43                  0.005756
 4194304    1000             11513.04            11513.03                  0.002878
 8388608    1000             11514.02            11514.01                  0.001439
---------------------------------------------------------------------------------------
```
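
In the two runs above, the average bandwidth at the largest message size (8388608 bytes) rises from 6612.28 MB/s without binding to 11514.01 MB/s with binding. The sketch below extracts those figures and computes the ratio; `avg_bw_at` is a hypothetical helper name, and the two input rows are copied from the outputs above.

```shell
# avg_bw_at SIZE: read ib_send_bw output on stdin and print the
# "BW average[MB/sec]" column (4th field) of the row for message size SIZE.
# The function name is made up for this example.
avg_bw_at() {
    awk -v size="$1" 'NF == 5 && $1 == size { print $4 }'
}

# Rows for the 8388608-byte message size, copied from the two runs above.
unbound=' 8388608    1000             6614.66            6612.28            0.000827'
bound=' 8388608    1000             11514.02            11514.01                  0.001439'

a=$(printf '%s\n' "$unbound" | avg_bw_at 8388608)
b=$(printf '%s\n' "$bound" | avg_bw_at 8388608)

# Ratio of bound to unbound average bandwidth.
awk -v a="$a" -v b="$b" 'BEGIN { printf "speedup: %.2fx\n", b / a }'
```

With a saved log, the full output of each run can be piped into `avg_bw_at` instead of a single row.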
















