Installing the InfiniBand Driver on RHEL 5.8

2020-06-16 18:05:01

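Notes on installing the InfiniBand driver on RHEL 5.8.

1      Download the driver

URL: http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

Choose the driver that matches the operating system version. The ISO-format driver package is recommended.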

Note: for RHEL 5 and earlier, choose the 1.5.3 driver series; for RHEL 6 and later, choose the 2.0 series or newer.
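
The selection rule in the note can be sketched as a small shell helper. Both function names are hypothetical and not part of MLNX_OFED; the release-string format is assumed to be the usual "... release 5.8 ..." form found in /etc/redhat-release.

```shell
# Sketch: map a RHEL major version to the driver series named in the note.
# ofed_series_for_rhel is a hypothetical helper, not a Mellanox tool.
ofed_series_for_rhel() {
    major=$1
    case "$major" in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;   # reject non-numeric input
    esac
    if [ "$major" -le 5 ]; then
        echo "1.5.3"
    else
        echo "2.0+"
    fi
}

# Sketch: read a release string on stdin and print the integer that
# follows the word "release" (assumed format, see lead-in).
rhel_major_from_release() {
    sed -n 's/.*release \([0-9][0-9]*\).*/\1/p'
}
```

On a real host this could be called as: ofed_series_for_rhel "$(rhel_major_from_release < /etc/redhat-release)"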

2      Install the driver

2.1  Upload the downloaded driver to the server and mount it on the /public/ofed directory.

[root@node33 sourcecode]# mount -o loop MLNX_OFED_LINUX-1.5.3-4.0.42-rhel5.8-x86_64.iso /public/ofed/

[root@node33 sourcecode]# cd

[root@node33 ~]# df -h

Filesystem          Size  Used Avail Use% Mounted on

/dev/sda3            117G  9.8G  101G  9% /

/dev/sda1            494M  17M  452M  4% /boot

tmpfs                5.9G    0  5.9G  0% /dev/shm

/tftpboot/rhel.iso  3.9G  3.9G    0 100% /tftpboot/iso

/public/sourcecode/MLNX_OFED_LINUX-1.5.3-4.0.42-rhel5.8-x86_64.iso

267M  267M    0 100% /public/ofed

[root@node33 ~]#
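
Instead of reading the df output by eye, the mount can be checked from a script. This is a minimal sketch: is_mount_point is a hypothetical helper, and it assumes the mount table has the mount point in its second field, as /proc/mounts does.

```shell
# Sketch: succeed if the given directory is listed as a mount point.
# is_mount_point is a hypothetical helper; the second argument is the
# mount table to read and defaults to /proc/mounts.
is_mount_point() {
    dir=$1
    table=${2:-/proc/mounts}
    # Field 2 of each line is the mount point; compare it exactly.
    awk -v d="$dir" '$2 == d { found = 1 } END { exit(found ? 0 : 1) }' "$table"
}
```

Pass the directory without a trailing slash, for example: is_mount_point /public/ofed && echo mounted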

2.2  Run the installation command to begin installing the packages.

[root@node33 ~]# /public/ofed/mlnxofedinstall -y

Usage: /public/ofed/mlnxofedinstall [OPTIONS]

Options

-c|--config <packages config_file>    Example of the configuration file can be found under docs

-n|--net <network config_file>        Example of the network configuration file can be found under docs

-k|--kernel-version <kernel version>  Use provided kernel version instead of 'uname -r'

-p|--print-available       Print available packages for current platform and create corresponding ofed.conf file

--without-32bit            Skip 32-bit libraries installation

--without-depcheck         Skip Distro's libraries check

--without-fw-update        Skip firmware update

--fw-update-only           Update firmware. Skip driver installation

--force-fw-update          Force firmware update

--force                    Force installation

--all|--hpc|--basic|--msm  Install all, hpc, basic or Mellanox Subnet manager packages correspondingly

--vma|--vma-vpi            Install packages required by VMA to support VPI

--vma-eth                  Install packages required by VMA to work over Ethernet

-v|-vv|-vvv                Set verbosity level

--umad-dev-rw              Grant non root users read/write permission for umad devices instead of default

--hugepages-overcommit     Setting 80% of MAX_MEMORY as overcommit for huge page allocation

--pfc <0|bitmask>          Priority based Flow Control policy on TX and RX [7:0]. Per priority bit mask (uint). Default 0.

-q                         Set quiet - no messages will be printed

[root@node33 ~]# echo y | /public/ofed/mlnxofedinstall --basic --msm --umad-dev-rw --hugepages-overcommit

This program will install the MLNX_OFED_LINUX package on your machine.

Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.

Do you want to continue?[y/N]:

Starting MLNX_OFED_LINUX-1.5.3-4.0.42 installation...

Installing mlnx-ofa_kernel RPM

Preparing...              ##################################################

mlnx-ofa_kernel            ##################################################

Installing kmod-mlnx-ofa_kernel RPM

Preparing...                ##################################################

kmod-mlnx-ofa_kernel      ##################################################

Installing kmod-mlnx-ofa_kernel-xen RPM

Preparing...              ##################################################

kmod-mlnx-ofa_kernel-xen  ##################################################

Installing kernel-mft RPM

Preparing...              ##################################################

kernel-mft                ##################################################

Installing user level RPMs:

Preparing...              ##################################################

mlnxofed-docs              ##################################################

Preparing...              ##################################################

ofed-scripts              ##################################################

Preparing...              ##################################################

libibverbs                ##################################################

Preparing...              ##################################################

libibverbs                ##################################################

Preparing...              ##################################################

libibverbs-utils          ##################################################

Preparing...              ##################################################

libmthca                  ##################################################

Preparing...                ##################################################

libmthca                  ##################################################

Preparing...              ##################################################

libmverbs                  ##################################################

Preparing...              ##################################################

libmverbs                  ##################################################

Preparing...              ##################################################

libmlx4                    ##################################################

Preparing...              ##################################################

libmlx4                    ##################################################

Preparing...              ##################################################

libcxgb3                  ##################################################

Preparing...              ##################################################

libcxgb3                    ##################################################

Preparing...              ##################################################

libnes                    ##################################################

Preparing...                ##################################################

libnes                    ##################################################

Preparing...              ##################################################

libipathverbs              ##################################################

Preparing...              ##################################################

libipathverbs              ##################################################

Preparing...              ##################################################

librdmacm                  ##################################################

Preparing...              ##################################################

librdmacm                  ##################################################

Preparing...                ##################################################

librdmacm-utils            ##################################################

Preparing...              ##################################################

mstflint                    ##################################################

Preparing...              ##################################################

libibumad                  ##################################################

Preparing...              ##################################################

libibumad                  ##################################################

Preparing...              ##################################################

libibmad                  ##################################################

Preparing...              ##################################################

libibmad                  ##################################################

Preparing...              ##################################################

mft                        ##################################################

Preparing...              ##################################################

opensm-libs                ##################################################

Preparing...                ##################################################

opensm-libs                ##################################################

Preparing...              ##################################################

infiniband-diags          ##################################################

Preparing...              ##################################################

opensm                    ##################################################

Preparing...              ##################################################

ibutils                    ##################################################

Device (06:00.0):

06:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

Link Width: 8x

PCI Link Speed: 2.5Gb/s

Installation finished successfully.

Programming HCA firmware for /dev/mst/mt26428_pci_cr0 device

Running: mlxburn -d /dev/mst/mt26428_pci_cr0 -fw /public/ofed/firmware/fw-25408/2_9_1000/fw-ConnectX2-rel.mlx -dev_type 25408  -no

-I- Querying device ...

-I- Using auto detected configuration file: /public/ofed/firmware/fw-25408/2_9_1000/MHQH19B-XTR_A1-A3.ini (PSID = MT_0D90110009)

-I- Generating image ...

Current FW version on flash: 2.7.626

New FW version:              2.9.1000

Burning FW image without signatures  - OK

Restoring signature                  - OK

-I- Image burn completed successfully.

Configuring /etc/security/limits.conf.

Please reboot your system for the changes to take effect.

[root@node33 ~]#

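Note: four installation modes are available: all, hpc, basic, and msm. The standard basic mode is recommended. Management nodes need both the msm and basic modes. By default the installer reflashes the HCA firmware during installation, so for non-standalone (for example, onboard) HCAs pay close attention to the firmware version. The usage text above lists --without-fw-update for skipping the firmware update.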
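
The installation log shows firmware 2.7.626 on flash being replaced by 2.9.1000. One way to tell in advance whether a card is older than the firmware bundled with the installer is a field-by-field version comparison. This is a sketch only: fw_version_lt is a hypothetical helper, and how the current version is obtained on a given node is left as an assumption.

```shell
# Sketch: compare two dotted firmware versions numerically, field by field.
# fw_version_lt is a hypothetical helper; it succeeds (exit 0) when the
# first version is older than the second.
fw_version_lt() {
    echo "$1 $2" | awk '{
        na = split($1, a, ".")
        nb = split($2, b, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            x = a[i] + 0; y = b[i] + 0   # missing fields count as 0
            if (x < y) exit 0
            if (x > y) exit 1
        }
        exit 1                           # equal versions are not "older"
    }'
}
```

For the versions in the log, fw_version_lt 2.7.626 2.9.1000 succeeds, which matches the burn that the installer performed.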

2.3  Set the IP address of the IB network interface

[root@node33 ~]# cat <<EOF >> /etc/sysconfig/network-scripts/ifcfg-ib0

> DEVICE=ib0

> BOOTPROTO=none

> ONBOOT=yes

> NETMASK=255.255.255.0

> IPADDR=12.12.12.33

> EOF

[root@node33 ~]#

[root@node33 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0

DEVICE=ib0

BOOTPROTO=none

ONBOOT=yes

NETMASK=255.255.255.0

IPADDR=12.12.12.33

[root@node33 ~]#
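
When the same file has to be written on many nodes, generating it from the address avoids typing mistakes. A minimal sketch follows; ifcfg_ib0 is a hypothetical helper, and the default netmask simply mirrors the file shown above.

```shell
# Sketch: print an ifcfg-ib0 body for a given address.
# ifcfg_ib0 is a hypothetical helper; the netmask argument is optional.
ifcfg_ib0() {
    ipaddr=$1
    netmask=${2:-255.255.255.0}
    cat <<EOF
DEVICE=ib0
BOOTPROTO=none
ONBOOT=yes
NETMASK=$netmask
IPADDR=$ipaddr
EOF
}
```

Usage could look like: ifcfg_ib0 12.12.12.33 > /etc/sysconfig/network-scripts/ifcfg-ib0. Redirecting with > rather than >> overwrites the file, so running it twice does not leave duplicate keys.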

2.4  Start the IB services

[root@node33 ~]# chkconfig --list | grep open

openibd            0:off 1:off 2:on 3:on 4:on 5:on 6:off

opensmd        0:off 1:off 2:off 3:on 4:on 5:on 6:off

[root@node33 ~]# /etc/init.d/openibd restart

Unloading HCA driver:                                      [  OK  ]

Loading HCA driver and Access Layer:                      [  OK  ]

Setting up InfiniBand network interfaces:

Bringing up interface ib0:                                [  OK  ]

Setting up service network . . .                          [  done  ]

[root@node33 ~]# /etc/init.d/opensmd restart

Stopping IB Subnet Manager.                                [FAILED]

Starting IB Subnet Manager.                                [  OK  ]

[root@node33 ~]# ibstat

CA 'mlx4_0'

CA type: MT26428

Number of ports: 1

Firmware version: 2.9.1000

Hardware version: b0

Node GUID: 0x0002c903000cc00e

System image GUID: 0x0002c903000cc011

Port 1:

State: Active

Physical state: LinkUp

Rate: 40

Base lid: 1

LMC: 0

SM lid: 1

Capability mask: 0x0251086a

Port GUID: 0x0002c903000cc00f

Link layer: InfiniBand

[root@node33 ~]#

Note: on management nodes, start openibd first and then opensmd. Compute nodes only need openibd. After configuration, use ibstat to check the rate and the link state.
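
The ibstat check in the note can be automated by parsing the three fields that matter. This is a sketch under assumptions: ibstat_port_ok is a hypothetical helper, it expects the "Key: value" layout shown above, and on a multi-port card it only evaluates the values of the last port printed.

```shell
# Sketch: read ibstat output on stdin and succeed only if it reports
# State: Active, Physical state: LinkUp and the expected Rate.
# ibstat_port_ok is a hypothetical helper; the default rate of 40
# matches the QDR link in the output above.
ibstat_port_ok() {
    want_rate=${1:-40}
    awk -F': *' -v want="$want_rate" '
        {
            key = $1; val = $2
            sub(/^[ \t]+/, "", key)    # drop leading indentation
            sub(/[ \t]+$/, "", val)    # drop trailing whitespace
        }
        key == "State"          { state = val }
        key == "Physical state" { phys  = val }
        key == "Rate"           { rate  = val }
        END { exit((state == "Active" && phys == "LinkUp" && rate == want) ? 0 : 1) }
    '
}
```

Usage could look like: ibstat | ibstat_port_ok 40 || echo "ib0 link needs attention"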

3      Uninstall the IB driver

[root@node33 ~]# echo y | /public/ofed/uninstall.sh

This program will uninstall all MLNX_OFED_LINUX-1.5.3-4.0.42 packages on your machine.

Do you want to continue?[y/N]:y

rpm -e --allmatches --nodeps kmod-mlnx-ofa_kernel-xen-1.5.3-OFED.1.5.3.4.0.42.g3cb72fe.rhel5u8 libnes-1.1.1mlnx1-1 libcxgb3-1.3.1-1 libmverbs-0.1.0-3.15.gd28970e libibmad-1.3.8.MLNX_20120424-0.1 libmthca-1.0.6mlnx1-0.1.gbe5eef3 libibumad-1.3.7.MLNX_20130110_ff06102-0.1 libibverbs-1.1.5mlnx2-1 libmlx4-1.0.2mlnx6-1 librdmacm-1.0.15-1 kernel-mft-2.7.1-2.6.18_308.el5 libmverbs-0.1.0-3.15.gd28970e libipathverbs-1.2mlnx1-1 libibmad-1.3.8.MLNX_20120424-0.1 mlnx-ofa_kernel-1.5.3-OFED.1.5.3.4.0.42.g3cb72fe.rhel5u8 libibverbs-utils-1.1.5mlnx2-1 libcxgb3-1.3.1-1 mstflint-1.4mlnx4-1.21.gd948ddd libmlx4-1.0.2mlnx6-1 librdmacm-1.0.15-1 libmthca-1.0.6mlnx1-0.1.gbe5eef3 libibumad-1.3.7.MLNX_20130110_ff06102-0.1 libibverbs-1.1.5mlnx2-1 librdmacm-utils-1.0.15-1 mlnxofed-docs-1.5.3-4.0.42 libipathverbs-1.2mlnx1-1 kmod-mlnx-ofa_kernel-1.5.3-OFED.1.5.3.4.0.42.g3cb72fe.rhel5u8 libnes-1.1.1mlnx1-1 kernel-mft-2.7.1-2.6.18_308.el5 ofed-scripts-1.5.3-OFED.1.5.3.4.0.42 mft-2.7.1a-1

Uninstall finished successfully

[root@node33 ~]# rm -rf /etc/infiniband

[root@node33 ~]#
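
After uninstalling, it can be useful to confirm that none of the packages from the log are still installed. The sketch below filters a package list for the name prefixes that appear in the install and uninstall output above. leftover_ofed_packages is a hypothetical helper, and the prefix list is an assumption that may need adjusting for other driver versions.

```shell
# Sketch: read a package list on stdin (one name-version-release per line)
# and print the entries whose names match packages seen in the logs above.
# leftover_ofed_packages is a hypothetical helper.
leftover_ofed_packages() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '^(kmod-mlnx-ofa_kernel|mlnx-ofa_kernel|kernel-mft|mft|ofed-scripts|mlnxofed-docs|libibverbs|libmlx4|libmthca|librdmacm|libibumad|libibmad|mstflint|opensm|infiniband-diags|ibutils)-' || true
}
```

Usage could look like: rpm -qa | leftover_ofed_packages. An empty result suggests the removal was complete; distribution packages with the same names would also be listed.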

4      Troubleshooting

4.1  Check the IB working status

[root@node33 ~]# ibstat

CA 'mlx4_0'

CA type: MT26428

Number of ports: 1

Firmware version: 2.9.1000

Hardware version: b0

Node GUID: 0x0002c903000cc00e

System image GUID: 0x0002c903000cc011

Port 1:

State: Active

Physical state: LinkUp

Rate: 40

Base lid: 1

LMC: 0

SM lid: 1

Capability mask: 0x0251086a

Port GUID: 0x0002c903000cc00f

Link layer: InfiniBand

[root@node33 ~]#

4.2  View host information

[root@node33 ~]# ibhosts

Ca    :0x0002c903000cc00a ports 1 "node34 HCA-1"

Ca    :0x0002c903000cc00e ports 1 "node33 HCA-1"

[root@node33 ~]#
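
To compare the fabric against an expected node list, the host names can be pulled out of the ibhosts output. A minimal sketch: ibhosts_names is a hypothetical helper, and it assumes the node description is the quoted string and that the host name is its first word, as in the output above.

```shell
# Sketch: read ibhosts output on stdin and print one host name per line.
# ibhosts_names is a hypothetical helper (see the assumptions in the lead-in).
ibhosts_names() {
    # Split each "Ca" line on double quotes; field 2 is the description.
    awk -F'"' '/^Ca/ { split($2, w, " "); print w[1] }'
}
```

Usage could look like: ibhosts | ibhosts_names | sort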

4.3  View switch information

[root@node33 ~]# ibswitches

Switch      :0x0002c9020042bcc0 ports 36 "MF0;switch-1140a2:IS5030/U1" enhanced port 0 lid 4 lmc 0

[root@node33 ~]#

4.4  View topology information

[root@node33 ~]# ibnetdiscover

#

# Topology file: generated on Sun Mar  8 19:53:35 2015

#

# Initiated from node 0002c903000cc00e port 0002c903000cc00f

vendid=0x2c9

devid=0xbd36

sysimgguid=0x2c9020042bcc3

switchguid=0x2c9020042bcc0(2c9020042bcc0)

Switch  36 "S-0002c9020042bcc0"                # "MF0;switch-1140a2:IS5030/U1" enhanced port 0 lid 4 lmc 0

[30]  "H-0002c903000cc00e"[1](2c903000cc00f)          # "node33 HCA-1" lid 1 4xQDR

[31]  "H-0002c903000cc00a"[1](2c903000cc00b)          # "node34 HCA-1" lid 7 4xQDR

vendid=0x2c9

devid=0x673c

sysimgguid=0x2c903000cc00d

caguid=0x2c903000cc00a

Ca  1 "H-0002c903000cc00a"                # "node34 HCA-1"

[1](2c903000cc00b)        "S-0002c9020042bcc0"[31]              # lid 7 lmc 0 "MF0;switch-1140a2:IS5030/U1" lid 4 4xQDR

vendid=0x2c9

devid=0x673c

sysimgguid=0x2c903000cc011

caguid=0x2c903000cc00e

Ca  1 "H-0002c903000cc00e"                # "node33 HCA-1"

[1](2c903000cc00f)        "S-0002c9020042bcc0"[30]              # lid 1 lmc 0 "MF0;switch-1140a2:IS5030/U1" lid 4 4xQDR

[root@node33 ~]#
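
The switch section of the topology lists one line per attached host, ending in the negotiated width and speed (4xQDR here). The sketch below condenses those lines to "switch port, host, link" so that slower links stand out. ibnet_host_links is a hypothetical helper, and it assumes the line layout shown above.

```shell
# Sketch: read ibnetdiscover output on stdin and print
# "<switch port> <host name> <width/speed>" for each host link
# listed under a switch. ibnet_host_links is a hypothetical helper.
ibnet_host_links() {
    awk '/^\[[0-9]+\][ \t]+"H-/ {
        port = $1
        gsub(/\[|\]/, "", port)       # "[30]" -> "30"
        split($0, q, "\"")            # q[4] is the quoted node description
        split(q[4], w, " ")           # w[1] is the host name
        print port, w[1], $NF         # last field is the link width/speed
    }'
}
```

Usage could look like: ibnetdiscover | ibnet_host_links | awk '$3 != "4xQDR"', which prints only the links that did not come up at 4xQDR.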

4.5  View error statistics

[root@node33 ~]# ibdiagnet -Pall=1

Loading IBDIAGNET from: /opt/ibutils/lib64/ibdiagnet1.5.7

-W- Topology file is not specified.

Reports regarding cluster links will use direct routes.

Loading IBDM from: /opt/ibutils/lib64/ibdm1.5.7

-I- Using port 1 as the local port.

-I- Discovering ... 3 nodes (1 Switches & 2 CA-s) discovered.

-I---------------------------------------------------

-I- Bad Guids/LIDs Info

-I---------------------------------------------------

-I- No bad Guids were found

-I---------------------------------------------------

-I- Links With Logical State = INIT

-I---------------------------------------------------

-I- No bad Links (with logical state = INIT) were found

-I---------------------------------------------------

-I- General Device Info

-I---------------------------------------------------

-I---------------------------------------------------

-I- PM Counters Info

-I---------------------------------------------------

-I- No illegal PM counters values were found

-I---------------------------------------------------

-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)

-I---------------------------------------------------

-I-  PKey:0x7fff Hosts:2 full:2 limited:0

-I---------------------------------------------------

-I- IPoIB Subnets Check

-I---------------------------------------------------

-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00

-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps

-I---------------------------------------------------

-I- Bad Links Info

-I- No bad link were found

-I---------------------------------------------------

----------------------------------------------------------------

-I- Stages Status Report:

STAGE                                    Errors Warnings

Bad GUIDs/LIDs Check                    0      0

Link State Active Check                0      0

General Devices Info Report            0      0

Performance Counters Report            0      0

Partitions Check                        0      0

IPoIB Subnets Check                    0      1

Please see /tmp/ibdiagnet.log for complete log

----------------------------------------------------------------

-I- Done. Run time was 1 seconds.

[root@node33 ~]#
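
Most of the ibdiagnet report consists of informational -I- lines, so the two -W- warnings are easy to miss. A small filter keeps only the lines that need attention. ibdiag_warnings is a hypothetical helper; the -W- prefix is taken from the output above, while the -E- prefix for errors is an assumption, since no error line appears in this run.

```shell
# Sketch: read ibdiagnet output on stdin and keep only warning lines
# (prefix -W-, seen above) and error lines (prefix -E-, assumed).
# ibdiag_warnings is a hypothetical helper.
ibdiag_warnings() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '^-(W|E)-' || true
}
```

Usage could look like: ibdiagnet -Pall=1 | ibdiag_warnings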

4.6  View detailed fabric-wide error information

[root@node33 ~]# ibqueryerrors

Errors for 0x2c9020042bcc0 "MF0;switch-1140a2:IS5030/U1"

GUID 0x2c9020042bcc0 port ALL: [PortRcvSwitchRelayErrors == 64] [PortXmitDiscards == 29] [PortXmitWait == 240663]

GUID 0x2c9020042bcc0 port 0: [PortXmitWait == 1232]

GUID 0x2c9020042bcc0 port 1: [PortRcvSwitchRelayErrors == 2] [PortXmitDiscards == 3]

GUID 0x2c9020042bcc0 port 2: [PortRcvSwitchRelayErrors == 3] [PortXmitDiscards == 3]

GUID 0x2c9020042bcc0 port 3: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 3]

GUID 0x2c9020042bcc0 port 4: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 1]

GUID 0x2c9020042bcc0 port 5: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]

GUID 0x2c9020042bcc0 port 6: [PortRcvSwitchRelayErrors == 2] [PortXmitDiscards == 3]

GUID 0x2c9020042bcc0 port 7: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]

GUID 0x2c9020042bcc0 port 8: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]

GUID 0x2c9020042bcc0 port 9: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]

GUID 0x2c9020042bcc0 port 10: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]

GUID 0x2c9020042bcc0 port 11: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]

GUID 0x2c9020042bcc0 port 12: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]

GUID 0x2c9020042bcc0 port 13: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 1]

GUID 0x2c9020042bcc0 port 14: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 1]

GUID 0x2c9020042bcc0 port 30: [PortXmitWait == 4294967295]

GUID 0x2c9020042bcc0 port 31: [PortRcvSwitchRelayErrors == 46] [PortXmitWait == 295]

GUID 0x2c9020042bcc0 port 34: [PortXmitWait == 892]

GUID 0x2c9020042bcc0 port 36: [PortXmitWait == 238245]

## Summary: 17 nodes checked, 1 bad nodes found

##          53 ports checked, 19 ports have errors beyond threshold

## Thresholds:

## Suppressed:

[root@node33 ~]#
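
In the output above, PortXmitWait on port 30 reads 4294967295, which is 2^32 - 1, the largest value a 32-bit counter can hold. That suggests the counter has saturated rather than recorded a true count. The sketch below lists every per-port counter at or above a limit. counters_at_or_above is a hypothetical helper, it assumes the "[Name == value]" layout shown above, and it skips the aggregated "port ALL" line.

```shell
# Sketch: read ibqueryerrors output on stdin and print "<port> <counter>"
# for every per-port counter whose value is at or above a limit.
# counters_at_or_above is a hypothetical helper; the default limit is
# 4294967295 (2^32 - 1), the maximum of a 32-bit counter.
counters_at_or_above() {
    limit=${1:-4294967295}
    awk -v limit="$limit" '
        / port [0-9]+:/ {
            port = $0
            sub(/.* port /, "", port); sub(/:.*/, "", port)
            rest = $0
            # Walk through each "[Name == value]" group on the line.
            while (match(rest, /\[[A-Za-z]+ *== *[0-9]+\]/)) {
                item = substr(rest, RSTART + 1, RLENGTH - 2)   # strip [ and ]
                rest = substr(rest, RSTART + RLENGTH)
                name = item;  sub(/ *==.*/, "", name)
                value = item; sub(/.*== */, "", value)
                if (value + 0 >= limit + 0) print port, name
            }
        }'
}
```

Usage could look like: ibqueryerrors | counters_at_or_above, or with a lower limit such as counters_at_or_above 100 to list the busier counters.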

