During a routine equipment check, one SFC card in a GSR 12406 was found to have been shut down. The system log showed the following:
Aug 11 19:42:42 GMT+8: %FABRIC-3-ERR_HANDLE: Due to CRC error from slot 16,shutdown the fabric card on slot 18
Aug 11 19:42:42 GMT+8: %MBUS-6-FABCONFIG: Switch Cards 0x1F (bitmask)
Primary Clock is CSC_0
Fabric Clock is Redundant
Bandwidth Mode : 10Gbps Bandwidth
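To determine whether the fault lay with the SFC card in slot 18 or the CSC card in slot 16, the SFC card in slot 18 was removed and reinserted, and the following information was captured for analysis:
1. Use the show controllers errors fabric and show controllers errors fabric counters commands to view the errors that were generated: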
gsr#show controllers errors fabric counters
LC/RP FIA Software Error Counters/Bitmaps:
SLOT 0 :
CellDrop (lane0..0) 0
         CRC    CRC    CRC    CRC    CRC    LOS    LOS    LOS    LOS    LOS
Counter  XBAR0  XBAR1  XBAR2  XBAR3  XBAR4  XBAR0  XBAR1  XBAR2  XBAR3  XBAR4
Lane0    0      0      1274   0      0      0      0      3      0      0
SLOT 1 :
CellDrop (lane0..0) 0
         CRC    CRC    CRC    CRC    CRC    LOS    LOS    LOS    LOS    LOS
Counter  XBAR0  XBAR1  XBAR2  XBAR3  XBAR4  XBAR0  XBAR1  XBAR2  XBAR3  XBAR4
Lane0    0      0      1275   0      0      0      0      4      0      0
SLOT 2 :
CellDrop (lane0..0) 0
         CRC    CRC    CRC    CRC    CRC    LOS    LOS    LOS    LOS    LOS
Counter  XBAR0  XBAR1  XBAR2  XBAR3  XBAR4  XBAR0  XBAR1  XBAR2  XBAR3  XBAR4
Lane0    0      0      1275   0      0      0      0      4      0      0
SLOT 4 :
CellDrop (lane0..0) 0
         CRC    CRC    CRC    CRC    CRC    LOS    LOS    LOS    LOS    LOS
Counter  XBAR0  XBAR1  XBAR2  XBAR3  XBAR4  XBAR0  XBAR1  XBAR2  XBAR3  XBAR4
Lane0    0      0      0      0      0      0      0      0      0      0
gsr#show controllers errors fabric
       SCA192  SCA192  SCA192  SCA192  XBAR192  XBAR192  CSCFPGA  CSCFPGA  CLKFPGA
       LC_ENA  BP_FRC  LC_TYP  DE_GNT  DAT_LOS  SEL_IDL  LP_BAK   LC_PRE   CLKSTS
SLOT0  OK      OK      OK      OK      OK       00100    00100    OK       OK
SLOT1  OK      OK      OK      OK      OK       OK       OK       00100    OK
SLOT2  OK      OK      OK      OK      OK       OK       OK       OK       OK
SLOT4  OK      OK      OK      OK      OK       OK       OK       OK       OK
Fabric error handling : enabled
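The output above shows that the CRC errors are confined to XBAR2: slots 0, 1 and 2 each report roughly 1275 CRC errors and a few LOS events on XBAR2 and nothing on the other crossbars, while slot 4 reports none at all. If XBAR0 to XBAR4 follow the same order as the switch cards listed by show control fia (csc0, csc1, sfc0, sfc1, sfc2), XBAR2 would correspond to sfc0 in slot 18.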
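Reading these counters by eye gets tedious on a chassis with many line cards. The following is a minimal Python sketch of how the captured output could be scanned for crossbars with non-zero CRC counts. The function names parse_fabric_counters and suspect_xbars are made up for this illustration, and the parser assumes the exact layout shown above (a single Lane0 row per slot); it is not a Cisco tool.

```python
import re

# Two slots copied from the captured output above, used as sample input.
SAMPLE = """\
SLOT 0 :
CellDrop (lane0..0) 0
CRC CRC CRC CRC CRC LOS LOS LOS LOS LOS
Counter XBAR0 XBAR1 XBAR2 XBAR3 XBAR4 XBAR0 XBAR1 XBAR2 XBAR3 XBAR4
Lane0 0 0 1274 0 0 0 0 3 0 0
SLOT 4 :
CellDrop (lane0..0) 0
CRC CRC CRC CRC CRC LOS LOS LOS LOS LOS
Counter XBAR0 XBAR1 XBAR2 XBAR3 XBAR4 XBAR0 XBAR1 XBAR2 XBAR3 XBAR4
Lane0 0 0 0 0 0 0 0 0 0 0
"""


def parse_fabric_counters(text):
    """Return {slot: {"CRC": {xbar: count}, "LOS": {xbar: count}}}.

    Hypothetical helper: assumes one Lane0 row per slot, as in the sample.
    A second lane row would overwrite the first and need extra handling.
    """
    result = {}
    slot = None
    kinds = names = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        slot_match = re.match(r"SLOT\s+(\d+)\s*:", line.strip())
        if slot_match:
            slot = int(slot_match.group(1))
            result[slot] = {}
        elif fields[0] in ("CRC", "LOS"):
            kinds = fields            # first header row: counter type per column
        elif fields[0] == "Counter":
            names = fields[1:]        # second header row: crossbar per column
        elif fields[0].startswith("Lane") and slot is not None and kinds and names:
            values = [int(v) for v in fields[1:]]
            for kind, name, value in zip(kinds, names, values):
                result[slot].setdefault(kind, {})[name] = value
    return result


def suspect_xbars(counters):
    """Return the sorted crossbar names with a non-zero CRC count on any slot."""
    return sorted({
        name
        for per_slot in counters.values()
        for name, count in per_slot.get("CRC", {}).items()
        if count > 0
    })


counters = parse_fabric_counters(SAMPLE)
print(suspect_xbars(counters))  # ['XBAR2']
```

Splitting on whitespace keeps the parser indifferent to column alignment, so it works whether or not the terminal capture preserved the original spacing.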
2. Use execute-on all show control fia to check the state of the fabric as seen from each line card:
gsr#execute-on all show control fia
========= Line Card (Slot 1) =========
From Fabric FIA Errors
-----------------------
redund overflow 0 cell drops 0
cell parity 0
Switch cards present 0x001B Slots 16 17 19 20
Switch cards monitored 0x001B Slots 16 17 19 20
Slot:     16        17        18        19        20
Name:     csc0      csc1      sfc0      sfc1      sfc2
          --------  --------  --------  --------  --------
los       0         0         0         0         0
state     Off       Off       Off       Off       Off
crc16     0         0         0         0         0
To Fabric FIA Errors
-----------------------
sca not pres 0 req error 0 uni fifo overflow 0
grant parity 0 multi req 0 uni fifo undrflow 0
cntrl parity 0 uni req 0
multi fifo 0 empty dst req 0 handshake error 0
cell parity 0
========= Line Card (Slot 2) =========
From Fabric FIA Errors
-----------------------
redund overflow 0 cell drops 0
cell parity 0
Switch cards present 0x001B Slots 16 17 19 20
Switch cards monitored 0x001B Slots 16 17 19 20
Slot:     16        17        18        19        20
Name:     csc0      csc1      sfc0      sfc1      sfc2
          --------  --------  --------  --------  --------
los       0         0         0         0         0
state     Off       Off       Off       Off       Off
crc16     0         0         0         0         0
To Fabric FIA Errors
-----------------------
sca not pres 1 req error 0 uni fifo overflow 0
grant parity 0 multi req 0 uni fifo undrflow 0
cntrl parity 0 uni req 0
multi fifo 0 empty dst req 0 handshake error 0
cell parity 0
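The output above shows that slot 18 is not present: the "present" and "monitored" masks (0x001B) list only slots 16, 17, 19 and 20. The show controllers clock command shows the same thing, with no primary clock reported for slot 18: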
gsr#sh controllers clock
Switch Card Configured 0x1F (bitmask), Primary Clock for system is CSC_0
System Fabric Clock is Redundant
Slot #  Primary Clock  Mode
0       CSC_0          Redundant
1       CSC_0          Redundant
2       CSC_0          Redundant
4       CSC_0          Redundant
16      CSC_0          Redundant
17      CSC_0          Redundant
18      None
19      CSC_0          Redundant
20      CSC_0          Redundant
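The switch card bitmasks can be decoded mechanically. The short Python sketch below compares the configured mask (0x1F) with the present mask (0x001B) from the show control fia output. The bit order (bit 0 = slot 16, up to bit 4 = slot 20) is an assumption, although it agrees with the "Slots 16 17 19 20" text that the router prints next to 0x001B. The names FABRIC_SLOTS and slots_from_mask are made up for this example.

```python
# Assumed bit order: bit 0 = slot 16 (csc0) ... bit 4 = slot 20 (sfc2).
FABRIC_SLOTS = (16, 17, 18, 19, 20)


def slots_from_mask(mask, slots=FABRIC_SLOTS):
    """Return the slots whose bit is set in the switch card bitmask."""
    return [slot for bit, slot in enumerate(slots) if mask & (1 << bit)]


configured = slots_from_mask(0x1F)    # mask from "Switch Card Configured 0x1F"
present = slots_from_mask(0x001B)     # mask from "Switch cards present 0x001B"
missing = [slot for slot in configured if slot not in present]

print(present)  # [16, 17, 19, 20]
print(missing)  # [18]
```

The only configured card that the line cards do not see as present is the one in slot 18, which matches the clock table above.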
3. Check the system log:
gsr#sh log
Aug 12 08:52:19 GMT+8: %MBUS-6-OIR: Switch Fabric Card(6) OC-192 Removed from Slot 18
Aug 12 08:52:19 GMT+8: %MBUS-6-OIR: Switch Fabric Card(6) OC-192 Inserted into Slot 18
Aug 12 08:52:26 GMT+8: %MBUS-6-FABANALYZED: Switch card in slot 18 analyzed
Aug 12 08:52:26 GMT+8: %MBUS-6-FABCONFIG: Switch Cards 0x1F (bitmask)
Primary Clock is CSC_0
Fabric Clock is Redundant
Bandwidth Mode : 10Gbps Bandwidth
Aug 12 08:52:36 GMT+8: %FIA-3-LOS: LOS for slot 18 was detected.
SLOT 1:Aug 12 08:52:36 GMT+8: %FIA-3-LOS: LOS for slot 18 was detected.
SLOT 2:Aug 12 08:52:36 GMT+8: %FIA-3-LOS: LOS for slot 18 was detected.
Aug 12 08:52:38 GMT+8: %FABRIC-3-ERR_HANDLE: Due to CRC error from slot 2,shutdown the fabric card on slot 18
Aug 12 08:52:38 GMT+8: %MBUS-6-FABCONFIG: Switch Cards 0x1F (bitmask)
Primary Clock is CSC_0
Fabric Clock is Redundant
Bandwidth Mode : 10Gbps Bandwidth
Aug 12 08:52:42 GMT+8: %FIA-3-LOS: LOS for slot 18 was cleared.
SLOT 1:Aug 12 08:52:42 GMT+8: %FIA-3-LOS: LOS for slot 18 was cleared.
SLOT 2:Aug 12 08:52:42 GMT+8: %FIA-3-LOS: LOS for slot 18 was cleared.
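The log shows that as soon as the SFC card is inserted into slot 18, CRC errors appear and the card is shut down again. The slot reporting the CRC error differs between the two events (slot 16 the first time, slot 2 after the reinsertion), but the card being shut down is always the one in slot 18, so the preliminary conclusion is that the slot 18 SFC card itself is faulty. Fortunately the device was covered by warranty, so a case was opened with Cisco to RMA the card.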
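The same comparison can be made over a longer log by counting which slot reports the CRC error and which card is shut down in each %FABRIC-3-ERR_HANDLE message. The Python sketch below does this for the two messages quoted in this note. The regular expression is written against the message text exactly as it appears here, and summarize_err_handle is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern based on the message text as captured in this note.
ERR_HANDLE = re.compile(
    r"%FABRIC-3-ERR_HANDLE: Due to CRC error from slot\s+(\d+),\s*"
    r"shutdown the fabric card on slot\s+(\d+)"
)

LOG = """\
Aug 11 19:42:42 GMT+8: %FABRIC-3-ERR_HANDLE: Due to CRC error from slot 16,shutdown the fabric card on slot 18
Aug 12 08:52:38 GMT+8: %FABRIC-3-ERR_HANDLE: Due to CRC error from slot 2,shutdown the fabric card on slot 18
"""


def summarize_err_handle(log_text):
    """Count reporting slots and shut-down slots across ERR_HANDLE messages."""
    reporters = Counter()
    shut_down = Counter()
    for match in ERR_HANDLE.finditer(log_text):
        reporters[int(match.group(1))] += 1
        shut_down[int(match.group(2))] += 1
    return reporters, shut_down


reporters, shut_down = summarize_err_handle(LOG)
print(dict(reporters))  # {16: 1, 2: 1}
print(dict(shut_down))  # {18: 2}
```

A card that is shut down repeatedly while the reporting slot changes points at the card rather than at any single reporter, which is the reasoning applied above.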