What problem are you running into? Following Cisco's documentation shouldn't cause any major issues. Our SCE is no longer in use.
@小庄
@路过
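Sorry, my mistake: the device model is uBR7225.
re: Cisco 7609 power supply fan failure 梯玛 2009-06-18 13:18
Yesterday I swapped the alarming power supply with one from another 7609 to see whether the fault lies in the power supply or in the power supply slot. I also checked the power distribution carefully and found that one blade of the power plug was making poor contact (very shoddy workmanship), so I had to make a new plug. The suspect power supply raised the same alarms in the other 7609, as follows: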
Jun 18 09:29:01: %C6KPWR-SP-2-PSFAIL: power supply 2 output failed.
Jun 18 09:29:01: %C6KPWR-SP-4-PSREDUNDANTONESUPPLY: in power-redundancy mode, system is operating on one power supply.
Jun 18 09:29:03: %C6KPWR-SP-4-PSOK: power supply 2 turned on.
Jun 18 09:29:03: %C6KPWR-SP-4-PSREDUNDANTBOTHSUPPLY: in power-redundancy mode, system is operating on both power supplies.
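So it seems this power supply is damaged. The power supply swapped in from the other chassis has raised no alarms so far, so I need to keep monitoring it.
Based on the above, I suspect the bad plug caused an unstable power feed, which in turn damaged the power supply. A few more days of observation should basically settle it.
re: Cisco 7609 power supply fan failure 梯玛 2009-06-17 08:15
After watching it for the past few days, the same alarm appeared again at the following times: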
Jun 16 22:17:28
Jun 16 22:59:12
Jun 17 00:32:50
Jun 17 00:54:09
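Before the reboot, the SUP720 in slot 6 of the 6509 was active; after the reboot, the SUP720 in slot 5 is active. No abnormal messages appeared during the reboot. Here is the information related to slot 6: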
Jun 15 19:52:11: %FABRIC-SP-5-CLEAR_BLOCK: Clear block option is off for the fabric in slot 6.
Jun 15 19:52:12: %FABRIC-SP-5-FABRIC_MODULE_BACKUP: The Switch Fabric Module in slot 6 became standby
Jun 15 19:52:13: %DIAG-SP-6-RUN_MINIMUM: Module 6: Running Minimal Diagnostics...
Jun 15 19:52:15: %DIAG-SP-6-DIAG_OK: Module 6: Passed Online Diagnostics
Jun 15 19:52:16: %OIR-SP-6-INSCARD: Card inserted in slot 6, interfaces are now online
00:01:59: %PFREDUN-SP-STDBY-6-STANDBY: Initializing for SSO mode
00:02:00: %SYS-SP-STDBY-3-LOGGER_FLUSHED: System was paused for 00:00:00 to ensure console debugging output.
00:02:19: SP-STDBY: SP: Currently running ROMMON from S (Gold) region
00:02:21: %DIAG-SP-STDBY-6-RUN_MINIMUM: Module 6: Running Minimal Diagnostics...
00:02:42: %DIAG-SP-STDBY-6-DIAG_OK: Module 6: Passed Online Diagnostics
00:03:03: %SYS-SP-STDBY-5-RESTART: System restarted --
00:03:03: %PFREDUN-SP-STDBY-6-STANDBY: Ready for SSO mode
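The reboot messages above show that the supervisor in slot 6 is working normally, but today the following two messages appeared. I don't know what they mean and couldn't find an explanation on Cisco's website: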
Jun 16 06:49:58: %DIAG-SP-3-MONITOR_INTERVAL_ZERO: Module 6: Monitoring interval is 0. Cannot enable monitoring for Test #28
Jun 16 06:49:58: %DIAG-SP-STDBY-3-MONITOR_INTERVAL_ZERO: Module 6: Monitoring interval is 0. Cannot enable monitoring for Test #28
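The Year of the Ox has not been kind; I've run into a lot of problems this year. Unfortunately, at around 6 p.m. yesterday the 6509's CPU utilization hit 99% and the box went down completely. Below are some log messages from various times, extracted from show tech: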
Jun 15 14:50:24: %SYS-2-MALLOCFAIL: Memory allocation of 524288 bytes failed from 0x404A4EB0, alignment 8
Pool: Processor Free: 7274168 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "Virtual Exec", ipl= 0, pid= 127
-Traceback= 41024F9C 41029278 404A4EB8 4046B3DC 4046B4A0 4045FE34 41057244 4046E294 4103EEB4 4103EEA0
Jun 15 15:28:42: %SYS-2-MALLOCFAIL: Memory allocation of 20000 bytes failed from 0x4103D64C, alignment 8
Pool: Processor Free: 7253720 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "CEF Scanner", ipl= 0, pid= 120
-Traceback= 41024F9C 4102ABCC 4103D654 413CE04C 413CE11C 40629394 4103EEB4 4103EEA0
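Every log message after that was SYS-2-MALLOCFAIL, and in each case it was CEF Scanner failing to allocate memory from 0x4103D64C, with about 7.2 MB of free memory. The lines around 0x4103D64C in the show mem sum output are: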
0x41024B30 0000065536 0000000013 0000851968 ACE command context chunk pool
0x4103D64C 0001420224 0000000001 0001420224 (coalesced) (Free Blocks)
0x4103E330 0000000024 0000000034 0000000816 *Sched*
0x4103E330 0000000032 0000000010 0000000320 *Sched*
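The MALLOCFAIL entries above report a small request failing while several megabytes are still free, which is what the "Memory fragmentation" cause points to: no single contiguous block is large enough. A minimal Python sketch, assuming the log format shown above, that pulls out the requested size and the free size for comparison (the function name and record fields are made up for illustration):

```python
import re

# Lines copied from the log above; in practice feed the whole show tech capture.
LOG = """\
Jun 15 14:50:24: %SYS-2-MALLOCFAIL: Memory allocation of 524288 bytes failed from 0x404A4EB0, alignment 8
Pool: Processor Free: 7274168 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
Jun 15 15:28:42: %SYS-2-MALLOCFAIL: Memory allocation of 20000 bytes failed from 0x4103D64C, alignment 8
Pool: Processor Free: 7253720 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
"""

ALLOC = re.compile(r"Memory allocation of (\d+) bytes failed from (0x[0-9A-Fa-f]+)")
POOL = re.compile(r"Pool: (\w+) Free: (\d+) Cause: (.+)")


def parse_mallocfail(text):
    """Return one record per MALLOCFAIL entry with requested and free bytes."""
    records = []
    current = None
    for line in text.splitlines():
        alloc = ALLOC.search(line)
        if alloc:
            current = {"requested": int(alloc.group(1)), "caller": alloc.group(2)}
            records.append(current)
            continue
        # match() anchors at the start, so "Alternate Pool: ..." lines are skipped.
        pool = POOL.match(line.strip())
        if pool and current is not None and "free" not in current:
            current["pool"] = pool.group(1)
            current["free"] = int(pool.group(2))
            current["cause"] = pool.group(3).strip()
    return records


records = parse_mallocfail(LOG)
for rec in records:
    print(rec["caller"], rec["requested"], rec["free"], rec["cause"])
```

In both entries the free figure is far larger than the request, so total free memory alone does not explain the failure.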
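In this state, at around 18:00 the system disabled DCEF, CEF, RPF, and hardware FIB forwarding, leaving only software forwarding. From then on OSPF went down and could not form neighbor adjacencies, the device stopped forwarding traffic, and the network was down.
The show proc cpu command showed CPU utilization as high as 99%, with IP Input as the main consumer. Excerpt: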
sh processes cpu | exclude 0.00
CPU utilization for five seconds: 99%/64%; one minute: 99%; five minutes: 99%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
9 181188304 1096405161 165 1.59% 2.95% 2.58% 0 ARP Input
85 96940 12405 7814 0.23% 9.92% 7.57% 0 Exec
98 56484 1141141653 0 0.07% 0.03% 0.01% 0 ACE Tunnel Task
123 435855528 18501183 23558 32.71% 23.95% 25.82% 0 IP Input
179 5321104 95883114 55 0.07% 0.07% 0.07% 0 OSPF Hello
319 2352256 47712978 49 0.15% 0.06% 0.06% 0 OSPF Router 2385
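In the header line of this output, the pair `99%/64%` is total CPU over the portion spent at interrupt level, so the remainder is what the listed processes account for. The sketch below, in Python, splits that pair and also checks the uSecs column, which appears to be runtime divided by invocations (this matches the IP Input and Exec rows above). The function names are illustrative, not part of any Cisco tool.

```python
import re

HEADER = "CPU utilization for five seconds: 99%/64%; one minute: 99%; five minutes: 99%"


def split_cpu(header):
    """Split the five-second 'total%/interrupt%' pair into its parts."""
    m = re.search(r"five seconds: (\d+)%/(\d+)%", header)
    if m is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt = int(m.group(1)), int(m.group(2))
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


def usecs_per_invocation(runtime_ms, invoked):
    """Average CPU microseconds per invocation, truncated like the uSecs column."""
    return runtime_ms * 1000 // invoked


print(split_cpu(HEADER))
print(usecs_per_invocation(435855528, 18501183))  # IP Input row
print(usecs_per_invocation(96940, 12405))         # Exec row
```

Here roughly two thirds of the load is at interrupt level, so the per-process percentages in the table only cover the remaining share.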
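I opened a TOP1 case with Cisco, captured the requested information over the console as Cisco instructed, and then rebooted the 6509, which resolved the problem. As of this morning Cisco TAC still had not identified the cause or a fix, so all we can do is keep monitoring.
re: Cisco 7609 power supply fan failure 梯玛 2009-06-15 08:31
Cisco RMA'd the power supply. After the replacement it ran normally for a while, but now the problem is back. The log shows: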
Jun 13 08:52:38: %C6KPWR-SP-4-PSFAIL: power supply 2 output failed.
Jun 13 08:52:38: %C6KPWR-SP-4-PSREDUNDANTONESUPPLY: in power-redundancy mode, system is operating on one power supply.
Jun 13 08:52:40: %C6KPWR-SP-4-PSOK: power supply 2 turned on.
Jun 13 08:52:40: %C6KPWR-SP-4-PSREDUNDANTBOTHSUPPLY: in power-redundancy mode, system is operating on both power supplies.
Jun 14 16:22:07: %C6KPWR-SP-4-PSFAIL: power supply 2 output failed.
Jun 14 16:22:07: %C6KPWR-SP-4-PSREDUNDANTONESUPPLY: in power-redundancy mode, system is operating on one power supply.
Jun 14 16:22:08: %C6KPWR-SP-4-PSOK: power supply 2 turned on.
Jun 14 16:22:08: %C6KPWR-SP-4-PSREDUNDANTBOTHSUPPLY: in power-redundancy mode, system is operating on both power supplies.
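On Jun 13 the alarms kept coming from morning to night, whereas on Jun 14 there was only one. So the power supply itself may not be at fault; the power supply slot may be the problem. Another possibility is that this power feed is faulty.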
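To compare days like this without reading the log by eye, the failures can be counted per day along with how long each outage lasted. A minimal Python sketch, assuming the syslog format shown above; the year is not in the log lines, so 2009 is assumed from the post date, and the function name is made up for illustration:

```python
import re
from collections import Counter
from datetime import datetime

# Lines copied from the excerpt above; feed the full syslog in practice.
LOG = """\
Jun 13 08:52:38: %C6KPWR-SP-4-PSFAIL: power supply 2 output failed.
Jun 13 08:52:40: %C6KPWR-SP-4-PSOK: power supply 2 turned on.
Jun 14 16:22:07: %C6KPWR-SP-4-PSFAIL: power supply 2 output failed.
Jun 14 16:22:08: %C6KPWR-SP-4-PSOK: power supply 2 turned on.
"""

PATTERN = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d): %C6KPWR-SP-\d-(PSFAIL|PSOK): power supply (\d+)"
)


def summarize_flaps(text, year=2009):
    """Return (failures per day, recovery times in seconds)."""
    per_day = Counter()
    recoveries = []
    failed_at = {}  # power supply number -> time of the pending PSFAIL
    for line in text.splitlines():
        m = PATTERN.match(line)
        if not m:
            continue
        stamp, event, psu = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if event == "PSFAIL":
            per_day[when.strftime("%b %d")] += 1
            failed_at[psu] = when
        elif psu in failed_at:
            recoveries.append((when - failed_at.pop(psu)).total_seconds())
    return per_day, recoveries


per_day, recoveries = summarize_flaps(LOG)
print(dict(per_day))
print(recoveries)
```

On the excerpt alone each day shows one failure with recovery in a second or two; the morning-to-night pattern on Jun 13 would only show up when the full log is fed in.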
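This old device runs CatOS. To pin down the problem, I did the following:
Step 1: Pull out the supervisor in Slot 2 and test the first supervisor on its own. After power-on the system behaved as before, rebooting over and over with the same messages.
Step 2: Insert the second supervisor into Slot 1 and repeat. Same result, so I suspected a problem with Slot 1, possibly poor contact.
Step 3: Insert the second supervisor into Slot 2 and test the same way. It booted and ran normally.
Step 4: Insert the first supervisor into Slot 2 and test the same way. It also booted and ran normally.
After these four tests, my initial suspicion was a problem with Slot 1.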
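The elimination logic behind the four steps can be written down as a small Python sketch: a component is a suspect only if it took part in a failed test and never took part in a passing one. The labels "supervisor 1" and "supervisor 2" are just names for the two cards, and step 1 is assumed to have been run with the first supervisor in Slot 1.

```python
# (supervisor, slot, booted normally?) for steps 1 to 4 above
TESTS = [
    ("supervisor 1", "slot 1", False),
    ("supervisor 2", "slot 1", False),
    ("supervisor 2", "slot 2", True),
    ("supervisor 1", "slot 2", True),
]


def suspects(results):
    """Components seen in a failed test and never seen in a passing one."""
    good = set()
    for supervisor, slot, ok in results:
        if ok:
            good.update((supervisor, slot))
    bad = set()
    for supervisor, slot, ok in results:
        if not ok:
            bad.update(c for c in (supervisor, slot) if c not in good)
    return sorted(bad)


print(suspects(TESTS))
```

This only narrows things down under the assumption that each test has a single cause; an intermittent fault such as a badly seated card would not be caught by it.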
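Since this Catalyst 6509 runs CatOS, which I'm not very familiar with and find awkward to configure and manage, I decided to upgrade it to Native IOS. After spending some time upgrading both supervisors to IOS, I inserted both of them just to see whether they would boot. Unexpectedly, both supervisors booted normally, with no reboot loop. I then tested hot standby between the two supervisors several times. It worked, except that the standby supervisor took quite a while, several minutes, to fully reach the hot standby state.
After all this I'm confused and don't know what the actual cause was. Maybe a card wasn't fully seated? Maybe!
No, it won't. This operation only cleans some useless information out of the database.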
This command causes the database to be unloaded and reloaded to clear up the counters.
See the Troubleshooting Guide on Cisco's website:
http://www.cisco.com/en/US/docs/net_mgmt/cisco_secure_access_control_server_for_windows/4.2/trouble/guide/Ch1.html#wp1041539
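Later I found that the server doing NAT was probably misconfigured: the NAT server had no route for the address pool used for translation, so packets were being forwarded back and forth. After I created a virtual interface, CPU utilization dropped considerably. It still seems a bit high, though: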
3750G-S1#sh processes cpu sorted
CPU utilization for five seconds: 12%/3%; one minute: 13%; five minutes: 13%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
9 29701858 106576710 278 3.35% 4.49% 4.76% 0 ARP Input
191 67887 590569 114 0.31% 0.02% 0.00% 0 IP Input
213 18 31 580 0.15% 0.01% 0.00% 0 TCP Protocols
3 0 1 0 0.00% 0.00% 0.00% 0 CEF RP IPC Backg
2 25 65940 0 0.00% 0.00% 0.00% 0 Load Meter
4 426524 39121 10902 0.00% 0.08% 0.10% 0 Check heaps
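Cisco TAC's diagnosis didn't find anything wrong either; running no shutdown again brought it back to normal.
My guess is that I forgot to save the configuration after running no shutdown, so the setting reverted when the system rebooted unexpectedly.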
Silly me.
Checking the SCE status showed that Linecard 0 was in the shutdown state:
show interface LineCard 0
The application assigned to slot 0 is /tffs0/app/eng31679.sli
Silent is off
Configured shutdown is on
Shutdown due to sm-connection-failure is off
Resulting current shutdown state is on
WAP handling is disabled
After enabling it with no shutdown, the RDR rate became non-zero, but before long it dropped back to zero. The log shows:
2008-11-13 16:33:21 | INFO | CPU #000 | Starting Line Card on slot 0 state change to no shutdown
2008-11-13 16:33:21 | INFO | CPU #000 | Linecard on slot 0 is enabled
2008-11-13 16:43:49 | FATAL | CPU #000 | SE Self Sanity Checks Module: A Fatal Error occurred. Please report to Cisco's customer support
2008-11-13 16:43:50 | ERROR | CPU #000 | SE Watchdog Module: An Error occurred. Please report to Cisco's customer support
2008-11-13 16:43:50 | ERROR | CPU #000 | SE Watchdog Module: An Error occurred. Please report to Cisco's customer support
2008-11-13 16:43:51 | WARN | CPU #000 | SE Watchdog Module: A problem occurred. Please report to Cisco's customer support
2008-11-13 16:43:51 | INFO | CPU #000 | Party data base was closed.
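How long the linecard stays up after no shutdown is worth tracking across attempts. A minimal Python sketch, assuming the pipe-separated log format shown above, that measures the time from the "is enabled" message to the first FATAL entry (the function names are illustrative only):

```python
from datetime import datetime

# Two lines copied from the log above.
LOG = """\
2008-11-13 16:33:21 | INFO | CPU #000 | Linecard on slot 0 is enabled
2008-11-13 16:43:49 | FATAL | CPU #000 | SE Self Sanity Checks Module: A Fatal Error occurred. Please report to Cisco's customer support
"""


def parse_line(line):
    """Split 'timestamp | level | cpu | message' into typed fields."""
    stamp, level, _cpu, message = [part.strip() for part in line.split("|", 3)]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), level, message


def time_to_first_fatal(text):
    """Time from the most recent 'is enabled' entry to the next FATAL entry."""
    enabled_at = None
    for line in text.splitlines():
        if line.count("|") < 3:
            continue  # not a log entry in the expected format
        when, level, message = parse_line(line)
        if "is enabled" in message:
            enabled_at = when
        elif level == "FATAL" and enabled_at is not None:
            return when - enabled_at
    return None


print(time_to_first_fatal(LOG))
```

For the excerpt above this comes to a little over ten minutes between enabling the linecard and the fatal error.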
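I really have no idea what the cause is, so I'll have to open a case with Cisco.
Just a week later it failed again, this time for good. Rebooting didn't help, so I simply replaced it with an ME3750 to resolve the problem.
It turns out that after installing the SCE Collection Manager, you need to re-apply the policy through the SCA BB Console (Apply Service Configuration to SCE Devices). Re-applying the policy triggers "Updating CM", which synchronizes the CM with the SCE, and after that the SCE IP address appears in Templates. Here is the log from applying the configuration: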
+ 10/6/08 10:31:59 AM CST | INFO | Connecting to SCE ×.×.×.× ...
+ 10/6/08 10:32:00 AM CST | INFO | Validating password ...
+ 10/6/08 10:32:00 AM CST | INFO | Password is valid
+ 10/6/08 10:32:00 AM CST | INFO | Retrieving service configuration ...
+ 10/6/08 10:32:03 AM CST | INFO | Opening service configuration editor ...
+ 10/6/08 10:33:45 AM CST | INFO | Applying service configuration to SCE "SCE device"
+ 10/6/08 10:33:45 AM CST | INFO | Connecting to element
+ 10/6/08 10:33:45 AM CST | INFO | Connecting to device at ×.×.×.× ...
+ 10/6/08 10:33:45 AM CST | INFO | Reading SCE platform data ...
+ 10/6/08 10:33:50 AM CST | INFO | The script's SCASBB application compatibility has been set to version 3.1.6
+ 10/6/08 10:33:50 AM CST | INFO | Preparing configuration script for SCE2000 - 4xGBE ...
+ 10/6/08 10:33:53 AM CST | INFO | Sending configuration to SCE ...
+ 10/6/08 10:33:55 AM CST | INFO | Executing configuration script on SCE ...
+ 10/6/08 10:34:24 AM CST | INFO | The script's SCASBB application compatibility has been set to version 3.1.6
+ 10/6/08 10:34:29 AM CST | INFO | Updating configuration registry ...
+ 10/6/08 10:34:45 AM CST | INFO | Updating CM at ×.×.×.× with service configuration values ...
+ 10/6/08 10:34:48 AM CST | INFO | Updating the CM at ×.×.×.× completed.
+ 10/6/08 10:34:48 AM CST | INFO | Apply operation completed successfully