Catalyst6509 + SUP720: Handling a Memory Leak Caused by CEF

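Yesterday, while troubleshooting a fault, I found that the Catalyst6509 could not save its configuration. Running copy run start produced the following messages: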
Jun 15 08:26:45: %SYS-2-MALLOCFAIL: Memory allocation of 1964024 bytes failed from 0x404A4824, alignment 8
Pool: Processor Free: 7701984 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "Virtual Exec", ipl= 0, pid= 127
-Traceback= 41024F9C 41029278 404A482C 404A47E0 4046B540 41889834 4073C96C 4045FE34 41057244 4046E294 4103EEB4 4103EEA0

I looked this up on the Cisco website (details at http://www.cisco.com/en/US/products/sw/iosswrel/ps1831/products_tech_note09186a00800a6f3a.shtml), which says that memory has been exhausted. Memory usage is as follows:
sh processes memory
Processor Pool Total: 391041424 Used: 383379816 Free: 7661608
I/O Pool Total: 67108864 Used: 10418768 Free: 56690096
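The free memory here does roughly match the figure reported when saving the configuration. However, the recommended troubleshooting steps cannot be carried out, because the device is in production and the network cannot be taken down for testing. I don't know whether there is a way to satisfy both constraints.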
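
To put the pool figures above in proportion, here is a minimal Python sketch that parses the two pool summary lines and computes how full each pool is. The function name pool_usage and the regular expression are illustrative assumptions, not part of any Cisco tool; the input is just the two lines pasted from the output above.

```python
import re

# Pool summary lines copied from the "sh processes memory" output above.
OUTPUT = """\
Processor Pool Total: 391041424 Used: 383379816 Free: 7661608
I/O Pool Total: 67108864 Used: 10418768 Free: 56690096
"""

# Hypothetical pattern for this one line layout; other IOS releases
# may format the summary differently.
POOL_RE = re.compile(
    r"^(?P<pool>.+?) Pool Total:\s*(?P<total>\d+)"
    r"\s+Used:\s*(?P<used>\d+)\s+Free:\s*(?P<free>\d+)",
    re.MULTILINE,
)


def pool_usage(text):
    """Return {pool name: (total, used, free, percent used)}."""
    result = {}
    for m in POOL_RE.finditer(text):
        total, used, free = (int(m.group(k)) for k in ("total", "used", "free"))
        result[m.group("pool")] = (total, used, free, 100.0 * used / total)
    return result


if __name__ == "__main__":
    for name, (total, used, free, pct) in pool_usage(OUTPUT).items():
        print(f"{name}: {pct:.1f}% used, {free / 2**20:.1f} MiB free")
```

Run against the figures above, this shows the Processor pool almost entirely consumed while the I/O pool is mostly free, so the shortage is confined to the Processor pool.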

posted on 2009-06-15 08:52 by 梯玛, Category: Network Knowledge

Comments

# re: Catalyst6509 + SUP720 unable to save configuration 2009-06-16 12:27 梯玛

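The Year of the Ox has not been kind: I have run into an unusually large number of problems this year. Unfortunately, at around 6 p.m. yesterday the 6509's CPU utilization reached 99% and the switch went down completely. Below are some of the log messages the system generated at various times, excerpted from sh tech: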
Jun 15 14:50:24: %SYS-2-MALLOCFAIL: Memory allocation of 524288 bytes failed from 0x404A4EB0, alignment 8
Pool: Processor Free: 7274168 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool

-Process= "Virtual Exec", ipl= 0, pid= 127
-Traceback= 41024F9C 41029278 404A4EB8 4046B3DC 4046B4A0 4045FE34 41057244 4046E294 4103EEB4 4103EEA0

Jun 15 15:28:42: %SYS-2-MALLOCFAIL: Memory allocation of 20000 bytes failed from 0x4103D64C, alignment 8
Pool: Processor Free: 7253720 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool

-Process= "CEF Scanner", ipl= 0, pid= 120
-Traceback= 41024F9C 4102ABCC 4103D654 413CE04C 413CE11C 40629394 4103EEB4 4103EEA0

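Every log message after that was SYS-2-MALLOCFAIL, and in each case it was the CEF Scanner process failing a memory allocation from 0x4103D64C, with about 7.2 MB of free memory. The lines around 0x4103D64C in the show mem sum output are as follows: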
0x41024B30 0000065536 0000000013 0000851968 ACE command context chunk pool
0x4103D64C 0001420224 0000000001 0001420224 (coalesced) (Free Blocks)
0x4103E330 0000000024 0000000034 0000000816 *Sched*
0x4103E330 0000000032 0000000010 0000000320 *Sched*
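
One thing the MALLOCFAIL messages quoted so far have in common is that the requested size is always well below the reported free memory, which fits the "Memory fragmentation" cause printed by the switch: free memory exists, but apparently not as one contiguous block of the requested size. The following Python sketch extracts the relevant fields from those messages to make the comparison explicit. The function name parse_mallocfail, the field names, and the regular expressions are illustrative assumptions; the log text is trimmed to the lines the parser needs.

```python
import re

# MALLOCFAIL entries copied from the logs above (Alternate Pool and
# Traceback lines omitted for brevity).
LOG = """\
Jun 15 08:26:45: %SYS-2-MALLOCFAIL: Memory allocation of 1964024 bytes failed from 0x404A4824, alignment 8
Pool: Processor Free: 7701984 Cause: Memory fragmentation
-Process= "Virtual Exec", ipl= 0, pid= 127
Jun 15 14:50:24: %SYS-2-MALLOCFAIL: Memory allocation of 524288 bytes failed from 0x404A4EB0, alignment 8
Pool: Processor Free: 7274168 Cause: Memory fragmentation
-Process= "Virtual Exec", ipl= 0, pid= 127
Jun 15 15:28:42: %SYS-2-MALLOCFAIL: Memory allocation of 20000 bytes failed from 0x4103D64C, alignment 8
Pool: Processor Free: 7253720 Cause: Memory fragmentation
-Process= "CEF Scanner", ipl= 0, pid= 120
"""

ALLOC_RE = re.compile(
    r"%SYS-2-MALLOCFAIL: Memory allocation of (\d+) bytes "
    r"failed from (0x[0-9A-Fa-f]+)"
)
POOL_RE = re.compile(r"Pool: (\S+) Free: (\d+) Cause: (.+)")
PROC_RE = re.compile(r'-Process= "([^"]+)"')


def parse_mallocfail(text):
    """Group each MALLOCFAIL line with the Pool and Process lines after it."""
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        m = ALLOC_RE.search(line)
        if m:
            events.append({"requested": int(m.group(1)),
                           "failed_from": m.group(2),
                           "pool": None, "free": None,
                           "cause": None, "process": None})
            continue
        if not events:
            continue
        # match() anchors at the line start, so "Alternate Pool:" is skipped.
        m = POOL_RE.match(line)
        if m:
            events[-1].update(pool=m.group(1), free=int(m.group(2)),
                              cause=m.group(3).strip())
            continue
        m = PROC_RE.search(line)
        if m:
            events[-1]["process"] = m.group(1)
    return events


if __name__ == "__main__":
    for e in parse_mallocfail(LOG):
        ratio = e["free"] / e["requested"]
        print(f'{e["process"]}: wanted {e["requested"]} bytes, '
              f'{e["free"]} free ({ratio:.1f}x the request), cause: {e["cause"]}')
```

Even the 20000-byte request from CEF Scanner fails with more than 7 MB nominally free, which suggests the remaining free memory is split into very small pieces.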

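In this state, at around 18:00, the system disabled DCEF, CEF, RPF, and hardware FIB forwarding, leaving only software forwarding. From that point OSPF went down and could not establish neighbor relationships, the device stopped forwarding traffic, and the network was cut off.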
The show proc cpu command showed CPU utilization as high as 99%, with IP Input as the largest consumer among the processes. Excerpt:
sh processes cpu | exclude 0.00
CPU utilization for five seconds: 99%/64%; one minute: 99%; five minutes: 99%
 PID Runtime(ms)    Invoked  uSecs    5Sec   1Min   5Min TTY Process
   9   181188304 1096405161    165   1.59%  2.95%  2.58%   0 ARP Input
  85       96940      12405   7814   0.23%  9.92%  7.57%   0 Exec
  98       56484 1141141653      0   0.07%  0.03%  0.01%   0 ACE Tunnel Task
 123   435855528   18501183  23558  32.71% 23.95% 25.82%   0 IP Input
 179     5321104   95883114     55   0.07%  0.07%  0.07%   0 OSPF Hello
 319     2352256   47712978     49   0.15%  0.06%  0.06%   0 OSPF Router 2385
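I contacted Cisco and opened a TOP1 case. Following Cisco's instructions, I captured the relevant information over the console and then rebooted the 6509, which resolved the problem. As of this morning, Cisco TAC has still not identified the cause or a fix and is simply continuing to monitor.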
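
A note on reading the "99%/64%" figure in the CPU output above: in IOS the first number is total CPU utilization and the second is the share spent at interrupt level, so the difference is what the scheduled processes consume. The Python sketch below does that arithmetic and compares it with the per-process 5Sec column from the excerpt. The function name split_cpu and the regular expression are illustrative assumptions.

```python
import re

# Header line and 5Sec percentages copied from the excerpt above.
HEADER = ("CPU utilization for five seconds: 99%/64%; "
          "one minute: 99%; five minutes: 99%")
FIVE_SEC = {
    "ARP Input": 1.59,
    "Exec": 0.23,
    "ACE Tunnel Task": 0.07,
    "IP Input": 32.71,
    "OSPF Hello": 0.07,
    "OSPF Router": 0.15,
}

CPU_RE = re.compile(
    r"five seconds: (\d+)%/(\d+)%; one minute: (\d+)%; five minutes: (\d+)%"
)


def split_cpu(header):
    """Split the five-second figure into interrupt and process shares."""
    m = CPU_RE.search(header)
    if m is None:
        raise ValueError("unrecognized CPU utilization header")
    total, interrupt, one_min, five_min = map(int, m.groups())
    return {
        "total": total,
        "interrupt": interrupt,
        "process": total - interrupt,  # share left for scheduled processes
        "one_min": one_min,
        "five_min": five_min,
    }


if __name__ == "__main__":
    cpu = split_cpu(HEADER)
    listed = sum(FIVE_SEC.values())
    print(f'interrupt level: {cpu["interrupt"]}%, process level: {cpu["process"]}%')
    print(f"sum of listed processes: {listed:.2f}%")
```

The listed processes add up to roughly the process-level share, with IP Input accounting for most of it, while the larger part of the load sits at interrupt level. That would be consistent with the CPU handling traffic itself once hardware forwarding was disabled, though the output alone does not prove it.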

# re: Catalyst6509 + SUP720 unable to save configuration, and handling of follow-up problems 2009-06-16 13:25 梯玛

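Before the reboot, the SUP720 in slot 6 was the active one; after the reboot, the SUP720 in slot 5 is active. No abnormal messages appeared during the reboot. Below are the messages relating to slot 6: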
Jun 15 19:52:11: %FABRIC-SP-5-CLEAR_BLOCK: Clear block option is off for the fabric in slot 6.
Jun 15 19:52:12: %FABRIC-SP-5-FABRIC_MODULE_BACKUP: The Switch Fabric Module in slot 6 became standby
Jun 15 19:52:13: %DIAG-SP-6-RUN_MINIMUM: Module 6: Running Minimal Diagnostics...
Jun 15 19:52:15: %DIAG-SP-6-DIAG_OK: Module 6: Passed Online Diagnostics
Jun 15 19:52:16: %OIR-SP-6-INSCARD: Card inserted in slot 6, interfaces are now online

00:01:59: %PFREDUN-SP-STDBY-6-STANDBY: Initializing for SSO mode
00:02:00: %SYS-SP-STDBY-3-LOGGER_FLUSHED: System was paused for 00:00:00 to ensure console debugging output.

00:02:19: SP-STDBY: SP: Currently running ROMMON from S (Gold) region
00:02:21: %DIAG-SP-STDBY-6-RUN_MINIMUM: Module 6: Running Minimal Diagnostics...
00:02:42: %DIAG-SP-STDBY-6-DIAG_OK: Module 6: Passed Online Diagnostics
00:03:03: %SYS-SP-STDBY-5-RESTART: System restarted --
00:03:03: %PFREDUN-SP-STDBY-6-STANDBY: Ready for SSO mode

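The reboot messages above show that the supervisor engine in slot 6 is working normally. However, the following two messages appeared today. I don't know what they mean, and I could not find any explanation on the Cisco website: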
Jun 16 06:49:58: %DIAG-SP-3-MONITOR_INTERVAL_ZERO: Module 6: Monitoring interval is 0. Cannot enable monitoring for Test #28
Jun 16 06:49:58: %DIAG-SP-STDBY-3-MONITOR_INTERVAL_ZERO: Module 6: Monitoring interval is 0. Cannot enable monitoring for Test #28

