今天上午不断的接到开发同学的反馈,测试环境的一台mysql(5.1.48),对应的应用总是不断的报连接数拿不到。登陆到机器后发现mysql -uroot连接不上,而netstat -nal |grep 3306|wc -l也只有1200个,而这台机器设置的max_connections=5000,max_user_connections=0,innodb_open_files=60000。
因为这台机器上100+个schema,2w+个表,只是部分应用有反馈,大部分应用还处于可用状态,因此没有做任何处理,采取了kill session的方式(这一次刚好有同学在出问题之前用mysql -uroot连上了),kill后,过了一段时间恢复了,mysql -uroot可以连接了。想当然的认为是某个应用把连接拿完了导致的,还以为解决了。
但是到下午又有同学反馈同样的问题,这次ssh之后mysql -uroot依然连不上,ps发现mysqld是刚刚起的。怀疑mysqld异常重启,看了一下alert日志,发现有大量下面的信息:
121203 16:06:54 InnoDB: MySQL and InnoDB data dictionaries are out of sync.
InnoDB: Unable to find the AUTOINC column template_id in the InnoDB table test/tc_steppay_templa
te.
InnoDB: We set the next AUTOINC column value to 0,
InnoDB: in effect disabling the AUTOINC next value generation.
InnoDB: You can either set the next AUTOINC value explicitly using ALTER TABLE
InnoDB: or fix the data dictionary by recreating the table.
121203 16:06:54 InnoDB: Assertion failure in thread 46915963758912 in file handler/ha_innodb.cc
line 9366
InnoDB: Failing assertion: auto_inc > 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.
121203 16:06:54 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_size=104857600
read_buffer_size=1048576
max_used_connections=4811
max_threads=5000
threads_connected=4719
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5773884 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
thd: 0x2aab595703c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
121203 16:06:54 [ERROR] Error in accept: Resource temporarily unavailable
121203 16:06:54 [ERROR] Error in accept: Resource temporarily unavailable
121203 16:06:54 [ERROR] Error in accept: Resource temporarily unavailable
日志里面有2个地方比较起眼,“Resource temporarily unavailable”,“ Unable to find the AUTOINC column template_id in the InnoDB table test/tc_steppay_template.”。前面一个应该是资源设置的问题,咨询了印风,是innodb_open_files设置过大(>1024)导致mysql在crash后恢复触发了bug(http://bugs.mysql.com/bug.php?id=48929),详细原因这个bug描述的也比较清楚,就不再多说,这个bug在这台机器的mysql版本里依旧存在。因为是测试环境,设置innodb_open_files=1000(这个参数对性能的影响,可以参考这里http://www.mysqlperformanceblog.com/2009/11/18/how-innodb_open_files-affects-performance/),重启解决。
到这里问题应该算解决了,但是看alert日志不禁要问,为什么mysqld会crash,根据前面的日志似乎是另外一个bug(http://bugs.mysql.com/bug.php?id=55993)。但是test.tc_steppay_template count(*)=0,这个表是没人操作的;另外这台机器物理内存比较小(5632000K)目前这个原因暂时不详。待找到原因再来补充。
posted on 2015-12-09 11:27
回忆之城 阅读(1004)
评论(0) 编辑 收藏 引用 所属分类:
unix/linux 、
数据库