🔍Page Cleaner 进程与 OOM Killer 的关联分析
📊Page Cleaner 进程的作用和影响
1. InnoDB Page Cleaner 角色
# 查看 Page Cleaner 状态mysql -e"SHOW ENGINE INNODB STATUS\G"|grep-A10-i"page cleaner"# 关键指标监控mysql -e" SELECT VARIABLE_NAME, VARIABLE_VALUE, CASE WHEN VARIABLE_NAME LIKE '%flush%' THEN '刷新相关' WHEN VARIABLE_NAME LIKE '%dirty%' THEN '脏页相关' ELSE '其他' END AS 类别 FROM performance_schema.global_status WHERE VARIABLE_NAME LIKE '%innodb%page%' OR VARIABLE_NAME LIKE '%innodb%flush%' OR VARIABLE_NAME LIKE '%innodb%dirty%' ORDER BY 类别, VARIABLE_NAME; "2. Page Cleaner 耗时过长的表现
# 查看 Page Cleaner 线程状态mysql -e"SHOW PROCESSLIST;"|grep-i"page"# 使用 performance_schema 深度分析mysql -e" SELECT THREAD_ID, EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT/1000000000 as total_sec, AVG_TIMER_WAIT/1000000000 as avg_sec, MAX_TIMER_WAIT/1000000000 as max_sec FROM performance_schema.events_waits_summary_by_thread_by_event_name WHERE EVENT_NAME LIKE '%page%cleaner%' OR EVENT_NAME LIKE '%buf%flush%' GROUP BY THREAD_ID, EVENT_NAME HAVING COUNT_STAR > 0 ORDER BY total_sec DESC; "🔗Page Cleaner 与 OOM Killer 的关联机制
1. 内存占用分析
# Page Cleaner 的内存使用模式# 当 Page Cleaner 工作时:# 1. 读取脏页到内存# 2. 写入到 doublewrite buffer# 3. 写入到磁盘# 4. 释放内存# 检查 InnoDB 内存使用mysql -e" SELECT '缓冲池总大小' AS 项目, FORMAT(@@innodb_buffer_pool_size/1024/1024/1024, 2) AS '大小(GB)' UNION ALL SELECT '当前脏页数量', VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty' UNION ALL SELECT '脏页占用内存(MB)', FORMAT(VARIABLE_VALUE * 16 / 1024, 0) FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty' UNION ALL SELECT '未刷新日志量(MB)', FORMAT(VARIABLE_VALUE / 1024 / 1024, 2) FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_os_log_pending_fsyncs' UNION ALL SELECT 'Pending writes', VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_data_pending_writes'; "2. 内存压力传导机制
# Page Cleaner 慢 -> 脏页堆积 -> 内存占用增加 -> OOM 风险# 可视化这个流程:# 监控脏页增长趋势watch-n1' echo "=== 脏页监控 ===" mysql -Nse " SELECT UNIX_TIMESTAMP(NOW()) as timestamp, VARIABLE_VALUE as dirty_pages FROM performance_schema.global_status WHERE VARIABLE_NAME = \"Innodb_buffer_pool_pages_dirty\" " 2>/dev/null echo -e "\n=== 系统内存 ===" free -h | grep Mem '🔍诊断 Page Cleaner 问题的工具
1. InnoDB 状态分析脚本
#!/bin/bash# analyze-page-cleaner.shecho"=== Page Cleaner 深度分析$(date)==="# 1. 检查 Page Cleaner 配置echo-e"\n1. Page Cleaner 配置:"mysql -e" SHOW VARIABLES WHERE Variable_name LIKE 'innodb_page_cleaners' OR Variable_name LIKE 'innodb_lru_scan_depth' OR Variable_name LIKE 'innodb_io_capacity' OR Variable_name LIKE 'innodb_io_capacity_max' OR Variable_name LIKE 'innodb_max_dirty_pages_pct' OR Variable_name LIKE 'innodb_max_dirty_pages_pct_lwm'; "2>/dev/null# 2. 查看当前状态echo-e"\n2. 当前状态:"mysql -e" SELECT '缓冲池总页数' as metric, VARIABLE_VALUE as value FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_total' UNION ALL SELECT '脏页数量', VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty' UNION ALL SELECT '脏页比例(%)', ROUND(VARIABLE_VALUE / ( SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_total' ) * 100, 2) FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty' UNION ALL SELECT '每秒刷新页数', VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_flushed' UNION ALL SELECT 'Pending flushes', VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pending_flushes_lru' OR VARIABLE_NAME = 'Innodb_buffer_pool_pending_flushes_list'; "2>/dev/null# 3. 检查磁盘性能echo-e"\n3. 磁盘 I/O 性能 (等待10秒采样):"iostat -dx25|grep-A5"Device"2. 实时监控脚本
#!/bin/bash# monitor-page-cleaner-oom.sh# 同时监控 Page Cleaner 和内存压力INTERVAL=5echo"时间戳,脏页数,脏页比例(%),可用内存(MB),OOM分数,缓冲池命中率">/tmp/page_cleaner_monitor.csvwhiletrue;doTIMESTAMP=$(date+%s)# 获取 MySQL 状态MYSQL_STATS=$(mysql -Nse " SELECT(SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_dirty'),(SELECT ROUND(VARIABLE_VALUE / total *100,2)FROM performance_schema.global_status,(SELECT VARIABLE_VALUE as total FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_total')t WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_dirty'),(SELECT ROUND((1- reads/requests)*100,2)FROM(SELECT VARIABLE_VALUE as reads FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_reads')r,(SELECT VARIABLE_VALUE as requests FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_read_requests')req)" 2>/dev/null || echo "000") # 获取系统内存 AVAILABLE_MEM=$(free-m|awk'/^Mem:/{print $7}')# 获取 MySQL 进程的 OOM 分数 MYSQL_PID=$(pidof mysqld)OOM_SCORE="N/A" if [ -n "$MYSQL_PID" ]; then OOM_SCORE=$(cat/proc/$MYSQL_PID/oom_score2>/dev/null||echo"N/A")fi # 输出并记录 echo "$TIMESTAMP,$MYSQL_STATS,$AVAILABLE_MEM,$OOM_SCORE" | \ awk -F, '{print$1","$2","$3","$5","$6","$4}' # 保存到文件 echo "$TIMESTAMP,$MYSQL_STATS,$AVAILABLE_MEM,$OOM_SCORE" | \ awk -F, '{print$1","$2","$3","$5","$6","$4}' >> /tmp/page_cleaner_monitor.csv # 检查紧急情况 DIRTY_PCT=$(echo$MYSQL_STATS|awk'{print $2}')if [ "$DIRTY_PCT" != "0" ] && [$(echo"$DIRTY_PCT> 75"|bc)-eq 1 ]; then echo "警告: 脏页比例超过75%:$DIRTY_PCT%" fi if [ "$AVAILABLE_MEM" -lt 1024 ]; then echo "警告: 可用内存低于1GB:${AVAILABLE_MEM}MB"fisleep$INTERVALdone⚠️Page Cleaner 导致 OOM 的场景分析
1. 典型故障场景
# 场景重现脚本#!/bin/bash# simulate-page-cleaner-oom.shecho"模拟 Page Cleaner 问题导致 OOM 的场景:"# 1. 创建大量脏页(模拟高负载写入)mysql -e" CREATE DATABASE IF NOT EXISTS stress_test; USE stress_test; CREATE TABLE IF NOT EXISTS large_table ( id INT AUTO_INCREMENT PRIMARY KEY, data TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ) ENGINE=InnoDB; # 插入大量数据 SET @@SESSION.autocommit=0; BEGIN; INSERT INTO large_table (data) SELECT REPEAT('X', 1024) FROM information_schema.columns c1, information_schema.columns c2 LIMIT 10000; COMMIT; "2>/dev/null&# 2. 监控内存使用watch-n1' echo "=== 内存压力测试中 ===" echo "脏页数量:" mysql -Nse "SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME=\"Innodb_buffer_pool_pages_dirty\"" 2>/dev/null echo -e "\n系统内存:" free -h | grep Mem echo -e "\nMySQL OOM 分数:" cat /proc/$(pidof mysqld)/oom_score 2>/dev/null '2. 诊断是否由 Page Cleaner 导致 OOM
# 检查 OOM 前的系统状态sudogrep-B30"Out of memory\|Killed process"/var/log/messages|tail-50# 查找关键证据sudojournalctl -k --since="1 hour ago"|grep-B10-A10"page.*cleaner\|flush"# 分析内核堆栈(如果配置了kdump)crash /var/crash/<dumpfile>btexit🔧优化 Page Cleaner 性能
1. MySQL 配置优化
# 生成优化配置cat>/tmp/innodb_page_cleaner_optimization.cnf<<'EOF' # Page Cleaner 和刷新优化配置 [mysqld] # 1. Page Cleaner 线程数(建议等于CPU核心数) innodb_page_cleaners = 8 # 2. LRU 扫描深度(降低以减少开销) innodb_lru_scan_depth = 256 # 默认1024,可降低 # 3. I/O 能力设置(根据磁盘性能调整) innodb_io_capacity = 2000 # SSD建议值 innodb_io_capacity_max = 4000 # 峰值I/O能力 # 4. 脏页控制 innodb_max_dirty_pages_pct = 75 # 最大脏页比例 innodb_max_dirty_pages_pct_lwm = 50 # 低水位线,触发异步刷新 # 5. 自适应刷新 innodb_adaptive_flushing = ON innodb_adaptive_flushing_lwm = 10 # 6. 日志文件大小(增加以减少检查点) innodb_log_file_size = 4G innodb_log_files_in_group = 3 # 7. 刷新邻居页 innodb_flush_neighbors = 0 # SSD设置为0,HDD可设为1 # 8. 双写缓冲 innodb_doublewrite = ON innodb_doublewrite_batch_size = 0 innodb_doublewrite_files = 4 EOF2. 内核优化
# 优化系统 I/O 调度echo"deadline">/sys/block/sda/queue/scheduler# 对于数据库负载# 调整内核参数cat>>/etc/sysctl.conf<<'EOF' # 优化虚拟内存 vm.dirty_ratio = 10 vm.dirty_background_ratio = 5 vm.dirty_expire_centisecs = 3000 vm.dirty_writeback_centisecs = 500 # 优化文件系统 fs.file-max = 65536 fs.aio-max-nr = 1048576 # 网络缓冲区(如果使用远程连接) net.core.rmem_default = 262144 net.core.wmem_default = 262144 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 EOFsysctl -p3. 硬件和存储优化
# 检查存储性能fio --name=randwrite --ioengine=libaio --iodepth=32\--rw=randwrite --bs=16k --direct=1--size=1G --numjobs=4\--runtime=60--group_reporting# 监控工具安装yuminstall-y sysstat iotophtop🛡️防止 Page Cleaner 导致 OOM 的策略
1. 内存保护配置
# 保护 MySQL 进程echo"-500">/proc/$(pidof mysqld)/oom_score_adj# 或者通过 systemd 配置cat>/etc/systemd/system/mysql.service.d/oom.conf<<'EOF' [Service] OOMScoreAdjust=-500 MemoryMax=45G # 限制MySQL最大内存使用 MemorySwapMax=2G EOFsystemctl daemon-reload2. 监控和告警
#!/bin/bash# page-cleaner-alert.sh# 阈值设置DIRTY_PAGES_WARN=500000# 50万脏页警告DIRTY_PAGES_CRIT=1000000# 100万脏页严重AVAIL_MEM_WARN=2048# 2GB可用内存警告AVAIL_MEM_CRIT=1024# 1GB可用内存严重# 获取指标DIRTY_PAGES=$(mysql -Nse" SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_dirty' "2>/dev/null||echo0)AVAIL_MEM=$(free-m|awk'/^Mem:/{print $7}')# 检查并告警if[$DIRTY_PAGES-ge$DIRTY_PAGES_CRIT]||[$AVAIL_MEM-le$AVAIL_MEM_CRIT];thenecho"CRITICAL: 脏页:$DIRTY_PAGES, 可用内存:${AVAIL_MEM}MB"# 发送告警# wall "数据库内存紧急: 脏页过多或内存不足!"exit2elif[$DIRTY_PAGES-ge$DIRTY_PAGES_WARN]||[$AVAIL_MEM-le$AVAIL_MEM_WARN];thenecho"WARNING: 脏页:$DIRTY_PAGES, 可用内存:${AVAIL_MEM}MB"exit1elseecho"OK: 脏页:$DIRTY_PAGES, 可用内存:${AVAIL_MEM}MB"exit0fi3. 应急处理措施
#!/bin/bash# emergency-page-cleaner-fix.shecho"=== Page Cleaner 问题应急处理 ==="# 1. 降低负载mysql -e" SET GLOBAL innodb_max_dirty_pages_pct = 25; SET GLOBAL innodb_adaptive_flushing_lwm = 5; "2>/dev/null# 2. 手动触发刷新mysql -e"SET GLOBAL innodb_buffer_pool_dump_now = ON;"2>/dev/null# 3. 如果非常紧急,暂停写入mysql -e"SET GLOBAL innodb_fast_shutdown = 0;"2>/dev/null# 4. 监控恢复情况watch-n1' echo "脏页数量:" mysql -Nse "SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME=\"Innodb_buffer_pool_pages_dirty\"" 2>/dev/null echo -e "\n刷新速率:" mysql -Nse "SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME=\"Innodb_buffer_pool_pages_flushed\"" 2>/dev/null '📊根本原因分析和预防
1. 常见原因分析
# 诊断脚本#!/bin/bash# root-cause-analysis.shecho"=== Page Cleaner 问题根因分析 ==="# 1. 磁盘性能检查echo-e"\n1. 磁盘 I/O 性能:"iostat -dx13|tail-10# 2. 配置检查echo-e"\n2. MySQL 配置检查:"mysql -e" SELECT VARIABLE_NAME, VARIABLE_VALUE, CASE WHEN VARIABLE_NAME IN ('innodb_io_capacity', 'innodb_io_capacity_max') AND VARIABLE_VALUE < 1000 THEN '可能过低' WHEN VARIABLE_NAME = 'innodb_page_cleaners' AND VARIABLE_VALUE < 4 THEN '可能过少' WHEN VARIABLE_NAME = 'innodb_lru_scan_depth' AND VARIABLE_VALUE > 512 THEN '可能过高' ELSE '正常' END AS 评估 FROM performance_schema.global_variables WHERE VARIABLE_NAME IN ( 'innodb_io_capacity', 'innodb_io_capacity_max', 'innodb_page_cleaners', 'innodb_lru_scan_depth', 'innodb_buffer_pool_size', 'innodb_log_file_size' ); "2>/dev/null# 3. 工作负载分析echo-e"\n3. 当前工作负载:"mysql -e" SHOW PROCESSLIST; "2>/dev/null|head-202. 长期预防策略
# 定期优化脚本#!/bin/bash# mysql-preventive-maintenance.sh# 每月执行一次echo"=== MySQL 预防性维护 ==="# 1. 更新统计信息mysql -e"ANALYZE TABLE mysql.innodb_index_stats, mysql.innodb_table_stats;"# 2. 检查碎片mysql -e" SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, ROUND(DATA_LENGTH/1024/1024, 2) as data_mb, ROUND(INDEX_LENGTH/1024/1024, 2) as index_mb, ROUND(DATA_FREE/1024/1024, 2) as free_mb, ROUND((DATA_FREE/(DATA_LENGTH+INDEX_LENGTH))*100, 2) as frag_pct FROM information_schema.TABLES WHERE DATA_FREE > 100*1024*1024 -- 碎片超过100MB AND ENGINE = 'InnoDB' ORDER BY frag_pct DESC LIMIT 20; "# 3. 优化配置# 根据监控数据自动调整DIRTY_RATE=$(mysql -Nse " SELECT(MAX(dirty)- MIN(dirty))/ TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts))as dirty_per_sec FROM(SELECT NOW()as ts, VARIABLE_VALUE as dirty FROM performance_schema.global_status WHEREVARIABLE_NAME='Innodb_buffer_pool_pages_dirty'UNION ALL SELECT DATE_SUB(NOW(), INTERVAL10MINUTE),0)t" 2>/dev/null || echo 0) if [$DIRTY_RATE-gt 100 ]; then echo "检测到高脏页生成率,建议增加 innodb_io_capacity"fi🎯结论
Page Cleaner 确实可能导致 OOM,但可以通过以下方式解决:
是的,Page Cleaner 执行时间过长确实可能导致数据库进程被 OOM Killer 杀死!
主要原因链:
Page Cleaner 慢 → 脏页堆积 → InnoDB 缓冲池满 → 新数据需要内存 → 系统内存压力增加 → 触发 OOM Killer → 选择高 oom_score 进程终止解决方案优先级:
- ✅ 优化 Page Cleaner 配置(增加线程数,调整 I/O 容量)
- ✅ 保护 MySQL 进程(设置 oom_score_adj = -500)
- ✅ 监控和告警(脏页比例、可用内存)
- ✅ 硬件升级(使用更快的 SSD)
- ✅ 定期维护(优化表,更新统计信息)
关键配置建议:
# 必须调整的参数 innodb_page_cleaners = CPU核心数 innodb_io_capacity = 根据磁盘性能设置(SSD: 2000+) innodb_max_dirty_pages_pct = 75 innodb_max_dirty_pages_pct_lwm = 50 # 保护设置 OOMScoreAdjust = -500 # 在 systemd 中配置通过合理配置和监控,可以完全避免 Page Cleaner 导致的 OOM 问题。