news 2026/4/16 10:16:57

MySQL Page Cleaner 进程执行缓慢与 OOM Killer 的关联分析

作者头像

张小明

前端开发工程师

1.2k 24
文章封面图
MySQL Page Cleaner 进程执行缓慢与 OOM Killer 的关联分析

🔍Page Cleaner 进程与 OOM Killer 的关联分析

📊Page Cleaner 进程的作用和影响

1. InnoDB Page Cleaner 角色

# 查看 Page Cleaner 状态mysql -e"SHOW ENGINE INNODB STATUS\G"|grep-A10-i"page cleaner"# 关键指标监控mysql -e" SELECT VARIABLE_NAME, VARIABLE_VALUE, CASE WHEN VARIABLE_NAME LIKE '%flush%' THEN '刷新相关' WHEN VARIABLE_NAME LIKE '%dirty%' THEN '脏页相关' ELSE '其他' END AS 类别 FROM performance_schema.global_status WHERE VARIABLE_NAME LIKE '%innodb%page%' OR VARIABLE_NAME LIKE '%innodb%flush%' OR VARIABLE_NAME LIKE '%innodb%dirty%' ORDER BY 类别, VARIABLE_NAME; "

2. Page Cleaner 耗时过长的表现

# 查看 Page Cleaner 线程状态mysql -e"SHOW PROCESSLIST;"|grep-i"page"# 使用 performance_schema 深度分析mysql -e" SELECT THREAD_ID, EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT/1000000000 as total_sec, AVG_TIMER_WAIT/1000000000 as avg_sec, MAX_TIMER_WAIT/1000000000 as max_sec FROM performance_schema.events_waits_summary_by_thread_by_event_name WHERE EVENT_NAME LIKE '%page%cleaner%' OR EVENT_NAME LIKE '%buf%flush%' GROUP BY THREAD_ID, EVENT_NAME HAVING COUNT_STAR > 0 ORDER BY total_sec DESC; "

🔗Page Cleaner 与 OOM Killer 的关联机制

1. 内存占用分析

# Page Cleaner 的内存使用模式# 当 Page Cleaner 工作时:# 1. 读取脏页到内存# 2. 写入到 doublewrite buffer# 3. 写入到磁盘# 4. 释放内存# 检查 InnoDB 内存使用mysql -e" SELECT '缓冲池总大小' AS 项目, FORMAT(@@innodb_buffer_pool_size/1024/1024/1024, 2) AS '大小(GB)' UNION ALL SELECT '当前脏页数量', VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty' UNION ALL SELECT '脏页占用内存(MB)', FORMAT(VARIABLE_VALUE * 16 / 1024, 0) FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty' UNION ALL SELECT '未刷新日志量(MB)', FORMAT(VARIABLE_VALUE / 1024 / 1024, 2) FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_os_log_pending_fsyncs' UNION ALL SELECT 'Pending writes', VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_data_pending_writes'; "

2. 内存压力传导机制

# Page Cleaner 慢 -> 脏页堆积 -> 内存占用增加 -> OOM 风险# 可视化这个流程:# 监控脏页增长趋势watch-n1' echo "=== 脏页监控 ===" mysql -Nse " SELECT UNIX_TIMESTAMP(NOW()) as timestamp, VARIABLE_VALUE as dirty_pages FROM performance_schema.global_status WHERE VARIABLE_NAME = \"Innodb_buffer_pool_pages_dirty\" " 2>/dev/null echo -e "\n=== 系统内存 ===" free -h | grep Mem '

🔍诊断 Page Cleaner 问题的工具

1. InnoDB 状态分析脚本

#!/bin/bash# analyze-page-cleaner.shecho"=== Page Cleaner 深度分析$(date)==="# 1. 检查 Page Cleaner 配置echo-e"\n1. Page Cleaner 配置:"mysql -e" SHOW VARIABLES WHERE Variable_name LIKE 'innodb_page_cleaners' OR Variable_name LIKE 'innodb_lru_scan_depth' OR Variable_name LIKE 'innodb_io_capacity' OR Variable_name LIKE 'innodb_io_capacity_max' OR Variable_name LIKE 'innodb_max_dirty_pages_pct' OR Variable_name LIKE 'innodb_max_dirty_pages_pct_lwm'; "2>/dev/null# 2. 查看当前状态echo-e"\n2. 当前状态:"mysql -e" SELECT '缓冲池总页数' as metric, VARIABLE_VALUE as value FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_total' UNION ALL SELECT '脏页数量', VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty' UNION ALL SELECT '脏页比例(%)', ROUND(VARIABLE_VALUE / ( SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_total' ) * 100, 2) FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty' UNION ALL SELECT '每秒刷新页数', VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_flushed' UNION ALL SELECT 'Pending flushes', VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pending_flushes_lru' OR VARIABLE_NAME = 'Innodb_buffer_pool_pending_flushes_list'; "2>/dev/null# 3. 检查磁盘性能echo-e"\n3. 磁盘 I/O 性能 (等待10秒采样):"iostat -dx25|grep-A5"Device"

2. 实时监控脚本

#!/bin/bash# monitor-page-cleaner-oom.sh# 同时监控 Page Cleaner 和内存压力INTERVAL=5echo"时间戳,脏页数,脏页比例(%),可用内存(MB),OOM分数,缓冲池命中率">/tmp/page_cleaner_monitor.csvwhiletrue;doTIMESTAMP=$(date+%s)# 获取 MySQL 状态MYSQL_STATS=$(mysql -Nse " SELECT(SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_dirty'),(SELECT ROUND(VARIABLE_VALUE / total *100,2)FROM performance_schema.global_status,(SELECT VARIABLE_VALUE as total FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_total')t WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_dirty'),(SELECT ROUND((1- reads/requests)*100,2)FROM(SELECT VARIABLE_VALUE as reads FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_reads')r,(SELECT VARIABLE_VALUE as requests FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_read_requests')req)" 2>/dev/null || echo "000") # 获取系统内存 AVAILABLE_MEM=$(free-m|awk'/^Mem:/{print $7}')# 获取 MySQL 进程的 OOM 分数 MYSQL_PID=$(pidof mysqld)OOM_SCORE="N/A" if [ -n "$MYSQL_PID" ]; then OOM_SCORE=$(cat/proc/$MYSQL_PID/oom_score2>/dev/null||echo"N/A")fi # 输出并记录 echo "$TIMESTAMP,$MYSQL_STATS,$AVAILABLE_MEM,$OOM_SCORE" | \ awk -F, '{print$1","$2","$3","$5","$6","$4}' # 保存到文件 echo "$TIMESTAMP,$MYSQL_STATS,$AVAILABLE_MEM,$OOM_SCORE" | \ awk -F, '{print$1","$2","$3","$5","$6","$4}' >> /tmp/page_cleaner_monitor.csv # 检查紧急情况 DIRTY_PCT=$(echo$MYSQL_STATS|awk'{print $2}')if [ "$DIRTY_PCT" != "0" ] && [$(echo"$DIRTY_PCT> 75"|bc)-eq 1 ]; then echo "警告: 脏页比例超过75%:$DIRTY_PCT%" fi if [ "$AVAILABLE_MEM" -lt 1024 ]; then echo "警告: 可用内存低于1GB:${AVAILABLE_MEM}MB"fisleep$INTERVALdone

⚠️Page Cleaner 导致 OOM 的场景分析

1. 典型故障场景

# 场景重现脚本#!/bin/bash# simulate-page-cleaner-oom.shecho"模拟 Page Cleaner 问题导致 OOM 的场景:"# 1. 创建大量脏页(模拟高负载写入)mysql -e" CREATE DATABASE IF NOT EXISTS stress_test; USE stress_test; CREATE TABLE IF NOT EXISTS large_table ( id INT AUTO_INCREMENT PRIMARY KEY, data TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ) ENGINE=InnoDB; # 插入大量数据 SET @@SESSION.autocommit=0; BEGIN; INSERT INTO large_table (data) SELECT REPEAT('X', 1024) FROM information_schema.columns c1, information_schema.columns c2 LIMIT 10000; COMMIT; "2>/dev/null&# 2. 监控内存使用watch-n1' echo "=== 内存压力测试中 ===" echo "脏页数量:" mysql -Nse "SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME=\"Innodb_buffer_pool_pages_dirty\"" 2>/dev/null echo -e "\n系统内存:" free -h | grep Mem echo -e "\nMySQL OOM 分数:" cat /proc/$(pidof mysqld)/oom_score 2>/dev/null '

2. 诊断是否由 Page Cleaner 导致 OOM

# 检查 OOM 前的系统状态sudogrep-B30"Out of memory\|Killed process"/var/log/messages|tail-50# 查找关键证据sudojournalctl -k --since="1 hour ago"|grep-B10-A10"page.*cleaner\|flush"# 分析内核堆栈(如果配置了kdump)crash /var/crash/<dumpfile>btexit

🔧优化 Page Cleaner 性能

1. MySQL 配置优化

# 生成优化配置cat>/tmp/innodb_page_cleaner_optimization.cnf<<'EOF' # Page Cleaner 和刷新优化配置 [mysqld] # 1. Page Cleaner 线程数(建议等于CPU核心数) innodb_page_cleaners = 8 # 2. LRU 扫描深度(降低以减少开销) innodb_lru_scan_depth = 256 # 默认1024,可降低 # 3. I/O 能力设置(根据磁盘性能调整) innodb_io_capacity = 2000 # SSD建议值 innodb_io_capacity_max = 4000 # 峰值I/O能力 # 4. 脏页控制 innodb_max_dirty_pages_pct = 75 # 最大脏页比例 innodb_max_dirty_pages_pct_lwm = 50 # 低水位线,触发异步刷新 # 5. 自适应刷新 innodb_adaptive_flushing = ON innodb_adaptive_flushing_lwm = 10 # 6. 日志文件大小(增加以减少检查点) innodb_log_file_size = 4G innodb_log_files_in_group = 3 # 7. 刷新邻居页 innodb_flush_neighbors = 0 # SSD设置为0,HDD可设为1 # 8. 双写缓冲 innodb_doublewrite = ON innodb_doublewrite_batch_size = 0 innodb_doublewrite_files = 4 EOF

2. 内核优化

# 优化系统 I/O 调度echo"deadline">/sys/block/sda/queue/scheduler# 对于数据库负载# 调整内核参数cat>>/etc/sysctl.conf<<'EOF' # 优化虚拟内存 vm.dirty_ratio = 10 vm.dirty_background_ratio = 5 vm.dirty_expire_centisecs = 3000 vm.dirty_writeback_centisecs = 500 # 优化文件系统 fs.file-max = 65536 fs.aio-max-nr = 1048576 # 网络缓冲区(如果使用远程连接) net.core.rmem_default = 262144 net.core.wmem_default = 262144 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 EOFsysctl -p

3. 硬件和存储优化

# 检查存储性能fio --name=randwrite --ioengine=libaio --iodepth=32\--rw=randwrite --bs=16k --direct=1--size=1G --numjobs=4\--runtime=60--group_reporting# 监控工具安装yuminstall-y sysstat iotophtop

🛡️防止 Page Cleaner 导致 OOM 的策略

1. 内存保护配置

# 保护 MySQL 进程echo"-500">/proc/$(pidof mysqld)/oom_score_adj# 或者通过 systemd 配置cat>/etc/systemd/system/mysql.service.d/oom.conf<<'EOF' [Service] OOMScoreAdjust=-500 MemoryMax=45G # 限制MySQL最大内存使用 MemorySwapMax=2G EOFsystemctl daemon-reload

2. 监控和告警

#!/bin/bash# page-cleaner-alert.sh# 阈值设置DIRTY_PAGES_WARN=500000# 50万脏页警告DIRTY_PAGES_CRIT=1000000# 100万脏页严重AVAIL_MEM_WARN=2048# 2GB可用内存警告AVAIL_MEM_CRIT=1024# 1GB可用内存严重# 获取指标DIRTY_PAGES=$(mysql -Nse" SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_dirty' "2>/dev/null||echo0)AVAIL_MEM=$(free-m|awk'/^Mem:/{print $7}')# 检查并告警if[$DIRTY_PAGES-ge$DIRTY_PAGES_CRIT]||[$AVAIL_MEM-le$AVAIL_MEM_CRIT];thenecho"CRITICAL: 脏页:$DIRTY_PAGES, 可用内存:${AVAIL_MEM}MB"# 发送告警# wall "数据库内存紧急: 脏页过多或内存不足!"exit2elif[$DIRTY_PAGES-ge$DIRTY_PAGES_WARN]||[$AVAIL_MEM-le$AVAIL_MEM_WARN];thenecho"WARNING: 脏页:$DIRTY_PAGES, 可用内存:${AVAIL_MEM}MB"exit1elseecho"OK: 脏页:$DIRTY_PAGES, 可用内存:${AVAIL_MEM}MB"exit0fi

3. 应急处理措施

#!/bin/bash# emergency-page-cleaner-fix.shecho"=== Page Cleaner 问题应急处理 ==="# 1. 降低负载mysql -e" SET GLOBAL innodb_max_dirty_pages_pct = 25; SET GLOBAL innodb_adaptive_flushing_lwm = 5; "2>/dev/null# 2. 手动触发刷新mysql -e"SET GLOBAL innodb_buffer_pool_dump_now = ON;"2>/dev/null# 3. 如果非常紧急,暂停写入mysql -e"SET GLOBAL innodb_fast_shutdown = 0;"2>/dev/null# 4. 监控恢复情况watch-n1' echo "脏页数量:" mysql -Nse "SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME=\"Innodb_buffer_pool_pages_dirty\"" 2>/dev/null echo -e "\n刷新速率:" mysql -Nse "SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME=\"Innodb_buffer_pool_pages_flushed\"" 2>/dev/null '

📊根本原因分析和预防

1. 常见原因分析

# 诊断脚本#!/bin/bash# root-cause-analysis.shecho"=== Page Cleaner 问题根因分析 ==="# 1. 磁盘性能检查echo-e"\n1. 磁盘 I/O 性能:"iostat -dx13|tail-10# 2. 配置检查echo-e"\n2. MySQL 配置检查:"mysql -e" SELECT VARIABLE_NAME, VARIABLE_VALUE, CASE WHEN VARIABLE_NAME IN ('innodb_io_capacity', 'innodb_io_capacity_max') AND VARIABLE_VALUE < 1000 THEN '可能过低' WHEN VARIABLE_NAME = 'innodb_page_cleaners' AND VARIABLE_VALUE < 4 THEN '可能过少' WHEN VARIABLE_NAME = 'innodb_lru_scan_depth' AND VARIABLE_VALUE > 512 THEN '可能过高' ELSE '正常' END AS 评估 FROM performance_schema.global_variables WHERE VARIABLE_NAME IN ( 'innodb_io_capacity', 'innodb_io_capacity_max', 'innodb_page_cleaners', 'innodb_lru_scan_depth', 'innodb_buffer_pool_size', 'innodb_log_file_size' ); "2>/dev/null# 3. 工作负载分析echo-e"\n3. 当前工作负载:"mysql -e" SHOW PROCESSLIST; "2>/dev/null|head-20

2. 长期预防策略

# 定期优化脚本#!/bin/bash# mysql-preventive-maintenance.sh# 每月执行一次echo"=== MySQL 预防性维护 ==="# 1. 更新统计信息mysql -e"ANALYZE TABLE mysql.innodb_index_stats, mysql.innodb_table_stats;"# 2. 检查碎片mysql -e" SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, ROUND(DATA_LENGTH/1024/1024, 2) as data_mb, ROUND(INDEX_LENGTH/1024/1024, 2) as index_mb, ROUND(DATA_FREE/1024/1024, 2) as free_mb, ROUND((DATA_FREE/(DATA_LENGTH+INDEX_LENGTH))*100, 2) as frag_pct FROM information_schema.TABLES WHERE DATA_FREE > 100*1024*1024 -- 碎片超过100MB AND ENGINE = 'InnoDB' ORDER BY frag_pct DESC LIMIT 20; "# 3. 优化配置# 根据监控数据自动调整DIRTY_RATE=$(mysql -Nse " SELECT(MAX(dirty)- MIN(dirty))/ TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts))as dirty_per_sec FROM(SELECT NOW()as ts, VARIABLE_VALUE as dirty FROM performance_schema.global_status WHEREVARIABLE_NAME='Innodb_buffer_pool_pages_dirty'UNION ALL SELECT DATE_SUB(NOW(), INTERVAL10MINUTE),0)t" 2>/dev/null || echo 0) if [$DIRTY_RATE-gt 100 ]; then echo "检测到高脏页生成率,建议增加 innodb_io_capacity"fi

🎯结论

Page Cleaner 确实可能导致 OOM,但可以通过以下方式解决:

是的,Page Cleaner 执行时间过长确实可能导致数据库进程被 OOM Killer 杀死!

主要原因链:

Page Cleaner 慢 → 脏页堆积 → InnoDB 缓冲池满 → 新数据需要内存 → 系统内存压力增加 → 触发 OOM Killer → 选择高 oom_score 进程终止

解决方案优先级:

  1. ✅ 优化 Page Cleaner 配置(增加线程数,调整 I/O 容量)
  2. ✅ 保护 MySQL 进程(设置 oom_score_adj = -500)
  3. ✅ 监控和告警(脏页比例、可用内存)
  4. ✅ 硬件升级(使用更快的 SSD)
  5. ✅ 定期维护(优化表,更新统计信息)

关键配置建议:

# 必须调整的参数 innodb_page_cleaners = CPU核心数 innodb_io_capacity = 根据磁盘性能设置(SSD: 2000+) innodb_max_dirty_pages_pct = 75 innodb_max_dirty_pages_pct_lwm = 50 # 保护设置 OOMScoreAdjust = -500 # 在 systemd 中配置

通过合理配置和监控,可以完全避免 Page Cleaner 导致的 OOM 问题。

版权声明: 本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若内容造成侵权/违法违规/事实不符,请联系邮箱:809451989@qq.com进行投诉反馈,一经查实,立即删除!
网站建设 2026/4/16 10:14:43

护网(HVV)到底是什么?必备的技术能力有哪些?

一、什么是护网行动&#xff1f; 护网行动是以公安部牵头的&#xff0c;用以评估企事业单位的网络安全的活动。 具体实践中。公安部会组织攻防两方&#xff0c;进攻方会在一个月内对防守方发动网络攻击&#xff0c;检测出防守方&#xff08;企事业单位&#xff09;存在的安全…

作者头像 李华
网站建设 2026/4/15 21:59:26

紧急通知:超过70%的MCP服务因错误配置环境变量导致API KEY泄露

第一章&#xff1a;MCP服务中API KEY泄露事件的警示 近期&#xff0c;某企业在使用MCP&#xff08;Microservice Control Platform&#xff09;服务过程中发生一起严重的API KEY泄露事件&#xff0c;导致其后端服务遭到未授权访问&#xff0c;大量敏感数据被批量抓取。该事件暴…

作者头像 李华
网站建设 2026/4/16 10:13:31

MCP Server API KEY配置全攻略(从入门到生产级防护)

第一章&#xff1a;MCP Server API KEY配置全攻略&#xff08;从入门到生产级防护&#xff09; 在构建现代微服务架构时&#xff0c;MCP Server&#xff08;Microservice Control Plane Server&#xff09;的API KEY配置是保障系统安全通信的核心环节。合理的密钥管理机制不仅能…

作者头像 李华
网站建设 2026/4/15 22:02:19

对话历史丢失太可怕?Dify聊天记录导出为文本的4大安全方案

第一章&#xff1a;对话历史丢失太可怕&#xff1f;Dify聊天记录导出为文本的4大安全方案 在使用 Dify 构建 AI 聊天应用时&#xff0c;对话历史是用户交互的核心资产。一旦因系统故障、配置错误或误操作导致数据丢失&#xff0c;将严重影响用户体验与业务连续性。为防止此类风…

作者头像 李华
网站建设 2026/4/3 6:18:13

Dify对话数据批量导出实战(Python脚本+API调用完整代码)

第一章&#xff1a;Dify对话数据批量导出的核心价值与应用场景在构建和优化AI驱动的对话系统过程中&#xff0c;Dify平台提供了强大的对话管理能力。其中&#xff0c;对话数据的批量导出功能不仅是数据资产管理的重要环节&#xff0c;更是实现模型迭代、服务质量监控与合规审计…

作者头像 李华
网站建设 2026/4/13 1:42:11

揭秘MCP Server开源发布流程:如何5分钟内让他人高效调用你的服务

第一章&#xff1a;MCP Server开源发布的意义与价值 MCP Server的开源发布标志着分布式系统基础设施领域的一次重要突破。该项目为开发者提供了一套高效、可扩展的服务编排与管理框架&#xff0c;广泛适用于微服务治理、边缘计算和云原生架构场景。 推动技术透明与社区协作 开…

作者头像 李华