pg-game-monitor

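A Prometheus + Grafana monitoring setup designed for **a single physical machine running multiple Java game processes plus a self-hosted MySQL**. JVM runtime metrics (heap, young generation, old generation, GC, threads, per-class memory) are collected from the Java processes non-intrusively with jstat/jcmd; MySQL metrics (Buffer Pool, process memory) are collected with pymysql. Monitoring centers on **memory usage**, which suits game servers with frequent GC and large memory footprints. Data is pushed through Pushgateway, visualized in Grafana, and alerted through Alertmanager + a Feishu webhook. Batch deployment with Ansible is supported.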

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "pg-game-monitor" with this command: npx skills add freepengyang/pg-game-monitor

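Monitoring for multi-process Java game servers

Built on Prometheus + Grafana and designed for a single physical machine running multiple Java processes plus a self-hosted MySQL. Monitoring centers on memory usage, which suits game servers with large memory footprints and heavy GC pressure.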

Architecture

┌────────────────────────────────────────────────────────┐
│              Single physical game server               │
│                                                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────┐  │
│  │ Java #1  │  │ Java #2  │  │ Java #3  │  │ MySQL  │  │
│  │ (JVM)    │  │ (JVM)    │  │ (JVM)    │  │        │  │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └─────┬──┘  │
│       │ jstat/jcmd  │ jstat/jcmd  │ jstat/jcmd   │ pymysql
│       └─────────────┴──────┬──────┴──────────────┘
│                            │
│                      game_agent.py
│                            │
│                       push │ Pushgateway ◄── Prometheus ──► Alertmanager
│                            │                                   │
│                            │                            Feishu/Lark Webhook
│                            │
│                      query │ Grafana
└────────────────────────────────────────────────────────┘

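Environment variables

All configuration is injected through environment variables; nothing is hard-coded.

game_agent.py (collection agent)

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| PUSHGATEWAY | | | Pushgateway address, e.g. http://pushgateway:9091 |
| PUSH_INTERVAL | | 60 | Collection and push interval (seconds) |
| MYSQL_USER | | | MySQL user name (may be left empty when there is no MySQL) |
| MYSQL_PASSWORD | | | MySQL password |
| MYSQL_HOST | | 127.0.0.1 | MySQL host |
| MYSQL_PORT | | 3306 | MySQL port |
| GAME_ROOT_DIR | | /data/game | Root directory of the game servers |
| GAME_PROCESS_NAME | | java | Java process identifier (keyword matched against cmdline) |
| AGENT_LOG_FILE | | /var/log/game_monitor/agent.log | Log file path |
| HISTO_INTERVAL | | 600 | Class histogram collection interval (seconds; 0 disables it) |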
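
As an illustration of how these variables and defaults could be read, here is a minimal sketch. `AgentConfig` and `load_config` are hypothetical names, not taken from game_agent.py, and treating PUSHGATEWAY as mandatory is an assumption based on it having no default.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    pushgateway: str
    push_interval: int
    mysql_user: str
    mysql_password: str
    mysql_host: str
    mysql_port: int
    game_root_dir: str
    game_process_name: str
    agent_log_file: str
    histo_interval: int


def load_config(env=None):
    """Build the agent configuration from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    pushgateway = env.get("PUSHGATEWAY", "").strip()
    if not pushgateway:
        # Assumption: no default is documented, so an empty value is rejected.
        raise ValueError("PUSHGATEWAY must be set, e.g. http://pushgateway:9091")
    return AgentConfig(
        pushgateway=pushgateway,
        push_interval=int(env.get("PUSH_INTERVAL", "60")),
        mysql_user=env.get("MYSQL_USER", ""),
        mysql_password=env.get("MYSQL_PASSWORD", ""),
        mysql_host=env.get("MYSQL_HOST", "127.0.0.1"),
        mysql_port=int(env.get("MYSQL_PORT", "3306")),
        game_root_dir=env.get("GAME_ROOT_DIR", "/data/game"),
        game_process_name=env.get("GAME_PROCESS_NAME", "java"),
        agent_log_file=env.get("AGENT_LOG_FILE", "/var/log/game_monitor/agent.log"),
        histo_interval=int(env.get("HISTO_INTERVAL", "600")),
    )


if __name__ == "__main__":
    print(load_config({"PUSHGATEWAY": "http://pushgateway:9091"}))
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test without touching the process environment.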

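webhook.py (alert receiver service)

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| FEISHU_WEBHOOK_URL | | | Feishu bot webhook URL |
| WEBHOOK_PORT | | 5000 | Listening port |
| PUSH_INTERVAL | | 60 | Keep consistent with the agent (used for formatting) |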
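
The forwarding step can be pictured as a pure function that turns an Alertmanager webhook body into a Feishu bot text message. This is a sketch under assumptions: `build_feishu_message` is a hypothetical name, the message layout is invented for illustration, and the actual formatting in webhook.py is not shown in this document. The `alerts`, `status`, `labels`, and `annotations` fields follow the Alertmanager webhook payload, and `msg_type`/`content.text` follow the Feishu custom bot text message shape.

```python
def build_feishu_message(alertmanager_body, level="warning"):
    """Convert an Alertmanager webhook payload into a Feishu text message dict."""
    alerts = alertmanager_body.get("alerts", [])
    lines = [f"[{level.upper()}] {len(alerts)} alert(s)"]
    for alert in alerts:
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        summary = annotations.get("summary") or annotations.get("description") or ""
        line = (
            f"- {alert.get('status', 'firing')}: "
            f"{labels.get('alertname', 'unknown')} on "
            f"{labels.get('hostname', '?')} {summary}"
        )
        lines.append(line.rstrip())
    return {"msg_type": "text", "content": {"text": "\n".join(lines)}}


if __name__ == "__main__":
    body = {
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": "MySQLDown", "hostname": "gs01"},
                "annotations": {"summary": "MySQL connection failed"},
            }
        ]
    }
    print(build_feishu_message(body, level="critical")["content"]["text"])
```

The returned dict would then be JSON-encoded and POSTed to FEISHU_WEBHOOK_URL. Keeping the conversion separate from the HTTP call makes it testable with an empty `{"alerts": []}` body.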

Quick deployment

Monitoring server (all-in-one)

# One-step deployment (set the Feishu webhook first)
FEISHU_WEBHOOK_URL="https://open.larksuite.com/..." bash monitor_install.sh

Game servers (batch deployment with Ansible)

ansible-playbook main.yml -l <hosts> \
  -e "PUSHGATEWAY=http://<pushgateway>:9091" \
  -e "MYSQL_USER=<user>" \
  -e "MYSQL_PASSWORD=<your_mysql_password>" \
  -e "GAME_ROOT_DIR=/data/game" \
  -e "GAME_PROCESS_NAME=java"

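Core files

| File | Description |
| --- | --- |
| monitor_install.sh | One-step deployment of the monitoring server (Prometheus/Grafana/Alertmanager/Pushgateway) |
| game_agent.py | Metrics collection script for game servers (jstat + jcmd, non-intrusive) |
| main.yml | Ansible playbook for batch-deploying the agent |
| webhook.py | Alert webhook service that forwards to Feishu/Lark |
| rules.yml | Prometheus alert rules (JVM + MySQL) |
| game_monitor.service | systemd service unit |
| game_monitor | logrotate configuration |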

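Metrics

JVM metrics (collected via jstat/jcmd)

| Metric | Description | Labels |
| --- | --- | --- |
| heap_used_bytes | Heap memory used | hostname, game_dir |
| heap_committed_bytes | Heap memory committed | hostname, game_dir |
| young_used_bytes | Young generation (Eden + S0 + S1) used | hostname, game_dir |
| old_used_bytes | Old generation used | hostname, game_dir |
| gc_time_seconds | GC time during the current collection cycle (delta) | hostname, game_dir |
| gc_count | GC count during the current collection cycle (delta) | hostname, game_dir |
| threads_live | Live thread count | hostname, game_dir |
| jvm_up | JVM up status | hostname, game_dir |
| jvm_class_bytes | Memory used by the top 50 classes | hostname, game_dir, class |
| jvm_class_instance_count | Instance count of the top 50 classes | hostname, game_dir, class |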
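
To make the heap and GC rows concrete, the sketch below derives them from the output of `jstat -gc <pid>`, whose size columns are reported in KB. It is an assumption about how such values could be computed, not the parsing code in game_agent.py; `parse_jstat_gc` and `gc_delta` are hypothetical names, and only young and full collections (YGC + FGC) are counted.

```python
def parse_jstat_gc(output):
    """Parse `jstat -gc` output: one header line followed by one value line."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    header, values = rows[0], rows[-1]
    col = {}
    for name, raw in zip(header, values):
        try:
            col[name] = float(raw)
        except ValueError:
            col[name] = 0.0  # jstat prints "-" when a column does not apply
    kb = 1024
    young_used = (col["S0U"] + col["S1U"] + col["EU"]) * kb
    old_used = col["OU"] * kb
    return {
        "heap_used_bytes": young_used + old_used,
        "heap_committed_bytes": (col["S0C"] + col["S1C"] + col["EC"] + col["OC"]) * kb,
        "young_used_bytes": young_used,
        "old_used_bytes": old_used,
        "gc_count_total": col["YGC"] + col["FGC"],
        "gc_time_total_seconds": col["GCT"],
    }


def gc_delta(prev, curr):
    """Per-cycle GC count and time; a shrinking counter means the JVM restarted."""
    count = curr["gc_count_total"] - prev["gc_count_total"]
    seconds = curr["gc_time_total_seconds"] - prev["gc_time_total_seconds"]
    if count < 0 or seconds < 0:
        count, seconds = curr["gc_count_total"], curr["gc_time_total_seconds"]
    return {"gc_count": count, "gc_time_seconds": seconds}


SAMPLE = """\
S0C    S1C    S0U  S1U   EC     EU     OC      OU      MC     MU     CCSC  CCSU  YGC YGCT  FGC FGCT  GCT
1024.0 1024.0 0.0  512.0 8192.0 4096.0 20480.0 10240.0 4864.0 2486.7 512.0 266.5 10  0.120 2   0.300 0.420
"""

if __name__ == "__main__":
    print(parse_jstat_gc(SAMPLE))
```

The sample text is made up for the example. Because jstat reports cumulative counters, the delta step is what turns them into the per-cycle `gc_time_seconds` and `gc_count` values listed in the table.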

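MySQL metrics (collected via pymysql)

| Metric | Description | Labels |
| --- | --- | --- |
| mysql_up | MySQL connection status | hostname |
| mysql_process_resident_memory_bytes | MySQL process RSS | hostname |
| innodb_buffer_pool_bytes_data | InnoDB Buffer Pool bytes in use | hostname |
| innodb_buffer_pool_bytes_total | InnoDB Buffer Pool total size | hostname |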

Grafana Dashboard

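Three dashboard JSON files ship ready to use under references/files/:

| File | Panels |
| --- | --- |
| jvm_dashboard.json | Heap usage (bytes and ratio), young generation, old generation, GC time, thread count |
| jvm_class_dashboard.json | Top 10 classes by memory and instance count, trend for the selected class, top 20 snapshot table |
| mysql_dashboard.json | MySQL liveness, RSS memory, Buffer Pool overview and usage ratio |

To import: Grafana → Dashboards → Import → upload the JSON → select the Prometheus data source.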

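Alert rules

| Alert | Condition | Severity |
| --- | --- | --- |
| JVMHeapUsageHigh | Heap usage > 85% for 5 minutes (only after the JVM has been up for 1 hour) | warning |
| JVMGcPressureWarning | GC time > 30% of the collection interval for 3 minutes | warning |
| JVMGcPressureCritical | GC time > 60% of the collection interval for 2 minutes | critical |
| MySQLDown | MySQL connection failing for 2 minutes | critical |
| MySQLBufferPoolHigh | InnoDB Buffer Pool usage > 85% for 5 minutes | warning |
| MySQLMemoryHigh | MySQL RSS > 8GB for 5 minutes | warning |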
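
The thresholds above reduce to simple ratios over the exported metrics. The sketch below spells out that arithmetic for a single sample. It is illustrative only: the function names are hypothetical, computing heap usage as used divided by committed is an assumption (the rule expression in rules.yml is not shown here), and the "for N minutes" and warm-up conditions are left to Prometheus rather than modeled.

```python
def usage_ratio(used_bytes, total_bytes):
    """Fraction in use; 0.0 when the total is unknown or zero."""
    return used_bytes / total_bytes if total_bytes > 0 else 0.0


def is_usage_high(used_bytes, total_bytes, threshold=0.85):
    """True when usage exceeds the 85% threshold used by the heap and Buffer Pool alerts."""
    return usage_ratio(used_bytes, total_bytes) > threshold


def gc_pressure_level(gc_time_seconds, push_interval=60):
    """Classify the share of one collection interval spent in GC."""
    share = gc_time_seconds / push_interval
    if share > 0.60:
        return "critical"
    if share > 0.30:
        return "warning"
    return None


def is_mysql_memory_high(rss_bytes, limit_bytes=8 * 1024**3):
    """True when MySQL RSS exceeds 8 GB (taken here as 8 GiB, an assumption)."""
    return rss_bytes > limit_bytes


if __name__ == "__main__":
    # 20 s of GC in a 60 s interval is about 33%, above the warning threshold.
    print(gc_pressure_level(20, push_interval=60))
```

Because `gc_time_seconds` is a per-cycle delta, dividing it by PUSH_INTERVAL gives the GC share directly, which is why the webhook and the agent need the same interval value.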

Troubleshooting

Agent does not come online

systemctl status game_monitor
tail -f /var/log/game_monitor/agent.log
# Check the environment variables
cat /opt/game_monitor/env.conf

No data in Pushgateway

curl -s http://<pushgateway>:9091/-/healthy
curl -s "http://<prometheus>:9090/api/v1/targets" | jq '.data.activeTargets'

Feishu alerts not received

curl -X POST "http://<webhook>:5000/webhook?level=warning" \
  -H "Content-Type: application/json" -d '{"alerts":[]}'
journalctl -u feishu -f

Reference documents

  • Detailed deployment steps: references/deploy.md
  • Alert rule notes: references/rules.md
  • Grafana configuration guide: references/grafana.md
  • Monitoring architecture planning: references/planning.md
  • FAQ: references/faq.md

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
