Prometheus + Grafana VPS Monitoring Dashboard

If you want to see how your server is doing, top is too crude. Prometheus + Grafana is the proper solution.

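Why Prometheus + Grafana

- Comprehensive metrics: CPU, memory, disk, network, containers…

- Strong visualization: Grafana charts look good

- Alerting: automatic notification when a threshold is crossed

- Free and open source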

Architecture

Exporter (collection) → Prometheus (storage) → Grafana (visualization)

Deployment

version: "3.8"
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      # The image runs as user nobody (UID 65534); this host directory must be writable by it
      - ./prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - app

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3002:3000"
    volumes:
      # Grafana runs as UID 472; this host directory must be writable by it
      - ./grafana-data:/var/lib/grafana
    environment:
      # Example password only; change it before exposing port 3002
      - GF_SECURITY_ADMIN_PASSWORD=admin123
    networks:
      - app

networks:
  app:
    driver: bridge

Configuring Prometheus

prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

Node Exporter (server metrics)

  # Add under "services:" in the compose file above
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      # Read filesystem stats through the /rootfs mount declared below
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    networks:
      - app

Configuring Grafana

1. Open http://IP:3002

2. Log in as admin with the password admin123

3. Add a Data Source: Prometheus → http://prometheus:9090

4. Import a Dashboard: search for "Node Exporter" and use template ID 1860
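
Step 3 can also be done from a file instead of the UI. Below is a minimal sketch of a Grafana data source provisioning file. It assumes you add an extra bind mount such as ./grafana-provisioning:/etc/grafana/provisioning to the grafana service; that host directory name is my own choice and is not part of the compose file above.

```yaml
# ./grafana-provisioning/datasources/prometheus.yml  (assumed host path)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    # "proxy" means the Grafana server, not the browser, talks to Prometheus,
    # so the Docker-internal hostname below resolves correctly
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```

Grafana reads this directory at startup, so the data source should already be there on first login and survives a wiped grafana-data directory.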

Common Metrics

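| Metric | Meaning |
|------|------|
| node_cpu_seconds_total | Cumulative CPU time, per core and mode |
| node_memory_MemTotal_bytes | Total memory |
| node_memory_MemAvailable_bytes | Available memory |
| node_filesystem_avail_bytes | Available disk space |
| container_cpu_usage_seconds_total | Container CPU time (exposed by cAdvisor, not Node Exporter) |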
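
Most of these metrics are raw counters or byte values, so they usually need a bit of PromQL before they mean much on a panel. As a rough sketch, here is a recording-rules file that turns three of them into 0..1 usage ratios. The group name, the rule names and the 5m rate window are my own choices; node_filesystem_size_bytes is another standard Node Exporter metric that the table does not list.

```yaml
groups:
  - name: node_usage   # arbitrary group name
    rules:
      # CPU: share of time NOT spent idle, averaged over all cores of an instance
      - record: instance:node_cpu_utilisation:ratio
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

      # Memory: 1 - available / total
      - record: instance:node_memory_utilisation:ratio
        expr: 1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

      # Disk: used share per mount point, skipping in-memory and overlay filesystems
      - record: instance:node_filesystem_utilisation:ratio
        expr: >-
          1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}
```

The same expressions can be pasted straight into a Grafana panel query; recording them is only worth it if several dashboards or alerts reuse them.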

Alert Rules

groups:
  - name: alerts
    rules:
      - alert: HighMemory
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 90%"
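
Prometheus only evaluates a rules file that prometheus.yml points at. A minimal sketch of that wiring follows, assuming the rules above are saved as alerts.yml next to prometheus.yml and mounted into the container with an extra volume line like ./alerts.yml:/etc/prometheus/alerts.yml. Both the file name and the mount are assumptions of mine.

```yaml
# prometheus.yml (fragment): add alongside the existing global / scrape_configs keys
rule_files:
  # Path inside the container; assumes the bind mount described above
  - /etc/prometheus/alerts.yml
```

After a restart, the rule should be listed on the Alerts page of the Prometheus UI at port 9090.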

To actually send notifications, pair these rules with Alertmanager, which has to be deployed separately.
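
For orientation, a bare-bones Alertmanager setup might look like the two fragments below. Treat it as a sketch: the alertmanager service name, the receiver name, the timing values and above all the webhook URL are placeholders I made up, and a real setup would more likely use an email, Slack or similar receiver.

```yaml
# alertmanager.yml: route every alert to a single receiver
route:
  receiver: default
  group_by: ['alertname']
  group_wait: 30s        # wait before sending the first notification for a new group
  group_interval: 5m     # wait before notifying about new alerts in an existing group
  repeat_interval: 4h    # re-send interval while an alert keeps firing

receivers:
  - name: default
    webhook_configs:
      - url: http://example.internal:5001/alert   # hypothetical endpoint
---
# prometheus.yml (fragment): tell Prometheus where Alertmanager listens
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # assumed service name; 9093 is the default port
```

The Alertmanager container needs to join the same app network as Prometheus for the alertmanager hostname to resolve.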

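Pitfalls

Pitfall 1: Prometheus can't reach the Exporter

Check the network:

docker exec prometheus ping node-exporter

The two containers can only reach each other if they are on the same Docker network.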

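Pitfall 2: Disk usage keeps growing

Prometheus keeps 15 days of data by default. If that takes up too much disk, shorten the retention: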

--storage.tsdb.retention.time=7d
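
In the compose setup from earlier, this flag goes into the command list of the prometheus service. A sketch of the changed fragment is below; the size cap on the last line is an optional extra and the 5GB value is just an assumed example.

```yaml
# docker-compose.yml (fragment): only the command list of the prometheus service changes
services:
  prometheus:
    command:
      # The two original flags must stay, because "command" replaces the image defaults
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=7d'
      # Optional size cap (assumed value); whichever limit is reached first applies
      - '--storage.tsdb.retention.size=5GB'
```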

Pitfall 3: Grafana loads slowly

This usually means too many dashboards, or dashboards that are too complex. Delete the ones you don't use.

My Dashboards

- Server overview (CPU/memory/disk/network)

- Docker container status

- Nginx request volume

- MySQL slow queries
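
The last three dashboards need more than Node Exporter: each one relies on its own exporter being scraped. The post doesn't show that part, so the following is only a guess at what the extra scrape jobs could look like. The service names are hypothetical; the ports are the defaults of cAdvisor, nginx-prometheus-exporter and mysqld_exporter.

```yaml
# prometheus.yml (fragment): extra jobs appended under the existing scrape_configs key
scrape_configs:
  - job_name: 'cadvisor'   # container metrics such as container_cpu_usage_seconds_total
    static_configs:
      - targets: ['cadvisor:8080']          # assumed service name

  - job_name: 'nginx'
    static_configs:
      - targets: ['nginx-exporter:9113']    # assumed service name

  - job_name: 'mysql'
    static_configs:
      - targets: ['mysqld-exporter:9104']   # assumed service name
```

Each exporter would also have to run as a container on the app network, the same way node-exporter does.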

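Summary

Prometheus + Grafana is the de facto standard for monitoring:

- Prometheus: collection + storage

- Node Exporter: server metrics

- Grafana: visualization + alerting

Deploy with a single command, and within a few minutes you have a good-looking dashboard.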

彼方の旅人
