nohup ./redis_exporter -redis.addr redis://h:Lcsmy.312==/@172.18.11.139:7000 redis://h:Lcsmy.312==/@172.18.11.139:7001 redis://h:Lcsmy.312==/@172.18.11.140:7002 redis://h:Lcsmy.312==/@172.18.11.140:7003 redis://h:Lcsmy.312==/@172.18.11.141:7004 redis://h:Lcsmy.312==/@172.18.11.141:7005 -redis.password xxxxx &
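My first idea was the crudest approach: start one redis_exporter per instance. With that setup, though, many cluster-level queries stop working, for example cluster_slot_fail, so I dropped this approach.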
nohup ./redis_exporter -redis.addr 172.18.11.139:7000 -redis.password xxxxxx -web.listen-address 172.18.11.139:9121 > /dev/null 2>&1 &
nohup ./redis_exporter -redis.addr 172.18.11.139:7001 -redis.password xxxxxx -web.listen-address 172.18.11.139:9122 > /dev/null 2>&1 &
nohup ./redis_exporter -redis.addr 172.18.11.140:7002 -redis.password xxxxxx -web.listen-address 172.18.11.139:9123 > /dev/null 2>&1 &
nohup ./redis_exporter -redis.addr 172.18.11.140:7003 -redis.password xxxxxx -web.listen-address 172.18.11.139:9124 > /dev/null 2>&1 &
nohup ./redis_exporter -redis.addr 172.18.11.141:7004 -redis.password xxxxxx -web.listen-address 172.18.11.139:9125 > /dev/null 2>&1 &
nohup ./redis_exporter -redis.addr 172.18.11.141:7005 -redis.password xxxxxx -web.listen-address 172.18.11.139:9126 > /dev/null 2>&1 &
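The six commands above follow a regular pattern: one Redis node per exporter, with listen ports counting up from 9121. A minimal sketch of generating them instead of typing them by hand, assuming a hypothetical helper name print_exporter_cmds and the same placeholder password. It only prints the commands (a dry run) so they can be reviewed before anything is started:

```shell
#!/bin/sh
# Dry run: print one redis_exporter command per Redis node.
# print_exporter_cmds is a hypothetical helper, not part of redis_exporter.
print_exporter_cmds() {
  listen_host="$1"   # host the exporters listen on
  shift
  port=9121          # first listen port; incremented for each node
  for node in "$@"; do
    echo "nohup ./redis_exporter -redis.addr ${node} -redis.password xxxxxx -web.listen-address ${listen_host}:${port} > /dev/null 2>&1 &"
    port=$((port + 1))
  done
}

print_exporter_cmds 172.18.11.139 \
  172.18.11.139:7000 172.18.11.139:7001 \
  172.18.11.140:7002 172.18.11.140:7003 \
  172.18.11.141:7004 172.18.11.141:7005
```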
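In the end I had to open an issue on GitHub. After talking it through with the author in my Chinglish, I finally understood... and it turns out the official documentation already covers this.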
scrape_configs:
  ## config for the multiple Redis targets that the exporter will scrape
  - job_name: 'redis_exporter_targets'
    static_configs:
      - targets:
        - redis://first-redis-host:6379
        - redis://second-redis-host:6379
        - redis://second-redis-host:6380
        - redis://second-redis-host:6381
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: <<REDIS-EXPORTER-HOSTNAME>>:9121

  ## config for scraping the exporter itself
  - job_name: 'redis_exporter'
    static_configs:
      - targets:
        - <<REDIS-EXPORTER-HOSTNAME>>:9121
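The relabeling rules are easier to follow with the resulting request written out. Each entry under targets is copied into the target URL parameter, the instance label is set to that same value, and __address__ is replaced by the exporter's address, so Prometheus ends up requesting the exporter's /scrape endpoint once per Redis node. A minimal sketch of that request, assuming a hypothetical helper scrape_url and an exporter at the example address exporter-host:9121 (Prometheus additionally URL-encodes the parameter value):

```shell
#!/bin/sh
# scrape_url is a hypothetical helper that mimics the relabel result:
#   each Redis URL under targets -> ?target=<redis url>
#   __address__ replacement      -> the exporter's host:port
scrape_url() {
  exporter="$1"   # e.g. exporter-host:9121 (assumed address)
  target="$2"     # e.g. redis://first-redis-host:6379
  echo "http://${exporter}/scrape?target=${target}"
}

# Print the request made for each configured Redis target.
for t in redis://first-redis-host:6379 redis://second-redis-host:6379; do
  scrape_url exporter-host:9121 "$t"
done
```

Fetching one of the printed URLs with curl should return the metrics of just that Redis node.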
Redis cluster in practice
Start redis_exporter
nohup ./redis_exporter -redis.password xxxxx &
How to configure it in Prometheus:
  - job_name: 'redis_exporter_targets'
    static_configs:
      - targets:
        - redis://172.18.11.139:7000
        - redis://172.18.11.139:7001
        - redis://172.18.11.140:7002
        - redis://172.18.11.140:7003
        - redis://172.18.11.141:7004
        - redis://172.18.11.141:7005
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.18.11.139:9121

  ## config for scraping the exporter itself
  - job_name: 'redis_exporter'
    static_configs:
      - targets:
        - 172.18.11.139:9121
With this in place, the cluster's data is collected. However, the exporter log reports:
time="2019-12-17T09:10:49+08:00" level=error msg="Couldn't connect to redis instance"
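During the lunch break it suddenly clicked: as long as the exporter can connect to one node of the cluster, it can query the metrics of the other nodes as well. So I changed the start command to: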
nohup ./redis_exporter -redis.addr 172.18.11.141:7005 -redis.password xxxxx &
The Prometheus configuration stays the same.
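One way to confirm that a node is reachable through the exporter is to look at the redis_up metric in the scraped output. A minimal sketch, assuming a hypothetical helper redis_up_value that reads Prometheus text-format metrics on stdin; the sample input below is made up for illustration and prints 1:

```shell
#!/bin/sh
# redis_up_value is a hypothetical helper: it reads Prometheus text-format
# metrics on stdin and prints the value of the redis_up sample.
redis_up_value() {
  # Match "redis_up" followed by a space or a label set, not redis_uptime_*.
  awk '/^redis_up[{ ]/ { print $NF }'
}

# Made-up sample input for illustration; real output has many more lines.
redis_up_value <<'EOF'
# HELP redis_up Information about the Redis instance
# TYPE redis_up gauge
redis_up 1
redis_uptime_in_seconds 3600
EOF
```

Against the running exporter, the same helper could be fed from curl, for example: curl -s 'http://172.18.11.139:9121/scrape?target=redis://172.18.11.139:7000' | redis_up_value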
Here are a few screenshots:
groups:
- name: Redis
  rules:
  - alert: RedisDown
    expr: redis_up == 0
    for: 5m
    labels:
      severity: error
    annotations:
      summary: "Redis down (instance {{ $labels.instance }})"
      description: "Redis instance is down\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: MissingBackup
    expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
    for: 5m
    labels:
      severity: error
    annotations:
      summary: "Missing backup (instance {{ $labels.instance }})"
      description: "Redis has not been backed up for 24 hours\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: OutOfMemory
    expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Out of memory (instance {{ $labels.instance }})"
      description: "Redis is running out of memory (> 90%)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: ReplicationBroken
    expr: delta(redis_connected_slaves[1m]) < 0
    for: 5m
    labels:
      severity: error
    annotations:
      summary: "Replication broken (instance {{ $labels.instance }})"
      description: "Redis instance lost a slave\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: TooManyConnections
    expr: redis_connected_clients > 1000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Too many connections (instance {{ $labels.instance }})"
      description: "Redis instance has too many connections\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: NotEnoughConnections
    expr: redis_connected_clients < 5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Not enough connections (instance {{ $labels.instance }})"
      description: "Redis instance should have more connections (> 5)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: RejectedConnections
    expr: increase(redis_rejected_connections_total[1m]) > 0
    for: 5m
    labels:
      severity: error
    annotations:
      summary: "Rejected connections (instance {{ $labels.instance }})"
      description: "Some connections to Redis have been rejected\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
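Before loading rules like these into Prometheus, it helps to validate the file first. A minimal sketch, assuming the rules are saved as redis_rules.yml (a hypothetical file name) and that the promtool binary shipped with Prometheus is on the PATH; check_alert_rules is a hypothetical wrapper, not something from the original setup:

```shell
#!/bin/sh
# check_alert_rules is a hypothetical wrapper around `promtool check rules`.
check_alert_rules() {
  rules_file="$1"
  if ! command -v promtool > /dev/null 2>&1; then
    echo "promtool not found on PATH" >&2
    return 1
  fi
  # promtool exits non-zero when the file is missing or a rule is invalid.
  promtool check rules "$rules_file"
}

# Example call (redis_rules.yml is an assumed file name):
# check_alert_rules redis_rules.yml
```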