Sentinel Redis

Monitor Redis server health, BullMQ queues, memory, and performance from any messaging channel. Ask questions in plain English — get actionable diagnostics.

When to Use

✅ USE this skill when:

User asks about Redis server health, status, or info
User wants to check memory usage or diagnose OOM issues
User asks about BullMQ queue depths, failed jobs, or stuck workers
User wants to inspect slow queries or latency issues
User asks to diagnose why Redis is slow or unresponsive
User mentions queue backlog, dead letter queue, or job failures
User wants a quick health summary of their Redis instance

When NOT to Use

❌ DON'T use this skill when:

User wants to manage PostgreSQL, MySQL, or other non-Redis databases
User wants to manage Kafka, RabbitMQ, or SQS queues (not BullMQ)
User needs help writing application code that uses Redis
User wants to set up Redis from scratch (use official Redis docs instead)

Safety Rules

⚠️ CRITICAL: This skill is READ-ONLY. No exceptions.

NEVER run destructive commands (FLUSHDB, FLUSHALL, DEL, UNLINK, SET, EXPIRE) — even if the user asks. Explain why and suggest they run it manually instead.
NEVER modify Redis configuration (CONFIG SET) — direct the user to do it themselves.
NEVER print or expose the full REDIS_URL in output — it may contain passwords. Always mask credentials before displaying.
When in doubt, show the command first and ask for confirmation.
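The credential-masking rule above can be sketched as a small shell function. The name mask_redis_url is hypothetical (it is not part of redis-cli), and the sketch assumes any special characters in the password are URL-encoded, so the credentials never contain a literal "/".

```shell
# mask_redis_url: hypothetical helper, not part of redis-cli.
# Replaces the user:password part of a URL (everything between "scheme://"
# and the last "@" before the first "/") with "***".
# URLs without credentials pass through unchanged.
mask_redis_url() {
  printf '%s\n' "$1" | sed -E 's#^([a-zA-Z]+://)[^/]*@#\1***@#'
}

# Example: print a safe version of the configured URL.
# mask_redis_url "$REDIS_URL"
```

Masking the whole userinfo part, rather than only the password, also avoids leaking ACL usernames.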

Connection

If REDIS_URL is set, use it for all commands:

redis-cli -u "$REDIS_URL" <command>

If REDIS_URL is not set, default to localhost:

redis-cli <command>

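For password-protected instances without REDIS_URL, pass the password through the REDISCLI_AUTH environment variable rather than -a, which exposes it in the process list:

REDISCLI_AUTH=<password> redis-cli -h <host> -p <port> <command>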

Always test connectivity first:

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" ping
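Because every command in this document repeats the same connection arguments, a thin wrapper can keep them in one place. This is a minimal sketch: rcli is a hypothetical function name, and REDIS_CLI_BIN is an assumed override (not a redis-cli feature) that exists only so the wrapper can be exercised without a live server.

```shell
# rcli: hypothetical wrapper around redis-cli.
# Uses REDIS_URL when set, otherwise falls back to redis://localhost:6379.
# REDIS_CLI_BIN is an assumed override for dry runs; it defaults to redis-cli.
rcli() {
  "${REDIS_CLI_BIN:-redis-cli}" -u "${REDIS_URL:-redis://localhost:6379}" "$@"
}

# Examples:
# rcli ping
# rcli info memory
```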

Server Health

Quick Status

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" ping

Full Server Info

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" info server

Connected Clients

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" info clients

Uptime and Version

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" info server | grep -E "redis_version|uptime_in_days|uptime_in_seconds"

Memory Analysis

Memory Overview

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" info memory

Key Metrics to Check

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" info memory | grep -E "used_memory_human|used_memory_peak_human|used_memory_rss_human|mem_fragmentation_ratio|maxmemory_human|maxmemory_policy"

Memory Doctor (Redis 4.0+)

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" memory doctor

Memory Usage of a Specific Key

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" memory usage <key>

Find Big Keys (scan-based, safe for production)

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" --bigkeys

Interpreting Memory Results

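mem_fragmentation_ratio > 1.5 → High fragmentation, consider restarting Redis (the ratio is unreliable when used_memory is only a few megabytes)
mem_fragmentation_ratio < 1.0 → The OS has likely swapped part of Redis memory to disk, CRITICAL
used_memory approaching maxmemory → Eviction will start based on maxmemory_policy (with noeviction, writes fail with OOM errors instead)
memory doctor reports no issues (for example "Hi Sam, I can't find any memory issue in your instance") → All good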
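The fragmentation thresholds above can be applied mechanically to the output of info memory. The following is a sketch only: classify_fragmentation is a hypothetical name, and the labels it prints are illustrative. It reads INFO text on stdin and strips the trailing carriage return that INFO lines carry.

```shell
# classify_fragmentation: hypothetical helper.
# Reads "info memory" output on stdin and applies the 1.5 / 1.0 thresholds
# from this document to mem_fragmentation_ratio.
classify_fragmentation() {
  awk -F: '
    $1 == "mem_fragmentation_ratio" {
      sub(/\r$/, "", $2)          # INFO lines end in CRLF
      ratio = $2 + 0
      if (ratio > 1.5)      print "WARN high fragmentation (" ratio ")"
      else if (ratio < 1.0) print "CRITICAL possible swapping (" ratio ")"
      else                  print "OK (" ratio ")"
    }'
}

# Example:
# redis-cli -u "${REDIS_URL:-redis://localhost:6379}" info memory | classify_fragmentation
```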

Slow Queries & Performance

Check Slow Log

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" slowlog get 10

Slow Log Length

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" slowlog len

Current Slow Log Threshold

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" config get slowlog-log-slower-than

Latency Check

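redis-cli -u "${REDIS_URL:-redis://localhost:6379}" --latency -i 10 --raw

With --raw (or any non-TTY output), --latency samples for the -i interval in seconds, prints a single summary line, and exits. In an interactive terminal without --raw it runs until interrupted.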

Latency History

timeout 5 redis-cli -u "${REDIS_URL:-redis://localhost:6379}" --latency-history -i 1

--latency-history runs until interrupted, so bound it with timeout (GNU coreutils).

Keyspace Stats

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" info keyspace

Command Stats (most called commands)

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" info commandstats

Client Monitoring

List Connected Clients

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" client list

Client Count and Summary

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" info clients | grep -E "connected_clients|blocked_clients|tracking_clients"

Find Idle Clients (idle > 300 seconds)

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" client list | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^idle=/) { split($i, kv, "="); if (kv[2] + 0 > 300) print } }'

BullMQ Queue Monitoring

BullMQ uses Redis as its backend. Queues follow the key pattern bull:<queue-name>:<state>.

Discover All Queues

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" --scan --pattern "bull:*:meta"

Queue Depth (all states)

For a queue named <queue>:

echo "=== Queue: <queue> ==="
echo -n "Waiting: "; redis-cli -u "${REDIS_URL:-redis://localhost:6379}" llen "bull:<queue>:wait"
echo -n "Active: "; redis-cli -u "${REDIS_URL:-redis://localhost:6379}" llen "bull:<queue>:active"
echo -n "Delayed: "; redis-cli -u "${REDIS_URL:-redis://localhost:6379}" zcard "bull:<queue>:delayed"
echo -n "Failed: "; redis-cli -u "${REDIS_URL:-redis://localhost:6379}" zcard "bull:<queue>:failed"
echo -n "Completed: "; redis-cli -u "${REDIS_URL:-redis://localhost:6379}" zcard "bull:<queue>:completed"
echo -n "Paused: "; redis-cli -u "${REDIS_URL:-redis://localhost:6379}" llen "bull:<queue>:paused"
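The six per-state commands above can be folded into one loop that picks llen for list-backed states and zcard for sorted-set states. This is a sketch: queue_depths is a hypothetical name, and REDIS_CLI_BIN is an assumed override (not a redis-cli feature) used only for dry runs. It assumes the default bull: prefix.

```shell
# queue_depths: hypothetical helper.
# Prints one "state: count" line per BullMQ state for the given queue.
# Each entry pairs a state with the read-only command that counts it.
queue_depths() {
  queue=$1
  for entry in wait:llen active:llen paused:llen delayed:zcard failed:zcard completed:zcard; do
    state=${entry%%:*}   # text before the colon
    cmd=${entry##*:}     # text after the colon
    count=$("${REDIS_CLI_BIN:-redis-cli}" -u "${REDIS_URL:-redis://localhost:6379}" "$cmd" "bull:${queue}:${state}")
    printf '%s: %s\n' "$state" "$count"
  done
}

# Example:
# queue_depths emails
```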

Inspect Failed Jobs

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" zrange "bull:<queue>:failed" 0 9

Get Job Details

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" hgetall "bull:<queue>:<jobId>"

Check Job Payload and Error

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" hmget "bull:<queue>:<jobId>" data failedReason stacktrace attemptsMade timestamp processedOn finishedOn

Find Stale Active Jobs

Active jobs that haven't been updated recently may be stuck:

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" lrange "bull:<queue>:active" 0 -1

Then for each job ID, check processedOn timestamp:

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" hmget "bull:<queue>:<jobId>" processedOn name

processedOn is a Unix timestamp in milliseconds. If it is more than 10 minutes old and the job is still active, the job may be stuck.
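The 10-minute staleness rule comes down to a millisecond subtraction. A minimal sketch follows, assuming a shell with 64-bit arithmetic (bash and dash on 64-bit systems); is_stale_job is a hypothetical name, and the optional third argument is an assumed way to override the threshold.

```shell
# is_stale_job: hypothetical helper.
# Arguments: processedOn (ms since epoch), current time (ms since epoch),
# optional threshold in ms (defaults to 600000, i.e. 10 minutes).
# Returns success (0) when the job has been active longer than the threshold.
is_stale_job() {
  processed_on_ms=$1
  now_ms=$2
  threshold_ms=${3:-600000}
  [ $(( now_ms - processed_on_ms )) -gt "$threshold_ms" ]
}

# Example, with the current time derived from date:
# now_ms=$(( $(date +%s) * 1000 ))
# is_stale_job "$processed_on" "$now_ms" && echo "job may be stuck"
```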

Check Queue Workers (via event streams)

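redis-cli -u "${REDIS_URL:-redis://localhost:6379}" xinfo stream "bull:<queue>:events" 2>/dev/null || echo "No event stream found"

BullMQ reads this stream with plain XREAD rather than consumer groups, so xinfo groups is normally empty. A recent last-generated-id in the xinfo stream output shows that events are still being published. Workers typically also appear in client list under a connection name that starts with the queue prefix (bull:).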

BullMQ Repeat Jobs

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" zrange "bull:<queue>:repeat" 0 -1

Key Inspection

Find Keys by Pattern

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" --scan --pattern "<pattern>"

Key Type and TTL

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" type <key>
redis-cli -u "${REDIS_URL:-redis://localhost:6379}" ttl <key>

Key Encoding (memory efficiency check)

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" object encoding <key>
redis-cli -u "${REDIS_URL:-redis://localhost:6379}" object idletime <key>

Count Keys by Prefix (useful for auditing)

redis-cli -u "${REDIS_URL:-redis://localhost:6379}" --scan --pattern "<prefix>*" | wc -l
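For an audit across several prefixes at once, a key listing can be grouped by its first colon-separated segment. This is a sketch under the assumption that keys use ":" as the namespace separator; count_by_prefix is a hypothetical name.

```shell
# count_by_prefix: hypothetical helper.
# Reads one key name per line on stdin and prints "count prefix" pairs,
# largest count first. The prefix is the text before the first ":".
count_by_prefix() {
  awk -F: '{ count[$1]++ } END { for (p in count) print count[p], p }' | sort -rn
}

# Example, fed from a non-blocking scan:
# redis-cli -u "${REDIS_URL:-redis://localhost:6379}" --scan | count_by_prefix
```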

Diagnostics — Full Health Check

Run the health check script for a comprehensive overview:

bash scripts/redis-health.sh "${REDIS_URL:-redis://localhost:6379}"

This script outputs:

Connectivity status
Server version and uptime
Memory usage and fragmentation
Connected and blocked clients
Slow query count
All BullMQ queue depths
Warnings for any anomalies detected

Troubleshooting Decision Trees

Redis is slow

Check latency: redis-cli --latency -i 10 --raw
If latency > 1ms → check slow log: slowlog get 10
If slow log has KEYS/SMEMBERS/HGETALL on large collections → advise using SCAN variants
Check memory fragmentation → if > 1.5, recommend restart
Check connected_clients → if > 1000, investigate connection pooling
Check blocked_clients → if > 0, check BLPOP/BRPOP consumers

Redis OOM / high memory

Run info memory → check used_memory vs maxmemory
Run --bigkeys → find largest keys
Check maxmemory_policy → is eviction configured?
Run memory doctor → follow recommendations
Check for missing TTLs on keys: scan and check ttl on large keys
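The first step of the checklist above, comparing used_memory with maxmemory, can be sketched as a filter over info memory output. memory_pressure is a hypothetical name. A maxmemory of 0 means no limit is configured, so the sketch reports that instead of dividing by zero.

```shell
# memory_pressure: hypothetical helper.
# Reads "info memory" output on stdin and reports used_memory as a
# percentage of maxmemory (both fields are in bytes).
memory_pressure() {
  awk -F: '
    { sub(/\r$/, "") }                 # INFO lines end in CRLF
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max + 0 == 0) print "maxmemory not set (no limit)"
      else printf "%.1f%% of maxmemory used\n", 100 * used / max
    }'
}

# Example:
# redis-cli -u "${REDIS_URL:-redis://localhost:6379}" info memory | memory_pressure
```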

BullMQ jobs stuck / not processing

Check queue depth → are jobs piling up in wait?
Check active list → are jobs stuck in active state?
Check for stale active jobs → processedOn too old
Check event stream → xinfo stream to verify events are still being published, and look for BullMQ worker connections in client list
Check failed set → read failedReason and stacktrace
Check Redis connectivity → can workers reach Redis?

BullMQ high failure rate

Get recent failed jobs: zrange bull:<queue>:failed -10 -1
For each, read failedReason and stacktrace
Group errors by type → is it one recurring error or varied?
Check attemptsMade → are retries exhausted?
Check job data → is the payload malformed?
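The "group errors by type" step can be sketched as a tally over failedReason values, one per line. tally_reasons is a hypothetical name, and the sketch groups on the exact message text, so messages that embed IDs or timestamps will not collapse into one bucket.

```shell
# tally_reasons: hypothetical helper.
# Reads one failedReason per line on stdin, skips blank lines, and prints
# "count<TAB>reason" with the most frequent reason first.
tally_reasons() {
  awk 'NF { seen[$0]++ } END { for (r in seen) printf "%d\t%s\n", seen[r], r }' | sort -rn
}

# Example (read-only: zrange and hget), using the default bull: prefix:
# url="${REDIS_URL:-redis://localhost:6379}"
# for id in $(redis-cli -u "$url" zrange "bull:<queue>:failed" -10 -1); do
#   redis-cli -u "$url" hget "bull:<queue>:$id" failedReason
# done | tally_reasons
```

One dominant reason usually points at a single bug or dependency outage. A wide spread suggests malformed payloads or an unstable environment.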

Notes

All commands default to redis://localhost:6379 if REDIS_URL is not set
The scan command is safe for production (non-blocking), unlike keys which should NEVER be used in production
BullMQ key patterns assume default prefix bull:. If a custom prefix is used, replace bull: accordingly
For Redis Cluster, add -c flag to redis-cli commands
For Redis Sentinel, connect to the sentinel first to discover the master