Server Management
Server management principles for production operations. Learn to THINK, not memorize commands.
- Process Management Principles
Tool Selection
Scenario | Tool
Node.js app | PM2 (clustering, reload)
Any app | systemd (Linux native)
Containers | Docker / Podman
Orchestration | Kubernetes, Docker Swarm
Process Management Goals
Goal | What It Means
Restart on crash | Auto-recovery
Zero-downtime reload | No service interruption
Clustering | Use all CPU cores
Persistence | Survive a server reboot
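PM2 and systemd handle "restart on crash" for you (in systemd, via `Restart=on-failure`). To show what that goal involves, here is a toy Python supervisor. It is a minimal sketch, not a replacement for those tools: it restarts a command whenever it exits non-zero, backs off exponentially, and gives up after a cap so a crash loop cannot spin forever. The function name `supervise` and its default values are hypothetical.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=5, base_delay=0.5, max_delay=30.0):
    """Run `cmd` and restart it whenever it exits with a non-zero code.

    Returns the number of restarts performed. A clean exit (code 0) ends
    supervision, as does reaching `max_restarts` (a likely crash loop).
    """
    restarts = 0
    while True:
        code = subprocess.call(cmd)
        if code == 0:
            return restarts  # clean shutdown: nothing to recover from
        if restarts >= max_restarts:
            return restarts  # give up instead of restarting forever
        # Exponential backoff so a crash loop doesn't hammer the host.
        delay = min(max_delay, base_delay * 2 ** restarts)
        print(f"exit code {code}; restarting in {delay:.1f}s", file=sys.stderr)
        time.sleep(delay)
        restarts += 1
```

For example, `supervise([sys.executable, "app.py"])` would keep a hypothetical `app.py` running. This covers only the first goal in the table; zero-downtime reload, clustering and surviving reboots are what PM2 or systemd add on top.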
- Monitoring Principles
What to Monitor
Category | Key Metrics
Availability | Uptime, health checks
Performance | Response time, throughput
Errors | Error rate, error types
Resources | CPU, memory, disk
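To make the performance and error metrics concrete, the sketch below reduces one time window of request records to throughput, error rate and 95th-percentile latency. The `Request` record and `summarize` function are hypothetical, and counting only 5xx responses as errors is an assumption; in practice a monitoring agent collects these numbers for you.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned
    latency_ms: float  # time taken to respond, in milliseconds


def summarize(requests, window_s):
    """Reduce one time window of requests to throughput, error rate and p95 latency."""
    n = len(requests)
    if n == 0:
        return {"throughput_rps": 0.0, "error_rate": 0.0, "p95_ms": None}
    errors = sum(1 for r in requests if r.status >= 500)  # server-side errors only
    latencies = sorted(r.latency_ms for r in requests)
    rank = (95 * n + 99) // 100  # nearest-rank p95: integer ceil(0.95 * n)
    return {
        "throughput_rps": n / window_s,
        "error_rate": errors / n,
        "p95_ms": latencies[rank - 1],
    }
```

Percentiles are used rather than the mean because a handful of very slow requests barely move an average but are exactly what users notice.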
Alert Severity Strategy
Level | Response
Critical | Immediate action
Warning | Investigate soon
Info | Review daily
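One way to apply these severity levels is to give each metric a warning and a critical threshold and map the result to a response. In this sketch every threshold value is a made-up placeholder, and `classify` and `route` are hypothetical helpers, not part of any monitoring product.

```python
from enum import Enum


class Severity(Enum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Placeholder thresholds as (warning, critical); tune them per service.
THRESHOLDS = {
    "error_rate": (0.01, 0.05),     # fraction of requests failing
    "disk_used_pct": (80.0, 95.0),  # percent of disk in use
    "p95_ms": (500.0, 2000.0),      # 95th-percentile latency in ms
}


def classify(metric, value):
    """Map a metric reading to a severity using its warning/critical thresholds."""
    warning, critical = THRESHOLDS[metric]
    if value >= critical:
        return Severity.CRITICAL
    if value >= warning:
        return Severity.WARNING
    return Severity.INFO


def route(severity):
    """Translate a severity into the response from the severity table."""
    return {
        Severity.CRITICAL: "page the on-call engineer now",
        Severity.WARNING: "open a ticket to investigate soon",
        Severity.INFO: "include in the daily review",
    }[severity]
```

Keeping all thresholds in one table makes it easy to review which alerts can wake someone up at night.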
Monitoring Tool Selection
Need | Options
Simple / free | PM2 metrics, htop
Full observability | Grafana, Datadog
Error tracking | Sentry
Uptime | UptimeRobot, Pingdom
- Log Management Principles
Log Strategy
Log Type | Purpose
Application logs | Debugging, auditing
Access logs | Traffic analysis
Error logs | Issue detection
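Access logs become useful for traffic analysis once they are parsed. This sketch assumes lines in the Common Log Format (nginx's default `combined` format begins with the same fields) and counts requests by status class and by path. The regex and `traffic_summary` are illustrative assumptions, not a complete parser for every log variant.

```python
import re
from collections import Counter

# Common Log Format; nginx's "combined" format starts with the same fields,
# and re.match only anchors at the start, so trailing fields are ignored.
CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def traffic_summary(lines):
    """Count requests per status class (2xx, 4xx, ...) and per path."""
    by_class, by_path, unparsed = Counter(), Counter(), 0
    for line in lines:
        m = CLF.match(line)
        if m is None:
            unparsed += 1  # malformed or different format: count it, don't crash
            continue
        by_class[m["status"][0] + "xx"] += 1
        by_path[m["path"]] += 1
    return by_class, by_path, unparsed
```

A rising `unparsed` count is itself a signal: either the log format changed or something is writing garbage to the log.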
Log Principles
- Rotate logs so they cannot fill the disk
- Use structured logging (JSON) so logs can be parsed by tools
- Use appropriate levels (error/warn/info/debug)
- Keep sensitive data (passwords, tokens, personal data) out of logs
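The four principles above map onto Python's standard `logging` module: `RotatingFileHandler` caps file size, a custom formatter emits JSON, the logger level filters out debug noise, and the formatter redacts sensitive keys. The field-passing convention (`extra={"fields": ...}`), the `SENSITIVE_KEYS` deny-list and the size limits are assumptions made for this sketch.

```python
import json
import logging
import logging.handlers

# Hypothetical deny-list; extend it with your own field names.
SENSITIVE_KEYS = {"password", "token", "authorization", "secret"}


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so log tools can parse fields directly."""

    def format(self, record):
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields passed as logger.info(..., extra={"fields": {...}})
        for key, value in getattr(record, "fields", {}).items():
            entry[key] = "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)  # str() anything not JSON-native


def make_logger(path, level=logging.INFO, max_bytes=10_000_000, backups=5):
    """File logger that rotates at `max_bytes`, keeping `backups` old files."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")
    logger.setLevel(level)  # debug messages are dropped at the default INFO level
    logger.addHandler(handler)
    return logger
```

On Linux, rotating with the system's `logrotate` instead of in-process is also common, especially when several processes write to the same file, which `RotatingFileHandler` does not coordinate.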
- Scaling Decisions
When to Scale
Symptom | Solution
High CPU | Add instances (horizontal)
High memory | Increase RAM, or fix the leak
Slow response | Profile first, then scale
Traffic spikes | Auto-scaling
Scaling Strategy
Type | When to Use
Vertical (bigger machine) | Quick fix; limited by a single instance
Horizontal (more instances) | Sustainable; load is distributed across instances
Auto | Variable traffic
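For the auto-scaling row, Kubernetes' Horizontal Pod Autoscaler documents a simple rule: desired replicas = ceil(current replicas * current metric / target metric), with no change while the ratio stays within a tolerance (0.1 by default). The sketch below reproduces that rule in Python; `desired_replicas` and its min/max bounds are hypothetical.

```python
import math


def desired_replicas(current, current_util, target_util,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count for a utilization target, HPA-style.

    desired = ceil(current * current_util / target_util), skipped when the
    ratio is within `tolerance` of 1.0, then clamped to [min, max].
    """
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough: don't flap between sizes
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

Scaling four instances at 90% CPU against a 60% target gives ceil(4 * 1.5) = 6 instances; the tolerance band keeps small fluctuations from triggering constant resizing.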
- Health Check Principles
What Constitutes Healthy
Check | Meaning
HTTP 200 | Service is responding
Database connected | Data is accessible
Dependencies OK | External services are reachable
Resources OK | CPU/memory not exhausted
Health Check Implementation
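- Simple: just return 200 (proves the process is up and serving)
- Deep: check all dependencies
- Choose based on load balancer needs: a deep check that fails on a shared dependency, such as the database, can take every instance out of rotation at once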
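A minimal sketch of both styles using only the Python standard library: `/healthz` is the simple check and `/readyz` runs every registered dependency check, returning 503 if any fails. The endpoint paths follow a common Kubernetes-style convention and, like the 10% free-disk threshold and the `CHECKS` registry, are assumptions; a real service would register its own database and API checks.

```python
import json
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer


def disk_ok(path="/", min_free_ratio=0.10):
    """Resource check: enough disk is free (the 10% threshold is an assumption)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_ratio


# Hypothetical registry: add real checks (DB ping, upstream API call) here.
CHECKS = {"disk": disk_ok}


def deep_health(checks):
    """Run every check; any failure or exception makes the service unhealthy."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception as exc:  # a crashing check counts as a failed check
            results[name] = f"error: {exc}"
    healthy = all(status == "ok" for status in results.values())
    return (200 if healthy else 503), results


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":  # simple: the process is up and serving
            code, body = 200, {"status": "ok"}
        elif self.path == "/readyz":  # deep: dependencies are usable too
            code, results = deep_health(CHECKS)
            body = {"status": "ok" if code == 200 else "degraded", "checks": results}
        else:
            code, body = 404, {"error": "not found"}
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Blocking: serve health endpoints on all interfaces."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the server on port 8080. Pointing the load balancer at `/healthz` and a readiness probe or dashboard at `/readyz` is one way to get the detail of the deep check without letting a shared dependency outage drain every instance.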
- Security Principles
Area | Principle
Access | SSH keys only, no passwords
Firewall | Open only the ports you need
Updates | Apply security patches regularly
Secrets | Environment variables, not files checked into the codebase
Audit | Log access and changes
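As a sketch of the secrets row: read each secret from an environment variable at startup, fail fast if it is missing, and wrap it so an accidental `print` or log call cannot reveal it. `require_secret`, `Secret` and the variable name `DB_PASSWORD` are hypothetical; the `environ` parameter exists only to make the function testable.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret is not configured."""


class Secret:
    """Wraps a secret so that printing or logging it never shows the value."""

    def __init__(self, value):
        self._value = value

    def reveal(self):
        return self._value  # call only where the raw value is truly needed

    def __repr__(self):
        return "Secret('****')"

    __str__ = __repr__


def require_secret(name, environ=None):
    """Read a secret from the environment, failing fast if absent or empty."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise MissingSecretError(f"required secret {name!r} is not set")
    return Secret(value)
```

`require_secret("DB_PASSWORD").reveal()` would return the raw value only at the point where it is used, such as opening the database connection.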
- Troubleshooting Priority
When something's wrong, check in this order:
1. Check if it's running (process status)
2. Check logs (error messages)
3. Check resources (disk, memory, CPU)
4. Check network (ports, DNS)
5. Check dependencies (database, APIs)
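The process, resource, network and dependency checks can be partly automated; reading the logs stays a human job. The sketch below uses Python's standard library to run checks in priority order and report the first one that fails. All function names, the 10% disk threshold and the wiring in `default_steps` are hypothetical, and `process_alive` relies on POSIX `kill(pid, 0)` semantics.

```python
import os
import shutil
import socket


def process_alive(pid):
    """Process status: does the PID exist? (POSIX: signal 0 only checks.)"""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def disk_free_ratio(path="/"):
    """Resources: fraction of the filesystem still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def port_open(host, port, timeout=1.0):
    """Network: is something accepting TCP connections on host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def resolves(hostname):
    """Network: does DNS (or /etc/hosts) resolve the name?"""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def triage(steps):
    """Run (description, check) pairs in priority order; return the first failure."""
    for description, check in steps:
        if not check():
            return description
    return None


def default_steps(pid, app_port, db_host, db_port):
    """Hypothetical wiring for one service; the log review stays manual."""
    return [
        ("process running", lambda: process_alive(pid)),
        ("disk has >10% free", lambda: disk_free_ratio() > 0.10),
        ("app port accepting connections", lambda: port_open("127.0.0.1", app_port)),
        ("database host resolves", lambda: resolves(db_host)),
        ("database port reachable", lambda: port_open(db_host, db_port)),
    ]
```

Stopping at the first failure mirrors the priority order: there is no point debugging DNS for a process that isn't running.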
- Anti-Patterns
❌ Don't | ✅ Do
Run as root | Use a non-root user
Ignore logs | Set up log rotation
Skip monitoring | Monitor from day one
Manual restarts | Auto-restart config
No backups | Regular backup schedule
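To act on the first row, a service can refuse to start with root privileges. This is a minimal sketch; `refuse_root` is a hypothetical helper, and `os.geteuid()` exists only on POSIX systems. The usual fix is to run the service under a dedicated user, for example with `User=` in its systemd unit.

```python
import os
import sys


def refuse_root(euid=None):
    """Abort startup if running as root (effective UID 0).

    `euid` can be passed in for testing; otherwise it is read with
    os.geteuid(), which exists on POSIX systems only.
    """
    if euid is None:
        euid = os.geteuid()
    if euid == 0:
        sys.exit("refusing to run as root: start this service as a dedicated unprivileged user")
```

Calling `refuse_root()` as the first line of startup turns a silent security risk into a loud, immediate failure.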
Remember: A well-managed server is boring. That's the goal.