Lesson 3: The Watchdog (System Monitoring)
At 3 AM, an alert fires: "Server unresponsive." Before panicking, a good DevOps engineer checks the vitals — just like a doctor with a stethoscope. Is it out of memory? Is the disk full? Is a process eating all the CPU?
The Vital Signs
1. Memory: free -h
Shows how much RAM is being used and how much is available.
              total        used        free      shared  buff/cache   available
Mem:           7.8G        3.2G        1.1G          0B        3.4G        4.3G
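The -h flag means human-readable: sizes are shown with units such as G or M instead of raw kibibyte counts. When judging memory pressure, read the available column rather than free, because most of buff/cache can be reclaimed when applications need it.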
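To turn that reading into a single number, here is a minimal sketch that computes the percentage of RAM still available. The function name mem_available_pct is hypothetical, and the sketch assumes the procps-ng layout of free, where the Mem: row has total in column 2 and available in column 7.

```shell
# Print the share of RAM that is still available, as a whole percentage.
# Reads `free -b` style output on stdin.
# Assumption: procps-ng column order (total = column 2, available = column 7).
mem_available_pct() {
    awk '/^Mem:/ { printf "%d\n", $7 * 100 / $2 }'
}

# Usage on a live system:
#   free -b | mem_available_pct
```

Reading from stdin keeps the function easy to try against saved output, for example from a server you cannot log in to right now.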
2. Disk Space: df -h
Shows how full each disk partition is.
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   32G   18G  64% /
If Use% hits 100%, writes to that filesystem fail: logs stop, databases can refuse transactions, and services may crash or fail to start.
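Rather than waiting for 100%, you can flag filesystems that cross a warning line. The sketch below is one way to do it; the function name disks_over and the default threshold of 90 are assumptions chosen for illustration. It expects df -P output, since -P forces one line per filesystem with the usage percentage in column 5 and the mount point in column 6.

```shell
# Print every filesystem whose usage is at or above a threshold (default 90).
# Reads `df -P` output on stdin.
# Limitation: mount points containing spaces are not handled.
disks_over() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)              # "64%" -> "64"
            if (pct + 0 >= limit + 0) print $6, $5
        }'
}

# Usage on a live system:
#   df -P | disks_over 80
```

An empty result means nothing is over the line yet.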
3. Directory Size: du -sh
Find out how much space a specific folder is consuming.
du -sh /var/log
-s: Summary (print one total instead of a line for every subdirectory).
-h: Human-readable.
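A single total tells you a folder is big, but not which part of it. As a rough sketch, the helper below ranks the entries directly inside a directory by size. The name largest_in is hypothetical, and it uses -k (kilobytes) instead of -h so that a plain numeric sort puts the biggest entry first.

```shell
# List the N largest entries directly inside a directory, biggest first.
# $1 = directory, $2 = how many entries to show (default 5).
# Sizes are printed in kilobytes. Hidden (dot) entries are skipped by the glob.
largest_in() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-5}"
}

# Usage on a live system:
#   largest_in /var/log 10
```

Once the largest entry is known, run the same helper on it to drill down one level at a time.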
4. Live Monitoring: top / htop
Like a heart monitor, top shows you real-time CPU and memory usage per process. htop is the prettier, interactive version.
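top and htop are interactive, which makes them awkward to use from a script or paste into an incident report. One non-interactive alternative is to rank a ps snapshot instead. The sketch below assumes the procps version of ps, where pcpu and pmem are valid column names; the function name top_cpu is hypothetical. Note that ps reports CPU usage averaged over each process's lifetime, so the numbers are a rough guide and will not match the live figures in top.

```shell
# Show the N processes using the most CPU, highest first (default 5).
# Reads `ps -eo pid,pcpu,pmem,comm` output on stdin (column 2 is %CPU).
top_cpu() {
    # Drop the header row, sort by column 2 as a number in descending order,
    # then keep the first N rows. LC_ALL=C keeps "." as the decimal point.
    tail -n +2 | LC_ALL=C sort -k 2,2nr | head -n "${1:-5}"
}

# Usage on a live system:
#   ps -eo pid,pcpu,pmem,comm | top_cpu 3
```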
Mission Objective
The server is slow. Diagnose the issue:
- Check the pulse: Run free -h to see memory status.
- Scan the storage: Run df -h to check if any disk is full.
- Find the culprit: Run du -sh /var to see if logs are eating up space.
Pro Tip
A full /var/log directory is one of the most common causes of server crashes. Set up log rotation (logrotate) to prevent this!