Lesson 3: The Watchdog (System Monitoring)

At 3 AM, an alert fires: "Server unresponsive." Before panicking, a good DevOps engineer checks the vitals — just like a doctor with a stethoscope. Is it out of memory? Is the disk full? Is a process eating all the CPU?

The Vital Signs

1. Memory: `free -h`

Shows how much RAM is being used and how much is available.

              total    used    free    shared    buff/cache   available
Mem:           7.8G    3.2G    1.1G       0B        3.4G       4.3G

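The -h flag means human-readable: sizes are shown in units such as M and G instead of raw kibibytes. Watch the available column rather than free. Memory under buff/cache is reclaimed automatically when programs need it, so a low free value on its own is not a problem.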
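
For scripts and alerts, a single number is easier to act on than a table. The following is a minimal sketch, not a standard tool: `mem_available_pct` is a hypothetical helper name, and it assumes the column layout shown above, where available is the seventh field of the Mem: row.

```shell
# Hypothetical helper: reads `free` output on stdin and prints the share
# of RAM that is still available, as a whole-number percentage.
# Assumes the layout shown above: on the "Mem:" row, field 2 is total
# and field 7 is available.
mem_available_pct() {
    awk '/^Mem:/ { printf "%d\n", $7 * 100 / $2 }'
}
```

Feed it plain free output (without -h, so the values stay numeric): `free | mem_available_pct`.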

2. Disk Space: `df -h`

Shows how full each disk partition is.

Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/sda1        50G   32G    18G   64%  /

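If Use% hits 100% on a partition, every write to it fails. Services can no longer write logs, temporary files, or data, and they often hang or crash as a result.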
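
To catch a filling disk before it reaches 100%, the Use% column can be compared against a threshold. This is a rough sketch under stated assumptions: `full_disks` is a hypothetical helper name, the default threshold of 90 is an arbitrary illustration, and the input is expected in the `df -P` layout (one line per filesystem, mount point in the sixth field, no spaces in mount points).

```shell
# Hypothetical helper: reads `df -P` output on stdin and prints every
# mount point whose Use% is at or above the given threshold.
# Assumes mount points contain no spaces (field 6 is the mount point).
full_disks() {
    threshold=${1:-90}   # default of 90 is an illustrative choice
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # "64%" -> "64"
        if (use + 0 >= t + 0) print $6, $5
    }'
}
```

A typical call would be `df -P | full_disks 90`; no output means nothing is over the threshold.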

3. Directory Size: `du -sh`

Find out how much space a specific folder is consuming.

du -sh /var/log

-s: Summarize. Print one total for the directory instead of a line for every subdirectory.
-h: Human-readable sizes.
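
When the total is large, the next question is usually which subdirectory is responsible. One way to answer it, sketched here with a hypothetical helper named `biggest_dirs`, is to run du once per subdirectory and sort the totals. It uses -k instead of -h so that the sizes are plain numbers that sort can compare.

```shell
# Hypothetical helper: prints the N largest immediate subdirectories
# of a directory, biggest first, with sizes in kilobytes.
biggest_dirs() {
    dir=${1:-.}
    count=${2:-5}
    # -s: one total per argument; -k: sizes in KiB so sort -n can compare them
    du -sk -- "$dir"/*/ 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `biggest_dirs /var/log 3` lists the three largest subdirectories of /var/log. Files sitting directly in the directory are not included in this sketch.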

4. Live Monitoring: `top` / `htop`

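Like a heart monitor, top shows real-time CPU and memory usage per process, refreshing every few seconds. htop is a more colorful, easier-to-navigate alternative, though it often has to be installed separately. Press q to quit either one.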
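
Interactive tools are awkward inside scripts, so a one-shot snapshot from ps is sometimes more convenient. The sketch below is only an illustration: `top_cpu` is a hypothetical helper, and it assumes its input has a header line followed by rows with the CPU percentage in the second column.

```shell
# Hypothetical helper: reads `ps` output on stdin (header line first,
# %CPU in column 2) and prints the header plus the N busiest processes.
top_cpu() {
    count=${1:-5}
    IFS= read -r header      # keep the header line out of the sort
    printf '%s\n' "$header"
    LC_ALL=C sort -k2,2nr | head -n "$count"
}
```

A matching call is `ps -eo pid,pcpu,pmem,comm | top_cpu 5`, which prints the five processes using the most CPU at that moment.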

Mission Objective

The server is slow. Diagnose the issue:

Check the pulse: Run free -h to see memory status.
Scan the storage: Run df -h to check if any disk is full.
Find the culprit: Run du -sh /var to see how much space /var, which contains /var/log, is using.
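
The three checks can also be bundled so they run in the same order every time. This is a small sketch rather than a finished tool: `triage` is a hypothetical function name, and it falls back to a message if free is not installed.

```shell
# Hypothetical wrapper around the three mission commands.
triage() {
    dir=${1:-/var}
    echo "== Memory =="
    if command -v free >/dev/null 2>&1; then
        free -h
    else
        echo "free is not installed on this system"
    fi
    echo "== Disk =="
    df -h
    echo "== Size of $dir =="
    du -sh "$dir" 2>/dev/null   # hide "permission denied" noise when not root
}
```

Running `triage` checks /var by default; `triage /var/log` narrows the last step to the log directory.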

Pro Tip

Runaway logs in /var/log that fill the disk are one of the most common causes of server outages. Set up log rotation (logrotate) to prevent this!
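
As an illustration, a logrotate rule is a short block, usually placed in a file under /etc/logrotate.d/ on most distributions. The sketch below assumes a hypothetical application that writes to /var/log/myapp/; the path and the retention values are placeholders, not recommendations.

```
# Hypothetical rule for an application logging to /var/log/myapp/
/var/log/myapp/*.log {
    # rotate once a week
    weekly
    # keep four old copies, delete anything older
    rotate 4
    # gzip rotated files to save space
    compress
    # do not complain if a log file is absent
    missingok
    # skip rotation when the log is empty
    notifempty
}
```

With settings like these, old logs are compressed and eventually deleted instead of growing without limit.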
