Lesson 2: The Troubleshooter (Capstone Part 2)
It's 3 AM. The pager goes off: "Production server unresponsive." This is the moment everything you've learned comes together. You need to diagnose and fix the issue — fast.
The Troubleshooting Playbook
Experienced DevOps engineers resist the urge to guess. They work through the same checks in the same order:
1. CHECK LOGS → What error messages exist?
2. CHECK PROCESSES → Is something hogging resources?
3. CHECK NETWORK → Can the server communicate?
4. CHECK DISK → Is storage full?
5. CHECK MEMORY → Is RAM exhausted?
Step 1: Read the Logs
Logs are your crime scene evidence. Check them first:
grep -i 'error' /var/log/syslog | tail -20 # Recent errors (Debian/Ubuntu; RHEL-family systems log to /var/log/messages)
journalctl -xe # Systemd journal
tail -f /var/log/nginx/error.log # Watch logs live
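When the log is noisy, it helps to know which program is producing most of the errors. The following is a minimal sketch that counts matching lines per program. It assumes the traditional syslog line layout (`Mon DD HH:MM:SS host program[pid]: message`), and `count_errors_by_program` is a hypothetical helper name, not a standard tool.

```shell
#!/usr/bin/env bash
# count_errors_by_program: hypothetical helper, not a standard command.
# Prints how many lines containing "error" (case-insensitive) each program
# wrote to the given log file, busiest program first.
# Assumption: field 5 is "program[pid]:" or "program:", as in classic syslog.
count_errors_by_program() {
    local logfile="$1"
    grep -i 'error' "$logfile" \
        | awk '{
                 prog = $5
                 sub(/\[[0-9]+\]:?$/, "", prog)   # strip a trailing "[pid]:"
                 sub(/:$/, "", prog)              # strip a bare trailing ":"
                 count[prog]++
               }
               END { for (p in count) print count[p], p }' \
        | sort -rn
}
```

A possible invocation is `count_errors_by_program /var/log/syslog | head -5`, which lists the five noisiest programs.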
Step 2: Hunt Rogue Processes
A runaway process can eat all CPU or memory:
ps aux --sort=-%cpu | head -10 # Top CPU consumers
ps aux --sort=-%mem | head -10 # Top memory consumers
kill <PID> # Ask the process to exit cleanly (SIGTERM)
kill -9 <PID> # Force-kill, only if it ignores SIGTERM
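A force-kill gives the process no chance to flush data or release locks, so it is usually worth asking politely first. The sketch below shows one way to do that: send SIGTERM, wait a few seconds, and escalate to SIGKILL only if the process is still there. The function name `stop_process` and the 5-second default are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# stop_process: hypothetical helper, not a standard command.
# Sends SIGTERM, waits up to a grace period (default 5 seconds) for the
# process to exit, and falls back to SIGKILL only if it is still alive.
stop_process() {
    local pid="$1" grace="${2:-5}" waited=0
    # Fails if the PID does not exist or we are not allowed to signal it.
    kill -TERM "$pid" 2>/dev/null || return 1
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process has exited
        sleep 1
        waited=$((waited + 1))
    done
    # Still alive: force it (ignore the race where it exited just now).
    kill -KILL "$pid" 2>/dev/null || true
}
```

For example, `stop_process 4242 10` would give the hypothetical PID 4242 ten seconds to shut down before forcing it.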
Step 3: Verify Network
Is the server actually accepting connections?
ss -tlnp | grep ':80 ' # Is port 80 listening? (a bare "80" would also match 8080 or a PID)
ping -c 3 google.com # Does it have internet?
curl -I http://localhost # Is the web server responding?
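Text matching on `ss` output is easy to get subtly wrong, so a script may prefer to compare the port number exactly. The sketch below reads `ss -tln` output on standard input and checks the port at the end of the local address. It assumes the local address is the fourth column, as in `ss -tln` output from current iproute2 versions, and `port_is_listening` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# port_is_listening: hypothetical helper, not a standard command.
# Reads `ss -tln` output on stdin and succeeds (exit 0) only if some socket
# listens on exactly the given port.
# Assumption: column 4 holds "Local Address:Port".
port_is_listening() {
    local port="$1"
    awk -v port="$port" '
        NR > 1 {                          # skip the header line
            n = split($4, parts, ":")     # handles 0.0.0.0:80 and [::]:80
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }'
}
```

A typical call would be `ss -tln | port_is_listening 80 && echo "port 80 is listening"`.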
Step 4: Check Storage
A full disk is a silent killer:
df -h # Overall disk usage
du -sh /var/log/* | sort -rh | head -5 # Biggest log files
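Reading `df` output by eye works at 3 AM, but a threshold check is easier to reuse. The following sketch filters `df -P` output for filesystems at or above a usage limit. `full_filesystems` is a hypothetical helper name and the 80 percent default is an arbitrary choice. It also assumes mount points contain no spaces.

```shell
#!/usr/bin/env bash
# full_filesystems: hypothetical helper, not a standard command.
# Reads `df -P` output on stdin and prints "<mount point> <use%>" for every
# filesystem at or above the threshold (default: 80 percent).
# -P selects the POSIX layout: one line per filesystem, capacity in
# column 5, mount point in column 6.
full_filesystems() {
    local threshold="${1:-80}"
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the header line
            used = $5
            sub(/%$/, "", used)       # "95%" -> "95"
            if (used + 0 >= limit + 0) print $6, $5
        }'
}
```

For instance, `df -P | full_filesystems 90` would list only the filesystems that are at least 90 percent full.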
Step 5: Generate a Report
Document your findings for the team. The free -h line also covers the memory check from the playbook:
echo "=== Health Report ===" > report.txt
date >> report.txt
free -h >> report.txt
df -h >> report.txt
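The same commands can be wrapped in a function so the report always has the same labeled sections. This is a sketch rather than a standard tool: `write_health_report` and the section titles are made up for illustration, and the default file name simply mirrors the `report.txt` used above.

```shell
#!/usr/bin/env bash
# write_health_report: hypothetical helper, not a standard command.
# Writes a labeled snapshot of memory, disk, and CPU usage to a file
# (default: report.txt). stderr is captured too, so a missing or failing
# command shows up in the report instead of disappearing.
write_health_report() {
    local out="${1:-report.txt}"
    {
        echo "=== Health Report ==="
        date
        echo "--- Memory ---"
        free -h
        echo "--- Disk ---"
        df -h
        echo "--- Top 5 CPU consumers ---"
        ps aux --sort=-%cpu | head -6   # 1 header line + 5 processes
    } > "$out" 2>&1
}
```

Calling `write_health_report "report-$(date +%F).txt"` would produce one dated file per day, which makes it easier to compare snapshots later.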
Final Mission
A production server is down. Follow the playbook:
- Read the evidence: Run grep -i 'error' /var/log/syslog | tail -5.
- Hunt the culprit: Find top CPU consumers with ps aux --sort=-%cpu | head -5.
- Test the network: Check if port 80 is open with ss -tlnp | grep ':80 '.
- Check the storage: Look for full disks with df -h | grep -E '[89][0-9]%|100%'.
- Document it: Generate a health report to share with your team.
🎉 Congratulations!
You've completed the Linux Foundations for DevOps course! You now have the skills to navigate, secure, monitor, automate, and troubleshoot Linux servers — the foundation of every DevOps career.
Next Steps:
- Practice these commands on a real Linux VM (try DigitalOcean or AWS Free Tier).
- Learn Docker — containerization is the next level.
- Explore CI/CD pipelines with GitHub Actions or Jenkins.