Linux Commands Cheat Sheet (Practical, Under-Pressure Edition)

The last time I truly needed a Linux command cheat sheet wasn’t during a tutorial; it was during an incident. A service was flapping, logs were rolling over, disk space was vanishing, and I had exactly one SSH session that I didn’t want to lose. In moments like that, you don’t want trivia. You want a small set of commands you can trust, plus the patterns that keep you from making things worse.

This post is that: the commands I reach for daily when I’m building, debugging, and operating modern systems: containers, VMs, bare metal, cloud instances, CI runners, and dev machines. I’ll show you how to inspect files safely, search fast, combine tools into repeatable pipelines, reason about permissions, and control processes without guesswork. I’ll also call out the mistakes I see even experienced developers make (including a few I’ve made myself) and the safety rails that prevent one-liner heroics from turning into outages.

If you keep nothing else, keep the mental model: Linux commands are small tools that become powerful when you compose them.

## A Shell Cheat Sheet You Can Trust Under Pressure

I treat the terminal like a workshop bench: every tool is small, but the layout matters. The layout in Linux is the pipe-and-filter model: commands read from standard input, write to standard output, and you connect them with pipes.

Three habits I recommend:

1) Prefer read-only commands first. When you’re uncertain, start with ls, stat, cat, head, tail, ps, ss, df. Save write commands (rm, mv, chmod, kill) for when you can explain the outcome in a sentence.

2) Make output predictable. The moment you plan to script something, use machine-friendly output:
- ls is for humans; find -print0 + xargs -0 is for scripts.
- ps -eo pid,ppid,cmd --sort=-%cpu is friendlier than default ps.
- Prefer --color=auto for interactive tools, but disable colors in scripts if they pollute output.

3) Know your escape hatches.
If a command might do damage, preview it:
- Replace rm with ls first.
- Replace xargs actions with echo first.
- Add -n to tools that support dry-run behavior (examples below).

A tiny starter pack I keep in muscle memory:

- Where am I? pwd
- What’s here? ls -lah
- What’s big? du -xh --max-depth=1 | sort -h
- What’s eating disk? df -h
- What’s running? ps aux | head
- What’s listening? ss -lntp
- What happened recently? journalctl -u service-name --since '30 min ago'

Two more habits that save me constantly:

- Use history intentionally. history | tail, reverse search with Ctrl+R, and if you’re about to run something scary, prefix it with a space when your shell ignores space-prefixed history (some configs do).
- Don’t trust your memory for flags under stress. command --help | less and man command are part of the cheat sheet. When the system is on fire, correctness beats speed.

## Files: Create, Inspect, Copy, Move, Delete

File operations are where speed matters, and where mistakes are expensive. I’ll group the essentials by what you’re trying to do.

### Create and modify files safely

- Create an empty file or update its timestamp:

    # Create empty file if missing, otherwise bump mtime
    touch server.env

- Print text (great for scripts):

    echo 'LOGLEVEL=info' >> server.env

If the text contains backslashes or escape sequences, I prefer printf over echo in scripts:

    # Predictable across shells
    printf '%s\n' 'LOGLEVEL=info' >> server.env

Practical safety tip: if you’re generating config files, write to a temporary file and then move it into place. A mv within the same filesystem is atomic, which prevents half-written config files when a script crashes.

    tmp=$(mktemp)
    printf '%s\n' 'LOGLEVEL=info' 'PORT=8080' > "$tmp"
    mv -- "$tmp" ./server.env

### View files without surprises

- Quick view:

    cat app.log

- Paginated view (I use this constantly):

    less -RS app.log
    # -R keeps color codes readable
    # -S avoids wrapping long lines

Less keys I actually use in real life:
- /pattern then n / N to jump between matches
- G to jump to the end, g to the start
- F to follow like tail -f (hit Ctrl+C to stop following)
- -S to toggle line wrapping if needed

- Head/tail for large logs:

    head -n 50 app.log

    # Follow a log as it grows
    tail -n 200 -f app.log

- Reverse a file (handy for latest-first):

    tac app.log | head -n 50

Practical scenario: when I’m chasing a crash loop, I often do both a reverse view and a follow view: reverse to find the last good boundary, follow to watch the next iteration arrive.

### Identify what a file actually is

Linux doesn’t rely on extensions. When a file looks wrong, I ask the system:

    file download.bin

For weird binary inspection:

    # Hex/char dump for debugging formats
    od -Ax -tx1z -N 256 download.bin

If you’re debugging text encodings (a real source of pain in automation), file will often hint at UTF-8 vs ASCII vs binary. If the file is text but looks garbled, I’ll check for hidden control characters with something like:

    cat -A suspicious.txt | head

### Copy, move, and rename

These are the commands that quietly shape your repo and deployments.

    # Copy a file
    cp app.conf app.conf.bak

    # Copy a directory recursively (preserve metadata)
    cp -a ./config ./config.backup

    # Move/rename
    mv app.conf app.conf.old

I use cp -a when I care about preserving permissions, timestamps, and symlinks. For deployments and backups, that’s usually the right default. For quick local copies, cp -r is fine, but know what you’re giving up.

For batch renames, rename is powerful but varies by distro. When I need portability, I’ll often use a loop:

    # Rename *.log to *.log.bak (portable)
    for f in *.log; do
        [ -e "$f" ] || break
        mv -- "$f" "$f.bak"
    done

Two edge cases to watch:
- If filenames contain spaces, your loop must quote variables. (In the example above, it does.)
- If filenames can start with -, use -- before the filename to stop option parsing (mv -- "$f" ...).

### Links: symlinks vs hard links

- Symbolic link (most common):

    ln -s /etc/nginx/sites-available/app /etc/nginx/sites-enabled/app

- Hard link (same inode; only works on the same filesystem):

    ln important.db important.db.hardlink

When you’re debugging why editing one file changed another, check inode numbers for hard links:

    ls -li

Practical note: symlinks break if their target moves; hard links don’t (but hard links can be surprising during cleanup). If you’re chasing a disk leak and a file refuses to disappear, hard links are one possible reason.

### Compare and validate file integrity

When you need certainty:

    # Compare byte-by-byte
    cmp fileA.bin fileB.bin

    # Compare line-by-line
    diff -u old.conf new.conf

    # Three-way merge comparison
    diff3 base.conf ours.conf theirs.conf

Checksums for verifying downloads:

    cksum artifact.tgz

In practice I more often use sha256sum:

    sha256sum artifact.tgz

A sanity rule I follow: if you’re comparing config files, prefer diff -u and read the output.
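Checksums are most useful when you record them ahead of time and verify later with sha256sum -c. A minimal sketch of that round trip (the artifact name here is just an example):

```shell
# Stand-in artifact; in real life this is your downloaded or built file
printf 'example payload\n' > artifact.tgz

# Publish/record the checksum alongside the artifact
sha256sum artifact.tgz > artifact.tgz.sha256

# On the receiving side: -c re-hashes the file and compares
# Prints "artifact.tgz: OK" and exits 0 on a match
sha256sum -c artifact.tgz.sha256
```

A non-zero exit status from sha256sum -c is script-friendly, so this slots straight into CI or deploy scripts.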
If you’re comparing artifacts, prefer checksums and verify you’re hashing the same exact bytes (watch out for decompressed-file vs compressed-archive confusion).

### Split, join, and reshape files

These commands are underrated for data handling and CI logs.

    # Split a large file into 50MB chunks
    split -b 50m big.log chunk

    # Split by pattern (split a file at lines matching BEGIN)
    csplit -f part big.txt '/^BEGIN/' '{*}'

    # Join two files on a key (requires both sorted by key)
    join -t',' -1 1 -2 1 customers.csv orders.csv

    # Paste columns side-by-side
    paste -d',' left.csv right.csv

Performance consideration: join requires sorted input. If you forget, it won’t error loudly; it will just give wrong results. I treat sorting as part of the pipeline, so I don’t rely on input being pre-sorted unless it’s guaranteed.

### Archives and compression

For packaging logs or shipping artifacts:

    # Create a tar.gz archive
    tar -czf logs2026-02-01.tgz /var/log/myapp

    # List contents
    tar -tzf logs2026-02-01.tgz | head

    # Extract
    tar -xzf logs2026-02-01.tgz -C ./restore

Two practical tips:
- If you’re archiving a directory that changes while you read it, results can be inconsistent. For production incident evidence, I prefer snapshotting (filesystem snapshot, volume snapshot, or at least copying the subset to a staging directory).
- Compression costs CPU. If CPU is already pegged, consider tar -cf (no compression) and compress elsewhere.

### Delete and destroy (carefully)

I treat deletion as a two-step ritual: list first, delete second.

    # Preview what would be deleted
    ls -lah ./tmp/to-delete

    # Then remove
    rm -r ./tmp/to-delete

Common mistakes I see:
- Running rm -rf with a variable that is empty or unset.
- Running rm -rf $DIR/ when $DIR contains spaces.
- Running rm -rf with a glob and forgetting that the glob might match more than expected.

Safety rails I use:

    # In bash scripts: fail on unset variables and errors
    set -euo pipefail

    # Use -- to end option parsing and quote paths
    rm -r -- "${DIR}"

Another safety pattern: avoid relying on globs in destructive commands. Prefer find with a narrow match and preview the list. For example, to delete *.tmp files older than 7 days:

    find /tmp -type f -name '*.tmp' -mtime +7 -print

…and only if the list looks correct, then:

    find /tmp -type f -name '*.tmp' -mtime +7 -delete

If you truly need secure deletion, shred exists, but you should know its limits on copy-on-write filesystems and SSD wear-leveling. I use it mostly for lab environments and removable media, and I still assume the safest approach for secrets is not writing them unencrypted in the first place.

    shred -u -z secrets.txt

## Directories and Finding Things Fast

Directory commands are about speed and orientation.
If you can’t find things quickly, everything else becomes slow.

### Move around with intent

    cd /var/log
    cd -    # jump back to previous directory
    pwd

A small habit that pays off: I keep paths short by using ~ and by using project-local shells (for example, direnv or a devcontainer), but even without those tools, cd - is a great time-saver.

### List directories in ways that answer real questions

    # Human-friendly listing
    ls -lah

    # Show directories first (GNU ls)
    ls -lah --group-directories-first

Some systems have dir as an alternative to ls:

    dir -lah

When I’m scanning for recent activity, I’ll sort by time:

    ls -lat

…and when I’m looking for big artifacts, I often sort by size:

    ls -lahS | head

### Create and remove directories

    mkdir -p ./data/exports/2026-02-01
    rmdir ./empty-dir

rmdir only removes empty directories. For non-empty directories, use rm -r deliberately.

### Extract path pieces

These are small, but they make scripts cleaner:

    basename /var/log/nginx/access.log
    # access.log

    dirname /var/log/nginx/access.log
    # /var/log/nginx

When symlinks get involved:

    readlink -f ./current

### Find files the reliable way

When accuracy matters, find is the workhorse.

    # Find files modified in the last 24 hours
    find /var/log -type f -mtime -1 -print

    # Find large files (>200MB)
    find / -type f -size +200M -print 2>/dev/null

    # Find and delete *.tmp files older than 7 days (preview first!)
    find /tmp -type f -name '*.tmp' -mtime +7 -print
    # If the output looks correct:
    find /tmp -type f -name '*.tmp' -mtime +7 -delete

When filenames might contain spaces/newlines, use NUL separators:

    # Safe iteration with NUL terminators
    find . -type f -name '*.log' -print0 | xargs -0 -n 1 wc -l

Or avoid xargs entirely with -exec:

    find . -type f -name '*.log' -exec wc -l {} +

For speed on developer machines, locate can be dramatically faster because it uses an index:

    locate package.json

But locate can be stale (it depends on updatedb runs). For live production debugging, I trust find more.

### A practical 2026 comparison table: classic tools vs modern defaults

On many teams, I see a modern CLI stack installed on dev machines while servers remain conservative. I keep both in mind:

| Task | Traditional default | Modern dev default (2026) | What I recommend |
| --- | --- | --- | --- |
| Find files by name | find | fd | Use find on servers; use fd locally for speed and nicer UX |
| Search inside files | grep | rg (ripgrep) | Use grep everywhere; use rg when available |
| Directory jumping | cd + memory | zoxide | Keep cd skills; add zoxide if you live in many repos |
| Disk usage | du | dust, dua | Learn du first; use modern tools for convenience |
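The NUL-separator pattern from the find section above is easy to sanity-check for yourself. A throwaway demo with a deliberately awkward filename (directory and file names here are made up):

```shell
# Create a scratch directory with a filename containing a space
mkdir -p find-demo
printf 'one line\n' > 'find-demo/app server.log'

# Plain xargs would word-split "app server.log" into two arguments;
# -print0 / -0 pass the filename intact as a single argument
find find-demo -type f -name '*.log' -print0 | xargs -0 wc -l
```

Run the same pipeline without -print0/-0 and you get "No such file or directory" errors on each half of the name: exactly the failure mode NUL separators exist to prevent.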

## Text Pipelines: Turn Logs Into Answers

Text processing is where Linux really shines. I think of it like a factory line: each command does one job, and the pipe is the conveyor belt.

### Select columns and fields

If you have delimiter-separated data:

    # Extract 1st and 3rd columns from a CSV-like file
    cut -d',' -f1,3 users.csv

For fixed-width data:

    # Expand tabs to spaces (useful before fixed-position cuts)
    expand -t 4 report.txt > report.expanded.txt

    # Reverse: turn spaces back into tabs (when needed)
    unexpand -a report.expanded.txt > report.tabs.txt

When cut isn’t enough, I reach for awk because it’s everywhere and extremely expressive. Two quick patterns I use constantly:

    # Print fields 1 and 7 from whitespace-delimited logs
    awk '{print $1, $7}' access.log | head

    # Filter lines where field 9 (status) is 500
    awk '$9 == 500 {print}' access.log | head

### Sort, deduplicate, count

These are the pattern detectors for noisy output:

    # Top 20 most frequent IPs in an access log
    cut -d' ' -f1 /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -n 20

Quick line/word counts:

    wc -l app.log
    wc -w README.md
    wc -c artifact.bin

A subtle but important detail: uniq only removes adjacent duplicates. If you don’t sort first, you’re not deduplicating; you’re just compressing runs. Under pressure, I mentally treat sort | uniq as a single unit.

### Search and filter

Even without any extra tools, the general pattern is:
- filter lines you care about
- reduce them to key fields
- count or summarize

With base tools:

    # Show error lines and 30 lines before/after (if grep supports context flags)
    grep -n -C 30 'ERROR' app.log | less -R

More patterns I use in the real world:

    # Case-insensitive search
    grep -i 'timeout' app.log | head

    # Exclude noisy lines
    grep -v 'healthcheck' app.log | head

    # Match whole words (avoid false positives)
    grep -w 'panic' app.log | head

If your environment has ripgrep (rg), it’s often faster and nicer for code and configs. But I never rely on it being installed on servers; I treat it as a bonus.

### Combine outputs without losing them

tee is the splitter in the conveyor belt. I use it to save intermediate results while still continuing the pipeline.

    # Save filtered lines and also count them
    grep 'timeout' app.log | tee timeouts.log | wc -l

I also use tee when I’m building a pipeline interactively and want a breadcrumb trail for later. Under stress, it’s easy to run a command and lose the output in scrollback. tee turns exploration into an artifact.

### Fold long lines for readable terminals

When a JSON line is 20,000 characters, your terminal becomes useless.

    fold -w 120 -s giant-line.log | less

If you control the app, this is your sign to emit structured logs with newlines (or to log JSON with pretty-print enabled in non-production environments). If you don’t control the app, folding and paging is a practical compromise.

### Stream editing: quick transforms with sed

I try not to write complicated sed scripts under pressure, but simple substitutions are incredibly useful:

    # Replace tabs with spaces
    sed 's/\t/ /g' file.txt | head

    # Remove ANSI color codes (common in CI logs)
    sed 's/\x1b\[[0-9;]*m//g' colored.log | head

If that last one looks scary, that’s the point: in scripts, I try to avoid fragile regex unless it’s been tested. In an incident, keep it as simple as possible.

### Common mistakes (and fixes)

- Mistake: forgetting that uniq only removes adjacent duplicates.
  Fix: always sort before uniq when deduplicating.
- Mistake: parsing logs with fragile assumptions about spacing.
  Fix: prefer structured logs (JSON) when you control the app; otherwise reduce scope (extract only the first field, or only the timestamp).
- Mistake: piping binary data through text tools.
  Fix: use file and od first; treat binary as binary.
- Mistake: forgetting that tools behave differently in different locales (sorting order, collation).
  Fix: for scripts, set LC_ALL=C for stable behavior: LC_ALL=C sort.

## Permissions and Safety Rails (Because "Permission denied" Is a Feature)

Permissions look annoying until they save you from yourself. I explain them with a simple analogy: ownership is the name on the door, permissions are the lock, and groups are who you handed spare keys to.

### Read permissions quickly

    ls -l

You’ll see something like:
- -rw-r--r-- for a file
- drwxr-xr-x for a directory

The three triplets map to: user (owner), group, others.

Two practical reminders that explain a lot of confusion:
- Directories need execute (x) to traverse. You can have read permission on a file and still not access it if a parent directory blocks traversal.
- Write permission on a directory controls creating/deleting entries.
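A quick way to see the directory-write rule in action: even a read-only file goes away if you can write to its parent directory. A throwaway sketch (names are made up; run it somewhere disposable, as a normal user):

```shell
# Demo: deletion is governed by the directory, not the file
mkdir -p perm-demo
touch perm-demo/readonly.txt
chmod 444 perm-demo/readonly.txt    # the file itself is read-only

# -f suppresses the "remove write-protected file?" prompt;
# removal succeeds because we have write permission on perm-demo/ itself
rm -f perm-demo/readonly.txt
```

(Running this as root proves nothing, since root bypasses permission checks entirely.)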
This surprises people: you can delete a file you don’t own if you have write permission on the directory that contains it (subject to sticky-bit rules, as on /tmp).

### Change permissions intentionally

I prefer symbolic modes for clarity:

    # Add execute for the user
    chmod u+x deploy.sh

    # Remove write for group and others
    chmod go-w config.yml

Numeric modes are fine when you truly know what you want:

    chmod 644 README.md
    chmod 755 ./scripts

### Ownership: who can change what

    # Change owner and group
    chown appuser:appgroup /var/lib/myapp

    # Recursive ownership change (be careful!)
    chown -R appuser:appgroup /var/lib/myapp

I treat recursive ownership changes as high-risk. On a big tree, a mistaken chown -R can break running services in a way that’s hard to diagnose later. If I need to change ownership, I try to target the smallest directory possible and verify with ls -l and stat afterwards.

### Access checks and the reality of "why can’t I read this?"

Sometimes permissions look fine but access still fails due to parent directory permissions, ACLs, or mount options. I debug it in this order:

1) Check each parent directory:

    ls -ld /path /path/to /path/to/file

2) Check ACLs if available:

    getfacl /path/to/file

3) Check mount options (read-only mounts, noexec, etc.):

    mount | grep ' /path ' || true

(ACL tools aren’t installed everywhere, but when they are, they answer questions quickly.)

### Practical guidance I use on teams

- For application secrets: don’t rely on filesystem permissions alone. Use your platform’s secret store, and when you must store secrets locally, restrict them to the service account and keep them out of images and repos.
- For shared dev environments: set reasonable defaults with umask.

    # Common default: new files are 644, dirs are 755
    umask 022

    # More restrictive: new files are 600, dirs are 700
    umask 077

### When not to just sudo it

I use sudo when:
- I’m installing packages.
- I’m changing system config.
- I’m inspecting privileged logs.

I avoid sudo when:
- I’m running arbitrary scripts I didn’t read.
- I’m testing file operations that could expand globs in unexpected places.

A safe pattern for edits:

    # Edit system files without changing their ownership
    sudoedit /etc/ssh/sshd_config

Another pattern I like: run the minimum privileged command rather than opening a root shell. Under stress, the fewer privileged keystrokes you type, the better.

## Processes and System Health: What’s Running, What’s Stuck, What’s Burning Resources

When a system feels slow, I translate that into: CPU, memory, disk I/O, or network. Then I go measure.

### See processes and sort by what matters

    # Snapshot of processes (works almost everywhere)
    ps aux | head

    # Sort by CPU (GNU ps)
    ps -eo pid,ppid,%cpu,%mem,etime,cmd --sort=-%cpu | head -n 20

    # Sort by memory
    ps -eo pid,ppid,%mem,%cpu,etime,cmd --sort=-%mem | head -n 20

If the system is really hurting, interactive tools can fail to start or update slowly. I treat ps as my reliable baseline because it’s cheap and ubiquitous.

### Stop a runaway process safely

I don’t start with -9. I escalate.

    # Graceful stop (SIGTERM)
    kill <pid>

    # If it ignores SIGTERM, then:
    kill -9 <pid>

If you’re dealing with a process group or a daemon tree, I find it safer to identify the parent/child chain with ps first rather than firing signals blindly.

Two useful variations:

    # See the process tree (if available)
    pstree -ap <pid>

    # Kill by name (be careful; the match can be broad)
    pkill -f 'pattern'

I rarely use pkill in production unless I’m confident the match is specific. It’s too easy to kill the wrong thing when multiple processes share similar command lines.

### Services on systemd-based systems

On most modern distros, systemd is the control plane for services and logs.

    # Service status
    systemctl status myapp

    # Restart
    sudo systemctl restart myapp

    # Follow logs
    journalctl -u myapp -f

    # Logs for a unit since the current boot
    journalctl -u myapp -b

I also use these constantly:

    # Show failed units
    systemctl --failed

    # Show recent logs with timestamps
    journalctl -u myapp --since '2 hours ago'

    # Confirm the unit file and drop-ins
    systemctl cat myapp

### Disk and memory: the basics that catch 80% of incidents

    # Disk space per filesystem
    df -h

    # Disk usage by directory
    du -xh --max-depth=1 /var | sort -h

When disk is full, I prioritize two questions:
1) Which filesystem is full? (df -h)
2) Which directory is consuming it? (du -xh --max-depth=1)

Then I narrow down the biggest offenders:

    # Biggest files under /var/log
    find /var/log -type f -size +100M -print 2>/dev/null | head

If you suspect deleted-but-still-open files (classic mystery disk usage), lsof is the fastest answer if available:

    # List open files that were deleted but are still held by a process
    lsof | grep '(deleted)' | head

Memory checks I keep simple:

    free -h

And for kernel + system overview:

    uname -a
    uptime

If you have top or htop, they’re great, but I treat them as a second step. The first step is always cheap, scriptable commands that work over a flaky SSH session.

## Networking: Ports, DNS, Routes, and Quick Connectivity Checks

Networking issues are often not the network. They’re frequently DNS, TLS, firewall rules, or the service listening on the wrong interface. I debug from the bottom up.

### What’s listening (and on which address)?

ss is my default on modern systems (it replaced netstat in many distros):

    # Listening TCP sockets with process info
    ss -lntp

Key interpretation:
- 0.0.0.0:443 means listening on all IPv4 interfaces
- 127.0.0.1:5432 means loopback only (not reachable externally)
- [::]:80 means IPv6 any (often also covers IPv4 via dual-stack, depending on config)

### Quick reachability

    # Basic connectivity (ICMP may be blocked; don’t over-trust ping)
    ping -c 3 1.1.1.1

    # DNS resolution
    getent hosts example.com

If getent is unfamiliar: it asks the system’s name service switch (NSS), which respects /etc/hosts, DNS, and other configured sources. That makes it more representative than a standalone DNS tool in many environments.

For raw TCP checks, nc (netcat) is handy when installed:

    # Check if a TCP port is reachable
    nc -vz example.com 443

If nc isn’t installed, I fall back to curl because it’s commonly present:

    curl -I https://example.com

### Routes and interfaces

When packets go to the wrong place, look at the routing table and interfaces:

    ip addr
    ip route

Typical scenario: you’re in a container or VM and the default route is missing or pointing somewhere unexpected.
ip route makes that obvious.

### DNS debugging basics

If you do have DNS tools installed, these are the ones I use:

    # If present
    dig example.com
    nslookup example.com

But again: if you’re debugging why an app can’t resolve a name, getent hosts is often the most relevant command because it matches what applications use.

### Firewall quick checks (cautiously)

Firewall tooling varies a lot. If I suspect blocked ports, I start by verifying that the service listens locally (ss -lntp) and then check firewall status if I’m on a system where I know the tooling. Examples you might see:

    # Some systems
    sudo ufw status

    # Others
    sudo iptables -S
    sudo nft list ruleset

I don’t recommend learning firewall syntax during an incident. If you’re in that situation, your cheat sheet should include the exact firewall system your environment uses and a safe rollback plan.

## Users, Groups, and Identity: Who Am I Actually Running As?

Access bugs are often identity bugs. I make identity explicit early.

### Identify your current user and groups

    whoami
    id

The id output answers questions like:
- Is this user in the docker group?
- Does this user have the expected supplementary groups?
- Is the UID/GID what the service expects (common in containers)?

### Switch user safely

If I need to assume another identity locally:

    sudo -u appuser -H bash

If I need root for a specific command, I’ll do that instead of switching shells globally:

    sudo systemctl restart myapp

### Inspect system users

    # View users
    cut -d: -f1 /etc/passwd | head

    # View groups
    cut -d: -f1 /etc/group | head

In production, I try not to edit these files directly unless I’m on a minimal system without higher-level tools.

## Packages: Installing Tools Without Making a Mess

Package managers are distro-specific, but the mental model is consistent: update metadata, then install. Under stress, I keep a small set of commands for the distro I’m on.

Examples you might run depending on environment:

    # Debian/Ubuntu
    sudo apt-get update
    sudo apt-get install -y curl lsof

    # RHEL/CentOS/Fedora family
    sudo dnf install -y curl lsof

If you’re on a locked-down production system, you might not be allowed to install anything (and that can be a good thing). In those cases, your cheat sheet needs stronger fallback patterns using only built-in tools (ss, ps, journalctl, find, grep).

## Scheduling: Cron and Systemd Timers (Know What Runs When You’re Not Looking)

Background jobs can create mysterious load spikes, log bursts, or disk usage growth. When something happens at a predictable time, suspect a scheduler.

### Cron

    # List the current user’s crontab
    crontab -l

    # System cron directories
    ls -lah /etc/cron.*

Cron often logs to syslog or journald depending on distro. I’ll search the logs around the time of the spike.

### Systemd timers

On systemd systems, timers are increasingly common:

    systemctl list-timers --all

If you see a timer that aligns with your issue, inspect its service unit and logs:

    systemctl status some-timer.timer
    systemctl status some-timer.service
    journalctl -u some-timer.service --since '24 hours ago'

## SSH Survival Patterns: Keep Your Session, Copy Files, and Don’t Lock Yourself Out

SSH is often your lifeline.
I treat it like a fragile pipeline: don’t break it, and don’t change critical auth settings without a rollback.

### Keep a session alive

If you control the client, consider keepalive settings in your SSH config, but even without that, I use two operational habits:
- Keep a second session open when changing SSH config (or use a multiplexer).
- Avoid restarting sshd blindly; validate the config first.

### Validate SSH config before restart

    # Validate sshd config (path may vary)
    sudo sshd -t

Then restart carefully:

    sudo systemctl restart sshd

### Copy files in/out

From your local machine, scp is simple:

    scp user@server:/var/log/myapp/app.log ./app.log

For large transfers or flaky links, rsync is often better when available:

    rsync -avz user@server:/var/log/myapp/ ./myapp-logs/

Operational tip: if you’re collecting incident evidence, preserve timestamps with rsync -a and consider hashing archives after transfer.

## Practical Incident Recipes (The Ones I Actually Run)

This is where cheat sheets become valuable: repeatable mini-playbooks that get you to answers quickly.

### Recipe 1: Service is flapping

Goal: determine if it’s crashing, being OOM-killed, failing health checks, or restarting due to configuration.

    systemctl status myapp
    journalctl -u myapp --since '1 hour ago' | tail -n 200

If you suspect a crash loop with quick restarts, look for patterns like exit codes and timestamps. If the logs are too noisy, filter for signals:

    journalctl -u myapp --since '1 hour ago' | grep -iE 'exit|failed|killed|oom' | tail -n 200

Then check resources quickly:

    ps -eo pid,ppid,%cpu,%mem,etime,cmd --sort=-%cpu | head
    free -h
    df -h

### Recipe 2: Disk space is vanishing

Goal: identify the filesystem, the directory, and the root cause (logs, artifacts, temp files, deleted-but-open files).

    df -h
    du -xh --max-depth=1 /var | sort -h

Narrow down the heavy directory (example: /var/log):

    du -xh --max-depth=1 /var/log | sort -h
    find /var/log -type f -size +200M -print 2>/dev/null | head -n 50

If available, check for deleted-but-open files:

    lsof | grep '(deleted)' | head -n 50

Fix pattern I like: rotate/compress logs safely (or reduce verbosity), then restart the service only if required. Deleting files blindly is how you delete the one thing you needed for a postmortem.

### Recipe 3: CPU is pegged

Goal: identify the process and whether it’s a tight loop, high load due to traffic, or a runaway job.

    uptime
    ps -eo pid,ppid,%cpu,%mem,etime,cmd --sort=-%cpu | head -n 20

If you see a suspicious PID, confirm it’s not just doing expected work and then inspect its context:

    ps -p <pid> -o pid,ppid,etime,cmd

If the culprit is a service, check logs around the start time for correlation. If it’s an unexpected binary, confirm what it is and where it’s running from.

### Recipe 4: Port is open but requests fail

Goal: distinguish between listener, routing, DNS, and application-level failures.

    ss -lntp | grep ':443' || true
    curl -I http://127.0.0.1:PORT
    curl -I http://SERVER_IP:PORT

If it works on loopback but not on the server IP, it’s likely bound to localhost only. If it works on the server but not externally, suspect a firewall/security group or upstream routing.

### A final note I keep taped to my mental monitor

When you’re under pressure, the goal isn’t to type fast; it’s to reduce uncertainty quickly. Prefer commands that are:
- read-only first
- cheap to run
- easy to interpret
- easy to undo (or at least easy to explain)

That’s the real cheat sheet: protect the system, protect your evidence, and make small, confident moves.
