The strace command in Linux is an invaluable tool for debugging and understanding application behavior. This comprehensive guide will teach you everything you need to know about using strace.

What is Strace?

Strace is a diagnostic and debugging utility that captures and records all system calls made by a process. System calls are functions used by applications to request services from the Linux kernel.

By tracing these calls, strace allows you to:

  • Monitor all system calls made by a process
  • See each call's name, the arguments passed, its return value and any error
  • Time each system call and analyze performance
  • Identify software bugs and errors
  • Troubleshoot crashes or unexpected behavior

Put simply – strace lets you see everything an application is doing under the hood by spying on its conversation with the Linux kernel.
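To make this concrete, here is a minimal Python sketch (the path and message are arbitrary) whose calls map almost one-to-one onto the system calls strace would print:

```python
import os

# Each os.* call below is a thin wrapper over one Linux system call, so
# running this script under strace shows matching openat/write/close lines.
path = "/tmp/strace_demo.txt"  # arbitrary demo path
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)  # openat(2)
os.write(fd, b"hello, kernel\n")                                  # write(2)
os.close(fd)                                                      # close(2)
```

Tracing this script with strace would show these three calls among the interpreter's own startup activity.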

Installing Strace

Strace comes pre-installed on most Linux distributions. To confirm, open a terminal and type:

strace --version

If not installed already, use your distribution's package manager to install it. For example on Debian/Ubuntu:

sudo apt install strace

Or on RHEL/CentOS:

sudo yum install strace

Strace Basic Usage

The basic syntax for the strace command is:

strace [options] <command> [arguments]

This runs <command> while tracing all resulting system calls and their outputs.

For example, to trace calls made by the ls command:

strace ls -l

This will print pages of output showing every system call made during the listing process.

Here is a snippet tracing the mkdir command:

strace mkdir test

mkdir("test", 0777)                          = 0
+++ exited with 0 +++

We can see the mkdir system call being made along with its arguments and return value.

Let's go over some common strace options to tailor and filter this output.

Counting Calls with -c

The -c option prints a handy summary of all calls made instead of a complete trace.

For example:

strace -c ls -l 

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.34    0.000100           3        36           read
  5.66    0.000006           2         3           open
  0.00    0.000000           0        18           close
  0.00    0.000000           0         8           fstat
  0.00    0.000000           0         8         8 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           getdents
  0.00    0.000000           0        10           mmap
  0.00    0.000000           0         4           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3         1 stat
  0.00    0.000000           0         1           brk
  0.00    0.000000           0         1           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1           ioctl
  0.00    0.000000           0         1         1 clone
  0.00    0.000000           0         1           set_thread_area
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000106                   101        10 total

This prints the time share, total time and count for each syscall, sorted by time spent. We can quickly see that read dominated, followed by open, close and so on.

Very useful for high level analysis.
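If you have saved a raw trace to a file, you can build a similar tally yourself. A rough Python sketch (the regex covers only the simple line format shown above, not strace's full output grammar):

```python
import re
from collections import Counter

# Each plain strace line begins with the syscall name and an opening parenthesis.
SYSCALL_RE = re.compile(r"^(\w+)\(")

def count_syscalls(lines):
    """Tally syscall names from raw strace lines, roughly like strace -c
    (counts only, with no timing columns)."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample lines in the format shown above.
sample = [
    'openat(AT_FDCWD, ".", O_RDONLY) = 3',
    "getdents(3, /* 3 entries */, 32768) = 48",
    "getdents(3, /* 0 entries */, 32768) = 0",
    "close(3) = 0",
]
print(count_syscalls(sample).most_common())
```

In practice `strace -c` is the right tool; a hand-rolled tally like this is mainly useful when you want custom grouping over an already-saved log.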

Timing Calls with -T

To time how long each call takes, use -T; it appends the elapsed time in angle brackets. Combined here with -t for timestamps:

strace -tT ls -l

18:10:33 open("." <unfinished ...>
18:10:33 <... open resumed> )      = 3 <0.000010>
18:10:33 getdents(3, /* 3 entries */, 32768) = 48 <0.000012>
18:10:33 getdents(3, /* 0 entries */, 32768) = 0 <0.000008>
18:10:33 close(3)                       = 0 <0.000007>
18:10:33 open("lib64", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 <0.000008>

Now each call such as open and getdents above carries its elapsed time in angle brackets, alongside its timestamp. This lets you identify any laggy operations.
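The elapsed-time suffix is easy to post-process. A hedged sketch that flags calls slower than a threshold (the pattern assumes the simple line endings shown above):

```python
import re

# With -T, each line ends with the elapsed time in angle brackets,
# e.g. "close(3) = 0 <0.000007>".
ELAPSED_RE = re.compile(r"<(\d+\.\d+)>\s*$")

def slow_calls(lines, threshold=0.01):
    """Return (elapsed_seconds, line) pairs slower than the threshold,
    slowest first."""
    found = []
    for line in lines:
        match = ELAPSED_RE.search(line)
        if match and float(match.group(1)) > threshold:
            found.append((float(match.group(1)), line))
    return sorted(found, reverse=True)

sample = [
    "close(3) = 0 <0.000007>",
    "connect(5, ..., 16) = 0 <10.002398>",
]
for elapsed, line in slow_calls(sample):
    print(f"{elapsed:.6f}s  {line}")
```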

Increase Trace Verbosity with -v

Add the -v flag for unabbreviated output:

strace -v ls -l

With -v, strace prints structures such as stat buffers and environment lists in full rather than abbreviating them.

This helps dig deeper during investigations but can result in a flood of data.

Trace Child Processes with -f

Applications often create child processes. -f ensures these are traced as well:

strace -f node app.js 

Now you can debug the full lifecycle including all forked processes.
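To experiment with -f, you need a program that actually forks. A minimal Python target (hypothetical, purely for practice) that you could trace with strace -f:

```python
import os

# A tiny fork target for practicing `strace -f`: without -f you only see
# the parent's syscalls; with -f the child's appear too, tagged by PID.
pid = os.fork()                      # shows up as clone(2) in the trace
if pid == 0:
    os._exit(7)                      # child: exit_group(2) with status 7
_, status = os.waitpid(pid, 0)       # parent: wait4(2)
print("child exit status:", os.WEXITSTATUS(status))
```

Run it as `strace -f python3 fork_demo.py` (filename arbitrary) and compare against the same command without -f.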

Filtering Output with -e

One of strace's most useful options is -e. It traces only the specific system calls you care about.

For example, to trace just file related activity:

strace -e trace=file ls -l

Many categories are available for tracing:

  • file – File-related calls such as open, close, rename
  • process – Process management calls such as clone, execve, wait
  • network – Socket calls such as socket, connect, send, recv
  • signal – Signal-related calls such as kill, rt_sigaction, sigreturn
  • ipc – Interprocess communication calls
  • desc – File descriptor handling
  • memory – Memory mapping calls such as brk, mmap, mremap

You can even trace by exact call name. This traces only open and close (note that on modern systems ls typically calls openat rather than open):

strace -e trace=open,close ls -l

The -e trace=all option traces every single call if needed.

This filtering transforms strace from a firehose to a precise debugging laser beam.

Attaching to Running Processes with -p

The -p option attaches strace to an already running process by PID:

strace -p 2346

Then detach again with Ctrl + C – the traced process keeps running unaffected.

No need to restart anything – perfect for production debugging.

Tracing Multiple Processes with -f and -p

Combine -f and -p to trace a program launch and subsequent child processes:

strace -f -p $(pidof php-fpm)

This attaches to php-fpm and all of its children: pools, workers and so on. Powerful stuff.

Output to a File with -o

To save strace output for later analysis:

strace -o /tmp/trace.log <command>

This keeps your terminal clutter-free, which is especially helpful with high-volume options such as -f.
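A saved log also makes it easy to pull out failing calls, which strace reports as a -1 return value followed by an errno name. A rough sketch (the pattern is deliberately simplified):

```python
import re

# Failing syscalls are reported as "= -1 ERRNO (description)", e.g.
# open("/missing", O_RDONLY) = -1 ENOENT (No such file or directory)
ERROR_RE = re.compile(r"= -1 ([A-Z]+)")

def failed_calls(lines):
    """Yield (errno_name, line) for every failing call in a saved trace."""
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            yield match.group(1), line

sample = [
    'open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)',
    "close(3) = 0",
]
for errno_name, line in failed_calls(sample):
    print(errno_name, "->", line)
```

Running it over a real trace.log (one line per list entry) quickly surfaces missing files, permission problems and similar errors.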

Time of Day Output with -t

Tag every line with the time of day using -t:

strace -t curl google.com

10:30:01 open("/etc/host.conf", O_RDONLY|O_CLOEXEC) = 3
10:30:01 read(3, "# The \"order\" line is only used by old versions of the C library.\n"..., 179) = 179
10:30:01 close(3)                       = 0

Helps create a detailed timeline of all traced activity.

Putting it All Together: Strace One-Liners

Here are some handy one-liner examples combining options:

Trace all child processes of PID 2423 and time each system call:

strace -fT -p 2423

Count calls made by a Node app writing to a file:

strace -c -e trace=file node test.js > /dev/null 

Full verbose trace of everything Bash does, saving output to a file:

strace -fv -o bash.log bash

And many more combinations are possible. Chain those options!

Using Strace to Debug Issues

Let's go through some real-world examples where strace can help identify or troubleshoot software problems.

An Application Crash

When apps suddenly crash without explanation, strace often provides clues.

As a basic example, let's force Node.js to die from a segmentation fault by delivering the signal to our own process (a pure JavaScript mistake, such as reading a property of undefined, only raises a TypeError that Node reports cleanly):

// index.js
process.kill(process.pid, 'SIGSEGV'); // deliver SIGSEGV to ourselves

Running this straight in Node shows little:

> node index.js
Segmentation fault

Hmm, not too helpful. Let's invoke strace:

$ strace -fT node index.js > /dev/null

... 

rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000038>
rt_sigaction(SIGSEGV, {sa_handler=0x7ff494701420, sa_mask=[SEGV], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7ff4947c09d0}, NULL, 8) = 0 <0.000030>
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000028>
kill(31502, SIGSEGV)                    = 0 <0.000024>
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_USER, si_pid=31502, si_uid=1000} ---
+++ killed by SIGSEGV +++
Segmentation fault

Aha! We can see the exact signal that killed the process (SIGSEGV, a segmentation violation) and, in the siginfo block, where it came from: si_code=SI_USER means it was delivered with kill. A genuine bad memory access would instead show si_code=SEGV_MAPERR along with the faulting address in si_addr.

This pinpoints the crash location for easy debugging. Stack traces, core dumps and other tools can then reveal additional details about the crashing code.

Let's try another real-world example…

High Service Latency

If an application suddenly slows down, strace helps narrow down the cause:

$ curl https://myapp.com
<hangs for 10 seconds>
{
   "status": "ok"
}

Uh oh, our service is hanging! We run strace:

$ strace -ttT curl https://myapp.com

16:03:11.042118 connect(5, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("192.0.2.22")}, 16) = 0 <10.002398>
16:03:21.044663 getsockname(5, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("192.0.2.22")}, [16]) = 0 <0.000009>
16:03:21.044712 send(5, "GET / HTTP/1.1\r\nHost: myapp\r\nUser-Agent: curl\r\nAccept: */*\r\n\r\n"..., 91, 0) = 91 <0.000046>

Look at that! The 10 second delay clearly happened inside connect.

We've quickly identified socket operations as the culprit rather than disk, CPU or something else. Further debugging of the OS networking stack can now explain the stall.

This again demonstrates strace's power to pinpoint issues accurately.

Pros and Cons of Strace

Pros

  • Robust debugging of process activity
  • Identifies software crashes and errors
  • Locates sources of latency
  • Works on any app without rebuilding
  • Needs no special kernel modules or configuration

Cons

  • Very verbose output takes practice to interpret
  • Noticeably slows the traced process, since every syscall triggers a ptrace stop
  • Heavily multithreaded apps produce interleaved, hard-to-follow traces
  • Traced data buffers can expose sensitive information such as credentials
  • Mapping addresses back to code (e.g. with -k stack traces) requires debug symbols

So in summary – strace gives you incredible visibility into processes, but also requires some expertise using it.

Strace Alternatives

There are other Linux tools that fill similar roles to strace:

  • ltrace – Traces library calls instead of system calls. Handy for seeing inside app code rather than the kernel interface.
  • dtrace, SystemTap and eBPF-based tools such as bpftrace – More advanced tracing frameworks with custom logic and output formatting.
  • perf – Low level performance analysis and sampling profiler.
  • Process monitor tools – GUI tools to inspect processes and resources.

However, strace remains the simplest way to get started troubleshooting on Linux.

Conclusion

Strace is one of Linux's most valuable power tools for debugging mysterious application issues.

It sheds light on process internals by tracing all underlying system calls. This pinpoints crashes, slowdowns and other odd behavior.

With some practice, you'll be shocked how quickly strace homes in on problems. It's a universal debugger that requires no code changes or app restarts.

Mastering strace takes time because of the sheer detail it exposes. Start by tracing small commands to learn the output format and the available options. With experience, you'll be able to analyze any misbehaving process on your system.

So whenever things go wrong – reach for strace! It should provide the clues you need to resolve even the trickiest issues that erupt in production.
