Core dumps provide an invaluable glimpse into the state of a crashed Linux process. When enabled and configured properly, these snapshot files can empower developers to diagnose elusive software defects. This comprehensive guide covers core dump fundamentals, enabling unlimited dumps, customizing output, analyzing files, managing storage, and disabling when finished.

Core Dump Basics

A core dump is essentially a memory snapshot recorded when a Linux process crashes unexpectedly. The system attempts to write out the entire contents of the process's address space along with other contextual metadata.

Developers can use this post-mortem data to reconstruct program state right before the failure occurred. Armed with this information, the root cause of many complex bugs can be identified and fixed.

Some key characteristics of core dump files:

  • Contain complete memory contents – stack, heap, mapped files, etc
  • Include process metadata – registers, signal numbers, pid, environment variables
  • Often quite large – megabytes up to the size of a process's full address space
  • Written out to local disk by the kernel at crash time

The main use case is examining dumps with the GNU debugger (GDB) to pinpoint defects in custom software. Commercial Linux distributions have other diagnostics tools as well, like Automatic Bug Reporting Tool (ABRT) on Red Hat.

Overall, core dumps provide a precious forensic record for post-crash analysis. Let's explore how to enable and leverage them.

Security Implications

While immensely useful for diagnostics, core dump files also introduce security risks that must be considered. Since dumps contain raw memory snapshots, many types of sensitive information could get exposed:

  • Application passwords and keys
  • User credentials or personal data
  • Code injection opportunities allowing exploits
  • Reconnaissance for focused software attacks

For example, a core file from a crashed web server could reveal SSL private keys, database credentials, and proprietary application logic. Exposing this data, whether intentionally or accidentally, could significantly compromise security.

Administrators should carefully safeguard core dumps similar to other forensic artifacts like network traffic captures and logs:

  • Encrypt files at rest via filesystem protections
  • Restrict access permissions to key staff
  • Rapidly copy core dumps off vulnerable servers
  • Scrub sensitive information during analysis

Finding the right balance between security and debuggability helps realize the immense value of cores safely.

Checking Core Dump Status

The first step is verifying whether core dumps are currently allowed on your Linux distribution. The ulimit shell builtin offers an easy way to check the per-process limit:

ulimit -c

This will print the maximum core file size a process may write. A value of 0 indicates dumping is completely disabled, while "unlimited" removes the size cap entirely. Note that bash reports this value in 512-byte blocks, not kilobytes.
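Since bash reports the -c limit in 512-byte blocks, a small sketch for converting the reading into bytes (the helper name is our own, not a standard tool):

```shell
# Convert a `ulimit -c` reading to bytes.
# bash reports the -c value in 512-byte blocks; "unlimited" means no cap.
core_limit_bytes() {
  if [ "$1" = "unlimited" ]; then
    echo "unlimited"
  else
    echo $(( $1 * 512 ))
  fi
}

core_limit_bytes "$(ulimit -c)"
```

A reading of 8, for instance, corresponds to 4096 bytes.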

Here is sample disabled output on an Ubuntu desktop:

user@desktop:~$ ulimit -c
0

Let's enable core dumps next.

Enabling Dumps with ulimit

The simplest way to permit core dumps is using ulimit again:

ulimit -c unlimited   

This raises the per-process core size limit (RLIMIT_CORE) to unlimited for the current shell and any processes it launches. Crashes from that point on will automatically generate dump files.

Let's confirm the change was applied:

user@desktop:~$ ulimit -c
unlimited  

Success! The ulimit setting applies only to the current shell process though. To enable system-wide, we need to modify service configuration files.

Global Configuration with limits.conf

The /etc/security/limits.conf file controls the resource limits applied (via PAM) when users log in. We can add a directive here to permit core dumps globally:

# Enable core dumps for all users   
* soft core unlimited

The asterisk matches all users and groups. Log out and back in (or reboot) for the new limit to take effect.
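One caveat worth knowing: limits.conf is applied through PAM at login, so on systemd-based distributions it does not affect system services. For services, the core limit is set in systemd itself; a hedged sketch of the relevant directives:

```
# /etc/systemd/system.conf — default for all systemd-managed services
DefaultLimitCORE=infinity

# Or per service, inside a unit file's [Service] section:
# LimitCORE=infinity
```

Run `sudo systemctl daemon-reexec` (or reboot) after editing so the new defaults are picked up.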

Now we are ready to start generating crash dumps!

Customizing Core File Locations and Permissions

By default, Linux just writes core dump files to the current working directory, which can be inconvenient for analysis:

  • Clutters application folders with large forensic files
  • May lack available storage space depending on filesystem
  • Data only accessible to dumping process owner

We can customize the output location and naming using the /proc/sys/kernel/core_pattern file. The stock default is simply "core" in the working directory; view the current pattern (this example host uses a custom one):

cat /proc/sys/kernel/core_pattern
/var/crash/core_%e_%p_%t

The special parameter placeholders control naming and placement:

%e Executable filename
%p Process PID
%P Global PID (initial namespace)
%u Real user ID
%g Real group ID
%s Signal that triggered the dump
%t Timestamp (seconds since the epoch)
%h Hostname

To store core dumps under /srv/cores:

sudo sysctl -w kernel.core_pattern="/srv/cores/core.%e.%p.%h" 
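To sanity-check a pattern before a real crash, a small preview sketch (the sample executable name, PID, and hostname are invented for the demo):

```shell
# Preview how a core_pattern template expands.
# Arguments: template, %e value, %p value, %h value (all assumed samples).
expand_pattern() {
  printf '%s\n' "$1" | sed -e "s/%e/$2/" -e "s/%p/$3/" -e "s/%h/$4/"
}

expand_pattern "/srv/cores/core.%e.%p.%h" "crash" "1234" "desktop"
# -> /srv/cores/core.crash.1234.desktop
```

The kernel performs the real expansion at crash time; this helper only mimics it for the three placeholders shown.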

The kernel writes core files with owner-only permissions; broader access is controlled through the ownership and mode of the output directory. The fs.suid_dumpable sysctl additionally governs whether setuid programs dump at all.

Centrally storing cores simplifies analysis and management greatly.

Remote Core Dump Storage

For servers and capacity-constrained devices, consider redirecting core dumps to a remote storage host instead. This avoids local disk usage while retaining debuggability.

Common approaches include:

  • NFS Mount – Simple but may lose data on network failure
  • SSH Pipe – stream dumps securely to another host via a core_pattern pipe handler
  • Syslog-ng Forward – Reliable syslog protocol handling

Here is an example NFS mount in /etc/fstab:

nfs-host:/cores    /var/cores        nfs   defaults        0 0

And syslog-ng configuration:

destination d_cores { file("/cores/${HOST}/core.${PID}"); };
log { source(src); destination(d_cores); };

Consider the reliability, security, and scalability needs before picking a network storage strategy.

Intentional Crash Demonstration

To see core dumps in action, we need a program to crash intentionally. Here is a simple demo app in C that segfaults:

#include <stdio.h>

int main(int argc, char *argv[]) {
  char* p = NULL;
  *p = 'x'; // segfault on null pointer dereference

  return 0; 
}

After compiling with debug symbols (gcc -g -o crash crash.c), let's run it and check for a core file:

user@desktop:~$ ./crash
Segmentation fault (core dumped)

user@desktop:~$ ls -l core*
-rw------- 1 user user 1421056 Feb 12 16:20 core

We have a full-featured core dump file ready for analysis!

Inspecting Crash Dumps with GDB

The GNU debugger utility (gdb) can parse Linux core dumps to uncover crash root causes. Launching gdb on our example:

gdb ./crash core  

This loads debugging symbols from the crashed executable and attaches the core dump contents. We can now investigate with standard commands like backtrace and info registers:

Core was generated by `./crash'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000400526 in main (argc=1, argv=0x7ffeec2555e8) at crash.c:6
6         *p = 'x'; // segfault on null pointer dereference
(gdb) info registers     
rax            0x4004e6 4195974  
rbx            0x0  0
rcx            0x4004e0 4195968
rdx            0x7ffeec2555e8   140737349210312 
rsi            0x1  1
rdi            0x1  1  
rbp            0x7ffeec2555c0   0x7ffeec2555c0
rsp            0x7ffeec2555c0   0x7ffeec2555c0 

Powerful! With some gdb skill, we can pinpoint many crash causes this way.

Generating Userspace Dumps

The gcore utility offers another option to explicitly dump process memory contents. This works without a crash for debugging running programs:

gcore -o test.core $PID   # writes test.core.$PID

GDB can analyze the userspace dump similarly (with $EXECUTABLE standing in for the program's binary):

gdb $EXECUTABLE -c test.core.$PID

Live snapshots with gcore/gdb can be more convenient than waiting for a crash to produce a dump.

Centralized Crash Analysis Tools

While gdb allows deep analysis, production environments often deploy centralized crash capture tools:

  • ABRT – Automatic Bug Reporting Tool on RHEL
  • Kdump – kexec-based kernel crash dump mechanism

These monitor for crashes across the system and typically provide:

  • Standardized configuration for dump handling
  • Automatic remote transport off-host
  • Deduplication of related crashes
  • Web and CLI search across dumps
  • Triage ranking and impact analysis
  • Integrations with issue trackers

Simplifying core dump wrangling is where these centralized tools shine!

Disabling Core Dumps

Once root-cause analysis is complete, best practice is disabling core dumps again. This prevents needless disk usage and long-term information leakage.

First, set the per-process ulimit back to 0:

ulimit -c 0   

This disables dumping for your current shell.

To apply globally, edit limits.conf:

# Disable core dumps for all users
* soft core 0
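For defense in depth, privileged programs can be barred from dumping even when dumps are otherwise enabled. A sketch of a persistent sysctl fragment (the file path assumes a distribution that reads /etc/sysctl.d):

```
# /etc/sysctl.d/50-coredump.conf
# Never dump setuid/privileged processes, even when dumps are enabled generally
fs.suid_dumpable = 0
```

Apply without rebooting via `sudo sysctl --system`.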

The crash safety net is turned off until needed again!

Core Dump Alternatives

While full memory core dumps are a goldmine for deep diagnostics, other lightweight crash analysis options exist:

Kernel Ftrace

A kernel oops or panic records the call stack at the point of failure, and ftrace can capture similar call traces from a running kernel:

#0 [ffff8800001e6f98] store_stack_trace+0x43/0x50
#1 [ffff8800001e6ce8] dump_stack+0x63/0x70
#2 [ffff8800001f352c] panic+0xc4/0x22f
#3 [ffff8800019f0625] bug_handler+0x95/0xa0
#4 [ffff8800019e7c06] __die+0x36/0x40
#5 [ffff8800019e7c3b] do_trap+0x8b/0xb0
#6 [ffff8800019e7d5a] do_error_trap+0xaa/0xb0
#7 [ffff8800019e5452] end_repeat_nmi+0x52/0x60   

Less information than cores but very lightweight.

X86 Branch Records

Modern Intel CPUs can log branch target addresses (Last Branch Records) to reconstruct the code flow leading up to a crash:

...
0x4c27f3: jge 0x4c27fb (A)  
0x4c27f7: jmpq *0x10(%rax) (C)
0x4c2800: nopl 0x0(%rax)
0x4c280a: jmp 0x4c27fa  

Last Branch Record: 0x4c27f7: jmpq *0x10(%rax) (C)

Again limited context but low overhead.

Understanding pros and cons of methods helps apply the right tooling.

Common Crash Failure Modes

While crashes manifest in countless ways, several common programming defects show up routinely:

Double Free Corruption

Freeing heap memory twice corrupts internal allocators:

Memory block 1234 freed twice! first free: 0x27ff3b second: 0x29d812
[1]    29977 abort (core dumped)  ./leak

glibc's heap allocator can detect the second free and abort, as shown above.

Use After Free

Dereferencing a pointer to already-freed memory is undefined behavior:

*ERROR* invalid chunk pointer xffbf87ffbf8*
Chunk got overwritten!?
*ERROR* invalid chunk pointer xffbf87ffbf8*
Chunk got overwritten!?  

Heap protection tools detect invalid references.

Stack Overflows

Overwriting the stack with uncontrolled input often redirects execution to garbage addresses, so these crashes land in seemingly random places:

#0  0x77616164 in ?? ()
#1  0xffffd6b3 in ?? ()
#2  0xffffd6cc in ?? () 
#3  0xffffd6e5 in ?? ()

Look for return addresses that have been replaced by bytes from the overflowing input buffer.

Knowing patterns helps narrow root causes quicker.

Troubleshooting Missing Core Dumps

If core dumps don't appear on crashes as expected, several common configuration issues could be the culprit:

Limits Problems

Check the per-process and global limits allow cores:

ulimit -c 
cat /proc/sys/kernel/core_pattern  

If disabled, enable as shown earlier.

Disk Space Exhaustion

Core dumps can be large files – ensure free space is available:

df -h /var/cores

The output directory needs capacity for crash dumps.

Permissions Issues

The process must be able to write the core pattern location:

ls -ld /var/cores 
drwx------ 2 root root 126 Feb 12 16:21 /var/cores

Grant the dumping user write access (for example via group membership and group write) if it is missing.
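A quick pre-flight check for the output directory can be scripted; a sketch (the helper name is our own, and the demo runs against a throwaway directory rather than a real core path):

```shell
# Verify a core output directory exists and is writable before trusting
# the kernel to dump there.
check_core_dir() {
  [ -d "$1" ] || { echo "missing: $1"; return 1; }
  [ -w "$1" ] || { echo "not writable: $1"; return 1; }
  echo "ok: $1"
}

# Demo against a temporary directory; substitute your core_pattern directory.
check_core_dir "$(mktemp -d)"
```

Run the same check against the directory named in kernel.core_pattern whenever dumps fail to appear.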

Knowing common failure modes accelerates problem identification and remediation.

Conclusion

Mastering core dump analysis is a key Linux debugging skill. When utilized properly, memory snapshots unlock deep diagnostics other systems can rarely provide. We covered enabling dumps, customizing output, reviewing files, managing security, and disabling when finished.

With practice across real-world software crashes, the core file workflow will solve many seemingly impossible "hidden" bugs!
