As a Linux system administrator or developer, application crashes and core dumps are an inevitable reality. While infrequent on stable systems, knowing how to configure and manage core dumps effectively is essential.

This comprehensive 4-part guide aims to demystify Linux core dump management for professionals supporting enterprise workloads.

An Introduction to Linux Core Dumps

First, a quick refresher – what constitutes a core dump? Upon unexpected termination of a Linux or UNIX process, the kernel can save an image of the application's address space memory to disk. This core dump file records the state of execution at the precise instant of the crash.

Core files provide valuable clues to developers when diagnosing application crashes in production:

  • Register and stack contents pinpoint the disrupted code flow
  • Variable values isolate unexpected data states
  • Dependency versions identify module conflicts

By analyzing this forensic data and reproducing bugs in staging, developers can quickly issue patched builds for deployment, shortening the time needed to restore service.

On modern Linux systems, systemd-coredump handles collecting and processing generated core dumps via the /proc/sys/kernel/core_pattern mechanism.
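
You can check which handler is active on a given machine by reading that file (a quick sketch; the exact handler string varies by distribution):

```shell
# Show the current core dump destination or pipe handler.
cat /proc/sys/kernel/core_pattern
# On systemd-coredump systems this usually starts with a "|" pipe, e.g.:
# |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
```

A leading `|` means the kernel pipes crashing process memory to that program rather than writing a file directly.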

Fun fact: Core dump files get their name from the magnetic-core memory used in early computers. When a program failed, the entire raw contents of core memory was literally dumped to disk or printout for inspection!

Now let's examine how to configure where dumps are stored and how they are handled at scale.

Core Dump Storage and Formats

The Linux kernel supports multiple formats for application core files written to disk. Each has tradeoffs to consider for stability, space efficiency, and portability.

Raw ELF Dumps

The default format is plain Executable and Linkable Format (ELF). This universally supported UNIX format contains a full memory image of the crashed process, laid out like an executable file.

For short-lived processes, ELF core files provide maximum detail for developers to debug with. But the uncontrolled binary size presents challenges:

  • Disk partitions may rapidly fill up under load, impacting other services
  • Dumping terabytes of memory for a database process is unfeasible
  • Lack of context makes it hard to match dumps to crashes
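
As a quick sanity check, standard ELF tooling can confirm what a raw core actually contains (core.1234 here is a hypothetical file name):

```shell
# Identify the dump and the program that produced it.
file core.1234            # e.g. "ELF 64-bit LSB core file, ... from './myapp'"
# List the memory segments that were captured in the dump.
readelf -Wl core.1234 | head -20
```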

Kernel Crash Dumps via kdump

For kernel crashes, Linux provides the kdump mechanism. A small region of RAM is reserved at boot; when the running kernel panics, a minimal capture kernel boots from that reservation and writes the crashed kernel's memory image (a vmcore) to disk, typically filtered and compressed by makedumpfile.

Smaller filtered dumps allow retaining more history:

  • makedumpfile can exclude free, zero-filled, and user-space pages, sharply reducing storage volume
  • Kernel crash dumps can still exceed 1GB each on large systems
  • Manual correlation of dumps to incidents can still be painful

For kernel debugging, kdump strikes a reasonable balance, but long-running user-space applications still require heavy storage.

KVM Guest Dumps

For virtualized workloads, QEMU/KVM offers a host-side alternative: virsh dump (backed by QEMU's dump-guest-memory command) can capture a guest's entire memory, optionally in a compressed kdump-style format:

  • Compressed dumps are typically a fraction of raw ELF size, allowing longer retention
  • Capture works from the host even when the guest itself is wedged
  • Dumps can be copied off-host for remote analysis with crash or gdb
  • The extra indirection can increase crash-recreation effort
In all cases, disk speed determines how long an application stays paused while a large dump is written, and a full disk can trigger stability issues long before retention limits are reached.

This requires planning coordinated log rotation for crashes and core dumps in unison – which we will cover later.

Now let's explore where Linux actually allows saving these dumps.

Configuring Core Dump Storage Locations

The Linux kernel's /proc/sys/kernel/core_pattern exposes a file that dictates where core dumps are written on disk. By default this is simply set to core, yielding a file named core (or core.1234 when kernel.core_uses_pid is enabled) in the crashed program's current working directory.

But this approach leads to disorganization at scale and hinders security:

  • Core dumps scattered across different paths need manual cleanup
  • Sensitive data may leak into globally readable directories
  • Resource exhaustion can take down unrelated applications

Instead, we can specify a custom dump directory with tighter restrictions:

Setting the Core Location Path

Create a dedicated crash directory, a coredump group, and group permissions first:

sudo groupadd -f coredump
sudo mkdir -p /var/crash
sudo chgrp coredump /var/crash
sudo chmod g+rwxs /var/crash

Next configure the kernel to point there using sysctl:

sudo sysctl -w kernel.core_pattern=/var/crash/core.%h.%e.%p.%t

Now crashed process memory will be collected together safely under /var/crash. (Note that this replaces any previously registered pipe handler, such as systemd-coredump.)

The file name includes these helpful identifiers:

  • %h: Hostname where dump originated
  • %e: Executable filename
  • %p: Process ID
  • %t: Timestamp

Contextual naming eases correlating dumps to exact crash events.
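
For illustration, here is the path the pattern above would produce for a hypothetical crash of myapp (pid 4321) on host web01 at epoch time 1700000000:

```shell
# Expand the %h.%e.%p.%t specifiers by hand for one example crash.
host=web01; exe=myapp; pid=4321; ts=1700000000
echo "/var/crash/core.${host}.${exe}.${pid}.${ts}"
# -> /var/crash/core.web01.myapp.4321.1700000000
```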

Restrict Core Dump Permissions

Since memory dumps contain sensitive application data, take care to configure file permissions for security.

Depending on kernel version and umask, dumps may be created with permissive modes such as 0644 (-rw-r--r--), owned by the crashing process user. Restrict access using a dedicated group instead:

  • Set the setgid bit on the directory: chmod g+s /var/crash
  • Revoke 'other' permissions: chmod o-rwx /var/crash
  • Tighten dumped core files to group-readable only: chmod 0640 /var/crash/core.*

Now production core dumps remain isolated from unauthorized users by policy.
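
Put together, the hardening steps above amount to this sketch (shown against /var/crash; mode 2770 combines the setgid bit with group read/write and no 'other' access):

```shell
# Directory: setgid so new files inherit the group, no access for others.
sudo chmod 2770 /var/crash
# Existing dumps: readable by owner and the coredump group only.
sudo find /var/crash -type f -name 'core.*' -exec chmod 0640 {} +
```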

Retaining Cores Within Capacity

To avoid filling disks or retaining overly large dumps:

  1. Set quota limits on the crash partition (requires quotas enabled on /var):

     sudo setquota -g coredump 4G 4G 0 0 /var

    A 4GB soft/hard block cap protects the partition, though large-memory systems may need more headroom for even a single dump.

  2. Enable auto-cleanup of older core dumps:

     sudo yum install cronie
     crontab -e

    Add this auto-delete crontab:

     @daily find /var/crash -type f -mtime +7 -delete

Now roughly one week of core dump history is retained automatically.
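
Before trusting the cron job with -delete, it is worth previewing what it would remove:

```shell
# Dry run: print files older than 7 days instead of deleting them.
find /var/crash -type f -mtime +7 -print
```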

Handling Linux Crashes Gracefully

Writing core dumps to disk gives you only one piece of the debugging puzzle. Actively processing every crash event is equally important.

Linux systems include two common interception mechanisms for handling crashes that generate core dumps – abrt and systemd-coredump.

ABRT Core Handling

On RHEL, CentOS, and Fedora – the Automatic Bug Reporting Tool (abrt) is enabled by default. Abrt was created by Red Hat to ease bug reporting from enterprise Linux systems.

It hooks application crashes system-wide and bundles various forensic data like cores, stack traces, and relevant journal logs. This aggregated content gets written into /var/spool/abrt/ccpp-* crash directories.

For example, decoding a sample directory name:

/var/spool/abrt/ccpp-2022-01-01-10:03:25-3652  

timestamp: 01/01/2022 10:03:25 AM
pid: 3652

Reveals the process details.
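
The naming convention is regular enough to script against; this hypothetical snippet splits a directory name back into its timestamp and PID using plain shell parameter expansion:

```shell
# Decompose ccpp-YYYY-MM-DD-HH:MM:SS-PID into its parts.
dir="ccpp-2022-01-01-10:03:25-3652"
pid=${dir##*-}                 # everything after the last dash
stamp=${dir#ccpp-}             # strip the ccpp- prefix...
stamp=${stamp%-"$pid"}         # ...then strip the trailing -PID
echo "pid=$pid stamp=$stamp"
# -> pid=3652 stamp=2022-01-01-10:03:25
```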

These crashes can then be automatically reported to a centralized bug tracking server for analysis by developers.

Customizing Abrt Behavior

Administrators can tune the details of abrt crash handling in /etc/abrt/abrt.conf and the plugin files under /etc/abrt/plugins/ (option names vary between versions), for example:

  • To cap the total space used by crash directories (in MB):

      MaxCrashReportsSize = 4096
  • To run a custom script when crashes occur, add an event rule under /etc/libreport/events.d/:

      EVENT=post-create analyzer=CCpp /path/to/myscript.sh

See the abrt configuration documentation for more examples.

Integrating abrt with internal issue tracking systems allows streamlining incident response workflows in enterprises.

Systemd-Coredump Processing

Alternatively, on many modern systemd-based distributions (Fedora, Arch, openSUSE, and others), core dumps are handled by a systemd service called systemd-coredump. It has less automation than abrt out of the box but offers different benefits:

  • Hooks from systemd service files for richer orchestration
  • Journal integration for builtin event auditability
  • Flexible export to external crash database services

For example, /etc/systemd/coredump.conf controls how dumps are captured and stored:

[Coredump]
Storage=external
Compress=yes
ProcessSizeMax=2G
ExternalSizeMax=2G

And a oneshot unit can fan out notifications on crashes (paths here are illustrative; it could be triggered by a path unit watching /var/lib/systemd/coredump):

[Unit]
Description=Alert on new core dumps

[Service]
Type=oneshot
ExecStart=/path/to/alertscript

In general, systemd integration offers administrators more composable tooling around core dumps where security auditing is valued over automated bug reporting.
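
Day to day, the companion coredumpctl tool is the main interface to dumps captured by systemd-coredump (the PID below is illustrative):

```shell
coredumpctl list                 # show captured crashes, newest last
coredumpctl info 3652            # metadata and stack trace for one crash
coredumpctl debug 3652           # open the dump directly in gdb
```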

Now let's tackle identifying and recovering from common core dump issues.

Troubleshooting Core Dumps on Linux

Despite best efforts configuring Linux core dumps, you may still encounter issues around stability or missing crash data. Here are some common patterns and solutions.

Core Files Not Generated

If expected core files don't appear during crashes:

  1. Check that system limits allow core dumps

     ulimit -c
    
     # If limit is 0
     ulimit -c unlimited
  2. Verify the core pattern path is writable

     sudo touch /var/crash/test
     sudo rm /var/crash/test 
  3. Check for disk space exhaustion

     df -h /var/crash
  4. Check kernel and journal logs for clues

     dmesg | grep -i segfault

Setting system limits too low or lacking write access are common misconfigurations hindering dumps.
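
A quick way to validate the whole pipeline is to deliberately crash a throwaway process and look for the resulting dump (run from a scratch directory; on systems where core_pattern pipes to a handler, check that handler's storage instead):

```shell
# Allow dumps for this shell, crash a sleeping child, then look for a core.
ulimit -c unlimited
( sleep 30 & kill -SEGV $!; wait ) 2>/dev/null || true
ls core* 2>/dev/null || echo "no local core - check core_pattern and limits"
```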

Truncated Core Files

If core files are unusually small given a process's normal memory footprint:

  1. Double check ulimit settings (-c unlimited)

  2. For setuid programs, check the fs.suid_dumpable sysctl; by default such processes are not dumped at all

     cat /proc/sys/fs/suid_dumpable
  3. Ensure the hard core-size limit is well above expected RAM usage

     ulimit -Hc
     # raise it via /etc/security/limits.conf if needed

Checking the dump policy for privileged processes and raising size limits increases the likelihood of fully sized memory dumps.

Core Files Disappearing

If core dumps work intermittently but seem to disappear over time:

  1. Review cleanup scripts for overly aggressive find/delete cycles

  2. Monitor the crash directory space usage for spikes

    watch df /var/crash
  3. Log disk usage over time to identify unusual patterns

     # e.g. from cron, append an hourly snapshot
     df -h /var/crash >> /var/log/coreusage.log
  4. Check SELinux policies and access errors

    grep crash /var/log/audit/audit.log

Careful collaboration with storage admins can identify issues around disk latency, quotas, and access controls impacting reliability.

By combining tools to chart, audit, and alert on core dump activity, you can pinpoint systemic data loss issues as they emerge.

Now let's wrap up by solidifying configuration changes and exploring further extensions.

Persisting Configuration Across Reboots

By now we have set non-default options for the core dump storage path, permissions, compression, retention policies, and more. However, all of these sysctl and ulimit tweaks will be reset after the next server reboot.

To persist settings permanently:

  1. Update /etc/sysctl.conf

     # Core Dump Configuration
     kernel.core_pattern = /var/crash/core.%h.%e.%p.%t
     kernel.core_uses_pid = 1
     # max concurrent crashing processes piped to a handler
     kernel.core_pipe_limit = 4

     # System limits
     fs.file-max = 65536
  2. Append any other related limits in /etc/security/limits.conf

    * soft core unlimited
    * hard core unlimited
  3. Reboot the server and verify settings stuck

    sysctl -a | grep core

Now critical availability and debugging infrastructure around crash collection will endure across both planned maintenance and unexpected outages.
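
There is no need to wait for a maintenance window to confirm the changes; sysctl can load the file immediately:

```shell
sudo sysctl -p                  # apply /etc/sysctl.conf now
sysctl kernel.core_pattern      # spot-check that one key took effect
```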

Further Extensions

With core dumps now archived and handled gracefully, some further extensions can improve productivity:

  • Centralized federation – Sync compressed dumps efficiently to a single remote server for access by distributed teams
  • Automated triage – Ingest dumps into existing ITSM or development tools like ServiceNow or JIRA for assignment and tracking
  • Tagging failures – Many cores may map to a single bug or event – attach shared metadata tags
  • Refined alerts – Develop smarter core count or size thresholds tailored to application profiles
  • Developer sandbox – Streamline tools like gdb with configuration presets and scripted workflows

What other innovations help unlock core file value for your teams?

In Summary

Effectively harvesting Linux application crashes is crucial for maintaining production service resilience and accelerating issue diagnosis. Configuring flexible core dump storage and handlers unlocks transparency across heterogeneous enterprise environments.

With these expanded examples and techniques, Linux professionals can further explore customizing core dumps unique to the scale, performance, and security demands of their organization.

Now go enable richer crash forensics today across your mission critical systems!
