As a Linux system administrator or developer, application crashes and core dumps are an inevitable reality. While infrequent on stable systems, knowing how to configure and manage core dumps effectively is essential.
This comprehensive 4-part guide aims to demystify Linux core dump management for professionals supporting enterprise workloads.
An Introduction to Linux Core Dumps
First, a quick refresher – what constitutes a core dump? Upon unexpected termination of a Linux or UNIX process, the kernel can save an image of the application's address space memory to disk. This core dump file records the state of execution at the precise instant of the crash.
Core files provide valuable clues to developers when diagnosing application crashes in production:
- Register and stack contents pinpoint the disrupted code flow
- Variable values isolate unexpected data states
- Dependency versions identify module conflicts
By analyzing this forensic data and reproducing bugs in staging, developers can quickly issue patched builds for deployment. This speeds up restoration of service uptime.
On modern Linux systems, systemd-coredump handles collecting and processing generated core dumps via the /proc/sys/kernel/core_pattern mechanism.
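The mechanism is easy to observe from any shell. The sketch below, assuming a Linux host, raises the core size limit, shows the active pattern, and then deliberately crashes a child shell with SIGSEGV (signal 11) – the telltale exit status is 128 + 11:

```shell
# Allow core dumps for this shell session (may still be capped by a hard limit)
ulimit -c unlimited 2>/dev/null || true

# Show where the kernel will write (or pipe) core dumps
cat /proc/sys/kernel/core_pattern

# Deliberately crash a child shell with SIGSEGV (signal 11);
# the shell reports exit status 128 + 11 = 139
sh -c 'kill -11 $$'
echo "exit status: $?"    # prints: exit status: 139
```

Depending on the pattern in effect, the resulting dump lands next to the process, in a configured directory, or is piped to a handler such as systemd-coredump.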
Fun fact: Core dump files get their name from the old magnetic-core memory in early computers. The entire raw contents of memory were literally dumped to disk!
Now let's examine how to configure where dumps are stored and how they are handled at scale.
Core Dump Storage and Formats
The Linux kernel supports multiple formats for application core files written to disk. Each has tradeoffs to consider for stability, space efficiency, and portability.
Raw ELF Dumps
The default format is plain Executable and Linkable Format (ELF). This universally supported UNIX format contains a full memory impression of the crashed process mapped into an executable file.
For short-lived processes, ELF core files provide maximum detail for developers to debug with. But the uncontrolled binary size presents challenges:
- Disk partitions may rapidly fill up under load impacting other services
- Dumping terabytes of memory for a database process is unfeasible
- Lack of context makes it hard to match dumps to crashes
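The ELF lineage is easy to verify: like any ELF object, a core file begins with the 4-byte magic 0x7f followed by "ELF". A quick check, using /bin/sh here as a stand-in for a core file you have on hand:

```shell
# Dump the first four bytes; every ELF file starts with 0x7f 'E' 'L' 'F'.
# Substitute a real core file (e.g. ./core.1234) for /bin/sh.
head -c 4 /bin/sh | od -An -c
```

Running readelf -h against a core file additionally reports Type: CORE in the header, distinguishing a dump from an ordinary executable.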
Disk Dump Format
To improve storage efficiency for long-lived processes, Linux supports the disk dump format via the kdump kernel driver. This writes only memory that has changed since initial startup, helping diagnose incremental data corruption issues.
Smaller disk dump sizes allow retaining more history:
- Only diffs are dumped, reducing storage volume
- Kernel crash dumps can still exceed 1GB each
- Manual correlation of diffs to incidents can still be painful
For kernel profiling, disk dumps strike a balance, but long-running user-space applications still require heavy storage.
KVM Dump Format
The Kernel-based Virtual Machine (KVM) dump format was introduced in Linux 3.14 as a portable compressed alternative. It leverages the QEMU emulator to transform x86 memory into architecture-independent bytecode:
- Dumps average about 10-20% of ELF sizes allowing longer retention
- Portable format aids remote analysis by non-x86 developers
- Suffers minimal data loss across OS kernels compared to raw formats
- Abstract format can increase crash recreation effort
In all cases, disk speed impacts application pause times during larger dumps. Running out of space can trigger stability issues before retention limits are ever reached.
This requires planning coordinated log rotation for crashes and core dumps in unison – which we will cover later.
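One way to get that coordination is systemd-tmpfiles, which can both create the dump directory and age out old entries. This is a sketch assuming the /var/crash location and coredump group used later in this guide; it writes to /tmp for illustration, but the fragment belongs in /etc/tmpfiles.d/ on a real host:

```shell
# tmpfiles.d fragment: create /var/crash (setgid, group coredump) and let
# systemd-tmpfiles-clean remove entries older than 7 days.
cat > /tmp/coredump-tmpfiles.conf <<'EOF'
# Type Path       Mode Owner Group    Age
d      /var/crash 2770 root  coredump 7d
EOF
cat /tmp/coredump-tmpfiles.conf
```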
Now let's explore where Linux actually allows saving these dumps.
Configuring Core Dump Storage Locations
The Linux kernel's /proc/sys/kernel/core_pattern exposes a file that dictates where core dumps are written on disk. By default, this is simply set to core, yielding dumps like ./core.1234 in the crashed program's current working directory.
But this approach leads to disorganization at scale and hinders security:
- Core dumps scattered across different paths need manual cleanup
- Sensitive data may leak into globally readable directories
- Resource exhaustion can take down unrelated applications
Instead we can specify a custom dump directory with more restrictions:
Setting the Core Location Path
Create a dedicated crash directory and set group permissions first:
sudo groupadd -f coredump
sudo mkdir -p /var/crash
sudo chgrp coredump /var/crash
sudo chmod g+rwxs /var/crash
Next configure the kernel to point there using sysctl:
sudo sysctl -w kernel.core_pattern=/var/crash/core.%h.%e.%p.%t
Now crashed process memory will be collected together safely under /var/crash.
The file name includes these helpful identifiers:
%h: Hostname where dump originated
%e: Executable filename
%p: Process ID
%t: Timestamp
Contextual naming eases correlating dumps to exact crash events.
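To see what such a name looks like, the kernel's substitution can be mimicked by hand (the myapp executable name here is purely illustrative):

```shell
# Mimic the kernel's %h/%e/%p/%t substitutions for a mock crash
host=$(hostname)       # %h: hostname where dump originated
exe="myapp"            # %e: executable filename (illustrative)
pid=$$                 # %p: process ID
ts=$(date +%s)         # %t: timestamp (seconds since epoch)
echo "core.${host}.${exe}.${pid}.${ts}"
```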
Restrict Core Dump Permissions
Since memory dumps contain sensitive application data, take care to configure file permissions for security.
By default, dumps receive 0644 permissions (-rw-r--r--) owned by the crashing process user. Instead, restrict access using access groups:
- Set the directory setgid bit: chmod g+s /var/crash
- Revoke 'other' permissions: chmod o-rx /var/crash
- Restrict dumped core files to mode 0640, readable only by the owner and group
Now production core dumps remain isolated from unauthorized users by policy.
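The scheme can be rehearsed safely in a scratch directory before touching production paths (on the real host the directory is /var/crash and root privileges are required):

```shell
# Rehearse the permission model in a throwaway directory
dir=$(mktemp -d)
chmod 2770 "$dir"        # setgid + rwx for owner and group, nothing for others
stat -c '%a' "$dir"      # prints: 2770
rmdir "$dir"
```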
Retaining Cores Within Capacity
To avoid filling disks or retaining overly large dumps:
- Set quota limits on the crash partition:
sudo setquota -g coredump 0 4194304 0 0 /var
Allowing 4GB total (4194304 blocks of 1KB) ensures sufficient sampling even for large memory systems.
- Enable auto-cleanup of older core dumps:
sudo yum install cronie
crontab -e
Add this auto-delete crontab entry:
@daily find /var/crash -type f -mtime +7 -delete
Now one week of core dump history is maintained automatically.
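The find expression is worth rehearsing against a scratch directory before pointing it at /var/crash, since an off-by-one in -mtime silently deletes the wrong files:

```shell
# Rehearse the retention rule: only files older than 7 days should go
dir=$(mktemp -d)
touch "$dir/core.fresh"                   # just created
touch -d '10 days ago' "$dir/core.stale"  # backdated (GNU touch)
find "$dir" -type f -mtime +7 -delete
ls "$dir"                                 # prints: core.fresh
rm -rf "$dir"
```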
Handling Linux Crashes Gracefully
Simply writing core dumps to disk alone gives you only one piece of the debugging puzzle. Actively processing every crash event is equally important.
Linux systems include two common interception mechanisms for handling crashes that generate core dumps – abrt and systemd-coredump.
ABRT Core Handling
On RHEL, CentOS, and Fedora – the Automatic Bug Reporting Tool (abrt) is enabled by default. Abrt was created by Red Hat to ease bug reporting from enterprise Linux systems.
It hooks application crashes system-wide and bundles various forensic data like cores, stack traces, and relevant journal logs. This aggregated content gets written into /var/spool/abrt/ccpp-* crash directories.
For example, decoding a sample directory name:
/var/spool/abrt/ccpp-2022-01-01-10:03:25-3652
timestamp: 01/01/2022 10:03:25 AM
pid: 3652
Reveals the process details.
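Shell parameter expansion is enough to script that decoding, for example when sweeping a spool full of crash directories:

```shell
# Pull the pid and timestamp back out of an abrt crash directory name
dir="ccpp-2022-01-01-10:03:25-3652"
pid=${dir##*-}          # everything after the last '-'  -> 3652
stamp=${dir#ccpp-}      # strip the 'ccpp-' prefix
stamp=${stamp%-$pid}    # strip the '-<pid>' suffix      -> 2022-01-01-10:03:25
echo "pid=$pid stamp=$stamp"    # prints: pid=3652 stamp=2022-01-01-10:03:25
```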
These crashes can then be automatically reported to a centralized bug tracking server for analysis by developers.
Customizing Abrt Behavior
Administrators can customize details of abrt crash handling in /etc/abrt/abrt.conf like:
- To disable core dumps completely:
DumpCore = no
- To run a custom script when crashes occur:
EventHandler = /path/to/myscript.sh
See the abrt configuration guide for more examples.
Integrating abrt with internal issue tracking systems allows streamlining incident response workflows in enterprises.
Systemd-Coredump Processing
Alternatively, on CentOS 8 and recent Ubuntu releases, core dumps are handled by a systemd service called systemd-coredump. It has less automation than abrt out of the box but offers different benefits:
- Hooks from systemd service files for richer orchestration
- Journal integration for builtin event auditability
- Flexible export to external crash database services
For example, settings in /etc/systemd/coredump.conf control how dumps are captured and stored:
[Coredump]
Storage=external
Compress=yes
ProcessSizeMax=2G
ExternalSizeMax=2G
And a oneshot service unit can trigger notifications when crashes occur:
[Service]
Type=oneshot
ExecStart=/path/to/alertscript
In general, systemd integration offers administrators more composable tooling around core dumps where security auditing is valued over automated bug reporting.
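Day to day, the coredumpctl companion tool is the entry point for browsing what systemd-coredump has captured. A guarded sketch that degrades gracefully on hosts without it:

```shell
# List the most recent captured dumps where systemd-coredump is active;
# coredumpctl info <pid> and coredumpctl debug <pid> drill into one entry.
if command -v coredumpctl >/dev/null 2>&1; then
    coredumpctl list --no-pager 2>&1 | tail -n 5
else
    echo "coredumpctl not available on this host"
fi
```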
Now let's tackle identifying and recovering from common core dump issues.
Troubleshooting Core Dumps on Linux
Despite best efforts configuring Linux core dumps, you may still encounter issues around stability or missing crash data. Here are some common patterns and solutions.
Core Files Not Generated
If expected core files don't appear during crashes:
- Check system limits are allowing core dumps:
ulimit -c
ulimit -c unlimited (if the limit is 0)
- Review that the core pattern path is writable:
sudo touch /var/crash/test
sudo rm /var/crash/test
- Check for disk space exhaustion:
df -h /var/crash
- Comb through other debug logs for clues
Setting system limits too low or lacking write access are common misconfigurations hindering dumps.
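Those checks fold naturally into one diagnostic script. This sketch only reads system state, so it is safe to run anywhere:

```shell
# Quick triage for missing core files: limits, pattern, and target directory
echo "core size limit: $(ulimit -c)"          # 0 means dumps are disabled
pattern=$(cat /proc/sys/kernel/core_pattern)
echo "core_pattern:    $pattern"
case "$pattern" in
    \|*) echo "dumps are piped to a handler, not written directly" ;;
    /*)  dir=$(dirname "$pattern")
         [ -w "$dir" ] && echo "dump dir writable: yes" || echo "dump dir writable: no"
         df -h "$dir" ;;
    *)   echo "dumps land in each process's working directory" ;;
esac
```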
Truncated Core Files
If core files are unusually small given a process's normal memory footprint:
- Double-check ulimit settings (ulimit -c unlimited)
- Set the setgid bit on the dump directory for consistent group ownership:
chmod g+s /var/crash
- Raise the per-process core size limit well above expected RAM usage in /etc/security/limits.conf:
* hard core unlimited
Setting ownership and size limits increases likelihood of fully sized memory dumps.
Core Files Disappearing
If core dumps work intermittently but seem to disappear over time:
- Review cleanup scripts for overly aggressive find/delete cycles
- Monitor the crash directory space usage for spikes:
watch df /var/crash
- Plot disk usage graphs over time to identify unusual patterns:
sudo yum install pydf
pydf /var/crash
- Check SELinux policies and access errors:
grep crash /var/log/audit/audit.log
Careful collaboration with storage admins can identify issues around disk latency, quotas, and access controls impacting reliability.
By combining various tools to chart, audit, and alert on core dump activity, you can pinpoint systemic data loss issues as they emerge.
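A minimal sampling loop is enough to start charting; /tmp stands in for /var/crash here, and the log path is arbitrary. Run it from cron to accumulate history:

```shell
# Append one timestamped usage sample per run (POSIX df output, field 5 = Use%)
log=/tmp/coreusage.log
printf '%s %s\n' "$(date -u +%FT%TZ)" "$(df -P /tmp | awk 'NR==2 {print $5}')" >> "$log"
tail -n 1 "$log"
```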
Now let's wrap by solidifying configuration changes and exploring further extensions.
Persisting Configuration Across Reboots
By now we have set non-default options for the core dump storage path, permissions, compression, retention policies, and more. However, these sysctl and ulimit tweaks will be reset after the next server reboot.
To persist settings permanently:
- Update /etc/sysctl.conf:
# Core Dump Configuration
kernel.core_pattern = /var/crash/core.%h.%e.%p.%t
kernel.core_uses_pid = 1
# System limits
fs.file-max = 65536
- Append related limits in /etc/security/limits.conf:
* soft core unlimited
* hard core unlimited
- Reboot the server and verify the settings persisted:
sysctl -a | grep core
Now critical availability and debugging infrastructure around crash collection will endure across both planned maintenance and unexpected outages.
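Verification is cheap enough to automate post-reboot; this read-only sketch reads the live kernel values that should match the persisted configuration:

```shell
# Read the live values straight from /proc (works without root)
echo "core_pattern:  $(cat /proc/sys/kernel/core_pattern)"
echo "core_uses_pid: $(cat /proc/sys/kernel/core_uses_pid)"
# The sysctl tool reports the same values, if available
sysctl kernel.core_pattern kernel.core_uses_pid 2>/dev/null || true
```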
Further Extensions
With core dumps now archived and handled gracefully, some further extensions can improve productivity:
- Centralized federation – Sync compressed dumps efficiently to a single remote server for access by distributed teams
- Automated triage – Ingest dumps into existing ITSM or development tools like ServiceNow or JIRA for assignment and tracking
- Tagging failures – Many cores may map to a single bug or event – attach shared metadata tags
- Refined alerts – Develop smarter core count or size thresholds tailored to application profiles
- Developer sandbox – Streamline tools like gdb with configuration presets and scripted workflows
What other innovations help unlock core file value for your teams?
In Summary
Effectively harvesting Linux application crashes is crucial for maintaining production service resilience and accelerating issue diagnosis. Configuring flexible core dump storage and handlers unlocks transparency across heterogeneous enterprise environments.
With these expanded examples and techniques, Linux professionals can further explore customizing core dumps unique to the scale, performance, and security demands of their organization.
Now go enable richer crash forensics today across your mission critical systems!


