The fstat() system call is an invaluable tool for C developers writing Linux applications that need to inspect files and make decisions based on their metadata. This guide covers how to use fstat() to build robust file-handling components.
fstat() System Call Refresher
Let's quickly recap the fstat() function prototype, declared in <sys/stat.h>:
int fstat(int fd, struct stat *buf);
It takes a file descriptor and populates a stat structure pointed to by buf with details like inode number, ownership ids, permissions, size and timestamps.
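The stat structure contains the following commonly used fields (the real definition has additional members, and the field order varies by architecture):
struct stat {
    dev_t st_dev;         // ID of device containing the file
    ino_t st_ino;         // Inode number
    uid_t st_uid;         // User ID of owner
    gid_t st_gid;         // Group ID of owner
    off_t st_size;        // Total size in bytes
    time_t st_atime;      // Time of last access
    time_t st_mtime;      // Time of last modification
    time_t st_ctime;      // Time of last status change
    mode_t st_mode;       // File type and mode
    nlink_t st_nlink;     // Number of hard links
    blksize_t st_blksize; // Preferred block size for I/O
    blkcnt_t st_blocks;   // Number of 512-byte blocks allocated
};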
The st_mode field carries both the file type, tested with macros like S_ISREG(), and the access permission bits.
fstat returns 0 on success and -1 on failure with errno set accordingly.
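To make the call sequence concrete, here is a minimal sketch of the usual open, fstat, close pattern. The helper name file_size is made up for illustration; it is not part of any library.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return the size in bytes of the file at `path`,
 * or -1 on error, using open() followed by fstat(). */
long long file_size(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        perror("fstat");
        close(fd);
        return -1;
    }

    close(fd);
    return (long long)sb.st_size;
}
```

A caller would use it as `long long n = file_size("/etc/hostname");` and treat a negative result as failure.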
Deep Dive into stat Structure Fields
Let's explore the key fields populated by fstat in more detail:
1. st_ino – The inode number
This identifies the inode that holds the file's metadata. It is unique only within a single filesystem.
- Type ino_t, an unsigned integer; 64 bits wide on 64-bit Linux
- The range of valid values depends on the filesystem, so do not assume one
Two stat structures refer to the same file when both st_dev and st_ino match. Comparing st_ino alone is not enough, because different filesystems can use the same inode numbers.
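The identity check can be sketched as follows. Both function names are hypothetical; the second is just a convenience wrapper that opens two paths and compares the resulting descriptors.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both descriptors refer to the same file, 0 if they do not,
 * and -1 if either fstat() call fails. */
int same_file(int fd_a, int fd_b)
{
    struct stat a, b;
    if (fstat(fd_a, &a) == -1 || fstat(fd_b, &b) == -1)
        return -1;
    /* The device and inode numbers must both match. */
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Convenience wrapper (illustrative): open both paths and compare them. */
int same_path(const char *path_a, const char *path_b)
{
    int fd_a = open(path_a, O_RDONLY);
    if (fd_a == -1)
        return -1;
    int fd_b = open(path_b, O_RDONLY);
    if (fd_b == -1) {
        close(fd_a);
        return -1;
    }
    int result = same_file(fd_a, fd_b);
    close(fd_a);
    close(fd_b);
    return result;
}
```

Two hard links to one file compare equal under this test, which is usually the desired behavior when detecting "source and destination are the same file".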
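2. st_size – Total file size in bytes
For a regular file this is the current file size; for a symbolic link it is the length of the target path:
off_t st_size
- off_t is a signed integer, 64 bits wide on 64-bit Linux
- For pipes, sockets and most devices the value is not meaningful (typically 0 on Linux); it is not set to -1
- A signed 64-bit off_t tops out at 2^63 - 1 bytes, just under 8 exbibytes; the filesystem's own limit is usually far lower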
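3. st_blksize – Preferred I/O blocksize
This value is a hint for the blocksize that gives efficient I/O on the file:
blksize_t st_blksize
- Commonly 4096 on local Linux filesystems, but the value varies and should be read at runtime
- Reported by the filesystem driver for each file
- Unrelated to st_blocks, which always counts allocated 512-byte units regardless of st_blksize
For example on ext4, I/O requests sized in multiples of st_blksize bytes tend to be handled efficiently.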
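As an illustration of using the hint, the sketch below sizes its read buffer from st_blksize. The function name count_bytes and the 4096-byte fallback are assumptions made for this example, and interrupted reads (EINTR) are not retried.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: read a whole file through a buffer sized from
 * st_blksize and return the number of bytes read, or -1 on error. */
long long count_bytes(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        close(fd);
        return -1;
    }

    /* Fall back to 4096 if the reported hint is not usable. */
    size_t bufsize = sb.st_blksize > 0 ? (size_t)sb.st_blksize : 4096;
    char *buf = malloc(bufsize);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, bufsize)) > 0)
        total += n;

    free(buf);
    close(fd);
    return n < 0 ? -1 : total;
}
```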
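4. Timestamps
The access (st_atime), modification (st_mtime) and change (st_ctime) timestamps are available:
- Each is an integer time_t holding whole seconds since the Unix epoch
- They are not floating point; sub-second precision comes from the struct timespec members st_atim, st_mtim and st_ctim, whose tv_nsec field holds nanoseconds
- The actual resolution depends on the filesystem, and access-time updates depend on mount options such as relatime or noatime
This allows identifying recently changed files, for example.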
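A short sketch of reading the modification time with its nanosecond part through st_mtim. The helper names are invented for this example, and UTC formatting via gmtime_r() is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: copy the modification time of `fd` into `out`.
 * Returns 0 on success, -1 on failure. */
int mtime_of(int fd, struct timespec *out)
{
    struct stat sb;
    if (fstat(fd, &sb) == -1)
        return -1;
    *out = sb.st_mtim; /* whole seconds plus nanoseconds */
    return 0;
}

/* Format a timespec as "YYYY-MM-DD HH:MM:SS.nnnnnnnnn" in UTC.
 * Returns the snprintf() result, or -1 if the conversion fails. */
int format_timestamp(const struct timespec *ts, char *buf, size_t len)
{
    struct tm tm;
    char date[32];

    if (gmtime_r(&ts->tv_sec, &tm) == NULL)
        return -1;
    if (strftime(date, sizeof date, "%Y-%m-%d %H:%M:%S", &tm) == 0)
        return -1;
    return snprintf(buf, len, "%s.%09ld", date, (long)ts->tv_nsec);
}
```

On filesystems with coarser timestamps the nanosecond field is simply zero or rounded, so code should not rely on it being distinct between two quick writes.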
5. Ownership ids
The structure holds only the numeric owner uid and group gid; human-readable names require a separate lookup:
- st_uid - Numeric user id of owner
- st_gid - Numeric group id of owner
- Get owner names via UID/GID lookup (getpwuid(), getgrgid())
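The lookup step might look like the sketch below. The function name owner_string and the "user:group" output format are assumptions for illustration; when an id has no entry in the user or group database, the numeric id is used instead.

```c
#define _XOPEN_SOURCE 700
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: write "user:group" for the file behind `fd` into
 * `buf`. Returns the snprintf() result, or -1 if fstat() fails. */
int owner_string(int fd, char *buf, size_t len)
{
    struct stat sb;
    if (fstat(fd, &sb) == -1)
        return -1;

    /* Both lookups return NULL when the id has no database entry. */
    struct passwd *pw = getpwuid(sb.st_uid);
    struct group *gr = getgrgid(sb.st_gid);

    char user[64], group[64];
    if (pw != NULL)
        snprintf(user, sizeof user, "%s", pw->pw_name);
    else
        snprintf(user, sizeof user, "%lu", (unsigned long)sb.st_uid);
    if (gr != NULL)
        snprintf(group, sizeof group, "%s", gr->gr_name);
    else
        snprintf(group, sizeof group, "%lu", (unsigned long)sb.st_gid);

    return snprintf(buf, len, "%s:%s", user, group);
}
```

Note that getpwuid() and getgrgid() return pointers to static storage, so multi-threaded code would want getpwuid_r() and getgrgid_r() instead.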
Now that we've explored the stat fields, let's look at some practical usage examples.
Parsing File Type and Permissions
We can detect the type of file using macros on st_mode:
if (S_ISREG(sb.st_mode)) {
    // regular file
} else if (S_ISDIR(sb.st_mode)) {
    // directory
} else if (S_ISCHR(sb.st_mode)) {
    // character device
} else if (S_ISBLK(sb.st_mode)) {
    // block device
} else if (S_ISFIFO(sb.st_mode)) {
    // FIFO or pipe
} else if (S_ISLNK(sb.st_mode)) {
    // symbolic link: seen with lstat(); open() normally follows links,
    // so fstat() reports the target unless O_PATH | O_NOFOLLOW was used
} else if (S_ISSOCK(sb.st_mode)) {
    // socket
}
The access permissions are also encoded in st_mode. Each permission has its own bit, so a single AND against the matching constant is enough to test it. Masking off the file type bits is only needed when we want all nine permission bits as one value:
#define FPERMS (S_IRWXU | S_IRWXG | S_IRWXO) // permission bits mask
mode_t perms = sb.st_mode & FPERMS; // e.g. 0644
// check user read permission
if (perms & S_IRUSR) {
    printf("User can read\n");
}
// check group write permission
if (perms & S_IWGRP) {
    printf("Group can write\n");
}
By AND-ing masks, we isolate the specific permission bits to examine.
This allows handling files based on their POSIX access rights.
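The same masking idea works directly on a descriptor. In this sketch the helper names are hypothetical, and the mask 07777 is chosen so that the setuid, setgid and sticky bits are kept alongside the nine rwx bits.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: return the permission bits of `fd` (including the
 * setuid, setgid and sticky bits), or -1 if fstat() fails. */
int fd_permissions(int fd)
{
    struct stat sb;
    if (fstat(fd, &sb) == -1)
        return -1;
    return (int)(sb.st_mode & 07777);
}

/* Returns 1 if the group write bit is set, 0 if not, -1 on error. */
int group_can_write(int fd)
{
    int perms = fd_permissions(fd);
    if (perms == -1)
        return -1;
    return (perms & S_IWGRP) != 0;
}

/* Print the permissions in the octal form used by chmod, e.g. "0644". */
void print_permissions(int fd)
{
    int perms = fd_permissions(fd);
    if (perms == -1)
        perror("fstat");
    else
        printf("%04o\n", (unsigned)perms);
}
```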
Implementing a File Permissions Checker
Building on the above, we can create a utility to audit and report file permissions. Because it takes a filename it calls stat(); with an open descriptor, fstat(fd, &sb) fills the same structure:
#include <stdio.h>
#include <sys/stat.h>

void check_permission(const char *filename) {
    struct stat sb;
    if (stat(filename, &sb) == -1) {
        fprintf(stderr, "Failed to stat '%s'\n", filename);
        return;
    }
    printf("Permissions for '%s':\n", filename);
    // one type flag, then rwx triplets for user, group and others
    printf("%c%c%c%c%c%c%c%c%c%c\n",
        S_ISDIR(sb.st_mode) ? 'd' : '-',
        (sb.st_mode & S_IRUSR) ? 'r' : '-',
        (sb.st_mode & S_IWUSR) ? 'w' : '-',
        (sb.st_mode & S_IXUSR) ? 'x' : '-',
        (sb.st_mode & S_IRGRP) ? 'r' : '-',
        (sb.st_mode & S_IWGRP) ? 'w' : '-',
        (sb.st_mode & S_IXGRP) ? 'x' : '-',
        (sb.st_mode & S_IROTH) ? 'r' : '-',
        (sb.st_mode & S_IWOTH) ? 'w' : '-',
        (sb.st_mode & S_IXOTH) ? 'x' : '-'
    );
}
For any file, this prints out an ls-style permission string summarizing access rights.
We can further enhance it by reporting on risky permissions, ownership etc.
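One possible way to report risky permissions is sketched below. The flag names and the policy itself (which combinations count as "risky") are assumptions for this example, not an established standard.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Illustrative risk flags; the names are invented for this sketch. */
enum {
    RISK_WORLD_WRITABLE = 1,
    RISK_SETUID = 2,
    RISK_SETGID = 4
};

/* Inspect an st_mode value and return a bitmask of risk flags. */
unsigned risk_flags(mode_t mode)
{
    unsigned flags = 0;

    /* World-writable entries are flagged, except sticky directories
     * such as /tmp, where only the owner may delete a file. */
    if ((mode & S_IWOTH) && !(S_ISDIR(mode) && (mode & S_ISVTX)))
        flags |= RISK_WORLD_WRITABLE;

    /* Set-id bits matter most on regular (executable) files. */
    if (S_ISREG(mode) && (mode & S_ISUID))
        flags |= RISK_SETUID;
    if (S_ISREG(mode) && (mode & S_ISGID))
        flags |= RISK_SETGID;

    return flags;
}
```

A caller would pass sb.st_mode from any of stat(), lstat() or fstat() and print a warning for each flag that is set.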
fstat vs stat vs lstat
While fstat operates on an open file descriptor, stat() takes a filepath. The stat vs fstat difference is:
stat - resolves a pathname to get file info
fstat - uses an fd that is already open, so no path lookup is needed
Reasons to favor fstat:
- Avoid pathname lookups: stat() has to walk every directory component of the path, which fstat skips
- Works on fds that have no path: e.g. unnamed pipes, sockets and unlinked temporary files
- Avoid races: the descriptor keeps referring to the object that was opened, even if the path is renamed or replaced afterwards; permission problems also surface earlier, at open() time
lstat() is another path-based variant: when the path names a symbolic link, it reports on the link itself instead of following it.
In summary:
fstat: open fd -> metadata
stat: pathname -> metadata of the target
lstat: pathname -> metadata of the link itself
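The difference between the three calls shows up clearly on a symbolic link. In the sketch below, all function names and the single-character result codes are made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a mode to 'f' (regular file), 'd' (directory), 'l' (symlink)
 * or '?' (anything else). */
static char kind_of(mode_t mode)
{
    if (S_ISREG(mode)) return 'f';
    if (S_ISDIR(mode)) return 'd';
    if (S_ISLNK(mode)) return 'l';
    return '?';
}

/* stat(): follows symlinks and describes the target. '!' means error. */
char kind_via_stat(const char *path)
{
    struct stat sb;
    return stat(path, &sb) == -1 ? '!' : kind_of(sb.st_mode);
}

/* lstat(): describes the link itself. */
char kind_via_lstat(const char *path)
{
    struct stat sb;
    return lstat(path, &sb) == -1 ? '!' : kind_of(sb.st_mode);
}

/* open() + fstat(): open() follows the link, so this also sees the target. */
char kind_via_fstat(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return '!';
    struct stat sb;
    char kind = fstat(fd, &sb) == -1 ? '!' : kind_of(sb.st_mode);
    close(fd);
    return kind;
}
```

For a symlink that points at a regular file, only the lstat() variant reports a link; the other two report the regular file behind it.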
Recursively Gathering Directory Stats
A common use of the stat family is reporting overall storage metrics for directories.
Directory entries are discovered by name, so the recursive dirsize calculator below works on paths rather than on open descriptors:
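#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

unsigned long long dirsize(const char *dir) {
    unsigned long long total_size = 0;
    DIR *d = opendir(dir);
    if (!d) {
        return 0;
    }
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        // skip the self and parent entries to avoid infinite recursion
        if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, "..")) {
            continue;
        }
        char path[1024];
        int n = snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof path) {
            continue; // path does not fit in the buffer
        }
        struct stat sb;
        // lstat() so that symlinks are counted as links, not followed
        if (lstat(path, &sb) < 0) {
            continue;
        }
        if (S_ISDIR(sb.st_mode)) {
            total_size += dirsize(path); /* recurse */
        } else {
            // st_blocks is always counted in 512-byte units
            total_size += (unsigned long long)sb.st_blocks * 512;
        }
    }
    closedir(d);
    return total_size;
}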
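By recursively traversing all descendants and accumulating st_blocks for each entry, we can find the disk space used by arbitrary directories. Since st_blocks counts allocated 512-byte units, the total can differ from the sum of the st_size values.
The output can also be beautified by converting raw bytes to KB, MB etc.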
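A small sketch of that conversion follows. The function names are hypothetical, and the choice of binary units (1 KiB = 1024 bytes) is an assumption; decimal units would divide by 1000 instead.

```c
#include <stdio.h>

/* Scale a byte count down to the largest unit that keeps the value at or
 * above 1, storing the unit name in *unit and returning the scaled value. */
double scale_size(unsigned long long bytes, const char **unit)
{
    static const char *const units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
    size_t last = sizeof units / sizeof units[0] - 1;
    size_t i = 0;
    double value = (double)bytes;

    while (value >= 1024.0 && i < last) {
        value /= 1024.0;
        i++;
    }
    *unit = units[i];
    return value;
}

/* Print a byte count with one decimal place and its unit. */
void print_size(unsigned long long bytes)
{
    const char *unit;
    double value = scale_size(bytes, &unit);
    printf("%.1f %s\n", value, unit);
}
```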
Best Practices for Robustness
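When using file descriptors with fstat(), sound error handling avoids crashes and wrong decisions.
We should always:
- Validate open() results before passing fd to fstat
- Check for a -1 return after fstat() calls and inspect errno (EBADF for a bad descriptor, EOVERFLOW when a value does not fit the structure, EIO or ENOMEM for lower-level failures)
- Check the file type before seeking or trusting st_size: lseek() on a pipe, socket or FIFO fails with ESPIPE, an error that fstat() itself does not report
- Cache meta-info after the first call when the file is not expected to change, and avoid repeated stats
- Keep the number of open files within the process limit (ulimit -n) by closing descriptors promptly
For permission checks, avoid calling stat() on a path and then open() on the same path, because the file can be swapped between the two calls. Open the file first, then call fstat() on the descriptor: the metadata then describes exactly the object the process will read or write. Comparing permission bits by hand is only an approximation, since ACLs, capabilities and mount options also affect access.
This shields against time-of-check-time-of-use (TOCTOU) issues.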
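The open-first, check-afterwards pattern can be sketched like this. The function name open_regular is hypothetical, and refusing symlinks with O_NOFOLLOW and reporting EINVAL for non-regular files are design choices made for this example.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open `path` read-only and return the descriptor
 * only if it refers to a regular file. Returns -1 with errno set otherwise. */
int open_regular(const char *path)
{
    /* O_NONBLOCK keeps open() from blocking on a FIFO;
     * O_NOFOLLOW refuses a symlink as the final path component. */
    int fd = open(path, O_RDONLY | O_NONBLOCK | O_NOFOLLOW);
    if (fd == -1)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        int saved = errno; /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }

    /* The check runs on the descriptor, so it describes the very object
     * that later reads will use, whatever happens to the path. */
    if (!S_ISREG(sb.st_mode)) {
        close(fd);
        errno = EINVAL;
        return -1;
    }
    return fd;
}
```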
Real-world Applications Relying on fstat
Many common applications and libraries leverage fstat() to make decisions based on file properties:
1. Log rotation
Programs like logrotate use timestamp and size data from fstat structs to apply policies for capping log sizes and creating new files.
2. Temporary files
Temp file handling code checks if a file descriptor references an existing temp file created earlier in the same session by comparing inode numbers from fstat calls.
3. Build automation tools
Software build tools like Make rely on modification timestamps from the stat family to track source file changes and only rebuild targets that are out of date.
4. File synchronization
rsync's default quick check compares the size and modification time reported by the stat family to skip unchanged files, and distributed filesystems use similar metadata to synchronize changes efficiently between copies.
5. System resource monitoring
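Monitoring tools like du sum up block counts from the stat family to report disk usage, while tools such as lsof use file type information to distinguish sockets, pipes and regular files. (df reports per-filesystem totals, which come from statvfs() rather than fstat.)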
These showcase how fstat forms a versatile Swiss-army knife for Linux file analysis needs.
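As one concrete illustration, the timestamp comparison that build tools depend on can be sketched as below. This is not how Make itself is implemented; the function name needs_rebuild and its return convention are assumptions for this example, which uses path-based stat() because build rules name files by path.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if `target` is missing or older than
 * `source`, 0 if it is up to date, and -1 on error. */
int needs_rebuild(const char *source, const char *target)
{
    struct stat src, tgt;

    if (stat(source, &src) == -1) {
        perror(source);
        return -1;
    }
    if (stat(target, &tgt) == -1)
        return errno == ENOENT ? 1 : -1; /* a missing target must be built */

    /* Compare whole seconds first, then the nanosecond part. */
    if (src.st_mtim.tv_sec != tgt.st_mtim.tv_sec)
        return src.st_mtim.tv_sec > tgt.st_mtim.tv_sec;
    return src.st_mtim.tv_nsec > tgt.st_mtim.tv_nsec;
}
```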
Standards Conformance
The fstat() POSIX specification standardizes expected behavior across Unix-like systems.
Key requirements relevant to Linux systems:
- Returns 0 on success; returns -1 and sets errno on failure
- Invalid fd should result in EBADF
- A bad buf pointer produces EFAULT on Linux (POSIX does not define this case)
- No particular access mode is required: a descriptor opened read-only, write-only or with O_APPEND can be passed to fstat
- Works on dirs, regular files, devices, pipes and sockets
Linux implements the POSIX fstat() interface faithfully.
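Two of these points are easy to check in a few lines. The helper names below are invented for this sketch: one reports the errno value from a failed call, the other confirms that fstat works on an unnamed pipe.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if fstat() succeeds on `fd`, otherwise the errno it set. */
int fstat_errno(int fd)
{
    struct stat sb;
    return fstat(fd, &sb) == -1 ? errno : 0;
}

/* Creates an unnamed pipe and reports whether fstat() sees it as a FIFO.
 * Returns 1 if so, 0 if not, -1 on error. */
int pipe_is_fifo(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    struct stat sb;
    int result;
    if (fstat(fds[0], &sb) == -1)
        result = -1;
    else
        result = S_ISFIFO(sb.st_mode) ? 1 : 0;

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Calling fstat_errno(-1) yields EBADF, since -1 is never a valid descriptor.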
Key Takeaways
Let's recap the key points from this guide:
| Topic | Summary |
|---|---|
| Core purpose | Obtain file metadata like timestamps, size, modes etc from open file descriptor |
| stat structure | Central output capturing file attributes |
| Permission parsing | Mask and check access right bits on st_mode |
| Vs stat, lstat | fstat takes an open descriptor; stat and lstat take a path, and lstat does not follow symlinks |
| Recursive analysis | Traverse directories accumulating sizes |
| Error handling | Validate inputs; check return codes |
| Applications | Log rotation, build systems, monitoring etc |
These key points reinforce that fstat provides a versatile toolkit for dissecting Linux files programmatically.
Conclusion
We have taken a detailed look at fstat(), including:
- Detailed technical analysis of metadata fields
- Diverse usage examples – permission checking, space analysis etc
- Contrast with path-based stat variants
- Best practices for robust file processing
- Overview of supporting applications and standards
This guide has covered fstat from its metadata fields through to real-world usage, with code samples along the way.
You should now be able to use fstat to inspect and analyze files in Linux environments from C programs.
Hope you enjoyed this thorough guide! Please share any feedback or questions.


