The file descriptor is a low-level system handle for accessing files and devices. This article provides a comprehensive, expert-level guide to the fileno() method in Python for retrieving the integer file descriptor.
We will dive into key use cases, provide code examples, discuss alternatives like Unix domain sockets, detail descriptor limits in various systems, and offer best practices around proper descriptor management.
Overview: Describing the Descriptors
Let's start with a quick high-level overview before digging into the technical details.
The fileno() method returns an integer file descriptor for an open I/O stream in Python. This descriptor uniquely identifies the file to the operating system.
Some key points about file descriptors:
- Integer handle used by OS to access files and devices
- Unique per open file
- Needed for lower-level file operations
- Can share access to open files between processes
- Limited number allowed per process
Contrast this with the higher-level Python file object, which provides convenient methods to read, write, and manipulate files.
The descriptor bridges these Python handles to what the operating system uses under the hood. Understanding this distinction between the Python file abstraction and OS-level handle is key to using fileno() effectively.
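Note that fileno() is not limited to regular files: sockets, pipes, and other streams expose descriptors too. A minimal sketch:

```python
import socket

s = socket.socket()        # A TCP socket also owns a file descriptor
print(s.fileno())          # A small non-negative integer

s.close()
print(s.fileno())          # -1 once the socket is closed
```

The -1 return after close() is a handy way to check whether a socket still holds a live descriptor.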
With that quick preface, let's move on to some common examples and use cases.
Getting and Using the Descriptor
Here is some sample code to open a file and print its descriptor:
f = open("data.txt", "r")
desc = f.fileno()
print(desc) # 3
print(type(desc)) # <class 'int'>
As shown, fileno() simply returns the underlying integer file handle the OS uses to track f.
We can then use this descriptor directly with lower level OS functions. For example, with the os module in Python:
import os
f = open("data.txt")
desc = f.fileno()
data = os.read(desc, 100) # Read 100 bytes
print(data)
f.close()
This uses the descriptor with os.read() instead of the higher level file operations. The benefit is direct control over the system file handle.
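Building on that, the same descriptor also works with os.lseek() for repositioning at the system-call level. A sketch that creates a throwaway data.txt first so it is self-contained:

```python
import os

# Create a small sample file to work with
with open("data.txt", "w") as out:
    out.write("hello world")

f = open("data.txt", "rb")
desc = f.fileno()

os.lseek(desc, 6, os.SEEK_SET)   # Jump to byte offset 6 via the descriptor
print(os.read(desc, 5))          # b'world'

f.close()
```

Mixing os-level reads with the file object's own buffered reads on the same descriptor is best avoided; here only the os calls touch the data.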
Another major use case is sharing access between processes using sockets, pipes or queues.
Let's take a quick look at an example using sockets. One important caveat: a raw integer descriptor is only meaningful inside the process that owns it, so simply sending the number across is not enough. Unix domain sockets solve this with SCM_RIGHTS ancillary data, which makes the kernel install a duplicate of the descriptor in the receiving process; Python 3.9+ wraps this in socket.send_fds() and socket.recv_fds(). First the server:

import socket

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.bind("/tmp/my_sock")
s.listen()
conn, addr = s.accept()

f = open("data.txt")
# Pass the descriptor itself (not just its number) via SCM_RIGHTS
socket.send_fds(conn, [b"fd"], [f.fileno()])
And the client:

import socket
import os

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect("/tmp/my_sock")

# Receive the descriptor from the SCM_RIGHTS ancillary data
msg, fds, flags, addr = socket.recv_fds(s, 1024, 1)
f = os.fdopen(fds[0], "r")
print(f.readline())
s.close()
f.close() # Close both!
By passing the descriptor over a Unix domain socket with SCM_RIGHTS, we can share open files between processes! The kernel installs a duplicate descriptor in the client, so both processes can read and modify the same open file.
Of course, there are higher level approaches like multiprocessing.Manager() for sharing resources safely. But utilizing descriptors directly as shown gives finer, system-level control over file sharing across processes.
Descriptor Limits and Process Constraints
Now that we've seen some examples, it's important to discuss some lower-level constraints around keeping descriptors open.
On Linux and other Unix-like systems, the operating system limits the number of file descriptors a process can have open to manage usage.
Common per-process limits imposed by the OS:
| System | Soft Limit | Hard Limit |
|---|---|---|
| Linux | 1024 | 4096 |
| macOS | 256 | 1024 |
| Solaris | 256 | 65536 |
Table 1: Typical default open file limits per process on various systems (source: progress.com). Actual values vary by OS version and configuration.
The soft limit is what's initially enforced for a process, but can be increased up to the hard limit.
Hitting these limits in production surfaces as OSError: [Errno 24] Too many open files (EMFILE), which can be confusing to diagnose. So while the OS frees descriptors after process exit, it's still good practice to explicitly close() files when done rather than waiting on garbage collection.
Let's check the limit in Python:
import resource
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"Soft limit: {soft} - Hard limit: {hard}")
We can also increase the soft limit if needed:
import resource
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard)) # Set soft limit to 4096
Raising the soft limit up to the hard limit requires no special privileges; only raising the hard limit itself requires root (CAP_SYS_RESOURCE on Linux). Either way, understanding the default limits is important when managing descriptors.
Descriptors vs Handles: Key Differences
It‘s also helpful to distinguish file descriptors from file handles on Windows to avoid confusion.
While descriptors provide integer identifiers for open resources on Unix-like systems, Windows uses opaque HANDLE values to refer to open kernel objects.
Some key differences:
| Descriptors | Handles |
|---|---|
| Small integer indexes | Opaque HANDLE values |
| Accessed via POSIX calls (read, write) | Accessed via Win32 API (ReadFile, WriteFile) |
| Shared via SCM_RIGHTS descriptor passing | Shared via DuplicateHandle() |
Table 2: Comparison of Unix file descriptors vs Windows handles
So while handles and descriptors serve a similar purpose, there are some low-level differences in their implementation:
- Descriptors are small integer indexes into a per-process file descriptor table
- Handles are opaque values that index a per-process handle table of kernel objects
Both are per-process lookups into kernel state; the practical differences lie mostly in the APIs used to manipulate and share them.
But in Python the file abstraction smooths over most of these OS-level differences between descriptors and handles for portability.
Best Practices for Sharing Descriptors
To avoid issues when sharing descriptors between processes, here are some best practices:
Close files when finished: As mentioned, be sure to close files so descriptors get freed after sharing access rather than relying on garbage collection.
Set O_CLOEXEC flag: This flag ensures the descriptor gets closed if a process execs another executable. Prevents descriptor leaks.
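A minimal sketch of setting O_CLOEXEC at the os level on a Unix-like system (using os.devnull so it runs anywhere); note that since Python 3.4 (PEP 446), descriptors Python creates are non-inheritable by default anyway:

```python
import os

# Open with O_CLOEXEC so the descriptor closes automatically on exec()
fd = os.open(os.devnull, os.O_RDONLY | os.O_CLOEXEC)

# Non-inheritable descriptors are not passed to child processes
print(os.get_inheritable(fd))    # False

os.close(fd)
```

os.set_inheritable() can flip this per descriptor when a child genuinely needs the file.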
Use context managers: Leverage Python's with statement instead of manual open/close:

with open("file.txt") as f:
    data = f.read()
    # descriptor automatically closed when the block exits
Add error handling: Check for exceptions in case socket gets disconnected or underlying file gets closed prematurely.
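For example, operations on a descriptor that has already been closed raise OSError with errno EBADF ("Bad file descriptor"); a sketch:

```python
import errno
import os

fd = os.open(os.devnull, os.O_RDONLY)
os.close(fd)

try:
    os.read(fd, 10)                # The descriptor is no longer valid
except OSError as e:
    print(e.errno == errno.EBADF)  # True
```

Catching OSError around descriptor-level calls keeps a dropped socket or prematurely closed file from crashing the process.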
Following these tips will prevent subtle bugs around leaking descriptors or accessing invalid files.
Alternatives to Sharing Descriptors
While passing descriptors over sockets provides direct control, there are some alternatives worth considering:
Unix domain sockets: sockets addressed via the filesystem rather than the network stack. They skip TCP/IP processing, lowering overhead for local IPC.
Shared memory: Maps same memory pages into multiple processes. Fast reads/writes.
Multiprocessing library: Python package with abstractions like Pipe and Queue for IPC.
The advantage of descriptor passing is that both processes end up with handles to the very same open file, so no data needs to be copied between them. But options like Unix domain sockets still offer lower overhead than TCP while providing general-purpose IPC.
Understanding these alternatives helps pick the right approach for sharing data across processes.
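As a sketch of the shared-memory option, Python 3.8+ ships multiprocessing.shared_memory; a second process could attach to the same block by name:

```python
from multiprocessing import shared_memory

# Create a 16-byte shared block; other processes can attach via shm.name
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# A second process would run: shared_memory.SharedMemory(name=shm.name)
print(bytes(shm.buf[:5]))

shm.close()
shm.unlink()        # Free the block once every process is done with it
```

Unlike descriptor passing, both sides see writes immediately with no kernel round-trip per read.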
Manipulating Descriptors Directly
Beyond just getting and passing descriptors, the OS also enables explicitly manipulating them within a process.
A common scenario is duplicating an existing descriptor to create an additional reference to the same open file. This avoids reopening the file, though note that the duplicate shares the original's file offset and status flags rather than getting independent ones.
Here is an example duplicating stdin in Python by leveraging the os and fcntl modules:

import os
import sys
import fcntl

std_in = sys.stdin.fileno()                # Get stdin descriptor (usually 0)
dupe = os.dup(std_in)                      # Duplicate stdin

print(fcntl.fcntl(std_in, fcntl.F_GETFL)) # Status flags on the original
print(fcntl.fcntl(dupe, fcntl.F_GETFL))   # Same flags on the duplicate
This creates dupe as another descriptor referring to the same open stdin stream. The two share the file offset and status flags; only the close-on-exec flag is tracked per descriptor.
Other manipulation options include:
- os.dup2() to force a descriptor to refer to a different file
- fcntl.fcntl() to set properties like O_APPEND on a descriptor
- os.close() to explicitly close a descriptor
These lower-level operations can control descriptors beyond just getting them from fileno().
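One practical use of os.dup2() is redirecting a standard stream at the descriptor level; a sketch (the file name out.log is illustrative):

```python
import os

saved = os.dup(1)                    # Keep a copy of stdout's descriptor
log_fd = os.open("out.log", os.O_WRONLY | os.O_CREAT | os.O_TRUNC)

os.dup2(log_fd, 1)                   # Descriptor 1 now refers to out.log
os.write(1, b"redirected line\n")    # Lands in out.log, not the terminal
os.dup2(saved, 1)                    # Restore the original stdout

os.close(log_fd)
os.close(saved)
```

This is the same mechanism shells use to implement `>` redirection, applied from inside the process.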
Risks Around Descriptor Leaks
One security consideration around directly passing descriptors is the risk of leaks. If descriptors get shared unintentionally, it can expose access to sensitive files.
Some common examples of descriptor leaks:
- Socket sharing: Passing a database connection descriptor over a socket that doesn't authenticate clients.
- Process forking: Forking a process creates descriptor copies referring to same files.
- Executing child processes: Executing other programs without setting O_CLOEXEC.
In these cases, sensitive file descriptors get exposed accidentally to other processes or remote users.
Setting O_CLOEXEC when opening files mitigates this by having the OS close descriptors that would otherwise leak to child processes.
But extra care should be taken when directly accessing descriptors to avoid sharing them inadvertently outside the process.
Key Takeaways and Best Practices
Let's recap some key points on properly and safely utilizing file descriptors in Python:
- Get the descriptor with fileno(): returns the integer handle for lower-level access.
- Share access between processes: pass it over a socket or pipe to share an open file.
- Set O_CLOEXEC flag: Prevents child process descriptor leaks.
- Duplicate descriptors cautiously: duplicates can expose files accidentally if not managed carefully.
- Mind the limits: Remember OS caps total descriptors per process.
- Close when finished: Don't rely solely on garbage collection.
Following best practices around proper descriptor management will prevent difficult to diagnose system-level issues around leaks, access, and resource limits.
Conclusion
Python's file abstraction provides a convenient high-level interface for accessing files and streams. But understanding the lower-level descriptor model is key to precise system-level manipulation and sharing files between processes.
The fileno() method bridges the Python file object and underlying OS handle. It enables techniques like passing access across sockets, mapping the same memory across processes, and directly controlling file access at a system call level.
But descriptors also introduce risks around hitting file limits and inadvertently leaking access if not properly managed. Setting flags like O_CLOEXEC, duplicating cautiously, and explicitly closing files prevents nasty issues.
Overall, the descriptor returned by fileno() gives developers direct access to the OS-level file handles abstracted away by Python's convenient I/O model. Used properly, it facilitates powerful approaches to interprocess communication and system-level manipulation.
I hope this advanced guide to managing file descriptors in Python was helpful! Let me know if you have any other questions on this niche but extremely useful topic.