The os module in the standard library includes the path submodule, which provides utilities for working with file and directory paths.
To use the utilities offered by the path module, we first have to import it in our program, using the statement: from os import path
Parsing Paths
Parsing refers to breaking down data into meaningful tokens which can then be used for further processing.
The path module offers functions for parsing a string that represents a path. The functions do not validate whether the given path actually exists; they simply operate on the paths as mere strings.
The parsing functions depend on some os variables in order to function correctly. These variables are outlined below:
- os.sep - The character used to separate the parts of a path. It is a forward slash (/) on Unix-based systems and a backslash (\) on Windows-based systems.
- os.extsep - The character used to separate an extension from the rest of a filename. This is usually a period (.).
- os.curdir - The string used to represent the current directory. It is usually a period (.).
- os.pardir - The string used to go up a directory. It is usually two periods (..).
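As a quick check, the values of these variables on the current platform can be printed directly. This is only a minimal sketch; the comments show the usual values, and os.sep in particular differs between systems.

```python
import os

# repr() makes the backslash visible on Windows
print(repr(os.sep))     # '/' on Unix-based systems, '\\' on Windows
print(repr(os.extsep))  # '.'
print(repr(os.curdir))  # '.'
print(repr(os.pardir))  # '..'
```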
path.split()
The split() function breaks a path into two parts: the directory path, and the base filename. It simply splits the string at the last position where the os.sep character appears.
path.split(p)
The function returns a tuple which contains the directory and the filename.
If the given string ends with the os.sep character, the second value in the returned tuple will be an empty string.
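A short sketch of split() in use is shown below. The path is a made-up example and does not need to exist, since the function only works on the string.

```python
from os import path

# Split a (hypothetical) path at the last separator.
head, tail = path.split('/home/user/docs/report.txt')
print(head)  # /home/user/docs
print(tail)  # report.txt

# With a trailing separator, the second element is an empty string.
print(path.split('/home/user/docs/'))  # ('/home/user/docs', '')
```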
path.basename()
The basename() function returns an equivalent value to the second value in the tuple returned by the split() function. It returns the text that comes after the last slash/os.sep character.
path.basename(p)
If the input string ends with the os.sep character, the basename() function returns an empty string.
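The following sketch illustrates both behaviours, again with a made-up path used purely for illustration.

```python
from os import path

p = '/home/user/docs/report.txt'  # hypothetical path
name = path.basename(p)
print(name)                      # report.txt
print(name == path.split(p)[1])  # True

# A trailing separator yields an empty string.
trailing = path.basename('/home/user/docs/')
print(repr(trailing))            # ''
```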
path.dirname()
The dirname() function returns a string equivalent to the first value of the tuple returned by the split() function. The returned value represents the directory name for the given path. It is simply all the characters up to the last os.sep character in the path.
path.dirname(p)
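A minimal sketch, using the same kind of made-up path as an assumption:

```python
from os import path

p = '/home/user/docs/report.txt'  # hypothetical path
directory = path.dirname(p)
print(directory)                      # /home/user/docs
print(directory == path.split(p)[0])  # True
```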
path.splitext()
The splitext() function works just like the split() function except that it uses the os.extsep character to split the path. It returns a tuple containing the root and the file extension.
path.splitext(p)
The function splits the input string at the last occurrence of os.extsep character.
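The sketch below shows the behaviour on a few made-up filenames. Note that only the last extension is split off, and that a leading period (as in hidden files on Unix-based systems) is not treated as the start of an extension.

```python
from os import path

# Only the last extension is separated from the root.
root, ext = path.splitext('/home/user/archive.tar.gz')
print(root)  # /home/user/archive.tar
print(ext)   # .gz

# No extension: the second element is an empty string.
print(path.splitext('/home/user/notes'))  # ('/home/user/notes', '')

# A leading period is part of the root, not an extension.
print(path.splitext('.bashrc'))           # ('.bashrc', '')
```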
path.commonprefix()
The commonprefix() function takes an iterable of paths as an argument and returns the longest prefix shared by all of them. The comparison is done character by character, so the result is not guaranteed to be a valid path.
path.commonprefix(m)
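The sketch below, with made-up paths, shows how the character-by-character comparison can cut a directory name in half. When a valid sub-path is needed, the related path.commonpath() function may be a better fit.

```python
from os import path

# Hypothetical paths that share more than a whole directory name
paths = ['/usr/lib/python3', '/usr/local/bin', '/usr/libexec']

prefix = path.commonprefix(paths)
print(prefix)  # /usr/l

# commonpath() compares whole path components instead
print(path.commonpath(paths))  # /usr on POSIX systems
```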
Creating Paths
It is common to create paths from existing strings. The path module offers various functions that can be used for this purpose.
path.join()
The join() function is a common tool for creating a path by joining two or more segments.
path.join(path, *paths)
| parameter | description |
|---|---|
| path | The base path. |
| *paths | A series of arbitrary additional paths to be joined to the base path. |
The join() function uses the os.sep variable to join the elements of the given path.
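A minimal sketch of such a call is shown below, with 'basedir' as the base path. The segment names are assumptions chosen only for illustration.

```python
from os import path

# Segments to append to the base path (hypothetical names)
segments = ('project', 'tests', 'test_app.py')

# The * operator unpacks the tuple into separate arguments.
full_path = path.join('basedir', *segments)
print(full_path)  # basedir/project/tests/test_app.py on Linux
```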
In the above example:
- We imported the path module from os.
- We instantiated a tuple containing the segments to be joined to the base path.
- We called path.join with 'basedir' as the base path and the segments as the rest of the parameters. The unpacking operator (*) ensures that the segments are passed to the function as separate arguments.
If an argument to join begins with the os.sep character, it is regarded as an absolute path and becomes the new base path. All the arguments that precede it are therefore discarded.
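This behaviour can be sketched with a small loop. The segment names are made up, and os.sep is used so that the sketch behaves the same way on every platform.

```python
from os import path, sep

# In each tuple, one segment begins with the separator.
examples = [
    (sep + 'one', 'two', 'three'),
    ('one', sep + 'two', 'three'),
    ('one', 'two', sep + 'three'),
]

for parts in examples:
    # Everything before the segment that starts with sep is dropped.
    print(parts, '->', path.join(*parts))
```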
In the above example, in each iteration, the arguments preceding the one that begins with the os.sep character are disregarded, and that argument becomes the beginning of the returned path.
path.expanduser()
The expanduser() function expands a pathname that starts with a tilde (~) character, which represents the user's home directory. It replaces a leading ~ with the absolute path to the home directory of the current user, and a leading ~user with the home directory of that user.
path.expanduser(p)
    from os import path

    home = path.expanduser('~')
    print(home)
Output: C:\Users\John
We can use the expanduser() function together with the join() function to create absolute paths.
    from os import path

    # build the path from its parts, then expand the leading ~
    my_path = path.join('~', 'desktop', 'project.py')
    print(path.expanduser(my_path))
Output: C:\Users\John\desktop\project.py
path.expandvars()
The expandvars() function is more general than the expanduser() function in that it expands any environment variables in the given path.
path.expandvars(path)
The function replaces substrings of the form $var or ${var} (and, on Windows, %var%) with the values of the corresponding environment variables. It does not validate whether the resulting path actually exists.
    import os
    from os import path

    # set two environment variables for the demonstration
    os.environ['DESKTOP'] = 'C:\\Users\\Admin\\Desktop'
    os.environ['TEST_PATH'] = 'project\\tests.py'

    abs_test_path = path.expandvars('$DESKTOP\\$TEST_PATH')
    print(abs_test_path)
Output: C:\Users\Admin\Desktop\project\tests.py
Path Normalization
Normalizing paths is the process of converting a given path into a canonical form. This may involve:
- Removing any redundant elements from the path, such as duplicate separators (e.g. //).
- Resolving any relative references (e.g. '.' or '..').
- Converting a path to the platform-specific format (e.g. forward slashes for Linux, backslashes for Windows).
- Resolving any symbolic links.
- Removing trailing slashes.
The main aim of path normalization is to ensure that all elements of the path are consistent and unambiguous. This can be especially necessary when the path has been generated using the join() function.
path.normpath()
The normpath() function provides an easy way to normalize paths. It converts a path to its simplest form by collapsing redundant separators and references to the current and parent directories. It works purely on the string and does not resolve symbolic links, so the meaning of a path that contains them may change.
The normpath() function makes the path easier to read and more consistent. On every platform, it collapses redundant separators such as '//' and removes '.' and '..' references. On Windows, it also replaces forward slashes '/' with backslashes '\'. On Linux, backslashes are left untouched because they are not path separators there.
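A minimal sketch of normalizing a path on Linux is shown below. The path itself is a made-up relative path, and the outputs in the comments assume a Linux system.

```python
from os import path

# the path to normalize
mypath = 'desktop//./project//tests.py'

# normalize the path
normalized = path.normpath(mypath)
print(normalized)  # desktop/project/tests.py on Linux

# '..' references are collapsed as well
print(path.normpath('desktop/project/../media'))  # desktop/media on Linux
```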
In the above example, mypath is an inconsistently formatted path. The normpath() function in this case transforms the path into a valid Linux path. Running the same program on Windows will result in the forward slashes being replaced with backslashes.
On Windows:
    from os import path

    # the path to normalize
    mypath = 'desktop//./project//tests.py'

    # normalize the path
    normalized = path.normpath(mypath)
    print(normalized)
Output: desktop\project\tests.py
path.abspath()
The abspath() function is used to get the absolute path of a given relative path. It returns the full path of the file or directory, starting from the root of the file tree, by joining the given path to the current working directory and normalizing the result.
path.abspath(p)
On Windows, with 'Desktop' as the working directory:
    from os import path

    relative = 'media/images/me.jpg'
    absolute = path.abspath(relative)
    print(absolute)
Output: C:\Users\John\Desktop\media\images\me.jpg
Retrieve file and directory properties
The path module contains functions that return file or directory properties such as when it was last modified, when it was created, and the amount of data it contains. Unlike the previous functions, these functions depend on the file actually existing on the file system.
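A minimal sketch of these functions is shown below. It assumes the current working directory as the path to inspect, since that is guaranteed to exist; the printed values depend on the system.

```python
import os
import time
from os import path

# the path to inspect: the current working directory
p = os.getcwd()

size = path.getsize(p)       # size in bytes
created = path.getctime(p)   # creation time (metadata change time on Unix)
modified = path.getmtime(p)  # last modification time
accessed = path.getatime(p)  # last access time

print('Size (bytes):', size)
print('Created:', time.ctime(created))
print('Modified:', time.ctime(modified))
print('Accessed:', time.ctime(accessed))
```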
In the above example:
- We defined the p variable to contain the path of the current working directory.
- The path.getsize() function returns the amount of data stored, in bytes.
- The path.getctime() function returns a timestamp for when it was created. On Unix-based systems, this is the time of the last metadata change instead.
- The path.getmtime() function returns a timestamp for when it was last modified.
- The path.getatime() function returns a timestamp for when it was last accessed.
- We used the time.ctime() function to convert the various times into a human-friendly format.
Testing files and directories
The module contains various functions that test a property of a path and return True or False, such as checking whether a path is a file or a directory, whether it is absolute or relative, or whether it exists.
Check if path is a file/directory
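A minimal sketch, assuming the current working directory as the path to test:

```python
import os
from os import path

# the current working directory is always a directory
p = os.getcwd()

print(path.isfile(p))  # False
print(path.isdir(p))   # True
```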
In the above example:
- The p variable holds the path of the current working directory.
- The path.isfile() function checks whether the input path is a file. Returns True if it is a file and False otherwise.
- The path.isdir() function checks whether the input path is a directory. Returns True if it is a directory, False otherwise.
Check if path exists
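The path.exists() function can be sketched as follows. The filename used for the missing path is a made-up name that is assumed not to exist in the working directory.

```python
import os
from os import path

p = os.getcwd()

# hypothetical filename, assumed not to exist
missing = path.join(p, 'no_such_file_12345.txt')

print(path.exists(p))        # True
print(path.exists(missing))  # False, unless such a file happens to exist
```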
All of the testing functions can be summarized as shown below:
| function | usage |
|---|---|
| isabs(p) | Checks whether path p is an absolute path. |
| isfile(p) | Checks whether path p is an existing file. |
| isdir(p) | Checks whether path p is an existing directory. |
| islink(p) | Checks whether path p is a symbolic link. |
| ismount(p) | Checks whether path p is a mount point. |
| exists(p) | Checks whether path p exists on the file system. Returns False for broken symbolic links. |
| lexists(p) | Checks whether path p exists, also returning True for broken symbolic links. |