As an experienced Perl developer, one comes across various scenarios for pattern-based file retrieval in scripts. Whether it is processing log files, analyzing source code or managing configurations – being able to flexibly search or filter filenames is critical.
This is where Perl's glob() function comes in handy. In this advanced guide, we'll uncover deeper insights and real-world tips for expert-level adoption of glob().
Diving Deep into glob() File Pattern Matching
The glob() function, called in list context, returns the list of file and directory names that match a wildcard pattern. For example:
@text_files = glob("*.txt");
However, there is more depth to the pattern syntax than this. Note that glob patterns follow csh-style shell wildcard rules, not regular expression (regex) rules.
Going Beyond Wildcards
Typically, glob patterns utilize:
* – match any number of any characters
? – match exactly one character
So you could fetch doc files with:
@files = glob("doc*");
Or files whose names are exactly four characters long with:
@files = glob("????");
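But glob patterns also support a few shell-style extras: character classes in square brackets, brace lists such as {a,b}, and a leading ~ for home directories. Regex anchors and quantifiers are not available.
For example, to match names that end in txt followed by a single digit: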
@files = glob("*txt[0-9]");
The [0-9] class matches any digit character.
This allows far more flexible and powerful pattern definitions.
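The three wildcard forms above can be tried out safely in a scratch directory. This is a minimal sketch: the file names and the names_for helper are made up for illustration, and it assumes the temporary directory path contains no spaces or wildcard characters.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway directory with hypothetical file names.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(doc1.txt doc2.txt notes.txt abcd log.txt7)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close $fh;
}

# Hypothetical helper: run a pattern inside $dir and return bare file names.
sub names_for {
    my ($pattern) = @_;
    return map { substr($_, length($dir) + 1) } glob("$dir/$pattern");
}

my @starts_with_doc = names_for('doc*');       # "doc" plus any suffix
my @four_chars      = names_for('????');       # exactly four characters
my @txt_plus_digit  = names_for('*txt[0-9]');  # "txt" followed by one digit

print "doc*      : @starts_with_doc\n";
print "????      : @four_chars\n";
print "*txt[0-9] : @txt_plus_digit\n";
```

Note that ???? matches abcd here regardless of whether the characters are digits or letters.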
Branching Out File Selections
You can also select several alternatives in one pattern with brace expansion. Regex alternation (|) and negation (^) are not part of glob syntax, so exclusions are applied afterwards with grep.
For instance, to match .py or .rb files while leaving out test scripts:
@files = grep { !/^test_/ } glob("*.{py,rb}");
Here {py,rb} expands to the two patterns *.py and *.rb, so .txt files never match in the first place, and grep drops any names starting with test_.
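Selecting several extensions and then excluding some results can be sketched with brace expansion plus grep. The file names below are hypothetical, and the scratch directory path is assumed to be free of spaces and wildcard characters.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Scratch directory with hypothetical file names.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(app.py tool.rb readme.txt test_app.py)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close $fh;
}

# Brace expansion: one pattern covering two extensions.
my @scripts = glob("$dir/*.{py,rb}");

# Exclusion happens in Perl: drop anything whose base name starts with "test_".
my @non_tests = grep { basename($_) !~ /^test_/ } @scripts;

print "scripts  : ", join(' ', map { basename($_) } @scripts), "\n";
print "filtered : ", join(' ', map { basename($_) } @non_tests), "\n";
```

The order of results across brace alternatives is not something to rely on, so sort explicitly if the order matters.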
Glimpse Under the Hood
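It's also good to recognize what glob() is doing under the hood.
Since Perl 5.6, the built-in glob():
- Is implemented by the File::Glob extension (a port of BSD glob) rather than by spawning a shell
- Splits its argument on whitespace and treats each piece as a separate pattern
- Reads only the directories named in the pattern; it never descends into subdirectories on its own
- Skips names beginning with a dot unless the pattern itself starts with a dot
- Returns a pattern that contains no wildcards as-is, without checking that the file exists
- Sorts its results and silently skips directories it cannot read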
Knowing this helps tune performance and avoid unexpected issues.
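A few of the behaviours that most often surprise people can be checked directly. The sketch below uses made-up file names in a scratch directory (assumed to have no spaces or wildcard characters in its path) and shows that glob() does not recurse, skips dotfiles by default, and hands back a wildcard-free pattern even when no such file exists.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "Cannot create sub: $!";
for my $name ('top.log', '.hidden.log', 'sub/deep.log') {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close $fh;
}

# Only top.log: no recursion into sub/, and the dotfile is skipped.
my @one_level = glob("$dir/*.log");

# Subdirectories must be spelled out as their own path component.
my @two_level = glob("$dir/*/*.log");

# A leading dot in the pattern is needed to pick up dotfiles.
my @dotted = glob("$dir/.*.log");

# No wildcard: the name comes back even though the file does not exist.
my @literal = glob("$dir/missing.log");

print "one level : @one_level\n";
print "two level : @two_level\n";
print "dotted    : @dotted\n";
print "literal   : @literal (exists: ", (-e $literal[0] ? "yes" : "no"), ")\n";
```

The last case is why results of a wildcard-free glob() should be checked with -e before use.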
Optimizing for Speed: Key Considerations
Like with any filesystem procedure, avoiding expensive operations is crucial for speed.
Let's explore techniques to optimize glob() based file fetching.
1. Match In Single Directory
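glob() reads only the directories named in the pattern and never walks the tree by itself. The cost grows when a directory component contains a wildcard, as in */*/*.xml, because every matching directory has to be read.
Keep wildcards out of the directory part where you can, and if you do not need ordered results, skip the sort with bsd_glob() and the GLOB_NOSORT flag from File::Glob:
use File::Glob qw(bsd_glob GLOB_NOSORT);
@files = bsd_glob("*.xml", GLOB_NOSORT);
This saves work in large directories where the order of results does not matter.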
2. Precompile Regex Filters
glob() takes a plain string, not a regex, so it cannot express backreferences, assertions or counted repetition. A pattern like this one does not do what it appears to:
@files = glob("(*text|log)[0-9]{3}.txt");
It's better to let glob() do the coarse selection and apply a regex compiled once with qr//:
$pattern = qr/\A(?:.*text|log)[0-9]{3}\.txt\z/;
@files = grep { $_ =~ $pattern } glob("*.txt");
Here qr precompiles the regex for reuse, and glob() only has to handle a simple wildcard.
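When the selection rule really is a regex, one option is to skip glob() and filter a directory listing directly. The following is a minimal sketch using opendir/readdir; files_matching is a hypothetical helper name and the sample file names are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: names of plain files in $dir that match a compiled regex.
sub files_matching {
    my ($dir, $re) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = sort grep { $_ =~ $re && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return @names;
}

# Scratch directory with hypothetical file names.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(log001.txt log12.txt mytext123.txt other.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close $fh;
}

# Compiled once, reusable across many directories.
my $re = qr/\A(?:.*text|log)[0-9]{3}\.txt\z/;
my @hits = files_matching($dir, $re);

print "matched: @hits\n";
```

Because readdir never interprets its argument as a pattern, this approach also sidesteps problems with spaces or wildcard characters in directory names.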
3. Benchmark with Benchmark
To test optimizations, leverage Perl's Benchmark module:
use Benchmark qw(timethese);
use File::Glob qw(bsd_glob GLOB_NOSORT);
timethese(10_000, {
    sorted => sub {
        my @f = glob("*.log");
    },
    unsorted => sub {
        my @f = bsd_glob("*.log", GLOB_NOSORT);
    },
});
This runs each routine 10,000 times and reports the runtimes, so you can measure whether a tweak actually helps.
4. Profile Memory with Devel::Size
Memory usage matters too. Say we load all config files upfront:
my @configs = glob("/etc/*.conf");
Rather than assume, actually inspect memory with Devel::Size:
use Devel::Size qw(size total_size);
print "Memory for \@configs: ", total_size(\@configs), " bytes\n";
Profiling at all optimization stages is key.
Alternatives and Use Case Fit
While glob() is handy, it pays to evaluate alternatives too based on the use case.
1. File::Finder for More Power
The File::Finder module from CPAN wraps the core File::Find module in a find(1)-style interface. It offers capabilities like:
- Recursive traversal of whole directory trees
- Filtering on file attributes (type, size, modification time)
- Callbacks for matching files
- Combining tests with OR and NOT as well as the default AND
- Directory exclusions via pruning
It needs a little more code than glob(), though. Example pattern with File::Finder:
use File::Finder;
my @found = File::Finder
    ->type('f')
    ->name(qr/^foo../)
    ->in('./src');
If basic pattern matching in known directories satisfies your needs, glob() may suffice over File::Finder.
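For comparison, the same kind of recursive search can be sketched with the core File::Find module, which ships with Perl. This is not the File::Finder API; find_by_regex is a hypothetical helper and the directory layout is invented for the example.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Find qw(find);
use File::Temp qw(tempdir);

# Hypothetical helper: plain files under $root whose base name matches $re.
sub find_by_regex {
    my ($root, $re) = @_;
    my @found;
    find({
        no_chdir => 1,    # keep the working directory unchanged
        wanted   => sub {
            my $path = $File::Find::name;
            push @found, $path if -f $path && basename($path) =~ $re;
        },
    }, $root);
    my @sorted = sort @found;
    return @sorted;
}

# Scratch tree with hypothetical file names.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/src"     or die "Cannot create src: $!";
mkdir "$dir/src/lib" or die "Cannot create src/lib: $!";
for my $name ('src/foo12.pl', 'src/lib/foobar.pm', 'src/lib/bar.pm') {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close $fh;
}

my @matches = find_by_regex("$dir/src", qr/^foo../);
print "$_\n" for @matches;
```

Unlike glob(), this descends into every subdirectory, which is exactly what makes it slower on large trees.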
2. File::Glob for Versatility
The built-in glob() is implemented by the File::Glob module, and calling its bsd_glob() function directly gives you finer control through flags. It offers features like:
- Brace expansion (GLOB_BRACE)
- Patterns that contain spaces
- Error reporting (GLOB_ERR and GLOB_ERROR)
- Control over result sorting (GLOB_NOSORT)
This snippet loads PHP files sorted by modification time, newest first:
use File::Glob qw(bsd_glob GLOB_NOSORT);
my @php_files = bsd_glob("*.php", GLOB_NOSORT);
@files = sort { (stat($b))[9] <=> (stat($a))[9] } grep { -f $_ } @php_files;
So consider File::Glob where you need flags, error checks or paths with spaces.
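Two practical differences between the built-in glob() and bsd_glob() are whitespace handling and flags. The sketch below, with an invented directory name containing a space, shows the built-in splitting the pattern in two while bsd_glob() treats it as one, and uses GLOB_MARK to tag directories with a trailing slash.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_MARK);
use File::Temp qw(tempdir);

# Scratch directory containing a hypothetical folder with a space in its name.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/my docs"     or die "Cannot create directory: $!";
mkdir "$dir/my docs/inc" or die "Cannot create directory: $!";
open(my $fh, '>', "$dir/my docs/a.php") or die "Cannot create file: $!";
close $fh;

# Built-in glob: the space splits this into two separate patterns.
my @split = glob("$dir/my docs/*");

# bsd_glob: the whole string is one pattern; GLOB_MARK appends "/" to directories.
my @marked = bsd_glob("$dir/my docs/*", GLOB_MARK);

print "glob     : @split\n";
print "bsd_glob : ", join(', ', @marked), "\n";
```

Passing explicit flags replaces bsd_glob()'s defaults, so add GLOB_BRACE or GLOB_TILDE to the flag list if the pattern relies on them.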
3. Shell Utilities Work Too
Finally, existing Unix shell utilities like find, ls etc. can also be used by invoking them via backticks or system() instead of reimplementing in Perl.
For quick one-off scripts with minimal logic, shelling out may be simpler than glob(), at the cost of portability and careful quoting of file names.
Putting glob() in Practice
While it helps to understand glob() internals, what matters more is applying it to real problems. Here we explore some practical use cases.
1. Log Rotation by Pattern
Log files frequently need rotation to avoid filling the disk. This script renames .log files in /var/log that are larger than 1 GB to .log.old:
foreach my $file (glob("/var/log/*.log")) {
    my $size = (-s $file) // 0;
    if ($size > 1024**3) { # 1 GB
        rename($file, "$file.old")
            or warn "Cannot rename $file: $!\n";
    }
}
This demonstrates selectively picking files for management via glob.
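To try the rotation logic without touching real logs, it helps to wrap it in a function that takes the directory and size limit as parameters. This is a sketch under those assumptions; rotate_large_logs and the sample file names are hypothetical.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical helper: rename *.log files larger than $limit_bytes to *.log.old
# and return the new names.
sub rotate_large_logs {
    my ($dir, $limit_bytes) = @_;
    my @rotated;
    for my $file (bsd_glob("$dir/*.log")) {
        my $size = (-s $file) // 0;
        next unless $size > $limit_bytes;
        if (rename($file, "$file.old")) {
            push @rotated, "$file.old";
        }
        else {
            warn "Cannot rename $file: $!\n";
        }
    }
    return @rotated;
}

# Scratch directory with one "large" and one small log file.
my $dir = tempdir(CLEANUP => 1);
my %content = ('big.log' => 'x' x 100, 'small.log' => 'tiny');
for my $name (keys %content) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    print {$fh} $content{$name};
    close $fh or die "Cannot close $name: $!";
}

my @rotated = rotate_large_logs($dir, 50);   # 50-byte limit for the demo
print "rotated: @rotated\n";
```

In production the same call would use /var/log and a limit such as 1024**3.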
2. Gitignore from Templates
A .gitignore file lists the paths Git should leave untracked, such as build output and editor files. Starting a new .gitignore typically involves picking boilerplate entries from a set of templates.
Rather than manually copy-pasting snippets from the desired templates, we can concatenate them programmatically:
use File::Slurper qw(read_text);
# Perl has no $HOME variable; the home directory comes from %ENV
my @templates = glob("$ENV{HOME}/.gitignore_snippets/*");
open my $output, '>', '.gitignore' or die "Cannot write .gitignore: $!";
foreach my $template (@templates) {
    print {$output} read_text($template);
}
close $output or die "Cannot close .gitignore: $!";
Here glob() neatly selects all template files for merging.
3. Testing File Write Access
When handling file uploads, we need to validate if the destination directories are writable.
A handy test function with glob() is:
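sub can_write {
    my ($dir) = @_;
    return 0 unless -d $dir;
    my $temp_file = "$dir/" . time() . ".$$.tmp";
    # Creating the file is the actual writability test
    open(my $fh, '>', $temp_file) or return 0;
    print {$fh} time();
    close $fh;
    # The wildcard makes glob() read the directory instead of echoing the name
    my @found = grep { $_ eq $temp_file } glob("$dir/*.tmp");
    unlink $temp_file;
    return @found ? 1 : 0;
}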
if(can_write("/home/u/uploads")){
# Dir writable .. handle upload
}
else {
# Failed permission
}
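Here we create a temp file in the directory with open() and use glob() to confirm that it really appeared. The wildcard matters: a pattern without wildcards is returned unchanged whether or not the file exists.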
Conclusion: Why Master glob()
As with any component in an expert's toolkit, be it a function or a module, fully grasping the details and usage patterns is key to mastery.
Hopefully these deeper insights into the inner mechanics, optimizations and applications of Perl's versatile glob() function help you use it more effectively.
From efficiently selecting resource files to cleaning up logs to securing uploads and beyond, using file globs flexibly is an important skill for any advanced Perl developer.


