Recursively remove files with the same name as the ones that end in .part
I want to remove all files with the ".part" extension in the current directory and its subdirectories, including files with the same name but different extension.
Is this correct?
find . -name '*.part' -exec sh -c 'base="$(basename "$1" .part)"; find . -name "$base*" -delete' sh {} \;
3 answers
It is incorrect for two reasons.
1. File names containing glob characters
This is an edge case scenario.
Consider this structure:
.
├── abc
├── abc.part
├── cde
└── c*e.part
The outermost Find will find:
- abc.part, so base=abc and the innermost Find looks for files matching the glob abc*, which matches the abc file. Good.
- c*e.part, so base=c*e and the innermost Find looks for files matching the glob c*e*, which matches the cde file. Bad, because cde does not contain c*e.
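The pitfall can be reproduced safely in a throwaway directory. This is a minimal sketch, assuming mktemp -d is available (it is not part of POSIX, though most systems ship it); the function name demo_glob_pitfall is made up for illustration.

```shell
#!/bin/sh
# Rebuild the example tree in a temporary directory and run the inner
# find by hand with the base name taken from 'c*e.part'.
demo_glob_pitfall() (
    tmp=$(mktemp -d) || exit 1
    cd "$tmp" || exit 1
    touch abc abc.part cde 'c*e.part'
    base='c*e'                              # what basename yields for 'c*e.part'
    # -name treats the '*' inside $base as a wildcard, not as a literal star.
    find . -name "$base*" | LC_ALL=C sort
    cd / && rm -rf "$tmp"
)
demo_glob_pitfall
```

Both ./c*e.part and ./cde are listed, so with -delete the unrelated cde file would be removed as well.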
2. File names with extra characters
If you have abcde and abc.part files, the former will be deleted because it matches abc* as should be clear from the previous case discussion.
This particular problem would be easily fixed by changing $base* -> $base.*.
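To see the effect of that change, here is a small sketch under the same assumptions (a scratch directory from mktemp -d, a made-up function name demo_dot_suffix, and an extra abc.txt file added for illustration):

```shell
#!/bin/sh
# Compare which names the pattern "$base.*" selects when similarly
# named files sit next to abc.part.
demo_dot_suffix() (
    tmp=$(mktemp -d) || exit 1
    cd "$tmp" || exit 1
    touch abc abcde abc.part abc.txt
    base=abc
    # The dot is literal in a -name pattern, so abcde no longer matches.
    find . -name "$base.*" | LC_ALL=C sort
    cd / && rm -rf "$tmp"
)
demo_dot_suffix
```

Only ./abc.part and ./abc.txt are listed. Note that the bare abc file is no longer matched either, which may or may not be what you want.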
Proposed solution
Point 1 is the real challenge: it is quite involved to feed the file names back into another Find's -name argument and escape the metacharacters, which is always a minefield.
I propose instead to use a shell with support for **, the recursive glob, for example Bash or Ksh with globstar option set or Zsh.
#!/bin/bash
shopt -s globstar #Not needed in Zsh
for f in ./**/*.part; do
rm ./**/"$(basename "$f" .part)".*
done
For a breakdown:
- In the for line, **/*.part matches ./a.part but also ./a/b/c.part (hence "recursive glob").
- In the rm line, "$(basename "$f" .part)" removes all directory components of the file name and its .part extension. This would boil down to a and c in our example.
So the full line rm ./**/"$(basename "$f" .part)".* recursively removes files matching the a.* and c.* patterns.
It is crucial not to quote the * characters in the example, because we want them to act as globs (and not to be taken literally).
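The difference that quoting makes can be checked with plain POSIX sh, without globstar. This is only an illustrative sketch: the file names and the function name demo_quoting are invented, and mktemp -d is assumed to exist.

```shell
#!/bin/sh
# Count how many words the shell produces when the star is outside
# versus inside the double quotes.
demo_quoting() (
    tmp=$(mktemp -d) || exit 1
    cd "$tmp" || exit 1
    touch a.part a.txt
    base=a
    set -- "$base".*            # star outside the quotes: expands to the matching files
    echo "unquoted star: $#"
    set -- "$base.*"            # star inside the quotes: stays the literal string a.*
    echo "quoted star: $# ($1)"
    cd / && rm -rf "$tmp"
)
demo_quoting
```

With the star quoted, rm would be asked to delete a file literally named a.* instead of a.part and a.txt.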
I might be inclined to try...
find . -type f -name '*.part' -exec sh -c '
[ -f "${1%.part}" ] && rm -i -- "${1%.part}";
for f in "${1%.part}".*; do
[ -f "$f" ] && rm -i -- "$f";
done
' -- {} \;
(newlines for readability; can be elided if one-liner means something to you...)
- find . -type f -name '*.part': find files ending with .part
- -exec sh -c '...' -- {} \;: run a shell script ... for each found file; the path to the file is in $1 in the child script
- "${1%.part}": strip .part from the end of the path in $1 (like the suffix removal done by basename, but it keeps the directory part and needs no extra process)
- [ -f "${1%.part}" ] && ...: if a regular file exists at the path with the .part suffix removed, do the ... bit
- rm -i -- "${1%.part}": delete that file
- for f in "${1%.part}".*; do ... done: loop over each path matching the stripped name plus any extension; the path is stored in $f (this includes the one with the .part extension)
- [ -f "$f" ] && ...: if the path in $f exists and is a regular file, do the ... bit
- rm -i -- "$f": remove the file in $f
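Parameter expansion and basename differ in one respect that matters here: the expansion keeps the directory part of the path, while basename drops it. A short sketch with a made-up path:

```shell
#!/bin/sh
# Hypothetical path, used only to compare the two ways of stripping .part.
p='./sub/dir/video.mp4.part'
stripped=${p%.part}              # suffix removed, directory part kept
name=$(basename "$p" .part)      # suffix removed, directory part dropped
printf '%s\n' "$stripped" "$name"
```

Keeping the directory is what lets the script above delete the siblings next to each .part file without a second search.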
Note that I'm using various checks that the thing I'm asking to delete is a file, not a directory, link, fifo, etc.
If limiting only to files is less of a concern, you might well be able to shorten this to...
find . -name '*.part' -exec sh -c 'rm -i -- "${1%.part}" "${1%.part}".*' -- {} \;
rm may print errors if its arguments don't expand to existing paths. Hide them with judicious use of 2>/dev/null redirection if you care, but note that rm -i also writes its prompts to standard error, so only do this after dropping -i.
To spawn fewer shells, you may be able to pass all found files to the same shell in one go, with...
find . -name '*.part' -exec sh -c 'while [ -n "$1" ]; do rm -i -- "${1%.part}" "${1%.part}".*; shift; done' -- {} \+
...but this might be painful for larger file lists.
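The difference between the two -exec terminators can be observed by letting the child shell report how many arguments it received. A minimal sketch, assuming mktemp -d is available; demo_batches and the three file names are invented for the demonstration.

```shell
#!/bin/sh
# Show how many paths each child shell receives with \; versus +.
demo_batches() (
    tmp=$(mktemp -d) || exit 1
    cd "$tmp" || exit 1
    touch a.part b.part c.part
    # One sh per file: three lines, each reporting a single argument.
    find . -name '*.part' -exec sh -c 'echo "$# arg(s)"' sh {} \;
    # One sh for the whole batch: a single line reporting three arguments.
    find . -name '*.part' -exec sh -c 'echo "$# arg(s)"' sh {} +
    cd / && rm -rf "$tmp"
)
demo_batches
```

For very long lists find splits the work into several batches on its own, so the script inside sh -c has to loop over all of its arguments either way.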
In general, note that there is technically a race condition between the various tests and the eventual delete, but that is only a concern if multiple processes are acting on that directory tree. Not sure how to avoid that.
Finally, rm -i is used to prompt y/n for each file to delete, as a safety net. Remove the -i switch from the rm calls if you are confident.
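Another safety net, complementary to rm -i, is a dry run that only prints the candidates. The sketch below reuses the same tests as the script above but replaces rm with printf; the function name preview_part_cleanup is hypothetical.

```shell
#!/bin/sh
# Dry run: list what the cleanup would remove, without deleting anything.
# $1 is the directory to scan.
preview_part_cleanup() {
    find "$1" -type f -name '*.part' -exec sh -c '
        for p do
            stem=${p%.part}
            # The file with the .part suffix removed, if it is a regular file.
            [ -f "$stem" ] && printf "%s\n" "$stem"
            # Every sibling with the same stem plus any extension (.part included).
            for f in "$stem".*; do
                [ -f "$f" ] && printf "%s\n" "$f"
            done
        done
    ' sh {} + | LC_ALL=C sort -u    # sort -u drops paths reached through two .part files
}

# Usage: preview_part_cleanup .
```

Once the printed list looks right, the same loop with rm in place of printf does the actual deletion.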
For each file named foo.xyz, you want to delete foo.xyz.part. It doesn't matter if foo.xyz.part exists, you can just attempt it and skip errors.
You can get a list of all files with find. You don't want the ones ending in .part, so use grep to filter them out: find | grep -v '\.part$'. Here $ anchors the match to the end of the line, and \. is needed because a bare . matches any character in a regex.
You can then attempt to delete the matching .part file for each one: find | grep -v '\.part$' | parallel rm {}.part
If the file doesn't exist, Parallel will show you the error message, but it will still delete the ones that do exist. You can do a bunch of extra filtering with comm to only attempt to delete those files which do exist, but there's no need in this case.
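For completeness, the comm filtering could look roughly like this. It is a sketch only: it assumes file names without embedded newlines and an available mktemp, and the function name existing_partials is invented.

```shell
#!/bin/sh
# Print each X.part for which a file X also exists under directory $1.
existing_partials() {
    candidates=$(mktemp) || return 1
    present=$(mktemp) || return 1
    # Names the .part files would have if every other file had one.
    find "$1" -type f ! -name '*.part' | sed 's/$/.part/' | LC_ALL=C sort > "$candidates"
    # The .part files that are really there.
    find "$1" -type f -name '*.part' | LC_ALL=C sort > "$present"
    # comm -12 keeps only the lines common to both sorted lists.
    LC_ALL=C comm -12 "$candidates" "$present"
    rm -f "$candidates" "$present"
}

# Usage: existing_partials . | while IFS= read -r f; do rm -- "$f"; done
```

Both lists are sorted under the same locale because comm expects its inputs in a consistent order.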
