Find
Find is a great tool, though it can be infuriatingly hard to remember exactly how to do things with it, because it is quite powerful and has a lot of flags.
Here is a quick rundown of common tasks and best practices.
Always Use an Absolute Path
Especially if you are on a server, you can avoid misery by always using an absolute path to search in.
If you are ever going to actually do something with find, such as deleting files, and you search in `.` (the current directory), then you have a trap waiting in your command history: rerun that command from a different directory and it will act on whatever is there. This can (and does) cause accidental server destruction.
You can avoid this trap by using an absolute path, so that in the worst case, rerunning your previous command only affects the directory it was originally intended for.
DANGEROUS!
find . -type f -exec rm -rf {} \;
SAFE :)
find /abs/path/to/folder/with/things/to/delete -type f -exec rm -rf {} \;
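One way to make the absolute-path habit stick is to refuse to run at all when the search root is relative. The sketch below is a minimal illustration, not a standard tool: `require_abs_dir` is a hypothetical helper, and the deletion is rehearsed in a throwaway directory from `mktemp -d` rather than a real folder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical guard: succeed only for an absolute path to an existing directory.
require_abs_dir() {
    case "$1" in
        /*) ;;  # absolute path, keep going
        *) echo "refusing: '$1' is not an absolute path" >&2; return 1 ;;
    esac
    if [ ! -d "$1" ]; then
        echo "refusing: '$1' is not a directory" >&2
        return 1
    fi
}

# Rehearse in a throwaway directory instead of a real one.
target=$(mktemp -d)
touch "$target/a.log" "$target/b.log"

# Only run the destructive find if the guard passes.
if require_abs_dir "$target"; then
    find "$target" -type f -exec rm -f {} \;
fi
```

Calling `require_abs_dir .` prints a refusal and returns non-zero, so a command built this way cannot silently run against whatever directory your shell happens to be in.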
Deleting Files
A common use case for find is finding and removing files.
For example, in a folder of logs or backups, you might want to delete files that are older than a certain number of days.
To do this, you should use the absolute path to the folder, as described above. You should also use the most specific file name pattern you can. Using `*` is the most dangerous; something like `-name '*.gz'` is much better, as it restricts the deletion to the .gz files that you are planning to delete.
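The pattern should be quoted. If it is not, the shell expands `*.gz` against the current directory before find ever sees it: with one match, find silently searches for the wrong name, and with several it usually fails with an error. The sketch below demonstrates this in two throwaway directories; the file names are made up for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

target=$(mktemp -d)     # the folder we actually want to search
elsewhere=$(mktemp -d)  # wherever your shell happens to be
touch "$target/a.gz" "$target/b.gz" "$elsewhere/local.gz"
cd "$elsewhere"

# Unquoted: the shell expands *.gz in the CURRENT directory first,
# so find really receives "-name local.gz" and matches nothing in $target.
unquoted=$(find "$target" -type f -name *.gz | wc -l)

# Quoted: find receives the pattern itself and tests it against each file name.
quoted=$(find "$target" -type f -name '*.gz' | wc -l)

echo "unquoted: $unquoted match(es), quoted: $quoted match(es)"
```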
When crafting your deletion command, always run it without the deletion part first, just to be absolutely sure what you are going to delete.
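A simple way to enforce the "list first" habit is to keep the listing and the deleting command identical apart from the action. This is a minimal sketch: `cleanup_gz` and the `DRY_RUN` variable are hypothetical names, and the demo runs in a throwaway directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list matches by default; delete only when DRY_RUN=0.
cleanup_gz() {
    local dir="$1"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        find "$dir" -type f -name '*.gz' -print -delete
    else
        find "$dir" -type f -name '*.gz' -print
    fi
}

dir=$(mktemp -d)
touch "$dir/old.gz" "$dir/keep.txt"

preview=$(cleanup_gz "$dir")             # dry run: only prints
echo "Would delete:"; echo "$preview"

DRY_RUN=0 cleanup_gz "$dir" > /dev/null  # real run, same expression
```

Because both branches share the same tests, what you preview is what gets deleted; forgetting to set `DRY_RUN` errs on the side of doing nothing.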
Deleting Files Older Than X Days
Find Command:
numDays=7
find /abs/path/to/folder -type f -mtime +${numDays}
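Note that `-mtime +7` means "more than 7 whole 24-hour periods ago", and find discards the fractional part when counting, so a file that is 7.5 days old does not match; in practice a file must be at least 8 days old. The sketch below checks this in a throwaway directory. It assumes GNU touch (for relative dates such as `'9 days ago'`) and GNU find (for `-printf`), as found on most Linux servers; the file names are made up.

```shell
#!/usr/bin/env bash
# Assumes GNU touch (-d '... ago') and GNU find (-printf).
set -euo pipefail

numDays=7
dir=$(mktemp -d)

# Backdate three hypothetical log files.
touch -d '1 day ago'     "$dir/recent.log"
touch -d '180 hours ago' "$dir/seven_and_a_half.log"  # 7.5 days old
touch -d '9 days ago'    "$dir/old.log"

# +7: the 7.5-day-old file counts as 7 whole days, which is not more than 7.
over7=$(find "$dir" -type f -mtime +"${numDays}" -printf '%f\n' | sort)
# +6: now the 7.5-day-old file qualifies as well.
over6=$(find "$dir" -type f -mtime +"$((numDays - 1))" -printf '%f\n' | sort)

echo "over $numDays days: $over7"
echo "over $((numDays - 1)) days: $over6"
```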
Find and Delete Command:
numDays=7
find /abs/path/to/folder -type f -mtime +${numDays} -exec rm -rf {} \;
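Both GNU and BSD find also offer `-delete`, which removes matches without starting an `rm` process per file and has no recursive flag to get wrong. (If you prefer `rm`, `-exec rm -f {} +` at least passes many files to each invocation.) Keep `-delete` at the end: find evaluates the expression left to right, so a `-delete` placed before the tests would remove everything it visits. The sketch below adds a `-name` pattern and uses a throwaway directory; GNU touch is assumed for the backdating.

```shell
#!/usr/bin/env bash
# Assumes GNU touch for the relative -d date used to set up the demo.
set -euo pipefail

numDays=7
dir=$(mktemp -d)
touch -d '10 days ago' "$dir/old.gz"
touch "$dir/new.gz"

# Same selection as above, narrowed to *.gz; -print logs each file
# just before -delete removes it.
find "$dir" -type f -name '*.gz' -mtime +"${numDays}" -print -delete
```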
Finding Big Files or Folders
If you want to search a directory for big files, you can have a read of this article.
The recommended approach is:
First, Find Big Files and Save to List
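# Start from an absolute directory; the later steps assume paths relative to /var/www/vhosts
cd /var/www/vhosts && find . -type f -size +10M -exec du -Sh {} + | sort -rh > /tmp/filesList.txt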
You now have a text file that you can look at and process further as you see fit, without having to rerun a potentially expensive find command.
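If your find is GNU find, an alternative is to let find print the sizes itself with `-printf`, so no `du` processes are needed. This is a sketch under that assumption, run against a throwaway directory with made-up files. One difference to keep in mind: `%s` is the file's apparent size in bytes, whereas `du` reports disk usage, so sparse or compressed files can show different numbers.

```shell
#!/usr/bin/env bash
# Assumes GNU find (-printf).
set -euo pipefail

root=$(mktemp -d)   # stand-in for the folder you want to scan
mkdir -p "$root/site/backups"
head -c 12582912 /dev/zero > "$root/site/backups/db.sql"  # 12 MiB
head -c 20971520 /dev/zero > "$root/site/video.mp4"       # 20 MiB
head -c 1024 /dev/zero > "$root/site/small.txt"           # 1 KiB, below the threshold

list=$(mktemp)
# %s = size in bytes, %p = path; sort numerically, biggest first.
find "$root" -type f -size +10M -printf '%s\t%p\n' | sort -rn > "$list"
cat "$list"
```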
Second, Filter the List
Now you have a big list, but it no doubt includes things that you do not want to delete.
You can repeatedly filter this list to remove the things you want to keep, like this:
cd /tmp
cp filesList.txt filesListFiltered.txt
# Filter by path:
sed -i '/\/abs\/path\/to\/folder\/containing\/things\/we\/should\/not\/delete/d' filesListFiltered.txt
# Filter by extension (anchored to the end of the line):
sed -i '/\.ext$/d' filesListFiltered.txt
# Remove git paths:
sed -i '/\.git\//d' filesListFiltered.txt
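The same filtering can also be done in a single pass with `grep -v` and one `-e` per pattern, writing to a new file so the original list stays untouched. The sample lines below are hypothetical, in the "size, tab, path" format that `du` produces. Be aware that grep exits with status 1 when nothing is left after filtering, which matters in scripts running under `set -e`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample in the "size<TAB>path" format produced by du.
list=$(mktemp)
printf '%s\t%s\n' \
    120M ./site/backups/db.sql.gz \
    80M  ./site/.git/objects/pack/pack-1.pack \
    40M  ./site/uploads/video.mp4 > "$list"

# Drop every unwanted pattern in one pass, writing a new file.
grep -v -e '/\.git/' -e '\.mp4$' "$list" > "$list.filtered"
cat "$list.filtered"
```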
Third, Pull Out the Directories
Now that you have a list of big files, let's get the directories they are in:
grep -Po '/(.+)/' filesListFiltered.txt | sort -u
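To see what that regex keeps, here it is applied to a few hypothetical list lines. `-P` (Perl-compatible regexes) is a GNU grep feature. The greedy `.+` runs from the first `/` to the last `/` on each line, leaving just the directory part; a file sitting directly in the search root has only one `/`, so it produces no match and is dropped from the output.

```shell
#!/usr/bin/env bash
# Assumes GNU grep (-P).
set -euo pipefail

list=$(mktemp)
printf '%s\t%s\n' \
    120M ./site/backups/db.sql.gz \
    90M  ./site/backups/files.tar.gz \
    40M  ./site/uploads/video.mp4 \
    15M  ./toplevel.iso > "$list"

# Keep only the directory part of each path, de-duplicated.
dirs=$(grep -Po '/(.+)/' "$list" | sort -u)
printf '%s\n' "$dirs"
```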
To look at the biggest files and folders in each of those directories, try:
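grep -Po '/(.+)/' /tmp/filesListFiltered.txt | sort -u | while IFS= read -r directory; do
    echo "$directory"
    # Skip the entry if the directory cannot be entered, instead of running du in the wrong place
    cd "/var/www/vhosts/$directory" || continue
    du -hs -- * | sort -rh | head -n 5
    printf "\n\n\n"
done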
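If you only care about folders, GNU du can summarise one level of subdirectories directly with `--max-depth=1`, which (unlike `du -hs *`) also includes hidden directories but does not list individual files. This sketch assumes GNU du and GNU sort (`-h`) and uses a throwaway directory; the output contains one line per subdirectory plus one for the directory itself.

```shell
#!/usr/bin/env bash
# Assumes GNU du (--max-depth) and GNU sort (-h).
set -euo pipefail

root=$(mktemp -d)
mkdir -p "$root/big" "$root/small"
head -c 3000000 /dev/urandom > "$root/big/blob"  # random data, so compression cannot shrink it
head -c 1000 /dev/zero > "$root/small/tiny"

# Per-directory totals one level down (plus $root itself), largest first.
top=$(du -h --max-depth=1 "$root" | sort -rh | head -n 5)
printf '%s\n' "$top"
```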
From Here, Take Action
At this point, you should have a good idea of where to look to delete your big files.
Warning
Always be very paranoid when doing automated deletions on a live server!