By the way, using a checksum or hash is a good idea; my script doesn't use one. If the files are small and there are only a few of them (say 10-20), the script runs quite fast. With 100 or more files of around 1000 lines each, it can take more than 10 seconds.
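For a hash-based check, a minimal sketch could look like this (assuming GNU coreutils md5sum, sort, and awk, and filenames without whitespace); it hashes every argument and reports each file whose checksum has already been seen:

#!/bin/bash
# Sketch only: the first file of every hash group stays silent,
# later files with the same MD5 hash are reported as duplicates.
md5sum "$@" | sort | awk 'seen[$1]++ { print $2, "duplicates an earlier file" }'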
Usage: ./duplicate_removing.sh files/*
#!/bin/bash
# Compare each file against every file that follows it on the command line.
for target_file in "$@"; do
    shift    # drop the current file so the inner loop only sees the remaining ones
    for candidate_file in "$@"; do
        # diff -q prints nothing when the files are identical
        compare=$(diff -q "$target_file" "$candidate_file")
        if [ -z "$compare" ]; then
            echo "$target_file" is a copy of "$candidate_file"
            echo rm -v "$candidate_file"
        fi
    done
done
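Note that the script only prints the rm commands, so nothing is actually deleted; it is effectively a dry run. To really remove the copies you can either drop the leading echo in front of rm, or, assuming plain filenames without spaces or shell metacharacters, feed the printed commands back to a shell:

./duplicate_removing.sh files/* | grep '^rm ' | sh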
Testing
Create random files: ./creating_random_files.sh
#!/bin/bash
# Create pairs of identical files: N.txt and N.txt.copied
file_amount=10
files_dir="files"
mkdir -p "$files_dir"
while ((file_amount)); do
    content=$(shuf -i 1-1000)
    # tee writes the same random content to both N.txt and N.txt.copied
    echo "$RANDOM" "$content" | tee "${files_dir}/${file_amount}".txt{,.copied} > /dev/null
    ((file_amount--))
done
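The tee line is what produces the duplicate pairs: the brace expansion "${files_dir}/${file_amount}".txt{,.copied} expands to two paths (for example files/10.txt and files/10.txt.copied), and tee writes the same content into both. A quick standalone check of that behaviour (demo.txt is just an illustrative name):

echo hello | tee demo.txt{,.copied} > /dev/null
diff -s demo.txt demo.txt.copied    # reports that the two files are identical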
Run ./duplicate_removing.sh files/* and you get the following output:
files/10.txt is a copy of files/10.txt.copied
rm -v files/10.txt.copied
files/1.txt is a copy of files/1.txt.copied
rm -v files/1.txt.copied
files/2.txt is a copy of files/2.txt.copied
rm -v files/2.txt.copied
files/3.txt is a copy of files/3.txt.copied
rm -v files/3.txt.copied
files/4.txt is a copy of files/4.txt.copied
rm -v files/4.txt.copied
files/5.txt is a copy of files/5.txt.copied
rm -v files/5.txt.copied
files/6.txt is a copy of files/6.txt.copied
rm -v files/6.txt.copied
files/7.txt is a copy of files/7.txt.copied
rm -v files/7.txt.copied
files/8.txt is a copy of files/8.txt.copied
rm -v files/8.txt.copied
files/9.txt is a copy of files/9.txt.copied
rm -v files/9.txt.copied
From the comments: $(command) is preferred to using backticks. Another approach is md5sum `find -type f` | sort | uniq -D -w 32 (superuser.com/questions/259148/…). It does not automatically delete duplicates, but it could be changed to do so if you really want to.
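If you did want that pipeline to produce removal commands, a minimal sketch (assuming GNU coreutils and filenames without whitespace; it keeps the first file of each hash group and only prints the rm commands, in the same dry-run style as above):

md5sum $(find files -type f) | sort | uniq -D -w 32 \
    | awk 'seen[$1]++ { print "rm -v", $2 }'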