Another good tool is fslint:
fslint is a toolset to find various problems with filesystems, including duplicate files and problematic filenames etc.
Individual command line tools are available in addition to the GUI; to access them, one can change to, or add to $PATH, the /usr/share/fslint/fslint directory on a standard install. Each of these commands has a --help option which further details its parameters.
findup - find DUPlicate files
On Debian-based systems, you can install it with:
sudo apt-get install fslint
You can also do this manually if you don't want to or cannot install third-party tools. Most such programs work by computing file checksums. Files with the same md5sum almost certainly contain exactly the same data. So, you could do something like this:
find / -type f -exec md5sum {} \; > md5sums
awk '{print $1}' md5sums | sort | uniq -d > dupes
while read -r d; do echo "---"; grep -- "$d" md5sums | cut -d ' ' -f 2-; done < dupes
Sample output (the file names in this example are the same, but it will also work when they are different):
$ while read -r d; do echo "---"; grep -- "$d" md5sums | cut -d ' ' -f 2-; done < dupes
---
/usr/src/linux-headers-3.2.0-3-common/include/linux/if_bonding.h
/usr/src/linux-headers-3.2.0-4-common/include/linux/if_bonding.h
---
/usr/src/linux-headers-3.2.0-3-common/include/linux/route.h
/usr/src/linux-headers-3.2.0-4-common/include/linux/route.h
---
/usr/src/linux-headers-3.2.0-3-common/include/drm/Kbuild
/usr/src/linux-headers-3.2.0-4-common/include/drm/Kbuild
---
This will be much slower than the dedicated tools already mentioned, but it will work.
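As a self-contained sketch, you can exercise the same pipeline on a throwaway directory before running it against /. The file names and temporary directory below are made up purely for illustration:

```shell
#!/bin/sh
# Demo of the md5sum pipeline above on a throwaway directory.
# All file names here are hypothetical, created just for this demo.
tmp=$(mktemp -d)
mkdir "$tmp/data"
printf 'hello\n' > "$tmp/data/a.txt"
printf 'hello\n' > "$tmp/data/b.txt"   # byte-for-byte duplicate of a.txt
printf 'world\n' > "$tmp/data/c.txt"   # unique content

# Checksum every file, then keep only hashes that occur more than once.
# The work files live outside data/ so find does not pick them up.
find "$tmp/data" -type f -exec md5sum {} \; > "$tmp/md5sums"
awk '{print $1}' "$tmp/md5sums" | sort | uniq -d > "$tmp/dupes"

# Print each group of duplicate files, separated by "---".
groups=$(while read -r d; do
  echo "---"
  grep -- "$d" "$tmp/md5sums" | cut -d ' ' -f 2-
done < "$tmp/dupes")
echo "$groups"

rm -rf -- "$tmp"
```

Here a.txt and b.txt end up in one "---" group while c.txt, whose checksum is unique, is not listed at all.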