Several things:
du -sh /my-downloads
Gives the cumulative disk usage (not size) of the /my-downloads
directory file and all the unique files and directories found by a recursive descent. It's intended to give you an indication of how much disk space you would reclaim if you removed that directory and its contents recursively¹.
In:
find /my-downloads/* -maxdepth 1 -type d -exec du -sb {} \;
The shell expands /my-downloads/*
to the list of files (of any type, including directory) whose name doesn't start with .
(considered hidden by shell globs by default and some other tools such as ls
, but not find
nor du
themselves), and passes them to find
.
For those that are of type directory, find
will descend into them, but by one level only; the ones of other types are discarded, as -type d
will filter them out.
The files that are not of type directory in any of those directories will also be omitted.
For instance if you have:
| size | disk usage | path |
|---:|---:|---|
| 4096 | 4096 | /my-downloads |
| 4096 | 4096 | /my-downloads/.dir |
| 1 | 4096 | /my-downloads/file |
| 4096 | 4096 | /my-downloads/dir1 |
| 10000000 | 0 | /my-downloads/dir1/file |
| 4096 | 4096 | /my-downloads/dir1/subdir1 |
| 10000 | 12288 | /my-downloads/dir1/subdir1/file |
| 4096 | 4096 | /my-downloads/dir1/subdir2 |
| 10000 | 12288 | /my-downloads/dir1/subdir2/file |
find will run:
du -sb /my-downloads/dir1
du -sb /my-downloads/dir1/subdir1
du -sb /my-downloads/dir1/subdir2
- The /my-downloads directory file itself is omitted as it's never passed as argument to any du invocation.
- /my-downloads/.dir is omitted because its name starts with .
- /my-downloads/file and /my-downloads/dir1/file are omitted because they're not of type directory.
- /my-downloads/dir1/subdir{1,2}/file are omitted because they're at depth 2.
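The selection described above can be reproduced on a scratch copy of the example layout. This is only a sketch, assuming GNU find and a Bourne-like shell with default glob settings; the temporary directory and the `selected` variable are made up for the illustration, and the file sizes don't match the table since only the names matter here:

```shell
# Rebuild the layout from the table in a throwaway directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/.dir" "$tmp/dir1/subdir1" "$tmp/dir1/subdir2"
echo > "$tmp/file"
echo > "$tmp/dir1/file"
echo > "$tmp/dir1/subdir1/file"
echo > "$tmp/dir1/subdir2/file"

# Same selection as in the question, but printing the paths instead of
# running du on them. The glob skips .dir; -type d skips regular files.
selected=$(cd "$tmp" && find ./* -maxdepth 1 -type d | LC_ALL=C sort)
printf '%s\n' "$selected"

rm -rf "$tmp"
```

Only dir1 and its two subdirectories are printed, which matches the three du invocations listed earlier.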
du -sb file (-b, like -h, being a GNU extension) gives the apparent size (not disk usage) in bytes of the file, and for those of type directory also includes the size (not disk usage) of every unique file and directory underneath.
See how /my-downloads/file
(which you can create with echo > /my-downloads/file
) has an apparent size of 1 byte but takes up 4KiB of disk space (as is common on ext4 file systems where file data is usually allocated in blocks of 4KiB) and /my-downloads/dir1/file
(which you can create with truncate -s10000000 /my-downloads/dir1/file
) appears to be 10 MB (of all-null bytes) large, but doesn't take any space on disk as it's a fully sparse file.
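To see the difference between apparent size and disk usage on your own system, something like the following can help. It is a sketch assuming GNU du and truncate; the scratch directory and variable names are hypothetical, and the disk usage figures depend on the file system, so none are claimed here:

```shell
tmp=$(mktemp -d)
echo > "$tmp/file"                    # 1 byte: a single newline
truncate -s 10000000 "$tmp/sparse"    # 10 MB hole, no data written

# Apparent size in bytes (-b is GNU-specific).
apparent_small=$(du -b "$tmp/file" | cut -f1)
apparent_sparse=$(du -b "$tmp/sparse" | cut -f1)

# Disk usage in bytes; the values depend on the file system.
usage_small=$(du -B1 "$tmp/file" | cut -f1)
usage_sparse=$(du -B1 "$tmp/sparse" | cut -f1)

printf '%s\t%s\t%s\n' size usage name \
  "$apparent_small" "$usage_small" file \
  "$apparent_sparse" "$usage_sparse" sparse

rm -rf "$tmp"
```

The size column should read 1 and 10000000; the usage column is whatever your file system allocates for those two files.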
The size of the /my-downloads/dir1/subdir{1,2}
and /my-downloads/dir1/subdir{1,2}/file
files will be counted twice, once as part of the cumulative size of /my-downloads/dir1
and once as part of that of /my-downloads/dir1/subdir{1,2}
. /my-downloads/dir1/file
itself will be counted once (unless for instance there's another hardlink to it in /my-downloads/dir2
, see below).
Since you're running separate du
invocations for each directory at depth 0 and 1, if there are files that are found in more than one directory, like if /my-downloads/dir1/subdir1/file
is a hard link to /my-downloads/dir1/subdir2/file
, its size will be counted once for the /my-downloads/dir1
cumulative size, once for /my-downloads/dir1/subdir1
cumulative size and once for /my-downloads/dir1/subdir2
, so 3 times instead of just once.
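The hard link effect is easy to check in isolation. The sketch below assumes GNU du and a temporary directory on a file system that supports hard links; the `single` and `separate` variables are made up for the example:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1/subdir1" "$tmp/dir1/subdir2"
head -c 10000 /dev/zero > "$tmp/dir1/subdir1/file"
ln "$tmp/dir1/subdir1/file" "$tmp/dir1/subdir2/file"   # hard link

# One invocation: du remembers the inode and counts it once.
single=$(du -sb "$tmp/dir1/subdir1" "$tmp/dir1/subdir2" |
  awk '{sum += $1} END {print sum}')

# Two invocations: each one counts the file.
separate=$(
  { du -sb "$tmp/dir1/subdir1"; du -sb "$tmp/dir1/subdir2"; } |
  awk '{sum += $1} END {print sum}')

echo "single: $single, separate: $separate, extra: $((separate - single))"
rm -rf "$tmp"
```

The extra amount is the 10000 bytes of the file, counted a second time by the second du process.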
To sum up, the many reasons why they're different:
- disk usage vs apparent size
- top level hidden files and directories omitted
- top level non-directory files omitted.
- some files and dirs counted several times because you're passing directories at both depth 0 and 1.
- some hardlinks counted several times because they can't be deduplicated as they're passed to separate invocations of
du
.
- Also beware that if there are files with newline characters in their path, that can throw off the computation.
If you wanted a closer match, you'd do something like:
find /my-downloads -mindepth 1 -maxdepth 1 -print0 |
du -sB1 --files0-from=- --null |
awk -v RS='\0' '
{sum += $1}
END {print sum / 1024 / 1024 / 1024 "GiB"}'
(assuming an awk implementation that supports using byte 0 as the Record Separator, such as GNU awk or recent versions of mawk).
Where:
- we list all (not just the non-hidden, directory ones) files in
my-downloads
- use
-B1
instead of -b
which sets the block-size to 1 but without switching to apparent size.
- we call
du
only once, by passing the list on standard input rather than as arguments (whose size is limited), so du
can do its deduplication.
- we tell
du
to print the list null-delimited so it can work with arbitrary file paths.
It's still missing the disk usage of the /my-downloads
directory file itself.
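One way to check that claim is to add the directory's own usage back and compare with a plain du -s. This is a variant of the pipeline above, not the same command: it swaps the awk summation for du's -c grand total so that it doesn't depend on NUL support in awk, and it assumes GNU du and GNU find (-printf). The scratch directory and the `children`, `own` and `whole` variables are hypothetical:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/.dir" "$tmp/dir1/subdir1"
echo > "$tmp/file"
head -c 10000 /dev/zero > "$tmp/dir1/subdir1/file"

# Disk usage of everything below the directory, in bytes. -c appends a
# grand total line, which is always the last line of the output.
children=$(find "$tmp" -mindepth 1 -maxdepth 1 -print0 |
  du -scB1 --files0-from=- | tail -n 1 | cut -f1)

# Disk usage of the directory file itself: find's %b is in 512-byte units.
own=$(( $(find "$tmp" -maxdepth 0 -printf '%b') * 512 ))

# What a plain du -s reports for the whole tree.
whole=$(du -sB1 "$tmp" | cut -f1)

echo "children: $children + own: $own = $((children + own)); du -s: $whole"
rm -rf "$tmp"
```

The two figures should agree, as the only thing the pipeline leaves out is the top-level directory file.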
In any case, the only involvement of bash
(the shell) in your shell code is just:
- the expansion of
/my-downloads/*
into the list of matching files in your case.
- the starting of two concurrent processes, one in which it executes
find
and one in which it executes awk
, with the output of one connected to the input of the other via a pipe (a kernel IPC system, not a shell one).
After those are started, the shell is not involved at all, it just waits for them to finish. Other than the initial glob expansion, the shell is not involved in what files are found, what du
commands are executed or the calculation.
/my-downloads/*
is simple enough a glob that its expansion would be the same in every shell², whether better or worse than bash, even in non-Bourne-like shells³.
With my alternative command, even the glob expansion by the shell is removed.
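To see exactly what the shell hands over to a command after glob expansion, a small helper that prints its arguments is enough. This is a sketch for Bourne-like shells with default glob settings; `show_args` and the scratch directory are made up for the illustration:

```shell
# Hypothetical helper: print each argument it receives on its own line.
show_args() {
  printf 'argument: %s\n' "$@"
}
# Hypothetical helper: print how many arguments it receives.
count_args() {
  echo "$#"
}

tmp=$(mktemp -d)
mkdir "$tmp/dir1" "$tmp/.dir"
echo > "$tmp/file"

# The shell replaces the pattern with the matching names before the
# command starts; names starting with . are not matched.
show_args "$tmp"/*
matched=$(count_args "$tmp"/*)

rm -rf "$tmp"
```

Here the command receives two arguments, dir1 and file; .dir never reaches it, whatever the command then does with the list.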
Also, careful not to confuse
- Gigabyte, nowadays abbreviated GB, for 10⁹ (1000 × 1000 × 1000) bytes, with
- Gibibyte, nowadays abbreviated GiB or still often G, for 2³⁰ (1024 × 1024 × 1024) bytes.
The GNU implementation of du
, when passed the -h
option uses suffixes in the later category unless passed the --si
option (though using the same abbreviations without B
nor iB
suffix in both cases unfortunately).
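The gap between the two units is easy to compute, and du can print the same size both ways. A small sketch, assuming GNU du and truncate; the scratch file is hypothetical and sparse, so it should take next to no disk space:

```shell
gb=$((1000 * 1000 * 1000))     # gigabyte
gib=$((1024 * 1024 * 1024))    # gibibyte
echo "1 GiB is $((gib - gb)) bytes larger than 1 GB"

# The same apparent size printed with both kinds of suffixes (GNU du).
tmp=$(mktemp -d)
truncate -s "$gb" "$tmp/sparse"
du -h --apparent-size "$tmp/sparse"     # powers of 1024
du --si --apparent-size "$tmp/sparse"   # powers of 1000
rm -rf "$tmp"
```

The difference is about 7%, which is enough to explain a noticeable mismatch between two totals that otherwise cover the same files.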
¹ in practice, that might not be the case if there are more hard links to those files outside the directory, or some of their contents are reflinked in other files, or there is some form of snapshotting in place at the filesystem level, or there is some data not accounted for by du
such as some extended attributes on some file systems, etc.
² The only differences you might find with other shells would be in the order those files are expanded, some listing them in locale collation order, some using a simpler order where file names are compared byte by byte; another difference could arise if there's no matching file, where some shells share the misdesign of bash (inherited from the Bourne shell) where the pattern is passed literally to find
and some where an error is reported instead and find
is not run.
³ That command line is portable to and would work the same in most shells. An exception would be shells of the rc
family where {}
needs to be quoted (as '{}'
; same in older versions of fish
) and where \
is not a quoting operator (except in es
) where you'd need ';'
instead of \;
.