You could create a script like this to hash multiple files, and then hash the concatenation of their hashes. Hashing in two stages like this, instead of concatenating all the data first, should prevent mix-ups where the concatenation loses information about the boundaries between the inputs (e.g. ab+c != a+bc, even though both concatenate to abc).
#!/bin/bash
# function to get the hashes
H() {
    sha256sum "$@" |
        LC_ALL=C sed '
            s/[[:blank:]].*//; # retain only the hash
            s/^\\//;           # remove a leading \ that GNU sha256sum at least
                               # inserts for file names where it escapes some
                               # characters (such as CR, LF or backslash).'
}
# workaround for command substitution removing final newlines
hashes=$(H "$@"; echo .)
hashes=${hashes%.}
# just for clarity
printf "%s\n" "----"
printf "%s" "$hashes"
printf "%s\n" "----"
# hash the hashes
final=$(printf "%s" "$hashes" | H)
echo "final hash of $# files: $final"
An example with two files:
$ echo hello > hello.txt
$ echo world > world.txt
$ bash hash.sh hello.txt world.txt
----
5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317
----
final hash of 2 files: 27201be8016b0793d29d23cb0b1f3dd0c92783eaf5aa7174322c95ebe23f9fe8
You could also use process substitution to pass a string in place of a file; this should give the same output:
$ bash hash.sh hello.txt <(echo world)
[...]
final hash of 2 files: 27201be8016b0793d29d23cb0b1f3dd0c92783eaf5aa7174322c95ebe23f9fe8
Giving the same input data (hello\nworld\n) with a different separation gives a different hash:
$ bash hash.sh <(printf h) <(printf "ello\nworld\n")
[...]
final hash of 2 files: 0453f1e6ba45c89bf085b77f3ebb862a4dbfa5c91932eb077f9a554a2327eb8f
Of course, changing the order of the input files should also change the hash.
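To make the boundary problem concrete, here is a minimal sketch that compares hashing the plain concatenation with the two-stage approach. The helper names concat_hash and two_stage_hash are made up for this illustration and are not part of the script above. The sketch assumes GNU sha256sum and file names that need no escaping (no backslashes or newlines), so the leading \ handling is left out.

```bash
# Hypothetical helpers for illustration only; not part of hash.sh above.

# Hash the plain concatenation of all files: the boundaries are lost.
concat_hash() {
    cat -- "$@" | sha256sum | cut -d ' ' -f 1
}

# Hash each file separately, then hash the newline-terminated list of hashes.
# Assumes file names that sha256sum does not need to escape.
two_stage_hash() {
    sha256sum -- "$@" | cut -d ' ' -f 1 | sha256sum | cut -d ' ' -f 1
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'ab' > "$tmp/1"; printf 'c'  > "$tmp/2"   # ab + c
printf 'a'  > "$tmp/3"; printf 'bc' > "$tmp/4"   # a + bc

echo "concat:    $(concat_hash "$tmp/1" "$tmp/2")"
echo "concat:    $(concat_hash "$tmp/3" "$tmp/4")"      # same as the line above
echo "two-stage: $(two_stage_hash "$tmp/1" "$tmp/2")"
echo "two-stage: $(two_stage_hash "$tmp/3" "$tmp/4")"   # differs from the line above
echo "two-stage: $(two_stage_hash "$tmp/2" "$tmp/1")"   # swapped order differs too
```

The two concat lines should be identical, since both hash the string abc, while the three two-stage lines should all differ from each other.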
The part between the dashes in the output is just for clarity here; it shows the data that goes to the final sha256sum. You should probably remove it for actual use.
Above, I used sed to remove the filename(s) from the output of sha256sum. If you remove the | sed ... part, the filenames will be included, and e.g. hash.sh hello.txt world.txt would instead hash the string
5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03 hello.txt
e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317 world.txt
The sub-hashes are the same, but the input to the final hash is different, giving f27b5175dec88c76dc6a7b368167cd18875da266216506e10c503a56befd7e14 as the result. Obviously, changing the filenames, including going from hello.txt to ./hello.txt, would change the hash. Process substitution would also be less useful here, as the inputs would show up with odd implementation-dependent filenames (like /dev/fd/63 with Bash on Linux).
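The effect of keeping the file names can be checked with a short sketch like the following. It assumes GNU sha256sum; the temporary directory and the variable names plain and dotted exist only for this demonstration.

```bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
echo hello > hello.txt

# Same file, referred to by two different names.
plain=$(sha256sum hello.txt)
dotted=$(sha256sum ./hello.txt)
printf '%s\n%s\n' "$plain" "$dotted"

# The sub-hash is the same, but the full lines differ,
# so a hash computed over the lines differs as well.
printf '%s\n' "$plain"  | sha256sum
printf '%s\n' "$dotted" | sha256sum
```

The first two lines of output should show the same hash followed by different names, and the last two hashes should differ from each other.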
In the above, the input to the final hash is the hex encoding of the hashes of the input elements, with newlines terminating each. I don't think you need more separation than that, and could technically even drop the newlines as the hashes have a fixed length anyway (but we get the newlines for free and they make it easier to read for a human).
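As a quick check on the fixed-length argument, the following sketch confirms that each hex-encoded SHA-256 hash is 64 characters long, and computes the final hash both with and without the newlines. It assumes GNU sha256sum; the variable names are chosen only for this illustration.

```bash
# Sub-hashes of the two example inputs, read from stdin so no file name is involved.
h1=$(printf 'hello\n' | sha256sum | cut -d ' ' -f 1)
h2=$(printf 'world\n' | sha256sum | cut -d ' ' -f 1)
echo "${#h1} ${#h2}"    # prints "64 64": hex-encoded SHA-256 is always 64 characters

# Final hash over newline-terminated sub-hashes (what the script above feeds in)...
with_nl=$(printf '%s\n' "$h1" "$h2" | sha256sum | cut -d ' ' -f 1)
# ...and over the sub-hashes simply run together.
without_nl=$(printf '%s' "$h1" "$h2" | sha256sum | cut -d ' ' -f 1)

echo "with newlines:    $with_nl"
echo "without newlines: $without_nl"
```

Both variants keep the inputs unambiguous; they just produce different final hashes, so whichever format you pick has to be used consistently.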
Note, though, that sha256sum gives just plain hashes. If you're looking for something to generate authentication tags, you should probably look into HMAC or similar, and be wary of length-extension attacks (which a straightforward H(key + data) may be vulnerable to), among other things.
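For illustration, a keyed tag could be computed with the HMAC support in openssl dgst instead of a hand-rolled H(key + data). This is only a sketch under a few assumptions: the hmac_sha256 helper name is made up, the openssl command-line tool has to be installed, and passing the key with -hmac puts it on the command line (where other users may see it via ps), so it is not suitable as-is for real secrets.

```bash
# Hypothetical helper: HMAC-SHA256 of stdin, keyed with $1, printed as hex.
# The label that `openssl dgst` prints before the digest varies between
# versions, so only the last whitespace-separated field is kept.
hmac_sha256() {
    openssl dgst -sha256 -hmac "$1" | awk '{ print $NF }'
}

# Example: tag a message with the (toy) key "key".
printf '%s' 'The quick brown fox jumps over the lazy dog' | hmac_sha256 key
```

Unlike the plain hashes above, the tag depends on the key, so someone without the key cannot recompute it for modified data.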
Depending on your use-case, you might want to consider going to security.SE or crypto.SE, or hiring an actual expert.
foo+bardoo and foobar+doo both concatenate to foobardoo and have the same hash, or should the two pairs have a different hash?