
I want to create a hash of more than one source in Bash.

I am aware that I can:

echo -n "STRING" | sha256sum

or

sha256sum [FILE]

What I need is:

  1. STRING + FILE
  2. FILE + FILE
  3. STRING + STRING
  4. STRING + FILE + STRING

For example, for STRING + FILE I could:

  1. Save the hash of STRING in one variable and the hash of [FILE] in another, then concatenate the two hashes and hash the result.

  2. Save the hash of STRING and the hash of [FILE] in the same file, then hash that file.
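A sketch of these two approaches, assuming Bash with GNU coreutils; the string and the temporary file contents are placeholders. Both give the same final hash as long as the same bytes (here, each hex digest followed by a newline) reach the last sha256sum.

```shell
#!/usr/bin/env bash
# Sketch of the two STRING + FILE approaches described above.
# The string and the temporary file contents are placeholders.

# hash_hex: print only the hex digest of stdin (drops sha256sum's "  -")
hash_hex() {
    sha256sum | cut -d' ' -f1
}

string='STRING'
file=$(mktemp)
printf 'file contents\n' > "$file"

h_string=$(printf '%s' "$string" | hash_hex)   # like echo -n "STRING" | sha256sum
h_file=$(hash_hex < "$file")

# Approach 1: both hashes in variables, hashed together (one per line)
final1=$(printf '%s\n%s\n' "$h_string" "$h_file" | hash_hex)

# Approach 2: both hashes written to one file, then that file is hashed
list=$(mktemp)
printf '%s\n' "$h_string" "$h_file" > "$list"
final2=$(hash_hex < "$list")

rm -f "$file" "$list"
echo "approach 1: $final1"
echo "approach 2: $final2"
```

Reading the file through stdin (`< "$file"`) keeps the file name, and any backslash escaping GNU sha256sum applies to it, out of the output.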

Can I create a hash using a single command?

For example: echo "STRING" + [FILE] | sha256sum

How can I accomplish this, and what is the recommended or correct method?
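A single pipeline is possible if the producing commands are grouped, so that the pipe applies to all of them. This is a sketch with placeholder inputs; note that plain `echo "STRING"` appends a newline (which changes the hash), so `printf '%s'` is used instead, and that this hashes the plain concatenation, so the boundaries between inputs are not preserved.

```shell
#!/usr/bin/env bash
# One pipeline for STRING + FILE + STRING: group the producers so that all
# of their output goes into a single sha256sum. Placeholder inputs.

file=$(mktemp)
printf 'file contents\n' > "$file"

# { ...; } groups the commands, so the pipe applies to the whole group.
# printf '%s' adds no trailing newline (unlike plain echo).
h_group=$({ printf '%s' 'STRING1'; cat "$file"; printf '%s' 'STRING2'; } | sha256sum | cut -d' ' -f1)

# Same bytes via cat: "-" is stdin, <(...) is Bash process substitution.
h_cat=$(printf '%s' 'STRING1' | cat - "$file" <(printf '%s' 'STRING2') | sha256sum | cut -d' ' -f1)

rm -f "$file"
echo "$h_group"
```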

UPDATE

With Romeo Ninov's answer, EXAMPLE 1:

{ echo -n "STRING" && cat [FILE]; } | sha256sum

When I do:

EXAMPLE 2:

echo $(echo -n "STRING" | sha256sum) $(sha256sum [FILE]) | sha256sum

What should I use? I'm getting different results. What is the correct method to achieve this?

  • In example 2 you have space, dash, space between the two hashes, and a space plus the file name at the end. Commented Sep 1, 2022 at 16:12
  • In example 1 you have the hash of the concatenation of the string and the file content. Commented Sep 1, 2022 at 16:13
  • OK, so I have to tell apart plain concatenation and the added spaces, thanks. Commented Sep 1, 2022 at 16:15
  • Do you want the hash of the concatenation of two values, or do you want the separation between the two values to be kept intact? I.e., should pairs like foo+bardoo and foobar+doo both concatenate to foobardoo and have the same hash, or should the two pairs have a different hash?
    – ilkkachu
    Commented Sep 1, 2022 at 19:52
  • @BlockchainOffice It sounds like you should look into Merkle hash trees. You also really need to define what you're trying to protect and what you're trying to protect it against; unless you know what you're trying to accomplish, it's impossible to define what will be successful/better/worse/whatever at accomplishing it. Commented Sep 1, 2022 at 23:12
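To see what the first comment describes, this sketch captures the text that EXAMPLE 2 actually feeds into its final sha256sum. The file `data.txt` in a temporary directory is a placeholder for [FILE].

```shell
#!/usr/bin/env bash
# Show the text that EXAMPLE 2 feeds into its final sha256sum.
# data.txt in a temporary directory is a placeholder for [FILE].

dir=$(mktemp -d)
file="$dir/data.txt"
printf 'file contents\n' > "$file"

# Each $(...) expands to "HEX  NAME". Unquoted, the words are split and
# echo rejoins them with single spaces, so "-" (stdin) and the file name
# become part of the hashed input. (echo also appends a newline, which
# the outer $(...) strips here.)
ex2_input=$(echo $(echo -n "STRING" | sha256sum) $(sha256sum "$file"))

echo "EXAMPLE 2 hashes: $ex2_input"
rm -r "$dir"
```

EXAMPLE 1, by contrast, hashes the raw bytes of "STRING" followed directly by the file contents, so the two results cannot match.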

2 Answers


You could create a script like this to hash multiple files and then hash the concatenation of their hashes. Hashing in two stages like this, instead of concatenating all the data first, should prevent mix-ups where the concatenation loses the borders between the inputs (e.g. ab+c and a+bc would otherwise give the same hash).

#!/bin/bash

# function to get the hashes
H() {
    sha256sum "$@" |
      LC_ALL=C sed '
        s/[[:blank:]].*//; # retain only the hash
        s/^\\//; # remove a leading \ that GNU sha256sum at least
                 # inserts for file names where it escapes some
                 # characters (such as CR, LF or backslash).'
}   

# workaround for command substitution removing final newlines
hashes=$(H "$@"; echo .)
hashes=${hashes%.}

# just for clarity
printf "%s\n" "----"
printf "%s" "$hashes"
printf "%s\n" "----"

# hash the hashes
final=$(printf "%s" "$hashes" | H)

echo "final hash of $# files: $final"

An example with two files:

$ echo hello > hello.txt
$ echo world > world.txt
$ bash hash.sh hello.txt world.txt
----
5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317
----
final hash of 2 files: 27201be8016b0793d29d23cb0b1f3dd0c92783eaf5aa7174322c95ebe23f9fe8

You could also use process substitution to pass a string instead of a file; this should give the same output:

$ bash hash.sh hello.txt <(echo world)
[...]
final hash of 2 files: 27201be8016b0793d29d23cb0b1f3dd0c92783eaf5aa7174322c95ebe23f9fe8

Giving the same input data (hello\nworld\n) with a different separation gives a different hash:

$ bash hash.sh <(printf h) <(printf "ello\nworld\n")
[...]
final hash of 2 files: 0453f1e6ba45c89bf085b77f3ebb862a4dbfa5c91932eb077f9a554a2327eb8f

Of course, changing the order of the input files should also change the hash.
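The same boundary and order effects can be checked without files. This is a sketch, not part of the script above: `hash_concat` and `hash_parts` are hypothetical helpers taking strings as arguments, and `hash_parts` feeds the final sha256sum the same format as the script (one hex digest per line), so `d` should match the script's result for hello.txt and world.txt.

```shell
#!/usr/bin/env bash
# Compare plain concatenation with the two-stage "hash of hashes" approach.
# hash_concat and hash_parts are hypothetical helpers working on strings.

hex() { sha256sum | cut -d' ' -f1; }

hash_concat() {            # hash of the plain concatenation of all arguments
    printf '%s' "$@" | hex
}

hash_parts() {             # two-stage hash: one hex digest per line, then hash
    local part
    for part in "$@"; do
        printf '%s' "$part" | hex
    done | hex
}

a=$(hash_concat 'h' $'ello\nworld\n')
b=$(hash_concat $'hello\n' $'world\n')
c=$(hash_parts 'h' $'ello\nworld\n')
d=$(hash_parts $'hello\n' $'world\n')
e=$(hash_parts $'world\n' $'hello\n')

echo "concat, split h|ello...:     $a"
echo "concat, split hello|world:   $b"
echo "two-stage, split h|ello...:  $c"
echo "two-stage, hello|world:      $d"
echo "two-stage, world|hello:      $e"
```

The two concatenation results are identical, while the two-stage results differ for each split and each order.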

The part between the dashes in the output is just for clarity: it shows the data that goes to the final sha256sum. You should probably remove it for actual use.


Above, I used sed to remove the filename(s) from the output of sha256sum. If you remove the | sed ... part, the filenames will be included and e.g. hash.sh hello.txt world.txt would instead hash the string

5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03  hello.txt
e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317  world.txt

The sub-hashes are the same, but the input to the final hash is different, giving f27b5175dec88c76dc6a7b368167cd18875da266216506e10c503a56befd7e14 as the result. Obviously, changing the file names, even from hello.txt to ./hello.txt, would also change the hash. Process substitution would also be less useful here, since the substituted inputs show up with odd, implementation-dependent file names (like /dev/fd/63 with Bash on Linux).


In the above, the input to the final hash is the hex encoding of the hashes of the input elements, with a newline terminating each. I don't think you need more separation than that; you could technically even drop the newlines, since the hashes have a fixed length anyway (but the newlines come for free and make the input easier for a human to read).
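A sketch of the variant without newlines, to show why fixed-length digests keep the boundaries unambiguous. `joined_digests` and `hash_parts_nonl` are hypothetical helper names, and the inputs are strings rather than files for brevity.

```shell
#!/usr/bin/env bash
# Final step without newline separators. Every SHA-256 hex digest is
# exactly 64 characters, so the boundaries stay unambiguous.
# joined_digests and hash_parts_nonl are hypothetical helper names.

hex() { sha256sum | cut -d' ' -f1; }

# Print the hex digest of each argument, with no separators at all
joined_digests() {
    local part
    for part in "$@"; do
        printf '%s' "$part" | hex | tr -d '\n'
    done
}

hash_parts_nonl() { joined_digests "$@" | hex; }

input=$(joined_digests 'ab' 'c' 'def')
echo "${#input} characters"      # 3 digests x 64 = 192
hash_parts_nonl 'ab' 'c' 'def'
```

The result differs from the newline-separated version, since the final hash input differs, so pick one format and stick with it.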

Note, though, that sha256sum gives just plain hashes. If you're looking for something to generate authentication tags, you should probably look into HMAC or similar, and be wary of length-extension attacks (which a straightforward H(key + data) may be vulnerable to), etc.
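A minimal sketch of the HMAC suggestion, assuming the OpenSSL command-line tool is installed; the key and message are placeholders, and `hmac_sha256` is a hypothetical wrapper name. The output prefix of `openssl dgst` differs between OpenSSL versions, so the wrapper keeps only the last field (the hex tag).

```shell
#!/usr/bin/env bash
# Keyed alternative: HMAC-SHA256 via the OpenSSL command-line tool
# (assumes `openssl` is installed). Key and message are placeholders.
# Caveat: -hmac puts the key on the command line, where other local
# users may be able to see it (e.g. with ps).

hmac_sha256() {            # usage: hmac_sha256 KEY < data
    openssl dgst -sha256 -hmac "$1" | awk '{ print $NF }'
}

tag=$(printf '%s' 'message to authenticate' | hmac_sha256 'secret-key')
echo "$tag"
```

Unlike a plain hash, the tag cannot be recomputed by someone who changes the data but does not know the key.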

Depending on your use-case, you might want to consider going to security.SE or crypto.SE, or hiring an actual expert.


After going through all the information and comments, I think one possible solution is as follows:

  • Hash each source individually.
  • Avoid concatenating the sources unless they have been individually hashed beforehand.
  • Consider using delimiters or salt when hashing the sources.
  • For further processing and storage, such as in a ledger with blocks, the best approach is to use a hash tree (Merkle hash trees), similar to how most private and public blockchains currently operate.
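A minimal Merkle-root sketch over string inputs, using only sha256sum. The construction details are assumptions chosen for simplicity: leaves are hashed first, each parent hashes its two children's hex digests concatenated as text, and an odd node at any level is paired with itself. Real systems (Bitcoin, for example) hash binary digests with double SHA-256, so roots from this sketch are not interchangeable with theirs. `merkle_root` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Minimal Merkle-root sketch over string leaves (hypothetical merkle_root).
# Assumptions: parents hash the concatenated hex digests of their children,
# and an odd node at any level is paired with itself.

hex() { sha256sum | cut -d' ' -f1; }

merkle_root() {
    local -a level=() next=()
    local item i left right
    for item in "$@"; do                      # hash every leaf first
        level+=("$(printf '%s' "$item" | hex)")
    done
    (( ${#level[@]} > 0 )) || return 1        # no inputs, no root
    while (( ${#level[@]} > 1 )); do
        next=()
        for (( i = 0; i < ${#level[@]}; i += 2 )); do
            left=${level[i]}
            right=${level[i+1]:-$left}        # odd count: duplicate the last node
            next+=("$(printf '%s%s' "$left" "$right" | hex)")
        done
        level=("${next[@]}")
    done
    printf '%s\n' "${level[0]}"
}

root=$(merkle_root tx1 tx2 tx3 tx4)
echo "$root"
```

Changing any leaf, or the order of the leaves, changes the root, and a single leaf's membership can later be proven with only the sibling hashes along its path.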

Examples:

Same hash result for:

HASH_OF((abc) + (def))

HASH_OF((ab) + (cdef))

HASH_OF((abcde) + (f))

Different hash result for:

HASH_OF( (HASH_OF(abc)) + (HASH_OF(def)) )

HASH_OF( (HASH_OF(ab)) + (HASH_OF(cdef)) )

HASH_OF( (HASH_OF(abcde)) + (HASH_OF(f)) )

My current approach, incorporating delimiters/salt, is as follows:

HASH_OF( (HASH_OF(abcde + [delimiters/salt])) + (HASH_OF(f + [delimiters/salt])) )

I will continue and expand upon this example to suit my specific requirements.

It would be more convenient and clearer to implement it within a script.

echo "$(echo -n "STRING1" | sha256sum | cut -d' ' -f1)$(echo -n "STRING2" | sha256sum | cut -d' ' -f1)" | sha256sum
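A script version of the HASH_OF( HASH_OF(x + salt) + ... ) approach could look like this. It is a sketch: `SALT` is a placeholder, appending it after each input is an assumption about how "[delimiters/salt]" is applied, and `hash_of`/`hash_salted` are hypothetical helper names. Only the hex digests go into the final hash, without sha256sum's "  -" markers.

```shell
#!/usr/bin/env bash
# Script form of HASH_OF( HASH_OF(x1 + salt) + HASH_OF(x2 + salt) + ... ).
# SALT is a placeholder; appending it to each input is an assumption
# about how "[delimiters/salt]" is applied.

SALT='example-salt'

hash_of() { printf '%s' "$1" | sha256sum | cut -d' ' -f1; }

hash_salted() {
    local part joined=''
    for part in "$@"; do
        joined+=$(hash_of "$part$SALT")   # 64 hex characters per input
    done
    hash_of "$joined"
}

hash_salted 'abc' 'def'
hash_salted 'ab' 'cdef'
hash_salted 'abcde' 'f'
```

The three calls print three different hashes, matching the "different hash result" cases listed above.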
