For some reason, the code below gives inconsistent results. The files in the files list never change, yet hasher.hexdigest() returns a different value each time this function runs. My goal is to generate a new settings file if and only if the checksum stored in the current settings file does not match the hashlib hash of the three input files. Does anyone see what I might be doing wrong?

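import hashlib
from pathlib import Path

# `paths` is presumably a project-specific module; it is not shown here.

def should_generate_new_settings(qt_settings_generated_path: Path) -> tuple[bool, str]: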
    """ compare checksum of user_settings.json and the current ini file to what is stored in the currently generated settings file """
    generate = False
    hasher = hashlib.new('md5')
    if not qt_settings_generated_path.exists():
        generate = True

    try:
        # if the file is corrupt, it may have a filesize of 0.
        generated_file = qt_settings_generated_path.stat()
        if generated_file.st_size < 1:
            generate = True

        files = [paths.user_settings_path, paths.settings_generated_path, Path(__file__)]
        for path in files:
            file_contents = path.read_bytes()
            hasher.update(file_contents)

        with qt_settings_generated_path.open('r') as file:
            lines = file.read().splitlines()

        checksum_prefix = '# checksum: '
        for line in lines:
            if line.startswith(checksum_prefix):
                file_checksum = line.lstrip(checksum_prefix)
                if file_checksum != hasher.hexdigest():
                    generate = True
                    break
    except FileNotFoundError:
        generate = True

    return (generate, hasher.hexdigest())
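
A side note on the code above, separate from the main question: str.lstrip takes a set of characters rather than a prefix, so line.lstrip(checksum_prefix) can also remove leading c or e hex digits from the stored digest. A minimal sketch of the difference, assuming Python 3.9+ for str.removeprefix (the sample line is made up for illustration):

```python
checksum_prefix = '# checksum: '

# Hypothetical stored line whose digest happens to start with 'c' and 'e'.
line = '# checksum: ce0123'

# lstrip removes any leading characters that appear in the argument,
# not the prefix as a unit, so it keeps going into the digest itself.
stripped = line.lstrip(checksum_prefix)

# removeprefix removes exactly the given prefix and nothing more.
exact = line.removeprefix(checksum_prefix)

print(repr(stripped))
print(repr(exact))
```

With lstrip, any digest beginning with c or e would be shortened and would never match hasher.hexdigest(), regardless of where the checksum is stored.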
  • When you say "each time the function runs", do you mean each time within the same program, or each time the program runs? Because hashes in Python are effectively randomized every time Python launches. Commented Jan 14, 2023 at 19:11
  • @SimonLundberg, are you saying that, in Python, the MD5 hash of the same value should be expected to differ between executions; or have I misunderstood? Commented Jan 14, 2023 at 19:18
  • Hash randomization applies only to the built-in hash() function, to reduce the risk of denial-of-service attacks. It does not apply to the stuff in hashlib. Commented Jan 14, 2023 at 19:23
  • An MD5 hash of bytes is consistent. I think @SimonLundberg is talking about the hash/id of objects such as class instances in Python. One thing I wonder about, though: when would file_checksum != hasher.hexdigest() ever be false, given that file_checksum is per file and hasher covers the bytes of all files combined? Also, you might want to print the exception; who knows which file was not found, and at the moment you swallow that information. Commented Jan 14, 2023 at 19:23
  • Yeah, sorry, I should have read that more carefully. The md5 should be consistent. Commented Jan 14, 2023 at 19:29
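
The point made in the comments above, that hashlib digests are deterministic and that feeding bytes through update() in pieces equals hashing the concatenation, can be checked with a short sketch. The helper name md5_of_chunks is made up for illustration:

```python
import hashlib


def md5_of_chunks(chunks):
    """Feed each chunk to one MD5 hasher, as the question's loop does per file."""
    hasher = hashlib.md5()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


# Same input, hashed twice: hashlib has no per-process randomization.
first = md5_of_chunks([b'hel', b'lo'])
second = md5_of_chunks([b'hel', b'lo'])

# Hashing the concatenated bytes in one call gives the same digest.
whole = hashlib.md5(b'hello').hexdigest()

print(first == second == whole)
```

So if the digest changes between runs, at least one of the inputs being read must have changed.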

1 Answer

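I figured out the issue. The checksum was being written into one of the files that gets hashed, so every regeneration changed the hash input. The solution was simply to store the hash digest in a separate file from the one I'm generating the settings into.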
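
A minimal sketch of that approach, assuming the digest lives in its own sidecar file. The function names (combined_md5, should_generate, record_checksum) and the checksum_path parameter are hypothetical and not taken from the original code:

```python
import hashlib
from pathlib import Path


def combined_md5(inputs: list[Path]) -> str:
    """Hash the bytes of all input files, in order, into a single MD5 digest."""
    hasher = hashlib.md5()
    for path in inputs:
        hasher.update(path.read_bytes())
    return hasher.hexdigest()


def should_generate(inputs: list[Path], checksum_path: Path) -> tuple[bool, str]:
    """Return (generate, digest); generate is True if no matching digest is stored."""
    current = combined_md5(inputs)
    try:
        stored = checksum_path.read_text().strip()
    except FileNotFoundError:
        # No sidecar file yet, so the settings have never been generated.
        return True, current
    return stored != current, current


def record_checksum(checksum_path: Path, digest: str) -> None:
    """Write the digest to the sidecar file, which is never one of the inputs."""
    checksum_path.write_text(digest + '\n')
```

The caller would regenerate the settings when should_generate returns True and then call record_checksum with the returned digest. Because checksum_path is not in inputs, writing it does not alter the next hash.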

1 Comment

Oh yeah, that would break stuff for sure. If you store the hash in the file you're hashing, the hash will be different the next time you compute it.
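
The effect described in this comment can be reproduced in a few lines. The settings content below is made up for illustration:

```python
import hashlib

# Made-up settings content standing in for the generated file.
settings_body = b'[general]\ntheme=dark\n'

# Digest of the file before the checksum line is written into it.
digest_before = hashlib.md5(settings_body).hexdigest()

# Writing the checksum into the same file changes the bytes being hashed.
with_checksum = settings_body + b'# checksum: ' + digest_before.encode() + b'\n'
digest_after = hashlib.md5(with_checksum).hexdigest()

print(digest_before)
print(digest_after)
```

The two printed digests differ, so a check that hashes the file containing the stored checksum will report a mismatch on every run.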
