Custom compression tool

Question

I have made my own compression tool for a stack-based language that uses a dictionary compression method. The code for the decompression is the following:

elif current_command == "\u201d":
    temp_string = ""
    temp_string_2 = ""
    temp_index = ""
    temp_position = pointer_position
    while temp_position < len(commands) - 1:
        temp_position += 1
        try:
            current_command = commands[temp_position]
            if dictionary.unicode_index.__contains__(current_command):
                temp_index += str(dictionary.unicode_index.index(current_command))
                temp_position += 1
                pointer_position += 2
                current_command = commands[temp_position]
                temp_index += str(dictionary.unicode_index.index(current_command))
                if temp_string == "":
                    temp_string += dictionary.dictionary[int(temp_index)].title()
                else:
                    temp_string += " " + dictionary.dictionary[int(temp_index)].title()
                temp_index = ""
            elif current_command == "\u201d":
                pointer_position += 1
                break
            elif current_command == "\u00ff":
                temp_string += str(pop_stack(1))
                pointer_position += 1
            else:
                temp_string += current_command
                pointer_position += 1
        except:
            pointer_position += 1
            break
        if debug:print(str(pointer_position) + " with " + str(hex(ord(current_command))))

    stack.append(temp_string)

The temporary variables are temp_string, temp_string_2, temp_index and temp_position. I know I should not be using names like these, but I didn't know what else to name them. This code just doesn't feel clean, and I don't know if it's just me, but it's not pleasant to look at. The code above can be found here. The dictionary.unicode_index is found here.

The best way to explain this is with an example:

Suppose we want to compress the sentence Hello, World!. We first look up the word Hello in the dictionary, where it has index 2420. Since the language is 0-indexed, we need to subtract 1 from this, leaving 2419. We now slice this into two pieces, 24 and 19. Then we find the corresponding characters in the index table, found here.

24 gives Ÿ
19 gives ™

The concatenation of the indices is done at this part:

temp_index += str(dictionary.unicode_index.index(current_command))

After the concatenation, we search up the index in the dictionary, which is done here:

if temp_string == "":
    temp_string += dictionary.dictionary[int(temp_index)].title()
else:
    temp_string += " " + dictionary.dictionary[int(temp_index)].title()

We append a comma, and we repeat the process for the next word, which is World. This has the index 119. Again, subtract 1, which gives us the following compressed word: Œ‰. This gives us the following code for my language:

”Ÿ™,Œ‰!

And decompresses to Hello, World!, which can be verified here.
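The split into two pieces and the reverse lookup can be sketched roughly as follows. This is only an illustration under assumptions: UNICODE_INDEX is a made-up 100-character stand-in for the real dictionary.unicode_index, and encode_index and decode_pair are hypothetical helper names, so the characters produced here are not the Ÿ™ from the example.

```python
# Hypothetical stand-in: 100 distinct placeholder characters, not the real table.
UNICODE_INDEX = [chr(0x100 + i) for i in range(100)]


def encode_index(zero_based_index):
    """Split a 0-based dictionary index (0..9999) into two table characters."""
    high, low = divmod(zero_based_index, 100)  # 2419 -> (24, 19)
    return UNICODE_INDEX[high] + UNICODE_INDEX[low]


def decode_pair(pair):
    """Recover the 0-based dictionary index from two table characters."""
    high = UNICODE_INDEX.index(pair[0])
    low = UNICODE_INDEX.index(pair[1])
    return high * 100 + low


encoded = encode_index(2420 - 1)  # Hello: index 2420, minus 1
print(len(encoded), decode_pair(encoded))  # 2 2419
```

In this sketch the two halves are recombined arithmetically (high * 100 + low), since joining their decimal strings would turn a pair such as (24, 5) into "245".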

I was wondering how I can make the code cleaner, because right now it looks like a complete mess.

No comment for the close vote from VTCer + question looks reasonable = Vote to leave open.
– Pimgd, Commented Feb 25, 2016 at 16:47

Dan Oberlam · Accepted Answer · 2016-02-25 18:49:01Z

You need better constants - "\u201d" is too magic - I would have no idea that it is a right double quotation mark just looking at that.
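As a small sketch of what that could look like: the names STRING_DELIMITER, STACK_PLACEHOLDER and is_string_delimiter are hypothetical, while the values are the two literals from the question's code.

```python
# Hypothetical names; the values are the two literals used in the question's code.
STRING_DELIMITER = "\u201d"   # RIGHT DOUBLE QUOTATION MARK, opens and closes a compressed string
STACK_PLACEHOLDER = "\u00ff"  # LATIN SMALL LETTER Y WITH DIAERESIS, replaced by a popped stack value


def is_string_delimiter(command):
    """Return True when the command starts or ends a compressed string."""
    return command == STRING_DELIMITER
```

A branch such as elif current_command == STRING_DELIMITER: then says what it is checking for without a lookup in a Unicode table.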

Don't use a bare except - you'll catch things you don't want to (for example, Ctrl+C). Also, narrow the try as much as you can - if it is this big, it will be very hard to tell what caused the error. You should strongly prefer catching specific errors over all errors - errors you aren't expecting are probably bugs, and you'll want to learn about them.
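A minimal sketch of a narrow try with a specific exception. It assumes that running off the end of commands is the failure the original except was meant to absorb, which is a guess on my part, and command_at is a hypothetical helper name.

```python
def command_at(commands, position):
    """Return commands[position], or None when position is past the end."""
    try:
        return commands[position]  # the only statement expected to fail
    except IndexError:
        return None


print(command_at("ab", 1))  # b
print(command_at("ab", 2))  # None
```

Anything else that goes wrong, for example a character missing from the index table, now surfaces as its own error instead of silently ending the loop.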

Instead of calling __contains__ use in.
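For illustration, with a toy list standing in for dictionary.unicode_index (the real table is not reproduced here, and position_of is a hypothetical name):

```python
unicode_index = ["a", "b", "c"]  # toy stand-in for dictionary.unicode_index

# `in` calls __contains__ under the hood, so the two checks are equivalent.
assert ("b" in unicode_index) == unicode_index.__contains__("b")

# Optional: a dict answers "is it there?" and "at which position?" in one lookup.
position_of = {char: i for i, char in enumerate(unicode_index)}
print(position_of.get("b"))  # 1
print(position_of.get("z"))  # None
```

The dict is a side suggestion rather than something the original code needs; it just saves scanning the table once for the membership test and again for .index.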

Don't use string concatenation the way you are - that's gonna be wasteful. Instead, collect the pieces in a list and use str.join.
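A minimal sketch of the pattern on its own, not the full decompression loop; join_words is a hypothetical helper.

```python
def join_words(words):
    """Title-case each word and join them with single spaces."""
    parts = []
    for word in words:
        parts.append(word.title())  # collect pieces instead of growing a string
    return " ".join(parts)          # build the final string once


print(join_words(["hello", "world"]))  # Hello World
```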

You also don't need to forward declare all of these variables.

Using a format string will help the debug section be clearer.
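For example, the debug message could be built like this (debug_line is a hypothetical helper name):

```python
def debug_line(pointer_position, command):
    """Build the debug message for one processed command."""
    return "{} with {}".format(pointer_position, hex(ord(command)))


print(debug_line(3, "\u201d"))  # 3 with 0x201d
```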

You probably also want to factor out the first if branch into its own function, though it seems a little similar to the outer branch and the loop. My guess is that the entire function this is in could be redesigned to be simpler and more modular; however, without seeing that I can't speak exactly to it.
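One possible shape for such a function, as a sketch only: lookup_word is a hypothetical name, the tables are passed in as parameters so the example is self-contained, and the toy data below is not the real dictionary.

```python
def lookup_word(first, second, unicode_index, words):
    """Turn a two-character code into the title-cased dictionary word."""
    # Same concatenate-then-parse step as the code in the question.
    index = int(str(unicode_index.index(first)) + str(unicode_index.index(second)))
    return words[index].title()


# Toy data: "b" is at position 1 and "c" at position 2, giving index 12.
toy_index = "abc"
toy_words = ["filler"] * 12 + ["hello"]
print(lookup_word("b", "c", toy_index, toy_words))  # Hello
```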

If you always increment pointer_position, put it in a finally block.
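A small stand-alone demonstration that a finally block also runs when the try body leaves the loop with break; count_until_stop is a hypothetical function, not part of the interpreter.

```python
def count_until_stop(commands, stop="\u201d"):
    """Walk commands and report how many were consumed, including the stop."""
    consumed = 0
    for command in commands:
        try:
            if command == stop:
                break
        finally:
            consumed += 1  # runs on normal completion and on break
    return consumed


print(count_until_stop("ab\u201dcd"))  # 3
```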

Overall, I'd recommend something more like this:

elif current_command == "\u201d":
    compressed_strings = []
    position = pointer_position
    while position < len(commands) - 1:
        position += 1

        try:
            current_command = commands[position]
            if current_command in dictionary.unicode_index:
                position += 1
                pointer_position += 1
                next_command = commands[position]

                f = lambda x: str(dictionary.unicode_index.index(x))
                index = f(current_command) + f(next_command)

                # Look the word up with the index just built (temp_index no longer exists here).
                next_string = dictionary.dictionary[int(index)].title()

                compressed_strings.append(("" if not compressed_strings else " ") + next_string)
            elif current_command == "\u201d":
                break
            elif current_command == "\u00ff":
                compressed_strings.append(str(pop_stack(1)))
            else:
                compressed_strings.append(current_command)
        except Exception:
            break
        finally:
            pointer_position += 1
        if debug:
            print("{} with {}".format(pointer_position, hex(ord(current_command))))

    stack.append(''.join(compressed_strings))

That looks a lot better than my code. Thanks for the tips :).
– Adnan, Commented Feb 26, 2016 at 0:16
