Parsing strings in python

Question

So my problem is this, I have a file that looks like this:

[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1

This would of course translate to

' This is an example file!'

I am looking for a way to parse the original content into the end content, so that a [BACKSPACE] will delete the last character(spaces included) and multiple backspaces will delete multiple characters. The [SHIFT] doesnt really matter as much to me. Thanks for all the help!

Are [BACKSPACE] and [SHIFT] the only markups that you need to worry about? — inspectorG4dget
– inspectorG4dget, Commented Feb 3, 2011 at 3:12

Joe Kington · Accepted Answer · 2011-02-03 03:47:19Z

Here's one way, but it feels hackish. There's probably a better way.

def process_backspaces(input, token='[BACKSPACE]'):
    """Delete character before an occurence of "token" in a string."""
    output = ''
    for item in (input+' ').split(token):
        output += item
        output = output[:-1]
    return output

def process_shifts(input, token='[SHIFT]'):
    """Replace characters after an occurence of "token" with their uppecase 
    equivalent. (Doesn't turn "1" into "!" or "2" into "@", however!)."""
    output = ''
    for item in (' '+input).split(token):
        output += item[0].upper() + item[1:]
    return output

test_string = '[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1'
print process_backspaces(process_shifts(test_string))

Inaimathi · Accepted Answer · 2011-02-03 03:48:48Z

1

If you don't care about the shifts, just strip them, load

(defun apply-bspace ()
  (interactive)
  (let ((result (search-forward "[BACKSPACE]")))
    (backward-delete-char 12)
    (when result (apply-bspace))))

and hit M-x apply-bspace while viewing your file. It's Elisp, not python, but it fits your initial requirement of "something I can download for free to a PC".

Edit: Shift is trickier if you want to apply it to numbers too (so that [SHIFT]2 => @, [SHIFT]3 => #, etc). The naive way that works on letters is

(defun apply-shift ()
  (interactive)
  (let ((result (search-forward "[SHIFT]")))
    (backward-delete-char 7)
    (upcase-region (point) (+ 1 (point)))
    (when result (apply-shift))))

edited Feb 3, 2011 at 3:48

answered Feb 3, 2011 at 3:31

Inaimathi

14.1k11 gold badges54 silver badges96 bronze badges

2 Comments

Joe Kington Over a year ago

+1 for an Elisp answer! It's (not too suprisingly, I guess) quite good at this sort of thing... I'm a vim person, personally, but things like this sometimes pull me towards emacs.

Inaimathi Over a year ago

@Joe Kington - Hehe. To be truthful, this is the sort of thing I'd handle with a keyboard macro and maybe an alist unless there were multiple, large files that needed parsing. It's just that a function is easier to share and explain.

inspectorG4dget · Accepted Answer · 2011-02-03 03:58:12Z

This does exactly what you want:

def shift(s):
    LOWER = '`1234567890-=[];\'\,./'
    UPPER = '~!@#$%^&*()_+{}:"|<>?'

    if s.isalpha():
        return s.upper()
    else:
        return UPPER[LOWER.index(s)]

def parse(input):
    input = input.split("[BACKSPACE]")
    answer = ''
    i = 0
    while i<len(input):
        s = input[i]
        if not s:
            pass
        elif i+1<len(input) and not input[i+1]:
            s = s[:-1]
        else:
            answer += s
            i += 1
            continue
        answer += s[:-1]
        i += 1

    return ''.join(shift(i[0])+i[1:] for i in answer.split("[SHIFT]") if i)

>>> print parse("[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1")
>>> This is an example file!

The debug is complete and the result is exactly what you want

gahooa · Accepted Answer · 2011-02-03 03:11:13Z

0

It seems that you could use a regular expression to search for (something)[BACKSPACE] and replace it with nothing...

re.sub('.?\[BACKSPACE\]', '', YourString.replace('[SHIFT]', ''))

Not sure what you meant by "multiple spaces delete multiple characters".

answered Feb 3, 2011 at 3:11

gahooa

138k14 gold badges101 silver badges104 bronze badges

4 Comments

payne Over a year ago

-1 How will this work for "blah[BACKSPACE][BACKSPACE][BACKSPACE]arf"?

Matthew Downey Over a year ago

But it needs to delete one space BEFORE the backspace as well as the '[BACKSPACE]' itslef

payne Over a year ago

That's my point -- gahooa's solution won't work for my blah-barf example.

Matthew Downey Over a year ago

Yeah, i just saw what you were saying, so far the only way I can think of to do it would combine python with autoit or another manual macro/automation service, but the results would be tedious at best, and possibly not 100% functioning

Gonzalo Larralde · Accepted Answer · 2011-02-03 03:31:41Z

You need to read the input, extract the tokens, recognize them, and give a meaning to them.

This is how I would do it:

# -*- coding: utf-8 -*-

import re

upper_value = {
    1: '!', 2:'"',
}

tokenizer = re.compile(r'(\[.*?\]|.)')
origin = "[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1"
result = ""

shift = False

for token in tokenizer.findall(origin):
    if not token.startswith("["):
        if(shift):
            shift = False
            try:
                token = upper_value[int(token)]
            except ValueError:
                token = token.upper()

        result = result + token
    else:
        if(token == "[SHIFT]"):
            shift = True
        elif(token == "[BACKSPACE]"):
            result = result[0:-1]

It's not the fastest, neither the elegant solution, but I think it's a good start.

Hope it helps :-)

Collectives™ on Stack Overflow

Parsing strings in python

5 Answers 5

Comments

2 Comments

2 Comments

4 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

Comments

2 Comments

2 Comments

4 Comments

Comments

Related