Parsing a lisp file with Python

Question

I have the following lisp file, which is from the UCI machine learning database. I would like to convert it into a flat text file using python. A typical line looks like this:

(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))

I would like to parse this into a text file like:

time pitch duration keysig timesig fermata
8    67    4        1      12      0
12   67    8        1      12      0

Is there a python module to intelligently parse this? This is my first time seeing lisp.

What's the learning curve involved in learning enough lisp to do that? – qua, Commented Dec 27, 2012 at 18:06

Community · Accepted Answer · 2017-05-23 11:46:27Z

As shown in this answer, pyparsing appears to be the right tool for that:

inputdata = '(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))'

from pyparsing import OneOrMore, nestedExpr

data = OneOrMore(nestedExpr()).parseString(inputdata)
print(data)

# [['1', [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']], [['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]]]

For completeness' sake, this is how to format the results (using texttable):

from texttable import Texttable

tab = Texttable()
for row in data.asList()[0][1:]:
    row = dict(row)
    tab.header(row.keys())
    tab.add_row(row.values())
print(tab.draw())  # column order follows dict order; on Python 3.7+ it is st, pitch, dur, keysig, timesig, fermata

+---------+--------+----+-------+-----+---------+
| timesig | keysig | st | pitch | dur | fermata |
+=========+========+====+=======+=====+=========+
| 12      | 1      | 8  | 67    | 4   | 0       |
+---------+--------+----+-------+-----+---------+
| 12      | 1      | 12 | 67    | 8   | 0       |
+---------+--------+----+-------+-----+---------+
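If texttable is not available, the standard library can produce the flat file the question asks for. This is a minimal sketch, assuming the parse result has already been converted to plain nested lists (which is what `data.asList()[0]` returns); the helper name `write_flat` and the tab delimiter are choices made here for illustration, not part of the original answer:

```python
import csv
import io

def write_flat(chorale, out):
    """Write one parsed chorale as tab-separated rows to the file-like `out`.

    `chorale` is a nested list of the form [number, note, note, ...],
    where each note is a list of [key, value] pairs.
    """
    notes = [dict(pairs) for pairs in chorale[1:]]  # skip the chorale number
    if not notes:
        return
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = list(notes[0])  # keys in the order they appear in the input
    writer.writerow(header)
    for note in notes:
        writer.writerow([note[key] for key in header])

# Same structure as the pyparsing result shown above.
parsed = ['1',
          [['st', '8'], ['pitch', '67'], ['dur', '4'],
           ['keysig', '1'], ['timesig', '12'], ['fermata', '0']],
          [['st', '12'], ['pitch', '67'], ['dur', '8'],
           ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]]

buf = io.StringIO()
write_flat(parsed, buf)
flat_text = buf.getvalue()
print(flat_text)
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.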

To convert that data back to the lisp notation:

def lisp(x):
    return '(%s)' % ' '.join(lisp(y) for y in x) if isinstance(x, list) else x

d = lisp(data.asList()[0])  # asList() turns the ParseResults into plain nested lists

This is definitely the answer, since the OP asked for "a python module to intelligently parse this".

6502 · Accepted Answer · 2012-12-28 11:21:48Z

If you know that the data is correct and the format is uniform (it seems so at first sight), and if you need just this data and don't need to solve the general problem... then why not just replace every non-numeric character with a space and then use split?

import re
with open("chorales.lisp") as f:
    data = f.read().split("\n")
# Replace every run of non-numeric characters with a single space
data = [re.sub("[^-0-9]+", " ", x) for x in data]
for L in data:
    L = list(map(int, L.split()))  # list() so len() and slicing work on Python 3
    i = 1  # first element is chorale number
    while i < len(L):
        st, pitch, dur, keysig, timesig, fermata = L[i:i+6]
        i += 6
        # ... your processing goes here ...
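The same idea can be wrapped in a function that works on a single line, which makes it easier to try out without the data file. This is only a sketch under the answer's own assumption that every note has exactly six integer fields in a fixed order; the name `notes_from_line` and the length check are additions made here for illustration:

```python
import re

def notes_from_line(line):
    """Return (chorale_number, notes) for one line of the file.

    Assumes every note has exactly six integer fields in the order
    st, pitch, dur, keysig, timesig, fermata.
    """
    # Strip everything that is not a digit or a minus sign, then split.
    numbers = [int(tok) for tok in re.sub(r"[^-0-9]+", " ", line).split()]
    if not numbers:
        return None, []  # blank line
    chorale, fields = numbers[0], numbers[1:]
    if len(fields) % 6:
        raise ValueError("expected six numbers per note")
    notes = [tuple(fields[i:i + 6]) for i in range(0, len(fields), 6)]
    return chorale, notes

sample = ('(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))'
          '((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))')
chorale, notes = notes_from_line(sample)
print(chorale, notes)
```

Because the field names are discarded, this breaks silently if a file ever lists the fields in a different order; the length check only catches missing or extra numbers.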

Roland Smith · Accepted Answer · 2012-12-27 18:24:31Z

Separate it into pairs with a regular expression:

In [1]: import re

In [2]: txt = '(((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))'

In [3]: [p.split() for p in re.findall(r'\w+\s+\d+', txt)]
Out[3]: [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0'], ['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]

Then make it into a dictionary:

dct = {}
for p in data:  # data is the list of [key, value] pairs from Out[3]
    if p[0] not in dct:
        dct[p[0]] = [p[1]]
    else:
        dct[p[0]].append(p[1])

The result:

In [10]: dct
Out[10]: {'timesig': ['12', '12'], 'keysig': ['1', '1'], 'st': ['8', '12'], 'pitch': ['67', '67'], 'dur': ['4', '8'], 'fermata': ['0', '0']}

Printing:

print('time pitch duration keysig timesig fermata')
for t in range(len(dct['st'])):
    print(dct['st'][t], dct['pitch'][t], dct['dur'][t],
          dct['keysig'][t], dct['timesig'][t], dct['fermata'][t])

Proper formatting is left as an exercise for the reader...
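One possible take on that exercise, as a sketch rather than the answer's own solution: pad each column to the width of its longest cell. The `columns` mapping from keys to the headings used in the question, and the helper name `format_table`, are assumptions made here for illustration:

```python
dct = {'st': ['8', '12'], 'pitch': ['67', '67'], 'dur': ['4', '8'],
       'keysig': ['1', '1'], 'timesig': ['12', '12'], 'fermata': ['0', '0']}

# Column keys paired with the headings used in the question.
columns = [('st', 'time'), ('pitch', 'pitch'), ('dur', 'duration'),
           ('keysig', 'keysig'), ('timesig', 'timesig'), ('fermata', 'fermata')]

def format_table(dct, columns):
    """Return the header and rows as left-aligned, space-padded lines."""
    # Each column is as wide as its longest cell, heading included.
    widths = [max([len(title)] + [len(v) for v in dct[key]])
              for key, title in columns]
    rows = [[title for _, title in columns]]
    n_rows = len(dct[columns[0][0]])
    rows += [[dct[key][i] for key, _ in columns] for i in range(n_rows)]
    return [' '.join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
            for row in rows]

table = format_table(dct, columns)
print('\n'.join(table))
```

For this sample the output lines up with the layout requested in the question.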

ealfonso · Accepted Answer · 2022-05-15 23:50:47Z

Since the data is already in Lisp, use Lisp itself to manipulate the data into a well-known format like CSV or TSV:

    (let ((input '(1 ((ST 8) (PITCH 67) (DUR 4) (KEYSIG 1) (TIMESIG 12) (FERMATA 0))
                    ((ST 12) (PITCH 67) (DUR 8) (KEYSIG 1) (TIMESIG 12) (FERMATA 0)))))
               (let*
                   ((headers (mapcar #'first (cadr input)))
                    (rows (cdr input))
                    (row-data (mapcar (lambda (row) (mapcar #'second row)) rows))
                    (csv (cons headers row-data)))
                 (format t "~{~{~A~^,~}~^~%~}" csv)))

ST,PITCH,DUR,KEYSIG,TIMESIG,FERMATA
8,67,4,1,12,0
12,67,8,1,12,0

personal_cloud · Accepted Answer · 2025-10-09 15:57:34Z

Don't use pyparsing; it's horribly slow. Roland's suggestion to use re makes a lot more sense, though his answer works only on specific forms of lisp input. Given the question's title, I imagine that a lot of folks come here looking to parse more general structures (in my case I was trying to parse a KiCad file). The following function is fully general and 50X faster than pyparsing for moderately sized input:

import re
TOK = re.compile('[()]|[^()" \t\n]+|("([^\\\\\"]|\\\\.)*")')

def Parse(inp):
    p     = 0
    stack = []
    while True:
        m = TOK.search(inp, p)
        g = m.group(0)
        p = m.end()
        #print(len(stack), g)
        if g == '(':
            stack.append([])
        elif g == ')':
            e = stack.pop()
            if not stack:
                return e
            stack[-1].append(e)
        else:
            stack[-1].append(g)

p = Parse('(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))')
from pprint import pprint
pprint(p)

The result is a raw parse tree:


['1',
 [['st', '8'],
  ['pitch', '67'],
  ['dur', '4'],
  ['keysig', '1'],
  ['timesig', '12'],
  ['fermata', '0']],
 [['st', '12'],
  ['pitch', '67'],
  ['dur', '8'],
  ['keysig', '1'],
  ['timesig', '12'],
  ['fermata', '0']]]

Then you can put each element in a dictionary and deal with it according to the application. We could probably improve the error handling a bit (it should stop if the space between tokens is not whitespace; it's probably an unterminated string literal).
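The error handling mentioned above could look roughly like this. It is a sketch, not a drop-in replacement: the name `parse_checked` and the choice of `ValueError` are assumptions made here, and the token pattern is a raw-string variant of the one above that treats any whitespace (`\s`) as a separator. Like the original, it returns after the first complete expression and ignores anything that follows.

```python
import re

# Parenthesis, double-quoted string with backslash escapes, or bare atom.
TOKEN = re.compile(r'[()]|"(?:[^\\"]|\\.)*"|[^()"\s]+')

def parse_checked(text):
    """Parse one parenthesised expression into nested lists of strings.

    Raises ValueError on stray characters between tokens, on unbalanced
    parentheses, and on a token that appears outside any list.
    """
    pos = 0
    stack = []
    while True:
        match = TOKEN.search(text, pos)
        gap_end = match.start() if match else len(text)
        if text[pos:gap_end].strip():
            # The tokenizer skipped something, e.g. an unterminated string.
            raise ValueError("unexpected text at offset %d" % pos)
        if match is None:
            raise ValueError("unexpected end of input")
        token = match.group(0)
        pos = match.end()
        if token == '(':
            stack.append([])
        elif not stack:
            raise ValueError("token %r outside of any list" % token)
        elif token == ')':
            done = stack.pop()
            if not stack:
                return done
            stack[-1].append(done)
        else:
            stack[-1].append(token)

tree = parse_checked('(1 ((st 8) (pitch 67)) ((st 12) (pitch 67)))')
# Each note is a list of [key, value] pairs, so dict() converts it directly.
notes = [dict(note) for note in tree[1:]]
print(notes)
```

The values stay strings; convert them with `int()` where the application needs numbers.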

In the case of my 8,000 line KiCad file, here is a time comparison:

2.1 seconds to parse using pyparsing
0.04 seconds to parse using re

Always be skeptical when anyone advises that something is "the right tool" for a job, unless you get paid by the hour. (Definitely get paid by the hour if you're required to work with hyped-up software packages.)
