
I have strings in the format "1-3 6:10-11 7-9" and from them I want to create number sets as follows {1,2,3,6,10,11,7,8,9}.

For creating the set from the range of numbers, I have the following code:

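def create_set(src):
    """Return the set of ints described by 'a-b' (inclusive range) or 'a'."""
    src = src.strip()
    if not src:
        return set()  # empty input gives an empty set instead of None
    pos = src.find('-')
    if pos == -1:
        return {int(src)}  # only one number; return a set, not a list
    first = int(src[:pos])
    last = int(src[pos+1:])
    return set(range(first, last + 1))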

But I cannot figure out how to correctly treat the ':' when it appears in the string. Can someone help me?

Thanks in advance!

EDIT: By the way, is there a more compact way of parsing such strings, perhaps using regular expressions?

  • I would be tempted to parse it with a regular expression. I am no expert, but that is how I would do it, since the 'syntax' seems to be regular. Commented Aug 20, 2016 at 22:02
  • @xnx my thoughts exactly Commented Aug 20, 2016 at 22:03
  • Why does the 6 have a colon? Commented Aug 20, 2016 at 22:04
  • The colon denotes that the number before it should be added as a single element in the set. And I tried the solution suggested by xnx, but it does not work because the code as is does not recognize a string like '1-2-3' as a valid range. Commented Aug 20, 2016 at 22:08
  • I meant more like "why can't you just have ranges or single numbers?", then split on a space, and handle the ranges appropriately, else just add the single number. Commented Aug 20, 2016 at 22:13
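A minimal sketch of the approach suggested in these comments, assuming ':' can simply be treated as one more separator (so '6:' reads as the single number 6). The helper name parse_number_set is hypothetical.

```python
import re


def parse_number_set(text):
    """Expand tokens like '1-3' (inclusive range) or '6' into one set of ints."""
    result = set()
    # Treat whitespace and ':' alike as separators; skip empty tokens.
    for token in re.split(r"[\s:]+", text.strip()):
        if not token:
            continue
        if "-" in token:
            # 'a-b' becomes every integer from a to b inclusive
            first, last = token.split("-", 1)
            result.update(range(int(first), int(last) + 1))
        else:
            # a lone number is added as a single element
            result.add(int(token))
    return result


print(parse_number_set("1-3 6:10-11 7-9"))
```

The idea is that once ':' and spaces are both separators, every token is either a range or a single number, which is exactly the distinction create_set already makes.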

2 Answers


Something like this might work for you:

s = '1-3 6:10-11 7-9'
s = s.replace(':', ' ')
lset = set()
fs = s.split()
for f in fs:
    r = f.split('-')
    if len(r)==1:
        # add a single number
        lset.add(int(r[0]))
    else:
        # add a range of numbers (inclusive of the endpoints)
        lset |= set(range(int(r[0]), int(r[1])+1))
print(lset)

1 Comment

This answer is fine, but see below for an alternative, perhaps simpler, option.

EDIT: By the way, is there a more compact way of parsing such strings, perhaps using regular expressions?

Perhaps a cleaner (and slightly more efficient) way:

import re
import itertools

s = '1-3 6:10-11 7-9'
# Capture 'a-b' as ('a', 'b') and 'a:' as ('a', '')
allGroups = re.findall(r"(\d+)(?:-(\d+)|:)", s)
expanded = [range(int(x), (int(x) if y == '' else int(y)) + 1) for x, y in allGroups]
print({x for x in itertools.chain.from_iterable(expanded)})

Explanations:

Match all strings like 'a-b' or 'a:' and return a list of (a, b) and (a, '') pairs respectively:

allGroups = re.findall(r"(\d+)(?:-(\d+)|:)", s)

This produces:

[('1', '3'), ('6', ''), ('10', '11'), ('7', '9')]
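One caveat: this pattern only matches a number that is followed by '-' or ':', so a bare single number (for instance the 6 in '1-3 6 10-11 7-9') would be skipped silently. A sketch that makes the range part optional instead, assuming ':' carries no meaning beyond marking a single number:

```python
import re

s = '1-3 6 10-11 7-9'

# Original pattern: requires '-b' or ':' after each number, so the bare 6 is lost.
old_pairs = re.findall(r"(\d+)(?:-(\d+)|:)", s)
print(old_pairs)  # [('1', '3'), ('10', '11'), ('7', '9')]

# Optional '-b' part: a bare number (or one followed by ':') yields ('a', '').
new_pairs = re.findall(r"(\d+)(?:-(\d+))?", s)
print(new_pairs)  # [('1', '3'), ('6', ''), ('10', '11'), ('7', '9')]
```

Because re.findall returns '' for a group that did not take part in the match, the rest of the answer works unchanged with the second pattern, and it still handles '6:' since the ':' is simply skipped.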

Use a list comprehension to expand each (x, y) pair into range(x, y + 1), i.e. every number from x to y inclusive; the (x, '') case is treated as range(x, x + 1), which contains just x:

expanded = [range(int(x), (int(x) if y == '' else int(y)) + 1) for x, y in allGroups]

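This produces the following (under Python 3 each element is a range object; the numbers it covers are shown here as lists):

[[1, 2, 3], [6], [10, 11], [7, 8, 9]]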

Use itertools.chain.from_iterable() to flatten the per-pair ranges into a single iterable, which a set comprehension then collects into the final set:

print({x for x in itertools.chain.from_iterable(expanded)})

This produces (Python 3 output):

{1, 2, 3, 6, 7, 8, 9, 10, 11}
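The flattening step can also be done without itertools by passing all the ranges to set().union(), which accepts any number of iterables. A sketch using the same pattern and input:

```python
import re

s = '1-3 6:10-11 7-9'
allGroups = re.findall(r"(\d+)(?:-(\d+)|:)", s)
# int(y or x): an empty y (the 'a:' case) falls back to x, giving a one-number range.
# set().union(*iterables) merges every range into one new set.
numbers = set().union(*(range(int(x), int(y or x) + 1) for x, y in allGroups))
print(numbers)
```

This trades the explicit chain for a single call; which reads better is largely a matter of taste.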

1 Comment

Thanks, FujiApple; this solution also has the advantage of returning a sorted list of numbers.
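Strictly speaking, a set has no defined order; the output above only happens to look sorted because CPython hashes small non-negative integers to their own value, so they tend to sit in the table in numeric order. If a sorted sequence is actually required, it is safer to ask for one explicitly:

```python
numbers = {1, 2, 3, 6, 7, 8, 9, 10, 11}
# sorted() accepts any iterable and always returns a new list in ascending order.
ordered = sorted(numbers)
print(ordered)  # [1, 2, 3, 6, 7, 8, 9, 10, 11]
```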
