11

How many substrings can you make out of a string like abcd?

How can I get all of its substrings:

['a', 'b', 'c', 'd', 'ab', 'bc', 'cd', 'abc', 'bcd', 'abcd']
2
  • 3
    You are asking two different questions: "How many combinations can you make…?" and, "How can I get it to look like…". The answers to those two questions are very different. Commented Oct 17, 2012 at 23:39
  • @MarceloCantos I don't see how that's true. One is just the length of the other. You can make sum(1...n), i.e. n*n(-1)/2 substrings of a string of length n. Commented Dec 17, 2016 at 22:09

5 Answers 5

12

Try this:

def consecutive_groups(iterable):
    s = tuple(iterable)
    for size in range(1, len(s)+1):
        for index in range(len(s)+1-size):
            yield iterable[index:index+size]

>>> print list(consecutive_groups('abcd'))
['a', 'b', 'c', 'd', 'ab', 'bc', 'cd', 'abc', 'bcd', 'abcd']

And the number of combinations is simply equal to the sum from 1 to the length of the string, which is equivalent to n * (n + 1) / 2.

By the way, if you want to avoid duplicates, you can simply use a locally-defined set in the generator function, like so:

def consecutive_groups(iterable):
    s = tuple(iterable)
    seen = set()
    for size in range(1, len(s)+1):
        for index in range(len(s)+1-size):
            slc = iterable[index:index+size]
            if slc not in seen:
                seen.add(slc)
                yield slc

That code is a little more unwieldy and could probably be optimized for indentation, but it will do for a proof of concept.

Sign up to request clarification or add additional context in comments.

6 Comments

I'm not your downvoter, but it's not quite right. Try map(partial(reduce, operator.add), powerset('abcd')).
generators make hulk happy! +1 O.o
I just noticed this is a fine approach as long as the OP doesn't care for duplicated substrings. (e.g. what happens with 'abcdab')
either n or n + 1 are even, so a division by 2 will always yield an integer result. What do you mean with the comment about the true division?
@hochl Nothing, brain lapse. (I probably was mistakenly thinking that the division would be evaluated first, before the multiplication.) Editing.
|
11

Would this do?

import itertools
def substrings(x):
    for i, j in itertools.combinations(xrange(len(x)+1), 2):
        yield x[i:j]

or as generator expression:

(x[i:j] for i, j in itertools.combinations(xrange(len(x)+1), 2))

The expanded result for your example looks like this:

['a', 'ab', 'abc', 'abcd', 'b', 'bc', 'bcd', 'c', 'cd', 'd']

To order by length, use sort key=len.

Comments

2

This is what you want:

In [260]: S = 'abcd'

In [261]: list(itertools.chain.from_iterable([list(itertools.combinations(S,i)) for i in range(1,len(S))]))
Out[261]: 
[('a',),
 ('b',),
 ('c',),
 ('d',),
 ('a', 'b'),
 ('a', 'c'),
 ('a', 'd'),
 ('b', 'c'),
 ('b', 'd'),
 ('c', 'd'),
 ('a', 'b', 'c'),
 ('a', 'b', 'd'),
 ('a', 'c', 'd'),
 ('b', 'c', 'd')]

Or if you really want them all as strings, you could do:

In [262]: combos  = list(itertools.chain.from_iterable([list(itertools.combinations(S,i)) for i in range(1,len(S))]))

In [263]: [''.join(c) for c in combos]
Out[263]: 
['a',
 'b',
 'c',
 'd',
 'ab',
 'ac',
 'ad',
 'bc',
 'bd',
 'cd',
 'abc',
 'abd',
 'acd',
 'bcd']

EDIT To get only substrings of S:

In [270]: list(itertools.chain.from_iterable([[S[i:i+k] for i in range(len(S)-k)] for k in range(1,len(S)+1)])) + [S]
Out[270]: ['a', 'b', 'c', 'ab', 'bc', 'abc', 'abcd']

4 Comments

I think you want itertools.combinations(S, i+1), to get the unchanged string too. Also the list and square brackets around your inner generator are unnecessary. chain.from_iterable is perfectly happy taking generators from a generator.
is the empty string part of a solution? If yes you are missing it too. I think ac is not a valid solution either since b is missing in between.
for the second one (EDIT To get only substrings of S:) how would I find only substrings of length x?
@Mathime: You should be able to filter them after the fact: answer = [s for s in all_substrings if len(s) == x]
2

There are two questions there.

The first, How many substrings can you make out of a string like “abcd”? is a combinations like this:

import itertools
s='abcd'
com=[list(itertools.combinations(s,x)) for x in range(1,len(s)+1)]

print [''.join(e) for e in sum(com,[])]

prints:

['a', 'b', 'c', 'd', 'ab', 'ac', 'ad', 'bc', 'bd', 'cd', 'abc', 'abd', 'acd', 'bcd', 'abcd']

The second question is how to replicate your example (which is not a 'combination'). You can do that with this code:

>>> [s[i:i+j] for j in range(1,len(s)+1) for i in range(len(s)-j+1)]
['a', 'b', 'c', 'd', 'ab', 'bc', 'cd', 'abc', 'bcd', 'abcd']

6 Comments

I did not downvote but most probably because ac is not part of the solution.
actually ac still is not a substring because a and c are not neighbours to each other.
@hochl: Please look at the edit. It produces his example exactly. The problem is he is asking two questions.
if ac is a substring then quite possibly ca is a substring too since a and c are part of the string. I think he means a substring consists of consecutive ranges of characters.
@hochl: Please look at the second list. It addresses your comments. ac is not part of ['a', 'b', 'c', 'd', 'ab', 'bc', 'cd', 'abc', 'bcd', 'abcd'] It replicates the example exactly.
|
2

I think this works too and while is not the most efficient, it has the attractive of using less complex features.

S = "abcd"
substrings = [S[i:j] for i in range(len(S)) for j in range(i+1,len(S)+1)]
substrings.sort(key=len)

Note however that this approach does not remove identical substrings that might appear. For example if the original substring was "abcdab", a, b and ab would appear twice.

1 Comment

In order to not include duplicate substrings, change the list comprehension [...] to a set comprehension {...}.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.