Python URL matching (Regex)

Question

I've been trying to match the URL below for a couple of hours and can't seem to figure it out, and I'm quite sure it's not that difficult:

The URL can be this:

/course/lesson-one/

or it can also be:

/course/lesson-one/chapter-one/

What I have is the following which matches the second URL:

/course/([a-zA-Z]+[-a-zA-Z]*)/([a-zA-Z]+[-a-zA-Z]*)/
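
A quick check with Python's standard `re` module confirms that this pattern accepts only the longer URL. This is a minimal sketch; `re.fullmatch` is used so that the whole path has to match, and the variable names are illustrative:

```python
import re

# The original pattern: both path segments are mandatory.
pattern = r"/course/([a-zA-Z]+[-a-zA-Z]*)/([a-zA-Z]+[-a-zA-Z]*)/"

long_url = "/course/lesson-one/chapter-one/"
short_url = "/course/lesson-one/"

long_match = re.fullmatch(pattern, long_url)
short_match = re.fullmatch(pattern, short_url)

print(long_match.groups())  # ('lesson-one', 'chapter-one')
print(short_match)          # None: there is no second segment to match
```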

What I want is for the second part to be optional, but I can't figure it out. The closest I got was the following:

/course/([a-zA-Z]+[-a-zA-Z]*)/*([a-zA-Z]+[-a-zA-Z]*)/

But for some reason the above leaves out the last letter of the word. For example, if the URL is

/course/computers/

I end up with the string 'computer'.

kennytm · Accepted Answer · 2013-05-08 20:33:19Z

1

You use ? if you need optional parts.

/course/([a-zA-Z][-a-zA-Z]*)/([a-zA-Z][-a-zA-Z]*/)?
#                                                 ^

(Note that [a-zA-Z]+[-a-zA-Z]* is equivalent to [a-zA-Z][-a-zA-Z]*.)
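
The equivalence in that note can be spot-checked with a short Python sketch. The sample strings are illustrative, not from the question. The sketch also shows that with this first pattern, the optional group 2 keeps the trailing /:

```python
import re

# Two spellings of "one letter, then any mix of letters and hyphens".
original = re.compile(r"[a-zA-Z]+[-a-zA-Z]*")
simplified = re.compile(r"[a-zA-Z][-a-zA-Z]*")

samples = ["computers", "lesson-one", "a", "a-", "-lesson", "", "chapter1"]
for s in samples:
    # Both patterns accept or reject exactly the same whole strings.
    assert bool(original.fullmatch(s)) == bool(simplified.fullmatch(s))

# "?" makes the whole second segment, including its slash, optional.
url_pattern = re.compile(r"/course/([a-zA-Z][-a-zA-Z]*)/([a-zA-Z][-a-zA-Z]*/)?")
long_groups = url_pattern.fullmatch("/course/lesson-one/chapter-one/").groups()
short_groups = url_pattern.fullmatch("/course/computers/").groups()
print(long_groups)   # ('lesson-one', 'chapter-one/')
print(short_groups)  # ('computers', None)
```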

Use an additional non-capturing group (?:…) to keep the / out of the captured text, while still making several elements optional at once:

/course/([a-zA-Z][-a-zA-Z]*)/(?:([a-zA-Z][-a-zA-Z]*)/)?
#                            ~~~                     ~^
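
A minimal Python sketch of this non-capturing version follows. It assumes the whole path should be validated, so it uses `re.fullmatch` (Python 3.4+); plain `re.match` only anchors at the start of the string and would also accept trailing text. The helper `parse_course_url` is hypothetical, for illustration only:

```python
import re

# (?:...) groups "segment + slash" so that "?" applies to both,
# while only the inner (...) is captured.
COURSE_URL = re.compile(r"/course/([a-zA-Z][-a-zA-Z]*)/(?:([a-zA-Z][-a-zA-Z]*)/)?")


def parse_course_url(path):
    """Hypothetical helper: return (lesson, chapter), or None if no match.

    chapter is None when the URL has only one segment.
    """
    m = COURSE_URL.fullmatch(path)
    return m.groups() if m else None


print(parse_course_url("/course/lesson-one/chapter-one/"))  # ('lesson-one', 'chapter-one')
print(parse_course_url("/course/computers/"))               # ('computers', None)
print(parse_course_url("/course/computers/extra/stuff/"))   # None
```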

Your 2nd regex swallows the last character, because:

  /course/([a-zA-Z]+[-a-zA-Z]*)/*([a-zA-Z]+[-a-zA-Z]*)/
          ^^^^^^^^^^^^^^^^^^^^^  ~~~~~~~~~~~~~~~~~~~~~
        this matches 'computer'  and this matches the 's'.

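The second group in this regex must match one or more letters because of the +, so the 's' has to go there: /* can match zero slashes, so the regex engine backtracks and shortens the first group by one character to let the second group match.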
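
The backtracking can be reproduced directly with Python's `re` module. This small sketch uses only the asker's second pattern and the URL from the question:

```python
import re

# The second attempt: "/*" allows zero slashes between the two groups.
buggy = r"/course/([a-zA-Z]+[-a-zA-Z]*)/*([a-zA-Z]+[-a-zA-Z]*)/"

m = re.match(buggy, "/course/computers/")
# Group 2 needs at least one letter before the final "/", and "/*" may
# match nothing, so the engine backtracks and hands the 's' to group 2.
print(m.groups())  # ('computer', 's')
```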

edited May 8, 2013 at 20:33

answered May 8, 2013 at 20:27

kennytm

2 Comments

Tkingovr Over a year ago

OK, thank you, it's the question mark that I was missing. I just glanced at the docs and it's a one-liner, which explains why I overlooked it!

Tkingovr Over a year ago

The second regex you included above is exactly what I needed. Thank you also for explaining it so well, +100. Thanks to everyone who contributed below.

Nolen Royalty · Accepted Answer · 2013-05-08 20:28:08Z

1

Use a "?" after something to make it optional.

>>> import re
>>> r = r"/course/([a-zA-Z]+[-a-zA-Z]*)(/[a-zA-Z]+[-a-zA-Z]*)?"
>>> s = "/course/lesson-one/chapter-one/"
>>> re.match(r, s).groups()
('lesson-one', '/chapter-one')
>>> s = "/course/computers/"
>>> re.match(r, s).groups()
('computers', None)

answered May 8, 2013 at 20:28

Nolen Royalty

Simeon Visser · Accepted Answer · 2013-05-08 20:29:06Z

You can use the following regex:

'/course/([a-zA-Z]+[-a-zA-Z]*)(/([a-zA-Z]+[-a-zA-Z]*)/)?'

This makes the second part optional while still capturing each part of the URL.

Note that the second part of the pattern has two groups: one that captures /chapter-one/ and one that captures chapter-one.

>>> import re
>>> re.match('/course/([a-zA-Z]+[-a-zA-Z]*)(/([a-zA-Z]+[-a-zA-Z]*)/)?', '/course/lesson-one/chapter-one/').groups()
('lesson-one', '/chapter-one/', 'chapter-one')

Similarly:

>>> re.match('/course/([a-zA-Z]+[-a-zA-Z]*)(/([a-zA-Z]+[-a-zA-Z]*)/)?', '/course/lesson-one/').groups()
('lesson-one', None, None)
