Python and Regex

Python and regular expressions play very nicely together; indeed, the re module ships with the Python Standard Library. I find the documentation more than a little dense, however, and it always takes me an age to wade through it and find what I want.

This post is intended to be a quick reference for typical use cases. It will probably be updated as I come across more "standard" jobs.

Finding All Occurrences of a Pattern

This finds all the capital letters in a string:

import re

string = 'Why hello fine Sir, don\'t You look Lovely today!'
pattern = re.compile('[A-Z]')  # any single uppercase ASCII letter
results = re.findall(pattern, string)

Printing results then gives:

['W', 'S', 'Y', 'L']
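
If you also need to know where each match sits, a minimal sketch along the following lines may help. It assumes Python 3, and the variable names text, letters and positions are mine, chosen for illustration. It uses the compiled pattern's own findall method, plus finditer, which yields match objects instead of plain strings.

```python
import re

text = "Why hello fine Sir, don't You look Lovely today!"
pattern = re.compile(r'[A-Z]')

# A compiled pattern has its own findall method; this is equivalent
# to calling re.findall(pattern, text).
letters = pattern.findall(text)

# finditer yields match objects, which carry the position of each hit
# as well as the matched text.
positions = [(m.group(0), m.start()) for m in pattern.finditer(text)]

print(letters)
print(positions)
```

Use findall when the matched text is all you need, and finditer when you want offsets or other details from the match object.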

Replacing All Occurrences of a Pattern

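This finds and replaces every doubled letter:

string = u'aa b c dd'
# (\w) captures one word character; \1 matches that same character again
pattern = re.compile(r'(\w)\1', re.UNICODE)
match = pattern.search(string)
while match:
    # replace every copy of the matched pair, then look for the next pair
    string = string.replace(match.group(0), 'HERE')
    match = pattern.search(string)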

The variable string then contains:

u'HERE b c HERE'

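Alternatively, you can use re.sub or re.subn, which replace every match in a single call (run this on the original string, not the one the loop above has already modified):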

substituted = re.sub(pattern, 'HERE', string)
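
To make the difference between re.sub and re.subn concrete, here is a small sketch, assuming Python 3. The names original, substituted_n, count and collapsed are illustrative only. It also shows that the replacement can refer back to a captured group.

```python
import re

original = 'aa b c dd'
pattern = re.compile(r'(\w)\1')

# sub returns the new string; every non-overlapping match is replaced.
substituted = pattern.sub('HERE', original)

# subn returns a (new_string, number_of_substitutions) tuple.
substituted_n, count = pattern.subn('HERE', original)

# The replacement may reference captured groups: r'\1' keeps a single
# copy of each doubled character.
collapsed = pattern.sub(r'\1', original)

print(substituted, count, collapsed)
```

Unlike the while loop, neither function rescans text it has already replaced, so a replacement that itself contains a doubled letter cannot cause an endless loop.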

A couple of things to note here:

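  • When using unicode strings (generally a good idea) in Python 2, you need to remember the re.UNICODE flag. In Python 3, str patterns are Unicode-aware by default, so the flag is redundant there.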
  • It's also a good idea to use raw strings when compiling the expression (so that Python doesn't interpret backslashes as escape characters before the regex engine sees them).
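
Both points can be checked with a short sketch. It assumes Python 3, where re.ASCII is the opt-out from Unicode matching; the sample strings and variable names are made up for illustration.

```python
import re

text = 'a cat sat'

# Without the r prefix, Python turns '\b' into a backspace character
# before the regex engine ever sees it.
plain = re.findall('\bcat\b', text)   # searches for backspace + 'cat' + backspace
raw = re.findall(r'\bcat\b', text)    # \b reaches the engine as a word boundary

# In Python 3, \w matches Unicode word characters by default;
# re.ASCII restricts it to [a-zA-Z0-9_].
accented = 'na\u00efve caf\u00e9'
unicode_words = re.findall(r'\w+', accented)
ascii_words = re.findall(r'\w+', accented, re.ASCII)

print(plain, raw)
print(unicode_words, ascii_words)
```

The non-raw pattern fails silently: it compiles without complaint and simply never matches, which is why the raw-string habit is worth keeping.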
