
Python: Regular Expressions Part One

[Category: Development (Python) | Posted by: 店小二04 | Date: 2016 | Author: 红领巾]

Regular expressions are a powerful tool for various kinds of string manipulation.

They are a domain-specific language (DSL) that is available as a library in most modern programming languages, not just Python.

They are useful for two main tasks:

verifying that strings match a pattern (for instance, that a string has the format of an email address),

performing substitutions in a string (such as changing all American spellings to British ones).

Domain-specific languages are highly specialized mini programming languages.

Regular expressions are a popular example, and SQL (for database manipulation) is another.

Private domain-specific languages are often used for specific industrial purposes.

Regular expressions in Python can be accessed using the re module, which is part of the standard library.

After you've defined a regular expression, the re.match function can be used to determine whether it matches at the beginning of a string.

If it does, match returns an object representing the match; if not, it returns None.

To avoid confusion while working with regular expressions, we will write patterns as raw strings, in the form r"expression".

Raw strings do not treat the backslash as an escape character, which makes regular expressions easier to write.
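To see what the r prefix changes, the short sketch below compares a normal string and a raw string written with the same characters. The variable names are illustrative only.

```python
normal = "a\nb"   # \n is turned into a single newline character
raw = r"a\nb"     # the backslash and the n are kept as two characters

print(len(normal))     # 3
print(len(raw))        # 4
print(raw == "a\\nb")  # True: a raw string is shorthand for doubled backslashes
```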

import re

pattern = r"spam"
# re.match only succeeds if the pattern matches at the start of the string.
if re.match(pattern, "spamspamspam"):
    print("Match")
else:
    print("No match")

The above example checks whether the pattern "spam" matches at the start of the string and prints "Match" if it does.

Here the pattern is a simple word, but various characters have a special meaning when they are used in a regular expression.

Other functions to match patterns are re.search and re.findall.

The function re.search finds a match of a pattern anywhere in the string.

The function re.findall returns a list of all substrings that match a pattern.

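Example:

import re

pattern = r"spam"
# match looks only at the beginning of the string, so this prints "No match".
if re.match(pattern, "eggspamsausagespam"):
    print("Match")
else:
    print("No match")
# search scans the whole string, so this prints "Match".
if re.search(pattern, "eggspamsausagespam"):
    print("Match")
else:
    print("No match")
# findall returns every non-overlapping match: ['spam', 'spam']
print(re.findall(pattern, "eggspamsausagespam"))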

In the example above, the match function did not match the pattern, as it looks at the beginning of the string.

The search function found a match in the string.

The function re.finditer does the same thing as re.findall, except that it returns an iterator of match objects rather than a list of strings.

A successful re.search (or re.match) returns a match object with several methods that give details about the match.

These methods include group, which returns the string matched; start and end, which return the start and end positions of the match; and span, which returns the start and end positions as a tuple.

import re

pattern = r"pam"
match = re.search(pattern, "eggspamsausage")
if match:
    print(match.group())  # pam
    print(match.start())  # 4
    print(match.end())    # 7
    print(match.span())   # (4, 7)
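The same methods are available on each match object produced by re.finditer. The following is a minimal sketch that reuses the string from the earlier example; the names text and spans are illustrative.

```python
import re

pattern = r"spam"
text = "eggspamsausagespam"

# re.finditer yields one match object per non-overlapping match.
spans = [(m.group(), m.span()) for m in re.finditer(pattern, text)]
print(spans)  # [('spam', (3, 7)), ('spam', (14, 18))]
```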

Search & Replace

One of the most important functions in the re module is sub.

Syntax:

re.sub(pattern, repl, string, count=0)

This function replaces occurrences of the pattern in string with repl. By default (count=0) it replaces every occurrence; a positive count limits the number of replacements. It returns the modified string.
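In Python's re module the optional limit on replacements is the count parameter. As a brief sketch of its effect (the variable names are illustrative):

```python
import re

text = "My name is David. Hi David."

# count=0 (the default) replaces every occurrence.
replaced_all = re.sub(r"David", "Amy", text)
# count=1 replaces only the first occurrence.
replaced_first = re.sub(r"David", "Amy", text, count=1)

print(replaced_all)    # My name is Amy. Hi Amy.
print(replaced_first)  # My name is Amy. Hi David.
```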

Example:

import re
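# Avoid naming a variable "str", which would shadow the built-in type.
text = "My name is David. Hi David."
pattern = r"David"
new_text = re.sub(pattern, "Amy", text)
print(new_text)  # My name is Amy. Hi Amy.

Metacharacters: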

Metacharacters are what make regular expressions more powerful than normal string methods.

They allow you to create regular expressions to represent concepts like “one or more repetitions of a vowel”.

The existence of metacharacters poses a problem if you want to create a regular expression (or regex) that matches a literal metacharacter, such as "$". You can do this by escaping the metacharacter, that is, by putting a backslash in front of it.

However, this can cause problems, since backslashes also have an escaping function in normal Python strings. This can mean putting three or four backslashes in a row to do all the escaping.

To avoid this, you can use a raw string, which is a normal string literal with an "r" in front of it. We saw raw strings in use earlier.
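As a small illustration of the difference, the sketch below matches a literal "$" twice, once with a raw string and once with a normal string. The sample text is made up for the example.

```python
import re

text = "Total: $5"

# Raw string: a single backslash escapes the "$" metacharacter.
raw_match = re.search(r"\$5", text)
# Normal string: the backslash itself has to be doubled.
normal_match = re.search("\\$5", text)

print(raw_match.group())     # $5
print(normal_match.group())  # $5
print(r"\$5" == "\\$5")      # True: both literals spell the same pattern
```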

The first metacharacter we will look at is . (dot). This matches any single character other than a newline.

Example:

import re

pattern = r"gr.y"
if re.match(pattern, "grey"):
    print("Match 1")
if re.match(pattern, "gray"):
    print("Match 2")
# "blue" does not start with "gr", so nothing is printed here.
if re.match(pattern, "blue"):
    print("Match 3")
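The edge cases of the dot can be checked directly. This is a minimal sketch with made-up input strings: the dot accepts a space, rejects a newline, and must always consume exactly one character.

```python
import re

pattern = r"gr.y"

space_ok = bool(re.match(pattern, "gr y"))     # a space counts as a character
newline_ok = bool(re.match(pattern, "gr\ny"))  # . does not match a newline
missing_ok = bool(re.match(pattern, "gry"))    # . must consume exactly one character

print(space_ok, newline_ok, missing_ok)  # True False False
```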

The next two metacharacters are ^ and $. These match the start and end of a string, respectively.

Example:

import re

pattern = r"^gr.y$"
if re.match(pattern, "grey"):
    print("Match 1")
if re.match(pattern, "gray"):
    print("Match 2")
# "stingray" ends with "gray" but does not start with "gr", so no match.
if re.match(pattern, "stingray"):
    print("Match 3")

The pattern "^gr.y$" means that the string should start with gr, followed by any single character except a newline, and end with y.
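To show what each anchor contributes, the sketch below compares anchored and unanchored versions of the pattern. The test word "greyhound" is an added example, not from the original text.

```python
import re

# Without anchors, re.match only needs the pattern at the start of the string.
unanchored_match = bool(re.match(r"gr.y", "greyhound"))
# With $, nothing may follow the final y.
anchored_match = bool(re.match(r"^gr.y$", "greyhound"))
# re.search finds "gray" inside "stingray" unless ^ pins it to the start.
unanchored_search = bool(re.search(r"gr.y", "stingray"))
anchored_search = bool(re.search(r"^gr.y$", "stingray"))

print(unanchored_match, anchored_match)    # True False
print(unanchored_search, anchored_search)  # True False
```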

Character Classes:

Character classes provide a way to match only one of a specific set of characters.

A character class is created by putting the characters it matches inside square brackets.

Example:

import re

pattern = r"[aeiou]"
if re.search(pattern, "grey"):
    print("Match 1")
if re.search(pattern, "qwertyuiop"):
    print("Match 2")
# "rhythm myths" contains none of a, e, i, o, u, so nothing is printed here.
if re.search(pattern, "rhythm myths"):
    print("Match 3")

The pattern [aeiou] in the search function matches all strings that contain any one of the characters defined.

Character classes can also match ranges of characters.

Some examples:

The class [a-z] matches any lowercase alphabetic character.

The class [G-P] matches any uppercase character from G to P.

The class [0-9] matches any digit.

Multiple ranges can be included in one class. For example, [A-Za-z] matches a letter of any case.

Example:

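import re

pattern = r"[A-Z][A-Z][0-9]"
if re.search(pattern, "LS8"):
    print("Match 1")
# "E3" has only one uppercase letter before the digit, so no match.
if re.search(pattern, "E3"):
    print("Match 2")
# "1ab" has no uppercase letters at all, so no match.
if re.search(pattern, "1ab"):
    print("Match 3")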

The pattern in the example above matches strings that contain two alphabetic uppercase letters followed by a digit.
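The same pattern can also be used with re.findall to pull every such code out of a longer string. This is a minimal sketch; the sample sentence is invented for illustration.

```python
import re

pattern = r"[A-Z][A-Z][0-9]"
text = "LS8 and AB1, but not E3"

# Each match is two uppercase letters followed by one digit.
codes = re.findall(pattern, text)
print(codes)  # ['LS8', 'AB1']
```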

Place a ^ at the start of a character class to invert it.

This causes it to match any character other than the ones included.

Other metacharacters, such as $ and ., have no special meaning within character classes.

The metacharacter ^ has no meaning unless it is the first character in a class.
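These rules can be checked with a short sketch. The input string is made up, and the class [.$^] deliberately places ^ last so that it is treated as a literal character.

```python
import re

# Inside a class, ".", "$" and a non-leading "^" are ordinary characters.
literals = re.findall(r"[.$^]", "a.b$c^d")
print(literals)  # ['.', '$', '^']

# A leading "^" inverts the class: match anything except a, b, c and d.
inverted = re.findall(r"[^abcd]", "a.b$c^d")
print(inverted)  # ['.', '$', '^']
```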

Example:

import re

pattern = r"[^A-Z]"
if re.search(pattern, "this is all quiet"):
    print("Match 1")
if re.search(pattern, "AbCdEfG123"):
    print("Match 2")
# Every character here is an uppercase letter, so [^A-Z] finds nothing.
if re.search(pattern, "THISISALLSHOUTING"):
    print("Match 3")
