Python Study Notes (2)

Python exceptions and regular expressions
http://docs.python.org/library/re.html
http://docs.python.org/howto/regex.html#regex-howto

Example 6.1. Opening a non-existent file
>>> fsock = open("/notthere", "r")
Traceback (innermost last):
File "<interactive input>", line 1, in ?
IOError: [Errno 2] No such file or directory: '/notthere'
>>> try:
...     fsock = open("/notthere")
... except IOError:
...     print "The file does not exist, exiting gracefully"
...
The file does not exist, exiting gracefully
>>> print "This line will always print"
This line will always print
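
The same pattern can be sketched in Python 3, where `print` is a function and `IOError` is an alias of `OSError`. The helper name `read_or_default` is hypothetical and only illustrates the try/except/else flow; it is not part of the original notes.

```python
def read_or_default(path, default=""):
    """Return the contents of path, or default if the file cannot be opened."""
    try:
        fsock = open(path)
    except OSError:  # IOError is an alias of OSError in Python 3
        print("The file does not exist, exiting gracefully")
        return default
    else:
        # The else clause runs only when open() raised no exception.
        with fsock:
            return fsock.read()


print(read_or_default("/notthere", default="<no data>"))
print("This line will always print")
```

Keeping the read inside `else` rather than inside `try` means only the `open()` call is guarded, so an unrelated error while reading is not mistaken for a missing file.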

# Bind the name getpass to the appropriate function
try:
    import termios, TERMIOS
except ImportError:
    try:
        import msvcrt
    except ImportError:
        try:
            from EasyDialogs import AskPassword
        except ImportError:
            getpass = default_getpass
        else:
            getpass = AskPassword
    else:
        getpass = win_getpass
else:
    getpass = unix_getpass

Example 6.10. Iterating through a dictionary
>>> import os
>>> for k, v in os.environ.items():
...     print "%s=%s" % (k, v)
USERPROFILE=C:\Documents and Settings\mpilgrim
OS=Windows_NT
COMPUTERNAME=MPILGRIM
USERNAME=mpilgrim

[...snip...]
>>> print "\n".join(["%s=%s" % (k, v)
... for k, v in os.environ.items()])
USERPROFILE=C:\Documents and Settings\mpilgrim
OS=Windows_NT
COMPUTERNAME=MPILGRIM

Example 6.13. Using sys.modules
>>> import sys
>>> import fileinfo
>>> print '\n'.join(sys.modules.keys())
win32api
os.path
os
fileinfo
exceptions

>>> fileinfo
<module 'fileinfo' from 'fileinfo.pyc'>
>>> sys.modules["fileinfo"]
<module 'fileinfo' from 'fileinfo.pyc'>

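The next example shows how to combine the __module__ class attribute with the sys.modules dictionary to get a reference to the module in which a known class is defined.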

Example 6.14. The __module__ class attribute
>>> from fileinfo import MP3FileInfo
>>> MP3FileInfo.__module__
'fileinfo'
>>> sys.modules[MP3FileInfo.__module__]
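<module 'fileinfo' from 'fileinfo.pyc'>

Every Python class has a built-in class attribute __module__, which holds the name of the module in which the class is defined.
Combining it with the sys.modules dictionary gives you a reference to the module that defines a given class.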
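
A minimal Python 3 sketch of that lookup. Because the `fileinfo` module from the notes is not generally available, the standard-library class `fractions.Fraction` stands in for `MP3FileInfo`; the helper name `module_of` is hypothetical.

```python
import sys
from fractions import Fraction  # stand-in for MP3FileInfo


def module_of(cls):
    """Return the module object in which cls is defined."""
    # cls.__module__ is the module's name; sys.modules maps names to modules.
    return sys.modules[cls.__module__]


mod = module_of(Fraction)
print(Fraction.__module__)  # the module name, as a string
print(mod.__name__)         # the same name, read back from the module object
```

The lookup works because importing a class always places its defining module in `sys.modules` first.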

Example 6.16. Constructing pathnames
>>> import os
>>> os.path.join("c:\\music\\ap\\", "mahadeva.mp3")
'c:\\music\\ap\\mahadeva.mp3'
>>> os.path.join("c:\\music\\ap", "mahadeva.mp3")
'c:\\music\\ap\\mahadeva.mp3'
>>> os.path.expanduser("~")
'c:\\Documents and Settings\\mpilgrim\\My Documents'
>>> os.path.join(os.path.expanduser("~"), "Python")
'c:\\Documents and Settings\\mpilgrim\\My Documents\\Python'

Example 7.2. Matching whole words
>>> import re
>>> s = '100 BROAD'
>>> re.sub('ROAD$', 'RD.', s)
'100 BRD.'
>>> re.sub('\\bROAD$', 'RD.', s)
'100 BROAD'
>>> re.sub(r'\bROAD$', 'RD.', s)
'100 BROAD'
>>> s = '100 BROAD ROAD APT. 3'
>>> re.sub(r'\bROAD$', 'RD.', s)
'100 BROAD ROAD APT. 3'
>>> re.sub(r'\bROAD\b', 'RD.', s)
'100 BROAD RD. APT. 3'

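What I really wanted was to match 'ROAD' only when it appears at the end of the string and is a whole word by itself, not part of some longer word. To express this in a regular expression, you use \b, which means "a word boundary must occur right here". In Python this is complicated by the fact that the '\' character in a string must itself be escaped. This is sometimes called the "backslash plague", and it is one reason why regular expressions are somewhat easier in Perl than in Python. On the other hand, Perl mixes regular expressions with other syntax, so if you hit a bug it can be hard to tell whether it is a syntax error or a regular expression error.
To work around the backslash plague, you can use what is called a raw string: just prefix the string with the letter r. This tells Python not to treat backslashes in the string as escape characters; '\t' is a tab character, whereas r'\t' is a literal backslash character '\' followed by the letter 't'. I recommend using raw strings whenever you deal with regular expressions; otherwise things get confusing very quickly (and regular expressions get confusing quickly enough all by themselves).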
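
The difference between a normal string and a raw string can be checked directly. The following is a small Python 3 sketch; the function name `abbreviate_road` is hypothetical and simply wraps the substitution used in Example 7.2.

```python
import re

# '\t' is a single tab character; r'\t' is two characters: a backslash and 't'.
print(len('\t'), len(r'\t'))

# Without the r prefix the backslash has to be doubled.
# Both spellings produce exactly the same pattern string.
print('\\bROAD$' == r'\bROAD$')


def abbreviate_road(address):
    """Replace 'ROAD' with 'RD.' only when it is the final, stand-alone word."""
    return re.sub(r'\bROAD$', 'RD.', address)


print(abbreviate_road('100 BROAD'))       # ROAD is part of BROAD, so no change
print(abbreviate_road('100 BROAD ROAD'))  # only the trailing word is replaced
```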

Example 7.4. Checking for hundreds
>>> import re
>>> pattern = '^M?M?M?(CM|CD|D?C?C?C?)$'
>>> re.search(pattern, 'MCM')
<SRE_Match object at 01070390>
>>> re.search(pattern, 'MD')
<SRE_Match object at 01073A50>
>>> re.search(pattern, 'MMMCCC')
<SRE_Match object at 010748A8>
>>> re.search(pattern, 'MCMC')
>>> re.search(pattern, '')
<SRE_Match object at 01071D98>

Example 7.5. The old way: every character optional
>>> import re
>>> pattern = '^M?M?M?$'
>>> re.search(pattern, 'M')
<_sre.SRE_Match object at 0x008EE090>
>>> pattern = '^M?M?M?$'
>>> re.search(pattern, 'MM')
<_sre.SRE_Match object at 0x008EEB48>
>>> pattern = '^M?M?M?$'
>>> re.search(pattern, 'MMM')
<_sre.SRE_Match object at 0x008EE090>
>>> re.search(pattern, 'MMMM')
>>>

Example 7.6. The new way: from n to m
>>> pattern = '^M{0,3}$'
>>> re.search(pattern, 'M')
<_sre.SRE_Match object at 0x008EEB48>
>>> re.search(pattern, 'MM')
<_sre.SRE_Match object at 0x008EE090>
>>> re.search(pattern, 'MMM')
<_sre.SRE_Match object at 0x008EEDA8>
>>> re.search(pattern, 'MMMM')
>>>

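The regular expression for the ones place follows the same pattern, so I will skip the details and show the end result.

>>> pattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'
What does this regular expression look like when written with the alternative {n,m} syntax? The next example shows the new syntax.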

Example 7.8. Validating Roman numerals with the {n,m} syntax
>>> pattern = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'
>>> re.search(pattern, 'MDLV')
<_sre.SRE_Match object at 0x008EEB48>
>>> re.search(pattern, 'MMDCLXVI')
<_sre.SRE_Match object at 0x008EEB48>

Example 7.9. Regular expressions with inline comments
>>> pattern = """
    ^                   # beginning of string
    M{0,3}              # thousands - 0 to 3 M's
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #            or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #        or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #        or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
    """
>>> re.search(pattern, 'M', re.VERBOSE)
<_sre.SRE_Match object at 0x008EEB48>
>>> re.search(pattern, 'MCMLXXXIX', re.VERBOSE)
<_sre.SRE_Match object at 0x008EEB48>
>>> re.search(pattern, 'MMMDCCCLXXXVIII', re.VERBOSE)
<_sre.SRE_Match object at 0x008EEB48>
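>>> re.search(pattern, 'M')
The most important thing to remember when using verbose regular expressions is that you must pass an extra argument, re.VERBOSE. It is a constant defined in the re module that signals that the pattern should be treated as a verbose regular expression. As you can see, this pattern contains quite a bit of whitespace (all of which is ignored) and several comments (all of which are also ignored). Once you ignore the whitespace and the comments, it is exactly the same regular expression as in the previous section, but it is much more readable.
The last call above, made without re.VERBOSE, does not match. Why? Because it lacks the re.VERBOSE flag, re.search treats the pattern as a compact regular expression, in which whitespace and the # characters are significant. Python cannot detect automatically whether a regular expression is verbose or compact. It assumes every regular expression is compact unless you explicitly state that it is verbose.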
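
The effect of the flag can be seen by compiling the same pattern text twice. This is a Python 3 sketch; the names `PATTERN`, `roman_verbose`, and `roman_compact` are illustrative, and the comments inside the pattern are shortened compared with Example 7.9.

```python
import re

PATTERN = r"""
    ^                   # beginning of string
    M{0,3}              # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    """

# With re.VERBOSE, whitespace and comments in the pattern are ignored.
roman_verbose = re.compile(PATTERN, re.VERBOSE)
# Without the flag, every space, newline, and '#' is a literal to be matched.
roman_compact = re.compile(PATTERN)

print(roman_verbose.search('MCMLXXXIX') is not None)
print(roman_compact.search('MCMLXXXIX') is not None)
```

Compiling once with the flag attached avoids the mistake shown above, where the flag is forgotten on a later `re.search` call.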

Example 7.16. Parsing phone numbers (final version)
>>> phonePattern = re.compile(r'''
                # don't match beginning of string, number can start anywhere
    (\d{3})     # area code is 3 digits (e.g. '800')
    \D*         # optional separator is any number of non-digits
    (\d{3})     # trunk is 3 digits (e.g. '555')
    \D*         # optional separator
    (\d{4})     # rest of number is 4 digits (e.g. '1212')
    \D*         # optional separator
    (\d*)       # extension is optional and can be any number of digits
    $           # end of string
    ''', re.VERBOSE)
>>> phonePattern.search('work 1-(800) 555.1212 #1234').groups()
('800', '555', '1212', '1234')
>>> phonePattern.search('800-555-1212').groups()
('800', '555', '1212', '')
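
Calling `.groups()` directly on the result of `search` raises an error when nothing matches, because `search` returns `None` in that case. A minimal Python 3 sketch that guards against this follows; the wrapper name `parse_phone` is hypothetical.

```python
import re

PHONE = re.compile(r'''
    (\d{3})     # area code is 3 digits
    \D*         # optional separator
    (\d{3})     # trunk is 3 digits
    \D*         # optional separator
    (\d{4})     # rest of number is 4 digits
    \D*         # optional separator
    (\d*)       # extension is optional
    $           # end of string
    ''', re.VERBOSE)


def parse_phone(text):
    """Return (area, trunk, rest, extension), or None if text has no phone number."""
    match = PHONE.search(text)
    return match.groups() if match else None


print(parse_phone('work 1-(800) 555.1212 #1234'))
print(parse_phone('800-555-1212'))
print(parse_phone('no number here'))
```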

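You should now be familiar with the following techniques:

^ matches the beginning of a string.
$ matches the end of a string.
\b matches a word boundary.
\d matches any numeric digit.
\D matches any non-numeric character.
x? matches an optional x character (in other words, it matches x zero or one times).
x* matches x zero or more times.
x+ matches x one or more times.
x{n,m} matches x at least n times, but not more than m times.
(a|b|c) matches either a or b or c.
(x) in general is a remembered group. You can get its value by calling the groups() method of the object returned by re.search.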

http://www.woodpecker.org.cn/diveintopython/regular_expressions/phone_numbers.html

Regular expression pattern syntax

Element	Meaning
.	Matches any character except \n (if DOTALL, also matches \n)
^	Matches start of string (if MULTILINE, also matches after \n)
$	Matches end of string (if MULTILINE, also matches before \n)
*	Matches zero or more cases of the previous regular expression; greedy (match as many as possible)
+	Matches one or more cases of the previous regular expression; greedy (match as many as possible)
?	Matches zero or one case of the previous regular expression; greedy (match one if possible)
*?, +?, ??	Non-greedy versions of *, +, and ? (match as few as possible)
{m,n}	Matches m to n cases of the previous regular expression (greedy)
{m,n}?	Matches m to n cases of the previous regular expression (non-greedy)
[...]	Matches any one of a set of characters contained within the brackets
|	Matches expression either preceding it or following it
(...)	Matches the regular expression within the parentheses and also indicates a group
(?iLmsux)	Alternate way to set optional flags; no effect on match
(?:...)	Like (...), but does not indicate a group
(?P<id>...)	Like (...), but the group also gets the name id
(?P=id)	Matches whatever was previously matched by group named id
(?#...)	Content of parentheses is just a comment; no effect on match
(?=...)	Lookahead assertion; matches if regular expression ... matches what comes next, but does not consume any part of the string
(?!...)	Negative lookahead assertion; matches if regular expression ... does not match what comes next, and does not consume any part of the string
(?<=...)	Lookbehind assertion; matches if there is a match for regular expression ... ending at the current position (... must match a fixed length)
(?<!...)	Negative lookbehind assertion; matches if there is no match for regular expression ... ending at the current position (... must match a fixed length)
\number	Matches whatever was previously matched by group numbered number (groups are automatically numbered from 1 up to 99)
\A	Matches an empty string, but only at the start of the whole string
\b	Matches an empty string, but only at the start or end of a word (a maximal sequence of alphanumeric characters; see also \w)
\B	Matches an empty string, but not at the start or end of a word
\d	Matches one digit, like the set [0-9]
\D	Matches one non-digit, like the set [^0-9]
\s	Matches a whitespace character, like the set [ \t\n\r\f\v]
\S	Matches a non-white character, like the set [^ \t\n\r\f\v]
\w	Matches one alphanumeric character; unless LOCALE or UNICODE is set, \w is like [a-zA-Z0-9_]
\W	Matches one non-alphanumeric character, the reverse of \w
\Z	Matches an empty string, but only at the end of the whole string
\\	Matches one backslash character
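
A few of the less familiar elements from this reference can be tried out in a short Python 3 session. The sample strings and variable names below are made up for illustration.

```python
import re

# Greedy vs. non-greedy: .* grabs as much as possible, .*? as little as possible.
html = '<b>bold</b> and <i>italic</i>'
greedy = re.search(r'<.*>', html).group()
lazy = re.search(r'<.*?>', html).group()
print(greedy)
print(lazy)

# Named group (?P<word>...) and a backreference to it (?P=word):
# finds a word that is immediately repeated.
doubled = re.search(r'\b(?P<word>\w+) (?P=word)\b', 'it was the the best')
print(doubled.group('word'))

# Lookahead (?=...): match a name only if '.py' follows, without consuming it.
names = re.findall(r'\w+(?=\.py\b)', 'a.py b.txt c.py')
print(names)

# Non-capturing group (?:...): groups for repetition but is not remembered.
only_c = re.search(r'(?:ab)+(c)', 'ababc').groups()
print(only_c)
```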


posted on 2009-08-22 23:48 by Frank_Fang. Category: Python Study
