環(huán)境: oracle 10g r2
更多內(nèi)容可以參考:
<<Oracle? Database Application Developer's Guide - Fundamentals 10g Release 2 (10.2)>>必須明白的是,oracle的正則表達式和其它的語言的并不是完全一樣。
oracle支持以下標準的正則表達式:
- IEEE Portable Operating System Interface (POSIX) standard draft 1003.2/D11.2
- Unicode Regular Expression Guidelines of the Unicode Consortium
- Extends the matching capabilities for multilingual data beyond what is specified in the POSIX standard(即posix的多語言支持).
- Adds support for the common Perl regular expression extensions that are not
included in the POSIX standard but do not conflict with it. Oracle Database
provides built-in support for some of the most heavily used Perl regular
expression operators, for example, character class shortcuts, the non-greedy
modifier, and so on(即部分perl標準,不包含在posix標準,但是又不與之沖突的)
---2008-09-08 做了個實驗,關(guān)鍵的要點是三個,重復次數(shù){},+,*,?
---以及匹配模式[...] ,[^...],(...),(...)
--以及轉(zhuǎn)義符號 \d,\D,\\等。
------------------------------------------------------------------------------------------------------------------
ID VALUE
--------------------------------------- ----------------------------------------
9 lzf is a chinese
10 1231231231lzf
11 perfect
1 13625061300
2 059183383335
3 13959029697
4 13950323157
5 who is your daddy?
6 who are you?
6 what is your name?
6 I am 20 years old!
7 股市下挫到2100點左右
8 lzf ' is very happy 136ok
8 do you know lzf,ok?
8 ok,lzf is my brother
------------------------------------------------------------------------------------------------------------------
select * from test_reg where regexp_like(value,'^1(.+)7$') --1開頭,7結(jié)尾的內(nèi)容
select * from test_reg where regexp_like(value,'\d+') --一個或者多個數(shù)字
select * from test_reg where regexp_like(value,'^[a-zA-Z]{1,}[a-zA-Z]+$') --全部字符的.[:digit:]
select * from test_reg where regexp_like(value,'^[0-9]{1,}[0-9]+$') --全部數(shù)字的.
select * from test_reg where regexp_like(value,'[y][o][u]') --包含you
select * from test_reg where regexp_like(value,'(you)') --包含you
select * from test_reg where regexp_like(value,'[you]') --包含y o u的,不分順序
select * from test_reg where regexp_like(value,'(lzf){1,}(.){0,}(ok)') --包含lzf 和ok的. 但是也僅僅限制與lzf 在ok前
以上試驗是在看了下文以及參考了OReilly.Oracle.PL.SQL.Programming.4th.Edition.Aug.2005.chs后作出的.
--------------
2010/08/04 增加以下例子---
--以'POSIX('開頭的串,以下幾個表達式的結(jié)果都是一樣的。
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1})(,)\d{1,}?\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1})(,)\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1}),\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),([RYN]{1}),\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),([RYN]+),\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
select 1 from dual where regexp_like('POSIX(A,N,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
----------
--位于字符串中的串
select 1 from dual where regexp_like('POSIX(A,N,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
select 1 from dual where regexp_like('P_OSI_X(JYL_GP_R,N,13)','^[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)')
--測試regexp_substr,基本的還是regexp_like
SQL> select regexp_substr('P_OSI_X(JYL_GP_R,N,13)+AVG(JYL_JJ_R,Y,3)','^[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)') FROM DUAL 2 / REGEXP_SUBSTR('P_OSI_X(JYL_GP_ ------------------------------ P_OSI_X(JYL_GP_R,N,13) SQL> select regexp_substr('P_OSI_X(JYL_GP_R,N,13)+AVG(JYL_JJ_R,Y,3)','[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)',1,2) FROM DUAL 2 / REGEXP_SUBSTR('P_OSI_X(JYL_GP_ ------------------------------ AVG(JYL_JJ_R,Y,3) |
2010/08/26 新增例子
-- delete the first () select regexp_replace('dad(disc_id=50003118)add','(^([A-Z]{1,}\()|(\)[A-Z]{1,})$)','',1,0,'i') FROM DUAL -- match pattern using classes select regexp_substr(upper('dad(ssdsd89\9 dd== (sddf(d)),sdsd)add'),'^([A-Z]{1,}\()[[:space:][:alnum:]\,\_\(\)\=]{1,}\)[A-Z]{1,}$',1,1,'i') FROM DUAL
|
OReilly. 的參考內(nèi)容Section A.1. Metacharacters
A.1. Metacharacters
The R2 column in Tables A-1 through A-3 indicates which metacharacters were introduced in Oracle Database 10g Release 2. All others were introduced in Release 1.
Table A-1. Character-matching metacharacters
Syntax |
R2? |
Description |
. |
|
Matches any single character except for newline. Will match newline when the n flag is set. On Windows, Linux, and Unix platforms, chr(10) is recognized as the newline. |
[ ... ] |
|
Defines a matching list that matches any character listed between the brackets. You may specify ranges of characters, as in a-z. These ranges are interpreted based on the NLS_SORT setting.
A dash (-) is a literal when it occurs first or last in the list (e.g., [abc-]). A closing-bracket (]) is a literal when it occurs first in the list (e.g., []abc]). A caret (^) in the first position makes the list a nonmatching list (see the next entry). |
[^ ... ] |
|
Matches any character not listed between the brackets. Referred to as a "nonmatching list." |
[:class:] |
|
Matches any character that belongs to the specified character class. May only be used within a matching list: [[:class:]abc] is a valid expression, [:class:]abc is not. Table A-5 lists the valid character class names. |
[.coll.] |
|
Matches the specified collation element, which may be one or more characters. May only be used within a matching list. For example, the expression [[.ch.]] matches the Spanish letter "ch". Table A-4 lists the valid collation elements. |
[=char=] |
|
Matches all characters that share the same base character as char. May be used only within a matching list. For example, [[=e=]] matches any of: "eé?è???E". |
\d |
|
Matches any digit. Equivalent to [[:digit:]]. |
\D |
|
Matches any nondigit. Equivalent to [^[:digit:]] |
\w |
|
Matches any "word character." Word characters are defined to be alphabetic characters, numeric characters, and the underscore. |
\W |
|
Matches any nonword character. |
\s |
|
Matches any whitespace character. Equivalent to [[:space:]]. |
\S |
|
Matches nonwhitespace characters. Equivalent to [^[:space:]]. |
Table A-2. Quantifiers
Syntax |
R2? |
Description |
+ |
|
One or more |
? |
|
Zero or one |
* |
|
Zero or more |
{m} |
|
Exactly m occurrences |
{m,} |
|
At least m occurrences |
{m,n} |
|
At least m, and at most n occurrences |
+? |
|
One or more, but nongreedy |
?? |
|
Zero or one, but nongreedy |
{m}? |
|
The same as {m} |
{m,}? |
|
At least m occurrences, but nongreedy and stops as soon as m occurrences are reached |
{m,n}? |
|
At least m, and at most n occurrences, but nongreedy; when possible, m occurrences are matched |
Table A-3. Other metacharacters
Syntax |
R2? |
Description |
| |
|
Specifies an alternation. An alternation within a subexpression doesn't extend beyond the subexpression. |
( ...) |
|
Defines a subexpresson. |
\n |
|
References the text matched by the nth subexpression. Backreferences may range from \1 through \9. |
\ |
|
When not followed by a digit, the \ is an escape character. For example, use the pattern \\1 to look for a single backslash followed by the digit 1, use \( to look for an opening-parentheses (rather than begin a subexpression), etc. |
^ |
|
Anchors an expression to the beginning of the string (in multiline mode, to the beginning of a line). |
$ |
|
Anchors an expression to the end of the string (in multiline mode, to the end of a line). |
\A |
|
Anchors an expression to the beginning of the string regardless of whether multiline mode is specified. |
\Z |
|
Anchors an expression to the end of the string, or the a newline that happens to be ending a string, regardless of whether multiline mode is specified. |
\z |
|
Anchors an expression to the end of the string regardless of whether multiline mode is specified. |
Table A-4. Collation elements
NLS_SORT |
Multicharacter collation elements |
XDANISH |
aa
oe |
AA
OE |
Aa
Oe |
XSPANISH |
ch
ll |
CH
LL |
Ch
Ll |
XHUNGARIAN |
cs
gy
ly
ny
sz
ty
zs |
CS
GY
LY
NY
SZ
TY
ZS |
Cs
Gy
Ly
Ny
Sz
Ty
Zs |
XCZECH |
ch |
CH |
Ch |
XCZECH_PUNCTUATION |
ch |
CH |
Ch |
XSLOVAK |
dz
d_
ch |
DZ
D_
CH |
Dz
D_
Ch |
XCROATIAN |
d_
lj
nj |
D_
LJ
Nj |
D_
Lj
NJ |
Table A-5. Supported character classes
Class |
Description |
[:alnum:] |
Alphanumeric characters (same as [:alpha:] + [:digit:]) |
[:alpha:] |
Alphabetic characters only |
[:blank:] |
Blank space characters, such as space and tab |
[:cntrl:] |
Nonprinting, or control characters |
[:digit:] |
Numeric digits |
[:graph:] |
Graphical characters (same as [:punct:] + [:upper:] + [:lower:] + [:digit:]) |
[:lower:] |
Lowercase letters |
[:print:] |
Printable characters |
[:punct:] |
Punctuation characters |
[:space:] |
Whitespace characters such as space, formfeed, newline, carriage return, horizontal tab, and vertical tab |
[:upper:] |
Uppercase letters |
[:xdigit:] |
Hexadecimal characters |
|
對于類的使用,要特別注意點,因為基本上要包含在一對中括號中。
例一:檢驗非漢字的函數(shù)例如可以定義一個檢驗非漢字的函數(shù),不考慮其它UNICODE
CREATE OR REPLACE Function func_existNormalChar(chars In Varchar2) Return Pls_Integer
Is
vResult Pls_Integer;
Begin
Select 1
Into vresult From dual
Where regexp_like(chars,'[!-~ ,。、?’“;:『』【】+=-——~!◎#¥%……※×()]+')
Or regexp_like(chars,'[[:space:]+]')
Or regexp_like(chars,'[[:blank:]+]')
Or regexp_like(chars,'[[:cntrl:]+]');
Return vresult;
Exception
When Others Then
Return 0;
End;
例二:出現(xiàn)次數(shù)的檢驗
序號 |
例子 |
結(jié)果 |
說明 |
1 |
regexp_replace('a12c','[[:digit:]]+','*') |
a*c |
測試"+",即出現(xiàn)一次或者更多次的.由于采取的是連續(xù)的匹配,所以結(jié)果上是把多個數(shù)字合并為一個替代字符串,為了理解連續(xù)匹配的意思,看序號9更能夠明白.這是總體偏少的模式 |
2 |
regexp_replace('a12c','[[:digit:]]?','*') |
*a***c* |
測試"?",即0或者一次.這個符號,會在任意一個其它字符邊上插入一個假設(shè)出現(xiàn)的匹配類型符號,如例子,a的左邊本來是沒有,但是表達式認為數(shù)字字符在a的左邊出現(xiàn)了0次(這個是成立的).然后還要逐個替換過去,這是總體偏多的模式,過于匹配和逐個匹配結(jié)合
|
3 |
regexp_replace('a12c','[[:digit:]]*','*')
|
*a**c* |
測試"*",即0或者更多次數(shù),這個模式類似與?類似,采取連續(xù)匹配+多余匹配 |
4 |
regexp_replace('a12c','[[:digit:]]+?','*')
|
a**c |
測試"+?",匹配在前面一次或者多次的串,采取的是逐個匹配的策略(如果有連續(xù)的情況出現(xiàn)),這個是通常最希望使用的模式. |
5 |
regexp_replace('a12c','[[:digit:]]*?','*') |
*a*1*2*c* |
測試"*?",匹配0或者更多次,難于理解! 但可以簡單地理解為為在各個字符中以及兩邊插入替代串.
|
6 |
regexp_replace('a12c','[[:digit:]]??','*') |
*a*1*2*c* |
測試"??",匹配0或者1次,難于理解啊! 但可以簡單地理解為為在各個字符中以及兩邊插入替代串. |
7 |
regexp_replace('a12c','[[:digit:]]{1,}','*')
|
a*c |
測試{1,},同+,結(jié)果已經(jīng)說明 |
8 |
|
|
|
9 |
regexp_replace('a1x2c','[[:digit:]]+','*') |
a*x*c |
測試"+",這個例子說明,如果被檢測的字符串中特定的匹配串如果不是連續(xù)的(多次),那么替代的還是屬于正常的范圍. |
其它的次數(shù)模式暫時沒有測試,對于 *,?,+ ,*?,+?,??,基本有其對應的模式 {n,m}? 或者是{n,}? ,除了*,*?之外,因為此二者表示的都是0或者多次(大于1),無法用{n,m}的方式表示.
例三:多個類的并用
很多時候使用類還是很方便的,例如[:digit:]表示數(shù)字.[:space:]表示空格,換行,回車等等.
如果希望把特定串中的數(shù)字和回車空格等都替換為'-',則可以使用 "|"(替代操作符,其實就是類似一般高級語言中的或操作符號).
select regexp_replace('adfa jkjk jk
jk','([[:digit:]]+?)|([[:space:]]+?)','-') from dual;
結(jié)果如下:
adfa-jkjk-jk-jk
已經(jīng)把空格和回車符號都替換為'-'了.
又例如:
Select regexp_replace('a1b c1d jk
jk','([[:digit:]]+?)|([[:space:]]+?)','-') From dual
結(jié)果如下:
a-b-c-d-jk-jk
例四:迭代替換
例如希望把以下串替換掉
'<xxxx <often> <common> <fat> >'
關(guān)于范圍操作符號 -
發(fā)音:橫桿或者減號
在列表匹配操作情況下,如果沒有特別的設(shè)置,就是表示匹配一個范圍:
例子:把字符串中非數(shù)字+-()的符號替換為空
這是一個不成功的匹配
SQL> select regexp_replace('1+[2c- 3([ad]km,4','[^0-9\+\-\(\)]','',1) from dual; REGEXP_REPLACE('1+[2C-3([AD]KM ------------------------------ 1+23(4
這也是一個錯誤的匹配,因為減號這個時候表示一個范圍 SQL> select regexp_replace('1+[2c- 3([ad]km,4','[^0-9\+-\(\)]','',1) from dual; REGEXP_REPLACE('1+[2C-3([AD]KM ------------------------------ 1+[2-3([,4
二者的區(qū)別在于單獨的一個減號,如果前面沒有跟上斜杠\,那么就是表示范圍,否則就是減號本身。
但即使如此,依然沒有達到目的,第一個把減號還是給去除了,雖然排除了其它的需要排除的符號,
第二個,則是當作范圍符號,大部分的沒有排除掉。
但是不太明白的是,按照oracle文檔的約定,在范圍匹配中形如[\-]表示匹配一個減號,為何依然匹配補上了,尤其是當夾著其它需要匹配的模式的時候。
現(xiàn)在只能嘗試著把這個模式放在最后:
SQL> select regexp_replace('1+[2c- 3([ad4','[^0-9\(+\)\-]','',1) from dual 2 / REGEXP_REPLACE('1+[2C-3([AD4', ------------------------------ 1+2-3(4
結(jié)果是可以,因為模式"\-"被放到了范圍中的尾巴。
那是文檔不全,還是理解不透徹了,
看來以后遇到這樣的問題還是先把減號放到尾巴去吧。 |