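Capture groups:
One of the most useful features of Groovy regular expressions is the ability to capture data out of a string
with a regular expression. Consider the following example, in which we want to pick out exactly Liverpool, England: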
locationData = "Liverpool, England: 53° 25′ 0″ N 3° 0′ 0″"
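We could use the String split() method to cut out the part we need, Liverpool, England (the
comma would have to be removed as well). Alternatively we can use a regular expression; the syntax in the example that follows may look a little unfamiliar at first.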
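For comparison, the split() approach mentioned above might look roughly like this. It is only a sketch: the sample string is a simplified, hypothetical stand-in for locationData, and the variable names are made up for illustration.

```groovy
// Hypothetical sample string; only the part before the colon matters here.
def location = "Liverpool, England: 53 25 0 N 3 0 0"

// Everything before the first colon is the place name.
def place = location.split(':')[0]

// Split on the comma plus space to separate city and country.
def parts = place.split(', ')
def city = parts[0]
def country = parts[1]

println "${city} / ${country}"
```

This works for one fixed layout, but every additional field needs another split, which is where a single regular expression with groups becomes more convenient.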
First, we define a regular expression and put each part we are interested in inside parentheses:
myRegularExpression = /([a-zA-Z]+), ([a-zA-Z]+): ([0-9]+). ([0-9]+). ([0-9]+). ([A-Z]) ([0-9]+). ([0-9]+). ([0-9]+)./
Next we define a matcher, which is done with the =~ operator.
matcher = ( locationData =~ myRegularExpression )
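The variable matcher holds a java.util.regex.Matcher, which Groovy enhances. You can access the data exactly as you would with a Matcher object on the Java platform. A nicer way is to index the matcher as if it were a two-dimensional array.
Let's look at the first dimension, matcher[0]:
["Liverpool, England: 53° 25′ 0″ N 3° 0′ 0″", "Liverpool", "England", "53", "25", "0", "N", "3", "0", "0"]
It is a list that holds the whole matched string followed by the text captured by each group.
This makes it easy to print the data we want: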
if (matcher.matches()) {
    // getCount() is a Groovy addition that reports how many matches were found
    println(matcher.getCount() + " occurrence of the regular expression was found in the string.")
    println(matcher[0][1] + " is in the " + matcher[0][6] + " hemisphere. (According to: " + matcher[0][0] + ")")
    // Print the whole match followed by every captured group
    for (int i = 0; i < matcher[0].size(); i++) {
        println(matcher[0][i])
    }
}
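The indexing used above can be tried in isolation. The following is a minimal sketch with a shorter, hypothetical input string and pattern; the variable names are illustrative, and the multiple assignment at the end is an optional Groovy convenience not used in the original example.

```groovy
// Hypothetical input, chosen to keep the pattern short.
def text = "Liverpool, England"

// Two capture groups: city and country.
def m = (text =~ /([a-zA-Z]+), ([a-zA-Z]+)/)

// m[0] is the first match as a list: the whole match, then each group.
def first = m[0]
println first        // whole match followed by the two groups
println m[0][1]      // first capture group
println m[0][2]      // second capture group

// Groovy multiple assignment can unpack that list directly.
def (whole, city, country) = m[0]
println "${city} is in ${country}"
```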
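Non-capture groups:
Sometimes we need to define a non-capture group to get at the data we want. In the following example, our goal is
to filter out the middle names: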
names = [
    "Graham James Edward Miller",
    "Andrew Gregory Macintyre"
]
printClosure = {
    matcher = (it =~ /(.*?)(?: .+)+ (.*)/)  // notice the non-capturing group in the middle
    if (matcher.matches())
        println(matcher[0][2] + ", " + matcher[0][1])
}
names.each(printClosure)
Output:
Miller, Graham
Macintyre, Andrew
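Non-capture groups can be confusing at first. Put simply, a non-capture group still takes part in matching, but the text it matches is not stored as a group, so the characters or symbols you do not want are left out of the result.
For example: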
names = [
    "ZDW love beijing",
    "Angel love beijing",
    "Ghost hate beijing"
]
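We only want the name at the start and the city at the end, filtering out the word in between (love or hate). This is
where a non-capture group comes in: write ?: immediately after the opening parenthesis of the group you want to leave out.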
nameClosure = {
    myMatcher = (it =~ /(.*?)(?: .+)+ (.*)/)
    if (myMatcher.matches()) {
        println(myMatcher[0][1] + " " + myMatcher[0][2])
    }
}
names.each(nameClosure)
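Let's break this down:
(?: .+)
Every group is enclosed in (), and ?: marks this one as a non-capture group. Note the space right after ?:. It corresponds
to the space that separates the words in the original string; if the separator were a comma or some other symbol, you would use that instead.
.+ matches any characters, one or more of them.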
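One way to see the effect of ?: is to run the same pattern with and without it and compare how many groups come back. This is a minimal sketch; the variable names are illustrative and not part of the original example.

```groovy
def name = "Graham James Edward Miller"

// Middle part captured: the match has three groups plus the whole match.
def capturing = (name =~ /(.*?)( .+)+ (.*)/)

// Middle part grouped with ?: so it is matched but not stored.
def nonCapturing = (name =~ /(.*?)(?: .+)+ (.*)/)

println capturing[0].size()      // whole match + 3 groups
println nonCapturing[0].size()   // whole match + 2 groups
println nonCapturing[0][2]       // the last name is now group 2
```

With the capturing version the last name moves to index 3, so the non-capture group mainly keeps the group numbering tidy.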
Replacement:
We may need to replace a given substring or symbol within a string with something else.
For example:
excerpt = "At school, Harry had no one. Everybody knew that Dudley's gang hated that odd Harry Potter "+
"in his baggy old clothes and broken glasses, and nobody liked to disagree with Dudley's gang.";
matcher = (excerpt =~ /Harry Potter/);
excerpt = matcher.replaceAll("Tanya Grotter");
matcher = (excerpt =~ /Harry/);
excerpt = matcher.replaceAll("Tanya");
println("Publish it! "+excerpt);
In this example we did two things: first we replaced Harry Potter with Tanya Grotter, then we
replaced every remaining Harry with Tanya.
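Replacement can also be combined with capture groups. The sketch below assumes the standard java.util.regex behaviour in which $1 and $2 in the replacement string refer to the captured groups; the input string and variable names are made up for illustration.

```groovy
def name = "Harry Potter"

// $1 and $2 in the replacement refer to the first and second capture group.
// Single quotes keep Groovy from treating $1 as string interpolation.
def swapped = name.replaceAll(/(\w+) (\w+)/, '$2, $1')
println swapped   // Potter, Harry

// The same replacement through an explicit matcher, as in the example above.
def viaMatcher = (name =~ /(\w+) (\w+)/).replaceAll('$2, $1')
println viaMatcher
```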
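Reluctant operators:
These are also known as lazy or non-greedy quantifiers.
The quantifiers *, + and ? are greedy by default: they match as much text as they can, which sometimes
pulls in text we did not want. In those cases we use reluctant operators.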
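A small illustration of the difference, using a hypothetical markup string that is not part of the original tutorial. Because these patterns contain no groups, indexing the matcher with [0] returns the matched text itself.

```groovy
def html = "<b>bold</b>"

// Greedy: .+ runs to the last > in the string.
def greedyTag = (html =~ /<.+>/)[0]

// Reluctant: .+? stops at the first > it can.
def reluctantTag = (html =~ /<.+?>/)[0]

println greedyTag      // <b>bold</b>
println reluctantTag   // <b>
```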
popesArray = [
    "Pope Anastasius I 399-401",
    "Pope Innocent I 401-417",
    "Pope Zosimus 417-418",
    "Pope Boniface I 418-422",
    "Pope Celestine I 422-432",
    "Pope Sixtus III 432-440",
    "Pope Leo I the Great 440-461",
    "Pope Hilarius 461-468",
    "Pope Simplicius 468-483",
    "Pope Felix III 483-492",
    "Pope Gelasius I 492-496",
    "Pope Anastasius II 496-498",
    "Pope Symmachus 498-514"
]
We only want each pope's name and the years of his reign.
/Pope (.*)(?: .*)? ([0-9]+)-([0-9]+)/
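The expression above uses ordinary greedy groups. To make a quantifier reluctant, simply add a ? after it, for example .*? or .+?.
Try it yourself and see what it prints: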
myClosure = {
    myMatcher = (it =~ /Pope (.*?)(?: .*)? ([0-9]+)-([0-9]+)/)
    if (myMatcher.matches())
        println(myMatcher[0][1] + ": " + myMatcher[0][2] + " to " + myMatcher[0][3])
}
popesArray.each(myClosure)
This gives us essentially what we wanted.
Try removing the ? and see how the output changes.
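The two patterns can also be compared side by side on a single entry. This is a minimal sketch with illustrative variable names; it uses one string from the list rather than the whole array.

```groovy
def entry = "Pope Leo I the Great 440-461"

def greedy    = (entry =~ /Pope (.*)(?: .*)? ([0-9]+)-([0-9]+)/)
def reluctant = (entry =~ /Pope (.*?)(?: .*)? ([0-9]+)-([0-9]+)/)

println greedy[0][1]      // the greedy group keeps the numeral and epithet
println reluctant[0][1]   // the reluctant group stops after the first word
println reluctant[0][2] + " to " + reluctant[0][3]
```

Neither pattern raises an error; the greedy one simply captures more of the name than intended.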