Super-Powerful Regular Expressions (repost)

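package testreg;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * <p>Title: A Study of Regular Expressions</p>
 * <p>Description:
 * Lately I keep running into regular-expression questions at work, and the introductions I
 * found online were mostly fragmentary. Better to rely on yourself than on others, so I
 * gritted my teeth and read the documentation myself. I used the slack between two phases of
 * our project to study the regular expressions supported by J2SE thoroughly. The price was
 * nearly twelve sheets of the department's paper. Enough chatter; on to the subject.
 *
 * How it works:
 * Regular expressions are based on finite state automata. An automaton has a finite number of
 * internal states, with an initial state and an end state, and it decides what to do next from
 * its input and its current state. I learned this long ago and no longer remember the
 * details, so treat it only as a rough reference.
 *
 * Regular expressions in Java:
 * Java has supported regular expressions since J2SE 1.4 through the java.util.regex package,
 * which has three main classes. Pattern represents the pattern, that is, the regular
 * expression itself. Matcher is the finite state automaton, although as far as I can tell most
 * of the work is done by Pattern and Matcher often just calls into it; I do not know which
 * design pattern this is. Both classes are very well written and contain algorithms that are
 * worth careful study by experienced readers. The third class is an exception,
 * PatternSyntaxException, thrown when a regular expression is malformed; it is a runtime
 * (unchecked) exception.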
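
As a rough illustration of how the three classes fit together, here is a minimal sketch. The class name RegexBasics and its helper methods are made up for this example and are not part of the original program; only Pattern, Matcher and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexBasics {
    // Returns true when the whole input consists of one or more digits.
    static boolean isAllDigits(String input) {
        Pattern p = Pattern.compile("\\d+"); // the compiled regular expression
        Matcher m = p.matcher(input);        // a Matcher is bound to one input
        return m.matches();
    }

    // Returns true when the regex compiles. PatternSyntaxException is unchecked,
    // so catching it is optional; it is caught here only to report the result.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2007"));
        System.out.println(isAllDigits("20a7"));
        System.out.println(isValidRegex("[a-z"));  // unclosed character class
        System.out.println(isValidRegex("[a-z]"));
    }
}
```

A Pattern can be compiled once and reused; each call to matcher() produces a fresh Matcher for one input.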
 *
 * Some tricky points:
 * 1. Line terminators
 * A line terminator is a sequence of one or two characters that marks the end of a line.
 * All the line terminators in Java:
 *   A newline (line feed) character ('\n'),                                      LF (0A)
 *   A carriage-return character followed immediately by a newline ("\r\n"),      CR+LF (0D0A)
 *   A standalone carriage-return character ('\r'),                               CR (0D)
 *   A next-line character ('\u0085'),                                            Unicode NEXT LINE (NEL)
 *   A line-separator character ('\u2028'), or                                    Unicode LINE SEPARATOR
 *   A paragraph-separator character ('\u2029').                                  Unicode PARAGRAPH SEPARATOR
 * If UNIX_LINES mode is activated, then the only line terminators recognized are newline characters.
 * That is, in Unix-lines mode only \n counts as a line terminator. The mode is enabled like this:
 *   Pattern p = Pattern.compile("regex", Pattern.UNIX_LINES);
 * or
 *   Pattern p = Pattern.compile("(?d)regex");
 * "." matches every character except line terminators (when DOTALL is not specified).
 * In DOTALL mode "." matches every character.
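
The rules above can be checked with a short sketch. LineTerminatorDemo and dotMatches are hypothetical names introduced only for this illustration; the flags are the real Pattern constants.

```java
import java.util.regex.Pattern;

public class LineTerminatorDemo {
    // True when "." matches the single-character input under the given flags.
    static boolean dotMatches(String oneChar, int flags) {
        return Pattern.compile(".", flags).matcher(oneChar).matches();
    }

    public static void main(String[] args) {
        String[] terminators = {"\n", "\r", "\u0085", "\u2028", "\u2029"};
        for (String t : terminators) {
            System.out.printf("U+%04X default=%b unixLines=%b dotAll=%b%n",
                    (int) t.charAt(0),
                    dotMatches(t, 0),
                    dotMatches(t, Pattern.UNIX_LINES),
                    dotMatches(t, Pattern.DOTALL));
        }
    }
}
```

With no flags "." rejects every terminator in the list, with UNIX_LINES it rejects only the newline, and with DOTALL it accepts all of them.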
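 *
 * 2. Quantifiers: greedy, reluctant and possessive
 * These terms are hard to translate. The original reads "Greedy Quantifiers, Reluctant
 * Quantifiers and Possessive Quantifiers". With my English, my first literal attempt came out
 * as something like "greedy quanta, unwilling quanta and possessive quanta", which is
 * laughable. Fortunately I do understand what they mean; I explain them in detail further down.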
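 *
 * 3. Understanding forms such as [a-zA-Z], [a-d[h-i]], [^a-f], [b-f&&[a-z]] and [b-f&&[^cd]]
 * The documentation describes these with the terms range, union, negation, intersection and
 * subtraction (I would call the last one set difference: removing part of a set).
 *   range         a-z            the lowercase letters from a to z
 *   negation      [^a-f]         everything except a-f; the universe is all characters
 *   union         [a-d[h-i]]     a-d together with h-i
 *   subtraction   [b-f&&[^cd]]   b-f except c and d
 *   intersection  [b-f&&[a-z]]   the characters common to b-f and a-z
 * To summarize: square brackets denote a set. Its elements can be listed one by one, as in
 * [abcd], but what if there are too many? Nobody wants to list everything from a to z, so a
 * range is written a-z with its own brackets omitted. Intersection is written &&, the union
 * operator is simply omitted, and set difference is expressed with intersection plus
 * negation. The forms above can therefore be written as:
 *   [[a-z][A-Z]], [[a-d][h-i]], [^a-f], [[b-f]&&[a-z]], [[b-f]&&[^cd]]
 * which is much closer to the set notation we are used to.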
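
To see which characters each form actually accepts, the following sketch enumerates the lowercase letters matched by a character class. CharClassDemo and accepted are illustrative names, not part of the original program.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Collects the letters a..z that are accepted by a one-character class.
    static String accepted(String charClass) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("union        [a-d[h-i]]   -> " + accepted("[a-d[h-i]]"));
        System.out.println("negation     [^a-f]       -> " + accepted("[^a-f]"));
        System.out.println("intersection [b-f&&[a-z]] -> " + accepted("[b-f&&[a-z]]"));
        System.out.println("subtraction  [b-f&&[^cd]] -> " + accepted("[b-f&&[^cd]]"));
    }
}
```

Only lowercase letters are enumerated here, so the negated class shows just the lowercase part of what it accepts.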
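 *
 * 4. The meaning of each flag
 * When creating a pattern you can pass one or more flags that control how matching is done.
 * For example:
 *   Pattern p = Pattern.compile(".*a?", Pattern.UNIX_LINES);
 * Several flags can be combined with the "|" operator:
 *   Pattern p = Pattern.compile(".*a?", Pattern.UNIX_LINES | Pattern.DOTALL);
 * Flags can also be specified inside the expression:
 *   Pattern p = Pattern.compile("(?d).*a?");
 *   Pattern p = Pattern.compile("(?d)(?s).*a?");
 * These two definitions are equivalent to the previous two, respectively.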
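
The equivalence can be sketched by running both forms on the same input. FlagEquivalenceDemo and its three helper methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

public class FlagEquivalenceDemo {
    static final Pattern PLAIN = Pattern.compile(".*a?");
    static final Pattern WITH_CONSTANTS =
            Pattern.compile(".*a?", Pattern.UNIX_LINES | Pattern.DOTALL);
    static final Pattern WITH_EMBEDDED = Pattern.compile("(?d)(?s).*a?");

    // Each helper reports whether the whole input matches one variant.
    static boolean plain(String input)     { return PLAIN.matcher(input).matches(); }
    static boolean constants(String input) { return WITH_CONSTANTS.matcher(input).matches(); }
    static boolean embedded(String input)  { return WITH_EMBEDDED.matcher(input).matches(); }

    public static void main(String[] args) {
        String input = "x\ny"; // contains a line terminator
        System.out.println("no flags:       " + plain(input));
        System.out.println("flag constants: " + constants(input));
        System.out.println("embedded flags: " + embedded(input));
    }
}
```

Without DOTALL the ".*" stops at the newline, so the unflagged pattern cannot cover the whole input, while both flagged variants can.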
 *
 * All the flags:
 *   Constant                    Equivalent embedded flag expression   Effect
 *   Pattern.CANON_EQ            None                                  Enables canonical equivalence
 *   Pattern.CASE_INSENSITIVE    (?i)                                  Enables case-insensitive matching
 *   Pattern.COMMENTS            (?x)                                  Permits whitespace and comments in pattern
 *   Pattern.MULTILINE           (?m)                                  Enables multiline mode
 *   Pattern.DOTALL              (?s)                                  Enables dotall mode
 *   Pattern.UNICODE_CASE        (?u)                                  Enables Unicode-aware case folding
 *   Pattern.UNIX_LINES          (?d)                                  Enables Unix lines mode
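 *
 * CANON_EQ enables canonical equivalence: two characters are considered to match if, and only
 * if, their full canonical decompositions match. For example, with this flag the expression
 * "a\u030A" (the letter a followed by a combining ring) matches the string "\u00E5" (the
 * precomposed a-with-ring character).
 *
 * CASE_INSENSITIVE enables case-insensitive matching. This one is easy to understand, but note
 * that by default it applies only to ASCII characters. To ignore case when comparing Unicode
 * characters as well, specify UNICODE_CASE at the same time, that is, use
 * CASE_INSENSITIVE | UNICODE_CASE or (?i)(?u).
 *
 * COMMENTS permits whitespace and comments in the pattern, so ".*a" is equivalent to
 * ". *a #this is comments". I think this is mostly useful when the expression is very large
 * and is read from a file; I do not expect to need it in everyday use.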
 *
 * MULTILINE: In multiline mode the expressions ^ and $ match just after or just before,
 * respectively, a line terminator or the end of the input sequence. By default these
 * expressions only match at the beginning and the end of the entire input sequence.
 * In other words, by default ^ and $ match only the start and the end of the whole input.
 * In multiline mode they additionally match just after and just before a line terminator
 * (mind the order: ^ matches right after a line terminator, $ matches right before one).
 *
 * DOTALL: when this mode is specified, "." matches any character, including line terminators.
 *
 * UNIX_LINES: when this mode is specified, only \n is treated as a line terminator; \r and
 * \r\n are not.
 *
 * That is all I can recall for now; the rest is covered in the examples below.
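
A small sketch of two of these flags in action, MULTILINE and the CASE_INSENSITIVE / UNICODE_CASE pair. FlagBehaviorDemo and findAll are names invented for this example, and the sample inputs are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagBehaviorDemo {
    // Returns every match of the regex in the input, in order.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "hello\nworld";
        // Default mode: ^ and $ anchor to the whole input, so no single line matches.
        System.out.println(findAll("^\\w+$", 0, text));
        // MULTILINE: ^ and $ also anchor around the line terminator.
        System.out.println(findAll("^\\w+$", Pattern.MULTILINE, text));

        // U+00E9 is a lowercase e-acute, U+00C9 the uppercase form (outside ASCII).
        int ci = Pattern.CASE_INSENSITIVE;
        System.out.println(findAll("\u00E9", ci, "\u00C9").size());
        System.out.println(findAll("\u00E9", ci | Pattern.UNICODE_CASE, "\u00C9").size());
    }
}
```

The accented letters are written as Unicode escapes so the example does not depend on the source file encoding.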
 * </p>
 */
public class TestReg2
{
    public static void main(String[] args)
    {
        // Note: regex escapes such as \s, \w, \d and \b must be written "\\s", "\\w", "\\d" and
        // "\\b" in a Java string literal. "\w" and "\d" are not legal string escapes and will
        // not compile, and "\b" would be read as a backspace character.
        // \s matches whitespace: space, \t, \n and \r
        System.out.println("\\s matches space, \\t, \\n and \\r: " + " \t\n\r".matches("\\s{4}"));
        // \S is the complement of \s
        System.out.println("\\S is the complement of \\s: " + "/".matches("\\S"));
        // . does not match \r or \n
        System.out.println(". does not match \\r or \\n: " + "\r".matches("."));
        System.out.println("\n".matches("."));
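
        // \w matches letters, digits and the underscore
        System.out.println("\\w matches letters, digits and underscore: " + "a8_".matches("\\w\\w\\w"));
        // \W is the complement of \w
        System.out.println("\\W is the complement of \\w: " + "&_".matches("\\W\\w"));
        // \d matches a digit
        System.out.println("\\d matches a digit: " + "8".matches("\\d"));
        // \D is the complement of \d
        System.out.println("\\D is the complement of \\d: " + "%".matches("\\D"));
        // Both forms match, but they mean different things: "\n" puts a literal newline
        // character into the pattern, while "\\n" hands the regex escape \n to the engine.
        System.out.println("======================");
        System.out.println("literal newline in the pattern matches a newline: " + "\n".matches("\n"));
        System.out.println("regex escape \\n matches a newline: " + "\n".matches("\\n"));
        System.out.println("======================");
        // Again both forms match but mean different things, this time for carriage return
        System.out.println("\r".matches("\r"));
        System.out.println("\r".matches("\\r"));
        System.out.println("======================");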
        // ^ matches the beginning
        System.out.println("^ matches the beginning: " + "hell".matches("^hell"));
        System.out.println("abc\nhell".matches("^hell"));
        // $ matches the end
        System.out.println("$ matches the end: " + "my car\nabc".matches(".*ar$"));
        System.out.println("my car".matches(".*ar$"));
        // \b matches a word boundary
        System.out.println("\\b matches a word boundary: " + "bomb".matches("\\bbom."));
        System.out.println("bomb".matches(".*mb\\b"));
        // \B is the complement of \b
        System.out.println("\\B is the complement of \\b: " + "abc".matches("\\Babc"));

        // [a-z] matches the lowercase letters a to z
        System.out.println("[a-z] matches the lowercase letters a to z: " + "s".matches("[a-z]"));
        System.out.println("S".matches("[A-Z]"));
        System.out.println("9".matches("[0-9]"));

        // Negation
        System.out.println("negation: " + "s".matches("[^a-z]"));
        System.out.println("S".matches("[^A-Z]"));
        System.out.println("9".matches("[^0-9]"));

        // What parentheses do (grouping)
        System.out.println("what parentheses do: " + "aB9".matches("[a-z][A-Z][0-9]"));
        System.out.println("aB9bC6".matches("([a-z][A-Z][0-9])+"));
        // Alternation (or)
        System.out.println("alternation: " + "two".matches("two|to|2"));
        System.out.println("to".matches("two|to|2"));
        System.out.println("2".matches("two|to|2"));

        // [a-zA-Z] == [a-z]|[A-Z]
        System.out.println("[a-zA-Z]==[a-z]|[A-Z]: " + "a".matches("[a-zA-Z]"));
        System.out.println("A".matches("[a-zA-Z]"));
        System.out.println("a".matches("[a-z]|[A-Z]"));
        System.out.println("A".matches("[a-z]|[A-Z]"));

        // Compare the following four
        System.out.println("Compare the following four\n==========================");
        System.out.println(")".matches("[a-zA-Z)]"));
        System.out.println(")".matches("[a-zA-Z)_-]"));
        System.out.println("_".matches("[a-zA-Z)_-]"));
        System.out.println("-".matches("[a-zA-Z)_-]"));
        System.out.println("==========================");
        System.out.println("b".matches("[abc]"));
        // [a-d[f-h]] == [a-df-h]
        System.out.println("[a-d[f-h]]==[a-df-h]: " + "h".matches("[a-d[f-h]]"));
        System.out.println("a".matches("[a-z&&[def]]"));
        // Intersection
        System.out.println("intersection: " + "a".matches("[a-z&&[def]]"));
        System.out.println("b".matches("[[a-z]&&[e]]"));
        // Union
        System.out.println("union: " + "9".matches("[[a-c][0-9]]"));
        // [a-z&&[^bc]] == [ad-z]
        System.out.println("[a-z&&[^bc]]==[ad-z]: " + "b".matches("[a-z&&[^bc]]"));
        System.out.println("d".matches("[a-z&&[^bc]]"));
        // [a-z&&[^m-p]] == [a-lq-z]
        System.out.println("[a-z&&[^m-p]]==[a-lq-z]: " + "d".matches("[a-z&&[^m-p]]"));
        System.out.println("a".matches("\\p{Lower}"));
        // Study the use of \b below (in a string literal a bare \b means backspace, so write \\b)
        System.out.println("*********************************");
        System.out.println("aawordaa".matches(".*\\bword\\b.*"));
        System.out.println("a word a".matches(".*\\bword\\b.*"));
        System.out.println("aawordaa".matches(".*\\Bword\\B.*"));
        System.out.println("a word a".matches(".*\\Bword\\B.*"));
        System.out.println("a word a".matches(".*word.*"));
        System.out.println("aawordaa".matches(".*word.*"));
        // Study how groups work.
        // Groups are numbered by counting "(" only: the first "(" opens group 1, the second
        // opens group 2, and so on.
        // Group 0 stands for the entire expression.
        System.out.println("**************test group**************");
        Pattern p = Pattern.compile("(([abc]+)([123]+))([-_%]+)");
        Matcher m = p.matcher("aac212-%%");
        System.out.println(m.matches());
        m = p.matcher("cccc2223%_%_-");
        System.out.println(m.matches());
        System.out.println("======test group======");
        System.out.println(m.group());
        System.out.println(m.group(0));
        System.out.println(m.group(1));
        System.out.println(m.group(2));
        System.out.println(m.group(3));
        System.out.println(m.group(4));
        System.out.println(m.groupCount());
        System.out.println("========test end()=========");
        System.out.println(m.end());
        System.out.println(m.end(2));
        System.out.println("==========test start()==========");
        System.out.println(m.start());
        System.out.println(m.start(2));
        // Test back-references
        Pattern pp1 = Pattern.compile("(\\d)\\1"); // requires the same digit twice in a row
        // \1 refers to group 1 and \n to group n. Write "\\1", not "\1": in a Java string
        // literal "\1" is an octal escape for the character with code 1.
        Matcher mm1 = pp1.matcher("3345"); // 33 matches, 45 does not
        System.out.println("test back-references");
        System.out.println(mm1.find());
        System.out.println(mm1.find());

        // Note the differences below
        System.out.println("==============test find()=========");
        System.out.println(m.find());
        System.out.println(m.find(2));

        System.out.println("group results for the search starting at the third character (index=2)");
        System.out.println(m.group());
        System.out.println(m.group(0));
        System.out.println(m.group(1));
        System.out.println(m.group(2));
        System.out.println(m.group(3));
        m.reset();
        System.out.println(m.find());
        // A single pattern can match several times within one string
        System.out.println("a single pattern can match several times within one string");
        Pattern p1 = Pattern.compile("a{2}");
        Matcher m1 = p1.matcher("aaaaaa");
        // This shows that Matcher.matches() tests the pattern against the entire string
        System.out.println(m1.matches());
        System.out.println(m1.find());
        System.out.println(m1.find());
        System.out.println(m1.find());
        System.out.println(m1.find());
        // Test matches() again
        System.out.println("test matches() again");
        Pattern p2 = Pattern.compile("(a{2})*");
        Matcher m2 = p2.matcher("aaaa");
        System.out.println(m2.matches());
        System.out.println(m2.matches());
        System.out.println(m2.matches());
        // So find() looks for the pattern somewhere in the string, while matches() requires
        // the whole string to match.
        // test lookingAt()
        System.out.println("test lookingAt()");
        Pattern p3 = Pattern.compile("a{2}");
        Matcher m3 = p3.matcher("aaaa");
        System.out.println(p3.flags());
        System.out.println(m3.lookingAt());
        System.out.println(m3.lookingAt());
        System.out.println(m3.lookingAt());
        // To summarize: matches() must match the whole input and always starts at the
        // beginning; find() matches part of the input and resumes where the previous match
        // ended; lookingAt() also starts at the beginning but only needs to match a prefix.
        System.out.println("======test blank line========");
        System.out.println(" \n".matches("^[ \\t]*$\\n"));

        // Demonstrate appendReplacement() and appendTail()
        System.out.println("=================test append====================");
        Pattern p4 = Pattern.compile("cat");
        Matcher m4 = p4.matcher("one cat two cats in the yard");
        StringBuffer sb = new StringBuffer();
        boolean result = m4.find();
        int i = 0;
        System.out.println("one cat two cats in the yard");
        while (result)
        {
            m4.appendReplacement(sb, "dog");
            System.out.println(m4.group());
            System.out.println("pass " + (i++) + ": " + sb.toString());
            result = m4.find();
        }
        System.out.println(sb.toString());
        m4.appendTail(sb);
        System.out.println(sb.toString());

        // test UNIX_LINES, set with the flag constant
        System.out.println("test UNIX_LINES (flag constant)");
        Pattern p5 = Pattern.compile(".", Pattern.UNIX_LINES);
        Matcher m5 = p5.matcher("\n\r");
        System.out.println(m5.find());
        System.out.println(m5.find());

        // test UNIX_LINES, set with the embedded flag (?d)
        System.out.println("test UNIX_LINES (embedded flag)");
        Pattern p6 = Pattern.compile("(?d).");
        Matcher m6 = p6.matcher("\n\r");
        System.out.println(m6.find());
        System.out.println(m6.find());

        // the same pattern without UNIX_LINES, for comparison
        System.out.println("test UNIX_LINES (flag not set)");
        Pattern p7 = Pattern.compile(".");
        Matcher m7 = p7.matcher("\n\r");
        System.out.println(m7.find());
        System.out.println(m7.find());

        // test CASE_INSENSITIVE, set with the flag constant
        System.out.println("test CASE_INSENSITIVE (flag constant)");
        Pattern p8 = Pattern.compile("a", Pattern.CASE_INSENSITIVE);
        Matcher m8 = p8.matcher("aA");
        System.out.println(m8.find());
        System.out.println(m8.find());
        // set with the embedded flag (?i)
        System.out.println("test CASE_INSENSITIVE (embedded flag)");
        Pattern p9 = Pattern.compile("(?i)a");
        Matcher m9 = p9.matcher("aA");
        System.out.println(m9.find());
        System.out.println(m9.find());
        // the same pattern without the flag, for comparison
        System.out.println("test CASE_INSENSITIVE (flag not set)");
        Pattern p10 = Pattern.compile("a");
        Matcher m10 = p10.matcher("aA");
        System.out.println(m10.find());
        System.out.println(m10.find());

        // test COMMENTS, set with the flag constant
        System.out.println("test COMMENTS (flag constant)");
        Pattern p11 = Pattern.compile(" a a #ccc", Pattern.COMMENTS);
        Matcher m11 = p11.matcher("aa a a #ccc");
        System.out.println(m11.find());
        System.out.println(m11.find());
        // set with the embedded flag (?x)
        System.out.println("test COMMENTS (embedded flag)");
        Pattern p12 = Pattern.compile("(?x) a a #ccc");
        Matcher m12 = p12.matcher("aa a a #ccc");
        System.out.println(m12.find());
        System.out.println(m12.find());

        // test MULTILINE: experiment with this yourself, keeping in mind the explanation of
        // multiline mode above.
        // The start()/end() calls after each second find() are left disabled because they
        // throw IllegalStateException when the preceding find() returned false.
        System.out.println("test MULTILINE");
        Pattern p13 = Pattern.compile("^.?", Pattern.MULTILINE | Pattern.DOTALL);
        Matcher m13 = p13.matcher("helloohelloo,loveroo");
        System.out.println(m13.find());
        System.out.println("start:" + m13.start() + " end:" + m13.end());
        System.out.println(m13.find());
        //System.out.println("start:" + m13.start() + " end:" + m13.end());
        System.out.println("test MULTILINE");
        Pattern p14 = Pattern.compile("(?m)^hell.*oo$", Pattern.DOTALL);
        Matcher m14 = p14.matcher("hello,Worldoo\nhello,loveroo");
        System.out.println(m14.find());
        System.out.println("start:" + m14.start() + " end:" + m14.end());
        System.out.println(m14.find());
        //System.out.println("start:" + m14.start() + " end:" + m14.end());
        System.out.println("test MULTILINE");
        Pattern p15 = Pattern.compile("^hell(.|[^.])*oo$");
        Matcher m15 = p15.matcher("hello,Worldoo\nhello,loveroo");
        System.out.println(m15.find());
        System.out.println("start:" + m15.start() + " end:" + m15.end());
        System.out.println(m15.find());
        //System.out.println("start:" + m15.start() + " end:" + m15.end());

        // test DOTALL, set with the flag constant
        System.out.println("test DOTALL (flag constant)");
        Pattern p16 = Pattern.compile(".", Pattern.DOTALL);
        Matcher m16 = p16.matcher("\n\r");
        System.out.println(m16.find());
        System.out.println(m16.find());

        // the same pattern without DOTALL, for comparison
        System.out.println("test DOTALL (flag not set)");
        Pattern p17 = Pattern.compile(".");
        Matcher m17 = p17.matcher("\n\r");
        System.out.println(m17.find());
        System.out.println(m17.find());

        // set with the embedded flag (?s)
        System.out.println("test DOTALL (embedded flag)");
        Pattern p18 = Pattern.compile("(?s).");
        Matcher m18 = p18.matcher("\n\r");
        System.out.println(m18.find());
        System.out.println(m18.find());

        // test CANON_EQ: this is the example from the JDK documentation. The letter a followed
        // by U+030A (combining ring above) is canonically equivalent to the precomposed
        // character U+00E5, so under CANON_EQ the pattern is documented to match it.
        System.out.println("test CANON_EQ");
        Pattern p19 = Pattern.compile("a\u030A", Pattern.CANON_EQ);
        System.out.println(Character.getType('\u030A'));
        System.out.println("is" + Character.isISOControl('\u030A'));
        System.out.println("is" + Character.isUnicodeIdentifierPart('\u030A'));
        System.out.println(Character.getType('\u00E5'));
        System.out.println("is" + Character.isISOControl('\u00E5'));
        Matcher m19 = p19.matcher("\u00E5");
        System.out.println(m19.matches());
        System.out.println(Character.getType('\u0085'));
        System.out.println("is" + Character.isISOControl('\u0085'));
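
        // The next three examples show how greedy, reluctant and possessive quantifiers differ
        Pattern ppp = Pattern.compile(".*foo");
        Matcher mmm = ppp.matcher("xfooxxxxxxfoo");
        /**
         * Greedy quantifiers
         *   X?      X, once or not at all
         *   X*      X, zero or more times
         *   X+      X, one or more times
         *   X{n}    X, exactly n times
         *   X{n,}   X, at least n times
         *   X{n,m}  X, at least n but not more than m times
         * Greedy quantifiers are the most commonly used kind. They first match as many
         * characters as possible; when that makes the expression as a whole fail, they give
         * back one character and try again. Take .*foo against xfooxxxxxxfoo: .* first
         * consumes the entire input, and the overall match fails. It gives back the last
         * character "o" and tries again; still no match, so it keeps giving back characters
         * until the final "foo" has been released, at which point the match succeeds and the
         * search ends. Because the process always starts from the largest candidate, .* always
         * ends up with the largest possible match. In this it resembles a capitalist, hence
         * the name greedy.
         */
        boolean isEnd = false;
        int k = 0;
        System.out.println("==========");
        System.out.println("xfooxxxxxxfoo");
        // Keep calling find(); once it returns false, end() throws IllegalStateException,
        // which ends the loop.
        while (isEnd == false)
        {
            try {
                System.out.println("the:" + k++);
                System.out.println(mmm.find());
                System.out.println(mmm.end());
            } catch (IllegalStateException e) {
                isEnd = true;
            }
        }
        isEnd = false;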
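        Pattern ppp1 = Pattern.compile(".*?foo");
        Matcher mmm1 = ppp1.matcher("xfooxxxxxxfoo");
        /**
         * Reluctant quantifiers
         *   X??      X, once or not at all
         *   X*?      X, zero or more times
         *   X+?      X, one or more times
         *   X{n}?    X, exactly n times
         *   X{n,}?   X, at least n times
         *   X{n,m}?  X, at least n but not more than m times
         * Reluctant quantifiers work the other way round: they always start from the smallest
         * possible match and, if that makes the overall match fail, consume one more character
         * and try again. Take .*?foo against xfooxxxxxxfoo: first .*? matches the empty
         * string and the overall match fails, so it consumes one x, and now the overall match
         * succeeds. When find() is called again it resumes where the previous match ended: it
         * tries the empty string, which fails, consumes one x, which fails, and so on until
         * it has consumed all the x's in the middle, and only then does the match succeed.
         * Because it always starts from the smallest match it finds the largest number of
         * matches, each of them as short as possible. It behaves a bit like a hired hand who
         * does as little work as possible, hence the name reluctant.
         */
        k = 0;
        System.out.println("?????????????????????");
        System.out.println("xfooxxxxxxfoo");
        // Same loop as before: it stops when end() throws after a failed find().
        while (isEnd == false)
        {
            try {
                System.out.println("the:" + k++);
                System.out.println(mmm1.find());
                System.out.println(mmm1.end());
            } catch (IllegalStateException e) {
                isEnd = true;
            }
        }
        isEnd = false;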
        Pattern pp2 = Pattern.compile(".*+foo");
        Matcher mm2 = pp2.matcher("xfooxxxxxxfoo");
        /**
         * Possessive quantifiers
         *   X?+      X, once or not at all
         *   X*+      X, zero or more times
         *   X++      X, one or more times
         *   X{n}+    X, exactly n times
         *   X{n,}+   X, at least n times
         *   X{n,m}+  X, at least n but not more than m times
         * Possessive quantifiers resemble greedy ones, except that they are not as clever:
         * having swallowed every character they can, if the overall match then fails they
         * conclude that the whole string does not match and never try giving anything back.
         * They behave like a big landlord, greedy but foolish, hence the name possessive.
         */

        int ii = 0;
        System.out.println("+++++++++++++++++++++++++++");
        System.out.println("xfooxxxxxxfoo");
        // Same loop as before: it stops when end() throws after a failed find().
        while (isEnd == false)
        {
            try {
                System.out.println("the:" + ii++);
                System.out.println(mm2.find());
                System.out.println(mm2.end());
            } catch (IllegalStateException e) {
                isEnd = true;
            }
        }

    }

}
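
The three quantifier loops above can be condensed into one comparison. The following is only a sketch: QuantifierDemo and findAll are names chosen for this illustration, and it collects the matched text instead of printing end offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Returns every substring the regex finds in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off until "foo" fits.
        System.out.println("greedy     .*foo  -> " + findAll(".*foo", input));
        // Reluctant: .*? grows one character at a time, so each match is as short as possible.
        System.out.println("reluctant  .*?foo -> " + findAll(".*?foo", input));
        // Possessive: .*+ takes everything and never gives anything back.
        System.out.println("possessive .*+foo -> " + findAll(".*+foo", input));
    }
}
```

Looping on find() directly avoids relying on the exception thrown by end() to stop the loop.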

Posted on 2007-07-10 15:11 by 風舞者. Category: J2SE

Comments

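# re: Super-Powerful Regular Expressions (repost)

中東

The content is good, but the organization is not clear enough; it leaves readers rather lost, unable to find the thread from beginning to end.

Posted @ 2007-07-11 14:22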

# re: Super-Powerful Regular Expressions (repost)

風舞者

package com.datamininfo;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MobileNumber {

    public static void main(String[] args) {
        MobileNumber mn = new MobileNumber();
        mn.getNumber(null);
    }

    /**
     * Extracts mobile phone numbers.
     *
     * @param s the text to search; may be null
     * @return the numbers found, without any 086/86/0 prefix; empty if s is null
     */
    public List<String> getNumber(String s) {
        List<String> list = new ArrayList<String>();
        if (s == null) {
            return list;
        }
        // Optional non-digit, optional 086/86/0 prefix, then 13 or 15 plus nine digits
        String pattern = "(\\D|)(086|86|0|)(13|15)(\\d{9})";
        Pattern pattern1 = Pattern.compile(pattern);
        Matcher matcher = pattern1.matcher(s);

        while (matcher.find()) {
            list.add(matcher.group(3) + matcher.group(4));
        }
        return list;
    }

    /**
     * Checks whether the string consists only of digits.
     *
     * @param mobile the string to check
     * @return true if every character is a digit
     */
    public boolean isNumber(String mobile) {
        for (int i = 0; i < mobile.length(); i++) {
            if (!Character.isDigit(mobile.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /**
     * Gets the name: an English name is returned unchanged; for a mixed
     * Chinese and English name, the Chinese part is returned.
     *
     * @param name the name to process; may be null
     * @return the processed name, or "" if name is null
     */
    public String modifyName(String name) {
        if (name == null)
            return "";
        // Leading non-word characters (\w covers only ASCII letters, digits and underscore
        // by default), optionally followed by word characters or a space
        String pattern = "^([^\\w]+)(\\w+| |)";
        Pattern pattern1 = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern1.matcher(name);
        if (matcher.find()) {
            name = matcher.group(1);
        }
        return name;
    }

    /**
     * Deletes letters, digits, underscores and whitespace (\r, \n, etc.).
     *
     * @param job the text to clean; may be null
     * @return the text with those characters removed, or "" if job is null
     */
    public String delWord(String job) {
        if (job == null)
            return "";
        String pattern = "\\w|\\s";
        Pattern pattern1 = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern1.matcher(job);
        StringBuffer s = new StringBuffer();
        while (matcher.find()) {
            matcher.appendReplacement(s, "");
        }
        matcher.appendTail(s);
        return s.toString();
    }

}
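
For reference, here is a minimal usage sketch of the number-extraction expression. MobileNumberUsage and extract are names invented for this example, the sample text is made up, and the regular expression is copied from getNumber above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MobileNumberUsage {
    // Same expression as in getNumber: optional non-digit, optional 086/86/0 prefix,
    // then 13 or 15 followed by nine digits.
    private static final Pattern MOBILE =
            Pattern.compile("(\\D|)(086|86|0|)(13|15)(\\d{9})");

    // Returns the numbers found in the text, with any prefix dropped.
    static List<String> extract(String text) {
        List<String> numbers = new ArrayList<>();
        if (text == null) {
            return numbers;
        }
        Matcher m = MOBILE.matcher(text);
        while (m.find()) {
            numbers.add(m.group(3) + m.group(4)); // groups 3 and 4 hold the number itself
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, not taken from the original post
        System.out.println(extract("call 13812345678 or 08615987654321"));
    }
}
```

Groups 1 and 2 only absorb the separator and the prefix, which is why the result is built from groups 3 and 4.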

Posted @ 2007-10-10 16:14
