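Chapter 2 of Lucene in Action gives a systematic explanation of indexing. Let's take a look.
1. The indexing process
First, the data to be indexed has to be converted to text, because Lucene can only index text. Analysis then filters that text, removing the stop words mentioned in ch1, and finally the index is built. The index that gets built is an inverted index.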
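To make the pipeline above concrete, here is a minimal sketch in plain Java of analysis followed by building an inverted index. It does not use Lucene and is not how Lucene stores its index internally; the class name InvertedIndexSketch, the tiny stop-word list, and the tokenization rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; Lucene's real index format is far more involved.
class InvertedIndexSketch {
    // A small, made-up stop-word list for this example.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("has", "of", "the", "a"));

    // "Analysis": lowercase the text, split on non-letters, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Inverted index: each term maps to the ids of the documents containing it.
    static Map<String, List<Integer>> build(String[] docs) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int id = 0; id < docs.length; id++) {
            for (String term : analyze(docs[id])) {
                List<Integer> postings =
                    index.computeIfAbsent(term, k -> new ArrayList<>());
                // Record each document id at most once per term.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != id) {
                    postings.add(id);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = {
            "Amsterdam has lots of bridges",
            "Venice has lots of canals"
        };
        // prints {amsterdam=[0], bridges=[0], canals=[1], lots=[0, 1], venice=[1]}
        System.out.println(build(docs));
    }
}
```

The map goes from term to documents rather than from document to terms, which is why it is called "inverted": a search for a word is a single lookup instead of a scan over every document.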
2. Basic index operations
The basic operations are: add, delete, and update.
I. Adding
Let's look at some example code, BaseIndexingTestCase:
package lia.indexing;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;

import junit.framework.TestCase;
import java.io.IOException;

/**
 * Base class for the indexing tests: builds a small two-document index
 * in setUp() that subclasses can then test against.
 */
public abstract class BaseIndexingTestCase extends TestCase {
  protected String[] keywords = {"1", "2"};
  protected String[] unindexed = {"Netherlands", "Italy"};
  protected String[] unstored = {"Amsterdam has lots of bridges",
                                 "Venice has lots of canals"};
  protected String[] text = {"Amsterdam", "Venice"};
  protected Directory dir;

  // setUp: create the index directory under the temp dir and fill it
  protected void setUp() throws IOException {
    String indexDir =
      System.getProperty("java.io.tmpdir", "tmp") +
      System.getProperty("file.separator") + "index-dir";
    dir = FSDirectory.getDirectory(indexDir, true);
    addDocuments(dir);
  }

  protected void addDocuments(Directory dir)
    throws IOException {
    // Get an IndexWriter instance; true means create a new index
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      true);
    writer.setUseCompoundFile(isCompound());
    for (int i = 0; i < keywords.length; i++) {
      Document doc = new Document();  // build and add one document
      doc.add(Field.Keyword("id", keywords[i]));
      doc.add(Field.UnIndexed("country", unindexed[i]));
      doc.add(Field.UnStored("contents", unstored[i]));
      doc.add(Field.Text("city", text[i]));
      writer.addDocument(doc);
    }
    writer.optimize();  // optimize the index
    writer.close();
  }

  // Subclasses can override this method to supply a different Analyzer
  protected Analyzer getAnalyzer() {
    return new SimpleAnalyzer();
  }

  // Subclasses can also override this method to choose whether the index
  // uses the compound file format
  protected boolean isCompound() {
    return true;
  }

  // Test adding documents
  public void testIndexWriter() throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      false);
    assertEquals(keywords.length, writer.docCount());
    writer.close();
  }

  // Test IndexReader
  public void testIndexReader() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(keywords.length, reader.maxDoc());
    assertEquals(keywords.length, reader.numDocs());
    reader.close();
  }
}
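This is a test base class that other test cases can extend to test different features. The code above is commented in detail.
When adding Fields you may run into synonyms. There are two ways to add synonyms: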
 a. Loop over the group of synonyms, append them to a single String, and create one Field from that String.
 b. Add each synonym to the same Field as the base word, as follows:
String baseWord = "fast";
String[] synonyms = new String[] {"quick", "rapid", "speedy"};
Document doc = new Document();
doc.add(Field.Text("word", baseWord));
// Add every synonym under the same field name as the base word
for (int i = 0; i < synonyms.length; i++) {
  doc.add(Field.Text("word", synonyms[i]));
}
This way, Lucene internally adds every one of these words to a single Field named word, and a search can use any one of the given words.
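For comparison, approach (a) can be sketched in plain Java as below. The class SynonymJoinSketch and the method joinWords are hypothetical names invented for this illustration, and separating the words with a space is an assumption that relies on the analyzer splitting tokens on whitespace.

```java
// Hypothetical helper for illustration; not part of Lucene.
class SynonymJoinSketch {
    // Concatenate the base word and its synonyms, separated by spaces,
    // so that an analyzer splitting on whitespace sees each one as a token.
    static String joinWords(String baseWord, String[] synonyms) {
        StringBuilder sb = new StringBuilder(baseWord);
        for (String s : synonyms) {
            sb.append(' ').append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String joined = joinWords("fast", new String[] {"quick", "rapid", "speedy"});
        System.out.println(joined); // prints "fast quick rapid speedy"
        // With the Lucene API used in this post, the joined text would then
        // go into one field: doc.add(Field.Text("word", joined));
    }
}
```

Approach (b) avoids this manual string building, since each value is simply added under the same field name.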