
Chapter 2 of Lucene in Action covers indexing systematically, so let's take a look.

1. The indexing process

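First, the data to be indexed has to be converted to text, because Lucene can only index text. Next, analysis breaks the text into tokens and filters them, for example removing the so-called stop words mentioned in ch1. Finally, the index is built. The index Lucene builds is an inverted index: instead of mapping each document to the words it contains, it maps each term to the documents that contain it.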
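To make the pipeline concrete, here is a minimal plain-Java sketch of the same three steps: text goes in, a toy analyzer tokenizes, lowercases and drops stop words, and the surviving terms are recorded in an inverted index (term to sorted document IDs). This is not Lucene's API; the class name, the letters-only tokenizer and the tiny stop-word list are assumptions made for illustration.

```java
import java.util.*;

// Hypothetical sketch, not Lucene's API:
// text -> analysis (tokenize, lowercase, drop stop words) -> inverted index.
class InvertedIndexSketch {
    // Tiny stop-word list, assumed for illustration only
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "has", "of", "the");

    // "Analysis": split on non-letters, lowercase each token, filter out stop words
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^A-Za-z]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Inverted index: each term maps to the sorted IDs of the documents containing it
    static Map<String, SortedSet<Integer>> buildIndex(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : analyze(docs.get(docId))) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Amsterdam has lots of bridges",
                                    "Venice has lots of canals");
        // prints {amsterdam=[0], bridges=[0], canals=[1], lots=[0, 1], venice=[1]}
        System.out.println(buildIndex(docs));
    }
}
```

Looking up a term is then a single map access, which is why an inverted index makes keyword search fast.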

2. Basic index operations

The basic operations are: adding, deleting and updating.
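The three operations can be modeled with a small in-memory structure. In this hypothetical sketch (class and method names are assumptions, not Lucene's API), deleting only marks a document, and updating is a delete followed by an add, which is also how Lucene treats an update. The maxDoc()/numDocs() pair mirrors the distinction checked by the IndexReader test below: one counts deleted documents, the other only live ones.

```java
import java.util.*;

// Hypothetical in-memory model of add / delete / update (not Lucene's API).
class IndexOpsSketch {
    private final List<Map<String, String>> docs = new ArrayList<>(); // doc ID = list position
    private final BitSet deleted = new BitSet();                       // deletion markers

    // Add: append a document (field name -> value) and return its ID
    int add(Map<String, String> doc) {
        docs.add(doc);
        return docs.size() - 1;
    }

    // Delete: mark every live document whose field equals value; return how many were marked
    int delete(String field, String value) {
        int count = 0;
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id) && value.equals(docs.get(id).get(field))) {
                deleted.set(id);
                count++;
            }
        }
        return count;
    }

    // Update: no in-place edit; delete the old version by key, then add the new one
    void update(String keyField, String keyValue, Map<String, String> newDoc) {
        delete(keyField, keyValue);
        add(newDoc);
    }

    int maxDoc()  { return docs.size(); }                          // includes deleted docs
    int numDocs() { return docs.size() - deleted.cardinality(); }  // live docs only

    public static void main(String[] args) {
        IndexOpsSketch index = new IndexOpsSketch();
        index.add(Map.of("id", "1", "city", "Amsterdam"));
        index.add(Map.of("id", "2", "city", "Venice"));
        index.update("id", "2", Map.of("id", "2", "city", "Venezia"));
        System.out.println(index.maxDoc() + " " + index.numDocs()); // prints "3 2"
    }
}
```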

I. Adding documents

Let's look at some example code, BaseIndexingTestCase.java:

package lia.indexing;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;

import junit.framework.TestCase;
import java.io.IOException;

/**
 * Abstract base class for indexing tests: setUp() builds a small two-document index.
 */
public abstract class BaseIndexingTestCase extends TestCase {
  protected String[] keywords = {"1", "2"};
  protected String[] unindexed = {"Netherlands", "Italy"};
  protected String[] unstored = {"Amsterdam has lots of bridges",
                                 "Venice has lots of canals"};
  protected String[] text = {"Amsterdam", "Venice"};
  protected Directory dir;

  // setUp method: create a fresh index directory under java.io.tmpdir and fill it
  protected void setUp() throws IOException {
    String indexDir =
      System.getProperty("java.io.tmpdir", "tmp") +
      System.getProperty("file.separator") + "index-dir";
    dir = FSDirectory.getDirectory(indexDir, true);  // true = create, erasing old contents
    addDocuments(dir);
  }

  protected void addDocuments(Directory dir)
    throws IOException {
    // Get an IndexWriter instance; true = create a new index
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      true);
    writer.setUseCompoundFile(isCompound());
    for (int i = 0; i < keywords.length; i++) {
      Document doc = new Document();                     // build a document
      doc.add(Field.Keyword("id", keywords[i]));         // indexed as one term, stored
      doc.add(Field.UnIndexed("country", unindexed[i])); // stored only, not searchable
      doc.add(Field.UnStored("contents", unstored[i]));  // tokenized and indexed, not stored
      doc.add(Field.Text("city", text[i]));              // tokenized, indexed and stored
      writer.addDocument(doc);                           // add it to the index
    }
    writer.optimize();  // optimize the index
    writer.close();
  }

  // Override this method to supply a different Analyzer
  protected Analyzer getAnalyzer() {
    return new SimpleAnalyzer();
  }

  // Override this method to choose whether the compound file format is used
  protected boolean isCompound() {
    return true;
  }

  // Test adding documents: the writer should report both documents
  public void testIndexWriter() throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      false);
    assertEquals(keywords.length, writer.docCount());
    writer.close();
  }

  // Test IndexReader: nothing was deleted, so maxDoc() and numDocs() both match
  public void testIndexReader() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(keywords.length, reader.maxDoc());
    assertEquals(keywords.length, reader.numDocs());
    reader.close();
  }
}

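This is an abstract test superclass that other test cases can extend to test different behavior, for example by overriding getAnalyzer() or isCompound(). The code above carries detailed comments.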
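The four Field factory methods used above (Lucene 1.4 era) differ only in three properties: whether the value is stored, indexed, and tokenized. The enum below is a hypothetical plain-Java model that just records those flags; it is not a Lucene class. Note that Field.Text is stored only in its (String, String) form, which is the form used in the listing.

```java
// Hypothetical model (not a Lucene class) of the flags behind the four
// Field factory methods used in BaseIndexingTestCase.
enum FieldKind {
    //        stored, indexed, tokenized
    KEYWORD  (true,  true,  false), // Field.Keyword: e.g. IDs, kept as a single term
    UNINDEXED(true,  false, false), // Field.UnIndexed: returned with hits, never matched
    UNSTORED (false, true,  true),  // Field.UnStored: searchable full text, not stored
    TEXT     (true,  true,  true);  // Field.Text(String, String): searchable and stored

    final boolean stored, indexed, tokenized;

    FieldKind(boolean stored, boolean indexed, boolean tokenized) {
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    // A field can match a query only if it was indexed
    boolean searchable() { return indexed; }

    // The original value can be read back from a hit only if it was stored
    boolean retrievable() { return stored; }
}
```

Read this way, a search for "Italy" cannot match the "country" field (UnIndexed), while the text of "contents" (UnStored) can be searched but not displayed from the index.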

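When adding Fields, you will sometimes need to index several values for the same field, synonyms being a typical case. There are two ways to add synonyms:

a. Loop over the array of synonyms, append them all to a single String, and create one Field from that String.

b. Add the base word and each synonym to the same field, by adding a Field with the same name ("word") once per word, as follows: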

String baseWord = "fast";
String[] synonyms = {"quick", "rapid", "speedy"};
Document doc = new Document();
doc.add(Field.Text("word", baseWord));
// Adding more Fields with the same name appends their values to "word"
for (int i = 0; i < synonyms.length; i++) {
  doc.add(Field.Text("word", synonyms[i]));
}

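Internally, Lucene appends all of these words together and indexes them in a single Field named word, so when searching you can use any one of the given words.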
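The effect of repeating a field name can be illustrated without Lucene. In this hypothetical sketch (class and method names are assumptions), a document maps each field name to the terms indexed under it; adding the same field again appends terms, so approaches a and b yield the same terms for "word", and a query for any synonym matches. The sketch splits on whitespace and ignores term positions, which real Lucene does track.

```java
import java.util.*;

// Hypothetical sketch (not Lucene's API): field name -> terms indexed under it.
class MultiValueFieldSketch {
    private final Map<String, List<String>> fieldTerms = new HashMap<>();

    // Like calling doc.add(...) again with the same field name: the terms accumulate
    void add(String field, String value) {
        for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                fieldTerms.computeIfAbsent(field, f -> new ArrayList<>()).add(token);
            }
        }
    }

    // A single-term query on a field matches if that term was indexed under the field
    boolean matches(String field, String term) {
        return fieldTerms.getOrDefault(field, List.of()).contains(term);
    }

    List<String> terms(String field) {
        return fieldTerms.getOrDefault(field, List.of());
    }

    public static void main(String[] args) {
        String baseWord = "fast";
        String[] synonyms = {"quick", "rapid", "speedy"};

        // Approach b: add the "word" field once per word
        MultiValueFieldSketch doc = new MultiValueFieldSketch();
        doc.add("word", baseWord);
        for (String synonym : synonyms) {
            doc.add("word", synonym);
        }

        // Approach a: concatenate everything into one String, add a single field
        MultiValueFieldSketch concatenated = new MultiValueFieldSketch();
        concatenated.add("word", baseWord + " " + String.join(" ", synonyms));

        System.out.println(doc.terms("word"));                                    // [fast, quick, rapid, speedy]
        System.out.println(doc.terms("word").equals(concatenated.terms("word"))); // true
        System.out.println(doc.matches("word", "rapid"));                         // true
    }
}
```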