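1. Basic introduction
paoding: "Paoding Analysis" (庖丁解牛), a Chinese word segmenter for Lucene
imdict: the intelligent Chinese word segmentation program used by the imdict intelligent dictionary
mmseg4j: a Chinese word segmenter implementing Chih-Hao Tsai's MMSeg algorithm
ik: uses its own "forward-iterating finest-granularity segmentation algorithm" and a multiple sub-processor analysis mode
2. Developers and development activity
paoding: qieqie.wang; last code commit on Google Code: 2008-06-12, svn revision 132
imdict: XiaoPingGao; now part of Lucene contrib; last commit to contrib/analyzers/smartcn/ in Lucene trunk: 2009-07-24
mmseg4j: chenlb2008; Google Code, 2009-08-03 (yesterday, at the time of writing), revision 57, log message: "mmseg4j-1.7 创建分支" (create the mmseg4j-1.7 branch)
ik: linliangyi2005; Google Code, 2009-07-31, revision 41
3. User-defined dictionaries
paoding: supports any number of user dictionaries in plain text, one word per line. A background thread detects dictionary updates, compiles the updated dictionaries to a binary form and loads them.
imdict: no user dictionaries for now (the original ICTCLAS supports them). User-defined stop words are supported.
mmseg4j: ships with the Sogou dictionary and supports user dictionaries named wordsxxx.dic in UTF-8 plain text, one word per line. No automatic update detection. The dictionary location is set with -Dmmseg.dic.path.
ik: supports loading user words through the API and specifying dictionary files in the configuration; files must be UTF-8 without a BOM, with entries separated by \r\n. No automatic update detection.
4. Speed (from the official descriptions, not my own tests)
paoding: on a personal PIII machine with 1 GB of RAM, accurately segments 1,000,000 Chinese characters per second
imdict: 483.64 (bytes/second), 259517 (Chinese characters/second)
mmseg4j: about 1200 KB/s in complex mode, about 1900 KB/s in simple mode
ik: processes 500,000 characters per second
5. Algorithm and code complexity
paoding: the svn src directory totals 1.3 MB: 6 properties files, 48 Java files, 6895 lines. Different Knife implementations cut different types of character streams; not very complex.
imdict: a 6.7 MB dictionary (required); the src directory is 152 KB, 40 Java files, 2399 lines. It uses the ICTCLAS HHMM (hidden Markov model): "trained on a large corpus to collect the frequencies and transition probabilities of Chinese words, it then uses these statistics to compute the maximum-likelihood segmentation of a whole Chinese sentence."
mmseg4j: the svn src directory totals 132 KB, 23 Java files, 2089 lines. The MMSeg algorithm is somewhat complex.
ik: the svn src directory totals 6.6 MB (dictionary files included), 22 Java files, 4217 lines. Multiple sub-processors analyze the text, similar to paoding; I have not yet figured out its ambiguity-resolution algorithm.
6. Documentation
paoding: almost none. The code has some comments, but because the implementation is fairly complex, it is still somewhat hard to read.
imdict: almost none. ICTCLAS has no detailed documentation either, and the HHMM is heavily mathematical and not easy to understand.
mmseg4j: the MMSeg algorithm paper is in English, but the principle is fairly simple and the implementation is clear.
ik: there is a PDF user manual with usage examples and configuration instructions.
7. Other
paoding: introduces metaphors (such as the Knife) and has a reasonable design. Our search 1.0 used it. Its main strength is native support for detecting dictionary updates; its main weakness is that the author no longer updates or even maintains it.
imdict: included in Lucene trunk. The original ICTCLAS performs well in various evaluations and has a solid theoretical foundation; it is not a one-person knockoff. Its drawback is that user dictionaries are not supported yet.
mmseg4j: implements max-word segmentation on top of complex mode, but it is still immature and has many areas that need improvement.
ik: includes IKQueryParser, a query analyzer optimized for Lucene full-text search.
8. Conclusion
Personally, I would choose between mmseg4j and paoding. For a comparison of the two analyzers' segmentation results, see:
http://blog.chenlb.com/2009/04/mmseg4j-max-word-segment-compare-with-paoding-in-effect.html
Alternatively, wrap them yourself: implement paoding's dictionary update detection as a separate module, and you can then switch seamlessly between all dictionary-based segmentation algorithms.
ps: using different analyzers for different fields is worth considering. A tag field, for example, should use the simplest analyzer, one that just splits on whitespace.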
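The comparison mentions MMSeg's simple mode and user dictionaries stored as plain text, one word per line. As a rough illustration of the dictionary-based approach, the sketch below builds a word set from such lines and segments text by forward maximum matching, the rule underlying MMSeg's simple mode (complex mode adds further disambiguation rules that are not shown). `ForwardMaxMatch` is a hypothetical name; this is not mmseg4j's code or API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical forward-maximum-matching sketch; not mmseg4j's implementation. */
class ForwardMaxMatch {
    private final Set<String> dict = new HashSet<>();
    private int maxLen = 1;

    /** Builds the dictionary from lines in the "one word per line" format. */
    ForwardMaxMatch(List<String> lines) {
        for (String line : lines) {
            String word = line.trim();
            if (!word.isEmpty()) {
                dict.add(word);
                maxLen = Math.max(maxLen, word.length());
            }
        }
    }

    /** At each position takes the longest dictionary word; falls back to one character. */
    List<String> segment(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            // shrink the window until it is a dictionary word or a single character
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        ForwardMaxMatch fmm = new ForwardMaxMatch(Arrays.asList("中文", "分词", "中文分词", "研究"));
        System.out.println(fmm.segment("中文分词研究"));  // prints [中文分词, 研究]
    }
}
```

Greedy longest-match is fast but cannot recover from a wrong early choice, which is why MMSeg's complex mode looks ahead over several words before deciding.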
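The ps suggests a different analyzer per field. In Lucene 2.x this is what `PerFieldAnalyzerWrapper` provides (a default analyzer plus `addAnalyzer(fieldName, analyzer)`). To keep the idea independent of any Lucene version, the sketch below models it in plain Java: field names map to tokenizer functions, a `tag` field gets a whitespace tokenizer, and other fields fall back to a default. `PerFieldTokenizer` and its one-token-per-character default are hypothetical stand-ins, not a real Chinese analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-field tokenizer dispatch; models the idea, not Lucene's classes. */
class PerFieldTokenizer {
    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();
    private final Function<String, List<String>> defaultTokenizer;

    PerFieldTokenizer(Function<String, List<String>> defaultTokenizer) {
        this.defaultTokenizer = defaultTokenizer;
    }

    void addField(String field, Function<String, List<String>> tokenizer) {
        byField.put(field, tokenizer);
    }

    /** Uses the tokenizer registered for the field, or the default one. */
    List<String> tokenize(String field, String text) {
        return byField.getOrDefault(field, defaultTokenizer).apply(text);
    }

    /** Simplest possible tokenizer: split on whitespace (suggested for tag fields). */
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Placeholder default: one token per non-whitespace character. */
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    public static void main(String[] args) {
        PerFieldTokenizer t = new PerFieldTokenizer(PerFieldTokenizer::perCharacter);
        t.addField("tag", PerFieldTokenizer::whitespace);
        System.out.println(t.tokenize("tag", "lucene java"));  // prints [lucene, java]
        System.out.println(t.tokenize("title", "全文检索"));    // prints [全, 文, 检, 索]
    }
}
```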
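The conclusion proposes pulling paoding's dictionary update detection out into a separate module. A minimal sketch of such a module, assuming a single UTF-8 dictionary file with one word per line: a background thread polls the file's modification time and swaps in a freshly loaded word set when it changes. `ReloadingDictionary` is a hypothetical name, and unlike paoding this sketch does not compile the dictionary into a binary form.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-alone dictionary-reload module (not paoding's code). */
class ReloadingDictionary {
    private final Path file;
    private volatile Set<String> words = Collections.emptySet();
    private long lastModified = Long.MIN_VALUE;

    ReloadingDictionary(Path file) throws IOException {
        this.file = file;
        reloadIfChanged();  // initial load
    }

    /** Reloads when the file's modification time changed; returns true if it reloaded. */
    synchronized boolean reloadIfChanged() throws IOException {
        // note: timestamp resolution is coarse (e.g. 1 s) on some file systems
        long stamp = Files.getLastModifiedTime(file).toMillis();
        if (stamp == lastModified) {
            return false;
        }
        Set<String> fresh = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim();
            if (!word.isEmpty()) {
                fresh.add(word);  // blank lines are skipped
            }
        }
        words = Collections.unmodifiableSet(fresh);  // readers see the old or new set, never a partial one
        lastModified = stamp;
        return true;
    }

    boolean contains(String word) {
        return words.contains(word);
    }

    /** Re-checks the file every periodSeconds on a daemon thread. */
    ScheduledExecutorService startWatching(long periodSeconds) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "dictionary-watcher");
            t.setDaemon(true);
            return t;
        });
        exec.scheduleWithFixedDelay(() -> {
            try {
                reloadIfChanged();
            } catch (IOException e) {
                e.printStackTrace();  // keep serving the previous word set
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return exec;
    }

    public static void main(String[] args) throws IOException {
        Path dic = Files.createTempFile("words", ".dic");
        Files.write(dic, Arrays.asList("庖丁", "解牛"), StandardCharsets.UTF_8);
        ReloadingDictionary dict = new ReloadingDictionary(dic);
        dict.startWatching(5);
        System.out.println(dict.contains("庖丁"));  // prints true
    }
}
```

Because any segmenter only calls `contains`, the same module could sit behind several dictionary-based algorithms, which is the seamless switching the conclusion has in mind.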
package com.rain.util;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import org.apache.lucene.demo.html.HTMLParser;
/**
 * Extracts the title and body text of one HTML file (read as UTF-8)
 * using the HTMLParser from the Lucene demo jar.
 */
public class HTMLDocParser {
private String htmlPath;
private HTMLParser htmlParser;
public HTMLDocParser(String htmlPath){
this.htmlPath=htmlPath;
initHtmlParser();
}
public void initHtmlParser(){
InputStream inputStream=null;
try{
inputStream=new FileInputStream(htmlPath);
}catch(FileNotFoundException e){
e.printStackTrace();
}
if(null!=inputStream){
try{
htmlParser=new HTMLParser(new InputStreamReader(inputStream,"utf-8"));
}catch(UnsupportedEncodingException e){
e.printStackTrace();
}
}
}
public String getTitle(){
if(null!=htmlParser){
try{
return htmlParser.getTitle();
}catch(IOException e){
e.printStackTrace();
}catch(InterruptedException e){
e.printStackTrace();
}
}
return "";
}
public Reader getContent(){
if(null!=htmlParser){
try{
return htmlParser.getReader();
}catch(IOException e){
e.printStackTrace();
}
}
return null;
}
public String getPath(){
return this.htmlPath;
}
}
Bean describing the structure of a search result
package com.rain.search;
public class SearchResultBean {
private String htmlPath;
private String htmlTitle;
public String getHtmlPath() {
return htmlPath;
}
public void setHtmlPath(String htmlPath) {
this.htmlPath = htmlPath;
}
public String getHtmlTitle() {
return htmlTitle;
}
public void setHtmlTitle(String htmlTitle) {
this.htmlTitle = htmlTitle;
}
}
Implementation of the indexing subsystem
package com.rain.index;
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.document.Field;
import com.rain.util.HTMLDocParser;
public class IndexManager {
//the directory that stores HTML files
private final String dataDir="E:\\dataDir";
//the directory that is used to store a Lucene index
private final String indexDir="E:\\indexDir";
public boolean creatIndex()throws IOException{
// nothing to do if an index has already been built
if(inIndexExist()){
return true;
}
File dir=new File(dataDir);
if(!dir.exists()){
return false;
}
File[] htmls=dir.listFiles();
// true: create a new index, erasing any existing one
Directory fsDirectory=FSDirectory.getDirectory(indexDir,true);
Analyzer analyzer=new StandardAnalyzer();
IndexWriter indexWriter=new IndexWriter(fsDirectory,analyzer,true);
for(int i=0;i<htmls.length;i++){
String htmlPath=htmls[i].getAbsolutePath();
// index only .html/.htm files (case-insensitive)
String lowerPath=htmlPath.toLowerCase();
if(lowerPath.endsWith(".html")||lowerPath.endsWith(".htm")){
addDocument(htmlPath,indexWriter);
}
}
// merge segments, then release the index
indexWriter.optimize();
indexWriter.close();
return true;
}
public void addDocument(String htmlPath,IndexWriter indexWriter){
HTMLDocParser htmlParser=new HTMLDocParser(htmlPath);
String path=htmlParser.getPath();
String title=htmlParser.getTitle();
Reader content=htmlParser.getContent();
Document document=new Document();
document.add(new Field("path",path,Field.Store.YES,Field.Index.NO));
document.add(new Field("title",title,Field.Store.YES,Field.Index.TOKENIZED));
document.add(new Field("content",content));
try{
indexWriter.addDocument(document);
}catch(IOException e){
e.printStackTrace();
}
}
public String getDataDir(){
return this.dataDir;
}
public String getIndexDir(){
return this.indexDir;
}
public boolean inIndexExist(){
File directory=new File(indexDir);
// listFiles() returns null when the directory does not exist
File[] files=directory.listFiles();
return null!=files&&files.length>0;
}
}
Implementation of the search function
package com.rain.search;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import com.rain.index.IndexManager;
public class SearchManager {
private String searchWord;
private IndexManager indexManager;
private Analyzer analyzer;
public SearchManager(String searchWord){
this.searchWord=searchWord;
this.indexManager=new IndexManager();
this.analyzer=new StandardAnalyzer();
}
/**
* do search
*/
public List search(){
List searchResult=new ArrayList();
// build the index on first use
if(!indexManager.inIndexExist()){
try{
if(!indexManager.creatIndex()){
return searchResult;
}
}catch(IOException e){
e.printStackTrace();
return searchResult;
}
}
IndexSearcher indexSearcher=null;
try{
indexSearcher=new IndexSearcher(indexManager.getIndexDir());
// parse the search words against "content" with the same analyzer used at index time
QueryParser queryParser=new QueryParser("content",analyzer);
Query query=queryParser.parse(searchWord);
Hits hits=indexSearcher.search(query);
for(int i=0;i<hits.length();i++){
SearchResultBean resultBean=new SearchResultBean();
resultBean.setHtmlPath(hits.doc(i).get("path"));
resultBean.setHtmlTitle(hits.doc(i).get("title"));
searchResult.add(resultBean);
}
}catch(IOException e){
e.printStackTrace();
}catch(ParseException e){
e.printStackTrace();
}finally{
// release the index files held open by the searcher (after the hits are read)
if(null!=indexSearcher){
try{
indexSearcher.close();
}catch(IOException e){
e.printStackTrace();
}
}
}
return searchResult;
}
}
Implementation of the request manager (servlet)
package com.rain.servlet;
import java.io.IOException;
import java.util.List;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.rain.search.SearchManager;
/**
* @author zhourui
* 2007-1-28
*/
public class SearchController extends HttpServlet {
private static final long serialVersionUID=1L;
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
// decode form parameters as UTF-8 so Chinese search words arrive intact (assumes a UTF-8 form page)
request.setCharacterEncoding("UTF-8");
String searchWord=request.getParameter("searchWord");
SearchManager searchManager=new SearchManager(searchWord);
List searchResult=searchManager.search();
// hand the results to search.jsp for rendering
request.setAttribute("searchResult",searchResult);
RequestDispatcher dispatcher=request.getRequestDispatcher("search.jsp");
dispatcher.forward(request, response);
}
}
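As systems hold more and more information, finding the one needle you want in this ocean of information becomes very important. Full-text search is the usual solution to this kind of problem, and Lucene is a tool for implementing full-text search: any application can embed it to provide full-text search.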
2. Environment setup
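Download the latest lucene.jar from lucene.apache.org and add it to the project's build path; Lucene can then be used directly in the project. Note that the examples in section 3 use the Lucene 1.4 API (for example Field.Keyword, Field.Text and the static QueryParser.parse), which was removed in Lucene 2.0, so they need a 1.4.x jar or porting to the newer constructors.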
3. Usage
3.1. Basic concepts
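This section introduces the concepts you will meet most often, using the familiar database as an analogy. Full-text search with Lucene follows roughly the same flow as a database query: table → query on the relevant fields or conditions → matching records returned.
First comes IndexWriter, which builds the index, the counterpart of a table in a database. When building the index you must specify how it is built, that is, how the text in each field of a record is split into terms; in Lucene this is the job of an Analyzer. Lucene provides Analyzers for several situations, such as SimpleAnalyzer, StandardAnalyzer and GermanAnalyzer. StandardAnalyzer is the one most often used because it supports Chinese.
Once the index exists, you insert the records to be indexed. In Lucene a record is a Document, similar to one row of a database table, and the columns of a record are added as Fields, much as in a database. Lucene distinguishes fields by combining a few properties: indexed or not indexed, tokenized or not tokenized. With these elements you can build an index.
At query time a few more concepts come up. The first is Query. Lucene provides several commonly used query types: TermQuery, MultiTermQuery, BooleanQuery, WildcardQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, FuzzyQuery, RangeQuery and SpanQuery. A Query determines how the target field is searched: fuzzy query, semantic query, phrase query, range query, combined query, and so on. QueryParser builds the appropriate Query from a query string, and MultiFieldQueryParser searches several fields for the same keywords. IndexSearcher determines which index directory is searched and how, somewhat like choosing which index table of a database to query. Given an IndexSearcher and a Query, Lucene returns the matching results as Hits. Iterating over Hits yields the Document of each result, and from each Document you can read the values of its Fields.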
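To make the database analogy concrete, here is a toy in-memory model of the same pieces: documents made of named fields, a trivial analyzer, an inverted index from terms to document ids, and a single-term lookup. It is a hypothetical illustration written without Lucene; `ToyIndex` and its methods are invented and only loosely mirror `IndexWriter.addDocument` and `TermQuery`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of Document/Field/Analyzer/index/query; not Lucene's API. */
class ToyIndex {
    // stored field values, one map per document (like stored Fields)
    private final List<Map<String, String>> docs = new ArrayList<>();
    // inverted index: "field:term" -> ids of the documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** A minimal "analyzer": lower-case and split on anything that is not a letter or digit. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Like adding a Document: store its fields and index the terms of each one. */
    int addDocument(Map<String, String> fields) {
        int id = docs.size();
        docs.add(fields);
        for (Map.Entry<String, String> f : fields.entrySet()) {
            for (String term : analyze(f.getValue())) {
                postings.computeIfAbsent(f.getKey() + ":" + term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    /** Like a TermQuery: look one term up in one field and return the matching documents. */
    List<Map<String, String>> termQuery(String field, String term) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(field + ":" + term.toLowerCase(), Collections.emptySet())) {
            hits.add(docs.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene in Action");
        index.addDocument(doc);
        System.out.println(index.termQuery("title", "lucene"));  // prints [{title=Lucene in Action}]
    }
}
```

The inverted index is what makes the lookup cheap: instead of scanning every row as a table scan would, the query goes straight from a term to the list of documents containing it.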
Comparing Lucene with a database:
Lucene | Database |
Index data source: doc(field1, field2, ...), doc(field1, field2, ...) | Index data source: record(field1, field2, ...), record(field1, ...) |
Document: a "unit" to be indexed; a Document consists of several fields | Record: a record containing several fields |
Field: a field | Field: a field |
Hits: the query result set, made up of the matching Documents | RecordSet: the query result set, made up of several Records |
A few classes you need to know:
Analyzer
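The main job of an analyzer is filtering: a piece of text goes in, and only the useful parts come out; everything else is discarded. You can also write your own analyzer to suit your needs.
org.apache.lucene.analysis.Analyzer: an abstract class; the two classes below both extend it.
org.apache.lucene.analysis.SimpleAnalyzer: the simplest analyzer, suited to Latin-script languages; it splits text at non-letter characters and lower-cases the tokens.
org.apache.lucene.analysis.standard.StandardAnalyzer: the standard analyzer. Besides Latin-script languages it also handles Asian languages, and it refines some of the matching behavior. It has an important constructor, StandardAnalyzer(String[] stopWords), which lets you define stop words for the analyzer. This not only keeps useless words out of searches but can also be used to exclude forbidden keywords (for example political or illegal terms) from searches.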
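The analyzer's filtering role (split the text into tokens, normalize them, drop stop words) can be sketched without Lucene. The class below is a hypothetical illustration of what passing stop words to a constructor like `StandardAnalyzer(String[] stopWords)` achieves; its tokenization rule (split on anything that is not a letter or digit, then lower-case) is an assumption and does not reproduce StandardAnalyzer's grammar.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stop-word analyzer sketch; not Lucene's StandardAnalyzer. */
class StopWordAnalyzer {
    private final Set<String> stopWords = new HashSet<>();

    StopWordAnalyzer(String[] stopWords) {
        for (String w : stopWords) {
            this.stopWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    /** Splits on non-alphanumerics, lower-cases, and drops stop words. */
    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String t = token.toLowerCase(Locale.ROOT);
            if (!t.isEmpty() && !stopWords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        StopWordAnalyzer a = new StopWordAnalyzer(new String[] {"the", "a", "of"});
        System.out.println(a.analyze("The Art of Computer Programming"));
        // prints [art, computer, programming]
    }
}
```

Because the same analyzer normally processes both documents and query strings, a stop word never reaches the index and is also silently dropped from queries.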
IndexWriter
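IndexWriter has three kinds of constructors, taking a Directory, a File, or a path String.
For example, in IndexWriter(String path, Analyzer a, boolean create), path is the index path, a is the analyzer, and create indicates whether to (re)create the index (true: create a new index or overwrite an existing one; false: add to an existing index).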
Some important methods:
Method | Description |
addDocument(Document doc) | Adds a document to the index |
addIndexes(Directory[] dirs) | Adds the indexes stored in the given directories to this index |
addIndexes(IndexReader[] readers) | Adds the given indexes to this index |
optimize() | Merges the index segments and optimizes the index |
close() | Closes the writer |
Document
Method | Description |
add(Field field) | Adds a field (Field) to the Document |
String get(String name) | Returns the text of the named field |
Field getField(String name) | Returns the field with the given name |
Field[] getFields(String name) | Returns all fields with the given name |
Field
Factory method | Stored | Indexed | Tokenized | Use |
Keyword(String name, String value) | Y | Y | N | dates, URLs |
Text(String name, Reader value) | N | Y | Y | longer text fields, like "body" |
Text(String name, String value) | Y | Y | Y | short text fields: title, subject |
UnIndexed(String name, String value) | Y | N | N | |
UnStored(String name, String value) | N | Y | Y | |
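The five rows above are combinations of three independent flags. A hypothetical plain-Java model (`FieldSpec` is invented for illustration; in Lucene 1.4 these choices are encoded inside `Field` itself) makes the combinations explicit and shows, for example, that `Text(String, Reader)` and `UnStored` store and index a value in the same way:

```java
/** Hypothetical model of the Stored/Indexed/Tokenized flags in the table above. */
final class FieldSpec {
    final String name;
    final boolean stored;    // value is kept and returned with each hit
    final boolean indexed;   // value can be searched
    final boolean tokenized; // value is split into terms by the analyzer first

    private FieldSpec(String name, boolean stored, boolean indexed, boolean tokenized) {
        this.name = name;
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    // One factory per row of the table.
    static FieldSpec keyword(String name)    { return new FieldSpec(name, true, true, false); }
    static FieldSpec textReader(String name) { return new FieldSpec(name, false, true, true); }
    static FieldSpec textString(String name) { return new FieldSpec(name, true, true, true); }
    static FieldSpec unIndexed(String name)  { return new FieldSpec(name, true, false, false); }
    static FieldSpec unStored(String name)   { return new FieldSpec(name, false, true, true); }

    /** True when both specs store and index a value in the same way. */
    boolean sameFlagsAs(FieldSpec other) {
        return stored == other.stored && indexed == other.indexed && tokenized == other.tokenized;
    }

    @Override
    public String toString() {
        return name + " [stored=" + stored + ", indexed=" + indexed + ", tokenized=" + tokenized + "]";
    }

    public static void main(String[] args) {
        System.out.println(keyword("url"));  // prints url [stored=true, indexed=true, tokenized=false]
        System.out.println(textReader("body").sameFlagsAs(unStored("body")));  // prints true
    }
}
```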
Hits
Method | Description |
doc(int n) | Returns the n-th document in the result set, with its stored fields |
length() | Returns the number of results in the set |
Code for building the index:
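import java.io.File;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
// Members of the enclosing class; Lucene 1.4 API
// (Field.Keyword/Text/UnIndexed were removed in Lucene 2.0).
private Analyzer analyzer=new StandardAnalyzer();
private void createIndex(String indexFilePath) throws Exception{
IndexWriter iwriter=getWriter(indexFilePath);
Document doc=new Document();
doc.add(Field.Keyword("name","jerry"));                // stored, indexed, not tokenized
doc.add(Field.Text("sender","bluedavy@gmail.com"));    // stored, indexed, tokenized
doc.add(Field.Text("receiver","google@gmail.com"));
doc.add(Field.Text("title","用于索引的标题"));           // "a title to be indexed"
doc.add(Field.UnIndexed("content","不建立索引的内容"));  // "content that is not indexed": stored only
Document doc2=new Document();
doc2.add(Field.Keyword("name","jerry.lin"));
doc2.add(Field.Text("sender","bluedavy@hotmail.com"));
doc2.add(Field.Text("receiver","msn@hotmail.com"));
doc2.add(Field.Text("title","用于索引的第二个标题"));     // "a second title to be indexed"
doc2.add(Field.Text("content","建立索引的内容"));        // "content that is indexed"
iwriter.addDocument(doc);
iwriter.addDocument(doc2);
iwriter.optimize();
iwriter.close();
}
private IndexWriter getWriter(String indexFilePath) throws Exception{
// The third IndexWriter argument means "create": build a new index only when
// no "segments" file exists yet, otherwise append to the existing index.
File file=new File(indexFilePath+File.separator+"segments");
boolean create=!file.exists();
return new IndexWriter(indexFilePath,analyzer,create);
}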
3.2.1. Wildcard ("fuzzy") query on a keyword in a single field
Query query=new WildcardQuery(new Term("sender","*davy*"));
Searcher searcher=new IndexSearcher(indexFilePath);
Hits hits=searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
System.out.println(hits.doc(i).get("name"));
}
searcher.close();
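Conceptually, a wildcard query is answered in two steps: collect every term in the field's term dictionary that matches the pattern, then find the documents containing any of those terms. The sketch below shows only the first step in plain Java; `WildcardExpander` is a hypothetical name, and the sample terms assume that each e-mail address in the sender field was indexed as one lower-cased term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical sketch of wildcard term expansion; not Lucene's implementation. */
class WildcardExpander {
    /** '*' matches any run of characters, '?' exactly one; everything else is literal. */
    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                re.append(".*");
            } else if (c == '?') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString(), Pattern.DOTALL);
    }

    /** Scans the sorted term dictionary and keeps every term the pattern matches. */
    static List<String> expand(SortedSet<String> terms, String wildcard) {
        // A real index can skip ahead to the literal prefix before the first wildcard;
        // this sketch simply scans every term, which is what a leading '*' forces anyway.
        Pattern p = toRegex(wildcard);
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (p.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
                "bluedavy@gmail.com", "bluedavy@hotmail.com", "google@gmail.com", "msn@hotmail.com"));
        System.out.println(expand(terms, "*davy*"));
        // prints [bluedavy@gmail.com, bluedavy@hotmail.com]
    }
}
```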
3.2.2. Semantic (analyzed) query on a keyword in a single field
Query query=QueryParser.parse("索引","title",analyzer);
Searcher searcher=new IndexSearcher(indexFilePath);
Hits hits=searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
System.out.println(hits.doc(i).get("name"));
}
searcher.close();
3.2.3. Query on a keyword across multiple fields
Query query=MultiFieldQueryParser.parse("索引",new String[]{"title","content"},analyzer);
Searcher searcher=new IndexSearcher(indexFilePath);
Hits hits=searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
System.out.println(hits.doc(i).get("name"));
}
searcher.close();
3.2.4. Compound query (combining several query conditions)
Query query=MultiFieldQueryParser.parse("索引",new String[]{"title","content"},analyzer);
Query mquery=new WildcardQuery(new Term("sender","bluedavy*"));
TermQuery tquery=new TermQuery(new Term("name","jerry"));
BooleanQuery bquery=new BooleanQuery();
// Lucene 1.4 signature: add(query, required, prohibited); all three clauses are required (AND)
bquery.add(query,true,false);
bquery.add(mquery,true,false);
bquery.add(tquery,true,false);
Searcher searcher=new IndexSearcher(indexFilePath);
Hits hits=searcher.search(bquery);
for (int i = 0; i < hits.length(); i++) {
System.out.println(hits.doc(i).get("name"));
}
searcher.close();
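In the Lucene 1.4 signature used above, `add(query, required, prohibited)` with `true, false` makes a clause required, so a document must match all three clauses. A hypothetical sketch of how required, optional and prohibited clauses combine over sets of matching document ids (scoring ignored; `BooleanCombiner` is an invented name): required clauses are intersected, optional clauses only widen the result when there is no required clause, and prohibited clauses are subtracted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of boolean-query matching over doc-id sets; ignores scoring. */
class BooleanCombiner {
    private final List<Set<Integer>> required = new ArrayList<>();
    private final List<Set<Integer>> optional = new ArrayList<>();
    private final List<Set<Integer>> prohibited = new ArrayList<>();

    /** Mirrors the (required, prohibited) flags; this sketch rejects a clause that is both. */
    void add(Set<Integer> clauseMatches, boolean isRequired, boolean isProhibited) {
        if (isRequired && isProhibited) {
            throw new IllegalArgumentException("a clause cannot be both required and prohibited");
        }
        if (isRequired) {
            required.add(clauseMatches);
        } else if (isProhibited) {
            prohibited.add(clauseMatches);
        } else {
            optional.add(clauseMatches);
        }
    }

    Set<Integer> matches() {
        Set<Integer> result = new TreeSet<>();
        if (!required.isEmpty()) {
            result.addAll(required.get(0));
            for (Set<Integer> r : required) {
                result.retainAll(r);   // AND: keep docs matching every required clause
            }
        } else {
            for (Set<Integer> o : optional) {
                result.addAll(o);      // OR: any optional clause is enough
            }
        }
        for (Set<Integer> p : prohibited) {
            result.removeAll(p);       // NOT: drop docs matching a prohibited clause
        }
        return result;
    }

    public static void main(String[] args) {
        BooleanCombiner q = new BooleanCombiner();
        q.add(new TreeSet<>(Arrays.asList(0, 1, 2)), true, false);
        q.add(new TreeSet<>(Arrays.asList(1, 2, 3)), true, false);
        q.add(new TreeSet<>(Arrays.asList(2)), false, true);
        System.out.println(q.matches());  // prints [1]
    }
}
```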
4. Summary
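I hope the explanation above shows the basic way to use Lucene. For full-text search, I suggest running a semantic (analyzed) search first to find the meaningful content, and only then falling back to fuzzy-style searches such as wildcards ^_^, although in the end this depends on your search requirements. Lucene offers many other, more convenient features for you to explore as you use it: for example, mastering the Query types Lucene provides, using Filter and Sort, writing your own Analyzer or Query, and even learning about search-engine techniques (word segmentation, index ranking, etc.).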