<?xml version="1.0" encoding="utf-8" standalone="yes"?>
http://www.tkk7.com/tim-wu/ zh-cn Thu, 03 Jul 2025 23:55:05 GMT
Lucene index structure diagrams
http://www.tkk7.com/tim-wu/archive/2008/02/27/182532.html 鹏飞万里 Wed, 27 Feb 2008 10:14:00 GMT
Inverted index:<br />

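Forward index (a draft, and incomplete: what is stored depends on the field info, so different fields store different content, and some fieldInfo flags such as TOKENIZED, BINARY and COMPRESSED are kept in the bits of each document's section in .fdt rather than in .fnm):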


]]>
Lucene and GCJ
http://www.tkk7.com/tim-wu/archive/2008/02/14/179935.html 鹏飞万里 Thu, 14 Feb 2008 07:27:00 GMT
Calling operating-system-level native methods directly should improve read/write performance considerably.<br />
The code is in Lucene's gcj directory; build it with ant gcj

]]>
Memo: the ranking algorithm in Lucene
http://www.tkk7.com/tim-wu/archive/2008/02/09/179504.html 鹏飞万里 Sat, 09 Feb 2008 09:58:00 GMT
The description follows the javadoc of Similarity.java:<br />

For the algorithm itself, refer to the javadoc; it uses the Vector Space Model (VSM) of Information Retrieval.

                For a query q, the score of a document d is:
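\[ \mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \bigl( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2} \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \bigr) \]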
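               where:
               tf(t in d) is the term frequency: the number of times term t occurs in the current document d. The more often a query term occurs in a document, the higher that document scores. In the default implementation, DefaultSimilarity, it is computed as:
\[ \mathrm{tf}(t \in d) = \mathrm{frequency}^{1/2} \]
               idf(t) is the inverse document frequency, i.e. the inverse effect of docFreq (the number of documents in which term t occurs). Terms that occur in fewer documents contribute more to the score. In DefaultSimilarity it is computed as: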
\[ \mathrm{idf}(t) = 1 + \log\left( \frac{\mathrm{numDocs}}{\mathrm{docFreq} + 1} \right) \]
                coord(q,d) is a score factor based on how many of the query terms occur in the document: the number of query terms found in the document divided by the total number of terms in the query. Typically, a document that contains more of the query's terms receives a higher score. This is a search time factor computed in coord(q,d) by the Similarity in effect at search time.
                queryNorm(q) is a normalizing factor used to make scores from different queries comparable. It does not affect document ranking (all ranked documents are multiplied by the same value), but it yields comparable values across different queries (or even different indexes). This is a search time factor computed by the Similarity in effect at search time. In DefaultSimilarity it is computed as:
\[ \mathrm{queryNorm}(q) = \mathrm{queryNorm}(\mathrm{sumOfSquaredWeights}) = \frac{1}{\mathrm{sumOfSquaredWeights}^{1/2}} \]
                Here sumOfSquaredWeights (of the query terms) is computed by the query Weight object. For example, a boolean query computes this value as:
\[ \mathrm{sumOfSquaredWeights} = q.\mathrm{getBoost}()^{2} \cdot \sum_{t \in q} \bigl( \mathrm{idf}(t) \cdot t.\mathrm{getBoost}() \bigr)^{2} \]

                t.getBoost() is the search time boost of term t in the query q, either specified in the query text (see query syntax) or set by the application calling setBoost(). Note that there is no direct API for reading the boost of a single term in a multi term query; instead, multiple terms are represented in a query as multiple TermQuery objects, so the boost of a term in the query can be read through the sub-query's getBoost().
                norm(t,d) encapsulates a few indexing time boost and length factors. (Open question from the author: does this factor depend only on the number of tokens in the field, and not on the term itself?)
                          Document boost - set by calling doc.setBoost() before adding the document to the index.
                          Field boost - set by calling field.setBoost() before adding the field to a document.
                          lengthNorm(field) - computed when the document is added to the index, from the number of tokens of this field in the document, so that shorter fields contribute more to the score. LengthNorm is computed by the Similarity class in effect at indexing. The DefaultSimilarity implementation is (float)(1.0 / Math.sqrt(numTerms));
                    When a document is added to the index, all of the factors above are multiplied together. If the document has several fields with the same name, their boosts are all multiplied together:<br />

\[ \mathrm{norm}(t,d) = \mathrm{doc.getBoost}() \cdot \mathrm{lengthNorm}(\mathrm{field}) \cdot \prod_{\text{field } f \text{ in } d \text{ named as } t} f.\mathrm{getBoost}() \]
                     However, the resulting norm value is encoded as a single byte before being stored. At search time, the norm byte is read from the index directory and decoded back to a float. This encoding/decoding loses precision: it is not guaranteed that decode(encode(x)) = x. For instance, decode(encode(0.89)) = 0.75. Also notice that search time is too late to modify this norm part of scoring, e.g. by using a different Similarity for search.
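The factors described in this post can be written out in plain Java to make the arithmetic concrete. This is only a sketch under stated assumptions: it does not use Lucene, the class name ScoreSketch and its method signatures are made up for illustration, all boosts are taken to be 1, and the norm is not passed through the lossy single-byte encoding.

```java
// A minimal sketch of the DefaultSimilarity formulas quoted above.
// ScoreSketch and its methods are hypothetical names, not Lucene API.
final class ScoreSketch {
    /** tf(t in d) = frequency^(1/2) */
    static double tf(int frequency) {
        return Math.sqrt(frequency);
    }

    /** idf(t) = 1 + ln(numDocs / (docFreq + 1)) */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** coord(q,d) = query terms found in the document / terms in the query */
    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    /** queryNorm(q) = 1 / sumOfSquaredWeights^(1/2) */
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    /** lengthNorm(field) = 1 / sqrt(numTerms) */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** score(q,d) for a one-term query, assuming every boost is 1. */
    static double scoreSingleTerm(int frequency, int docFreq, int numDocs, int numTerms) {
        double idf = idf(docFreq, numDocs);
        double sumOfSquaredWeights = idf * idf;   // q.getBoost() = t.getBoost() = 1
        double norm = lengthNorm(numTerms);       // doc and field boosts = 1, no byte encoding
        return coord(1, 1) * queryNorm(sumOfSquaredWeights) * tf(frequency) * idf * idf * norm;
    }

    public static void main(String[] args) {
        // term occurs 4 times in a 4-token field; 9 of 10 documents contain it
        System.out.println(scoreSingleTerm(4, 9, 10, 4));
    }
}
```

For a single-term query, queryNorm cancels one of the two idf factors, which is consistent with the remark above that queryNorm does not change the ranking.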


]]>How does Lucene control the number of segments?
http://www.tkk7.com/tim-wu/archive/2008/02/06/179380.html 鹏飞万里 Tue, 05 Feb 2008 17:58:00 GMT
At search time, Lucene iterates over these segments, searches each segment independently as the basic unit, and then merges the results.<br />
Building an index means adding documents to it one by one. Lucene first creates a separate segment in memory for each newly added document. When the number of in-memory segments reaches a threshold, it merges them into one new segment and appends it to the list of segments on the file system.<br /> If the file system holds too many segments, search efficiency inevitably suffers, so these segment files have to be merged continually.<br />
So what is Lucene's merge strategy, and how does it keep the number of segments reasonable?

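Lucene has two basic strategies:<br /> The first is implemented in IndexWriter#optimize(), which simply merges all segment files into a single one. The algorithm repeatedly merges the last mergeFactor segment files in the list until only one file remains; mergeFactor is a Lucene parameter.<br /> btw: looking at the details, I am not fond of this implementation, because the content of the last mergeFactor files in the list ends up being scanned about segments_count/mergeFactor times. I wonder whether a level-by-level merge would be better: the content of each segment file would then be scanned ceil(log_mergeFactor(segments_count)) or ceil(log_mergeFactor(segments_count)) + 1 times.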

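The second strategy is implemented in IndexWriter#maybeMergeSegments(). This code is more complex; its basic goal is to guarantee that two invariants hold after merging:
Assume segments is an ordered list, B is maxBufferedDocs, M is mergeFactor, and f(n) = ceil(log_M(ceil(n/B))); maxBufferedDocs and mergeFactor are both parameters.<br /> 1. If the i-th segment holds x documents and the (i+1)-th segment holds y documents, then f(x) >= f(y) must hold.<br /> 2. The number of segments with the same f(n) value must not exceed M.<br /> So how does maybeMergeSegments() make sure these two invariants hold?
1. First, consider this code:<br />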
    protected final void maybeFlushRamSegments() throws IOException {
        // A flush is triggered if enough new documents are buffered or
        // if enough delete terms are buffered
        if (ramSegmentInfos.size() >= minMergeDocs
                || numBufferedDeleteTerms >= maxBufferedDeleteTerms) {
            flushRamSegments();
        }
    }
Here minMergeDocs = maxBufferedDocs, so the segment produced when the in-memory segments are merged and written to disk has a document count less than or equal to maxBufferedDocs (i.e. minMergeDocs).<br /> Addendum: because maybeMergeSegments() runs in synchronized code, the flush happens as soon as ramSegmentInfos.size() == minMergeDocs (i.e. maxBufferedDocs), so a flush with ramSegmentInfos.size() > maxBufferedDocs should never occur. If it did, the merged segment's document count would be too large, the following maybeMergeSegments() would return without merging anything, the invariants above might no longer hold, and the algorithm would be wrong.
----
2.
2.1 So the first time maybeMergeSegments() runs, every segment has a document count less than or equal to maxBufferedDocs. Starting from i = 0, merge segments i to i+mergeFactor-1. If the merged doc count is > maxBufferedDocs, keep that segment at position i; otherwise keep merging the updated range i to i+mergeFactor-1, until the doc count is > maxBufferedDocs or fewer than mergeFactor segment files remain. After this round, apart from the last (fewer than mergeFactor) files with doc counts <= maxBufferedDocs, every other file has a doc count > maxBufferedDocs and < maxBufferedDocs*mergeFactor.<br />  2.2 Now ignore the last (fewer than mergeFactor) files with doc count < maxBufferedDocs and apply the same rule as in 2.1 to the files before them, so that the last (fewer than mergeFactor) of those segments satisfy maxBufferedDocs < doc count <= maxBufferedDocs*mergeFactor, and all earlier segment files satisfy maxBufferedDocs*mergeFactor < doc count <= maxBufferedDocs*mergeFactor^2.
2.3 Repeat 2.2; the resulting list satisfies both invariants above.
---
3
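After that, every time a new segment is flushed from the in-memory buffer, i.e. a file with doc_count <= maxBufferedDocs is appended to the end of the segments list, step 2 is run again to merge, which keeps both invariants true at all times.<br /> ----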
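The level function f(n) and the two invariants from this post can be checked mechanically. The following is a sketch only: it is not Lucene's code, SegmentLevelSketch and its methods are hypothetical names, invariant 1 is read as "levels never increase from left to right", and mergeFactor is assumed to be at least 2.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: computes f(n) and checks the two invariants.
final class SegmentLevelSketch {
    /** f(n) = ceil(log_M(ceil(n / B))), computed with integers to avoid rounding errors. */
    static int level(long docCount, int maxBufferedDocs, int mergeFactor) {
        long blocks = (docCount + maxBufferedDocs - 1) / maxBufferedDocs; // ceil(n / B)
        int level = 0;
        long capacity = 1;                       // capacity == mergeFactor^level
        while (capacity < blocks) {              // assumes mergeFactor >= 2
            capacity *= mergeFactor;
            level++;
        }
        return level;
    }

    /** Checks both invariants on a left-to-right list of segment doc counts. */
    static boolean invariantsHold(List<Long> docCounts, int maxBufferedDocs, int mergeFactor) {
        int previousLevel = Integer.MAX_VALUE;
        int sameLevelRun = 0;
        for (long n : docCounts) {
            int current = level(n, maxBufferedDocs, mergeFactor);
            if (current > previousLevel) {
                return false;                    // invariant 1: levels never increase
            }
            sameLevelRun = (current == previousLevel) ? sameLevelRun + 1 : 1;
            if (sameLevelRun > mergeFactor) {
                return false;                    // invariant 2: at most M segments per level
            }
            previousLevel = current;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Long> segments = Arrays.asList(1000L, 100L, 100L, 10L, 3L);
        System.out.println(invariantsHold(segments, 10, 10));
    }
}
```

Because levels never increase along the list, segments of the same level are always adjacent, which is why counting a run of equal levels is enough for invariant 2.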
4
4.1
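IndexWriter#addIndexesNoOptimize borrows the same maybeMergeSegments() algorithm. The difference is that there is already a segment sequence T that satisfies both invariants, and a sequence S of segments in arbitrary order is appended after T. We first find the segment in S with the largest doc count and compute the level x it belongs to, such that maxBufferedDocs*mergeFactor^x < doc count <= maxBufferedDocs*mergeFactor^(x+1). Then the algorithm of 2.2 merges so that, except for the last (fewer than mergeFactor) segments, all earlier segments satisfy: a. doc count > maxBufferedDocs*mergeFactor^(x+1); b. the two invariants above.<br /> btw: the argument passed in this call of IndexWriter#addIndexesNoOptimize is maxBufferedDocs*mergeFactor^(x+1), and the doc count of every segment in S is at most maxBufferedDocs*mergeFactor^(x+1), so every element of S takes part in the merge.<br /> 4.2 Merge the last (fewer than mergeFactor) segments with doc count < maxBufferedDocs*mergeFactor^(x+1). If the merged segment's doc count is greater than maxBufferedDocs*mergeFactor^(x+1), continue with step 2.2 so that the whole list satisfies the two invariants.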
-----

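Together, these two strategies keep the number of segments in Lucene from growing too large and so preserve search efficiency.<br />
BTW: when there are too many documents, Lucene also supports splitting them into several groups, indexing each group in a separate process or on a separate machine, and then merging the indexes from the different directories; see IndexWriter#addIndexesNoOptimize and IndexWriter#addIndexes.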

]]>
Memo: some enum types in Lucene
http://www.tkk7.com/tim-wu/archive/2008/01/29/178355.html 鹏飞万里 Tue, 29 Jan 2008 06:02:00 GMT
        LOAD  Document#getFieldable and Document#getField will not return null.<br />
        LAZY_LOAD  A lazy Field is not read by default when it appears in search results; its value is only fetched when you actually ask for it. To read the value you must therefore make sure the IndexReader has not been closed yet. Document#getField must not be used; use Document#getFieldable only.<br />
        NO_LOAD  Document#getField and Document#getFieldable both return null; Document#add is not called.<br />
        LOAD_AND_BREAK  Like LOAD: Document#getField and Document#getFieldable are both usable, but loading stops right after this field, so the Document may not have its complete set of fields; see LoadFirstFieldSelector.<br />
        LOAD_FOR_MERGE  Like LOAD, but does not uncompress any compressed data. It is only used by an anonymous inner FieldSelector implementation in SegmentMerger. Document#getField and Document#getFieldable should not return null.<br />
        SIZE  Returns the size of the Field rather than its value. Size is the number of bytes required to store the field; for a String value it is 2*chars. The size is stored as a binary value, represented as an int in a byte[], with the higher order byte first in [0].<br />
        SIZE_AND_BREAK  Like SIZE, but immediately break from the field loading loop, i.e. stop loading further fields, after the size is loaded.

======================================

The three main enums in Field: Store, Index and TermVector:<br />

       ------------------------------------
        Store.COMPRESS  Store the original field value in the index in a compressed form. This is useful for long documents and for binary valued fields.<br />
        Store.YES  Store the original field value in the index. This is useful for short texts like a document's title which should be displayed with the results. The value is stored in its original form, i.e. no analyzer is used before it is stored. An index file would normally hold only index data; with this setting the original text is stored in the index file as well.<br />
        Store.NO  Do not store the field value in the index. After a search hit, the original text has to be reopened through other stored attributes such as the file path or a database primary key. This suits large original content.<br />
        These values determine the Field object's this.isStored and this.isCompressed.
     ------------------------------------
        Index.NO  Do not index the field value. This field can thus not be searched, but one can still access its contents provided it is Field.Store stored. Use it for content that does not need to be searchable, such as extra attributes of a document (document type, URL and so on).<br />
        Index.TOKENIZED  Index the field's value so it can be searched. An Analyzer will be used to tokenize and possibly further normalize the text before its terms will be stored in the index. This is useful for common text.<br />
        Index.UN_TOKENIZED  Index the field's value without using an Analyzer, so it can be searched. As no analyzer is used the value will be stored as a single term. This is useful for unique Ids like product numbers, and also for values such as author names or dates: "Rod Johnson" is treated as one term and needs no tokenizing.<br />

        Index.NO_NORMS  Index the field's value without an Analyzer, and disable the storing of norms. (Open questions from the author: what exactly are norms? Is the field value not stored in the usual way, with only one byte kept to save space?) No norms means that index-time boosting and field length normalization will be disabled. The benefit is less memory usage as norms take up one byte per indexed field for every document in the index. Note that once you index a given field <i>with</i> norms enabled, disabling norms will have no effect. In other words, for NO_NORMS to have the above described effect on a field, all instances of that field must be indexed with NO_NORMS from the beginning.<br />
        These values determine the Field object's this.isIndexed, this.isTokenized and this.omitNorms.
     ------------------------------------
        Added in Lucene 1.4.3:<br />
        TermVector.NO  Do not store term vectors.<br />
        TermVector.YES  Store the term vectors of each document. A term vector is a list of the document's terms and their number of occurrences in that document.<br />
        TermVector.WITH_POSITIONS  Store the term vector + token position information.<br />
        TermVector.WITH_OFFSETS  Store the term vector + Token offset information.<br />
        TermVector.WITH_POSITIONS_OFFSETS  Store the term vector + Token position and offset information.<br />
        These values determine the Field object's this.storeTermVector, this.storePositionWithTermVector and this.storeOffsetWithTermVector.





]]>
Java 7 and VB 2008 are both starting to support lambdas (closures)</title><link>http://www.tkk7.com/tim-wu/archive/2008/01/29/178345.html</link><dc:creator>鹏飞万里</dc:creator><author>鹏飞万里</author><pubDate>Tue, 29 Jan 2008 04:58:00 GMT</pubDate><guid>http://www.tkk7.com/tim-wu/archive/2008/01/29/178345.html</guid><wfw:comment>http://www.tkk7.com/tim-wu/comments/178345.html</wfw:comment><comments>http://www.tkk7.com/tim-wu/archive/2008/01/29/178345.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.tkk7.com/tim-wu/comments/commentRss/178345.html</wfw:commentRss><trackback:ping>http://www.tkk7.com/tim-wu/services/trackbacks/178345.html</trackback:ping><description><![CDATA[<p>Closure: http://en.wikipedia.org/wiki/Closure_%28computer_science%29<br /> I rather like Microsoft's explanation at <a >http://msdn.microsoft.com/msdnmag/issues/07/09/BasicInstincts/Default.aspx?loc=zh</a>; search the page for "<span id="io4gsuk" class="clsSubhead">Lambda expressions and variable lifting</span>".<br /> <br /> I first met closures while learning JavaScript, and the year before last I wrote an article about closures and memory leaks in JavaScript: <a href="http://www.tkk7.com/tim-wu/archive/2006/05/29/48729.html">http://www.tkk7.com/tim-wu/archive/2006/05/29/48729.html</a><br /> I had always assumed closures were a feature of functional languages, with .NET delegates being the closest relative elsewhere.<br /> I did not expect Java 7 to support them as well. If you are interested, read:<br /> <a >http://www.javac.info/</a><br /> I have not read it closely, so I do not know how well lambdas will fit a strongly type-checked language like Java.<br /> <br /> Ruby, as a functional-style language, has always had closures; <a >http://samdanielson.com/2007/3/19/proc-new-vs-lambda-in-ruby</a> has a simple example.</p>
<div style="border-right: #cccccc 1px solid; padding-right: 5px; border-top: #cccccc 1px solid; padding-left: 4px; font-size: 13px; padding-bottom: 4px; border-left: #cccccc 1px solid; width: 98%; word-break: break-all; padding-top: 4px; border-bottom: #cccccc 1px solid; background-color: #eeeeee">def foo<br />
&nbsp;&nbsp;f = Proc.new { return "return from foo from inside proc" }<br />
&nbsp;&nbsp;f.call # control leaves foo here<br />
&nbsp;&nbsp;return "return from foo"<br />
end<br />
<br />
def bar<br />
&nbsp;&nbsp;f = lambda { return "return from lambda" }<br />
&nbsp;&nbsp;f.call # control does not leave bar here<br />
&nbsp;&nbsp;return "return from bar"<br />
end<br />
<br />
puts foo # prints "return from foo from inside proc"<br />
puts bar # prints "return from bar"</div>
<p>Recently, Ruby 1.9 added a new syntax for defining a lambda:</p>
<div style="border-right: #cccccc 1px solid; padding-right: 5px; border-top: #cccccc 1px solid; padding-left: 4px; font-size: 13px; padding-bottom: 4px; border-left: #cccccc 1px solid; width: 98%; word-break: break-all; padding-top: 4px; border-bottom: #cccccc 1px solid; background-color: #eeeeee">x = ->{puts "Hello Lambda"}</div>
See <a >http://www.infoq.com/cn/news/2008/01/new-lambda-syntax</a><br /> <br /> VB 2008 supports lambdas too. This link has an example that <span style="color: #0000ff"><strong>combines lambdas, generics and delegate callbacks</strong></span>, which is quite interesting:<br /> http://msdn.microsoft.com/msdnmag/issues/07/09/BasicInstincts/Default.aspx?loc=zh<br /> The article points out one restriction in VB: a lambda expression is exactly one single expression. In Visual Basic 2008 a lambda may contain only a single expression. The column goes on to show a new ternary operator introduced in Visual Basic 2008 that lets you build conditional expressions, but the current feature does not support arbitrary statements inside a lambda expression.<br /> <br /> <div align=right><a style="text-decoration:none;" href="http://www.tkk7.com/tim-wu/" target="_blank">鹏飞万里</a> 2008-01-29 12:58 <a href="http://www.tkk7.com/tim-wu/archive/2008/01/29/178345.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Memo: several commonly used Lucene Analyzers
http://www.tkk7.com/tim-wu/archive/2008/01/26/177742.html 鹏飞万里 Fri, 25 Jan 2008 18:03:00 GMT

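Everything below is reposted; see the individual links for the source URLs.

The four most common Analyzers, as described at http://windshowzbf.bokee.com/3016397.html
WhitespaceAnalyzer: only splits on whitespace and does not lowercase the characters. No Chinese support.<br /> SimpleAnalyzer: more capable than WhitespaceAnalyzer; it filters out every symbol that is not a letter and lowercases all characters. No Chinese support.<br /> StopAnalyzer: goes beyond SimpleAnalyzer by also removing stop words. No Chinese support. The class keeps ENGLISH_STOP_WORDS in a static array: words too common to be worth indexing.
StandardAnalyzer: a strict grammar defined as EBNF with JavaCC. Some say its handling of English is equivalent to StopAnalyzer. It supports Chinese by splitting into single characters. I have not compared them carefully, so I cannot say for sure.

Other extensions:
ChineseAnalyzer: from the Lucene sandbox. It performs like StandardAnalyzer; its drawback is that it cannot tokenize mixed Chinese and English text.
CJKAnalyzer: written by chedong. For English it behaves like StandardAnalyzer. For Chinese it cannot filter out punctuation; it uses bigram segmentation.<br /> TjuChineseAnalyzer: written by the author of http://windshowzbf.bokee.com/3016397.html, and the most capable of the lot. For Chinese it calls the Java interface of ICTCLAS, so its Chinese segmentation quality matches ICTCLAS. For English it uses Lucene's StopAnalyzer, so it removes stop words, is case-insensitive, and filters out all kinds of punctuation.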

 


Examples:
http://www.langtech.org.cn/index.php/uid-5080-action-viewspace-itemid-68, which also includes a short code analysis

Analyzing "The quick brown fox jumped over the lazy dogs"

WhitespaceAnalyzer:

[The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]

SimpleAnalyzer:

[the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]

StopAnalyzer:

[quick] [brown] [fox] [jumped] [over] [lazy] [dogs]

StandardAnalyzer:

[quick] [brown] [fox] [jumped] [over] [lazy] [dogs]


Analyzing "XY&Z Corporation - xyz@example.com"

WhitespaceAnalyzer:

[XY&Z] [Corporation] [-] [xyz@example.com]

SimpleAnalyzer:

[xy] [z] [corporation] [xyz] [example] [com]

StopAnalyzer:

[xy] [z] [corporation] [xyz] [example] [com]

StandardAnalyzer:

[xy&z] [corporation] [xyz@example.com]
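
The token lists above for WhitespaceAnalyzer, SimpleAnalyzer and StopAnalyzer can be approximated with a few lines of plain Java. This is a rough sketch, not Lucene's implementation: AnalyzerSketch is a hypothetical class, and its stop list is a shortened stand-in for Lucene's ENGLISH_STOP_WORDS. StandardAnalyzer is left out because its JavaCC grammar (which keeps tokens such as xy&z and e-mail addresses together) is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical approximation of three analyzers; not Lucene code.
final class AnalyzerSketch {
    // Shortened, illustrative stop list (Lucene's ENGLISH_STOP_WORDS is longer).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to"));

    /** Like WhitespaceAnalyzer: split on whitespace, keep case and punctuation. */
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Like SimpleAnalyzer: tokens are runs of letters, lowercased. */
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());   // a non-letter ends the current token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Like StopAnalyzer: SimpleAnalyzer output with stop words removed. */
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeAll(STOP_WORDS);
        return tokens;
    }

    public static void main(String[] args) {
        String text = "XY&Z Corporation - xyz@example.com";
        System.out.println(whitespace(text));
        System.out.println(simple(text));
        System.out.println(stop(text));
    }
}
```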

 

Reference links:
http://macrochen.blogdriver.com/macrochen/1167942.html
http://macrochen.blogdriver.com/macrochen/1153507.html

http://my.dmresearch.net/bbs/viewthread.php?tid=8318
http://windshowzbf.bokee.com/3016397.html



]]>
Memo: unicode & utf-8
http://www.tkk7.com/tim-wu/archive/2008/01/25/177788.html 鹏飞万里 Fri, 25 Jan 2008 08:21:00 GMT
Recommended reading:
http://gceclub.sun.com.cn/developer/technicalArticles/Intl/Supplementary/index_zh_CN.html
http://www.linuxpk.com/3821.html
=======================================
About the BMP (Basic Multilingual Plane):
http://zh.wikipedia.org/w/index.php?title=%E5%9F%BA%E6%9C%AC%E5%A4%9A%E6%96%87%E7%A8%AE%E5%B9%B3%E9%9D%A2&variant=zh-cn
http://zh.wikipedia.org/w/index.php?title=%E8%BE%85%E5%8A%A9%E5%B9%B3%E9%9D%A2&variant=zh-cn#.E7.AC.AC.E4.B8.80.E8.BC.94.E5.8A.A9.E5.B9.B3.E9.9D.A2
One BMP plus 16 supplementary planes gives 17 planes of 65536 code points each (U+0000 to U+10FFFF), which takes 21 bits to represent.

======================================
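Relationship between ISO-10646 and Unicode
http://zh.wikipedia.org/wiki/%E9%80%9A%E7%94%A8%E5%AD%97%E7%AC%A6%E9%9B%86
<table>
<tr><th>ISO-10646</th><th>Unicode</th></tr>
<tr><td>UCS-2</td><td>BMP, UTF-16</td></tr>
<tr><td>UCS-4</td><td>UTF-32</td></tr>
</table>
Note: UTF-16 can be seen as a superset of UCS-2. Before supplementary-plane characters existed, UTF-16 and UCS-2 meant the same thing. Once supplementary-plane characters were introduced, only the name UTF-16 applies, because a supplementary-plane character is stored as two UTF-16 code units, i.e. 4 bytes. Software that nowadays claims to support UCS-2 encoding is politely saying that it cannot handle supplementary-plane characters.<br /> ======================================
UTF-8 needs up to 4 bytes to cover all of Unicode and up to 3 bytes for the BMP; see http://en.wikipedia.org/wiki/UTF-8. Note: "The range D800-DFFF is disallowed by Unicode. The encoding scheme reliably transforms values in that range, but they are not valid scalar values in Unicode. See Table 3-7 in the Unicode 5.0 standard."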


======================================
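BOM (Byte Order Mark): UCS has a character called "ZERO WIDTH NO-BREAK SPACE" whose code is FEFF. FFFE, on the other hand, is not a character in UCS, so it should never appear in real data. The UCS specification recommends sending the character "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself.<br /> If the receiver then sees FEFF, the byte stream is Big-Endian; if it sees FFFE, the byte stream is Little-Endian.<br /> For this reason the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM. UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to mark the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" (U+FEFF) is EF BB BF (that is, 11101111, 10111011, 10111111), so a receiver that sees a byte stream starting with EF BB BF knows it is UTF-8.<br /> Windows uses the BOM in exactly this way to mark the encoding of text files.<br />
Besides FEFF, the English wiki page http://en.wikipedia.org/wiki/UTF-8 also explains several byte values that currently never appear in a UTF-8 byte stream.<br />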
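The BOM rules above translate directly into a small byte check. This is a minimal sketch: BomSketch and detect are hypothetical names, and the sketch ignores UTF-32 (whose little-endian BOM FF FE 00 00 also starts with FF FE).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses an encoding from a leading byte order mark.
final class BomSketch {
    static String detect(byte[] b) {
        // EF BB BF: U+FEFF encoded as UTF-8
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        // FE FF: U+FEFF with the high byte first
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        // FF FE: U+FEFF with the low byte first
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return "unknown";   // no BOM: the encoding has to be known some other way
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFFhello".getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(withBom));
    }
}
```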
=========================================
Java
http://www.jorendorff.com/articles/unicode/java.html
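http://gceclub.sun.com.cn/developer/technicalArticles/Intl/Supplementary/index_zh_CN.html An excellent explanation of Unicode in Java. It also mentions that Java actually has two UTF-8 formats: standard UTF-8 and modified UTF-8. For text input, the Java 2 SDK provides a code point input method that accepts strings of the form "\Uxxxxxx", where the uppercase "U" indicates an escape sequence with six hexadecimal digits, which allows supplementary characters. The lowercase "u" denotes the original "\uxxxx" escape sequence format.<br /> http://dlog.cn/html/diary/showlog.vm?sid=2&cat_id=-1&log_id=557 Introduces the String methods added in JDK 5.
http://blog.csdn.net/qinysong/archive/2006/09/05/1179480.aspx A series of three articles built around examples. The writing is somewhat disorganized and not entirely correct, but the experimental Java code is interesting and helps when thinking about some tricky behaviour.<br /> http://topic.csdn.net/u/20070928/22/5207088c-c47d-43ed-8416-26f850631cff.html Has a few answers.
http://topic.csdn.net/u/20070515/14/57af3319-28de-4851-b4cf-db65b2ead01c.html Has some experimental code, of limited value.<br /> http://www.w3china.org/blog/more.asp?name=hongrui&id=24817 Has some Java sample code; I have not read it closely.<br />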

Also:
Java 1.0 supports Unicode version 1.1.
Java 1.1 onwards supports Unicode version 2.0.
Character handling in J2SE 1.4 is based on the Unicode 3.0 standard.<br /> J2SE v 1.5 supports Unicode 4.0 character set.

And:
Unicode 3.0: September 1999; covers the Basic Multilingual Plane of the 16-bit Universal Character Set (UCS) from ISO 10646-1.

Unicode 3.1: March 2001; adds the Supplementary Planes defined in ISO 10646-2.


Therefore:
Code points between U+0000 and U+FFFF are written as \u0000 to \uffff.
Code points between U+10000 and U+10FFFF are written as a pair: a first unit from \ud800 to \udbff followed by a second unit from \udc00 to \udfff.
The char type covers exactly \u0000 to \uffff and occupies two bytes.
Everything beyond that is handled through the concept of a code point.
JDK 1.5 and later support Unicode 4.0, so the Unicode range is U+0000 to U+10FFFF.
Characters above U+FFFF are represented as code points (values of type int). For details, see the article "Supplementary Characters in the Java Platform", which covers this thoroughly: http://gceclub.sun.com.cn/developer/technicalArticles/Intl/Supplementary/index_zh_CN.html
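
The split between char units and code points can be seen with the JDK 5 String and Character methods. The sketch below also derives the surrogate pair by hand so the arithmetic is visible; CodePointSketch and toSurrogates are hypothetical names, and U+1D50A is just an arbitrary supplementary character picked for illustration.

```java
// Hypothetical demo class; the Character and String calls are standard JDK 5+ API.
final class CodePointSketch {
    /** Splits a supplementary code point (U+10000..U+10FFFF) into its surrogate pair. */
    static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;                 // 20 bits remain
        char high = (char) (0xD800 + (offset >> 10));     // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));    // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int codePoint = 0x1D50A;                          // outside the BMP
        String s = new String(Character.toChars(codePoint));
        System.out.println(s.length());                           // prints 2 (char units)
        System.out.println(s.codePointCount(0, s.length()));      // prints 1 (code point)
        System.out.println(Integer.toHexString(s.codePointAt(0))); // prints 1d50a
    }
}
```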


================================
http://www.tkk7.com/tim-wu/archive/2007/09/12/144550.html

================================

U-00000000 - U-0000007F: 0xxxxxxx
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
U-00010000 - U-001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 - U-03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 - U-7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
However, neither ISO nor the Unicode Consortium will currently define any character above 10FFFF.<br />
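<table>
<tr><th>Code range (hexadecimal)</th><th>Scalar value (binary)</th><th>UTF-8 (binary / hexadecimal)</th><th>Notes</th></tr>
<tr><td>000000 - 00007F<br />128 codes</td><td>00000000 00000000 0zzzzzzz<br />seven z</td><td>0zzzzzzz (00-7F)<br />seven z</td><td>ASCII-equivalent range; the byte begins with 0</td></tr>
<tr><td>000080 - 0007FF<br />1920 codes</td><td>00000000 00000yyy yyzzzzzz<br />three y; two y; six z</td><td>110yyyyy (C2-DF) 10zzzzzz (80-BF)<br />five y; six z</td><td>The first byte begins with 110, the following byte begins with 10</td></tr>
<tr><td>000800 - 00FFFF<br />63488 codes</td><td>00000000 xxxxyyyy yyzzzzzz<br />four x; four y; two y; six z</td><td>1110xxxx (E0-EF) 10yyyyyy 10zzzzzz<br />four x; six y; six z</td><td>The first byte begins with 1110, the following bytes begin with 10</td></tr>
<tr><td>010000 - 10FFFF<br />1048576 codes</td><td>000wwwxx xxxxyyyy yyzzzzzz<br />three w; two x; four x; four y; two y; six z</td><td>11110www (F0-F4) 10xxxxxx 10yyyyyy 10zzzzzz<br />three w; six x; six y; six z</td><td>The first byte begins with 11110, the following bytes begin with 10</td></tr>
</table>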
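
The four rows of the table map onto four branches of an encoder. The following is a sketch for illustration only (Utf8Sketch and encode are hypothetical names); in real code String.getBytes(StandardCharsets.UTF_8) does this job.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical hand-written encoder that follows the table row by row.
final class Utf8Sketch {
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            // surrogates and values above 10FFFF are not Unicode scalar values
            throw new IllegalArgumentException("not a Unicode scalar value: " + cp);
        }
        if (cp <= 0x7F) {                                  // 0zzzzzzz
            return new byte[] { (byte) cp };
        }
        if (cp <= 0x7FF) {                                 // 110yyyyy 10zzzzzz
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        if (cp <= 0xFFFF) {                                // 1110xxxx 10yyyyyy 10zzzzzz
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        return new byte[] {                                // 11110www 10xxxxxx 10yyyyyy 10zzzzzz
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F)) };
    }

    public static void main(String[] args) {
        int cp = 0xFEFF;                                   // ZERO WIDTH NO-BREAK SPACE
        byte[] expected = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(encode(cp), expected));
    }
}
```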

================================
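Reference: http://blog.csdn.net/qinysong/archive/2006/09/05/1179480.aspx, although that article gets the Unicode versions wrong; see the notes above.<br />

Most modern programming languages developed after about 1993 have a dedicated data type for Unicode/ISO 10646-1 characters. In Ada95 it is called Wide_Character, in Java it is called char.

ISO C also specifies mechanisms for handling multibyte encodings and wide characters, and more were added when Amendment 1 to ISO C was published in September 1994. These mechanisms were designed mainly for the various East Asian encodings and are far more elaborate than handling UCS requires. UTF-8 is one example of an encoding for what the ISO C standard calls multibyte strings, and the wchar_t type can be used to hold Unicode characters.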



]]>
Memo: notes on the Lucene Query syntax tree
http://www.tkk7.com/tim-wu/archive/2008/01/24/177451.html 鹏飞万里 Thu, 24 Jan 2008 03:33:00 GMT
The code is in QueryParser.jj; the grammar is an LL() grammar implemented with JavaCC.<br /> Full documentation: http://lucene.apache.org/java/2_0_0/queryparsersyntax.html

As in regular expressions:
? means zero or one<br /> + means one or more
* means zero or more


The Token part:<br />

_NUM_CHAR::=["0"-"9"] // digits
_ESCAPED_CHAR::="\\" [ "\\", "+", "-", "!", "(", ")", ":", "^", "[", "]", "\"", "{", "}", "~", "*", "?" ] // a backslash followed by one special character
_TERM_START_CHAR::=~[ " ", "\t", "\n", "\r", "+", "-", "!", "(", ")", ":", "^", "[", "]", "\"", "{", "}", "~", "*", "?" ] // first character of a TERM: anything except the characters listed
_TERM_CHAR::=( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" | "+" ) // characters allowed inside a TERM
_WHITESPACE::=( " " | "\t" | "\n" | "\r" ) // spaces and line breaks

<DEFAULT> TOKEN:
 AND::=("AND" | "&&")
 OR::=("OR" | "||")
 NOT::=("NOT" | "!")
 PLUS::="+"
 MINUS::="-"
 LPAREN::="("
 RPAREN::=")"
 COLON::=":"
 STAR::="*"
 CARAT::="^" // followed by the Boost state; the original reads <CARAT: "^" > : Boost, where the trailing ": Boost" switches the lexer to the Boost lexical state
 QUOTED::="\"" (~["\""] | "\\\"")+ "\"" // a string wrapped in double quotes: starts with ", continues with characters that are not " or with the two-character sequence \", and ends with "
 TERM::=<_TERM_START_CHAR> (<_TERM_CHAR>)*
 FUZZY_SLOP::="~" ( (<_NUM_CHAR>)+ ( "." (<_NUM_CHAR>)+ )? )? // starts with ~, optionally followed by a number. Lucene supports fuzzy queries such as "roam~" or "roam~0.8". The value is between 0 and 1; the algorithm is the Levenshtein Distance, or Edit Distance algorithm
 PREFIXTERM::=(<_TERM_START_CHAR> | "*") (<_TERM_CHAR>)* "*" // prefix search, i.e. queries for terms starting with something, written as "something*". The first character may itself be the wildcard *, the middle may be empty, and the token must end with *
 WILDTERM::=(<_TERM_START_CHAR> | [ "*", "?" ]) (<_TERM_CHAR> | ( [ "*", "?" ] ))* // like the above, but also supports ?; the token may end with an ordinary character, * or ?. Inside [] alternatives are separated by commas, no | is needed
 RANGEIN_START::="[" // in a RangeQuery, [ or { says whether the bounds themselves are included; written as "[begin TO end]" or "{begin TO end}". Followed by the RangeIn state
 RANGEEX_START::="{" // as above, followed by the RangeEx state

<Boost> TOKEN:
 NUMBER::=(<_NUM_CHAR>)+ ( "." (<_NUM_CHAR>)+ )? // an integer or decimal; followed by the DEFAULT state

<RangeIn> TOKEN:
 RANGEIN_TO::="TO"
 RANGEIN_END::="]" // end of RangeIn; followed by the DEFAULT state
 RANGEIN_QUOTED::="\"" (~["\""] | "\\\"")+ "\"" // same as QUOTED above: a string wrapped in double quotes
 RANGEIN_GOOP::=(~[ " ", "]" ])+ // one or more characters that are neither a space nor ], which extracts the content inside []

<RangeEx> TOKEN :
 RANGEEX_TO::="TO"
 RANGEEX_END::="}" // end of RangeEx; followed by the DEFAULT state
 RANGEEX_QUOTED::="\"" (~["\""] | "\\\"")+ "\"" // same as QUOTED above: a string wrapped in double quotes
 RANGEEX_GOOP::=(~[ " ", "}" ])+ // one or more characters that are neither a space nor }, which extracts the content inside {}


<DEFAULT, RangeIn, RangeEx> SKIP : {
  < <_WHITESPACE> >
}
// all spaces and line breaks are skipped<br />
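
Two of the token definitions above, NUMBER and FUZZY_SLOP, translate almost symbol for symbol into regular expressions. This is a sketch under stated assumptions: QueryTokenSketch is a hypothetical class, and it tests whole candidate strings with matches(), whereas the JavaCC lexer scans a stream and picks the longest match.

```java
import java.util.regex.Pattern;

// Hypothetical regex equivalents of two QueryParser.jj tokens; not Lucene code.
final class QueryTokenSketch {
    // NUMBER ::= (<_NUM_CHAR>)+ ( "." (<_NUM_CHAR>)+ )?
    static final Pattern NUMBER = Pattern.compile("[0-9]+(\\.[0-9]+)?");

    // FUZZY_SLOP ::= "~" ( (<_NUM_CHAR>)+ ( "." (<_NUM_CHAR>)+ )? )?
    static final Pattern FUZZY_SLOP = Pattern.compile("~([0-9]+(\\.[0-9]+)?)?");

    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    static boolean isFuzzySlop(String s) {
        return FUZZY_SLOP.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFuzzySlop("~0.8"));
        System.out.println(isFuzzySlop("~.5"));
    }
}
```

The nested optional group mirrors the grammar: a bare "~" is a valid FUZZY_SLOP, and a decimal point is only accepted when digits appear on both sides of it.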



浠ヤ笅涓鴻В鏋愰儴鍒?br />

 

Conjunction::=[ <AND> { ret = CONJ_AND; } | <OR>  { ret = CONJ_OR; } ] //Conjunction
Modifiers::=[ <PLUS> { ret = MOD_REQ; } | <MINUS> { ret = MOD_NOT; } | <NOT> { ret = MOD_NOT; } ] //The + - ! symbols
Query::=Modifiers Clause (Conjunction Modifiers Clause)*
Clause::=[(<TERM> <COLON>|<STAR> <COLON>)] //btw: LOOKAHEAD(2) in the code means two tokens of lookahead, i.e. LL(2)
         (Term|<LPAREN> Query <RPAREN> (<CARAT> <NUMBER>)?)  //Clause. ??? The grammar here is a little odd: it appears to allow something like *:(*:dog), which is strange
Term::=(
    (<TERM>|<STAR>|<PREFIXTERM>|<WILDTERM>|<NUMBER>) [<FUZZY_SLOP>] [<CARAT> <NUMBER> [<FUZZY_SLOP>]]
    | ( <RANGEIN_START> (<RANGEIN_GOOP>|<RANGEIN_QUOTED>) [ <RANGEIN_TO> ] (<RANGEIN_GOOP>|<RANGEIN_QUOTED>) <RANGEIN_END> ) [ <CARAT> boost=<NUMBER> ] //This shows that a range must have both endpoints; it cannot have only one
    | ( <RANGEEX_START> (<RANGEEX_GOOP>|<RANGEEX_QUOTED>) [ <RANGEEX_TO> ] (<RANGEEX_GOOP>|<RANGEEX_QUOTED>) <RANGEEX_END> ) [ <CARAT> boost=<NUMBER> ] //As in RangeQuery, [ or { determines whether the boundary values themselves are included: "[begin TO end]" or "{begin TO end}"
    | <QUOTED> [ <FUZZY_SLOP> ] [ <CARAT> boost=<NUMBER> ] //Content enclosed in double quotes
)


btw: in javacc, an expansion wrapped in [] is optional: it may occur 0 times or 1 time<br />
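The range alternatives of Term can be tried out with a small hand-written recognizer. The sketch below is not Lucene's generated parser: the class RangeTermSketch and its Range holder are hypothetical names, quoted endpoints are left out, and endpoints are split on whitespace for simplicity. It only mirrors the shape of the production: two mandatory endpoints, an optional TO between them, and an optional ^boost after the closing bracket.

```java
import java.util.regex.Pattern;

public class RangeTermSketch {
    // Hypothetical result holder for a parsed range term.
    static final class Range {
        final String lower;
        final String upper;
        final boolean inclusive; // true for [..], false for {..}
        final float boost;

        Range(String lower, String upper, boolean inclusive, float boost) {
            this.lower = lower;
            this.upper = upper;
            this.inclusive = inclusive;
            this.boost = boost;
        }
    }

    // NUMBER token, assuming _NUM_CHAR is a digit 0-9.
    private static final Pattern NUMBER = Pattern.compile("[0-9]+(\\.[0-9]+)?");

    static Range parse(String text) {
        String s = text.trim();
        if (s.isEmpty() || (s.charAt(0) != '[' && s.charAt(0) != '{')) {
            throw new IllegalArgumentException("not a range term: " + text);
        }
        boolean inclusive = s.charAt(0) == '[';
        int end = s.indexOf(inclusive ? ']' : '}');
        if (end < 0) {
            throw new IllegalArgumentException("missing closing bracket: " + text);
        }
        // GOOP tokens cannot contain spaces, so whitespace separates them.
        String[] parts = s.substring(1, end).trim().split("\\s+");
        String lower;
        String upper;
        if (parts.length == 3 && parts[1].equals("TO")) {
            lower = parts[0];
            upper = parts[2];
        } else if (parts.length == 2) {
            lower = parts[0]; // TO is optional in the grammar
            upper = parts[1];
        } else {
            throw new IllegalArgumentException("a range needs two endpoints: " + text);
        }
        // "TO" is lexed as the TO token, so it cannot serve as an endpoint.
        if (lower.equals("TO") || upper.equals("TO")) {
            throw new IllegalArgumentException("a range needs two endpoints: " + text);
        }
        float boost = 1.0f;
        String rest = s.substring(end + 1);
        if (!rest.isEmpty()) {
            if (rest.charAt(0) != '^' || !NUMBER.matcher(rest.substring(1)).matches()) {
                throw new IllegalArgumentException("bad boost: " + text);
            }
            boost = Float.parseFloat(rest.substring(1));
        }
        return new Range(lower, upper, inclusive, boost);
    }

    public static void main(String[] args) {
        Range r = parse("[20080101 TO 20080227]^2");
        System.out.println(r.lower + " .. " + r.upper
                + " inclusive=" + r.inclusive + " boost=" + r.boost);
    }
}
```

An input such as "[20080101]" or "[20080101 TO]" is rejected, which matches the observation above that a range cannot have only one endpoint.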



]]>
A sorted heap algorithm with log(n) complexity</title><link>http://www.tkk7.com/tim-wu/archive/2008/01/09/174073.html</link><dc:creator>鹏飞万里</dc:creator><author>鹏飞万里</author><pubDate>Wed, 09 Jan 2008 09:32:00 GMT</pubDate><guid>http://www.tkk7.com/tim-wu/archive/2008/01/09/174073.html</guid><wfw:comment>http://www.tkk7.com/tim-wu/comments/174073.html</wfw:comment><comments>http://www.tkk7.com/tim-wu/archive/2008/01/09/174073.html#Feedback</comments><slash:comments>2</slash:comments><wfw:commentRss>http://www.tkk7.com/tim-wu/comments/commentRss/174073.html</wfw:commentRss><trackback:ping>http://www.tkk7.com/tim-wu/services/trackbacks/174073.html</trackback:ping><description><![CDATA[<p>Today I read PriorityQueue.java in Lucene: a very neat sorted heap with log(n) complexity.<br /> <br /> It always ensures that, within the array A[1...n]:<br /> A[i] <= A[2*i]  & A[i] <= A[2*i +1]<br /> From this it follows easily that A[1] must be the smallest value, and that each put() and pop() moves at most log(n) values.<br /> <br /> It has been a long time since I last touched algorithms :)</p> <img src ="http://www.tkk7.com/tim-wu/aggbug/174073.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.tkk7.com/tim-wu/" target="_blank">鹏飞万里</a> 2008-01-09 17:32 <a href="http://www.tkk7.com/tim-wu/archive/2008/01/09/174073.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>
</body>