[轉(zhuǎn)]Ketama一致性Hash算法(含Java代碼)

From:http://www.iteye.com/topic/684087

一致性哈希算法(Consistent Hashing Algorithm)是一種分布式算法，常用于負載均衡。Memcached client也選擇這種算法，解決將key-value均勻分配到眾多Memcached server上的問題。它可以取代傳統(tǒng)的取模操作，解決了取模操作無法應(yīng)對增刪Memcached Server的問題(增刪server會導致同一個key,在get操作時分配不到數(shù)據(jù)真正存儲的server，命中率會急劇下降)，詳細的介紹在這篇帖子中http://www.iteye.com/topic/611976(后文指代這篇文章的地方均稱為引文)。

［下面以Memcached的分布式問題為討論點，但將Memcached server抽象為節(jié)點(Node)］
引文中描述的一致性Hash算法有個潛在的問題是:
     將節(jié)點hash后會不均勻地分布在環(huán)上，這樣大量key在尋找節(jié)點時，會存在key命中各個節(jié)點的概率差別較大，無法實現(xiàn)有效的負載均衡。
     如有三個節(jié)點Node1,Node2,Node3，分布在環(huán)上時三個節(jié)點挨的很近，落在環(huán)上的key尋找節(jié)點時，大量key順時針總是分配給Node2，而其它兩個節(jié)點被找到的概率都會很小。

這種問題的解決方案可以有:
     改善Hash算法，均勻分配各節(jié)點到環(huán)上；［引文］使用虛擬節(jié)點的思想，為每個物理節(jié)點（服務(wù)器）在圓上分配100～200個點。這樣就能抑制分布不均勻，最大限度地減小服務(wù)器增減時的緩存重新分布。用戶數(shù)據(jù)映射在虛擬節(jié)點上，就表示用戶數(shù)據(jù)真正存儲位置是在該虛擬節(jié)點代表的實際物理服務(wù)器上。

在查看Spy Memcached client時，發(fā)現(xiàn)它采用一種稱為Ketama的Hash算法，以虛擬節(jié)點的思想，解決Memcached的分布式問題。

對Ketama的介紹
Ketama is an implementation of a consistent hashing algorithm, meaning you can add or remove servers from the memcached pool without causing a complete remap of all keys.
Here’s how it works:
* Take your list of servers (eg: 1.2.3.4:11211, 5.6.7.8:11211, 9.8.7.6:11211)
* Hash each server string to several (100-200) unsigned ints
* Conceptually, these numbers are placed on a circle called the continuum. (imagine a clock face that goes from 0 to 2^32)
* Each number links to the server it was hashed from, so servers appear at several points on the continuum, by each of the numbers they hashed to.
* To map a key->server, hash your key to a single unsigned int, and find the next biggest number on the continuum. The server linked to that number is the correct server for that key.
* If you hash your key to a value near 2^32 and there are no points on the continuum greater than your hash, return the first server in the continuum.
If you then add or remove a server from the list, only a small proportion of keys end up mapping to different servers.

下面以Spy Memcached中的代碼為例來說明這種算法的使用

該client采用TreeMap存儲所有節(jié)點，模擬一個環(huán)形的邏輯關(guān)系。在這個環(huán)中，節(jié)點之前是存在順序關(guān)系的，所以TreeMap的key必須實現(xiàn)Comparator接口。
那節(jié)點是怎樣放入這個環(huán)中的呢？

///////////////////////////////////////////////////////////////////////
//對所有節(jié)點，生成nCopies個虛擬結(jié)點
for(Node node : nodes) {
   //每四個虛擬結(jié)點為一組，為什么這樣？下面會說到
   for(int i=0; i<nCopies / 4; i++) {
    //getKeyForNode方法為這組虛擬結(jié)點得到惟一名稱
    byte[] digest=HashAlgorithm.computeMd5(getKeyForNode(node, i));
   /** Md5是一個16字節(jié)長度的數(shù)組，將16字節(jié)的數(shù)組每四個字節(jié)一組，
      分別對應(yīng)一個虛擬結(jié)點，這就是為什么上面把虛擬結(jié)點四個劃分一組的原因*/
  for(int h=0;h<4;h++) {
    //對于每四個字節(jié)，組成一個long值數(shù)值，做為這個虛擬節(jié)點的在環(huán)中的惟一key
   Long k = ((long)(digest[3+h*4]&0xFF) << 24)
    | ((long)(digest[2+h*4]&0xFF) << 16)
    | ((long)(digest[1+h*4]&0xFF) << 8)
    | (digest[h*4]&0xFF);

   allNodes.put(k, node);
  }
}
}

///////////////////////////////////////////////////////////////////////
上面的流程大概可以這樣歸納:四個虛擬結(jié)點為一組，以getKeyForNode方法得到這組虛擬節(jié)點的name，Md5編碼后，每個虛擬結(jié)點對應(yīng)Md5碼 16個字節(jié)中的4個，組成一個long型數(shù)值，做為這個虛擬結(jié)點在環(huán)中的惟一key。第12行k為什么是Long型的呢？呵呵，就是因為Long型實現(xiàn)了 Comparator接口。

處理完正式結(jié)點在環(huán)上的分布后，可以開始key在環(huán)上尋找節(jié)點的游戲了。
對于每個key還是得完成上面的步驟:計算出Md5，根據(jù)Md5的字節(jié)數(shù)組，通過Kemata Hash算法得到key在這個環(huán)中的位置。

///////////////////////////////////////////////////////////////////////
final Node rv;
byte[] digest = hashAlg.computeMd5(keyValue);
Long key = hashAlg.hash(digest, 0);
//如果找到這個節(jié)點，直接取節(jié)點，返回
if(!ketamaNodes.containsKey(key)) {
//得到大于當前key的那個子Map，然后從中取出第一個key，就是大于且離它最近的那個key
   SortedMap<Long, Node> tailMap=ketamaNodes.tailMap(key);
   if(tailMap.isEmpty()) {
    key=ketamaNodes.firstKey();
} else {
  key=tailMap.firstKey();
}
//在JDK1.6中，ceilingKey方法可以返回大于且離它最近的那個key
//For JDK1.6 version
//          key = ketamaNodes.ceilingKey(key);
//          if (key == null) {
//              key = ketamaNodes.firstKey();
//          }
}

rv=allNodes.get(key);

///////////////////////////////////////////////////////////////////////

引文中已詳細描述過這種取節(jié)點邏輯:在環(huán)上順時針查找，如果找到某個節(jié)點，就返回那個節(jié)點;如果沒有找到，則取整個環(huán)的第一個節(jié)點。

測試結(jié)果
測試代碼是自己整理的，主體方法沒有變

分布平均性測試:測試隨機生成的眾多key是否會平均分布到各個結(jié)點上
測試結(jié)果如下:
1.Nodes count : 5, Keys count : 100000, Normal percent : 20.0%
2.-------------------- boundary ----------------------
3.Node name :node1 - Times : 20821 - Percent : 20.821001%
4.Node name :node3 - Times : 19018 - Percent : 19.018%
5.Node name :node5 - Times : 19726 - Percent : 19.726%
6.Node name :node2 - Times : 19919 - Percent : 19.919%
7.Node name :node4 - Times : 20516 - Percent : 20.516%
最上面一行是參數(shù)說明，節(jié)點數(shù)目，總共有多少key，每個節(jié)點應(yīng)該分配key的比例是多少。下面是每個結(jié)點分配到key的數(shù)目和比例。
多次測試后發(fā)現(xiàn)，這個Hash算法的節(jié)點分布還是不錯的，都在標準比例左右徘徊，是個合適的負載均衡算法。

節(jié)點增刪測試:在環(huán)上插入N個結(jié)點，每個節(jié)點nCopies個虛擬結(jié)點。隨機生成眾多key，在增刪節(jié)點時，測試同一個key選擇相同節(jié)點的概率
測試如果如下:
1.Normal case : nodes count : 50
2.Added case : nodes count : 51
3.Reduced case : nodes count : 49
4.------------ boundary -------------
5.Same percent in added case : 93.765%
6.Same percent in reduced case : 93.845%
上面三行分別是正常情況，節(jié)點增加，節(jié)點刪除情況下的節(jié)點數(shù)目。下面兩行表示在節(jié)點增加和刪除情況下，同一個key分配在相同節(jié)點上的比例(命中率)。
多次測試后發(fā)現(xiàn)，命中率與結(jié)點數(shù)目和增減的節(jié)點數(shù)量有關(guān)。同樣增刪結(jié)點數(shù)目情況下，結(jié)點多時命中率高。同樣節(jié)點數(shù)目，增刪結(jié)點越少，命中率越高。這些都與實際情況相符。

附件為Ketama算法的Java代碼及測試代碼
"Ketama_Hashing_Algorithm.rar" http://vdisk.weibo.com/s/x4gf

posted on 2012-11-15 11:15 小馬歌閱讀(296) 評論(0) 編輯收藏所屬分類: web tools 、java groovy 、performance

新用戶注冊刷新評論列表


只有注冊用戶登錄后才能發(fā)表評論。




網(wǎng)站導航: 博客園 IT新聞 Chat2DB C++博客博問管理
相關(guān)文章: Jquery Ajax和getJSON獲取后臺普通Json數(shù)據(jù)和層級Json數(shù)據(jù)解析 activemq 監(jiān)控服務(wù)監(jiān)控微信通知 log4j實現(xiàn)日志集中存儲 GYP 簡介 php讀取網(wǎng)絡(luò)文件 curl, fsockopen ,file_get_contents 幾個方法的效率對比 thrift rpc 使用常見問題解答和經(jīng)驗 facebook是如何管理代碼的 php 項目開發(fā)基礎(chǔ)工具 [開源推薦]Google新的Web UI庫：Polymer 【轉(zhuǎn)】

My Links

Blog Stats

留言簿(26)

隨筆分類

文章分類

文章檔案

博客連接

搜索

最新評論

閱讀排行榜

評論排行榜

[轉(zhuǎn)]Ketama一致性Hash算法(含Java代碼)

From:http://www.iteye.com/topic/684087