ivaneeo's blog

The power of freedom, a life of freedom.


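Uploading files over HTTP on Android

When writing Android clients (especially SNS-style clients), you often need a registration Activity in which the user enters a user name, password, and e-mail address and picks a photo before registering. This raises a question: in HTML a form handles such a registration and the browser automatically packages the data into a complete HTTP request, but how do we package these parameters and the files to upload into an HTTP request on Android?

Let's start with an experiment to see what exactly a form submission contains.

Step 1: Write a Servlet that saves the HTTP request body it receives to a file: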

public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // Input stream carrying the entity body of the HTTP request
    ServletInputStream sis = request.getInputStream();
    FileOutputStream fos = new FileOutputStream("d:\\file.log");
    try {
        // Buffer
        byte[] buffer = new byte[1024];
        int len;
        // Copy the stream into file.log until the end of the stream is reached
        while ((len = sis.read(buffer, 0, buffer.length)) != -1) {
            fos.write(buffer, 0, len);
        }
    } finally {
        fos.close();
        sis.close();
    }
}

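Step 2: Create a form page like the following:

<form action="servlet/ReceiveFile" method="post" enctype="multipart/form-data">
First parameter <input type="text" name="name1"/> <br/>
Second parameter <input type="text" name="name2"/> <br/>
First file to upload <input type="file" name="file1"/> <br/>
Second file to upload <input type="file" name="file2"/> <br/>
<input type="submit" value="Submit">
</form>

Note: because files are being uploaded, enctype must be set to multipart/form-data; otherwise the file contents are not sent.

Step 3: Fill in the form and click "Submit", then find file.log on drive D and open it in Notepad. The data looks like this: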

-----------------------------7d92221b604bc
Content-Disposition: form-data; name="name1"

hello
-----------------------------7d92221b604bc
Content-Disposition: form-data; name="name2"

world
-----------------------------7d92221b604bc
Content-Disposition: form-data; name="file1"; filename="C:\2.GIF"
Content-Type: image/gif

GIF89a [binary GIF data, not displayable as text]
-----------------------------7d92221b604bc
Content-Disposition: form-data; name="file2"; filename="C:\2.txt"
Content-Type: text/plain

hello everyone!!!
-----------------------------7d92221b604bc--

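From the form source we can see that four items are uploaded: the parameters name1 and name2, and the files file1 and file2.

First look at the two parameters name1 and name2 in file.log. Open file.log in UltraEdit this time (some characters cannot be displayed in Notepad, so a hex editor is needed).

Combining the hex data with what Notepad shows, the format of a parameter part is:

1. Line 1: the delimiter "-----------------------------7d92221b604bc", followed by "\r\n" (shown as 0D 0A in the hex editor), i.e. carriage return plus line feed.

2. Line 2:

(1) First the header "Content-Disposition: form-data;", which indicates that form data is being uploaded.

(2) name="name1", the name of the parameter.

(3) "\r\n" (0D 0A in the hex editor).

3. Line 3: "\r\n" (0D 0A in the hex editor), i.e. an empty line.

4. Line 4: the value of the parameter, followed by "\r\n" (0D 0A in the hex editor).

Every parameter in the form upload is encoded into the request body following items 1 to 4.

In the same way, the format of a file part is:

1. Line 1: the delimiter "-----------------------------7d92221b604bc", followed by "\r\n" (0D 0A in the hex editor).

2. Line 2:

a) First the header "Content-Disposition: form-data;", which indicates that form data is being uploaded.

b) name="file2"; the name of the form field.

c) filename="C:\2.txt", the name of the uploaded file as sent by the browser.

d) "\r\n" (0D 0A in the hex editor).

3. Line 3: the entity header "Content-Type: text/plain", which states the format of the entity content being received. Many common file formats are in use, and each has been given a standard name called a MIME type; MIME stands for "Multipurpose Internet Mail Extensions".

4. Line 4: "\r\n" (0D 0A in the hex editor), i.e. an empty line.

5. From line 5 on: the binary content of the uploaded file, followed by "\r\n".

6. After the last part comes the end marker "-----------------------------7d92221b604bc--". Note that it differs from the delimiter only by the extra "--" at the end.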
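The two part formats above can be sketched in plain Java. This is only an illustration under stated assumptions: the class MultipartSketch and its methods are hypothetical names that do not appear in the article, and the boundary argument is the boundary value without the two leading hyphens that start every delimiter line.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the article): assembles a multipart/form-data body in memory.
final class MultipartSketch {
    private static final String CRLF = "\r\n";

    private static void put(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    // Parameter part: delimiter line, Content-Disposition line, empty line, value.
    static void writeField(ByteArrayOutputStream out, String boundary, String name, String value) {
        put(out, "--" + boundary + CRLF);
        put(out, "Content-Disposition: form-data; name=\"" + name + "\"" + CRLF);
        put(out, CRLF);
        put(out, value + CRLF);
    }

    // File part: like a parameter part, plus filename and a Content-Type line before the raw bytes.
    static void writeFile(ByteArrayOutputStream out, String boundary, String name,
                          String filename, String contentType, byte[] data) {
        put(out, "--" + boundary + CRLF);
        put(out, "Content-Disposition: form-data; name=\"" + name + "\"; filename=\""
                + filename + "\"" + CRLF);
        put(out, "Content-Type: " + contentType + CRLF);
        put(out, CRLF);
        out.write(data, 0, data.length);
        put(out, CRLF);
    }

    // Builds a body with one parameter and one file, mirroring the captured file.log.
    static byte[] build(String boundary) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeField(out, boundary, "name1", "hello");
        writeFile(out, boundary, "file2", "2.txt", "text/plain",
                "hello everyone!!!".getBytes(StandardCharsets.UTF_8));
        put(out, "--" + boundary + "--" + CRLF); // end marker closes the whole body
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = build("---------------------------7d92221b604bc");
        System.out.print(new String(body, StandardCharsets.UTF_8));
    }
}
```

Writing the file bytes directly, rather than appending them to a String, keeps binary content such as the GIF intact.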

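One question remains: how is the delimiter "-----------------------------7d92221b604bc" determined? Does it have to be the string "7d92221b604bc"?

So far we have only looked at the entity body of the HTTP request. A tool that shows the complete HTTP request may give us a clue.

HttpWatch in IE, or the HttpFox add-on in Firefox, can capture the traffic of a web page. Figure 4 shows that the string used as the delimiter is declared in the Content-Type header. Each delimiter line in the body is "--" followed by that boundary value, so any string that does not occur in the uploaded data can serve as the boundary.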
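As a rough sketch of that relationship, the snippet below pulls the boundary parameter out of a Content-Type header value and derives the delimiter and end marker from it. BoundarySketch and boundaryOf are hypothetical names, the header value in main is reconstructed from the captured body rather than copied from the capture, and the parsing is deliberately simple (it would not cope with a quoted boundary that contains a semicolon).

```java
import java.util.Locale;

// Hypothetical helper (not from the article): reads the boundary parameter of a Content-Type value.
final class BoundarySketch {
    static String boundaryOf(String contentType) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("boundary=")) {
                String value = p.substring("boundary=".length());
                // The value may be wrapped in double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null; // no boundary parameter present
    }

    public static void main(String[] args) {
        // Assumed header value, reconstructed from the body shown in file.log.
        String header = "multipart/form-data; boundary=---------------------------7d92221b604bc";
        String boundary = boundaryOf(header);
        System.out.println("delimiter:  --" + boundary);
        System.out.println("end marker: --" + boundary + "--");
    }
}
```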

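Based on the rules summarized above for passing parameters and uploading files in a registration form, we can write a utility class that implements user registration on Android (filling in personal information and uploading a picture).

First, a JavaBean class FormFile holds the information about a file: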

public class FormFile {
    /* Data of the file to upload */
    private byte[] data;
    /* File name */
    private String filname;
    /* Form field name */
    private String formname;
    /* Content type; this is only the default, look up the proper MIME type for the file */
    private String contentType = "application/octet-stream";

    public FormFile(String filname, byte[] data, String formname, String contentType) {
        this.data = data;
        this.filname = filname;
        this.formname = formname;
        if (contentType != null) this.contentType = contentType;
    }
    public byte[] getData() {
        return data;
    }
    public void setData(byte[] data) {
        this.data = data;
    }
    public String getFilname() {
        return filname;
    }
    public void setFilname(String filname) {
        this.filname = filname;
    }
    public String getFormname() {
        return formname;
    }
    public void setFormname(String formname) {
        this.formname = formname;
    }
    public String getContentType() {
        return contentType;
    }
    public void setContentType(String contentType) {
        this.contentType = contentType;
    }
}

The code that performs the upload is as follows:

/**
 * Submits data to the server directly over HTTP, emulating a form submission.
 * Needs java.io.DataOutputStream, java.io.InputStream, java.net.HttpURLConnection,
 * java.net.URL and java.util.Map.
 * @param actionUrl upload URL
 * @param params request parameters; the key is the parameter name, the value is the parameter value
 * @param files files to upload
 */
public static String post(String actionUrl, Map<String, String> params, FormFile[] files) {
    try {
        String BOUNDARY = "---------7d4a6d158c9"; // boundary string
        String MULTIPART_FORM_DATA = "multipart/form-data";

        URL url = new URL(actionUrl);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoInput(true); // allow input
        conn.setDoOutput(true); // allow output
        conn.setUseCaches(false); // do not use the cache
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Connection", "Keep-Alive");
        conn.setRequestProperty("Charset", "UTF-8");
        conn.setRequestProperty("Content-Type", MULTIPART_FORM_DATA + "; boundary=" + BOUNDARY);

        StringBuilder sb = new StringBuilder();

        // Form parameter parts; see the format described in the article
        for (Map.Entry<String, String> entry : params.entrySet()) {
            sb.append("--");
            sb.append(BOUNDARY);
            sb.append("\r\n");
            sb.append("Content-Disposition: form-data; name=\"" + entry.getKey() + "\"\r\n\r\n");
            sb.append(entry.getValue());
            sb.append("\r\n");
        }
        DataOutputStream outStream = new DataOutputStream(conn.getOutputStream());
        outStream.write(sb.toString().getBytes("UTF-8")); // send the form fields

        // File parts; see the format described in the article
        for (FormFile file : files) {
            StringBuilder split = new StringBuilder();
            split.append("--");
            split.append(BOUNDARY);
            split.append("\r\n");
            split.append("Content-Disposition: form-data; name=\"" + file.getFormname()
                    + "\"; filename=\"" + file.getFilname() + "\"\r\n");
            split.append("Content-Type: " + file.getContentType() + "\r\n\r\n");
            outStream.write(split.toString().getBytes("UTF-8"));
            outStream.write(file.getData(), 0, file.getData().length);
            outStream.write("\r\n".getBytes("UTF-8"));
        }
        byte[] end_data = ("--" + BOUNDARY + "--\r\n").getBytes("UTF-8"); // end marker
        outStream.write(end_data);
        outStream.flush();
        outStream.close();
        int cah = conn.getResponseCode();
        if (cah != 200) throw new RuntimeException("Request to URL failed");
        InputStream is = conn.getInputStream();
        int ch;
        StringBuilder b = new StringBuilder();
        // Reads the response byte by byte; this is only correct for single-byte characters
        while ((ch = is.read()) != -1) {
            b.append((char) ch);
        }
        is.close();
        conn.disconnect();
        return b.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

posted @ 2011-06-09 16:26 ivaneeo | Views (3327) | Comments (0)

FreeMarker Configuration instances and the MRU cache

1. Configurations with the same settings (set...) can share a single instance across the whole application:

Create a configuration instance

First you have to create a freemarker.template.Configuration instance and adjust its settings. A Configuration instance is a central place to store the application level settings of FreeMarker. Also, it deals with the creation and caching of pre-parsed templates.

Probably you will do it only once at the beginning of the application (possibly servlet) life-cycle.

2. Configurations with different settings (set...) should be separate, independent instances:

From now you should use this single configuration instance. Note however that if a system has multiple independent components that use FreeMarker, then of course they will use their own private Configuration instance.

3. A shared Configuration instance makes good use of the MRU cache:

Multithreading

In a multithreaded environment Configuration instances, Template instances and data models should be handled as immutable (read-only) objects. That is, you create and initialize them (for example with set... methods), and then you don't modify them later (e.g. you don't call set...). This allows us to avoid expensive synchronized blocks in a multithreaded environment. Beware with Template instances; when you get a Template instance with Configuration.getTemplate, you may get an instance from the template cache that is already used by other threads, so do not call its set... methods (calling process is of course fine).

The above restrictions do not apply if you access all objects from the same single thread only.

4. Enabling the MRU cache strategy

Template caching

FreeMarker caches templates (assuming you use the Configuration methods to create Template objects). This means that when you call getTemplate, FreeMarker not only returns the resulting Template object, but stores it in a cache, so when next time you call getTemplate with the same (or equivalent) path, it just returns the cached Template instance, and will not load and parse the template file again.

cfg.setCacheStorage(new freemarker.cache.MruCacheStorage(20, 250))

Or, since MruCacheStorage is the default cache storage implementation:

cfg.setSetting(Configuration.CACHE_STORAGE_KEY, "strong:20, soft:250");

When you create a new Configuration object, initially it uses an MruCacheStorage where maxStrongSize is 0, and maxSoftSize is Integer.MAX_VALUE (that is, in practice, infinite). But using non-0 maxStrongSize is maybe a better strategy for high load servers, since it seems that, with only softly referenced items, JVM tends to cause just higher resource consumption if the resource consumption was already high, because it constantly throws frequently used templates from the cache, which then have to be re-loaded and re-parsed.

5. How the MRU (Most Recently Used) cache automatically picks up template changes

If you change the template file, then FreeMarker will re-load and re-parse the template automatically when you get the template next time. However, since checking if the file has been changed can be time consuming, there is a Configuration level setting called "update delay". This is the time that must elapse since the last checking for a newer version of a certain template before FreeMarker will check that again. This is set to 5 seconds by default. If you want to see the changes of templates immediately, set it to 0. Note that some template loaders may have problems with template updating. For example, class-loader based template loaders typically do not notice that you have changed the template file.

6. The two-level caching strategy of the MRU cache

A template will be removed from the cache if you call getTemplate and FreeMarker realizes that the template file has been removed meanwhile. Also, if the JVM thinks that it begins to run out of memory, by default it can arbitrarily drop templates from the cache. Furthermore, you can empty the cache manually with the clearTemplateCache method of Configuration.

The actual strategy of when a cached template should be thrown away is pluggable with the cache_storage setting, by which you can plug any CacheStorage implementation. For most users freemarker.cache.MruCacheStorage will be sufficient. This cache storage implements a two-level Most Recently Used cache. In the first level, items are strongly referenced up to the specified maximum (strongly referenced items can't be dropped by the JVM, as opposed to softly referenced items). When the maximum is exceeded, the least recently used item is moved into the second level cache, where they are softly referenced, up to another specified maximum. The size of the strong and soft parts can be specified with the constructor, as in the setCacheStorage call shown in section 4, which sets the size of the strong part to 20 and the size of the soft part to 250.
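To make the two-level idea concrete, here is a minimal sketch built only on the JDK. It is not FreeMarker's MruCacheStorage code; TwoLevelCacheSketch and its methods are hypothetical names, and details such as null values or concurrent tuning are ignored.

```java
import java.lang.ref.SoftReference;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a strong level that demotes its least recently used entry to a soft level.
final class TwoLevelCacheSketch<K, V> {
    private final int maxStrong;
    private final int maxSoft;
    // Access-ordered maps: iteration starts at the least recently used entry.
    private final LinkedHashMap<K, V> strong = new LinkedHashMap<K, V>(16, 0.75f, true);
    private final LinkedHashMap<K, SoftReference<V>> soft =
            new LinkedHashMap<K, SoftReference<V>>(16, 0.75f, true);

    TwoLevelCacheSketch(int maxStrong, int maxSoft) {
        this.maxStrong = maxStrong;
        this.maxSoft = maxSoft;
    }

    synchronized void put(K key, V value) {
        soft.remove(key);
        strong.put(key, value);
        // Demote least recently used strong entries into the soft level.
        while (strong.size() > maxStrong) {
            Iterator<Map.Entry<K, V>> it = strong.entrySet().iterator();
            Map.Entry<K, V> eldest = it.next();
            K k = eldest.getKey();
            V v = eldest.getValue();
            it.remove();
            soft.put(k, new SoftReference<V>(v));
        }
        // Drop least recently used soft entries beyond the soft maximum.
        while (soft.size() > maxSoft) {
            Iterator<K> it = soft.keySet().iterator();
            it.next();
            it.remove();
        }
    }

    synchronized V get(K key) {
        V value = strong.get(key);
        if (value != null) return value;
        SoftReference<V> ref = soft.remove(key);
        if (ref == null) return null;
        value = ref.get();                  // null if the JVM already cleared the reference
        if (value != null) put(key, value); // promote back to the strong level
        return value;
    }

    synchronized int strongSize() { return strong.size(); }
    synchronized int softSize() { return soft.size(); }

    public static void main(String[] args) {
        TwoLevelCacheSketch<String, String> cache = new TwoLevelCacheSketch<String, String>(2, 3);
        cache.put("a.ftl", "template A");
        cache.put("b.ftl", "template B");
        cache.put("c.ftl", "template C"); // a.ftl is demoted to the soft level
        System.out.println(cache.strongSize() + " strong, " + cache.softSize() + " soft");
    }
}
```

With a strong size of 0, every entry lands in the soft level immediately, which corresponds to the default described above where only soft references are used.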

posted @ 2011-06-09 15:50 ivaneeo | Views (732) | Comments (0)

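Hadoop error: Incompatible namespaceIDs

This morning I suddenly found that the -put command could no longer upload data to HDFS and reported a pile of errors, so I checked the system status with bin/hadoop dfsadmin -report: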

admin@adw1:/home/admin/joe.wangh/hadoop-0.19.2>bin/hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: ?%

-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)

Shut down Hadoop with bin/stop-all.sh:

admin@adw1:/home/admin/joe.wangh/hadoop-0.19.2>bin/stop-all.sh
stopping jobtracker
172.16.197.192: stopping tasktracker
172.16.197.193: stopping tasktracker
stopping namenode
172.16.197.193: no datanode to stop
172.16.197.192: no datanode to stop
172.16.197.191: stopping secondarynamenode

As you can see, the datanodes had never started in the first place. Check the log on a datanode:

admin@adw2:/home/admin/joe.wangh/hadoop-0.19.2/logs>vi hadoop-admin-datanode-adw2.hst.ali.dw.alidc.net.log

************************************************************/
2010-07-21 10:12:11,987 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/admin/joe.wangh/hadoop/data/dfs.data.dir: namenode namespaceID = 898136669; datanode namespaceID = 2127444065
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:233)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:148)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:288)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:206)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1239)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1194)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1202)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1324)
......

The error says that the namespaceIDs do not match.

Two solutions are given below; I used the second one.

Workaround 1: Start from scratch

I can testify that the following steps solve this error, but the side effects won't make you happy (me neither). The crude workaround I have found is to:

1. stop the cluster

2. delete the data directory on the problematic datanode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data

3. reformat the namenode (NOTE: all HDFS data is lost during this process!)

4. restart the cluster

When deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be ok during the initial setup/testing), you might give the second approach a try.

Workaround 2: Updating namespaceID of problematic datanodes

Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is "minimally invasive" as you only have to edit one file on the problematic datanodes:

1. stop the datanode

2. edit the value of namespaceID in <dfs.data.dir>/current/VERSION to match the value of the current namenode

3. restart the datanode

If you followed the instructions in my tutorials, the full path of the relevant file is /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data/current/VERSION (background: dfs.data.dir is by default set to ${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir to /usr/local/hadoop-datastore/hadoop-hadoop).

If you wonder what the contents of VERSION look like, here's one of mine:

#contents of <dfs.data.dir>/current/VERSION

namespaceID=393514426

storageID=DS-1706792599-10.10.10.1-50010-1204306713481

cTime=1215607609074

storageType=DATA_NODE

layoutVersion=-13
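The edit in step 2 of the second workaround amounts to rewriting one line of that file. The sketch below shows the idea on a list of lines; VersionFileSketch and withNamespaceId are hypothetical names, and the example pairs the sample VERSION contents above with the namenode namespaceID from the error log purely for illustration. Applying it to a real file (for example via java.nio.file.Files.readAllLines and Files.write) should only be done while the datanode is stopped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): replaces the namespaceID line of a VERSION file.
final class VersionFileSketch {
    static List<String> withNamespaceId(List<String> lines, long namenodeNamespaceId) {
        List<String> result = new ArrayList<String>();
        for (String line : lines) {
            if (line.startsWith("namespaceID=")) {
                result.add("namespaceID=" + namenodeNamespaceId);
            } else {
                result.add(line); // every other line is kept unchanged
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> version = Arrays.asList(
                "namespaceID=393514426",
                "storageID=DS-1706792599-10.10.10.1-50010-1204306713481",
                "cTime=1215607609074",
                "storageType=DATA_NODE",
                "layoutVersion=-13");
        // 898136669 is the namenode namespaceID reported in the datanode log above.
        for (String line : withNamespaceId(version, 898136669L)) {
            System.out.println(line);
        }
    }
}
```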

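Cause: every namenode format creates a new namespaceID, while tmp/dfs/data still holds the ID from the previous format. Formatting the namenode clears the namenode's data but not the datanodes' data, so startup fails. The fix is to clear all directories under tmp before each format.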

posted @ 2011-06-09 14:20 ivaneeo | Views (566) | Comments (0)

ZooKeeper reconnection code

private void buildZK() {
    System.out.println("Build zk client");
    try {
        zk = new ZooKeeper(zookeeperConnectionString, 10000, this);
        Stat s = zk.exists(rootPath, false);
        if (s == null) {
            zk.create(rootPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            zk.create(rootPath + "/ELECTION", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        String value = zk.create(rootPath + "/ELECTION/n_", hostAddress, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    } catch (Exception e) {
        e.printStackTrace();
        System.err.println("Error connect to zoo keeper");
    }
}
public void process(WatchedEvent event) {
    System.out.println(event);
    if (event.getState() == Event.KeeperState.Expired) {
        // The session is gone, so a new ZooKeeper handle has to be created
        System.out.println("Zookeeper session expired.");
        buildZK();
    } else if (event.getState() == Event.KeeperState.Disconnected) {
        // The client library reconnects by itself while the session is still valid;
        // creating a new handle here would leave a second election node behind
        System.out.println("Zookeeper connection lost, waiting for automatic reconnect.");
    }
}

posted @ 2011-06-09 13:38 ivaneeo | Views (459) | Comments (0)

ZooKeeper configuration file

Editing the configuration

Copy conf/zoo_sample.cfg to conf/zoo.cfg and change the data directory in it.

# cat /opt/apps/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/opt/zkdata
clientPort=2181

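The settings are:

tickTime: the interval, in milliseconds, used for heartbeats between ZooKeeper servers and between servers and clients.
initLimit: the initial delay for leader election. A server needs some time to load its data at startup (especially when there is a lot of configuration data), so some time is needed to finish initialization before data is synchronized right after a leader is elected. It can be increased a little if needed. The delay is initLimit*tickTime, i.e. this value is a number of ticks.
syncLimit: the maximum response time between the leader and a follower, in ticks. If a follower does not respond within syncLimit*tickTime, the leader considers it dead and removes it from the server list.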
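Since initLimit and syncLimit are expressed in ticks, the effective timeouts follow from a multiplication. The sketch below reads the sample settings with java.util.Properties and computes them; ZooCfgTimeouts, parse, and limitMillis are hypothetical names, and this is not how ZooKeeper itself loads its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper (not part of ZooKeeper): converts tick-based limits to milliseconds.
final class ZooCfgTimeouts {
    static Properties parse(String cfgText) {
        Properties cfg = new Properties();
        try {
            cfg.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return cfg;
    }

    // Returns limit * tickTime, where limitKey is "initLimit" or "syncLimit".
    static long limitMillis(Properties cfg, String limitKey) {
        long tickTime = Long.parseLong(cfg.getProperty("tickTime").trim());
        long limit = Long.parseLong(cfg.getProperty(limitKey).trim());
        return tickTime * limit;
    }

    public static void main(String[] args) {
        Properties cfg = parse("tickTime=2000\ninitLimit=5\nsyncLimit=2\n"
                + "dataDir=/opt/zkdata\nclientPort=2181\n");
        System.out.println("initLimit in ms: " + limitMillis(cfg, "initLimit"));
        System.out.println("syncLimit in ms: " + limitMillis(cfg, "syncLimit"));
    }
}
```

For the values in the sample file this gives 5 * 2000 = 10000 ms for initLimit and 2 * 2000 = 4000 ms for syncLimit.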

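In standalone mode only the three parameters tickTime, dataDir, and clientPort are needed, which is handy for a single-machine debugging environment.

Cluster configuration

Add the entries for the other machines

# cat /opt/apps/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/opt/zkdata
clientPort=2181
server.1=10.11.5.202:2888:3888
server.2=192.168.105.218:2888:3888
server.3=192.168.105.65:2888:3888

Each server.X entry holds the parameters of one machine. X is a unique number such as 1, 2, or 3, and the value has the form IP:PORT:PORT. IP is the IP address or host name of the ZooKeeper server. The first PORT (e.g. 2888) is the port the servers use to exchange data, i.e. the port followers use to connect to the leader. The second PORT (e.g. 3888) is the port the servers use for leader election. To run a cluster on a single machine, give each server different ports.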
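As a small illustration of that layout, the following sketch splits one server.X line into its four pieces. ServerEntrySketch is a hypothetical class, not ZooKeeper's own parser, and it accepts only the three-field IP:PORT:PORT form shown here.

```java
// Hypothetical helper (not part of ZooKeeper): parses "server.X=IP:PORT:PORT".
final class ServerEntrySketch {
    final int id;            // X, must match the myid file on that machine
    final String host;       // IP address or host name
    final int peerPort;      // followers connect to the leader on this port
    final int electionPort;  // used for leader election

    ServerEntrySketch(int id, String host, int peerPort, int electionPort) {
        this.id = id;
        this.host = host;
        this.peerPort = peerPort;
        this.electionPort = electionPort;
    }

    static ServerEntrySketch parse(String line) {
        String[] keyValue = line.trim().split("=", 2);
        if (keyValue.length != 2 || !keyValue[0].startsWith("server.")) {
            throw new IllegalArgumentException("not a server.X entry: " + line);
        }
        int id = Integer.parseInt(keyValue[0].substring("server.".length()));
        String[] parts = keyValue[1].split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected IP:PORT:PORT in: " + line);
        }
        return new ServerEntrySketch(id, parts[0],
                Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
    }

    public static void main(String[] args) {
        ServerEntrySketch s = parse("server.1=10.11.5.202:2888:3888");
        System.out.println("id=" + s.id + " host=" + s.host
                + " peerPort=" + s.peerPort + " electionPort=" + s.electionPort);
    }
}
```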

Synchronizing the directory

# rsync --inplace -vzrtLp --delete-after --progress /opt/apps/zookeeper root@192.168.105.218:/opt/apps
# rsync --inplace -vzrtLp --delete-after --progress /opt/apps/zookeeper root@192.168.106.65:/opt/apps

Creating the id of each server

Note: the id must match the entries in zoo.cfg.

ssh root@10.11.5.202 'echo 1 > /opt/zkdata/myid'
ssh root@192.168.105.218 'echo 2 > /opt/zkdata/myid'
ssh root@192.168.106.65 'echo 3 > /opt/zkdata/myid'

Starting the servers

ssh root@10.11.5.202 '/opt/apps/zookeeper/bin/zkServer.sh start'
ssh root@192.168.105.218 '/opt/apps/zookeeper/bin/zkServer.sh start'
ssh root@192.168.106.65 '/opt/apps/zookeeper/bin/zkServer.sh start'

Firewall configuration

If the iptables firewall is enabled, add the following rules to /etc/sysconfig/iptables:

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2181 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2888 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3888 -j ACCEPT

Restart the firewall:

service iptables restart

posted @ 2011-06-08 18:07 ivaneeo | Views (1200) | Comments (0)

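HBase questions answered

Lately I have been asking myself questions and then resolving them by googling, reading code, and testing. These questions may help other people too.

Quora is a good Q&A site; have a look at the topics that interest you.

1) What does the TTL parameter in HBase mean?
TTL == "Time To Live". You can specify how long a cell lives in HBase.
Once its TTL has expired, it is removed.

2) Which configuration parameters affect read performance?

hbase-env.sh:
export HBASE_HEAPSIZE=4000

hbase-default.xml:
hfile.block.cache.size

3) Does HBase update the LruBlockCache on writes?

Judging from the code, writes do not update the LruBlockCache.

4) How do you mark an HBase CF as IN_MEMORY?
CF attributes can be given when the table is created: create 'taobao', {NAME => 'edp', IN_MEMORY => true}

5) The smallest unit HBase loads into the cache is a block.

6) If a block is loaded into the cache and never read again, it contributes nothing to the block cache hit ratio, so why is the hit ratio above 60%? (This was my original misunderstanding, and it has plenty of holes.)

Note that the block cache hit ratio is counted per record, while the unit of caching is the block. A block holds many records, and the later records benefit from the cache fill caused by reading the first one, which is why the hit ratio is above 60%.

7) With a single row and a single CF, does writing a large amount of data trigger a region split?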

<property>
<name>hbase.hregion.max.filesize</name>
<value>67108864</value>
<description>
Maximum HStoreFile size. If any one of a column families' HStoreFiles has
grown to exceed this value, the hosting HRegion is split in two.
Default: 256M.
</description>
</property>

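Test: with hbase.hregion.max.filesize set to 64 MB, I created a table with a single CF and wrote data only under one row + CF, about 80 MB in total (the web UI showed 107 MB), but no region split happened. This suggests that the smallest unit of a region split is the row key: there is only one row here, so even though the data volume exceeded the limit, the region was not split.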

posted @ 2011-06-08 18:02 ivaneeo | Views (742) | Comments (0)

Thinkpad T60 volume buttons in 9.04 and 9.10

Hi all.
I have a ThinkPad T60 with 9.10 installed. I searched the forums and found the workaround that uses the tpb package to fix the ThinkPad volume buttons issue.
My problems with that fix are:
-the tpb package depends on xosd (or something like that, NOT Notify-OSD), so the result is not the best...

-the tpb package is not necessary at all, because the thinkpad_acpi module can take care of the volume buttons as well; you just have to enable the hotkey mask! http://www.thinkwiki.org/wiki/Thinkpad-acpi

So my workaround on T60 (in terminal):
9.04 jaunty:

Code:

echo enable,0x00ffffff | sudo tee /proc/acpi/ibm/hotkey

9.10 karmic: (using sysfs): (also works on 10.04 and 10.10 as well...)

Code:

sudo cp /sys/devices/platform/thinkpad_acpi/hotkey_all_mask /sys/devices/platform/thinkpad_acpi/hotkey_mask

Update:
The solutions only work until the next reboot or suspend/resume cycle.
To make the change permanent, put the commands in:
/etc/rc.local
without sudo, of course.

Please confirm whether the solution works on other ThinkPad models.

As soon as I find solution for all the things I need on my T60 I will put it up on Thinkwiki and paste the link here.
(Active protection - hdaps)
(Trackpoint additional functions - you just have to install the: gpointing-device-settings package)
(fingerprint reader - thinkfinger)

Hope this helps someone.

posted @ 2011-05-31 15:16 ivaneeo | Views (308) | Comments (0)

Accessing Hadoop DFS for Data Storage and Retrieval Using Java

Distributed file systems (DFS) are a type of file system that provides some extra features over normal file systems. They are used for storing and sharing files across a wide area network and provide easy programmatic access. File systems such as HDFS from Hadoop, and many others, fall into this category; they are widely used and quite popular.

This tutorial provides a step-by-step guide to accessing and using a distributed file system for storing and retrieving data using Java. The Hadoop Distributed File System is used for this tutorial because it is freely available, easy to set up, and one of the most popular and well-known distributed file systems. The tutorial demonstrates how to access HDFS from Java and shows all the basic operations.

Introduction
A distributed file system makes files that are distributed across multiple servers appear to users as if they reside in one place on the network. It allows administrators to consolidate file shares that exist on multiple servers so that they appear to be in the same location, and users can access them from a single point on the network.
HDFS stands for Hadoop Distributed File System and is a distributed file system designed to run on commodity hardware. Some of the features provided by Hadoop are:
•   Fault tolerance: Data can be replicated, so if any of the servers goes down, the resources are still available to the user.
•   Resource management and accessibility: Users do not need to know the physical location of the data; they can access all the resources through a single point. HDFS also provides a web browser interface for viewing the contents of files.
•   It provides high-throughput access to application data.

This tutorial demonstrates how to use HDFS for basic distributed file system operations using Java. Java 1.6 and the Hadoop driver are used (see the Pre-requisites section). The development environment consists of Eclipse 3.4.2 and Hadoop 0.19.1 on Microsoft Windows XP SP3.

Pre-requisites
1. Hadoop-0.19.1 installation

2. Hadoop-0.19.1-core.jar file

3. Commons-logging-1.1.jar file

4. Java 1.6

5. Eclipse 3.4.2

Creating New Project and FileSystem Object

First step is to create a new project in Eclipse and then create a new class in that project.
Now add all the jar files to the project, as mentioned in the pre-requisites.
First step in using or accessing Hadoop Distributed File System (HDFS) is to create file system object.
Without creating an object you cannot perform any operations on the HDFS, so file system object is always required to be created.
Two input parameters are required to create object. They are “Host name” and “Port”.
Code below shows how to create file system object to access HDFS.

Configuration config = new Configuration();

config.set("fs.default.name","hdfs://127.0.0.1:9000/");

FileSystem dfs = FileSystem.get(config);

Here Host name = “127.0.0.1” & Port = “9000”.

Various HDFS operations

Now we will see various operations that can be performed on HDFS.

Creating Directory

Now we will start with creating a directory.
First step for using HDFS is to create a directory where we will store our data.
Now let us create a directory named “TestDirectory”.

String dirName = "TestDirectory";

Path src = new Path(dfs.getWorkingDirectory()+"/"+dirName);

dfs.mkdirs(src);

Here dfs.getWorkingDirectory() function will return the path of the working directory which is the basic working directory and all the data will be stored inside this directory. mkdirs() function accepts object of the type Path, so as shown above Path object is created first. Directory is required to be created inside basic working directory, so Path object is created accordingly. dfs.mkdirs(src)function will create a directory in the working folder with name “TestDirectory”.

Sub directories can also be created inside the “TestDirectory”; in that case path specified during creation of Path object will change. For example a directory named “subDirectory” can be created inside directory “TestDirectory” as shown in below code.

String subDirName = "subDirectory";

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/"+ subDirName);

dfs.mkdirs(src);

Deleting Directory or file

Existing directory in the HDFS can be deleted. Below code shows how to delete the existing directory.

String dirName = "TestDirectory";

Path src = new Path(dfs.getWorkingDirectory()+"/"+dirName);

dfs.delete(src, true); // true = delete the directory recursively

Please note that delete() method can also be used to delete files. What needs to be deleted should be specified in the Path object.

Copying file to/from HDFS from/to Local file system

Basic aim of using HDFS is to store data, so now we will see how to put data in HDFS.
Once directory is created, required data can be stored in HDFS from the local file system.
So consider that a file named “file1.txt” is located at “E:\HDFS” in the local file system, and it is required to be copied under the folder “subDirectory” (that was created earlier) in HDFS.
Code below shows how to copy file from local file system to HDFS.

Path src = new Path("E://HDFS/file1.txt");

Path dst = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/");

dfs.copyFromLocalFile(src, dst);

Here src and dst are the Path objects created for specifying the local file system path where file is located and HDFS path where file is required to be copied respectively. copyFromLocalFile() method is used for copying file from local file system to HDFS.

Similarly, file can also be copied from HDFS to local file system. Code below shows how to copy file from HDFS to local file system.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file1.txt");

Path dst = new Path("E://HDFS/");

dfs.copyToLocalFile(src, dst);

Here copyToLocalFile() method is used for copying file from HDFS to local file system.


Creating a file and writing data in it

It is also possible to create a file in HDFS and write data in it. So if required instead of directly copying the file from the local file system, a file can be first created and then data can be written in it.
Code below shows how to create a file name “file2.txt” in HDFS directory.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file2.txt");

dfs.createNewFile(src);

Here createNewFile() method will create the file in HDFS based on the input provided in src object.

Now as the file is created, data can be written in it. Code below shows how to write data present in the “file1.txt” of local file system to “file2.txt” of HDFS.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file2.txt");

FileInputStream fis = new FileInputStream("E://HDFS/file1.txt");

FSDataOutputStream fs = dfs.create(src);

// Copy in chunks: a single read() is not guaranteed to return the whole file
byte[] btr = new byte[4096];

int len;

while ((len = fis.read(btr)) != -1)
{
fs.write(btr, 0, len);
}

fis.close();

fs.close();

Here write() method of FSDataOutputStream is used to write data in file located in HDFS.

Reading data from a file

It is always necessary to read the data from file for performing various operations on data. It is possible to read data from the file which is stored in HDFS.
Code below shows how to retrieve data from the file present in the HDFS. Here data is read from the file (file1.txt) which is present in the directory (subDirectory) that was created earlier.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file1.txt");

FSDataInputStream fs = dfs.open(src);

// Needs java.io.BufferedReader and java.io.InputStreamReader
BufferedReader reader = new BufferedReader(new InputStreamReader(fs));

String str = null;

while ((str = reader.readLine())!= null)
{
System.out.println(str);
}

reader.close();

Here the readLine() method of a BufferedReader wrapped around the FSDataInputStream is used to read the file located in HDFS line by line. src is the Path object that specifies the path of the file in HDFS to be read.

Miscellaneous operations that can be performed on HDFS

Below are some of the basic operations that can be performed on HDFS.

Below is the code that can be used to check whether a particular file or directory exists in HDFS. It returns true if the path exists and false if it does not. The dfs.exists() method is used for this.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/HDFS/file1.txt");

System.out.println(dfs.exists(src));

Below is the code that can be used to check the default block size into which a file would be split. It returns the block size in bytes. The dfs.getDefaultBlockSize() method is used for this.

System.out.println(dfs.getDefaultBlockSize());

To check the default replication factor, the dfs.getDefaultReplication() method can be used, as shown below.

System.out.println(dfs.getDefaultReplication());

To check whether a given path is an HDFS directory or a file, the dfs.isDirectory() or dfs.isFile() methods can be used, as shown below.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file1.txt");
System.out.println(dfs.isDirectory(src));
System.out.println(dfs.isFile(src));

Conclusion
So we have learned some of the basics of the Hadoop Distributed File System: how to create and delete a directory, how to copy a file between HDFS and the local file system, how to create and delete a file in a directory, how to write data to a file, and how to read data from a file. We also looked at various other operations that can be performed on HDFS. From what we have done, we can say that HDFS is easy to use for data storage and retrieval.

References:
http://hadoop.apache.org/common/docs/current/hdfs_design.html
http://en.wikipedia.org/wiki/Hadoop

posted @ 2011-05-17 10:43 ivaneeo | Views (571) | Comments (0)

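Paxos implementation

This article covers leader election among ZooKeeper Servers. ZooKeeper uses the Paxos algorithm (mainly Fast Paxos) to elect a leader; two of its implementations are described here: LeaderElection and FastLeaderElection.

First, a few points need to be clear.

How does a Server learn about the other Servers?

The number of Servers in a ZooKeeper cluster is fixed, and the IP and port that each Server uses for election are listed in the configuration file.

Besides IP and port, is there another way to identify a Server?

Every Server has a unique numeric ID. The IDs are assigned according to the entries in the configuration file, and this step is done by hand at deployment time: create a file named myid in the directory that holds the data files and write the Server's own ID into it. This ID is useful for breaking ties when the submitted values are the same.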
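A minimal sketch of reading that ID at startup, assuming the myid file contains nothing but the decimal number (optionally followed by a newline). The class and method names are made up for illustration and are not ZooKeeper's own code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not ZooKeeper code: reads the Server ID from <dataDir>/myid.
final class MyIdSketch {

    // Parses the file content; surrounding whitespace such as a trailing newline is ignored.
    static long parseMyId(String content) {
        return Long.parseLong(content.trim());
    }

    // Reads the myid file from the given data directory.
    static long readMyId(Path dataDir) throws IOException {
        byte[] bytes = Files.readAllBytes(dataDir.resolve("myid"));
        return parseMyId(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // The data directory is passed on the command line; no default path is assumed.
            System.out.println(readMyId(Paths.get(args[0])));
        } else {
            // Without an argument, just demonstrate the parsing step.
            System.out.println(parseMyId("3\n"));
        }
    }
}
```

A file that holds anything other than a number makes Long.parseLong throw NumberFormatException, which is a reasonable way to fail fast on a deployment mistake.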

Necessary condition for becoming Leader

Approval from n/2 + 1 Servers (that is, n/2 + 1 Servers must agree on the Server whose zxid is the largest among all Servers).
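The n/2 + 1 rule uses integer division, so it is worth spelling out once. The sketch below is only an illustration of the arithmetic; the class and method names are hypothetical.

```java
// Sketch of the majority rule: a candidate needs n/2 + 1 votes out of n Servers.
final class QuorumSketch {

    // Smallest number of votes that forms a majority of n Servers (integer division).
    static int quorumSize(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("cluster size must be positive");
        }
        return n / 2 + 1;
    }

    // True when the given number of votes is enough to become Leader.
    static boolean hasQuorum(int votes, int n) {
        return votes >= quorumSize(n);
    }

    public static void main(String[] args) {
        System.out.println(quorumSize(3)); // prints 2
        System.out.println(quorumSize(4)); // prints 3
        System.out.println(quorumSize(5)); // prints 3
    }
}
```

Note that a cluster of 4 needs the same 3 votes as a cluster of 5, which is one reason odd cluster sizes are commonly chosen.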

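Does election in ZooKeeper use UDP or TCP?

Election in ZooKeeper mainly uses UDP; there is also an implementation that uses TCP. The two implementations described here use UDP.

What states can a Server be in?

LOOKING: initial state

LEADING: leader state

FOLLOWING: follower state

What if all zxids are the same (for example, right after initialization), so that n/2 + 1 Servers may not be able to agree?

Every Server has an ID; the IDs never repeat and are ordered by value. In this situation ZooKeeper recommends the Server with the largest ID as Leader.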
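One way to picture this ordering is a comparator that looks at zxid first and falls back to the Server ID only on a tie. This is a sketch under that assumption; the Candidate class and the method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sketch: prefer the largest zxid, and break ties with the largest Server ID.
final class TieBreakSketch {

    // A candidate as seen by the election: its Server ID and its last zxid.
    static final class Candidate {
        final long id;
        final long zxid;

        Candidate(long id, long zxid) {
            this.id = id;
            this.zxid = zxid;
        }
    }

    // Orders by zxid first, then by id, so the maximum is the preferred Leader.
    static final Comparator<Candidate> PREFERENCE =
            Comparator.comparingLong((Candidate c) -> c.zxid).thenComparingLong(c -> c.id);

    // Returns the preferred candidate; throws NoSuchElementException on an empty list.
    static Candidate preferred(List<Candidate> candidates) {
        return Collections.max(candidates, PREFERENCE);
    }

    public static void main(String[] args) {
        // Freshly initialized cluster: every zxid is the same, so the largest ID wins.
        List<Candidate> fresh = Arrays.asList(
                new Candidate(1, 0), new Candidate(2, 0), new Candidate(3, 0));
        System.out.println(preferred(fresh).id); // prints 3
    }
}
```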

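How does the Leader know that a Follower is still alive, and how does a Follower know that the Leader is still alive?

The Leader periodically sends ping messages to the Followers, and the Followers periodically send ping messages to the Leader. When a Follower finds that it cannot ping the Leader, it changes its state to LOOKING and starts a new round of election.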
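The Follower side of this check can be sketched as a small state holder. The timeout value and all names here are assumptions for illustration; timestamps are passed in explicitly so the logic does not depend on a real clock or network.

```java
// Sketch of a Follower watching its Leader: if no ping reply arrives within the
// timeout, the Follower switches to LOOKING. Names and timeout are hypothetical.
final class PingMonitorSketch {

    enum ServerState { LOOKING, LEADING, FOLLOWING }

    private final long timeoutMillis;
    private long lastReplyMillis;
    private ServerState state = ServerState.FOLLOWING;

    PingMonitorSketch(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastReplyMillis = nowMillis;
    }

    // Called whenever the Leader answers a ping.
    void onPingReply(long nowMillis) {
        lastReplyMillis = nowMillis;
    }

    // Called periodically; returns the state after applying the timeout rule.
    ServerState check(long nowMillis) {
        if (state == ServerState.FOLLOWING && nowMillis - lastReplyMillis > timeoutMillis) {
            state = ServerState.LOOKING; // Leader unreachable: start a new election
        }
        return state;
    }

    public static void main(String[] args) {
        PingMonitorSketch monitor = new PingMonitorSketch(1000, 0);
        monitor.onPingReply(900);
        System.out.println(monitor.check(1800)); // prints FOLLOWING
        System.out.println(monitor.check(2000)); // prints LOOKING
    }
}
```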

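Terminology

ZooKeeper Server: one Server in a ZooKeeper cluster, referred to below simply as Server

zxid (ZooKeeper transaction id): the ZooKeeper transaction ID. It is the key factor in whether a Server can become leader, and it determines which Server the current Server casts its vote for (in other words, it is part of the value this Server proposes during election; the other part is id)

myid/id (ZooKeeper server id): the ZooKeeper server ID, which is also a factor in whether a Server can become leader

epoch/logicalclock: mainly used to indicate whether the leader has changed. Every Server starts with an epoch whose initial value is 0; the epoch is incremented by 1 when a new election starts and by 1 again when the election completes.

tag/sequencer: message sequence number

xid: a randomly generated number that serves the same purpose as epoch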
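The epoch/logicalclock entry above describes a counter with two increment points. A minimal sketch of that counter, with hypothetical names, could look like this:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the epoch (logical clock) described above: starts at 0, and is
// incremented once when an election starts and once when it completes.
final class LogicalClockSketch {

    private final AtomicLong epoch = new AtomicLong(0);

    // Called when this Server begins a new election round.
    long startElection() {
        return epoch.incrementAndGet();
    }

    // Called when the election round has produced a leader.
    long finishElection() {
        return epoch.incrementAndGet();
    }

    long current() {
        return epoch.get();
    }

    // A message stamped with a smaller epoch belongs to an earlier round.
    static boolean isStale(long messageEpoch, long localEpoch) {
        return messageEpoch < localEpoch;
    }

    public static void main(String[] args) {
        LogicalClockSketch clock = new LogicalClockSketch();
        clock.startElection();
        clock.finishElection();
        System.out.println(clock.current()); // prints 2
    }
}
```

AtomicLong is used here only because several threads touch the election state in the designs described later; a plain long would show the same idea.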

Fast Paxos message flow compared with Basic Paxos

Message flow diagrams

Basic Paxos message flow

Client   Proposer      Acceptor     Learner
|         |          |  |  |       |  |
X-------->|          |  |  |       |  |  Request
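|         X--------->|->|->|       |  |  Prepare(N) // propose to all Servers
|         |<---------X--X--X       |  |  Promise(N,{Va,Vb,Vc}) // reply to the proposer whether the proposal is accepted (if not, go back to the previous step)
|         X--------->|->|->|       |  |  Accept!(N,Vn) // send the accept message to everyone
|         |<---------X--X--X------>|->|  Accepted(N,Vn) // reply to the proposer that the proposal has been accepted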
|<---------------------------------X--X  Response
|         |          |  |  |       |  |

Fast Paxos message flow

Election without conflicts

Client    Leader         Acceptor      Learner
|         |          |  |  |  |       |  |
|         X--------->|->|->|->|       |  |  Any(N,I,Recovery)
|         |          |  |  |  |       |  |
X------------------->|->|->|->|       |  |  Accept!(N,I,W) // propose to all Servers; each Server accepts the proposal on receipt
|         |<---------X--X--X--X------>|->|  Accepted(N,I,W) // send the proposal-accepted message to the proposer
|<------------------------------------X--X  Response(W)
|         |          |  |  |  |       |  |

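First implementation: LeaderElection

LeaderElection is the simplest implementation of Fast Paxos. After starting, each Server asks the other Servers whom they intend to vote for. Once all Servers have replied, it works out which Server has the largest zxid and sets that Server as the one it will vote for next time.

Each Server has a response thread and an election thread. Let us first look at what each thread does.

Response thread

Its main job is to passively receive requests from peers and reply according to the Server's current state. Every reply carries the Server's own id and the xid. What it replies depends on its state:

LOOKING state:

Information about the Server it is recommending (id, zxid)

LEADING state:

myid, and the id of the Server it recommended last time

FOLLOWING state:

The id of the current Leader and the ID of the last transaction processed (zxid)

Election thread

The election thread is the thread on the current Server that initiates the election. Its main job is to tally the votes and pick the Server to recommend. The election thread first sends a query to every Server (including itself), and each queried Server replies according to its current state. On receiving a reply, the election thread verifies that the reply belongs to a query it sent (by checking that the xid matches), then takes the peer's id (myid) and stores it in the list of Servers queried in this round. Finally it takes the leader information proposed by the peer (id, zxid) and stores it in this round's vote table. Once every Server has been queried, it filters and tallies the results to work out which Server won this round, and sets the Server with the largest zxid as the Server it recommends (this may be itself or another Server, depending on the votes, but every Server votes for itself in its first round). If the winning Server has received n/2 + 1 votes, the election thread sets the winner as its recommended leader and sets its own state according to the winner's information. Every Server repeats this procedure until a leader is elected.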
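The tallying step of one such round can be sketched as follows. This is a simplified illustration of the description above, not the actual ZooKeeper class: the Vote and Result types are invented, the network exchange and xid check are left out, and how ties in vote count are resolved is not specified here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of one tally round: count the votes, find the proposal with the
// largest zxid (ties broken by id), and test the winner against n/2 + 1.
final class VoteTallySketch {

    // What a queried Server proposes: the id and zxid of its preferred leader.
    static final class Vote {
        final long id;
        final long zxid;

        Vote(long id, long zxid) {
            this.id = id;
            this.zxid = zxid;
        }
    }

    static final class Result {
        final Vote nextVote;    // proposal with the largest zxid: whom to vote for next
        final long winnerId;    // Server id that collected the most votes this round
        final int winnerCount;  // number of votes the winner collected
        final boolean elected;  // true when winnerCount >= n/2 + 1

        Result(Vote nextVote, long winnerId, int winnerCount, boolean elected) {
            this.nextVote = nextVote;
            this.winnerId = winnerId;
            this.winnerCount = winnerCount;
            this.elected = elected;
        }
    }

    // votesByVoter maps the id of each Server that replied to the vote it reported.
    static Result tally(Map<Long, Vote> votesByVoter, int clusterSize) {
        Map<Long, Integer> counts = new HashMap<>();
        Vote best = null;
        long winnerId = -1;
        int winnerCount = 0;
        for (Vote v : votesByVoter.values()) {
            int c = counts.merge(v.id, 1, Integer::sum);
            if (c > winnerCount) {
                winnerCount = c;
                winnerId = v.id;
            }
            if (best == null || v.zxid > best.zxid || (v.zxid == best.zxid && v.id > best.id)) {
                best = v;
            }
        }
        boolean elected = winnerCount >= clusterSize / 2 + 1;
        return new Result(best, winnerId, winnerCount, elected);
    }

    public static void main(String[] args) {
        Map<Long, Vote> round = new LinkedHashMap<>();
        round.put(1L, new Vote(2, 20));
        round.put(2L, new Vote(2, 20));
        round.put(3L, new Vote(3, 15));
        Result r = tally(round, 3);
        System.out.println(r.winnerId + " elected=" + r.elected); // prints 2 elected=true
    }
}
```

In a first round where every Server votes for itself, no candidate reaches the majority, but nextVote already points at the Server with the largest zxid, which is what each Server votes for in the following round.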

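With the role of each thread understood, let us look at the election process.

A Server joins during election

Whenever a Server starts it initiates an election, and its election thread runs the procedure above, so every Server learns which Server currently has the largest zxid. If that Server did not receive n/2 + 1 votes in this round, then in the next round each Server votes for the Server with the largest zxid. Repeating this procedure will eventually elect a Leader.

A Server leaves during election

As long as n/2 + 1 Servers stay alive there is no problem; if fewer than n/2 + 1 Servers are alive, no Leader can be elected.

The Leader dies during election

Once a Leader has been elected, the state every Server should be in (FOLLOWING) is already determined. Since the Leader is dead we ignore it, and the other Followers carry on with the normal procedure. When they finish, every Follower sends a ping message to the Leader; if the ping fails, the Follower changes its state (FOLLOWING ==> LOOKING) and starts a new round of election.

The Leader dies after election completes

This case is handled the same way as the Leader dying during election, so it is not described again.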

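Second implementation: FastLeaderElection

FastLeaderElection is a standard implementation of Fast Paxos. A Server first proposes to all Servers that it become leader. When another Server receives the proposal, it resolves any conflict in epoch and zxid, accepts the proposal, and then sends the proposer a message that the proposal has been accepted.

Data structures

Local message structure: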

static public class Notification {
    long leader; // id of the recommended Server
    long zxid;   // zxid (ZooKeeper transaction id) of the recommended Server
    long epoch;  // indicates whether the leader has changed (every Server starts with a logicalclock, initial value 0)
    QuorumPeer.ServerState state; // current state of the sender
    InetSocketAddress addr;       // IP address of the sender
}

Network message structure:

static public class ToSend {
    int type;    // message type
    long leader; // Server id
    long zxid;   // zxid of the Server
    long epoch;  // epoch of the Server
    QuorumPeer.ServerState state; // state of the Server
    long tag;    // message sequence number
    InetSocketAddress addr;
}

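Server implementation in detail

Each Server has a receive thread pool (3 threads) and a send thread pool (3 threads). While no election is in progress both pools are blocked; they unblock and process messages only when a message arrives. Each Server also has an election thread (the thread that is able to initiate an election). What each thread does is described below.

Passive message receiver (receive thread pool):

notification: first checks whether the zxid and epoch currently recommended on this Server are still valid, by testing whether the message supersedes them: (currentServer.epoch <= currentMsg.epoch && (currentMsg.zxid > currentServer.zxid || (currentMsg.zxid == currentServer.zxid && currentMsg.id > currentServer.id))). If the message supersedes them, the Server updates its recommended values with the zxid, epoch and id from the message. It then converts the received message into a Notification, puts it on the receive queue, and sends an ack message to the peer.

ack: puts the message sequence number on the ack queue and checks whether the peer is in the LOOKING state. If it is not, a Leader has already been elected, and the received message is converted into a Notification and put on the receive queue.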
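The comparison used in the notification case can be written as a small predicate plus an update step. The boolean expression is the one quoted in the text; the class and method names are hypothetical.

```java
// Sketch of the receiver-side rule: a message replaces the currently
// recommended values when its epoch is not older and its (zxid, id) is larger.
final class RecommendationSketch {

    long epoch; // epoch of the current recommendation
    long zxid;  // zxid of the currently recommended Server
    long id;    // id of the currently recommended Server

    RecommendationSketch(long epoch, long zxid, long id) {
        this.epoch = epoch;
        this.zxid = zxid;
        this.id = id;
    }

    // The test from the text, with "currentServer" = this object and "currentMsg" = the arguments.
    boolean isSupersededBy(long msgEpoch, long msgZxid, long msgId) {
        return epoch <= msgEpoch
                && (msgZxid > zxid || (msgZxid == zxid && msgId > id));
    }

    // Adopts the message's values when they supersede the current ones; returns true on update.
    boolean offer(long msgEpoch, long msgZxid, long msgId) {
        if (!isSupersededBy(msgEpoch, msgZxid, msgId)) {
            return false;
        }
        epoch = msgEpoch;
        zxid = msgZxid;
        id = msgId;
        return true;
    }

    public static void main(String[] args) {
        RecommendationSketch current = new RecommendationSketch(1, 4, 3);
        System.out.println(current.offer(0, 9, 9)); // prints false (older epoch)
        System.out.println(current.offer(1, 4, 4)); // prints true (same zxid, larger id)
    }
}
```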

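Active message sender (send thread pool):

notification: converts the outgoing message from a Notification into a ToSend message, sends it to the peer, and waits for the reply. If no reply has arrived when the wait ends, it retries three times. If there is still no reply after the retries, it checks whether the current election (epoch) has changed; if not, it puts the message back on the send queue. This repeats until a Leader is elected or the peer replies.

ack: mainly sends the Server's own information to the peer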
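The retry rule for outgoing notifications can be sketched without any networking by passing in the send step as a function. Everything here is an assumption for illustration: the sendOnce hook stands in for "send and wait for the ack", and the sketch counts one initial attempt followed by three retries.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

// Sketch of the sender-side retry rule: try, retry up to three times, then
// requeue the message if the election epoch has not changed in the meantime.
final class ResendSketch {

    static final int MAX_RETRIES = 3; // "retries three times", as in the text

    enum Outcome { ACKED, REQUEUED, DROPPED }

    // sendOnce returns true when an ack arrived before the wait ended (hypothetical hook).
    static <M> Outcome deliver(M message, Predicate<M> sendOnce,
                               long messageEpoch, long currentEpoch, Queue<M> sendQueue) {
        // One initial attempt plus MAX_RETRIES retries.
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            if (sendOnce.test(message)) {
                return Outcome.ACKED;
            }
        }
        if (messageEpoch == currentEpoch) {
            sendQueue.add(message); // same election still running: try again later
            return Outcome.REQUEUED;
        }
        return Outcome.DROPPED;     // the election has moved on: message is obsolete
    }

    public static void main(String[] args) {
        Queue<String> sendQueue = new ArrayDeque<>();
        Outcome outcome = deliver("notification", m -> false, 1L, 1L, sendQueue);
        System.out.println(outcome + " queued=" + sendQueue.size()); // prints REQUEUED queued=1
    }
}
```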

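Election initiator (election thread):

It first increments its own epoch by 1, then generates notification messages and puts them on the send queue. One message is generated for each Server in the configuration, so that every Server receives it. While the current Server is in the LOOKING state, it keeps looping to check the receive queue; when a message is present, it handles it according to the state of the peer recorded in the message.

LOOKING state:

It first checks the epoch in the message against the current Server's epoch. If the message's epoch is larger, the Server updates its epoch and checks whether the zxid and id in the message are larger than those of the Server it currently recommends; if so it updates those values, generates a new notification message and puts it on the send queue, and clears the vote table. If the message's epoch is smaller, nothing is done. If the epochs are equal, it checks the zxid and id in the message; if they are larger, it updates the current Server's information and generates a new notification message for the send queue. The sender's IP and vote are then recorded in the vote table, the tally is computed, and the Server sets its own state according to the result.

LEADING state:

The sender's IP and vote are recorded in a vote table (a separate table in this case), the tally is computed, and the Server sets its own state according to the result.

FOLLOWING state:

The sender's IP and vote are recorded in a vote table (a separate table in this case), the tally is computed, and the Server sets its own state according to the result.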
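The handling of a message from a peer in the LOOKING state has three branches on the epoch, which the sketch below makes explicit. It is an interpretation rather than the real implementation: the names are invented, the broadcast of a new notification is reduced to a return value, and recording the sender's vote in the newer-epoch branch as well is an assumption, since the text only states it for the equal-epoch branch.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of how a LOOKING Server might process a notification from a LOOKING peer.
final class LookingHandlerSketch {

    enum Action { IGNORED, KEPT, ADOPTED }

    long epoch;        // local logical clock
    long proposedId;   // id of the Server currently recommended
    long proposedZxid; // zxid of the Server currently recommended
    final Map<String, Long> voteTable = new HashMap<>(); // sender address -> id it voted for

    LookingHandlerSketch(long epoch, long proposedId, long proposedZxid) {
        this.epoch = epoch;
        this.proposedId = proposedId;
        this.proposedZxid = proposedZxid;
    }

    Action onNotification(String sender, long msgEpoch, long msgId, long msgZxid) {
        if (msgEpoch < epoch) {
            return Action.IGNORED; // message from an older round: do nothing
        }
        if (msgEpoch > epoch) {
            epoch = msgEpoch;      // catch up with the newer round
            voteTable.clear();     // votes from the old round no longer count
        }
        Action action = Action.KEPT;
        if (msgZxid > proposedZxid || (msgZxid == proposedZxid && msgId > proposedId)) {
            proposedId = msgId;
            proposedZxid = msgZxid;
            action = Action.ADOPTED; // caller would now queue a new notification
        }
        voteTable.put(sender, msgId); // assumption: recorded in both non-stale branches
        return action;
    }

    public static void main(String[] args) {
        LookingHandlerSketch server = new LookingHandlerSketch(1, 1, 10);
        System.out.println(server.onNotification("10.0.0.2", 1, 2, 10)); // prints ADOPTED
        System.out.println(server.onNotification("10.0.0.3", 0, 9, 99)); // prints IGNORED
    }
}
```

After each non-ignored message the caller would count the entries in voteTable for the recommended id and compare the count with n/2 + 1 to decide whether to leave the LOOKING state.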

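With the role of each thread understood, let us look at the election process, which is the same as in the first implementation.

A Server joins during election

Whenever a Server starts it initiates an election, and its election thread runs the procedure: by announcing its own zxid and epoch to the other Servers, every Server ends up with the information of the Server that has the largest zxid, and votes for that Server in the next round. Repeating this procedure will eventually elect a Leader.

A Server leaves during election

As long as n/2 + 1 Servers stay alive there is no problem; if fewer than n/2 + 1 Servers are alive, no Leader can be elected.

The Leader dies during election

Once a Leader has been elected, the state every Server should be in (FOLLOWING) is already determined. Since the Leader is dead we ignore it, and the other Followers carry on with the normal procedure. When they finish, every Follower sends a ping message to the Leader; if the ping fails, the Follower changes its state (FOLLOWING ==> LOOKING) and starts a new round of election.

The Leader dies after election completes

This case is handled the same way as the Leader dying during election, so it is not described again.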

posted @ 2011-05-05 13:16 ivaneeo Views (1268) | Comments (1)

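ZooKeeper research and applications

Summary: Introduction to ZooKeeper. ZooKeeper is an open-source distributed service that provides distributed coordination, distributed synchronization, configuration management and similar functions. Its functionality is essentially the same as Google's Chubby. The official ZooKeeper site already has a classic overview article, which readers should consult: ZooKeeper: A Distributed Coordination Service for Distributed Applications. Here I...

posted @ 2011-05-05 13:15 ivaneeo Views (1656) | Comments (0)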
