-- Focusing on search engine development

May 20, 2006

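Microsoft's New Search Engine

Microsoft has never given up on competing in search and has been quietly vying with Google all along. Even though Live Search was something of a joke among internal employees, the boss kept pouring money into it without hesitation.

To be honest, I try to use Microsoft products wherever I can. I gave up Linux as my operating system and Perl and Java as my development tools, though of course that was because of work. For maps I used to use MapQuest and have switched to Live Maps; I also dropped Firefox for IE8. Whatever can be replaced with a Microsoft product, I replace. The search engine, however, was just too poor: the results were never what I wanted, and even paging ten or so pages deep turned up nothing useful. So I quietly set Google as my default engine. One colleague went even further than I did and replaced Outlook's search with Google Desktop.

Then, in early March, a new search engine called Kumo (酷摸, perhaps?) was released internally. The story goes that the name "Live" was the problem: try reading it backwards and see what you get. I thought a mere name change was pointless, but I couldn't resist trying it, and it did turn out to be somewhat better than the old one. I now give Kumo a spin from time to time when I have nothing else to do.

Today Ballmer (鮑老大) announced yet another new search engine, called Bing. What do you think? To me it sounds like the Chinese word 病 (bìng, "sick"). And it isn't called a Search Engine but a Decision Engine, quite a trendy concept. I'm not sure why this name was chosen (according to Ballmer, because it is short and easy to remember), but going from a Japanese name to a Chinese-sounding one strikes me as a win for Qi Lu (陸奇) since he took over as head of Search. I remember that a couple of days ago the Search home page started using a landscape photo of Yangshuo, China, taken by an employee. Whether or not that guess is right, a new search engine has to be tried. One busybody immediately searched for "六四" (June Fourth), and the results were all about the college CET-4 and CET-6 English exams (四六級), which was rather chilling. It hasn't even been publicly released and the PR work is already this thorough.

Even more awkward, to celebrate the new release everyone on the Search team was given a T-shirt. Reportedly the front says "I Bing" and the back says "U Bing", which sounds like "I'm sick, and you're sick too". The Search team doesn't see it that way, though, because they gave "Bing" the Chinese name 必應 ("sure to respond"). Is that any better than 谷歌 (Google's Chinese name)?

The busybodies in other groups were less kind: after testing it for a while, they affectionately nicknamed the "Bing" search engine Mr. Bean.

Of course, we should keep a positive attitude toward new things. Since it is still in testing, I prefer to believe this is just temporary underdevelopment caused by a lack of user behavior data. 必應 will probably be officially released next week. Let's wait and see.

posted @ 2009-05-29 13:20 Dedian Views(3634) | Comments (14)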

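What Kind of Applications Do We Need?

I said earlier that "moving a lot of software to the web is a web 3.0 trend". Technically, these web-based applications differ from the software we used to install on a local hard disk. More precisely, a website or application that provides a service can be understood as an object hosted by the browser; the browser is just a container that supports many kinds of objects, and the service applications behind those objects are the software deployed on various web servers.

The so-called scripting languages are merely the languages the container and the objects use to communicate.

Containers and back-end service applications have kept improving all along. What we mostly notice, though, is one vivid object after another appearing in the browser and quietly changing our lives.

In fact, saying that software going web-based means turning into object models the browser can host covers only part of the picture. It describes only how the software is presented. That makes it easy to overlook how the data is stored, and to assume that web-based services are mostly about consuming data on the network or in search engines. We no longer need to download software that fills up our hard disks; with online TV we don't need to download movies, or even music. Our own data, such as email, blogs, subscribed magazines, and bookmarked information, also lives on the servers of various websites and never needs to be downloaded.

We seem to have grown used to being online and have forgotten the offline era. Google, always unconventional, seems to have found a need to go back to it with the recently launched Google Gears. It gives people a browser plug-in through which data can be downloaded to the local disk, and it provides a small database engine (SQLite) on the local disk to help store, index, and search that data. It also provides an interface for synchronizing data in the background without tying up the browser.

At present the Google Gears API is used in Google Reader: users can download the feeds they subscribe to onto the local disk, which makes them easy to organize and keep.

In short, software is trending toward the web, but people still care about collecting and storing their personal data. For example, I have long used Del.icio.us to bookmark technical sites and articles, yet one day when I clicked a link to look up an article, the page was gone. At that moment I wished the article had been automatically downloaded to my own disk and filed away. Manual copy and paste doesn't count, of course; what I want is a one-click operation like Del.icio.us's.

posted @ 2007-05-31 14:27 Dedian Views(1924) | Comments (1)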

What comparison function does Linux sort use?

I've got a question: when I use the sort command on Linux to sort some domain names in dictionary order, no matter which option I use, it sorts some domains like this:

...
abca.com
abc-d.com
abce.com
...

I am curious what comparison function it applies when sorting. I supposed it would be a plain string comparison like strcmp, but it is not, because strcmp compares the ASCII codes of the characters in the strings one by one, so the result above would look like this:

abc-d.com
abca.com
abce.com

One guess is that special characters like "." and "-" are skipped when sorting names. But that still leaves a problem when sorting the following names:

abc---d.com
abc--d.com
abc-d.com

Why does Linux sort keep this order? If it skipped the special characters, the names above would compare as equal and might come out in a random order.

I'm confused. Has anybody thought about this?

-----
P.S.

I haven't posted here for quite a long time, because I am back to programming in C under Linux and I believe this is a place for Java programmers.

-----

Update:

Linux sort compares the strings as Unicode … more about Unicode is here
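
To make the update a bit more concrete: as far as I understand it, GNU sort compares lines with the current locale's collation rules rather than with strcmp, and in locales such as en_US punctuation is ignored on the first pass and only used to break ties (running sort under LC_ALL=C gives plain byte order again). Below is a minimal Java sketch of that idea, not the real collation algorithm. The class name, the "strip everything except letters and digits" key, and the raw-string tie-break are my own simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DomainSortDemo {
    // First-pass key: keep only letters and digits. This is a hypothetical
    // simplification of "punctuation is ignored at the first level".
    public static String primaryKey(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    // Compare by the stripped key first; fall back to the raw string on ties.
    public static final Comparator<String> TWO_PASS = (a, b) -> {
        int c = primaryKey(a).compareTo(primaryKey(b));
        return c != 0 ? c : a.compareTo(b);
    };

    // Returns a sorted copy so the input list is left untouched.
    public static List<String> sorted(List<String> names, Comparator<String> order) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("abce.com", "abc-d.com", "abca.com");
        // strcmp-like order: character codes compared one by one
        System.out.println(sorted(names, Comparator.<String>naturalOrder()));
        // locale-like order: punctuation only matters as a tie-break
        System.out.println(sorted(names, TWO_PASS));
    }
}
```

With this comparator, abc---d.com, abc--d.com and abc-d.com all share the same first-pass key, and the tie-break keeps them in exactly the order shown in the question, which is why they are not shuffled randomly.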

posted @ 2007-02-02 07:10 Dedian Views(1417) | Comments (1)

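Creating Your Own Search Engine

As the amount of information on the web keeps growing, people's study and work depend more and more on web search engines (some everyday examples are mentioned in the post "Google Turns 8 Today").

On the other hand, we are helpless in the face of thousands of search results, so we basically only pay attention to the first page. That has given rise to all kinds of techniques (such as SEO) and tricks (spamdexing) for getting onto the first page of a search engine; at the ugly end, people fight over it, and there are even behind-the-scenes manipulators inside search engine companies.

Users, for their part, need a bit of wit to reach their search goal quickly.

Google, the leader in search, has clearly noticed this fact and the customer need it creates: a search engine should be customizable.

Google's recently launched Custom Search service addresses exactly this need.

The idea is simple: users list the sites they visit often or find interesting and that carry a lot of information. They can then tell Google's search engine to search only those sites, or to favor them.

In addition, this simple product has a web 2.0 flavor: several like-minded people can edit the site list together, forming something like a search circle (a search community) that surfaces what they are all interested in.

Anyone interested can give it a try. As a first attempt I set up a custom search engine for blogs.

Click here, or use the link:
http://www.google.com/coop/cse?cx=006688650489436466578%3Ac7-4rxi0jf4

Or use this simpler domain name:

http://blogdigger.info

Anyone interested is welcome to join, as long as you have a Gmail account.

Joining is easy: just click this link on the home page:

Volunteer to contribute to this search engine.

Of course, you need a Google account (if you don't have one, no problem: just register one with your email address, which is easy).

Then you become a member of this search engine. Whenever you find a good site with a lot of information, you can add it to Blog Digger's site list. You can also add search entries for the searches you care about.

If you gradually come to enjoy this customized Google, remember this link: http://blogdigger.info

posted @ 2006-10-27 06:04 Dedian Views(2392) | Comments (3)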

Again, a Problem or a Bug in URLConnection?

Not sure if it is a bug in (Http)URLConnection, but for some URLs it sometimes hangs when I call any method that gets information from the connection (getResponseCode, getInputStream, getContent, getContentLength, getHeaderField and so on) after the connection has been established, even though I have set both the read timeout and the connect timeout.

The methods openConnection() and connect() are OK, so I am curious about this problem.

Has anybody had the same or a similar problem with URLConnection?
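
One workaround I can think of, sketched here under my own assumptions rather than as a confirmed fix: the read timeout only limits how long a single read may block, so a hard upper bound on the whole call needs a second thread. The helper below runs any call on a worker thread and gives up after a deadline. The class and method names (Deadline, callWithDeadline) are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class Deadline {
    // Runs the task on a worker thread and waits at most timeoutMillis.
    // Returns the fallback value if the task is too slow or throws.
    public static <T> T callWithDeadline(Callable<T> task, long timeoutMillis, T fallback) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "deadline-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true); // interrupts the worker; blocked socket I/O may ignore it
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return fallback;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A call such as Deadline.callWithDeadline(() -> conn.getResponseCode(), 5000, -1) would then return -1 after five seconds instead of hanging, where conn is the already opened connection. The stuck thread may still linger in the background until its socket finally fails, so this trades a leaked daemon thread for a responsive caller.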

posted @ 2006-10-21 07:20 Dedian Views(1313) | Comments (0)

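A Brief Look at Ajax

---Happy Mid-Autumn Festival, everyone---

Ajax (Asynchronous JavaScript and XML) is a web technology that has become popular in recent years. I have seen people on BlogJava start to introduce Ajax, but they seem to stay at the level of concepts and theory, which doesn't make much sense to a beginner who wants to use it. I think that when learning any new technology, examples and steps are the two things that make the most sense.

Drawing on what I have learned, I want to go through the basic steps of using Ajax and give a few practical examples. Since I mainly do back-end development, I am not very good at scripting. My Ajax experience is limited to a university course and some recent simple implementations for testing back-end programs. So this is only a starting point for using Ajax, offered in the hope of drawing out better contributions; comments and exchanges are welcome.

0. Overview

    1. The basic flow of using Ajax
    2. The basic steps of using Ajax (a simple example --> Demo)
    3. Another example (Google Suggest) (Demo)
    4. Homework :)

1. The basic flow of using Ajax

As I see it, Ajax is more like a simple web framework: it describes how to make the presentation of data on the front end interact efficiently with the data on the back end. Basically, the browser provides an XMLHttpRequest object (an ActiveXObject in IE, of course) that sends an HTTP request to a back-end script or servlet class, takes text data from the response (for example in XML, or in the JSON format people have been discussing lately), and embeds it in the front-end page or script.

The following is a simple flow chart: [figure: Ajax flow chart]

2. The basic steps of using Ajax

Below, following the flow above and a simple example (see this article), we walk through the basic steps. (The code in blue is the standard idiom.)

Step 1: Form code. It accepts input on the front end and, through the action handler (the handler function includes creating the XMLHttpRequest object), sends the request to the back end.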

<input id="username" name="username" type="text"
  onblur="checkName(this.value,'')" />
<span class="hidden" id="nameCheckFailed">
  This name is in use, please try another.
</span>

<script type="text/javascript">
function checkName(input, response)
{
  if (response != ''){
    // Response mode: show or hide the warning
    var message = document.getElementById('nameCheckFailed');
    if (response == '1'){
      message.className = 'error';
    }else{
      message.className = 'hidden';
    }
  }else{
    // Input mode: encode the input so special characters survive in the URL
    var url = 'http://localhost/xml/checkUserName.php?q=' + encodeURIComponent(input);
    loadXMLDoc(url);
  }
}

var req;

function loadXMLDoc(url)
{
    // branch for native XMLHttpRequest object
    if (window.XMLHttpRequest) {
        req = new XMLHttpRequest();
        req.onreadystatechange = processReqChange;
        req.open("GET", url, true);
        req.send(null);
    // branch for IE/Windows ActiveX version
    } else if (window.ActiveXObject) {
        req = new ActiveXObject("Microsoft.XMLHTTP");
        if (req) {
            req.onreadystatechange = processReqChange;
            req.open("GET", url, true);
            req.send();
        }
    }
}
</script>

Notes:
1. The form here is just an input box. The action is onblur, i.e. the handler for losing focus, which calls the function checkName. That function uses XMLHttpRequest to send a GET request to a PHP server script (as you can see, the script file is checkUserName.php and its only parameter is q).
2. The function loadXMLDoc contains generic code for creating the XMLHttpRequest object; the standard code is as follows:
        var req;
        function foo()
        {
            req = false;

            // branch for native XMLHttpRequest object
            if(window.XMLHttpRequest)
            {
                try
                {
                    req = new XMLHttpRequest();
                }
                catch(e)
                {
                    req = false;
                }
            }
            else if(window.ActiveXObject) // branch for IE/Windows ActiveX version
            {
                try
                {
                    req = new ActiveXObject("Msxml2.XMLHTTP");
                }
                catch(e)
                {
                    try
                    {
                        req = new ActiveXObject("Microsoft.XMLHTTP");
                    }
                    catch(e)
                    {
                        req = false;
                    }
                }
            }
            if(req)
            {
                // do something here
            }
        }

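Step 2: Response-handling code. The XMLHttpRequest object has a property that works like a message handler: setting req.onreadystatechange tells XMLHttpRequest which function handles the text returned by the server.
In the example above:

req.onreadystatechange = processReqChange;

So next we need a processReqChange function: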

function processReqChange()
{
    // only if req shows "complete"
    if (req.readyState == 4) {
        // only if "OK"
        if (req.status == 200) {
            // ...processing statements go here...
            processResponse();
        } else {
            alert("There was a problem retrieving the XML data:\n" + req.statusText);
        }
    }
}

function processResponse()
{
    var response = req.responseXML.documentElement;
    var method   = response.getElementsByTagName('method')[0].firstChild.data;
    var result   = response.getElementsByTagName('result')[0].firstChild.data;
    // calls the function named in <method>, e.g. checkName('', result)
    eval(method + '(\'\', result)');
}

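Notes:
1. The processReqChange function is basically the standard idiom.
2. It uses the global variable req (the XMLHttpRequest object) defined earlier.

Step 3: Back-end code (a PHP server script in this example). It accepts the front-end request, processes its parameters, and returns the corresponding result.

File name: checkUserName.php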

<?php
header('Content-Type: text/xml');

// Returns '1' if the name is already taken, '0' otherwise.
function nameInUse($q)
{
  if (isset($q)){
    switch(strtolower($q))
    {
      case 'drew' :
      case 'fred' :
          return '1';
      default:
          return '0';
    }
  }else{
    return '0';
  }
}
?>
<?php echo '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'; ?>
<response>
  <method>checkName</method>
  <result><?php
    // no whitespace around the value: the script compares it with '1'
    echo nameInUse(isset($_GET['q']) ? $_GET['q'] : null) ?></result>
</response>
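Note: the code is simple enough that it needs no explanation. What it returns is a string in XML format.

See the overall effect here.
Enter the name "fred" or "drew"; after the field loses focus, a message saying the name is already in use is shown.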
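
For readers who would rather keep the back end in Java, here is a minimal sketch of the same logic as the PHP script. It only builds the XML string; wiring it into a servlet (writing the string to the response with content type text/xml) is left out so the example stays self-contained. The class name CheckUserName and the method buildResponse are my own choices, not part of the original demo.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class CheckUserName {
    // Same two reserved names as in checkUserName.php.
    private static final Set<String> TAKEN =
            new HashSet<>(Arrays.asList("drew", "fred"));

    // Returns "1" if the name is in use, "0" otherwise (including null input).
    public static String nameInUse(String q) {
        if (q == null) {
            return "0";
        }
        return TAKEN.contains(q.toLowerCase(Locale.ROOT)) ? "1" : "0";
    }

    // Builds the XML body that checkName() on the front end expects.
    public static String buildResponse(String q) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
             + "<response>"
             + "<method>checkName</method>"
             + "<result>" + nameInUse(q) + "</result>"
             + "</response>";
    }

    public static void main(String[] args) {
        System.out.println(buildResponse("Fred"));
    }
}
```

Note that the result element contains no surrounding whitespace, because the front-end script compares its text with '1' exactly.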

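3. Another example

Here is another practical example. It was a class assignment from a course I took and is quite representative. It concerns Google Suggest (the new Google Toolbar seems to use this feature). Here is the finished DEMO. More and more websites now offer Web Service style APIs; using the API URLs they provide, we can fetch data that is useful to us and put it on our own pages. This is where Ajax comes in. Some of the returned text is in XML format and can be handled as in the simple example above, but many services, like Google Suggest, return text that looks like a piece of code. In that case we use JavaScript's eval function to run that text as code inside our own page. If the embedded code calls a function, we need to write a function with the same name ourselves to implement it. (This is the optional func 3 in the flow chart.)

I won't paste the complete code here, only the key parts (the back end was originally written as a Java servlet, but the hosting space for the demo has no Tomcat and doesn't support servlets, so I switched to PHP; you can rewrite it in Java yourself as homework :) ):

1) Form code: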

<form name="QForm" method="POST" action="google_suggest.php">
    <table bgcolor="#8080C0" width="90%">
    <tr>
        <td nowrap>Search Term:</td>
        <td><input type="text" name="qtext" onkeyup="return GetSuggestion()" size="60"></td>
    </tr>
    <tr>
        <th colspan="2" align="left" bgcolor="#A8A8FF"><DIV id="google_suggest_target">results go here . . . </DIV></th>
    </tr>
    </table>
</form>

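Notes:
a. As you can see, the query string is posted to google_suggest.php.
b. The action function is GetSuggestion(); the string it gets back is displayed in the reserved area of the page.

2) Back-end code (PHP): it receives the front-end request, turns it into a request to the Google Suggest API URL, and returns the text it receives to the front end. The code is simple:

File name: google_suggest.php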

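<?php
// Forwards the query to Google Suggest and returns the raw response text.
function getGoogleSuggest($q)
{
    // urlencode keeps spaces and special characters from breaking the URL
    $url = "http://www.google.com/complete/search?hl=en&js=true&qu=" . urlencode($q);
    return file_get_contents($url);
}
?>

<?php echo getGoogleSuggest(isset($_POST['q']) ? $_POST['q'] : '') ?>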

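Notes:
a. The Google Suggest API returns text in code form, like this:
sendRPCDone(frameElement, "", new Array(), new Array(), new Array(""));
So after receiving this text on the front end, we should write a sendRPCDone function to process it further (for example, to list the query results).

3) Front-end response-handling code: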

    <script type="text/javascript">
        var req;
        function GetSuggestion()
        {
            req = false;

            // branch for native XMLHttpRequest object
            if(window.XMLHttpRequest)
            {
                try
                {
                    req = new XMLHttpRequest();
                }
                catch(e)
                {
                    req = false;
                }
            }
            else if(window.ActiveXObject) // branch for IE/Windows ActiveX version
            {
                try
                {
                    req = new ActiveXObject("Msxml2.XMLHTTP");
                }
                catch(e)
                {
                    try
                    {
                        req = new ActiveXObject("Microsoft.XMLHTTP");
                    }
                    catch(e)
                    {
                        req = false;
                    }
                }
            }
            if(req)
            {
                var url = "google_suggest.php";

                req.onreadystatechange = processReqChange;
                req.open("POST", url, true);
                req.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
                // encodeURIComponent handles non-ASCII input, unlike escape()
                req.send("q=" + encodeURIComponent(document.QForm.qtext.value));
            }
        }

        function processReqChange()
        {
            if(req.readyState == 4) // only if req shows "loaded"
            {
                if (req.status == 200) // only if "OK"
                {
                    // the response is a call to sendRPCDone(...); run it
                    var x = req.responseText;
                    eval(x);
                }
                else
                {
                    alert("There was a problem retrieving the data:\n" + req.statusText);
                }
            }
        }

        // Called by the evaluated response: arr1 holds the suggestions,
        // arr2 the result counts for each suggestion.
        function sendRPCDone(frameElement, qString, arr1, arr2, arr3)
        {
            var suggest_results = arr1;
            var counts = arr2;
            var htmlstr = "<TABLE cellspacing=4 border=0>";
            for (var i=0; i < suggest_results.length; i++)
            {
                htmlstr += "<tr><td><a href=\"javascript:self.location=\'http://www.google.com/search?hl=en&q=" + suggest_results[i] + "&btnG=Google+Search\'\">" + suggest_results[i] + "</a></td>";
                htmlstr += "<TD width=200><font color=\"#228b22\">" + counts[i] + "</font></TD></TR>";
            }
            htmlstr += "</TABLE>";
            document.getElementById("google_suggest_target").innerHTML = htmlstr;
        }

    </script>

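4. Homework :)

Be sure to write some code yourself to consolidate what you've learned :)
Exercise:
We often use del.icio.us to bookmark sites or articles we like and add notes to them. How can we use the API del.icio.us provides to access those notes and display them in our own blog?
Hints:
1. You need a del.icio.us account with some bookmarks already saved as test data :)
2. The API URL is "http://del.icio.us/feeds/json/" + "your account name"; try it yourself and see what format the returned text has. To limit the number of records returned, you can add a parameter such as "?count=10".

Finally, happy Mid-Autumn Festival, everyone!

---------------------------End----------------------------

posted @ 2006-10-07 07:05 Dedian Views(2247) | Comments (2)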

PHP/Java Integration on Windows

reference: http://us3.php.net/java
help doc: http://php-java-bridge.sourceforge.net/


1- Make sure you have installed Apache 2, PHP 5 and Java J2EE 1.5

2- Download pecl-5.0.5-Win32.zip and php-java-bridge_2.0.8.zip, which include the extra DLL(s)

    - unpack the PECL package to your extensions folder; in PHP 5 it is ext.

    - unpack the Java bridge to the root PHP folder; in my case it is simply C:\PHP

Note:
1. The Java bridge includes new versions of certain files such as php_java.dll, so it would be wise to rename the old files that came with the PECL package (for example to file_old) so that you can roll back at any time.
2. Don't run the batch file under php-java-bridge after unpacking to the PHP root folder; just add the following lines to the php.ini configuration file (the paths depend on the installation folder of J2EE):

extension=php_java.dll
extension_dir = "C:\php\ext" 
[java]
java.java_home=C:\Program Files\Java\jre1.5.0_06
java.java=C:\Program Files\Java\jre1.5.0_06\bin\javaw.exe
java.log_level=2
;java.log_file=ext/JavaBridge.log

posted @ 2006-10-06 09:05 Dedian Views(1135) | Comments (0)

install Apache2 & PHP5 on Windows XP

http://www.apachelounge.com/forum/viewtopic.php?t=570

http://www.webmasterstop.com/86.html

posted @ 2006-09-29 05:44 Dedian Views(1026) | Comments (0)

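Google Turns 8 Today

You have probably already seen the new logo on Google's home page. Yes, today is Google's 8th birthday.

I can't remember when I first used Google. Today a search engine has changed people's online lives and brought a revolution to the internet. While people talk at length about online communities and socialization, search engines are stepping up to a new level.

In 8 years Google has grown from a single search product into a whole range of products that change or influence people's lives, and it keeps driving change in web concepts and technology. Products we often use include Google Talk, Google AdSense, Gmail, Google Calendar, Google Maps, Google Video, Google Store, Google Earth, Google Toolbar, and Google Desktop. There are many more that Google is still thinking about.

In short, if the web has become part of your life, Google is increasingly part of your life too. Google's culture, along with its products, is increasingly something other web companies imitate.

So let's look at what we ordinary users typically search Google for:

1. If you haven't seen a friend for years, try searching Google for their name.
2. If you pick up a pen and can't recall an idiom or an old poem, search for the fragments you do remember.
3. If you want to find a picture, try searching for that too.
4. If you have homework, an article, or a thesis to write, nothing could be better. You can find a lot of interesting, relevant material.
5. If you don't know how to translate your transcript, use Google's translation feature.
6. If you come across an unfamiliar word, sentence, slang term, or bit of cultural background, use Google; the wiki result is usually on the first page.
7. If you hear a good song and don't know its title or singer and also want the lyrics, search for the few lines you caught.
8. If you get a baffling phone call, search for the number; you may find out which company called.
9. If you think a person, a website, or an article is cool, search for it too; lots of interesting things will turn up.
10. If everyone is talking about some event, or a topic or term that has become popular lately, search for it and see what they are actually talking about.
11. If there is an English abbreviation that seems famous, search for it and see what it stands for.
12. Your computer has a problem; what now? Don't panic. Search first and see whether someone has had the same problem and whether there is a solution.
13. This guy's web page is really cool; how did he do it? Search for it; you are sure to learn something.
14. If you really want to ask a question, search for it; there may already be an answer.

Well, there are surely many more; feel free to add to the list...

posted @ 2006-09-28 07:55 Dedian Views(1047) | Comments (1)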

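About Zhuaxia (抓蝦)

When you have a good idea, you may feel a touch of excitement. But if you find that you cannot realize it on your own and cannot find like-minded partners, the excitement quickly turns into gloom. A few days later, when you discover that someone online has already built almost the same thing, and done it better than your original idea, the gloom escalates into a sense of loss.

In fact, many ordinary but reasonably clever IT people have to bear this kind of loss to some degree.

Zhuaxia is one of the things that once gave me that feeling, so much so that for a long time I did not register an account. But after pulling myself together, I gladly accepted this excellent Chinese web 2.0 site.

Zhuaxia's idea is actually simple. It is a new Chinese subscription site that combines the web 2.0 concept well with the currently popular RSS syndication format. Online RSS subscription sites such as Bloglines existed abroad much earlier, but they do not integrate the web 2.0 concept as organically as Zhuaxia does; Bloglines is just a simple subscription system with simple sharing.

As for web 2.0, a concept that rose from the ruins of the last internet bubble, most internet users have by now had close contact with it. Blogs and wikis, popular in China since 2005, are representative products of web 2.0.

Earlier websites were more like platforms for publishing information. If a website is a cinema, then we users are at best the audience watching the film; even if we register as VIPs and get a private box, it is still no more than that. You may even take the film home to watch, but you cannot control what the cinema shows, nor can you casually release a film you made yourself.

The web 2.0 concept, by contrast, is to give users a platform for enjoying all kinds of web services.

Users are no longer the audience; they can be actors, directors, distributors, even resellers. Technically, web 2.0 lets users start to control data. From the user's point of view, web 2.0 turns the Internet into a virtual community where everyone can communicate and share. (In this sense, early BBSes and P2P download software were already web 2.0.)

As for RSS syndication, I have always regarded it as just an XML-based data structure. Long ago, when I started developing with .NET, I accepted one idea behind XML Schema: separating data from its presentation. That is also what overcame my urge to laugh at XML for being such a simple web standard. Even then I had the idea of using RSS as a standard structure for the messy information on the Internet, which would make search engines simpler (I once wrote a small program, something like a data collector, for this purpose). That was especially so after I took a course called Distributed Multimedia Information Management, which talked at length about web ontologies and RDF, essentially a technique for describing web entities and their relationships with XML data structures. Compared with simple RSS, though, RDF seems somewhat ahead of its time in terms of applications.

With the web 2.0 concept, a standard data structure, and some concrete implementation technology (such as the currently popular Ruby), you can put together a web 2.0 site yourself. Zhuaxia has clearly been fairly successful at this. For one thing, there are still few successful sites of this kind in China (the ones I visit often are just Zhuaxia and Douban); for another, RSS (for example blogs) is currently booming in China.

Of course, there is now plenty of talk that web 2.0 has no future. That is nothing new. This is how things go on the web: everyone has ideas and anyone may have the technology to build them, but only a few survive and grow big. Web 2.0 is still in the cash-burning stage, because the services are all free (everyone is used to the web's free lunch); you can only burn money to grab users, then sell traffic, then build a monopoly. Without money, you end up like Diglog (奇客發現, diglog.com), a site whose idea is similar to the well-known digg.com but which is clearly still in the incubation stage. In this respect it is no different from web 1.0. It is also why most IT people remain gloomy, pursuing their own dreams under the shelter of companies large and small that are still alive.

posted @ 2006-09-26 08:51 Dedian Views(1945) | Comments (2)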

Understand Java Map Collection

http://www.oracle.com/technology/pub/articles/maps1.html

posted @ 2006-09-23 02:52 Dedian Views(1073) | Comments (1)

HttpURLConnection Problem

When I try to get some information about an HTTP connection to some websites (say http://linuxbyte.net) with HttpURLConnection.getResponseCode(), the JVM seems to hang for quite a while. Some say the problem may lie with the HTTP server, which must be a Microsoft web server. Here and here are the bug reports for Java 1.3 and earlier. Though the problem is said to have been fixed since Java 1.4, I still get an undesirably long wait before a SocketException (Connection reset) is thrown. Btw, conn.setConnectTimeout and conn.setReadTimeout are involved in this problem. I am not sure whether there is any way to save time by skipping those bad links.
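
For reference, this is roughly how I would set both timeouts and probe a link, as a sketch under my own assumptions. setConnectTimeout and setReadTimeout exist on URLConnection since Java 5; the class name TimeoutConfig, the helper names, and the choice of a HEAD request are mine, and some servers may not answer HEAD the same way as GET.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutConfig {
    // Opens (but does not yet connect) an HTTP connection with both timeouts set.
    public static HttpURLConnection open(String url, int connectMillis, int readMillis)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(connectMillis); // limit for establishing the connection
        conn.setReadTimeout(readMillis);       // limit for each blocking read
        conn.setRequestMethod("HEAD");         // status line and headers only, no body
        return conn;
    }

    // Returns the HTTP status code, or -1 if the link is unreachable or too slow.
    public static int probe(String url, int connectMillis, int readMillis) {
        HttpURLConnection conn = null;
        try {
            conn = open(url, connectMillis, readMillis);
            return conn.getResponseCode();
        } catch (IOException e) {
            // covers SocketTimeoutException and "Connection reset" alike
            return -1;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

A crawler could call TimeoutConfig.probe(link, 3000, 5000) and skip every link that comes back as -1. The timeout values here are placeholders to tune, not recommendations.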

posted @ 2006-09-21 06:32 Dedian Views(1139) | Comments (0)

The Ruby Programming Language

Here is a good article that introduces Ruby... why choose Ruby instead of Perl or Python?

posted @ 2006-09-19 05:51 Dedian Views(949) | Comments (0)

Reader and InputStream

-- Scenario:
    The purpose of a reader is to interpret a low-level byte stream (ByteArrayInputStream, StringBufferInputStream, FileInputStream and so on) as a character stream and provide character input to whatever class needs it. It is very simple to convert an InputStream to a Reader:

Reader reader = new InputStreamReader( in ); // in is an instance of InputStream or a derived class

But sometimes we need to convert a Reader to an InputStream. Think about the following scenarios:
1.  The original InputStream has been filtered by a certain reader, and now we need to save the filtered content back into a database through an InputStream: we cannot use the original InputStream, only the filtered stream, which is available only from the reader.
2.  Given a class that holds a reader for accessing streaming content after complex parsing or downloading, we want to use that content without repeating the complex operations, so we need some wrapper methods to get an InputStream from the reader.

-- Solution:
1. Write your own InputStream implementation, such as the following:

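import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

class MyInputStream extends InputStream
{
    private final Reader rd;

    public MyInputStream(Reader rd)
    {
        super();
        this.rd = rd;
    }

    // implement the read() method to make this all work
    @Override
    public int read() throws IOException
    {
        int t = rd.read();
        // you can do your processing on the reader's output here
        // (fiddle with the values before returning them)
        // NOTE: read() must return a byte (0-255) or -1, so this simple
        // version only works for characters up to U+00FF (ISO-8859-1)
        if (t > 0xFF) {
            throw new IOException("character out of byte range: " + t);
        }
        return t;
    }
}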

Note: Applications that need to define a subclass of InputStream must always provide a method that returns the next byte of input.
(refer to http://java.sun.com/j2se/1.4.2/docs/api/java/io/InputStream.html)

-- Anything else? BTW, for parsing XML-based input streams with SAX, I am glad to see that the InputSource constructor can take either an InputStream or a Reader (refer to http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/InputSource.html)
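
One more thought on the solution above: returning each character as a single byte only works while every character fits into one byte. A variant that encodes the characters with an explicit charset could look like the sketch below. The class name EncodingReaderInputStream and the buffer size are my own choices, and the approach (encode one chunk at a time via String.getBytes) is just one simple way to do it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.Charset;

public class EncodingReaderInputStream extends InputStream {
    private final Reader reader;
    private final Charset charset;
    private final char[] chars = new char[1024];
    private byte[] bytes = new byte[0]; // encoded form of the current chunk
    private int pos = 0;                // next unread position in bytes

    public EncodingReaderInputStream(Reader reader, Charset charset) {
        this.reader = reader;
        this.charset = charset;
    }

    // Reads the next chunk of characters and encodes it; false at end of input.
    private boolean refill() throws IOException {
        // leave one slot free so a trailing surrogate pair can be completed
        int n = reader.read(chars, 0, chars.length - 1);
        if (n == -1) {
            return false;
        }
        // do not split a surrogate pair across two encode calls
        if (n > 0 && Character.isHighSurrogate(chars[n - 1])) {
            int low = reader.read();
            if (low != -1) {
                chars[n++] = (char) low;
            }
        }
        bytes = new String(chars, 0, n).getBytes(charset);
        pos = 0;
        return true;
    }

    @Override
    public int read() throws IOException {
        while (pos >= bytes.length) {
            if (!refill()) {
                return -1;
            }
        }
        return bytes[pos++] & 0xFF; // InputStream.read() must return 0..255
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Usage would be something like new EncodingReaderInputStream(reader, StandardCharsets.UTF_8), and the consumer of the stream then has to decode with the same charset.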

posted @ 2006-08-29 09:46 Dedian Views(1338) | Comments (0)

About Hash function

for general purpose hash function:

http://www.partow.net/programming/hashfunctions/

for cryptography & hash function

http://www.x5.net/faqs/crypto/

for a faster and better hash function (comparison of several hash function):

http://burtleburtle.net/bob/hash/doobs.html

----> for further reading...

posted @ 2006-08-19 03:01 Dedian Views(983) | Comments (0)

Getting the IP Address and Hostname

1. Getting the IP Address of a Hostname

    try 
    {
        InetAddress addr = InetAddress.getByName("yahoo.com");
        byte[] ipAddr = addr.getAddress();

        // Convert to dot representation
        String ipAddrStr = "";
        for (int i=0; i<ipAddr.length; i++) {
            if (i > 0) {
                ipAddrStr += ".";
            }
            ipAddrStr += ipAddr[i]&0xFF;
        }
    } 
    catch (UnknownHostException e) {
    }

2. Getting the Hostname of an IP Address

This example attempts to retrieve the hostname for an IP address. Note that getHostName() may not succeed, in which case it simply returns the IP address.

    try {
        // Get hostname by textual representation of IP address
        InetAddress addr = InetAddress.getByName("127.0.0.1");

        // Get hostname by a byte array containing the IP address
        byte[] ipAddr = new byte[]{127, 0, 0, 1};
        addr = InetAddress.getByAddress(ipAddr);

        // Get the host name
        String hostname = addr.getHostName();

        // Get canonical host name
        String hostnameCanonical = addr.getCanonicalHostName();
    } catch (UnknownHostException e) {
    }

3. Getting the IP Address and Hostname of the Local Machine

    try {
        InetAddress addr = InetAddress.getLocalHost();

        // Get IP Address
        byte[] ipAddr = addr.getAddress();

        // Get hostname
        String hostname = addr.getHostName();
    } catch (UnknownHostException e) {
    }
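
A small addition to the examples above, as a sketch: InetAddress.getHostAddress() already returns the dotted form, so the manual loop in example 1 is only needed if you want to see how the conversion works. Also note that octets above 127 need a cast when building a byte array. The class name DottedQuad and the method format are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DottedQuad {
    // Builds an InetAddress from four octets given as ints (0..255) and
    // returns its textual form. getByAddress performs no DNS lookup.
    public static String format(int a, int b, int c, int d) throws UnknownHostException {
        // the casts are required because byte is signed (-128..127)
        byte[] raw = { (byte) a, (byte) b, (byte) c, (byte) d };
        return InetAddress.getByAddress(raw).getHostAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(format(192, 168, 0, 1));
    }
}
```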

posted @ 2006-08-18 06:53 Dedian Views(559) | Comments (0)

How does Alexa work?

http://forums.seochat.com/alexa-ranking-49/how-does-alexa-work-140.html

posted @ 2006-08-16 07:24 Dedian Views(310) | Comments (1)

Robert Tappan Morris

In the last digest about the greatest software ever written, I noted a worm named Morris, which the author ranks 12th among the greatest software. Actually, after finishing the development of my clustered search engine, which is based on Lucene, I have been studying P2P architecture for my distributed search engine (more precisely, the web crawler part). While reading some papers on P2P lookup protocols such as Chord, I noticed that one of the developers is also named Morris. Hmm, it is the same Morris. From the wiki I learned that he is now an associate professor at MIT and was once indicted because of the damage done by his Morris worm. Anyway, I'd like to say that it is very interesting to know some stories about these geeks.

posted @ 2006-08-15 05:53 Dedian Views(449) | Comments (0)

What's The Greatest Software Ever Written?

http://www.informationweek.com/shared/printableArticle.jhtml?articleID=191901844

12. The Morris worm
11. Google search rank
10. Apollo guidance system
9. Excel spreadsheet
8. Macintosh OS
7. Sabre system
6. Mosaic browser
5. Java language
4. IBM System 360 OS
3. gene-sequencing software at the Institute for Genomic Research
2. IBM's System R
1. Unix System III

What do you think?

posted @ 2006-08-15 02:22 Dedian Views(346) | Comments (0)

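Google, the Godfather of Open Source?

Interested readers can refer to the original article.

Below is my rough translation:
------------------------------------------------------------

As everyone knows, Google runs on a large number of GNU/Linux servers, and that is only one aspect of its support for free software. Others include the Summer of Code, which has become an incubator producing a lot of excellent code and projects, and the recently opened code repository, which looks set to displace sourceforge.net (translator's note: the main gathering place for open source). On one hand, Google released Picasa (translator's note: a photo management program) for GNU/Linux, built with Wine (translator's note: a Windows compatibility layer on Linux/Unix that runs on top of X Window); on the other hand, Google also sponsors open source efforts, for example in Sri Lanka, to the tune of about $25,000.

Of course, Google also funds open source in quieter ways. For example, to everyone's surprise the Mozilla Foundation (translator's note: the organization behind Firefox, the other browser everyone knows) earned as much as 72 million last year, simply by making Google the default search engine in Firefox.

In January 2005 Google hired Ben Goodger, the lead engineer of Firefox and one of its main open source coders. By the end of the year Guido van Rossum, the creator of Python, had joined Google as well. Most recently Andrew Morton, maintainer of the Linux 2.6 kernel, announced that he is leaving OSDL for Google.

All of this signals a major shift in the open source world.

In the earliest years, people wrote their code in their spare time out of personal interest, learning as they worked. Then the first dot-com era arrived, and a number of early open source companies began hiring top programmers: kernel hackers such as Alan Cox, David Miller, and Stephen Tweedie went to Red Hat, and others went to Linuxcare.

When the first dot-com bubble burst, these experts had to look for new jobs, and many went to the up-and-coming OSDL. Against this background, Google's rise and its talent drive mean a return to the early model of companies gathering large pools of talent. This time, of course, their work relates indirectly to Google's main market strategy.

Google's strategy is shrewd. Look at the recent hires, Goodger and Morton: one works on a browser, the other on an operating system. Both show its determination to quietly take on Microsoft.

There is another, perhaps less obvious, reason: the recent debate over whether Google can honor its original promises to the open source community. The question is whether Google should open its source code, since it uses a lot of open source.

So, from a certain angle, bringing open source hackers on board is far better than releasing code all over the place.

The debate over whether companies that use open source code should open their own code does not concern only Google. Other major beneficiaries include Yahoo, which has recently been busy acquiring web 2.0 companies such as Flickr and Del.icio.us, all of which clearly bear the mark of open source. Yahoo does not have as long a relationship with open source as Google does, but it too has begun working to attract open source talent.

posted @ 2006-08-11 06:39 Dedian Views(913) | Comments (0)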

Web Standards or web trends?

People are still talking about web 2.0, but I am not sure it is a purely technical term. As I understand it, most of the meaning of web 2.0 may be its marketing meaning: the web is becoming a commons and people generate its content. Again, I am not sure what the place of web services is in web 2.0. As I understand it, the web is not merely a client-server marketing model (I am not talking about web structure here) but an interactive community. The question is, who is going to be the operator or administrator of this community, and are there any game rules that need to be followed? Will that be another utopia?

Well, on the technical level, I'd like to shed some light on the so-called web standards trends:

1. front end --
         CSS ----> layout
         XML ----> data
         XHTML ----> markup
         JavaScript & DOM ----> behavior + XMLHttpRequest --> AJAX

2. back end --
         some open source projects such as Ruby on Rails...

Let me know what you think...

posted @ 2006-08-09 09:21 Dedian Views(816) | Comments (0)

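Doug Cutting Interview -- On Search Engine Development

As the founder of Lucene and Nutch, two major Apache open source projects (there are also related subprojects such as Lucy, Lucene4C, and Hadoop), Doug Cutting has long been followed by search engine developers. After working for Yahoo as a contractor for 4 years, he formally joined Yahoo as an employee this year.

Below is my translation, done in my spare time, of an interview he gave 2 years ago. The original (Doug Cutting Interview) is easy to find with Google. I hope it serves as a useful starting point for beginners in search engine development.

(Note: my translation skills are limited; I aim for faithfulness and clarity rather than elegance. Please bear with me.)

1. What do you do for a living? How did you get started in search engine development?

I mainly work from home on two search-related open source projects: Lucene and Nutch. The money comes mostly from contracts related to these projects. At present Yahoo! Labs funds part of the work on Nutch. There are also some other short-term contracts on both projects.

2. Can you give us a rough explanation of Nutch? And where will you use it?

Let me talk about Lucene first. Lucene is a library that provides full-text search; it is not an application. It provides many API functions that you can use in all kinds of real applications. It is now an Apache project and is widely used. Here is a list of some systems that already use Lucene.

Nutch is an implementation of web search built on top of the Lucene core, and it is a real application: you can download it and use it directly. On top of Lucene it adds a web crawler and some other web-related pieces. The aim is to go from simple site-level indexing and search to searching the whole web, like Google and Yahoo. To compete with those giants, of course, you have to use your head and find some clever approaches. We have tested it with 100M web pages, and its design should have no problem with more than 1B pages. It also runs well on a single machine searching a few servers.

3. In your view, what are the core elements of a search engine? That is, what main parts or modules can typical search engine software be divided into?

Let me think; roughly the following:

 -- Fetching: downloading the pages that are linked to.
 -- Database: storing information about the fetched pages, such as which pages have been fetched, when they were fetched, and which pages they link to.
 -- Link analysis: analyzing the information in the database and assigning each page a weight (PageRank, WebRank, and the like) to estimate its importance. In my view, though, indexing the content of the anchor text is more important. (This is also why things like Google bombing are so effective.)
 -- Indexing: indexing the fetched page content, along with inbound links, link-analysis weights, and so on, so that it can be queried quickly.
 -- Searching: running a query against an index and showing the results in ranked order.

Of course, for a search engine to handle hundreds of millions of pages, all of the above modules should be distributed; that is, they should be able to run in parallel on many machines.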
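
(Translator's note: to make the indexing and searching modules a little more concrete, here is a toy inverted index in Java. It is only a sketch of the general idea, not how Lucene or Nutch is implemented; the class name TinyIndex, the tokenization on non-alphanumeric characters, and the AND semantics of the query are all my own simplifying assumptions.)

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

public class TinyIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Indexing: split the text into lowercase terms and record the doc id.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Searching: return the ids of documents that contain every query term.
    public List<Integer> search(String query) {
        TreeSet<Integer> hits = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue;
            }
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<Integer>());
            if (hits == null) {
                hits = new TreeSet<>(docs); // first term: start with its documents
            } else {
                hits.retainAll(docs);       // later terms: keep the intersection
            }
        }
        return hits == null ? new ArrayList<Integer>() : new ArrayList<Integer>(hits);
    }
}
```

A real engine would also store term positions and weights so that results can be ranked, which this sketch leaves out entirely.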

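4. You just said anyone can download Nutch right away and run it on their own machine. Does that mean even webmasters who do not control their Apache server can start using Nutch quickly?

Unfortunately, most of them probably can't. Nutch still needs a Java servlet container (translator's note: such as Tomcat). Some ISPs support that, but most don't. (Translator's note: only if you control the Apache server can you install something like Tomcat on it.)

5. Can I combine Lucene with the Google Web API? Or with other applications I have written?

A few people have already written Google-like APIs for Nutch, but none has been merged into the current system yet. It will probably happen in the near future.

6. What do you see as the biggest obstacle to implementing a search engine today: hardware, storage, or ranking algorithms? Also, can you tell me roughly how much space a search engine needs to work properly, say if I only want to write a search engine for searching thousands to millions of RSS feeds?

Nutch needs about 10 KB of space per page in total. RSS feed pages are generally small (translator's note: RSS feeds are XML-based text pages, so they are not large), so they should be easier to handle. Of course Nutch currently has no RSS-specific support. (Translator's note: in fact, the API does contain data structures and parsing for RSS.)

7. Is it easy to get funding from Yahoo! Labs? Who can apply? And what do you have to do in return?

I was invited; I did not apply. So I am not very clear on the process.

8. Has Google shown any interest in Nutch?

I have talked with some people there, including Larry Page (translator's note: one of Google's two founders). They were all willing to help, but they could not find a suitable way to do so that would not also help their competitors.

9. Have you implemented your own PageRank or WebRank algorithm in Nutch? What are your considerations for page ranking?

Yes, Nutch has a link analysis module. It is optional, because page ranking is not needed for site search.

10. I imagine you have heard this before: doesn't an open source search engine also give the hackers who do search engine optimization (SEO) an opening to exploit?

Well, possibly.
Say it takes about 6 months to reverse-engineer the latest anti-spam detection algorithm in a closed source search engine. For an open source search engine, cracking it will be faster. Either way, spammers will eventually find a way around it; the only difference is how fast. So the best anti-spam techniques, open or closed source, are the ones that keep working even after others know how they work.

Also, if during those six months you remove the detected spam from your index, there is nothing they can do except change their sites. If your spam detection is based on statistical analysis of good and bad examples from a set of sites, you can watch for new spam patterns overnight and remove them before the spammers have a chance to react.

Open source makes the job of stopping spam a little harder, but not impossible. Besides, the closed source search engines have not secretly solved these problems. I think the benefit of closed source is that it keeps us from seeing that it is not actually as good as we imagine.

11. How does Nutch compare with the distributed web crawler Grub? What do you think about that?

All I can say is that Grub is a project that lets users donate a bit of their hardware and bandwidth to LookSmart's huge crawling task. Only its client is open source; the server is not. So people cannot set up their own Grub service, nor can they access the data Grub collects.

What about distributed crawling more generally? When a search engine gets big, the cost of crawling is dwarfed by the cost of searching. So a distributed crawler does not significantly reduce cost; instead it complicates something that is already not very expensive (translator's note: hardware such as PCs and disks). So it is not a good deal.

Widely distributed search is interesting, but I am not sure it can be done while staying fast enough. A faster search engine is a better search engine. When people can revise their queries quickly and freely, they are more likely to find what they need before they lose patience. But building a widely distributed search engine that can search hundreds of millions of pages in under 1 second is very hard, because of the high network latency. Most of the half second or so that Google takes to show a query result is network latency within a single data center. If you ran the same system on PCs in thousands of homes, even on DSL and cable connections, the network latency would be much higher, so a query would probably take several seconds or longer. It therefore could not be a good search engine.

12. You keep stressing how important speed is for a search engine. I am often puzzled at how Google returns results so fast. How do you think they do it? And what is your experience with Nutch?

I believe Google works in much the same way as Nutch: the query is broadcast to a number of nodes, and each node returns the top results from its pages. Each node holds a few million pages, which avoids disk access for most queries, and each node can handle tens to hundreds of queries per second at the same time. If you want to cover hundreds of millions of pages, you broadcast the query to thousands of nodes. Of course this generates quite a lot of network traffic.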

具體的在這篇文章（ www.computer.org/ micro/mi2003/ m2022.pdf）中有所描述。
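The broadcast-and-merge idea in that answer can be sketched in a few lines. This is only an illustration under my own assumptions: SearchNode and Hit are hypothetical types, not Nutch or Google APIs, and the nodes are called one after another here, whereas a real system would send the requests in parallel over the network.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ScatterGatherSearch {
    /** One scored result; docId and score are illustrative fields. */
    static final class Hit {
        final String docId;
        final double score;

        Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for one index node that holds a slice of the pages. */
    interface SearchNode {
        List<Hit> topHits(String query, int k);
    }

    /** Sends the query to every node and keeps the k best hits overall. */
    static List<Hit> search(List<SearchNode> nodes, String query, int k) {
        List<Hit> all = new ArrayList<>();
        for (SearchNode node : nodes) {
            // Each node only returns its own top k, so little data is moved
            all.addAll(node.topHits(query, k));
        }
        // Highest score first
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Because every node returns at most k hits, the merge step handles k times the number of nodes entries, no matter how many pages each node indexes.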

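13. You mentioned spam just now. Does Nutch have algorithms of that kind? How do you tell spam patterns such as link farms (Blogger's note: a group of pages that all link to each other, a spamdexing method invented by SEO people in 1999 to raise page rankings in the Inktomi search engine) apart from the links of normal, popular sites?

We have not found the time to work on this yet. But it is clearly an important area. Before we get to link farms, we need to do some simpler things: look for word stuffing (Blogger's note: embedding particular words in a page many times, even hundreds of times, some of them invisible to the eye, for example white text on a white background; this is also a spamdexing method), white-on-white text, and so on.

I think that in general (spam detection is one sub-problem), the key to search quality is having a procedure for reliable manual evaluation of query results. With that, we can train a ranking algorithm to produce better results (spam results are one kind of bad result). Commercial search engines usually hire people to do reliable evaluations. Nutch will do this too, but obviously we cannot simply accept every volunteered evaluation, because spammers could easily spam the evaluations. So we need a way to establish trust in volunteer evaluators. I think a peer-review system, something like Slashdot's karma system, should help a lot here.

14. Where do you think search engines are heading in the near future? From a developer's point of view, where will the biggest obstacles be?

Sorry, I am not a very imaginative person. My prediction is that web search engines ten years from now will be much the same as today's. We are on a plateau now. In the first few years, web search engines did develop very quickly. The web crawlers that appeared in 1994 used standard information retrieval methods. Up to the arrival of Google in 1998, more web-specific methods were developed. Since then, the introduction of new methods has slowed down a lot. The low-hanging fruit has been picked. Innovation is easy only when a field is young; the more mature it gets, the harder innovation becomes. Web search engines originated in the 1990s, have clearly become a cash cow, and will soon be part of people's everyday lives.

As for development challenges, I think operational reliability will be a big one. We are currently developing something similar to GFS (the Google File System). It is an indispensable foundation for giant search engines: you cannot let the failure of a small component cause a large outage. It should be easy to scale the system, just by adding more hardware to the pool without tedious reconfiguration. And you should not need a large crowd of operators; everything should mostly take care of itself.

----------------End----------------------

posted @ 2006-08-02 06:07 Dedian Views(14474) | Comments (199)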

CVS Tutorial

-- Getting Ready to Use CVS

First set the variable CVSROOT to /class/`username`/cvsroot
[Or any other directory you wish]
[For csh/tcsh: setenv CVSROOT ~/cvsroot]
[For bash/ksh: CVSROOT=~/cvsroot;export CVSROOT]

Next run cvsinit (in later CVS releases the equivalent command is cvs init). It will create this directory along with the subdirectory CVSROOT and put several files into CVSROOT.

-- How to put a project under CVS

A simple program consisting of multiple files is in /workspaces/project.

To put this program under cvs first

cd to /workspaces/project

Next

cvs import -m "Sample Program" project sample start

CVS should respond with
N project/Makefile
N project/main.c
N project/bar.c
N project/foo.c

No conflicts created by this import

If you were importing your own program, you could now delete the original source.
(Of course, keeping a backup is always a good idea)

-- Basic CVS Usage

Now that you have added 'project' to your CVS repository, you will want to be able to modify the code.

To do this you want to check out the source. You will want to cd to your home directory before you do this.

cd

cvs checkout project

CVS should respond with
cvs checkout: Updating project
U project/Makefile
U project/bar.c
U project/foo.c
U project/main.c

This creates the project directory in your home directory and puts the files: Makefile, bar.c, foo.c, and main.c into the directory along with a CVS directory which stores some information about the files.

You can now make changes to any of the files in the source tree.
Let's say you add a printf("DONE\n"); after the function call to bar()
[Or just cp /class/bfennema/project_other/main2.c to main.c]

Now you have to check in the new copy

cvs commit -m "Added a DONE message." main.c

CVS should respond with
Checking in main.c;
/class/'username'/cvsroot/project/main.c,v <-- main.c
new revision: 1.2; previous revision: 1.1
done

Note, the -m option lets you define the check-in message on the command line. If you omit it you will be placed into an editor where you can type in the check-in message.

-- Using CVS with Multiple Developers

To simulate multiple developers, first create a directory for your second developer.
Call it devel2 (Create it in your home directory).
Next check out another copy of project.

HINT: cvs checkout project

Next, in the devel2/project directory, add a printf("YOU\n"); after the printf("BAR\n");
[Or copy /class/bfennema/project_other/bar2.c to bar.c]

Next, check in bar.c as developer two.

HINT: cvs commit -m "Added a YOU" bar.c

Now, go back to the original developer directory.
[Probably /class/'username'/project]

Now look at bar.c. As you can see, the change made by developer two has not been integrated into your version. For that to happen you must

cvs update bar.c

CVS should respond with
U bar.c

Now look at bar.c. It should now be the same as developer two's.
Next, edit foo.c as the original developer and add printf("YOU\n"); after the printf("FOO\n");
[Or copy /class/bfennema/project_other/foo2.c to foo.c]

Then check in foo.c

HINT: cvs commit -m "Added YOU" foo.c

Next, cd back to developer two's directory.
Add printf("TOO\n"); after the printf("FOO\n");
[Or copy /class/bfennema/project_other/foo3.c to foo.c]

Now type

cvs status foo.c

CVS should respond with

===================================================================
File: foo.c             Status: Needs Merge

   Working revision:    1.1.1.1 'Some Date'
   Repository revision: 1.2     /class/'username'/cvsroot/project/foo.c,v
   Sticky Tag:          (none)
   Sticky Date:         (none)
   Sticky Options:      (none)

The possible statuses of a file are:

Up-to-date
    The file is identical with the latest revision in the repository.
Locally Modified
    You have edited the file, and not yet committed your changes.
Needs Patch
    Someone else has committed a newer revision to the repository.
Needs Merge
    Someone else has committed a newer revision to the repository, and you have also made modifications to the file.

Therefore, this is telling us we need to merge our changes with the changes made by developer one. To do this

cvs update foo.c

CVS should respond with
RCS file: /class/'username'/cvsroot/project/foo.c,v
retrieving revision 1.1.1.1
retrieving revision 1.2
Merging differences between 1.1.1.1 and 1.2 into foo.c
rcsmerge: warning: conflicts during merge
cvs update: conflicts found in foo.c
C foo.c

Since the changes we made to each version were so close together, we must manually adjust foo.c to look the way we want it to look. Looking at foo.c we see:

void foo()
{
  printf("FOO\n");
<<<<<<< foo.c
  printf("TOO\n");
=======
  printf("YOU\n");
>>>>>>> 1.2
}

We see that the text we added as developer one is between the ======= and the >>>>>>> 1.2.
The text we just added as developer two is between the <<<<<<< foo.c and the =======.

To fix this, move the printf("TOO\n"); line to after the printf("YOU\n"); line and delete the additional lines that CVS inserted. [Or copy /class/bfennema/project_other/foo4.c to foo.c]
Next, commit foo.c

cvs commit -m "Added TOO" foo.c

Since you issued a cvs update command and integrated the changes made by developer one, the integrated changes are committed to the source tree.

-- Additional CVS Commands

To add a new file to a module:

Get a working copy of the module.
Create the new file inside your working copy.
Use cvs add filename to tell CVS to version control the file.
Use cvs commit filename to check in the file to the repository.

Removing files from a module:

Make sure you haven't made any uncommitted modifications to the file.
Remove the file from the working copy of the module: rm filename.
Use cvs remove filename to tell CVS you want to delete the file.
Use cvs commit filename to actually perform the removal from the repository.

For more information see the cvs man pages or the cvs.ps file in cvs-1.7/doc.

---------------
copied from http://www.csc.calpoly.edu/~dbutler/tutorials/winter96/cvs/

posted @ 2006-07-20 07:06 Dedian Views(511) | Comments (0)

Java Logging mechanism

reference:

http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html

posted @ 2006-06-27 02:49 Dedian Views(279) | Comments (0)

Generic in the Java Programming Language

While reading the GData source code, you will find a lot of generic-style code in it. Generics are one of several language extensions in JDK 1.5. If you are using a Java 1.5 compiler, it is well worth getting familiar with them. Note that Java generics look like C++ templates but are quite different.

1. What is the idea of generics?
Simply put, generics parameterize types: a class, interface or method is declared with type parameters that are filled in where it is used.

2. Examples?
-- We are familiar with container types such as the Collection classes. Here is our typical usage in Java 1.4 or before:
Vector myList = new Vector();
myList.add(new Integer(100));
Integer value = (Integer)myList.get(0);

For type safety it is now better to write the following (under the Java 1.5 compiler option, the Eclipse IDE displays type safety warnings for the code above):
  Vector<Integer> myList = new Vector<Integer>();
  myList.add(new Integer(100));
  Integer value = myList.get(0);

-- We can write code like this because class Vector is declared as a generic type (simplified here):
public class Vector<E>
{
      public boolean add(E x) { ... }
      ......
}

-- When we see angle brackets in a declaration, it is generic. The name inside the brackets (E above) is a formal type parameter. To use the generic type we supply an actual type argument (such as Integer above); the result, Vector<Integer>, is a parameterized type, also called an invocation of the generic declaration.

3. Tricks in generics

-- Generics make types such as containers more flexible about the entries they accept, but they can also be tricky. Taking containers as an example, one trick question is: can we assign one container to a variable of another container type? In the following style, the answer is no.
List<String> ls = new ArrayList<String>();
List<Object> lo = ls; //compile time error!

-- We know String is a subtype of Object, so we can assign a String to an Object variable. But we cannot assign a List of String to a List of Object. The reason is that through lo we could add any Object (say, an Integer) to the list, and then reading it back through ls would no longer give a String, so the list would stop being type safe. So the Java 1.5 compiler will not let you do that.
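To see what the compiler is protecting us from, here is a small sketch of my own (the class and method names are made up for illustration). Java arrays do allow the assignment that lists forbid, and the price is a failure at run time instead of at compile time.

```java
import java.util.ArrayList;
import java.util.List;

class InvarianceDemo {
    /** Returns true if storing an Integer through an Object[] view of a String[] fails at run time. */
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];      // legal: arrays are covariant
        try {
            objects[0] = Integer.valueOf(42);  // compiles, but the array really holds Strings
            return false;
        } catch (ArrayStoreException e) {
            return true;                       // the JVM rejects the store
        }
    }

    public static void main(String[] args) {
        List<String> ls = new ArrayList<String>();
        // List<Object> lo = ls;               // does not compile: generics are invariant
        List<? extends Object> view = ls;      // allowed: we can read through it, not add
        ls.add("hello");
        Object first = view.get(0);
        System.out.println(first + ", array store fails: " + arrayStoreFails());
    }
}
```

With generics the same mistake is caught by the compiler, so no run-time check on every store is needed.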

-- Looking at the two styles in the examples of section 2, we might say that the older style looks more flexible, because myList can accept other types besides Integer, while the new 1.5 style can only take Integer values. If we need more flexibility, we can apply wildcards.

4. Wildcards and bounded wildcards

-- In something like Collection<?> c, the question mark in the angle brackets is a wildcard. It means the element type is unknown, so the declaration matches a Collection of any type.
-- Something like Collection<? extends Number> c is a bounded wildcard: the unknown element type must be Number or a subtype of Number. A Collection whose element type is not a subtype of Number cannot be passed here.
-- But with either kind of wildcard we cannot add a value of a specific type to the collection (only null), because the element type is unknown and the compiler cannot check that the value fits.
-- So what on earth can wildcards be used for? Back to the flexibility we mentioned before: wildcards describe flexible declarations, such as method parameters, through which we only read elements.
For example, we can define a method like this:
void printCollection(Collection<?> c)
{
      for(Object e : c){System.out.println(e);}
}
See? That is flexible. You can call this method with any Collection. You can use the elements of a Collection<?>, just don't try to put something in it.
-- So the question is: what if we want that flexibility for our method, and we also need to put something into the collection inside it? Then we need a generic method.

5. Generic methods
-- This means a method declaration can also be parameterized.
-- Example:
    public <T> void addCollection(List<T> objs, T obj)
    {
        objs.add(obj);
    }
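Here is a small sketch putting the two styles side by side. The names join and addToList are mine, chosen for the example: the wildcard method only reads, while the generic method may add because T links the list's element type to the new element.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class GenericMethodDemo {
    /** Wildcard version: reads the elements of any collection, never adds to it. */
    static String join(Collection<?> c) {
        StringBuilder sb = new StringBuilder();
        for (Object e : c) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e);
        }
        return sb.toString();
    }

    /** Generic method: T ties the list's element type to the type of the added element. */
    static <T> void addToList(List<T> objs, T obj) {
        objs.add(obj);
    }

    public static void main(String[] args) {
        List<Number> numbers = new ArrayList<Number>();
        addToList(numbers, Integer.valueOf(1));   // T is inferred as Number
        addToList(numbers, Double.valueOf(2.5));
        System.out.println(join(numbers));
    }
}
```

Note that the caller never writes the type argument; the compiler infers T from the arguments.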

6. When to use a generic method and when to use a wildcard?
-- If the type parameter is used only once, or no other argument type or the return type depends on it, a wildcard is the better choice because it is clearer and more concise.
-- Otherwise, a generic method should be used.
Example:
class Collections
{
      public static <T, S extends T> void copy(List<T> dest, List<S> src){...}
}
can be better rewritten as:
class Collections
{
      public static <T> void copy(List<T> dest, List<? extends T> src){...}
}
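As a quick check that the wildcard form is usable, here is a sketch of such a copy method with a body filled in. BoundedCopy is a made-up class for illustration, not the real java.util.Collections, and the size check is my own assumption about reasonable behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BoundedCopy {
    /** Copies src over the first elements of dest; src may hold any subtype of T. */
    static <T> void copy(List<T> dest, List<? extends T> src) {
        if (src.size() > dest.size()) {
            throw new IndexOutOfBoundsException("Source does not fit in dest");
        }
        for (int i = 0; i < src.size(); i++) {
            dest.set(i, src.get(i));   // reading a "? extends T" element yields a T
        }
    }

    public static void main(String[] args) {
        List<Number> dest = new ArrayList<Number>();
        dest.add(0);
        dest.add(0);
        dest.add(0);
        List<Integer> src = Arrays.asList(1, 2);
        copy(dest, src);               // T is Number, the wildcard matches Integer
        System.out.println(dest);
    }
}
```

The second type parameter S from the first version is not needed, because nothing else in the signature refers to it.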

reference: http://java.sun.com/j2se/1.5/pdf/generics-tutorial.pdf

posted @ 2006-06-23 09:39 Dedian Views(1395) | Comments (0)

something about standard of Syndication Format

http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&pName=dso_level1&path=dsonline/0507&file=w4sta.xml&xsl=article.xsl&;jsessionid=GZQWvln9z4JY2dXX8HyQ5f5KtRptqHRWvh17tjCXVbxHnGyzvTm2!554406865

posted @ 2006-06-22 06:06 Dedian Views(212) | Comments (0)

Enhancements in JDK 5

http://java.sun.com/j2se/1.5.0/docs/guide/language/index.html

posted @ 2006-06-21 09:51 Dedian Views(205) | Comments (0)

a bug in Java ?

While debugging my web crawler by crawling the Yahoo website, I found that connecting to a URL such as http://www.youtube.com/w/Kak%E1?v=PIBe_V9PBIA&search=kak%C3%A1 throws the following exception:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 12
    at java.lang.String.substring(Unknown Source)
    at sun.net.www.ParseUtil.unescape(Unknown Source)
    at sun.net.www.ParseUtil.decode(Unknown Source)
    at sun.net.www.ParseUtil.toURI(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)

The following is simple test code:

private static final String urlstring = "
   URL url = new URL(urlstring);

   URLConnection con = url.openConnection();

   con.connect();

Since no exceptions other than MalformedURLException and IOException are declared for this code, I am not sure whether this is a bug in Java's URL parsing...

Does anybody have an idea about that?

P.S. OK, somebody has pointed out that runtime exceptions, such as java.lang.StringIndexOutOfBoundsException, do not have to be declared but can still be thrown. So I need to catch StringIndexOutOfBoundsException in my code. But in my understanding, a method should catch the exceptions coming from the lower-level methods it calls and rethrow the ones it cannot handle, so that callers know what to catch from deep inside. I am not sure runtime exceptions should be an exception to that...
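One way a crawler could protect itself is to check the escapes before connecting. The sketch below is only my guess at a useful pre-check: it assumes (and I have not confirmed this) that the trouble comes from the lone %E1, which is a Latin-1 byte and not valid UTF-8, whereas %C3%A1 is. EscapeCheck and hasValidUtf8Escapes are made-up names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EscapeCheck {
    /** True if every %XX escape in s is complete and the decoded bytes are valid UTF-8. */
    static boolean hasValidUtf8Escapes(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    return false;                  // escape cut off at the end
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    return false;                  // not two hex digits
                }
                bytes.write((hi << 4) | lo);
                i += 2;
            } else if (c < 0x80) {
                bytes.write(c);                    // plain ASCII passes through
            } else {
                return false;                      // raw non-ASCII should have been escaped
            }
        }
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes.toByteArray()));
            return true;
        } catch (CharacterCodingException e) {
            return false;                          // bytes are not well-formed UTF-8
        }
    }
}
```

A crawler could skip or re-encode URLs that fail this check, and still wrap connect() in a catch for RuntimeException as a last line of defence.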

posted @ 2006-06-15 07:48 Dedian Views(505) | Comments (0)

Something is in progress

Still working on the web crawler part; the URL collection strategies are still being thought through. A URL frontier, which stores the list of active URLs to be parsed or downloaded, will handle synchronized I/O operations with the URL collection/inventory. I am stuck on some issues:

1. Duplicate URL elimination:
    a. Host name aliases --> DNS resolver
    b. Omitted port numbers
    c. Alternative paths on the same host
    d. Replication across different hosts
    e. Nonsense links or session IDs embedded in URLs?
2. Reachability of URLs
3. Distributed storage of the URL inventory and the related synchronization problem
4. Fetch strategies for the URL frontier or fetcher to get active links for parsing
5. Scheduler for fetching and updating the URL collection: multi-thread or single thread on each PC, when to decide to re-parse a page
6. URL-seen test: has the page been parsed already, and should it be re-parsed? This should be done before a URL enters the URL frontier...
7. Extensibility issues for the modules: Fetcher, Extractor/Filters, Collector...
8. Checkpointing for interrupted crawls: how to resume a crawler job, how to split crawler jobs and distribute them to different machines
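For the duplicate elimination and URL-seen items, a first cut could look like the sketch below. It is only a starting point under my own assumptions: it handles letter case, default ports, "." and ".." path segments and fragments, and does nothing yet about host aliases, mirrors or session IDs. UrlSeenTest and its methods are made-up names.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class UrlSeenTest {
    private final Set<String> seen = new HashSet<String>();

    /** Reduces a URL to a canonical string so trivially different spellings compare equal. */
    static String normalize(String url) {
        URI u = URI.create(url).normalize();       // resolves "." and ".." segments
        if (u.getScheme() == null || u.getHost() == null) {
            throw new IllegalArgumentException("Not an absolute URL: " + url);
        }
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost().toLowerCase(Locale.ROOT);
        int port = u.getPort();
        if ((scheme.equals("http") && port == 80) || (scheme.equals("https") && port == 443)) {
            port = -1;                             // drop the default port
        }
        String path = u.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (port != -1) {
            sb.append(':').append(port);
        }
        sb.append(path);
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        return sb.toString();                      // the fragment is dropped on purpose
    }

    /** Returns true the first time a URL (in normalized form) is offered, false afterwards. */
    synchronized boolean markSeen(String url) {
        return seen.add(normalize(url));
    }
}
```

An in-memory HashSet will not survive a distributed setup or hundreds of millions of URLs, so the set would eventually have to move into the URL inventory itself.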

It seems that I need a couple of days to refine my system architecture design...

posted @ 2006-06-09 08:57 Dedian Views(847) | Comments (0)

I/O Design Patterns

Here is an article on effective I/O programming. I am marking it so that I can re-check the I/O design of my distributed search engine system later. My current system uses non-blocking synchronous mode. Later I need to check whether anything can be done to improve its performance and scalability.

posted @ 2006-06-09 08:56 Dedian Views(204) | Comments (0)

Good or Bad, Check your OO Design

A PhD student at the University of Auckland has proposed an idea for checking your OO design in Java. The key point is to use a directed graph to analyze the dependencies between all Java classes: the more classes are involved in some cycle, the worse the design.

Several open-source Java projects were examined in his research report...
Though it is not the only metric for checking your OO design, I'd like to say that it is an interesting thought.
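To get a feel for the metric, here is a small sketch of my own, not the algorithm from the report: it takes a class dependency graph as a map and returns the classes that sit on at least one cycle. DependencyCycles is a made-up name, and the simple search from every class is fine for small graphs only.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DependencyCycles {
    /** Returns the classes that can reach themselves through one or more dependency edges. */
    static Set<String> classesInCycles(Map<String, List<String>> deps) {
        Set<String> result = new TreeSet<String>();
        for (String start : deps.keySet()) {
            // Walk everything reachable from start's direct dependencies
            Deque<String> pending = new ArrayDeque<String>(deps.get(start));
            Set<String> visited = new HashSet<String>();
            while (!pending.isEmpty()) {
                String current = pending.pop();
                if (current.equals(start)) {
                    result.add(start);             // we came back: start is on a cycle
                    break;
                }
                if (visited.add(current)) {
                    pending.addAll(deps.getOrDefault(current, Collections.<String>emptyList()));
                }
            }
        }
        return result;
    }
}
```

For example, with A depending on B, B on C, C on A, and D on A, the result contains A, B and C but not D, since D depends on the cycle without being part of it.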

posted @ 2006-06-08 03:05 Dedian Views(986) | Comments (0)

Retrieve values in HashTable or HashMap

Unlike collection types such as Vector or List, a Map (Hashtable or HashMap) accesses a value by its key. If we want to retrieve all the values that have been put in a Map, one simple way is to use its values() Collection, optionally with an Iterator. Here is sample code (it retrieves only the values and skips the keys), assuming there is a variable: HashMap<String, ComplexDataType> links

// values() returns a view of all values in the map
Collection<ComplexDataType> c = links.values();
for (ComplexDataType tempData : c)
{
    dosomethingwith(tempData);
}

P.S. Map provides three views of its contents: keySet, entrySet and the values collection. We can use any of them.
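A short sketch of all three views follows. MapViewsDemo and describe are made-up names, and Integer stands in for ComplexDataType to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapViewsDemo {
    /** Walks the key, value and entry views of a map and records what each one yields. */
    static List<String> describe(Map<String, Integer> links) {
        List<String> lines = new ArrayList<String>();
        for (String key : links.keySet()) {
            lines.add("key " + key);
        }
        for (Integer value : links.values()) {
            lines.add("value " + value);
        }
        for (Map.Entry<String, Integer> entry : links.entrySet()) {
            lines.add(entry.getKey() + " -> " + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // TreeMap keeps keys sorted, so the output order is predictable
        Map<String, Integer> links = new TreeMap<String, Integer>();
        links.put("a", 1);
        links.put("b", 2);
        for (String line : describe(links)) {
            System.out.println(line);
        }
    }
}
```

When both key and value are needed, entrySet avoids a second lookup per key.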

posted @ 2006-06-02 07:16 Dedian Views(342) | Comments (0)

Java Interview Questions

These questions are very useful for some Java newbies and guys who wanna prepare some interviews on Java programming positions, which is really cool.

reference:
http://www.allapplabs.com/interview_questions/java_interview_questions.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_2.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_3.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_4.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_5.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_6.htm

posted @ 2006-06-02 06:14 Dedian Views(388) | Comments (0)

Java Reading & Writing file

1. Reading text from Standard Input

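try 
{
       BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
       String str;
       System.out.print("> some prompt ");
       // readLine() returns null at end of input, so test before using the line
       while ((str = in.readLine()) != null) 
       {
          dosomethingwith(str);
          System.out.print("> some prompt ");
       }
} 
catch (IOException e) 
{
       // report or handle the read error here
}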

2. Reading text from a file

try 
{
     BufferedReader in = new BufferedReader(new FileReader("filename"));
     String str;
     while ((str = in.readLine()) != null) 
     {
	dosomethingwith(str);
     }
     in.close();
} 
catch (IOException e) 
{
}

3. Reading a file into a byte array

    // Returns the contents of the file in a byte array.
    public static byte[] getBytesFromFile(File file) throws IOException 
    {
        InputStream is = new FileInputStream(file);

        // Get the size of the file
        long length = file.length();

        // You cannot create an array using a long type.
        // It needs to be an int type.
        // Before converting to an int type, check
        // to ensure that file is not larger than Integer.MAX_VALUE.
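        if (length > Integer.MAX_VALUE) 
        {
            // File is too large to fit in a single byte array
            is.close();
            throw new IOException("File is too large: " + file.getName());
        }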

        // Create the byte array to hold the data
        byte[] bytes = new byte[(int)length];

        // Read in the bytes
        int offset = 0;
        int numRead = 0;
        while (offset < bytes.length
               && (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) 
	{
            offset += numRead;
        }

        // Ensure all the bytes have been read in
        if (offset < bytes.length) 
	{
            throw new IOException("Could not completely read file "+file.getName());
        }

        // Close the input stream and return bytes
        is.close();
        return bytes;

    }

4. Writing to a file

try 
{
    BufferedWriter out = new BufferedWriter(new FileWriter("filename"));
    out.write("some string");
    out.close();
} 
catch (IOException e) 
{
}

Note: If the file does not already exist, it is automatically created.

5. Appending to a file

try 
{
     BufferedWriter out = new BufferedWriter(new FileWriter("filename", true));
     out.write("appending String");
     out.close();
} 
catch (IOException e) 
{
}

6. Using a Random Access File

try 
{
     File f = new File("filename");
     RandomAccessFile raf = new RandomAccessFile(f, "rw");

     // Read a character
     char ch = raf.readChar();

     // Seek to end of file
     raf.seek(f.length());

     // Append to the end
     raf.writeChars("aString");
     raf.close();
} 
catch (IOException e) 
{
}
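On Java 7 and later, writing, appending and reading can also be sketched with try-with-resources and java.nio.file.Files, which close the file even when an exception is thrown. This is an alternative I am adding for comparison, not part of the referenced examples; FileIoDemo, the temp file name and the two sample lines are made up.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class FileIoDemo {
    /** Writes one line, appends another, then reads the file back as a list of lines. */
    static List<String> writeAppendRead() {
        try {
            Path file = Files.createTempFile("fileio-demo", ".txt");
            try {
                // Default options create or truncate the file
                try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    out.write("first line");
                    out.newLine();
                }
                // APPEND adds to the end instead of truncating
                try (BufferedWriter out = Files.newBufferedWriter(
                        file, StandardCharsets.UTF_8, StandardOpenOption.APPEND)) {
                    out.write("appended line");
                    out.newLine();
                }
                return Files.readAllLines(file, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);        // clean up the temp file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);     // surface the error instead of swallowing it
        }
    }
}
```

Naming the charset explicitly also avoids depending on the platform default encoding, which FileReader and FileWriter use.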

reference:
http://javaalmanac.com/egs/java.io/pkg.html

posted @ 2006-05-31 08:12 Dedian Views(563) | Comments (1)

Java Glossary -- Volatile

volatile

The volatile keyword is used on variables that may be modified simultaneously by other threads. This warns the compiler to fetch them fresh each time, rather than caching them in registers. This also inhibits certain optimisations that assume no other thread will change the values unexpectedly. Since other threads cannot see local variables, there is never any need to mark local variables volatile.

quote from:

http://mindprod.com/jgloss/volatile.html
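A typical use is a stop flag shared between two threads. The sketch below is my own illustration (VolatileFlagDemo and its methods are made-up names): without volatile on the field, the worker loop is not guaranteed to ever notice the change made by the other thread.

```java
class VolatileFlagDemo implements Runnable {
    // volatile: every read in the loop below sees the latest write from another thread
    private volatile boolean running = true;

    public void run() {
        while (running) {
            Thread.yield();                        // stand-in for real work
        }
    }

    void stop() {
        running = false;
    }

    /** Starts a worker, asks it to stop, and reports whether it actually finished. */
    static boolean stopsWhenAsked() {
        VolatileFlagDemo worker = new VolatileFlagDemo();
        Thread thread = new Thread(worker);
        thread.setDaemon(true);                    // never keep the JVM alive for the demo
        thread.start();
        try {
            Thread.sleep(50);
            worker.stop();
            thread.join(2000);
        } catch (InterruptedException e) {
            worker.stop();
            Thread.currentThread().interrupt();    // preserve the interrupt status
        }
        return !thread.isAlive();
    }
}
```

volatile only covers visibility of single reads and writes; compound updates such as counter++ still need synchronization or the java.util.concurrent.atomic classes.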

posted @ 2006-05-25 04:45 Dedian Views(306) | Comments (1)

Lucene 2.0 release most likely this Friday

Though still under voting, the release was originally proposed by Doug Cutting and has received only positive votes so far. So it is very likely that we will get a 2.0 release this Friday. Some bugs have been fixed and deprecated code has been removed in this upcoming version.

posted @ 2006-05-24 09:00 Dedian Views(226) | Comments (0)

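Reveries on the Passing Years

Twenty years ago

I was collecting all kinds of praise from teachers and parents, wearing all kinds of little red flowers and holding all kinds of contest certificates

My current boss was perhaps catching fish in a pond, catching cicadas in the trees and pestering his parents for lollipops

Ten years ago

I started dating, started walking in the moonlight on paths nobody walked, and started hesitantly learning to write poetry

My current boss was perhaps gloomily cramming high school textbooks, or perhaps starting to pass little notes to the girl at the next desk

Today, ten years later

My sweetheart has become my wife, and I am huffing and puffing away, writing inexplicable code in the little patch of space my current boss provides

The girl at the next desk has become a memory, and my current boss sits in a bright, spacious room less than 10 meters from me, watching me and 100 or so people who look much the same to him slave away writing code for him, while he nods his head at ease to music that may or may not be rock.

Tomorrow, ten years from now

?

Ending 1:

My wife is still my wife and I am still huffing and puffing away at code, but beside me there is now a child who looks a little like me, tugging at my arm and clamoring to play games on my computer

Countless pretty girls drift through the building, and my current boss is more than 100 meters away, inside what may or may not be a room, holding a big meeting with a few fat-headed, big-eared shareholders to discuss the survival of me and 1000 or so people like me

Ending 2:

My wife is still my wife. Having scrimped and saved, she and I have finally opened the first company of our very own, and I sit in my own bright, clean office watching 100 or so young brothers outside, as young as I was 20 years ago, throwing themselves into the revolution

The pretty girls still drift by, and my current boss, in a taller and bigger tower, is discussing with a few fat-headed, big-eared shareholders the important matter of how to acquire the company of me, once his subordinate and now a small boss.

Ending 3:

My wife is still my wife, but I own a company whose office gathers a crowd of my former colleagues, with my current boss mixed in among them, advising me in an air-conditioned room or huffing and puffing away at code unlike that of 10 years ago

One pretty girl has finally become a pretty young wife; my current boss, through mismanagement, has sold the company to me, who once huffed and puffed writing code under him, and I have given him a decent position so that he can support a family, marry and have children.

P.S. The function Likely(ending n) (1<=n<=3) is strictly monotonically decreasing, with an upper bound of 0.0001

P.S.

The reveries above are pure daydreaming. My boss is not Chinese and had none of the childhood or youth I imagined for him. Since he cannot read Chinese, daydreaming in Chinese here carries no risk of handing him anything to hold against me. The purpose of writing this is to express my admiration for him, young as he is (I hope he can read this sentence), and the bit of ambition that a happy life has not yet extinguished in me.

posted @ 2006-05-20 13:28 Dedian Views(277) | Comments (0)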

Ooops! my laptop not working...

Oops! My laptop, a Compaq Presario R3230, is not working now (it worked fine yesterday evening): blue screen, and it hangs at disk checking... When I reboot in safe mode, it still hangs at multi(0)disk(0)rdisk(0)partition(1)\windows\system32\drivers\atisgkaf.sys. I guess there is something wrong with my video driver, but how can I fix that problem without wiping out the documents on my hard drive?

I tried googling it, and it seems some other people have had the same problem. These steps are suggested:

1.  Insert the QuickRestore CD into the CD drive and restart the
    system.
2.  When the red Compaq logo appears, press and hold the Caps
    Lock key.  The next screen will be a blinking QuickRestore screen.
3.  When the QuickRestore text stops blinking, press and hold the
    Num Lock key.

But where can I get a QuickRestore CD? The CD that came with the laptop does not seem to be in my room any more... Does anybody have any thoughts on that?

posted @ 2006-05-20 04:32 Dedian Views(186) | Comments (0)


Copyright © Dedian	Powered by: cnblogs (博客園), template provided by Hujiang Blog (滬江博客)
