
Studying meta: its predecessor, Kafka

Fat Mind — Sat, 06 Jul 2013 06:57:00 GMT

http://incubator.apache.org/kafka/design.html

1.Why we built this
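asd (activity stream data) is part of any website and reflects how the site is used, for example which content was searched for or displayed. This data is usually logged to files and then periodically aggregated and analyzed. od (operational data) covers machine performance metrics plus operational data aggregated from other sources.
In recent years asd and od have become a critical part of a website, so a more sophisticated infrastructure is needed.
Characteristics of the data:
a. The high-throughput stream of immutable activity data is a challenge for real-time computation; its volume can easily be 10x or 100x larger than the next largest data source.
b. Traditional log file aggregation is a respectable and scalable way to support offline processing, but its latency is too high for real-time use.
Kafka is intended to be a single queuing platform that can support both offline and online use cases.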

2.Major Design Elements

There is a small number of major design decisions that make Kafka different from most other messaging systems:

Kafka is designed for persistent messages as the common case (messages are persisted).
Throughput rather than features is the primary design constraint (throughput is the first requirement).
State about what has been consumed is maintained as part of the consumer, not the server (state is kept by the client).
Kafka is explicitly distributed. It is assumed that producers, brokers, and consumers are all spread over multiple machines (it must be distributed).

3.Basics
    Messages are the fundamental unit of communication.
    Messages are published to a topic by a producer, which means they are physically sent to a server acting as a broker.
    When several consumers subscribe to a topic, every message on that topic is delivered to each of them.
    Kafka is distributed: producers, brokers, and consumers can each be a cluster of machines that cooperate as a logical group.
    Within one consumer group, each message is consumed by exactly one of the group's consumer processes. A more common case in our own usage is that we have multiple logical consumer groups, each consisting of a cluster of consuming machines that act as a logical whole.
    No matter how many consumers a topic has, Kafka stores each message only once.

4.Message Persistence and Caching

4.1 Don't fear the filesystem !
   Kafka relies entirely on the filesystem for storing and caching messages.
   The common intuition is that "disks are slow", which makes people doubt that a persistent structure can offer competitive performance. In fact, how slow or fast a disk is depends entirely on how it is used; a properly designed disk structure can often be as fast as the network.
   http://baike.baidu.com/view/969385.htm RAID-5
   http://www.china001.com/show_hdr.php?xname=PPDDMV0&dname=66IP341&xpos=172 disk types
   Sequential disk reads and writes are very fast: linear writes on a six-disk 7200rpm SATA RAID-5 array run at about 300MB/sec. These linear reads and writes are the most predictable of all usage patterns, and hence the one detected and optimized best by the operating system using read-ahead and write-behind techniques.
   Modern operating systems use main memory as a disk cache. Any modern OS will happily divert all free memory to disk caching with little performance penalty when the memory is reclaimed. All disk reads and writes will go through this unified cache.
   JVM: a. the memory overhead of objects is very high, often doubling the size of the data stored; b. as the in-heap data grows, garbage collection becomes increasingly expensive.
   As a result of these factors, using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure.
   Compact data: storing a compact byte structure rather than individual objects will result in a cache of up to 28-30GB on a 32GB machine without GC penalties.
   This suggests a design which is very simple: rather than maintain as much as possible in-memory and flush to the filesystem only when necessary, Kafka inverts that.
   All data is written immediately to a persistent log file without calling flush, which means it only reaches the OS pagecache and is flushed by the OS at some later time. Then we add a configuration driven flush policy to allow the user of the system to control how often data is flushed to the physical disk (every N messages or every M seconds) to put a bound on the amount of data "at risk" in the event of a hard crash.
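The flush policy above (every N messages or every M seconds) can be sketched as a small counter-and-clock check. This is only an illustration under stated assumptions: `FlushPolicy`, `maxMessages`, and `maxIntervalMs` are hypothetical names, not Kafka configuration keys, and the clock value is passed in so the logic can be followed without real disk I/O.

```javascript
// Hypothetical flush policy: flush after N appended messages or after M
// milliseconds since the last flush, whichever comes first.
function FlushPolicy(maxMessages, maxIntervalMs, startMs) {
    this.maxMessages = maxMessages;
    this.maxIntervalMs = maxIntervalMs;
    this.unflushed = 0;          // messages written since the last flush
    this.lastFlushMs = startMs;  // time of the last flush
}

// Record one appended message; returns true when the caller should flush now.
FlushPolicy.prototype.onAppend = function (nowMs) {
    this.unflushed += 1;
    var tooMany = this.unflushed >= this.maxMessages;
    var tooOld = nowMs - this.lastFlushMs >= this.maxIntervalMs;
    if (tooMany || tooOld) {
        this.unflushed = 0;
        this.lastFlushMs = nowMs;
        return true;
    }
    return false;
};
```

The two thresholds bound the amount of data "at risk": at most N messages, or whatever arrived in the last M milliseconds.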

4.2 Constant Time Suffices

The persistent data structure used in messaging systems metadata is often a BTree. BTrees are the most versatile data structure available, and make it possible to support a wide variety of transactional and non-transactional semantics in the messaging system.
Disk seeks come at 10 ms a pop, and each disk can do only one seek at a time so parallelism is limited. Hence even a handful of disk seeks leads to very high overhead.
Furthermore BTrees require a very sophisticated page or row locking implementation to avoid locking the entire tree on each operation.

The implementation must pay a fairly high price for row-locking or else effectively serialize all reads.
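Note: message metadata is usually persisted in a BTree, but as an on-disk structure its cost is too high, because of disk seeks and the locking needed to avoid locking the whole tree.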

Intuitively a persistent queue could be built on simple reads and appends to files as is commonly the case with logging solutions.
Note: such a queue gives up some of the semantics a BTree supports, but in exchange all operations are O(1) and reads do not block writes or each other.

The performance is completely decoupled from the data size: one server can now take full advantage of a number of cheap, low-rotational speed 1+TB SATA drives.

Though they have poor seek performance, these drives often have comparable performance for large reads and writes at 1/3 the price and 3x the capacity.
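To make the "simple reads and appends" idea concrete, here is a minimal sketch of an append-only log. It is an assumption-laden toy: `AppendOnlyLog` is a hypothetical name, an array stands in for the file on disk, and offsets are message positions chosen only for illustration.

```javascript
// In-memory stand-in for an append-only log file. Real storage would be a
// file on disk; an array is used here only to show the access pattern.
function AppendOnlyLog() {
    this.entries = [];
}

// Append never reorganizes existing data, so its cost does not grow with log size.
// Returns the offset (position) assigned to the message.
AppendOnlyLog.prototype.append = function (message) {
    this.entries.push(message);
    return this.entries.length - 1;
};

// Read up to maxCount messages starting at offset, in arrival order.
AppendOnlyLog.prototype.read = function (offset, maxCount) {
    return this.entries.slice(offset, offset + maxCount);
};
```

Because readers only need an offset and writers only touch the tail, neither has to lock a tree structure.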

4.3 Maximizing Efficiency
Furthermore we assume each message published is read at least once (and often multiple times), hence we optimize for consumption rather than production.

There are two common causes of inefficiency: too many network requests, and excessive byte copying.

Too many network requests: APIs are built around a "message set" abstraction. This allows network requests to group messages together and amortize the overhead of the network roundtrip rather than sending a single message at a time. Only batch APIs are offered, so the cost of each round trip is spread over a group of messages instead of a single one.

Excessive byte copying:

The message log maintained by the broker is itself just a directory of message sets that have been written to disk.
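The "message set" idea can be sketched as plain chunking: many messages travel in one request. `toMessageSets` and `batchSize` are hypothetical names used only for this illustration, not part of Kafka's API.

```javascript
// Group messages into "message sets" of at most batchSize messages so that
// one request carries many messages.
function toMessageSets(messages, batchSize) {
    var sets = [];
    for (var i = 0; i < messages.length; i += batchSize) {
        sets.push(messages.slice(i, i + batchSize));
    }
    return sets;
}
```

With five messages and a batch size of two, three requests are needed instead of five, and the saving grows with the batch size.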

Maintaining this common format allows optimization of the most important operation: network transfer of persistent log chunks.
To understand the impact of sendfile, it is important to understand the common data path for transfer of data from file to socket:

The operating system reads data from the disk into pagecache in kernel space
The application reads the data from kernel space into a user-space buffer
The application writes the data back into kernel space into a socket buffer
The operating system copies the data from the socket buffer to the NIC buffer where it is sent over the network

With the zero-copy support provided by the OS (sendfile), only the final copy to the NIC buffer is needed.

4.4 End-to-end Batch Compression
In many cases the bottleneck is actually not CPU but network. This is particularly true for a data pipeline that needs to send messages across data centers.

Efficient compression requires compressing multiple messages together rather than compressing each message individually.

Ideally this would be possible in an end-to-end fashion — that is, data would be compressed prior to sending by the producer and remain compressed on the server, only being decompressed by the eventual consumers.

A batch of messages can be clumped together compressed and sent to the server in this form. This batch of messages will be delivered all to the same consumer and will remain in compressed form until it arrives there.
My understanding: the Kafka producer API offers batch compression; the broker does nothing to the batch and forwards it to the consumer still compressed.
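Why compressing a batch beats compressing each message can be checked with a small experiment. This sketch assumes a Node.js runtime and uses its built-in `zlib` module; the sample messages and the helper names are made up for the illustration and gzip is only a stand-in for whatever codec a producer would use.

```javascript
var zlib = require("zlib"); // Node.js built-in module

// Build some similar-looking messages, as activity events usually are.
function sampleMessages(count) {
    var out = [];
    for (var i = 0; i < count; i++) {
        out.push(JSON.stringify({ event: "page_view", user: "user" + i, page: "/home" }));
    }
    return out;
}

// Total bytes when every message is gzipped on its own.
function compressedIndividually(messages) {
    var total = 0;
    messages.forEach(function (m) {
        total += zlib.gzipSync(Buffer.from(m)).length;
    });
    return total;
}

// Bytes when the whole batch is gzipped as one unit.
function compressedAsBatch(messages) {
    return zlib.gzipSync(Buffer.from(messages.join("\n"))).length;
}
```

Compressing the batch lets the codec exploit repetition across messages (field names, common values), which per-message compression cannot see.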
4.5 Consumer state
Keeping track of what has been consumed is one of the key things a messaging system must provide.

State tracking requires updating a persistent entity and potentially causes random accesses.

Most messaging systems keep metadata about what messages have been consumed on the broker. That is, as a message is handed out to a consumer, the broker records that fact locally.
Problem: if the consumer fails while processing, the message is lost. Improvement: the consumer sends the broker an ack after each message it consumes, and the broker resends the message if no ack arrives within a timeout.
New problems:
1. If the message is consumed successfully but the ack is lost, the message is consumed twice.
2. Now the broker must keep multiple states about every single message.
3. When the broker runs on several machines, that state has to be synchronized between them.

4.5.1 Message delivery semantics

So clearly there are multiple possible message delivery guarantees that could be provided: at most once, at least once, exactly once.

This problem is heavily studied, and is a variation of the "transaction commit" problem. Algorithms that provide exactly once semantics exist, two- or three-phase commits and Paxos variants being examples, but they come with some drawbacks. They typically require multiple round trips and may have poor guarantees of liveness (they can halt indefinitely).

Kafka does two unusual things with respect to metadata.

First the stream is partitioned on the brokers into a set of distinct partitions.

Within a partition messages are stored in the order in which they arrive at the broker, and will be given out to consumers in that same order. This means that rather than store metadata for each message (marking it as consumed, say), we just need to store the "high water mark" for each combination of consumer, topic, and partition.
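Storing one "high water mark" per (consumer, topic, partition) can be sketched as a tiny key-value store. `OffsetStore` and its methods are hypothetical names for this illustration; they are not the Kafka client API, and a plain object stands in for wherever the consumer really persists its state.

```javascript
// Hypothetical consumer-side offset store: one number per
// (consumer group, topic, partition) instead of a flag per message.
function OffsetStore() {
    this.offsets = {};
}

OffsetStore.prototype.key = function (group, topic, partition) {
    return group + "/" + topic + "/" + partition;
};

// Offset of the next message to read; 0 if nothing was consumed yet.
OffsetStore.prototype.get = function (group, topic, partition) {
    var k = this.key(group, topic, partition);
    return Object.prototype.hasOwnProperty.call(this.offsets, k) ? this.offsets[k] : 0;
};

// Advance (or deliberately rewind) the high water mark.
OffsetStore.prototype.commit = function (group, topic, partition, offset) {
    this.offsets[this.key(group, topic, partition)] = offset;
};
```

Because messages within a partition are ordered, a single number says everything about what has been consumed, and setting it to an older value is all a rewind takes.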

4.5.2 Consumer state
    In Kafka, the consumers are responsible for maintaining state information (offset) on what has been consumed.

Typically, the Kafka consumer library writes their state data to zookeeper.

This solves a distributed consensus problem, by removing the distributed part!

There is a side benefit of this decision. A consumer can deliberately rewind back to an old offset and re-consume data.

4.5.3 Push vs. pull

A related question is whether consumers should pull data from brokers or brokers should push data to the subscriber.

There are pros and cons to both approaches.
However a push-based system has difficulty dealing with diverse consumers as the broker controls the rate at which data is transferred. The goal of push is for the consumer to consume at the maximum possible rate; unfortunately, when the rate of consumption falls below the rate of production, the consumer tends to be overwhelmed.

A pull-based system has the nicer property that the consumer simply falls behind and catches up when it can. The overload problem of a push system can be mitigated with some kind of backoff protocol by which the consumer can indicate it is overwhelmed, but getting the rate of transfer to fully utilize (but never over-utilize) the consumer is trickier than it seems. Previous attempts at building systems in this fashion led us to go with a more traditional pull model. Pull avoids the push problem and still lets the consumer run at full capacity.

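The "falls behind and catches up" behaviour of a pull consumer can be sketched as a loop that fetches at its own pace. `poll`, `catchUp`, and `maxBatch` are hypothetical names, the log is a plain array, and `maxBatch` is assumed to be a positive number.

```javascript
// A pull-style consumer asks for at most maxBatch messages per poll,
// starting from its own offset. The log is a plain array here.
function poll(log, offset, maxBatch) {
    var messages = log.slice(offset, offset + maxBatch);
    return { messages: messages, nextOffset: offset + messages.length };
}

// Drain the log at the consumer's own pace; returns how many polls it took.
function catchUp(log, startOffset, maxBatch) {
    var offset = startOffset;
    var polls = 0;
    while (offset < log.length) {
        offset = poll(log, offset, maxBatch).nextOffset;
        polls += 1;
    }
    return polls;
}
```

A slow consumer simply needs more polls; the broker never has to guess how fast it may send.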
5. Distribution
Kafka is built to be run across a cluster of machines as the common case. There is no central "master" node. Brokers are peers to each other and can be added and removed at anytime without any manual configuration changes. Similarly, producers and consumers can be started dynamically at any time. Each broker registers some metadata (e.g., available topics) in Zookeeper. Producers and consumers can use Zookeeper to discover topics and to co-ordinate the production and consumption. The details of producers and consumers will be described below.

6. Producer

6.1 Automatic producer load balancing
Kafka supports client-side load balancing for message producers or use of a dedicated load balancer to balance TCP connections.

The advantage of using a level-4 load balancer is that each producer only needs a single TCP connection, and no connection to zookeeper is needed.

The disadvantage is that the balancing is done at the TCP connection level, and hence it may not be well balanced (if some producers produce many more messages than others, evenly dividing up the connections per broker may not result in evenly dividing up the messages per broker).

Client-side zookeeper-based load balancing solves some of these problems. It allows the producer to dynamically discover new brokers, and balance load on a per-request basis. It allows the producer to partition data according to some key instead of randomly.

The working of the zookeeper-based load balancing is described below. Zookeeper watchers are registered on the following events:

a new broker comes up
a broker goes down
a new topic is registered
a broker gets registered for an existing topic

Internally, the producer maintains an elastic pool of connections to the brokers, one per broker. This pool is kept updated to establish/maintain connections to all the live brokers, through the zookeeper watcher callbacks. When a producer request for a particular topic comes in, a broker partition is picked by the partitioner (see section on semantic partitioning). The available producer connection is used from the pool to send the data to the selected broker partition.
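In short: the producer manages its broker connections through zookeeper. For each request it computes the partition from the partition rule, picks the matching connection from the pool, and sends the data.
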
6.2 Asynchronous send

Asynchronous non-blocking operations are fundamental to scaling messaging systems.

This allows buffering of produce requests in an in-memory queue and batch sends that are triggered by a time interval or a pre-configured batch size.

6.3 Semantic partitioning

The producer can semantically map messages to the available kafka nodes and partitions.

This allows partitioning the stream of messages with some semantic partition function based on some key in the message to spread them over broker machines.
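A semantic partition function can be as simple as hashing the message key and taking the remainder. The sketch below is an assumption, not Kafka's partitioner: `hashKey` is an ordinary 31-multiplier string hash and `partitionFor` is a hypothetical helper.

```javascript
// Simple deterministic string hash (not Kafka's own hash function).
function hashKey(key) {
    var h = 0;
    for (var i = 0; i < key.length; i++) {
        h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit value
    }
    return h;
}

// Map a message key to one of numPartitions partitions.
function partitionFor(key, numPartitions) {
    return hashKey(key) % numPartitions;
}
```

The useful property is that the same key always lands in the same partition, so all messages for one user (for example) are consumed in order by one consumer.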

JS practice: working with cookies

Fat Mind — Sun, 09 Sep 2012 07:18:00 GMT

1. JS code: login.js

// Write the user's login information into a cookie
function SetCookie(form) // takes the login form and reads its name and password fields
{
    var name = form.name.value;
    var password = form.password.value;
    var Days = 1; // the cookie is kept for 1 day
    var exp  = new Date(); // start from the current time, add the retention period, and use that as the cookie's expiry
    exp.setTime(exp.getTime() + Days*24*60*60*1000);
    // encode each part separately so a "/" inside the name or password cannot be mistaken for the separator
    document.cookie = "user=" + encodeURIComponent(name) + "/" + encodeURIComponent(password) + ";expires=" + exp.toUTCString();
}
// Read a cookie: regular expression version (I don't know regular expressions yet; something to learn)
function getCookie(name)
{
    var arr = document.cookie.match(new RegExp("(^| )"+name+"=([^;]*)(;|$)"));
    if(arr != null) return decodeURIComponent(arr[2]);
    return null;
}
// Read a cookie: plain string-search version
  function readCookie(form) {
      var search = "user=";
      if (document.cookie.length > 0) {
          var offset = document.cookie.indexOf(search);
          if (offset != -1) {
              offset += search.length;
              var end = document.cookie.indexOf(";", offset);
              if (end == -1)
                    end = document.cookie.length;
              // take the raw cookie value, split it, then decode each part
              var str = document.cookie.substring(offset, end).split("/");
              if (str.length == 2) {
                    form.name.value = decodeURIComponent(str[0]);
                    form.password.value = decodeURIComponent(str[1]);
              }
          }
      }
  }
// Delete the cookie. (In a servlet, a max age of 0 deletes the cookie and -1 gives it session scope; JavaScript seems to behave a little differently.)
function delCookie()
{
    var name = "user"; // the cookie written by SetCookie
    var exp = new Date();
    exp.setTime(exp.getTime() - 1); // an expiry in the past makes the browser drop the cookie
    var cval = getCookie(name);
    if (cval != null) document.cookie = name + "=;expires=" + exp.toUTCString();
}
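The string handling in these functions can also be separated from `document.cookie`, which makes it easy to try outside a browser. The sketch below is a hypothetical helper (`parseCookies` is not part of login.js) and assumes the values were written with `encodeURIComponent`; a malformed percent sequence would make `decodeURIComponent` throw.

```javascript
// Parse a cookie string such as "a=1; user=tom/secret" into an object.
// Values are decoded with decodeURIComponent.
function parseCookies(cookieString) {
    var result = {};
    if (!cookieString) return result;
    cookieString.split(";").forEach(function (pair) {
        var idx = pair.indexOf("=");
        if (idx === -1) return; // skip malformed entries
        var name = pair.substring(0, idx).trim();
        var value = pair.substring(idx + 1).trim();
        result[name] = decodeURIComponent(value);
    });
    return result;
}
```

In a page it would be called as `parseCookies(document.cookie)`, and the cookie name no longer has to be hard-coded inside the parsing logic.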

2. JSP code: login.jsp

<%@ page contentType="text/html; charset=gb2312" language="java"
    import="java.sql.*" errorPage=""%>

        JavaScript cookie control

    function checkEmpty(form){
        for(var i=0;i<form.elements.length;i++){
            if(form.elements[i].value==""){
                alert("Form fields must not be empty");
                return false;
            }
        }
    }

                        Login

                                        Username:

                                        Password:

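Goal: when you open login.jsp again, the form is already filled in with the details from your last login.

Problems:

1. The cookie name is hard-coded in the JavaScript that reads it, which is not very flexible.

2. JavaScript stores a cookie as a plain string, so when you read it back you have to parse it in the same format you wrote it.

3. I originally wanted automatic login, but every page of mine checks the session. One side is the client and the other is the server, so I did not get it working.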

JS basics

Fat Mind — Sun, 20 May 2012 05:50:00 GMT

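1. Variable types
  - undefined
  - null
  - string
   - difference between == and ===
  - number
  - boolean
  - string, number, and boolean each have a corresponding object (wrapper) type
2. Functions
  - Defining a function
   - the function keyword
   - parameters (see the examples), arguments
   - variable declarations inside a function, and what var changes
  - Scope
   - chained structure (an inner function can see the outer function's variables)
  - Anonymous functions
   - use case: one-off code that is not reused (e.g. a JSONP callback)
  - How this behaves

Examples: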

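var add = function(x) {
return x + 1; // x++ would return the value from before the increment
}
add(1,2,3); // any number of arguments may be passed, like Java varargs (int... x); the extras are available through arguments

var fn = function(name, pass) {

alert(name);

alert(pass);

};

fn("hello","1234",5); // arguments are bound to parameters in the order they are passed

var name = "windows";
var fn = function() {
var name = "hello";
alert(this.name);
}
fn(); // alerts "windows": in a plain (non-strict) function call, this refers to the global window object

var name = "windows";
var fn = function() {
name = "hello";
alert(this.name);
}
fn(); // name is not declared with var inside the function, so it is the global variable; this refers to window, so this alerts "hello"

function add(a) {

return ++a;

}

var fn = function(x,add){

return add(x);

}

fn(1, add); // a function passed as an argument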

3. Closures

http://www.ruanyifeng.com/blog/2009/08/learning_javascript_closures.html [good]
Closures in other languages: http://www.ibm.com/developerworks/cn/linux/l-cn-closure/
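A minimal closure example, in the same spirit as the scope notes above: an inner function keeps access to a variable of the outer function even after the outer function has returned. `makeCounter` is just an illustrative name.

```javascript
// makeCounter returns a function that still "sees" count after makeCounter has returned.
function makeCounter() {
    var count = 0; // private: only reachable through the returned function
    return function () {
        count += 1;
        return count;
    };
}
```

Each call to `makeCounter()` creates a fresh `count`, so two counters do not interfere with each other.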

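4. Objects
- new Object()
- object literal
- constructor function
- The steps these operations go through:
  - create a new object
  - assign the constructor's scope to the new object (the new operator), so this points to it
  - add properties and methods to the object
  - return the object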

var obj = new Object(); // the new Object() way

obj.name = 'zhangsan';

var obj = { // the object literal way of defining an object

name : 'zhangsan',

showName : function (){

alert(this.name);
}

};
obj.showName(); // showName alerts the name itself and returns nothing

function Person(name) { // constructor function
    this.name = name;
    this.showName = function(){
        return this.name;
    };
}
var obj = new Person("zhangsan");  // the new keyword is required; without it this is just an ordinary function call
obj.showName();
alert(obj.name);
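The steps that `new` goes through can be written out by hand. The sketch below is an approximation for illustration only: `construct` and `Animal` are hypothetical names, and the real `new` operator handles more edge cases than this.

```javascript
// Hand-written approximation of `new Ctor(arg...)`.
function construct(Ctor) {
    var args = Array.prototype.slice.call(arguments, 1);
    var obj = Object.create(Ctor.prototype);   // 1. create a new object
    var result = Ctor.apply(obj, args);        // 2-3. run the constructor with this = obj, adding properties
    // 4. return the new object, unless the constructor itself returned an object or function
    var isObject = result !== null && (typeof result === "object" || typeof result === "function");
    return isObject ? result : obj;
}

function Animal(name) {
    this.name = name;
}
Animal.prototype.describe = function () {
    return "animal " + this.name;
};
```

Calling `construct(Animal, "cat")` gives an object that behaves like `new Animal("cat")`, which also shows why forgetting `new` matters: without step 1, `this` is not a fresh object.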

Source: internal training slides

Fat Mind — Fri, 06 Apr 2012 06:02:00 GMT

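1. A handle is just an identifier: once we have an object's handle, we can perform any operation on that object.

2. A handle is not a pointer. The operating system can use a handle to find a block of memory; the handle may be an identifier (the key of a map) or it may be a pointer, depending on how the operating system implements it.

An fd is, to some extent, a substitute for a handle.

Linux has corresponding mechanisms but no unified handle type: each kind of system resource is identified by its own type and operated on through its own interface.

3. http://tech.ddvip.com/2009-06/1244006580122204_11.html

At the operating-system level, file operations also have a concept similar to FILE. On Linux it is called a file descriptor (fd); on Windows it is called a handle (below, "handle" is used for both where there is no ambiguity). The user opens a file through some function to obtain a handle, and from then on manipulates the file only through that handle.

The reason for designing handles this way is that they stop users from reading and writing the kernel's file objects at will. On both Linux and Windows a file handle is always associated with a kernel file object, but the details of that association are invisible to the user. The kernel can compute the address of the kernel file object from the handle, but this ability is not exposed to the user.

Here is a concrete example. On Linux, the fds 0, 1, and 2 stand for standard input, standard output, and standard error. The fds a program gets from opening files start at 3 and increase. What exactly is an fd? In the kernel, every process has a private "open file table", an array of pointers in which each element points to a kernel open-file object. The fd is the index into this table. When the user opens a file, the kernel creates an open-file object internally, finds a free slot in the table, makes that slot point to the new object, and returns the slot's index as the fd. Because the table lives in the kernel and the user cannot access it, a user who holds an fd still cannot obtain the address of the open-file object and can only operate on it through the functions the system provides.

In C, the channel for manipulating files is the FILE structure. As you would expect, a FILE structure in C must map one-to-one to an fd: each FILE structure records the single fd it corresponds to.

Handle: http://zh.wikipedia.org/wiki/%E5%8F%A5%E6%9F%84

In programming, a handle is a special kind of smart pointer. An application uses a handle when it needs to refer to a memory block or object managed by another system (such as a database or the operating system).

A handle differs from an ordinary pointer in that a pointer contains the memory address of the referenced object, whereas a handle is a reference identifier managed by the system, which the system can relocate to another memory address. This indirect way of accessing objects gives the system more control over the referenced objects (see encapsulation).

Handles were widely used in the memory management of 1980s operating systems such as Mac OS and Windows. Unix file descriptors are essentially handles too. Like other desktop environments, the Windows API uses handles heavily to identify objects in the system and to set up communication channels between the operating system and user space. For example, a window on the desktop is identified by a handle of type HWND.

Today, larger memory capacities and virtual memory algorithms have made plain pointers more popular, and handles of the pointer-to-a-pointer kind have fallen out of favor. Even so, many operating systems still use the term handle for pointers to private objects and for indexes into internal arrays that a process passes to its clients.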
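The open file table described above can be modelled with a small array. This is a toy model, not kernel code: `FdTable` is a hypothetical name, strings stand in for kernel open-file objects, and the lowest free slot is reused first, which is how POSIX `open` assigns descriptors.

```javascript
// Toy model of a per-process open file table. Slots 0, 1, 2 are taken by
// stdin, stdout, stderr, so the first opened file gets fd 3.
function FdTable() {
    this.slots = ["stdin", "stdout", "stderr"];
}

// Put the file object into the lowest free slot and return its index (the fd).
FdTable.prototype.open = function (fileObject) {
    for (var fd = 0; fd < this.slots.length; fd++) {
        if (this.slots[fd] === null) {
            this.slots[fd] = fileObject;
            return fd;
        }
    }
    this.slots.push(fileObject);
    return this.slots.length - 1;
};

// Free the slot so a later open can reuse it.
FdTable.prototype.close = function (fd) {
    this.slots[fd] = null;
};
```

The caller only ever holds the index; the object it refers to stays inside the table, which is the whole point of a handle.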

A powerful unit-testing tool: PowerMock [usage summary]

Fat Mind — Thu, 29 Mar 2012 04:39:00 GMT

Official site: http://code.google.com/p/powermock/

1. If you already use Mockito, I recommend reading the following sections

- document [required]

- getting started
- motivation
- mockito extends [required]
- mockito 1.8+ usage
- common
- tutorial
- faq [required]

2. Attachment: some PowerMock features used in real development, reduced to simplified examples (only to show how the PowerMock API is used). It mainly covers:

- Modifying private fields

- Private methods

  - Testing private methods

  - Mock

  - Verify

- Static methods

  - Mock

  - Throwing exceptions

  - Verify

- Mocking some methods of a class

- Mocking the Java core library, e.g. Thread

- Mocking constructors

/Files/shijian/powermock.rar

[Repost] Google: Excellent Papers for 2011

Fat Mind — Sat, 24 Mar 2012 03:39:00 GMT

Original post: http://googleresearch.blogspot.com/2012/03/excellent-papers-for-2011.html

Excellent Papers for 2011

Posted by Corinna Cortes and Alfred Spector, Google Research

Googlers across the company actively engage with the scientific community by publishing technical papers, contributing open-source packages, working on standards, introducing new APIs and tools, giving talks and presentations, participating in ongoing technical debates, and much more. Our publications offer technical and algorithmic advances, feature aspects we learn as we develop novel products and services, and shed light on some of the technical challenges we face at Google.

In an effort to highlight some of our work, we periodically select a number of publications to be featured on this blog. We first posted a set of papers on this blog in mid-2010 and subsequently discussed them in more detail in the following blog postings. In a second round, we highlighted new noteworthy papers from the later half of 2010. This time we honor the influential papers authored or co-authored by Googlers covering all of 2011 -- covering roughly 10% of our total publications. It’s tough choosing, so we may have left out some important papers. So, do see the publications list to review the complete group.

In the coming weeks we will be offering a more in-depth look at these publications, but here are some summaries:


Audio processing

“Cascades of two-pole–two-zero asymmetric resonators are good models of peripheral auditory function”, Richard F. Lyon, Journal of the Acoustical Society of America, vol. 130 (2011), pp. 3893-3904.
Lyon's long title summarizes a result that he has been working toward over many years of modeling sound processing in the inner ear. This nonlinear cochlear model is shown to be "good" with respect to psychophysical data on masking, physiological data on mechanical and neural response, and computational efficiency. These properties derive from the close connection between wave propagation and filter cascades. This filter-cascade model of the ear is used as an efficient sound processor for several machine hearing projects at Google.

Electronic Commerce and Algorithms

“Online Vertex-Weighted Bipartite Matching and Single-bid Budgeted Allocations”, Gagan Aggarwal, Gagan Goel, Chinmay Karande, Aranyak Mehta, SODA 2011.
The authors introduce an elegant and powerful algorithmic technique to the area of online ad allocation and matching: a hybrid of random perturbations and greedy choice to make decisions on the fly. Their technique sheds new light on classic matching algorithms, and can be used, for example, to pick one among a set of relevant ads, without knowing in advance the demand for ad slots on future web page views.

[Worth following]

“Milgram-routing in social networks”, Silvio Lattanzi, Alessandro Panconesi, D. Sivakumar, Proceedings of the 20th International Conference on World Wide Web, WWW 2011, pp. 725-734.
Milgram’s "six-degrees-of-separation experiment" and the fascinating small world hypothesis that follows from it, have generated a lot of interesting research in recent years. In this landmark experiment, Milgram showed that people unknown to each other are often connected by surprisingly short chains of acquaintances. In the paper we prove theoretically and experimentally how a recent model of social networks, "Affiliation Networks", offers an explanation to this phenomenon and inspires interesting techniques for local routing within social networks.

[Worth following]

“Non-Price Equilibria in Markets of Discrete Goods”, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Noam Nisan, EC, 2011.
We present a correspondence between markets of indivisible items, and a family of auction based n player games. We show that a market has a price based (Walrasian) equilibrium if and only if the corresponding game has a pure Nash equilibrium. We then turn to markets which do not have a Walrasian equilibrium (which is the interesting case), and study properties of the mixed Nash equilibria of the corresponding games.

[Worth following]

HCI

“From Basecamp to Summit: Scaling Field Research Across 9 Locations”, Jens Riegelsberger, Audrey Yang, Konstantin Samoylov, Elizabeth Nunge, Molly Stevens, Patrick Larvie, CHI 2011 Extended Abstracts.
The paper reports on our experience with a basecamp research hub to coordinate logistics and ongoing real-time analysis with research teams in the field. We also reflect on the implications for the meaning of research in a corporate context, where much of the value may be less in a final report, but more in the curated impressions and memories our colleagues take away from the research trip.

“User-Defined Motion Gestures for Mobile Interaction”, Jaime Ruiz, Yang Li, Edward Lank, CHI 2011: ACM Conference on Human Factors in Computing Systems, pp. 197-206.
Modern smartphones contain sophisticated sensors that can detect rich motion gestures — deliberate movements of the device by end-users to invoke commands. However, little is known about best-practices in motion gesture design for the mobile computing paradigm. We systematically studied the design space of motion gestures via a guessability study that elicits end-user motion gestures to invoke commands on a smartphone device. The study revealed consensus among our participants on parameters of movement and on mappings of motion gestures onto commands, by which we developed a taxonomy for motion gestures and compiled an end-user inspired motion gesture set. The work lays the foundation of motion gesture design—a new dimension for mobile interaction.

Information Retrieval

“Reputation Systems for Open Collaboration”, B.T. Adler, L. de Alfaro, A. Kulshrestra, I. Pye, Communications of the ACM, vol. 54 No. 8 (2011), pp. 81-87.
This paper describes content based reputation algorithms, that rely on automated content analysis to derive user and content reputation, and their applications for Wikipedia and Google Maps. The Wikipedia reputation system WikiTrust relies on a chronological analysis of user contributions to articles, metering positive or negative increments of reputation whenever new contributions are made. The Google Maps system Crowdsensus compares the information provided by users on map business listings and computes both a likely reconstruction of the correct listing and a reputation value for each user. Algorithmic-based user incentives ensure the trustworthiness of evaluations of Wikipedia entries and Google Maps business information.

Machine Learning and Data Mining

“Domain adaptation in regression”, Corinna Cortes, Mehryar Mohri, Proceedings of The 22nd International Conference on Algorithmic Learning Theory, ALT 2011.
Domain adaptation is one of the most important and challenging problems in machine learning.  This paper presents a series of theoretical guarantees for domain adaptation in regression, gives an adaptation algorithm based on that theory that can be cast as a semi-definite programming problem, derives an efficient solution for that problem by using results from smooth optimization, shows that the solution can scale to relatively large data sets, and reports extensive empirical results demonstrating the benefits of this new adaptation algorithm.

“On the necessity of irrelevant variables”, David P. Helmbold, Philip M. Long, ICML, 2011
Relevant variables sometimes do much more good than irrelevant variables do harm, so that it is possible to learn a very accurate classifier using predominantly irrelevant variables.  We show that this holds given an assumption that formalizes the intuitive idea that the variables are non-redundant.  For problems like this it can be advantageous to add many additional variables, even if only a small fraction of them are relevant.

“Online Learning in the Manifold of Low-Rank Matrices”, Gal Chechik, Daphna Weinshall, Uri Shalit, Neural Information Processing Systems (NIPS 23), 2011, pp. 2128-2136.
Learning measures of similarity from examples of similar and dissimilar pairs is a problem that is hard to scale. LORETA uses retractions, an operator from matrix optimization, to learn low-rank similarity matrices efficiently. This allows learning similarities between objects like images or texts when represented using many more features than possible before.

Machine Translation

“Training a Parser for Machine Translation Reordering”, Jason Katz-Brown, Slav Petrov, Ryan McDonald, Franz Och, David Talbot, Hiroshi Ichikawa, Masakazu Seno, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP '11).
Machine translation systems often need to understand the syntactic structure of a sentence to translate it correctly. Traditionally, syntactic parsers are evaluated as standalone systems against reference data created by linguists. Instead, we show how to train a parser to optimize reordering accuracy in a machine translation system, resulting in measurable improvements in translation quality over a more traditionally trained parser.

“Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation”, Ashish Venugopal, Jakob Uszkoreit, David Talbot, Franz Och, Juri Ganitkevitch, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
We propose a general method to watermark and probabilistically identify the structured results of machine learning algorithms with an application in statistical machine translation. Our approach does not rely on controlling or even knowing the inputs to the algorithm and provides probabilistic guarantees on the ability to identify collections of results from one’s own algorithm, while being robust to limited editing operations.

“Inducing Sentence Structure from Parallel Corpora for Reordering”, John DeNero, Jakob Uszkoreit, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Automatically discovering the full range of linguistic rules that govern the correct use of language is an appealing goal, but extremely challenging.  Our paper describes a targeted method for discovering only those aspects of linguistic syntax necessary to explain how two different languages differ in their word ordering.  By focusing on word order, we demonstrate an effective and practical application of unsupervised grammar induction that improves a Japanese to English machine translation system.

Multimedia and Computer Vision

“Kernelized Structural SVM Learning for Supervised Object Segmentation”, Luca Bertelli, Tianli Yu, Diem Vu, Burak Gokturk, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2011.
The paper proposes a principled way for computers to learn how to segment the foreground from the background of an image given a set of training examples. The technology is built upon a specially designed nonlinear segmentation kernel under the recently proposed structured SVM learning framework.

“Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths”, Matthias Grundmann, Vivek Kwatra, Irfan Essa, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011).
Casually shot videos captured by handheld or mobile cameras suffer from a significant amount of shake. Existing in-camera stabilization methods dampen high-frequency jitter but do not suppress low-frequency movements and bounces, such as those observed in videos captured by a walking person. On the other hand, most professionally shot videos usually consist of carefully designed camera configurations, using specialized equipment such as tripods or camera dollies, and employ ease-in and ease-out for transitions. Our stabilization technique automatically converts casual shaky footage into more pleasant and professional looking videos by mimicking these cinematographic principles. The original, shaky camera path is divided into a set of segments, each approximated by either constant, linear or parabolic motion, using an algorithm based on robust L1 optimization. The stabilizer has been part of the YouTube Editor (youtube.com/editor) since March 2011.

“The Power of Comparative Reasoning”, Jay Yagnik, Dennis Strelow, David Ross, Ruei-Sung Lin, International Conference on Computer Vision (2011).
The paper describes a theory-derived vector space transform that converts vectors into sparse binary vectors such that Euclidean space operations on the sparse binary vectors imply rank space operations in the original vector space. The transform a) does not need any data-driven supervised/unsupervised learning, b) can be computed from polynomial expansions of the input space in linear time (in the degree of the polynomial), and c) can be implemented in 10 lines of code. We show competitive results on similarity search and sparse coding (for classification) tasks.

NLP

“Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections”, Dipanjan Das, Slav Petrov, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL '11), 2011, Best Paper Award.
We would like to have natural language processing systems for all languages, but obtaining labeled data for all languages and tasks is unrealistic and expensive. We present an approach which leverages existing resources in one language (for example English) to induce part-of-speech taggers for languages without any labeled training data. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in a hidden Markov model trained with the Expectation Maximization algorithm.

Networks

“TCP Fast Open”, Sivasankar Radhakrishnan, Yuchung Cheng, Jerry Chu, Arvind Jain, Barath Raghavan, Proceedings of the 7th International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2011.
TCP Fast Open enables data exchange during TCP’s initial handshake. It decreases application network latency by one full round-trip time, a significant speedup for today's short Web transfers. Our experiments on popular websites show that Fast Open reduces the whole-page load time by over 10% on average, and in some cases by up to 40%.

“Proportional Rate Reduction for TCP”, Nandita Dukkipati, Matt Mathis, Yuchung Cheng, Monia Ghobadi, Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement 2011, Berlin, Germany - November 2-4, 2011.
Packet losses increase latency of Web transfers and negatively impact user experience. Proportional rate reduction (PRR) is designed to recover from losses quickly, smoothly and accurately by pacing out retransmissions across received ACKs during TCP’s fast recovery. Experiments on Google Web and YouTube servers in U.S. and India demonstrate that PRR reduces the TCP latency of connections experiencing losses by 3-10% depending on response size.

Security and Privacy

“Automated Analysis of Security-Critical JavaScript APIs”, Ankur Taly, Úlfar Erlingsson, John C. Mitchell, Mark S. Miller, Jasvir Nagra, IEEE Symposium on Security & Privacy (SP), 2011.
As software is increasingly written in high-level, type-safe languages, attackers have fewer means to subvert system fundamentals, and attacks are more likely to exploit errors and vulnerabilities in application-level logic.  This paper describes a generic, practical defense against such attacks, which can protect critical application resources even when those resources are partially exposed to attackers via software interfaces. In the context of carefully-crafted fragments of JavaScript, the paper applies formal methods and semantics to prove that these defenses can provide complete, non-circumventable mediation of resource access; the paper also shows how an implementation of the techniques can establish the properties of widely-used software, and find previously-unknown bugs.

“App Isolation: Get the Security of Multiple Browsers with Just One”, Eric Y. Chen, Jason Bau, Charles Reis, Adam Barth, Collin Jackson, 18th ACM Conference on Computer and Communications Security, 2011.
We find that anecdotal advice to use a separate web browser for sites like your bank is indeed effective at defeating most cross-origin web attacks.  We also prove that a single web browser can provide the same key properties, for sites that fit within the compatibility constraints.

Speech

“Improving the speed of neural networks on CPUs”, Vincent Vanhoucke, Andrew Senior, Mark Z. Mao, Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011.
As deep neural networks become state-of-the-art in real-time machine learning applications such as speech recognition, computational complexity is fast becoming a limiting factor in their adoption. We show how to best leverage modern CPU architectures to significantly speed-up their inference.

“Bayesian Language Model Interpolation for Mobile Speech Input”, Cyril Allauzen, Michael Riley, Interspeech 2011.
Voice recognition on the Android platform must contend with many possible target domains - e.g. search, maps, SMS. For each of these, a domain-specific language model was built by linearly interpolating several n-gram LMs from a common set of Google corpora. The current work has found a way to efficiently compute a single n-gram language model with accuracy very close to the domain-specific LMs but with considerably less complexity at recognition time.

Statistics

“Large-Scale Parallel Statistical Forecasting Computations in R”, Murray Stokely, Farzan Rohani, Eric Tassone, JSM Proceedings, Section on Physical and Engineering Sciences, 2011.
This paper describes the implementation of a framework for utilizing distributed computational infrastructure from within the R interactive statistical computing environment, with applications to timeseries forecasting. This system is widely used by the statistical analyst community at Google for data analysis on very large data sets.

Structured Data

“Dremel: Interactive Analysis of Web-Scale Datasets”, Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Communications of the ACM, vol. 54 (2011), pp. 114-123.
Dremel is a scalable, interactive ad-hoc query system. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. Besides continued growth internally to Google, Dremel now also backs an increasing number of external customers including BigQuery and UIs such as AdExchange front-end.

“Representative Skylines using Threshold-based Preference Distributions”, Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Richard J. Lipton, Jim Xu, International Conference on Data Engineering (ICDE), 2011.
The paper adopts a principled approach towards representative skylines and formalizes the problem of displaying k tuples such that the probability that a random user clicks on one of them is maximized. This requires mathematically modeling (a) the likelihood with which a user is interested in a tuple, as well as (b) how one negotiates the lack of knowledge of an explicit set of users. This work presents theoretical and experimental results showing that the suggested algorithm significantly outperforms previously suggested approaches.

“Hyper-local, directions-based ranking of places”, Petros Venetis, Hector Gonzalez, Alon Y. Halevy, Christian S. Jensen, PVLDB, vol. 4(5) (2011), pp. 290-30.
Click-through information is one of the strongest signals we have for ranking web pages. We propose an equivalent signal for ranking real-world places: the number of times that people ask for precise directions to the address of the place. We show that this signal is competitive in quality with human reviews while being much cheaper to collect. We also show that the signal can be incorporated efficiently into a location search system.

Systems

“Power Management of Online Data-Intensive Services”, David Meisner, Christopher M. Sadler, Luiz André Barroso, Wolf-Dietrich Weber, Thomas F. Wenisch, Proceedings of the 38th ACM International Symposium on Computer Architecture, 2011.
Compute and data intensive Web services (such as Search) are a notoriously hard target for energy savings techniques. This article characterizes the statistical hardware activity behavior of servers running Web search and discusses the potential opportunities of existing and proposed energy savings techniques.

“The Impact of Memory Subsystem Resource Sharing on Datacenter Applications”, Lingjia Tang, Jason Mars, Neil Vachharajani, Robert Hundt, Mary-Lou Soffa, ISCA, 2011.
In this work, the authors expose key characteristics of an emerging class of Google-style workloads and show how to enhance system software to take advantage of these characteristics to improve efficiency in data centers. The authors find that across datacenter applications, there is both a sizable benefit and a potential degradation from improperly sharing micro-architectural resources on a single machine (such as on-chip caches and bandwidth to memory). The impact of co-locating threads from multiple applications with diverse memory behavior changes the optimal mapping of thread to cores for each application. By employing an adaptive thread-to-core mapper, the authors improved the performance of the datacenter applications by up to 22% over status quo thread-to-core mapping, achieving performance within 3% of optimal.

“Language-Independent Sandboxing of Just-In-Time Compilation and Self-Modifying Code”, Jason Ansel, Petr Marchenko, Úlfar Erlingsson, Elijah Taylor, Brad Chen, Derek Schuff, David Sehr, Cliff L. Biffle, Bennet S. Yee, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2011.
Since its introduction in the early 90's, Software Fault Isolation, or SFI, has been a static code technique, commonly perceived as incompatible with dynamic libraries, runtime code generation, and other dynamic code.  This paper describes how to address this limitation and explains how the SFI techniques in Google Native Client were extended to support modern language implementations based on just-in-time code generation and runtime instrumentation. This work is already deployed in Google Chrome, benefitting millions of users, and was developed over a summer collaboration with three Ph.D. interns; it exemplifies how Research at Google is focused on rapidly bringing significant benefits to our users through groundbreaking technology and real-world products.

“Thialfi: A Client Notification Service for Internet-Scale Applications”, Atul Adya, Gregory Cooper, Daniel Myers, Michael Piatek, Proc. 23rd ACM Symposium on Operating Systems Principles (SOSP), 2011, pp. 129-142.
This paper describes a notification service that scales to hundreds of millions of users, provides sub-second latency in the common case, and guarantees delivery even in the presence of a wide variety of failures. The service has been deployed in several popular Google applications including Chrome, Google Plus, and Contacts.

Translation in progress.

Fat Mind | 2012-03-24 11:39

Clojure: Hello World

Fat Mind | Sun, 18 Mar 2012 15:33:00 GMT

I started trying out Clojure today. These are notes on the problems I ran into and what I learned.
1. Getting to know Clojure

http://metaphy.iteye.com/blog/458872

2. Starting with Hello World
   - Set up a development environment (anyone coming from Java is surely used to Eclipse).
     Online installation is slower than a tortoise, so I recommend installing the plugin entirely by hand.
     (Installing Eclipse plugins manually: http://www.tkk7.com/shijian/archive/2012/03/18/372141.html)
     Zip package: http://roysong.iteye.com/blog/1260147
   - Get it running.
   - In a console (the "black window"), follow http://clojure.org/getting_started to warm up.
   - Develop in Eclipse (reminder: clojure-xxx.jar must be added to the classpath).
   - Read http://www.ibm.com/developerworks/cn/opensource/os-eclipse-clojure/, then practice.

3. How to learn
   http://weiyongqing.iteye.com/blog/1441743
    For example: "I should take it one step at a time: first type out everything in core from the Clojure documentation site, then study Sun Ning's RPC framework, and work on Clojure problems in my spare time."

Fat Mind | 2012-03-18 23:33

Eclipse usage notes

Fat Mind | Sun, 18 Mar 2012 13:10:00 GMT

I. Keyboard shortcuts

1. Common shortcuts

   a. Ctrl + H: search
   b. Ctrl + Shift + R: quickly open a resource file
   c. Ctrl + Shift + T: quickly open a type (class)
   d. Alt + Shift + O: toggle "mark occurrences" (selecting a word shades every identical word)

2. How to define your own shortcuts


II. Plugins

Required reading:
   http://wiki.eclipse.org/FAQ_How_do_I_install_new_plug-ins%3F (why the Eclipse update manager is recommended)
   http://www.venukb.com/2006/08/20/install-eclipse-plugins-the-easy-way/ (mainly explains the "manual install" approach)

1. Ways to install plugins
   1.1 Online installation
   The official wiki explains this clearly. Advantages: 1. Dependencies between plugins and version compatibility are managed for you. 2. As with installing software on Windows, a plugin you no longer need is easy to uninstall through the update manager, and things stay manageable as you install more plugins.
   The Eclipse wiki's view of manual install: "This obviously is a more dangerous approach, as no certification takes place about the suitability of the plug-in; it may rely on other plug-ins not available in your installation. In the case of compatibility conflicts, you won’t find out until you use the plug-in that it might break."
   Unfortunately, the network is often far from ideal, and the installation still fails after many attempts. That is what pushes people toward manual install.
   1.2 Manual installation
   a. First approach: download the plugin, unpack it, and copy its features and plugins directories into the matching directories under %eclipse_home%.
   As in this image: http://static.flickr.com/75/219742315_9ee663e2c8_o.png
   Advantage: dead simple. Disadvantage: exactly what the update manager does well. Dependencies between plugins, version compatibility, and later maintenance all have to be handled by hand.
   b. Second approach: use a .link file, which solves the "later maintenance" problem.
   b-1. Create a links directory under the Eclipse directory.
   b-2. Create the corresponding .link file, e.g. subversive.link.
   b-3. Create subversive/eclipse/ and copy features and plugins into it.
   b-4. Edit subversive.link, e.g. path=E:/dev/eclipse-t/thrid-plugins/subversive
   b-5. Restart Eclipse. (After restarting I found that using svn also requires installing a Subversive connector, which confirms the drawback of manual installation.)
   c. Tips:
   - When installing a plugin manually, read its prerequisites carefully; otherwise problems are very hard to track down. For example, m2eclipse requires Subclipse and Mylyn, and Counterclockwise states: “Pre-requisite: an Eclipse version including Java Support (e.g. with the JDT : Java Development Tools, as in Eclipse For Java Developers, Eclipse For RCP/RAP developers, Eclipse for JavaEE developers, etc.)” http://code.google.com/p/counterclockwise/wiki/Documentation#Install_Counterclockwise_plugin
   - When installing a plugin manually, the path in the .link file must be an absolute path.
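Steps b-1, b-2 and b-4 above can be scripted. The following is a minimal Java sketch and not part of the original notes: the class name LinkFileWriter and its method are hypothetical, and it assumes the plugin has already been unpacked so that the target directory contains eclipse/features and eclipse/plugins (step b-3). It writes links/<name>.link with an absolute path in the forward-slash form used in the example above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: writes an Eclipse .link file for a manually unpacked plugin. */
public class LinkFileWriter {

    /**
     * Creates <eclipseHome>/links/<name>.link containing "path=<absolute plugin dir>".
     * pluginDir is expected to contain an "eclipse" folder with features/ and plugins/.
     */
    public static Path writeLinkFile(Path eclipseHome, String name, Path pluginDir) throws IOException {
        Path linksDir = eclipseHome.resolve("links");
        Files.createDirectories(linksDir); // step b-1

        // The path must be absolute; backslashes are converted to forward slashes.
        String target = pluginDir.toAbsolutePath().normalize().toString().replace('\\', '/');

        Path linkFile = linksDir.resolve(name + ".link"); // step b-2
        Files.write(linkFile, ("path=" + target + "\n").getBytes(StandardCharsets.UTF_8)); // step b-4
        return linkFile;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: LinkFileWriter <eclipseHome> <name> <pluginDir>");
            return;
        }
        Path written = writeLinkFile(Paths.get(args[0]), args[1], Paths.get(args[2]));
        System.out.println("wrote " + written);
    }
}
```

Eclipse still has to be restarted afterwards (step b-5), and the sketch does nothing about plugin dependencies, which remain a manual task.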

Summary: for Eclipse plugins, prefer the update manager. Fall back to manual installation only when the network does not allow online installation or the installation fails.
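2. Plugin resources

2.1 Installing the m2eclipse plugin
   1) Prerequisites
   a. Eclipse 3.2 or later (can be ignored; the Eclipse in common use is already 3.5 or later).
   b. A JDK newer than 1.4; Eclipse must run on a JDK, not a JRE.
   c. These plugins must be installed first: Subclipse (svn) and Mylyn (task management). Mylyn ships by default with Eclipse 3.5 and later, so it needs no installation.
   Online install address for the svn plugin (because the network is unreliable, downloading the zip and installing it as a local archive is the better choice):
        http://subclipse.tigris.org/servlets/ProjectProcess;jsessionid=290480ED68C2C7E781DCCE66CE657FC2?pageID=p4wYuA
   2) Installing m2eclipse: I could not find a zip to download locally, so it can only be installed online, from http://www.eclipse.org/m2e/download/

Fat Mind | 2012-03-18 21:10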

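Generics [Core Java reading notes]

Fat Mind | Thu, 08 Mar 2012 13:05:00 GMT

Preface: while writing unit tests I ran into a problem mocking a generic return type, so I went back and reread the generics chapter of Core Java.

XMind format (downloadable): notes recorded in XMind format while organizing the material (/Files/shijian/generic.rar)

The problem from the unit test, simplified, is as follows: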

    // Methods of TestMain. Requires java.util.ArrayList, java.util.Date, java.util.List,
    // java.sql.Timestamp, and static imports of Mockito's mock() and when().
    public List<? extends Date> getDateT() {
        return null;
    }
    public List<Date> getDate() {
        return null;
    }
    public void mockGetDate() {
        TestMain main = mock(TestMain.class);
        when(main.getDate()).thenReturn(new ArrayList<Date>()); // compiles
        /*
         * The method thenReturn(List<capture#1-of ? extends Date>) in the type
         * OngoingStubbing<List<capture#1-of ? extends Date>> is not applicable
         * for the arguments (ArrayList<Date>)
         */
        when(main.getDateT()).thenReturn(new ArrayList<Date>()); // compile error
        when(main.getDateT()).thenReturn(new ArrayList<Timestamp>()); // compile error
        when(main.getDateT()).thenReturn(new ArrayList<Object>()); // compile error
        when(main.getDateT()).thenReturn(new ArrayList()); // compiles (raw type, unchecked warning)
    }

I still do not understand this. Could someone more knowledgeable explain it to me?
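One likely explanation, offered as a sketch rather than a definitive answer: the compiler infers the type parameter of when(...) as the capture type List<capture#1-of ? extends Date>, and no concrete ArrayList<...> is assignable to a captured type. Only the raw ArrayList gets through, by unchecked conversion. The demo below reproduces the situation without Mockito. Stubbing and when are hypothetical stand-ins that mimic only the shape of Mockito's API. Supplying the type argument explicitly keeps the wildcard uncaptured, so thenReturn accepts any List<? extends Date>.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class WildcardCaptureDemo {

    /** Hypothetical stand-in for Mockito's OngoingStubbing<T>. */
    static final class Stubbing<T> {
        private T value;

        Stubbing<T> thenReturn(T value) {
            this.value = value;
            return this;
        }

        T get() {
            return value;
        }
    }

    /** Hypothetical stand-in for Mockito.when(T): T is inferred from the argument. */
    static <T> Stubbing<T> when(T methodCall) {
        return new Stubbing<T>();
    }

    static List<? extends Date> getDateT() {
        return null;
    }

    /** Stubs getDateT() with a List<Timestamp> and returns what was stubbed. */
    public static List<? extends Date> stubWithExplicitType() {
        // Inferred form: T = List<capture#1-of ? extends Date>, so this would not compile:
        // when(getDateT()).thenReturn(new ArrayList<Timestamp>());

        // Explicit type argument: T = List<? extends Date>, so any list of Date subtypes fits.
        List<Timestamp> stamps = new ArrayList<Timestamp>();
        stamps.add(new Timestamp(0L));
        return WildcardCaptureDemo.<List<? extends Date>>when(getDateT())
                .thenReturn(stamps)
                .get();
    }

    public static void main(String[] args) {
        System.out.println("stubbed list size: " + stubWithExplicitType().size());
    }
}
```

With Mockito itself, the corresponding forms would be Mockito.<List<? extends Date>>when(main.getDateT()).thenReturn(...), or doReturn(...).when(main).getDateT(), which does not type-check the return value at compile time.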

Fat Mind | 2012-03-08 21:05

Log of everyday problems

Fat Mind | Thu, 15 Dec 2011 07:55:00 GMT

1. Application jar conflicts
A log4j conflict made the application fail with a type-conversion error.
Requirement: find out which jar a given class is actually loaded from. The -verbose:class option (or -XX:+TraceClassLoading) records in detail which classes were loaded and from which jar.
See: http://agapple.iteye.com/blog/946603
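Besides the JVM flags, the same question can be answered from inside the application. This is a minimal sketch using the standard ProtectionDomain and CodeSource API. The class name WhichJar is hypothetical, and the log4j class name in the comment is only an example of what might be passed in.

```java
import java.security.CodeSource;

public class WhichJar {

    /** Returns the jar or directory a class was loaded from, or a marker for bootstrap classes. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Core JDK classes loaded by the bootstrap loader have no code source.
        return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names, for example org.apache.log4j.Logger.
        for (String name : args) {
            System.out.println(name + " <- " + locationOf(Class.forName(name)));
        }
    }
}
```

Run it with the same classpath as the failing application. If two jars contain the class, the printed location shows which one won.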

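2. Performance testing
Is there a Linux command or tool that can collect CPU, load, context switch, memory, network I/O, and disk I/O data all at once?
vmstat (fields explained in the link below) -> graphical report. The painful part is having to watch and record the numbers by hand, which is a disgrace for a programmer.
(The I/O that vmstat reports covers block devices such as disks. A network card has no corresponding device file (http://oss.org.cn/kernel-book/ch11/11.2.3.htm), so use iftop for network I/O statistics.)
vmstat http://linux.about.com/library/cmd/blcmdl8_vmstat.htm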
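To avoid recording vmstat numbers by hand, its output can be parsed into named fields and then written to a file or charted. A minimal Java sketch under stated assumptions: the class VmstatParser is hypothetical, the input is the column-name header line plus one sample line as printed by vmstat, and the numbers in main are illustrative input, not a real measurement. Obtaining the lines, for example by reading the output of a running vmstat process, is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VmstatParser {

    /**
     * Pairs the column names from vmstat's header line with the numbers from one sample line.
     * Column names come from the header itself, so no fixed layout is assumed.
     */
    public static Map<String, Long> parse(String headerLine, String sampleLine) {
        String[] names = headerLine.trim().split("\\s+");
        String[] values = sampleLine.trim().split("\\s+");
        if (names.length != values.length) {
            throw new IllegalArgumentException("header and sample have different column counts");
        }
        Map<String, Long> row = new LinkedHashMap<String, Long>();
        for (int i = 0; i < names.length; i++) {
            row.put(names[i], Long.parseLong(values[i]));
        }
        return row;
    }

    public static void main(String[] args) {
        // Illustrative input only, not a real measurement.
        String header = " r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st";
        String sample = " 1  0      0 812345  10240 204800    0    0     5     7  120  340  3  1 95  1  0";
        Map<String, Long> row = parse(header, sample);
        System.out.println("context switches = " + row.get("cs") + ", idle % = " + row.get("id"));
    }
}
```

Network I/O is not covered here, for the reason given above: vmstat only reports block-device I/O.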

3. JBoss startup error

java.sql.SQLException: Table already exists: JMS_MESSAGES in statement [CREATE CACHED TABLE JMS_MESSAGES]
See: http://dinghaoliang.blog.163.com/blog/static/126540714201082764733272/
The file %jboss_home%/server/default/deploy/hsqldb-ds.xml contains a DefaultDS data source configuration. Temporary workaround: delete hsqldb-ds.xml. The root cause is unknown.
4. logback 0.9.19: <encoder> introduced, <layout> dropped

  <encoder>
            <pattern>%m%n</pattern>
            <charset class="java.nio.charset.Charset">UTF-8</charset>
        </encoder>

Source: OutputStreamAppender.java

protected void writeOut(E event) throws IOException {
    this.encoder.doEncode(event);
}

On specifying the charset of the log file: stepping through with a debugger showed that it only takes effect when configured this way. Otherwise the system default encoding is used.
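5. Setting the Linux system encoding

http://linux.vbird.org/linux_basic/0320bash.php#variable_locale
Setting the "system encoding" really means setting the corresponding environment variables, so the encoding can be set in any file where such variables can be set; export makes it take effect.
locale shows the encoding in use for the current user (?); locale -a lists every encoding the machine supports.
Default settings:
 a. System level: /etc/profile -> /etc/sysconfig/i18n; set LANG there (it seems to take effect without an explicit export). (Aside: i18n also has a LANGUAGE setting whose meaning I do not know; deleting it had no effect.)
 b. User level: ~/.bashrc, ~/.bash_profile, ~/.bash_login, ~/.profile; read priority: left to right. An explicit export is required for it to take effect.

http://linux.vbird.org/linux_basic/0320bash.php#settings_bashrc

"When LANG or LC_ALL is set, the other locale variables are overridden by these two." In one sentence: setting LANG for the current user is the best option.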
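The quoted rule is a simplification. Under POSIX, LC_ALL overrides every category, a category variable such as LC_CTYPE comes next, and LANG is only the fallback. The sketch below encodes that lookup order. The class LocaleResolver is hypothetical, and the final "C" default is an assumption, since POSIX leaves the default to the implementation.

```java
import java.util.Map;

public class LocaleResolver {

    /**
     * Resolves the effective value for one locale category (for example "LC_CTYPE")
     * using the POSIX order: LC_ALL, then the category variable, then LANG.
     */
    public static String effective(Map<String, String> env, String category) {
        for (String name : new String[] {"LC_ALL", category, "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "C"; // assumed default; POSIX leaves this implementation-defined
    }

    public static void main(String[] args) {
        // Reads the real environment of the current process.
        System.out.println("LC_CTYPE resolves to " + effective(System.getenv(), "LC_CTYPE"));
    }
}
```

This is why setting only LANG for the current user works well: it supplies every category that is not set explicitly, and nothing has to be unset later.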

Fat Mind | 2011-12-15 15:55