Database Transactions and Isolation Levels Explained

A transaction is a unit of execution in a database management system. It can be a single database operation (such as a SELECT) or a sequence of operations. Transactions have the ACID properties: Atomicity, Consistency, Isolation, and Durability.

Atomicity: guarantees that all operations in a transaction execute, or none of them do. Take a transfer transaction: the transfer either succeeds or fails as a whole. On success, the amount moves from the source account into the destination account and both balances change accordingly; on failure, neither account's balance changes. It can never happen that the source account is debited while the destination account receives nothing.

Consistency: guarantees that the database always keeps its data consistent: it is consistent before the transaction runs and consistent after it, whether or not the transaction succeeds. In the transfer example above, the database's data is consistent both before and after the transfer.

Isolation: when several transactions execute concurrently, the result should be the same as if they had executed serially. Obviously the simplest isolation is to run every transaction serially, first come first served, letting one finish before the next may start. But that makes the database inefficient: two transactions that merely read the same data, for instance, could perfectly well run concurrently. Different isolation levels exist to control the effects of concurrent execution; they are introduced in detail below.

Durability: once a transaction completes, its effect on the database is permanent; even if the database is damaged by a failure, it should be able to recover. The usual implementation uses a log.

 

Transaction isolation levels: an isolation level is a grade of concurrency control over transactions. ANSI/ISO SQL divides them into four levels: SERIALIZABLE, REPEATABLE READ, READ COMMITTED, and READ UNCOMMITTED. To implement isolation, databases usually use locks; when programming you generally only set the isolation level, and which locks to take is decided by the database. The four levels are introduced first, followed by examples of the concurrency problems that appear under the latter three (repeatable read, read committed, read uncommitted).

SERIALIZABLE: all transactions execute one after another, serially, which avoids phantom reads. For a database that implements concurrency control with locks, serializability requires acquiring a range lock when executing a range query (such as selecting the users whose age is between 10 and 30). A database whose concurrency control is not lock-based must instead roll back a transaction when it detects that the transaction violates serial execution.

REPEATABLE READ: all data retrieved by a SELECT cannot be modified, which prevents a transaction from reading inconsistent data on successive reads. There is no way to prevent phantom reads, however: other transactions cannot modify the selected rows, but they can still add rows, because the first transaction holds no range lock.

READ COMMITTED: data that has been read may still be modified by other transactions, which can lead to non-repeatable reads. That is, a transaction acquires a read lock while reading but releases it immediately afterwards (it does not wait for the transaction to end), while write locks are released only after the transaction commits. Once the read lock is released, the data may be modified by other transactions. This is also SQL Server's default isolation level.

READ UNCOMMITTED: the lowest isolation level, which allows other transactions to see uncommitted data. This level leads to dirty reads (Dirty Read).
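In application code, the isolation level is usually just a connection setting, and the database maps it to whatever locking scheme it uses. A minimal JDBC sketch; the connection URL, credentials, and users table are hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IsolationDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; any JDBC driver works the same way.
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
        con.setAutoCommit(false);
        // Ask for one of the four ANSI levels; which locks are taken is up to the database.
        con.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);

        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery(
                "SELECT * FROM users WHERE age BETWEEN 10 AND 30");
        while (rs.next()) {
            System.out.println(rs.getInt("id") + " " + rs.getString("name"));
        }
        con.commit();
        con.close();
    }
}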

 

Example: the following examines the concurrency problems corresponding to the latter three isolation levels. Suppose there are two transactions. Transaction 1 executes Query 1; then Transaction 2 executes Query 2 and commits; then Query 1 in Transaction 1 executes one more time. The queries run against the following table:

users

id    name    age
1     Joe     20
2     Jill    25

Repeatable Read (phantom reads)

A transaction executes the same query twice in succession, yet the returned result sets differ. This happens because no range lock (Range Lock) was acquired when the SELECT was executed, so other transactions could still insert new rows.

Transaction 1                                    Transaction 2

/* Query 1 */
SELECT * FROM users
WHERE age BETWEEN 10 AND 30;

                                                 /* Query 2 */
                                                 INSERT INTO users VALUES ( 3, 'Bob', 27 );
                                                 COMMIT;

/* Query 1 */
SELECT * FROM users
WHERE age BETWEEN 10 AND 30;
Note that Transaction 1 executes the same statement (Query 1) twice. Under a higher isolation level (namely serializable), both executions would return the same result set; under the repeatable-read level, however, the two result sets differ. So why is this level called "repeatable read"? Because it does solve the non-repeatable read problem described next.

Read Committed (non-repeatable reads)

In database systems that implement concurrency control with locks, non-repeatable reads occur because no read lock is kept for the duration of the transaction when the SELECT is executed.

Transaction 1                                    Transaction 2

/* Query 1 */
SELECT * FROM users WHERE id = 1;

                                                 /* Query 2 */
                                                 UPDATE users SET age = 21 WHERE id = 1;
                                                 COMMIT;

/* Query 1 */
SELECT * FROM users WHERE id = 1;

In this example, Transaction 2 commits successfully, so Transaction 1's second read obtains a different age. Under the SERIALIZABLE and REPEATABLE READ isolation levels, the database should return the same value both times; under READ COMMITTED and READ UNCOMMITTED it returns the updated value. This is a non-repeatable read.
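The interleaving above can be reproduced from code with two connections. A JDBC sketch under the same hypothetical connection details as before; Transaction 1 runs at READ COMMITTED, so its second read sees Transaction 2's committed update:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class NonRepeatableReadDemo {
    static int readAge(Connection c) throws SQLException {
        Statement st = c.createStatement();
        ResultSet rs = st.executeQuery("SELECT age FROM users WHERE id = 1");
        rs.next();
        int age = rs.getInt(1);
        rs.close();
        return age;
    }

    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/test"; // hypothetical
        Connection t1 = DriverManager.getConnection(url, "user", "password");
        Connection t2 = DriverManager.getConnection(url, "user", "password");
        t1.setAutoCommit(false);
        t2.setAutoCommit(false);
        t1.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);

        int first = readAge(t1);                     // Query 1, first execution

        Statement st2 = t2.createStatement();        // Transaction 2 updates and commits
        st2.executeUpdate("UPDATE users SET age = 21 WHERE id = 1");
        t2.commit();

        int second = readAge(t1);                    // Query 1 again, same transaction
        System.out.println(first + " -> " + second); // values differ under READ COMMITTED
        t1.commit();
        t1.close();
        t2.close();
    }
}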

Read Uncommitted (dirty reads)

If one transaction reads a value that another transaction has modified, and the modifying transaction is later rolled back, then the reading transaction has obtained dirty data; this is the so-called dirty read. It happens whenever transactions are allowed to read uncommitted updates.

Transaction 1                                    Transaction 2

/* Query 1 */
SELECT * FROM users WHERE id = 1;

                                                 /* Query 2 */
                                                 UPDATE users SET age = 21 WHERE id = 1;

/* Query 1 */
SELECT * FROM users WHERE id = 1;

                                                 ROLLBACK;

Summing up, we arrive at the following table:

Isolation level     Dirty reads   Non-repeatable reads   Phantom reads
READ UNCOMMITTED    YES           YES                    YES
READ COMMITTED      NO            YES                    YES
REPEATABLE READ     NO            NO                     YES
SERIALIZABLE        NO            NO                     NO



Chan Chen, 2012-12-21 13:06


BLOBs and CLOBs

solidDB® can store binary and character data up to 2147483647 (2G - 1) bytes long. When such data exceeds a certain length, the data is called a BLOB (Binary Large OBject) or CLOB (Character Large OBject), depending upon the data type that stores the information. CLOBs contain only "plain text" and can be stored in any of the following data types:

CHAR, WCHAR

VARCHAR, WVARCHAR

LONG VARCHAR (mapped to standard type CLOB),

LONG WVARCHAR (mapped to standard type NCLOB)

BLOBs can store any type of data that can be represented as a sequence of bytes, such as a digitized picture, video, audio, or a formatted text document. (They can also store plain text, but you'll have more flexibility if you store plain text in CLOBs.) BLOBs are stored in any of the following data types:

BINARY

VARBINARY

LONG VARBINARY (mapped to standard type BLOB)

Since character data is a sequence of bytes, character data can be stored in BINARY fields, as well as in CHAR fields. CLOBs can be considered a subset of BLOBs.

For convenience, we will use the term BLOBs to refer to both CLOBs and BLOBs.

For most non-BLOB data types, such as integer, float, date, etc., there is a rich set of valid operations that you can do on that data type. For example, you can add, subtract, multiply, divide, and do other operations with FLOAT values. Because a BLOB is a sequence of bytes and the database server does not know the "meaning" of that sequence of bytes (i.e. it doesn't know whether the bytes represent a movie, a song, or the design of the space shuttle), the operations that you can do on BLOBs are very limited.
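Those operations boil down to storing the bytes and getting them back. For illustration, a minimal JDBC sketch of writing and reading BLOB data as streams; the pictures table, column names, and connection URL are hypothetical, and this is generic JDBC rather than any solidDB-specific API:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class BlobDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details.
        Connection con = DriverManager.getConnection(
                "jdbc:solid://localhost:1313", "dba", "dba");

        // Write: stream a file into a LONG VARBINARY (BLOB) column.
        File photo = new File("photo.jpg");
        PreparedStatement ins = con.prepareStatement(
                "INSERT INTO pictures (id, img) VALUES (?, ?)");
        ins.setInt(1, 1);
        InputStream in = new FileInputStream(photo);
        ins.setBinaryStream(2, in, (int) photo.length());
        ins.executeUpdate();
        in.close();

        // Read: stream the BLOB back out rather than materializing it in memory.
        PreparedStatement sel = con.prepareStatement(
                "SELECT img FROM pictures WHERE id = ?");
        sel.setInt(1, 1);
        ResultSet rs = sel.executeQuery();
        if (rs.next()) {
            InputStream img = rs.getBinaryStream(1);
            byte[] buf = new byte[8192];
            int n, total = 0;
            while ((n = img.read(buf)) != -1) {
                total += n; // process each chunk here
            }
            System.out.println("read " + total + " bytes");
            img.close();
        }
        con.close();
    }
}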

solidDB does allow you to perform some string operations on CLOBs. For example, you can search for a particular substring (e.g. a person's name) inside a CLOB by using the LOCATE() function. Because such operations require a lot of the server's resources (memory and/or CPU time), solidDB allows you to limit the number of bytes of the CLOB that are processed. For example, you might specify that only the first 1 megabyte of each CLOB be searched when doing a string search. For more information, see the description of the MaxBlobExpressionSize configuration parameter in solidDB Administration Guide.

Although it is theoretically possible to store the entire blob "inside" a typical table, if the blob is large, then the server usually performs better if most or all of the blob is not stored in the table. In solidDB, if a blob is no more than N bytes long, then the blob is stored in the table. If the blob is longer than N bytes, then the first N bytes are stored in the table, and the rest of the blob is stored outside the table as disk blocks in the physical database file. The exact value of "N" depends in part upon the structure of the table, the disk page size that you specified when you created the database, etc., but is always at least 256. (Data 256 bytes or shorter is always stored in the table.)

If a data row size is larger than one third of the disk block size of the database file, you must store it partly as a BLOB.

The SYS_BLOBS system table is used as a directory for all BLOB data in the physical database file. One SYS_BLOB entry can accommodate 50 BLOB parts. If the BLOB size exceeds 50 parts, several SYS_BLOB entries per BLOB are needed.

The query below returns an estimate on the total size of BLOBs in the database.

select sum(totalsize) from sys_blobs

The estimate is not accurate, because the info is only maintained at checkpoints. After two empty checkpoints, this query should return an accurate response.



Chan Chen, 2012-11-30 13:44


Why LDAP Uses BDB as Its Backend Database

1. Many world-famous companies have adopted BDB as the backend database for a variety of mission-critical applications.
Sleepycat Software makes Berkeley DB, the most widely used application-specific data management software in the world with more than 200 million deployments. Customers such as Amazon, AOL, British Telecom, Cisco Systems, EMC, Ericsson, Google, Hitachi, HP, Motorola, RSA Security, Sun Microsystems, TIBCO and Veritas also rely on Berkeley DB for fast, scalable, reliable and cost-effective data management for their mission-critical applications. 

2. The following is an explanation given by an expert on chinaunix.net, quoted here:
MySQL uses BDB to implement one of its backends. MySQL is fast, and BDB is N times faster still than MySQL.
BDB's concurrency is higher than an RDBMS's.
Its capacity scales up to 256TB.
Its hash-based selects fetch data faster than an RDBMS.

3. Comparing the BDB database with several other kinds of databases.
The BDB database is unlike the other kinds of databases, namely relational databases, object-oriented databases, and network databases: it is an embedded database.

First, a brief summary of how BDB differs from the other kinds:
(1) Almost without exception, they all adopted the Structured Query Language (SQL); BDB did not.
(2) Almost without exception, they all adopted the client/server model; BDB adopts an embedded model.

Below is some material about BDB found on the web, which explains why BDB differs from most of today's mainstream databases (the source of the quoted material was not noted):
(1) Berkeley DB is an open source embedded database library that provides scalable, high-performance, transaction-protected data management services to applications. Berkeley DB provides a simple function-call API for data access and management.
(2) Berkeley DB is embedded because it links directly into the application. It runs in the same address space as the application. As a result, no inter-process communication, either over the network or between processes on the same machine, is required for database operations. Berkeley DB provides a simple function-call API for a number of programming languages, including C, C++, Java, Perl, Tcl, Python, and PHP. All database operations happen inside the library. Multiple processes, or multiple threads in a single process, can all use the database at the same time as each uses the Berkeley DB library. Low-level services like locking, transaction logging, shared buffer management, memory management, and so on are all handled transparently by the library.
(3) The library is extremely portable. It runs under almost all UNIX and Linux variants, Windows, and a number of embedded real-time operating systems. It runs on both 32-bit and 64-bit systems. It has been deployed on high-end Internet servers, desktop machines, and on palmtop computers, set-top boxes, in network switches, and elsewhere. Once Berkeley DB is linked into the application, the end user generally does not know that there's a database present at all.
(4) Berkeley DB is scalable in a number of respects. The database library itself is quite compact (under 300 kilobytes of text space on common architectures), but it can manage databases up to 256 terabytes in size. It also supports high concurrency, with thousands of users operating on the same database at the same time. Berkeley DB is small enough to run in tightly constrained embedded systems, but can take advantage of gigabytes of memory and terabytes of disk on high-end server machines.
(5) Berkeley DB generally outperforms relational and object-oriented database systems in embedded applications for a couple of reasons. First, because the library runs in the same address space, no inter-process communication is required for database operations. The cost of communicating between processes on a single machine, or among machines on a network, is much higher than the cost of making a function call. Second, because Berkeley DB uses a simple function-call interface for all operations, there is no query language to parse, and no execution plan to produce.
(6) In contrast to most other database systems, Berkeley DB provides relatively simple data access services. Berkeley DB supports only a few logical operations on records (a code sketch of this narrow API follows the list). They are:

Insert a record in a table. 
Delete a record from a table. 
Find a record in a table by looking up its key. 
Update a record that has already been found. 
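For a concrete feel of this narrow API, here is a minimal sketch using the Berkeley DB Java Edition bindings; the environment directory, database name, and key/value contents are made up, and the quoted text above describes the C library, whose function-call API follows the same insert/find/update/delete model:

import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;
import java.io.File;

public class BdbDemo {
    public static void main(String[] args) throws Exception {
        EnvironmentConfig envCfg = new EnvironmentConfig();
        envCfg.setAllowCreate(true);
        // The environment directory must already exist.
        Environment env = new Environment(new File("/tmp/bdb-env"), envCfg);

        DatabaseConfig dbCfg = new DatabaseConfig();
        dbCfg.setAllowCreate(true);
        Database db = env.openDatabase(null, "users", dbCfg);

        // Insert a record: key and value are plain byte arrays.
        DatabaseEntry key = new DatabaseEntry("user:1".getBytes("UTF-8"));
        DatabaseEntry val = new DatabaseEntry("Joe,20".getBytes("UTF-8"));
        db.put(null, key, val);   // put() is also how you update a record you found

        // Find a record by looking up its key.
        DatabaseEntry found = new DatabaseEntry();
        if (db.get(null, key, found, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
            System.out.println(new String(found.getData(), "UTF-8"));
        }

        // Delete a record.
        db.delete(null, key);

        db.close();
        env.close();
    }
}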
(7) Berkeley DB is not a standalone database server. It is a library, and runs in the address space of the application that uses it. It is possible to build a server application that uses Berkeley DB for data management. For example, many commercial and open source Lightweight Directory Access Protocol (LDAP) servers use Berkeley DB for record storage. LDAP clients connect to these servers over the network. Individual servers make calls through the Berkeley DB API to find records and return them to clients. On its own, however, Berkeley DB is not a server.
所以,BDB是一U完全不同于其它数据库管理系l的数据库,而且它也不是一个数据库服务器端?/span>

4. Strengths and weaknesses of BDB.
Berkeley DB is an ideal database system for applications that need fast, scalable, and reliable embedded database management. For applications that need different services, however, it can be a poor choice.
Berkeley DB was conceived and built to provide fast, reliable, transaction-protected record storage. The library itself was never intended to provide interactive query support, graphical reporting tools, or similar services that some other database systems provide. We have tried always to err on the side of minimalism and simplicity. By keeping the library small and simple, we create fewer opportunities for bugs to creep in, and we guarantee that the database system stays fast, because there is very little code to execute. If your application needs that set of features, then Berkeley DB is almost certainly the best choice for you.

5. My personal view
The key reason BDB suits LDAP is that it can guarantee LDAP's fast response: BDB itself is an embedded database, and speed is its greatest strength as well as its biggest advantage over other database systems. Now look at LDAP: it is the kind of database that, once its data is built, rarely needs to change, and its most common operations are reads, queries, searches, and other operations that do not alter the database's contents. Letting BDB handle exactly these kinds of operations is without doubt the best choice. That way, even with large numbers of users submitting database queries, LDAP can still quickly return useful information. So speed is the biggest factor in LDAP's choice of BDB, and it is also the fundamental reason the vast majority of today's LDAP servers choose BDB.

Chan Chen, 2012-05-03 11:38


Make Auto Incrementing Field in MongoDB

Side counter method
One can keep a counter of the current _id in a side document, in a collection dedicated to counters.
Then use FindAndModify to atomically obtain an id and increment the counter.

> db.counters.insert({_id: "userId", c: 0});

> var o = db.counters.findAndModify(
...        {query: {_id: "userId"}, update: {$inc: {c: 1}}});
{ "_id" : "userId", "c" : 0 }
> db.mycollection.insert({_id:o.c, stuff:"abc"});

> o = db.counters.findAndModify(
...        {query: {_id: "userId"}, update: {$inc: {c: 1}}});
{ "_id" : "userId", "c" : 1 }
> db.mycollection.insert({_id:o.c, stuff:"another one"});
Once you obtain the next id in the client, you can use it and be sure no other client has it.
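The same side-counter pattern works from application code. A sketch with the 2.x-era MongoDB Java driver (database and collection names follow the shell example above; this findAndModify overload returns the counter document as it was before the $inc was applied):

import com.mongodb.*;

public class CounterDemo {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("test");
        DBCollection counters = db.getCollection("counters");
        DBCollection items = db.getCollection("mycollection");

        // One-time setup of the counter document.
        counters.insert(new BasicDBObject("_id", "userId").append("c", 0));

        // Atomic read-and-increment: returns the document as it was
        // *before* the $inc was applied.
        DBObject query = new BasicDBObject("_id", "userId");
        DBObject update = new BasicDBObject("$inc", new BasicDBObject("c", 1));
        DBObject counter = counters.findAndModify(query, update);
        long id = ((Number) counter.get("c")).longValue();

        items.insert(new BasicDBObject("_id", id).append("stuff", "abc"));
        mongo.close();
    }
}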

Optimistic loop method
One can do it with an optimistic concurrency "insert if not present" loop. The following example, in Mongo shell Javascript syntax, demonstrates.

// insert incrementing _id values into a collection
function insertObject(o) {
    x = db.myCollection;
    while( 1 ) {
        // determine next _id value to try
        var c = x.find({},{_id:1}).sort({_id:-1}).limit(1);
        var i = c.hasNext() ? c.next()._id + 1 : 1;
        o._id = i;
        x.insert(o);
        var err = db.getLastErrorObj();
        if( err && err.code ) {
            if( err.code == 11000 /* dup key */ )
                continue;
            else
                print("unexpected error inserting data: " + tojson(err));
        }
        break;
    }
}
The above should work well unless there is an extremely high concurrent insert rate on the collection. In that case, there would be a lot of looping potentially.


Chan Chen, 2012-04-14 01:11


SP (MySQL Stored Procedures)

-- --------------------------------------------------------------------------------
-- Routine DDL
-- Note: comments before and after the routine body will not be stored by the server
-- --------------------------------------------------------------------------------
DELIMITER $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `chan_insert_date_by_starttime`()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE var_id decimal(18,0);
  DECLARE var_issue decimal(18,0);
  DECLARE var_date datetime;
  DECLARE cur1 CURSOR FOR
    SELECT issue, datevalue
    FROM jira.customfieldvalue
    where customfield in (10006,10007)
    and issue in (
        SELECT distinct issue
        FROM jira.customfieldvalue
        where customfield in (10007)
    )
    and issue not in (
        SELECT distinct issue
        FROM jira.customfieldvalue
        where customfield in (10006)
    );
  -- Without this handler, FETCH past the last row never ends the loop.
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  OPEN cur1;
  read_loop: LOOP
    FETCH cur1 INTO var_issue, var_date;
    IF done THEN
      LEAVE read_loop;
    END IF;
    -- max(id) + 1 is not safe under concurrent writers; acceptable for a one-off migration.
    select (max(id) + 1) into var_id from jira.customfieldvalue;
    INSERT INTO jira.customfieldvalue(id, issue, customfield, datevalue)
    VALUES (var_id, var_issue, 10006, var_date);
  END LOOP;
  CLOSE cur1;
END$$
DELIMITER ;



-- --------------------------------------------------------------------------------
-- Routine DDL
-- Note: comments before and after the routine body will not be stored by the server
-- --------------------------------------------------------------------------------
DELIMITER $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `chan_insert_date_by_created`()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE var_id decimal(18,0);
  DECLARE var_issue decimal(18,0);
  DECLARE var_date datetime;
  DECLARE cur1 CURSOR FOR
    SELECT id, created FROM jira.jiraissue where id in
    (   SELECT issue
        FROM jira.customfieldvalue
        where customfield in (10000)
        and issue not in (
            SELECT distinct issue
            FROM jira.customfieldvalue
            where customfield in (10007)
        )
        and issue not in (
            SELECT distinct issue
            FROM jira.customfieldvalue
            where customfield in (10006)
        )
    );
  -- Same termination handler as in the procedure above.
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  OPEN cur1;
  read_loop: LOOP
    FETCH cur1 INTO var_issue, var_date;
    IF done THEN
      LEAVE read_loop;
    END IF;
    select (max(id) + 1) into var_id from jira.customfieldvalue;
    INSERT INTO jira.customfieldvalue(id, issue, customfield, datevalue)
    VALUES (var_id, var_issue, 10006, var_date);
  END LOOP;
  CLOSE cur1;
END$$
DELIMITER ;


Chan Chen, 2012-04-10 18:14


MySQL Auto Backup
Below is the script example to backup mysql database in command line:-
$ mysqldump -h localhost -u username -p database_name > backup_db.sql
If your mysql database is very big, you might want to compress your sql file.
Just use the mysql backup command below and pipe the output to gzip;
you will then get the output as a gzip file.
$ mysqldump -u username -h localhost -p database_name | gzip -9 > backup_db.sql.gz
If you want to extract the .gz file, use the command below:-
$ gunzip backup_db.sql.gz
Type the following command to import sql data file:
$ mysql -u username -p -h localhost DATA-BASE-NAME < data.sql
In this example, import 'data.sql' file into 'blog' database using vivek as username:
$ mysql -u vivek -p -h localhost blog < data.sql
If you have a dedicated database server, replace the localhost hostname with the actual server name or IP address as follows:
$ mysql -u username -p -h 202.54.1.10 databasename < data.sql
OR use hostname such as mysql.cyberciti.biz
$ mysql -u username -p -h mysql.cyberciti.biz database-name < data.sql
If you do not know the database name or database name is included in sql dump you can try out something as follows:
$ mysql -u username -p -h 202.54.1.10 < data.sql
To back up the database automatically, please read "10 ways to Automatically & Manually Backup MySQL Database"; here is one example:
[root@sdc-d1-pangaea-devops1 ~]# vi /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name command to be executed
59 23 * * tue root mysqldump -u root -proot123 jira > /export/home/web/dbbak/jira/jira_$(date +\%Y\%m\%d).sql
This entry will generate a backup file at 23:59 every Tuesday.
Note
Cron does not recognize `date +%Y%m%d` (% is special in a crontab and must be escaped), so use $(date +\%Y\%m\%d) instead of it.


Chan Chen, 2012-03-21 17:43


MongoDB Admin Tool -- RockMongo Install for Ubuntu

Querying the database from the mongo shell produces output like this:
> db.User.find()
{ "_id" : ObjectId("4f657b5e6803fa511a000000"), "about" : "Change the way people learn", "blackListCreateTime" : "2012-03-16 13:00", "createTime" : "2012-03-16 13:00", "email" : "admin@stdtlk.com", "fanCreateTime" : "2012-03-16 13:00", "fanId" : [     ObjectId("4f657b5e6803fa511a000000"),     ObjectId("4f657b5e6803fa511a000000") ], "firstName" : "chan", "lastName" : "chen", "levle" : "0", "password" : "admin", "photoURL" : "img/admin.jpg", "school" : "Stdtlk University", "title" : "starter" }
{ "_id" : ObjectId("4f6582986803fa3d1a000000"), "about" : "I like the way to learn", "blackListCreateTime" : "2012-03-16 13:00", "createTime" : "2012-03-16 13:00", "email" : "google@stdtlk.com", "fanCreateTime" : "2012-03-16 13:00", "fanId" : [     ObjectId("4f657b5e6803fa511a000000"),     ObjectId("4f657b5e6803fa511a000000") ], "firstName" : "google", "lastName" : "google", "levle" : "0", "password" : "google", "photoURL" : "img/google.jpg", "school" : "AAA University", "title" : "starter" }
{ "_id" : ObjectId("4f6582cc6803fa501a000003"), "about" : "Apple is greate", "blackListCreateTime" : "2012-03-16 13:00", "createTime" : "2012-03-16 13:00", "email" : "apple@stdtlk.com", "fanCreateTime" : "2012-03-16 13:00", "fanId" : [     ObjectId("4f657b5e6803fa511a000000"),     ObjectId("4f657b5e6803fa511a000000") ], "firstName" : "apple", "lastName" : "apple", "levle" : "0", "password" : "apple", "photoURL" : "img/apple.jpg", "school" : "BBB University", "title" : "starter" }
{ "_id" : ObjectId("4f6582ec6803faa91c000000"), "about" : "microsoft is old", "blackListCreateTime" : "2012-03-16 13:00", "createTime" : "2012-03-16 13:00", "email" : "microsoft@stdtlk.com", "fanCreateTime" : "2012-03-16 13:00", "fanId" : [     ObjectId("4f657b5e6803fa511a000000"),     ObjectId("4f657b5e6803fa511a000000") ], "firstName" : "microsoft", "lastName" : "microsoft", "levle" : "0", "password" : "microsoft", "photoURL" : "img/microsoft.jpg", "school" : "CCC University", "title" : "starter" }

All documents are output on one line each; if a document has a field with a huge number of characters, it is not easy to find and read what we are looking for. So I looked around on the internet and read about the MongoDB Admin UI, which recommends some useful tools for administering the database. RockMongo is the tool that caught my eye, and many folks on Stack Overflow also gave it positive feedback, so I decided to try it out. Here are the steps to install RockMongo on Ubuntu.
1. Install apache2, php5
sudo apt-get install apache2 php5 php5-dev php5-cli

2. Install php mongo driver
pecl install mongo

3. Config php.ini
root@ubuntu:~# find / -name php.ini
/etc/php5/apache2/php.ini
/etc/php5/cli/php.ini
root@ubuntu:~# echo "extension=mongo.so" >> /etc/php5/apache2/php.ini

4. Check that PHP installed successfully
root@ubuntu:~# sudo find / -name www
/var/www
root@ubuntu:~# sudo echo "<?php phpinfo(); ?>" >> /var/www/info.php
Open http://localhost/info.php
If everything went OK, the phpinfo page will be displayed.
5. Download RockMongo
root@ubuntu:/var/www# cd /var/www
root@ubuntu:/var/www# wget http://rock-php.googlecode.com/files/rockmongo-v1.1.0.zip
root@ubuntu:/var/www# unzip rockmongo-v1.1.0.zip -d rockmongo

6. Launch mongod

7. Restart the apache server
root@ubuntu:~# /etc/init.d/apache2 restart

8. Open http://localhost/rockmongo/index.php. By default, the username and password are both admin.

Chan Chen, 2012-03-18 15:54


Indexes in MongoDB

Chan Chen, 2012-02-27 14:34


Optimizing Object IDs

The _id field in a MongoDB document is very important and is always indexed for normal collections. This page lists some recommendations. Note that it is common to use the BSON ObjectID datatype for _id's, but the values of an _id field can be of any type.

Use the collection's 'natural primary key' in the _id field.

_id's can be any type, so if your objects have a natural unique identifier, consider using that in _id to both save space and avoid an additional index.

When possible, use _id values that are roughly in ascending order.

If the _id's are in a somewhat well defined order, on inserts the entire b-tree for the _id index need not be loaded. BSON ObjectIds have this property.

Store Binary GUIDs as BinData, rather than as hex encoded strings

BSON includes a binary data datatype for storing byte arrays. Using this will make the id values, and their respective keys in the _id index, half the size.

Note that unlike the BSON Object ID type (see above), most UUIDs do not have a rough ascending order, which creates additional caching needs for their index.

> // mongo shell bindata info: 
> help misc
    b = new BinData(subtype,base64str)     create a BSON BinData value     
    b.subtype()     the BinData subtype (0..255)
    b.length()     length of the BinData data in bytes
    b.hex()     the data as a hex encoded string
    b.base64()     the data as a base 64 encoded string
    b.toString()
Extract insertion times from _id rather than having a separate timestamp field.

The BSON ObjectId format provides documents with a creation timestamp (one second granularity) for free. Almost all drivers implement methods for extracting these timestamps; see the relevant api docs for details. In the shell:

> // mongo shell ObjectId methods 
> help misc
    o = new ObjectId() create a new ObjectId
    o.getTimestamp() return timestamp derived from first 32 bits of the OID
    o.isObjectId()
    o.toString()
    o.equals(otherid)
Sort by _id to sort by insertion time

BSON ObjectId's begin with a timestamp. Thus sorting by _id, when using the ObjectID type, results in sorting by time. Note: granularity of the timestamp portion of the ObjectID is to one second only.

> // get 10 newest items 
> db.mycollection.find().sort({_id:-1}).limit(10);
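The same two tricks can be used from Java. A sketch with the 2.x-era driver, where org.bson.types.ObjectId exposes the embedded timestamp via getTime() in milliseconds (method names vary across driver versions, so treat this as an assumption; the collection name is hypothetical):

import com.mongodb.*;
import org.bson.types.ObjectId;
import java.util.Date;

public class ObjectIdDemo {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);
        DBCollection col = mongo.getDB("test").getCollection("mycollection");

        // Newest ten documents: ObjectIds begin with a timestamp, so a
        // descending sort on _id is a descending sort on insertion time.
        DBCursor newest = col.find().sort(new BasicDBObject("_id", -1)).limit(10);
        while (newest.hasNext()) {
            DBObject doc = newest.next();
            ObjectId id = (ObjectId) doc.get("_id");
            // The embedded creation timestamp (one-second granularity).
            System.out.println(new Date(id.getTime()) + " " + doc);
        }
        mongo.close();
    }
}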


Chan Chen, 2012-02-27 14:06


Five Reasons of Choosing MongoDB
  1. Document Database > Most of your data is embedded in a document, so in order to get the data about a person, you don't have to join several tables. Thus, better performance for many use cases.
  2. Strong Query Language > Despite not being an RDBMS, MongoDB has a very strong query language that allows you to get something very specific or very general from a document or documents. The DB is queried using JavaScript, so you can do many more things besides querying (e.g. functions, calculations).
  3. Sharding & Replication > Sharding allows your application to scale horizontally rather than vertically. In other words, more small servers instead of one huge server. And replication gives you fail-over safety in several configurations (e.g. master/slave).
  4. Powerful Indexing > I originally got interested in MongoDB because it allows geo-spatial indexing out of the box, but it has many other indexing configurations as well.
  5. Cross-Platform > MongoDB has many drivers.


Chan Chen, 2012-02-24 22:12


MongoDB vs. RDBMS Schema Design

Chan Chen, 2012-02-20 19:34


Mongo Metadata
The <dbname>.system.* namespaces in MongoDB are special and contain database system information.  System collections include:
<dbname>.system.namespaces lists all namespaces.
<dbname>.system.indexes lists all indexes. Additional namespace / index metadata exists in the database.ns files, and is opaque.
<dbname>.system.profile stores database profiling information.
<dbname>.system.users lists users who may access the database.
Additionally, in the local database only there is replication information in system collections, e.g., local.system.replset contains the replica set configuration.
Information on the structure of a stored document is stored within the document itself. See BSON.
There are several restrictions on manipulation of objects in the system collections. Inserting in system.indexes adds an index, but otherwise that table is immutable (the special drop index command updates it for you). system.users is modifiable. system.profile is droppable.
Note: $ is a reserved character. Do not use it in namespace names or within field names. Internal collections for indexes use the $ character in their names. These collections store b-tree bucket data and are not in BSON format (thus direct querying is not possible).
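Since the system collections are readable like ordinary collections, listing this metadata from code is straightforward. A small sketch with the 2.x-era Java driver (the database name is hypothetical):

import com.mongodb.*;

public class MetadataDemo {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("test"); // hypothetical database name

        // System collections can be queried like ordinary collections.
        DBCursor namespaces = db.getCollection("system.namespaces").find();
        while (namespaces.hasNext()) {
            System.out.println("namespace: " + namespaces.next().get("name"));
        }
        DBCursor indexes = db.getCollection("system.indexes").find();
        while (indexes.hasNext()) {
            System.out.println("index: " + indexes.next());
        }
        mongo.close();
    }
}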


Chan Chen, 2012-02-19 16:47


Schema Design for MongoDB

Chan Chen, 2012-02-18 15:58


Basic Terms of MongoDB

Document
MongoDB can be thought of as a document-oriented database. By 'document', we mean structured documents, not freeform text documents. These documents can be thought of as objects, but only the data of an object, not the code, methods or class hierarchy. Additionally, there is much less linking between documents in MongoDB data models than there is between objects in a program written in an object-oriented programming language.
In MongoDB the documents are conceptually JSON. More specifically, the documents are represented in a format called BSON (standing for Binary JSON).
Documents are stored in Collections.
Maximum Document Size
MongoDB limits the data size of individual BSON objects/documents. At the time of this writing the limit is 16MB.
This limit is designed as a sanity-check; it is not a technical limit on document sizes. The thinking is that if documents are larger than this size, it is likely the schema is not ideal. Further it allows drivers to make some assumptions on the max size of documents.
The concept is that the maximum document size is a limit that ensures each document does not require an excessive amount of RAM from the machine, or require too much network bandwidth to fetch. For example, fetching a full 100MB document would take over 1 second to fetch over a gigabit ethernet connection. In this situation one would be limited to 1 request per second.
Over time, as computers grow in capacity, the limit will be adjusted upward.
Collection
MongoDB collections are essentially named groupings of documents. You can think of them as roughly equivalent to relational database tables.
A MongoDB collection is a collection of BSON documents. These documents usually have the same structure, but this is not a requirement, since MongoDB is a schema-free (or more accurately, "dynamic schema") database. You may store a heterogeneous set of documents within a collection, as you do not need to predefine the collection's "columns" or fields.
A collection is created when the first document is inserted.
Collection names should begin with letters or an underscore and may include numbers; $ is reserved. Collections can be organized in namespaces; these are named groups of collections defined using a dot notation. For example, you could define the collections blog.posts and blog.authors, both residing under "blog". Note that this is simply an organizational mechanism for the user -- the collection namespace is flat from the database's perspective.
The maximum size of a collection name is 128 characters (including the name of the db and indexes). It is probably best to keep it under 80/90 chars.
Namespace
MongoDB stores BSON objects in collections. The concatenation of the database name and the collection name (with a period in between) is called a namespace.
For example, acme.users is a namespace, where acme is the database name, and users is the collection name. Note that periods can occur in collection names, so a name such as acme.blog.posts is legal too (in that case blog.posts is the collection name).
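A small sketch of these terms in code, using the 2.x-era Java driver and the acme example above: the collection springs into existence on the first insert, its documents need not share a structure, and its namespace is the database name plus the collection name:

import com.mongodb.*;

public class TermsDemo {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("acme");

        // No explicit CREATE: the collection is created on the first insert.
        DBCollection users = db.getCollection("users");
        users.insert(new BasicDBObject("name", "Joe").append("age", 20));

        // Documents in one collection need not share a structure ("dynamic schema").
        users.insert(new BasicDBObject("name", "Jill")
                .append("tags", new String[] { "admin", "editor" }));

        // The namespace is "<dbname>.<collection>".
        System.out.println(users.getFullName()); // prints "acme.users"
        mongo.close();
    }
}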

 



Chan Chen, 2012-02-18 15:52


Why Use a Non-Relational Database?

With the rise of Web 2.0 sites on the internet, non-relational databases have become an extremely hot new field, and non-relational database products are developing very rapidly. Traditional relational databases, however, are proving inadequate for Web 2.0 sites, especially the ultra-large-scale, highly concurrent, purely dynamic SNS-type Web 2.0 sites, and they have exposed many problems that are hard to overcome, for example:

1. High performance: the need for highly concurrent database reads and writes
Web 2.0 sites generate dynamic pages and dynamic content in real time according to each user's personalized information, so static page caching is essentially unusable, and the concurrent load on the database is very high, often reaching tens of thousands of read/write requests per second. A relational database can just barely withstand ten thousand SQL queries per second, but at ten thousand SQL write requests per second the disk I/O can no longer cope. In fact even an ordinary BBS site often needs highly concurrent writes, for instance JavaEye's real-time statistics on online user state, view counters for hot posts, vote counts, and so on, so this is quite a common requirement.

2. Huge Storage: the need for efficient storage and access of massive data
SNS sites like Facebook, Twitter, and FriendFeed generate massive amounts of user activity every day. Take FriendFeed: one month adds up to 250 million user updates, and for a relational database, running SQL queries against a table holding 250 million records is extremely inefficient, even intolerable. Or consider the user login systems of large web sites, such as Tencent or Shanda, with accounts counted in the hundreds of millions; relational databases can hardly cope there either.

3. High Scalability and High Availability: the need for highly scalable and highly available databases
In a web-based architecture, the database is the hardest tier to scale horizontally. When an application's user base and traffic grow day by day, your database cannot simply be given more performance and load capacity by adding more hardware and service nodes the way web servers and app servers can. For many sites that must provide uninterrupted 24-hour service, upgrading and expanding the database is a very painful affair, usually requiring downtime for maintenance and data migration. Why can't a database scale by continuously adding server nodes?

Faced with these "three highs", relational databases run into obstacles that are hard to overcome, while for Web 2.0 sites many of the main features of a relational database often have no place to shine, for example:

1. The demand for database transactional consistency
Many real-time web systems do not require strict database transactions, have low demands on read consistency, and in some cases low demands on write consistency as well. Transaction management therefore becomes a heavy burden on a database under high load.

2. The demand for real-time database writes and reads
In a relational database, if you insert a row and query immediately afterwards, that row can definitely be read back. Many web applications, however, do not demand such strong real-time behavior: for example, after I (robbin of JavaEye) post a message, it is perfectly acceptable for my subscribers to see it only after a few seconds, or even tens of seconds.

3. The demand for complex SQL queries, especially multi-table joins
Any web system with large data volumes strongly avoids join queries across several large tables, as well as the complex SQL report queries of data-analysis workloads; SNS sites in particular avoid such situations by design, from the requirements and product side. Usually there are only single-table primary-key queries and single-table paged queries with simple conditions, so SQL's power is greatly diluted.

Relational databases therefore look less and less suitable for these ever more numerous application scenarios, and non-relational databases arose to solve this class of problems. In the last couple of years all kinds of non-relational databases, especially key-value stores (Key-Value Store DBs), have sprung up everywhere, enough to make your head spin. Not long ago the NoSQL Conference was held abroad, where NoSQL databases appeared one after another; counting the ones that did not show up but are well known, there are at least 10 open source NoSQL DBs, for example:

Redis, Tokyo Cabinet, Cassandra, Voldemort, MongoDB, Dynomite, HBase, CouchDB, Hypertable, Riak, Tin, Flare, Lightcloud, KiokuDB, Scalaris, Kai, ThruDB, ......

Some of these NoSQL databases are written in C/C++, some in Java, and some in Erlang, each with its own unique strengths; there are more than one can keep up with, so I (robbin) can only pick a few of the more distinctive, more promising ones to study and get to know. These NoSQL databases fall roughly into the following three categories:

I. Key-value databases satisfying extremely high read/write performance needs: Redis, Tokyo Cabinet, Flare

The main trait of high-performance key-value databases is extremely high concurrent read/write performance. Redis, Tokyo Cabinet, and Flare are all written in C, and their performance is all quite impressive; beyond the excellent performance, each also has its own unique features:

1. Redis
Redis is a very young project that has just released version 1.0. Redis is essentially an in-memory key-value database, much like memcached: the whole database is loaded into memory for all operations, and the data is periodically flushed to disk for persistence by an asynchronous operation. Because it operates purely in memory, Redis's performance is outstanding, handling over 100,000 reads/writes per second; it is the fastest key-value DB I know of.

What makes Redis stand out is not only performance. Redis's greatest charm is its support for storing List and Set data structures, along with all kinds of operations on Lists, such as pushing and popping data at both ends, taking a range of a List, sorting, and so on, plus union and intersection operations on Sets. In addition, the maximum size of a single value is 1GB, unlike memcached, which can only hold 1MB of data per key, so Redis can be used to implement a lot of useful functionality: for example, its List can serve as a FIFO doubly linked list, giving you a lightweight, high-performance message queue service, and its Set can build a high-performance tag system, and so on. Redis can also set an expire time on stored key-values, so it can also be used as an enhanced version of memcached.
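As a sketch of what those List and Set structures buy you, here is the message-queue and tag-system idea with the Jedis client; the client choice and key names are my own assumptions, since the post names no library:

import redis.clients.jedis.Jedis;
import java.util.Set;

public class RedisListSetDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("localhost", 6379);

        // Lightweight FIFO queue on a List: producers LPUSH one end,
        // consumers RPOP the other.
        jedis.lpush("jobs", "job-1");
        jedis.lpush("jobs", "job-2");
        String next = jedis.rpop("jobs"); // "job-1": first in, first out

        // Simple tag system on Sets, with server-side intersection.
        jedis.sadd("tag:java", "post:1");
        jedis.sadd("tag:java", "post:2");
        jedis.sadd("tag:nosql", "post:2");
        jedis.sadd("tag:nosql", "post:3");
        Set<String> both = jedis.sinter("tag:java", "tag:nosql"); // [post:2]

        System.out.println(next + " " + both);
        jedis.close();
    }
}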

Redis's main drawbacks are that its capacity is bounded by physical memory, so it cannot be used for high-performance reads and writes of truly massive data, and that it has no native scale-out mechanism: it lacks scale (horizontal expansion) capability and must rely on the client side for distributed reads and writes. So the scenarios Redis fits are mainly confined to high-performance operations and computation on smaller data volumes. Sites currently using Redis include GitHub and Engine Yard.

2. Tokyo Cabinet and Tokyo Tyrant
TC and TT are developed by the Japanese programmer Mikio Hirabayashi and are mainly used on Japan's biggest SNS site, mixi.jp. TC is the older of the two and is by now a very mature project, as well as the biggest hot spot in the key-value database field, now widely deployed on a great many web sites. TC is a high-performance storage engine, while TT provides a multi-threaded, highly concurrent server on top of it; its performance is also excellent, handling 40,000 to 50,000 reads/writes per second.

Besides key-value storage, TC also supports a hashtable data type, so it looks much like a simple database table, and it supports column-based conditional queries, paged queries and sorting, which amounts to the basic query functionality of a single table. It can therefore simply replace many relational-database operations, which is one of the main reasons TC is so well liked; there is a Ruby project, miyazakiresistance, that wraps TT's hashtable operations in an ActiveRecord-like interface, which is a pleasure to use.

In real use at mixi, TC/TT stores more than 20 million records while sustaining over ten thousand concurrent connections; it is a long-proven project. TC guarantees extremely high concurrent read/write performance while providing a reliable data persistence mechanism, and on top of that offers the relational-table-like hashtable structure plus simple conditional, paging and sorting operations: a very good NoSQL database.

TC's main weakness is that once the data volume reaches the hundreds of millions of records, concurrent write performance drops off sharply. "NoSQL: If Only It Was That Easy" mentions that its author found write performance beginning to fall off steeply when inserting 160 million 2-20KB records into TC. It seems TC's performance degrades badly once the record count climbs into the hundreds of millions; judging from the mixi figures TC's author himself provides, no such obvious write bottleneck had yet been hit at the tens-of-millions scale.

Here is a simple performance comparison of memcached, Redis, and Tokyo Tyrant done by Tim Yang, for reference only.

3. Flare
TC was developed for Japan's biggest SNS site, mixi, while Flare was developed by Japan's second biggest SNS site, green.jp; interesting, no? Simply put, Flare adds scale capability to TC. It replaces the TT part: Flare's authors wrote their own network server around TC, and Flare's main feature is its scale support. It adds a node server in front of the network service tier to manage multiple backend server nodes, so data nodes can be added dynamically, nodes can be removed, and failover is supported as well. If your usage scenario requires that TC be able to scale, consider Flare.

Flare's only drawback is that it speaks only the memcached protocol, so when you use Flare you cannot use TC's table data structure; you can only use the key-value structure for storage.

II. Document-oriented databases satisfying massive storage and access needs: MongoDB, CouchDB

The main problem document-oriented non-relational databases solve is not high-performance concurrent reads and writes but guaranteeing good query performance while storing massive amounts of data. MongoDB is developed in C++, while CouchDB is developed in Erlang:

1. MongoDB
MongoDB is a product that sits between relational and non-relational databases; of all non-relational databases it is the most feature-rich and the most like a relational database. The data structures it supports are very loose, a JSON-like BSON format, so it can store fairly complex data types. Mongo's greatest feature is that the query language it supports is extremely powerful; its syntax somewhat resembles an object-oriented query language, and it can achieve almost all the functionality of a single-table query in a relational database, with support for indexing the data as well.

Mongo mainly addresses the access efficiency of massive data. According to the official documentation, once the data volume passes 50GB, Mongo's database access speed is more than 10 times that of MySQL. Mongo's concurrent read/write efficiency is not especially outstanding; according to the official performance tests, it handles roughly 5,000 to 15,000 read/write requests per second. As for Mongo's concurrent read/write performance, I (robbin) also plan to test it properly when I have time.

Because Mongo is mainly aimed at massive data storage, it also ships with an excellent distributed file system, GridFS, which can support massive data storage; but I have also seen comments suggesting GridFS performance is not great, a point that still awaits a hands-on performance test to verify.

Finally, because Mongo can support complex data structures and comes with powerful data query capabilities, it is extremely popular, and many projects consider using MongoDB in place of MySQL for web applications that are not particularly complex. For instance, "why we migrated from MySQL to MongoDB" is a real case of migrating from MySQL to MongoDB: the data volume was simply too large, so they moved to Mongo, and query speed improved very markedly.

MongoDB also has a Ruby project, MongoMapper, a MongoDB interface written in imitation of Merb's DataMapper; it is very simple to use, almost identical to DataMapper, and very powerful and easy to work with.

2. CouchDB
CouchDB is by now a very famous project and hardly seems to need an introduction. But I have little interest in CouchDB, mainly because CouchDB offers only an HTTP REST interface, so purely in terms of concurrent read/write performance CouchDB is quite poor, which made me drop my interest in it immediately.

III. Distributed-computing-oriented databases satisfying high scalability and availability needs: Cassandra, Voldemort

The problem domain that scale-oriented databases address is actually rather different from the two categories above: such a system must first of all be a distributed database system, in which database nodes spread across different machines jointly form one database service, and on top of this distributed architecture it provides online, easy-to-use scalability, for example adding more data nodes or removing data nodes without stopping the service. Cassandra is therefore often seen as an open-source alternative to Google BigTable. Cassandra and Voldemort are both developed in Java:

1. Cassandra
The Cassandra project was open-sourced by Facebook in 2008; since then Facebook itself has used another, non-open-source branch of Cassandra, while the open-sourced Cassandra has been maintained mainly by Amazon's Dynamite team, and Cassandra is regarded as a version 2.0 of Dynamite. Besides Facebook, Twitter and digg.com are both using Cassandra.

Cassandra's defining trait is that it is not one database but a distributed network service jointly formed by a crowd of database nodes. A write to Cassandra is replicated to other nodes, and a read of Cassandra is routed to some node to be served. For a Cassandra cluster, scaling performance is a fairly simple matter of adding nodes to the cluster; I have seen an article saying Facebook's Cassandra cluster consists of more than 100 database servers.

Cassandra also supports fairly rich data structures and a powerful query language, quite similar to MongoDB, though its query capability is somewhat weaker than Mongo's. Evan Weaver, who leads Twitter's platform architecture department, wrote an article introducing Cassandra, http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/, with a very detailed introduction.

Measured as a single node, Cassandra's concurrent read/write performance is not particularly good; one article reports benchmark results of slightly under 10,000 reads/writes per second per node, and I have also seen comments questioning that figure. But evaluating the performance of a single Cassandra node is meaningless: a real distributed database service necessarily consists of many nodes, and its concurrent performance depends on the number of nodes in the whole system and the efficiency of routing, not merely on the concurrent load capacity of a single node.

2. Voldemort
Voldemort is a distributed database system, similar to Cassandra, aimed at solving the scale problem. Cassandra came out of the SNS site Facebook, while Voldemort came out of the SNS site LinkedIn. Come to think of it, SNS sites have contributed quite a few NoSQL databases for us, such as Cassandra, Voldemort, Tokyo Cabinet, and Flare. There is not much material on Voldemort, so I did not look at it especially carefully; the figures Voldemort officially gives for concurrent read/write performance are also quite good, over 15,000 reads/writes per second.

From Facebook developing Cassandra and LinkedIn developing Voldemort, we can also roughly see how desperate large foreign SNS sites' need is for distributed databases, and especially for databases with scale capability. As I (robbin) mentioned earlier, in a web application architecture the web tier and app tier are comparatively easy to scale horizontally; only the database is a single point, extremely hard to scale. Facebook and LinkedIn have now explored a very promising direction in distributing non-relational databases, which is also the main reason Cassandra is so hot right now.

Today, NoSQL databases are an exciting field, with new technologies and new products constantly popping up and changing the technical notions we had already settled into. Having looked into it a little myself (robbin), I feel I could sink deep into it; one can say the NoSQL database field is broad and profound, and I (robbin) can only scratch the surface. I am writing this article both as a small summary of my own impressions and to toss out a brick in hopes of attracting jade, hoping to draw friends with experience in this field into discussion and exchange.

As far as my (robbin's) personal interests go, distributed database systems are not technology I can actually put to use, so I do not plan to spend time going deep into them; but the other two data domains (high-performance NoSQL DBs and massive-storage NoSQL DBs) both interest me a great deal, especially the three NoSQL databases Redis, TT/TC, and MongoDB, so next I will write three articles introducing these three databases in detail.

Chan Chen, 2012-02-18 15:48