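This post is written in English. For certain reasons, it can fairly be called the "anti" version of "MongoDB: A Harmonious DB (Part 1)". When I wrote Part 1 there were still many things I did not understand well, which led to the following questions: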
I have some questions about NoSQL and document database solutions, since I have only started working with NoSQL recently.
I have tried to understand the benefits of NoSQL solutions (performance and scalability), but I cannot convince myself of them, especially for cases with complex business logic.
After reading a lot of articles, I found that the case for NoSQL rests on three points: CAP, relational modeling, and scalability.
CAP: we can only pick two of the three properties. NoSQL solutions pick AP and rely on eventual consistency to deal with consistency. Now look at the RDBMS: with many database servers we cannot have strong consistency either, because of the performance cost, so we choose master/slave with asynchronous replication, which behaves much like eventual consistency. So what is the benefit of NoSQL (specifically, document databases) from the point of view of the CAP theorem?
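To make the comparison concrete, here is a minimal sketch of asynchronous master/slave replication. It is an in-memory toy, not how any particular database implements replication; the `Primary` and `Replica` classes and the `balance` key are hypothetical names chosen for illustration.

```python
from collections import deque


class Primary:
    """Accepts writes and queues them for later shipment to a replica."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # replication log the replica has not applied yet

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Replica:
    """Serves reads from whatever it has applied so far."""

    def __init__(self):
        self.data = {}

    def apply(self, primary):
        # Drain the primary's queue. A real system would do this in the background.
        while primary.pending:
            key, value = primary.pending.popleft()
            self.data[key] = value

    def read(self, key):
        return self.data.get(key)


primary = Primary()
replica = Replica()

primary.write("balance", 100)
stale = replica.read("balance")   # the replica has not caught up yet
replica.apply(primary)
fresh = replica.read("balance")   # the replica has converged

print(stale, fresh)
```

Between the write and the call to `apply`, a read from the replica misses the new value. That window is what both asynchronous RDBMS replication and eventual consistency accept in exchange for availability.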
Non-relational objects: NoSQL is good at objects that have no relationships, for example logs. But logs can also be saved to an RDBMS without any relationships, so for such objects I think the MongoDB solution and the RDBMS solution should have the same performance and scalability. Right?
Relational objects: mongodb.org gives a good example built around a student document.

In that example, the address is embedded in the student document, which is reasonable and improves performance when the UI needs to load the address along with the student. But an RDBMS can do the same for a 1-1 relationship. The scores, meanwhile, have to reference another collection, which is much like an RDBMS, and loading the course still takes a second trip to the database, again like an RDBMS. So what is the benefit?
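The shape being discussed can be sketched as follows. This is not the mongodb.org example itself; the collections are plain dictionaries, and every field name and value here is an assumption made up for illustration. `find_one` is a hypothetical stand-in for one round trip to the database.

```python
# Hypothetical in-memory "collections"; all field names and values are illustrative.
students = {
    "s1": {
        "name": "Example Student",
        # 1-1 data embedded in the parent document: one lookup returns it.
        "address": {"city": "Example City", "street": "1 Example St"},
        # 1-N data kept as references into another collection.
        "scores": [
            {"course_id": "c1", "grade": 90},
            {"course_id": "c2", "grade": 75},
        ],
    }
}
courses = {"c1": {"name": "Biology"}, "c2": {"name": "History"}}

round_trips = {"count": 0}


def find_one(collection, key):
    """Stand-in for one round trip to the database."""
    round_trips["count"] += 1
    return collection[key]


student = find_one(students, "s1")
city = student["address"]["city"]          # embedded: no extra round trip
trips_after_address = round_trips["count"]

# Referenced: each course costs another lookup, much like a second SQL query.
course_names = [find_one(courses, s["course_id"])["name"] for s in student["scores"]]
trips_after_courses = round_trips["count"]

print(trips_after_address, trips_after_courses, course_names)
```

The embedded address costs nothing beyond loading the student, while the referenced courses cost additional lookups, which is the part that looks the same as in an RDBMS.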
Partitioning and sharding: RDBMSs offer solutions for these too (although some code changes are needed), so an RDBMS can handle them as well.
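As a rough idea of what those code changes look like, here is a minimal sketch of application-level hash sharding. The shard count, the `shard_for`, `put`, and `get` helpers, and the `user:N` keys are all assumptions for illustration; each dictionary stands in for one database server.

```python
import hashlib

SHARD_COUNT = 3  # assumed number of database servers
shards = [dict() for _ in range(SHARD_COUNT)]  # each dict stands in for one server


def shard_for(key: str) -> int:
    """Map a key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARD_COUNT


def put(key, row):
    shards[shard_for(key)][key] = row


def get(key):
    return shards[shard_for(key)].get(key)


for user_id in ("user:1", "user:2", "user:3", "user:4"):
    put(user_id, {"id": user_id})

print(get("user:3"))
print([len(s) for s in shards])
```

Every read and write by key now has to go through `shard_for`, and any query that does not include the key has to visit all shards. With an RDBMS this routing typically lives in the application, which is the code change mentioned above.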
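NoSQL databases have had a very lively year. Each solution has more personality than the last, and most of them are already in commercial use. On the whole, something you build yourself and can tune yourself does stand up to real-world testing.
MongoDB has changed enormously over the past year. When I first started following it, it had not even reached version 1.0, but it has since gained a great many features, including MapReduce and auto-sharding.
After fairly in-depth study (which will continue), I have found that this database, the one that most resembles a relational database, really is very capable, and there is a lot about it worth discussing. Let us start with the following points on how relational and non-relational databases differ, and why one would set the relational database aside under certain conditions.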
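1. Relational databases exist to serve relations. If the records have no relationships to one another, do we need a relational database? The answer is simple: no.
Classic use case: log storage. Storing logs in a relational database burdens our poor, hard-to-scale database, and storing them in files makes them hard to manage, so a non-relational database is a good solution.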
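2. Relational databases put too much emphasis on relations. Their goal is to shape our data into a structure that is in third normal form throughout (no transitive functional dependencies and no partial functional dependencies). But this decomposition effectively adds one more database operation, that is, one more I/O, so performance drops. For example, when we open a post we certainly want to fetch all the comments under it as well. If we could store the comments directly under the post, the problem would be easy to solve.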
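3. Relational databases pay too much attention to consistency. Many of our systems do not actually need consistency that strong. For many Web 2.0 sites or ordinary websites at least, a good system can be built without much consistency as long as support, maintenance, and alerting are done well. We can of course also achieve eventual consistency through some mechanism (I have not studied this in depth). Too much focus on consistency inevitably means availability suffers, which in turn makes relational databases hard to scale out. To scale out writes, we can only partition, but partitioning is hard and usually involves a lot of code changes, which seriously affect the stability of the project and carry a lot of risk. To scale out reads, we can only use a master-slave setup (async and sync replication each have their own problems). Many NoSQL solutions address this, whether through auto-sharding (since everything is driven by the key, the data splits cleanly) or through replication. (This part needs further study.)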
4. Schema. The schema of a relational database is fixed, and adding or removing a column is a major change. NoSQL handles this easily, because the data is just key-value pairs.
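The schema point (item 4) can be sketched as follows. The relational side uses Python's built-in `sqlite3` module; the document side is just a list of dictionaries standing in for a document collection. The `users` table, the `nickname` attribute, and the sample rows are assumptions for illustration.

```python
import sqlite3

# Relational side: a new attribute requires a schema change before it can be stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")  # DDL change to the table
conn.execute("INSERT INTO users (id, name, nickname) VALUES (2, 'bob', 'bobby')")
rows = conn.execute("SELECT id, name, nickname FROM users ORDER BY id").fetchall()
conn.close()

# Document side: each record carries its own keys, so no schema change is needed.
documents = [
    {"_id": 1, "name": "alice"},
    {"_id": 2, "name": "bob", "nickname": "bobby"},
]
nicknames = [doc.get("nickname") for doc in documents]

print(rows)
print(nicknames)
```

How expensive the `ALTER TABLE` step is depends on the database engine and the table size. On the document side the cost does not disappear: the reading code has to cope with records that lack the new key, as `doc.get` does here.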
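The arrival of NoSQL is an advance in thinking and in programming philosophy. A database is just a store, and it should not concern itself much with business matters. The way forward is for all of our business logic to live in the domain layer, so that we do not need to care where the data underneath is actually stored.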