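This post is written in English. For certain reasons, it can fairly be called the "anti" counterpart to "MongoDB, a Harmonious DB (Part 1)". When I wrote Part 1 there were actually many issues I still did not understand clearly, which led to the following questions: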
I have some questions about NoSQL and document database solutions, because I have only started working with NoSQL in the last few days.
I have tried to understand the benefits of NoSQL solutions (performance and scalability), but I cannot convince myself of them, especially for cases with complex business logic.
After reading a lot of articles, I found that CAP, relational modeling, and scalability are the three main arguments made for NoSQL solutions:
CAP: we can only pick two of the three properties. NoSQL solutions pick AP and rely on eventual consistency to handle consistency. Now look at an RDBMS: with many database servers we cannot get good consistency either, because of the performance cost, so we choose master/slave with asynchronous replication to handle consistency. That is similar to eventual consistency, and therefore similar to NoSQL. So what is the benefit of NoSQL (specifically, a document database) from the point of view of the CAP theorem?
Non-relational objects: NoSQL is good at objects without relationships, for example logs. But logs can also be saved to an RDBMS without any relationships, so for such objects I think the MongoDB solution and the RDBMS solution should have the same performance and scalability. Right?
Relational objects: mongodb.org has a good example, as follows:

The address is embedded in the student, which is reasonable and improves performance if the UI needs to load the address together with the student. But an RDBMS can do the same for a 1-1 relationship. The scores need to reference another collection, which is similar to an RDBMS, and loading the course requires a second round trip to the database, which is also similar to an RDBMS. So what is the benefit?
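To make the question concrete, here is a minimal sketch of that student document in Python. The original example from mongodb.org is not reproduced here, so every field name and value is a hypothetical stand-in, and plain dicts stand in for collections instead of a real MongoDB driver. It only counts lookups: the embedded address costs nothing extra, while each referenced course costs one more lookup.

```python
# Plain dicts stand in for two collections; no real database is involved.
# All field names and values are hypothetical.
students = {
    "s1": {
        "name": "Joe",
        # 1-1 data embedded directly in the student document
        "address": {"street": "123 Main St", "city": "Springfield"},
        # scores only hold a reference (course_id) to another collection
        "scores": [
            {"course_id": "c1", "grade": 90},
            {"course_id": "c2", "grade": 85},
        ],
    }
}
courses = {"c1": {"name": "Biology"}, "c2": {"name": "English"}}

lookups = 0  # counts simulated round trips to the database


def find_one(collection, key):
    """Simulate one round trip that fetches a single document by key."""
    global lookups
    lookups += 1
    return collection[key]


student = find_one(students, "s1")
city = student["address"]["city"]  # embedded: no extra lookup needed

# referenced: one extra lookup per course, much like a second SQL query
course_names = [
    find_one(courses, score["course_id"])["name"] for score in student["scores"]
]
```

In this sketch the address arrives with the student in a single lookup, but resolving the two course references costs two more, which is the same shape of work a relational schema would need for the courses.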
Partitioning and sharding: RDBMSs also provide solutions for these (although some code changes are needed), so an RDBMS can handle them as well.
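The "some code changes" usually mean that the application has to route each key to the right server itself. Below is a minimal sketch of such key-based routing, assuming a fixed number of shards and a simple hash-modulo rule. The names `SHARD_COUNT`, `shard_for`, `put`, and `get` are hypothetical, in-memory dicts stand in for the database servers, and this is not meant to describe how MongoDB's auto-sharding works internally.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical number of database servers

# One dict per shard stands in for one database server.
shards = [dict() for _ in range(SHARD_COUNT)]


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a key to a shard index with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def put(key: str, value: dict) -> None:
    """Write the value to the shard that owns the key."""
    shards[shard_for(key)][key] = value


def get(key: str):
    """Read the value from the shard that owns the key, or None if absent."""
    return shards[shard_for(key)].get(key)


for i in range(1000):
    put(f"user:{i}", {"id": i})
```

Every read and write in the application has to go through `shard_for`, which is the kind of code change mentioned above. A hash-modulo rule also means most keys move when the shard count changes, which is one reason manual sharding is considered risky.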
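NoSQL databases have had a spectacular year. Each solution has developed its own distinct character, and most of them are now in commercial use. Overall, things that you build yourself and can optimize yourself do stand up to real-world testing.
MongoDB has changed enormously over the past year. When I first started following it, it had not even reached version 1.0, but it has since gained a great many features, including MapReduce and auto-sharding.
After fairly in-depth study (which I will continue), I have found that this database, the one most similar to a relational database, is indeed very powerful, and many things about it are well worth discussing. Let us start by looking at the differences between relational and non-relational databases from the following angles, and at why a relational database should be abandoned under certain conditions.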
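1. Relational databases were created for relationships. If the individual records are not relational data, do we need a relational database? The answer is simple: no.
Classic use case: log storage. (Storing logs in a relational database burdens our poor, hard-to-scale database, while storing them in files makes them hard to manage, so a non-relational database is a good solution.)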
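2. Relational databases put too much emphasis on relationships. Their goal is to shape our data into structures that are in third normal form throughout (no transitive functional dependencies and no partial functional dependencies). But this decomposition in effect adds an extra database operation, that is, an extra I/O, so performance drops. For example, when we open a post we almost certainly want all the comments under it as well; if we could store the comments directly under the post, this would be easy to solve.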
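3. Relational databases focus too much on consistency. Many of our systems do not actually need such strong consistency. At least for many Web 2.0 sites or ordinary websites, as long as support, maintenance, and alerting are done well, a good system can be built without much consistency. Of course, we can also achieve eventual consistency through certain mechanisms (I have not studied this in depth). Too much focus on consistency inevitably means that availability ends up not being very good, and in turn relational databases are hard to scale out. To scale out writes we can only partition, but partitioning is hard and usually involves many code changes, which seriously affect the stability of the project and carry great risk. To scale out reads we can only use a master-slave solution (async and sync each have their own problems). Many NoSQL databases have solved this problem, whether through auto-sharding (because everything is driven by the key, the data can be split cleanly) or through replication. (This part needs further study.)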
4. The schema problem. The schema of a relational database is fixed, and adding or removing a column is a major operation. NoSQL solves this easily, because the data is just key-value pairs.
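Points 2 and 4 above can be sketched side by side. The example below uses Python's built-in sqlite3 module for the relational side and a plain dict as a stand-in for a document; the table names, column names, and field names are all hypothetical. It shows that the normalized schema needs a second query to fetch the comments and an ALTER TABLE to add a field, while the document carries its comments with it and simply gains a new key.

```python
import sqlite3

# --- Relational side: posts and comments live in separate tables ---
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES posts(id),
        body TEXT
    );
    INSERT INTO posts VALUES (1, 'Hello NoSQL');
    INSERT INTO comments VALUES (1, 1, 'First!');
    INSERT INTO comments VALUES (2, 1, 'Nice post');
    """
)

# Opening a post takes one query for the post and another for its comments.
title = conn.execute("SELECT title FROM posts WHERE id = ?", (1,)).fetchone()[0]
sql_comments = [
    row[0]
    for row in conn.execute(
        "SELECT body FROM comments WHERE post_id = ? ORDER BY id", (1,)
    )
]

# Adding a new attribute means changing the schema for every row.
conn.execute("ALTER TABLE posts ADD COLUMN tags TEXT")
conn.execute("UPDATE posts SET tags = ? WHERE id = ?", ("nosql,mongodb", 1))
sql_tags = conn.execute("SELECT tags FROM posts WHERE id = ?", (1,)).fetchone()[0]

# --- Document side: a dict stands in for one stored document ---
post_doc = {
    "_id": 1,
    "title": "Hello NoSQL",
    "comments": [{"body": "First!"}, {"body": "Nice post"}],  # embedded
}

# The comments arrive together with the post; no second lookup.
doc_comments = [comment["body"] for comment in post_doc["comments"]]

# Adding a new attribute touches only this document; no schema change.
post_doc["tags"] = ["nosql", "mongodb"]
```

On a tiny SQLite table the ALTER TABLE is cheap, so this only illustrates the difference in shape, not the cost of a schema change on a large production table.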
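The emergence of NoSQL is a step forward in thinking and in programming philosophy. A database is just a store; it should not concern itself too much with business-related matters. In the future, all of our business logic should be expressed in the domain layer, and we should not need to care where the data is ultimately stored.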
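One common way to express this idea is a repository interface: the domain code talks only to an abstract store, and the concrete backend (relational, document, or in-memory) is swapped in behind it. The sketch below is one possible shape of that idea, not a prescribed design; `Post`, `PostRepository`, `InMemoryPostRepository`, and `comment_on_post` are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Post:
    """Domain object: the business rule lives here, not in the database."""

    post_id: str
    title: str
    comments: list = field(default_factory=list)

    def add_comment(self, body: str) -> None:
        # Business rule: empty comments are rejected.
        if not body.strip():
            raise ValueError("comment must not be empty")
        self.comments.append(body)


class PostRepository(ABC):
    """Storage interface the domain depends on; backends implement it."""

    @abstractmethod
    def save(self, post: Post) -> None: ...

    @abstractmethod
    def get(self, post_id: str) -> Post: ...


class InMemoryPostRepository(PostRepository):
    """Trivial backend; an RDBMS or MongoDB backend would share the interface."""

    def __init__(self) -> None:
        self._posts = {}

    def save(self, post: Post) -> None:
        self._posts[post.post_id] = post

    def get(self, post_id: str) -> Post:
        return self._posts[post_id]


def comment_on_post(repo: PostRepository, post_id: str, body: str) -> Post:
    """Use case written against the interface, unaware of the storage engine."""
    post = repo.get(post_id)
    post.add_comment(body)
    repo.save(post)
    return post


repo = InMemoryPostRepository()
repo.save(Post("p1", "Hello NoSQL"))
comment_on_post(repo, "p1", "Nice post")
```

Because `comment_on_post` only sees `PostRepository`, replacing the in-memory backend with a database-backed one would not change the domain code.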