Learning NoSQL Non-Relational Databases (3): NoSQL vs. RDBMS, When to Use It and When Not To


NoSQL vs RDBMS: Why and why not to use NoSQL over RDBMS?

Naresh Kumar
3 January 2014
source: http://theprofessionalspoint.blogspot.in/2014/01/nosql-vs-rdbms-why-and-why-not-to-use.html

NoSQL ("not only SQL") is not a relational database management system (RDBMS). This article first discusses the differences between NoSQL databases and relational databases, and then examines in detail why, and why not, to choose the NoSQL model over the traditional relational model. Because NoSQL is a relatively new technology, it also faces many challenges, which we will look at as well.

Today, the internet has billions of users. Big Data, Big Users, and Cloud Computing are the major trends that every large internet application is adopting or preparing to adopt: user numbers grow day by day, and data is becoming more complex and less structured, which makes it very hard to manage with a traditional relational database management system (RDBMS). NoSQL technology was designed to address these problems and is aimed at unstructured Big Data and Cloud Computing. A NoSQL database is the type of database that can handle the unstructured, messy, and unpredictable data that our systems of engagement require. NoSQL is a whole new way of thinking about a database.

Difference between NoSQL and Relational Data Models (RDBMS)

Relational and NoSQL data models are very different.

The relational model takes data and separates it into many interrelated tables that contain rows and columns. Tables reference each other through foreign keys that are stored in columns as well. When looking up data, the desired information needs to be collected from many tables (often hundreds in today’s enterprise applications) and combined before it can be provided to the application. Similarly, when writing data, the write needs to be coordinated and performed on many tables.

NoSQL databases have a very different model. For example, a document-oriented NoSQL database takes the data you want to store and aggregates it into documents using the JSON format. Each JSON document can be thought of as an object to be used by your application. A JSON document might, for example, take all the data stored in a row that spans 20 tables of a relational database and aggregate it into a single document/object. Aggregating this information may lead to duplication of information, but since storage is no longer cost prohibitive, the resulting data model flexibility, ease of efficiently distributing the resulting documents and read and write performance improvements make it an easy trade-off for web-based applications.
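
The aggregation described above can be sketched in a few lines of Python. This is only an illustration: the three "tables" and the helper `build_customer_document` are hypothetical names made up for the example, and plain lists of dicts stand in for a real database.

```python
import json

# Hypothetical normalized rows, as three relational tables might hold them.
customers = [{"id": 1, "name": "Ada"}]
addresses = [{"customer_id": 1, "city": "London", "country": "UK"}]
orders = [
    {"id": 10, "customer_id": 1, "item": "keyboard"},
    {"id": 11, "customer_id": 1, "item": "monitor"},
]


def build_customer_document(customer_id):
    """Collect everything about one customer into a single aggregate."""
    customer = next(c for c in customers if c["id"] == customer_id)
    return {
        "_id": customer["id"],
        "name": customer["name"],
        # Related rows are embedded instead of referenced by foreign key.
        "addresses": [
            {"city": a["city"], "country": a["country"]}
            for a in addresses
            if a["customer_id"] == customer_id
        ],
        "orders": [
            {"order_id": o["id"], "item": o["item"]}
            for o in orders
            if o["customer_id"] == customer_id
        ],
    }


document = build_customer_document(1)
print(json.dumps(document, indent=2))
```

Once the document exists, the application reads or writes it as one unit, with no join at lookup time. The price, as the text notes, is that shared information may be duplicated across documents.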

Another major difference is that relational technologies have rigid schemas while NoSQL models are schemaless. Relational technology requires a strict definition of the schema before any data is stored in the database. Changing the schema once data has been inserted is a big deal: it is extremely disruptive and frequently avoided, which is the exact opposite of the behavior desired in the Big Data era, where application developers need to constantly and rapidly incorporate new types of data to enrich their apps. A NoSQL database also may not provide full ACID (atomicity, consistency, isolation, durability) guarantees, but it still has a distributed and fault-tolerant architecture.

The NoSQL taxonomy includes key-value stores, document stores, BigTable-style wide-column stores, and graph databases.

In comparison, document databases are schemaless, allowing you to freely add fields to JSON documents without having to first define changes. The format of the data being inserted can be changed at any time, without application disruption.

Examples of NoSQL databases include MongoDB, Cassandra, CouchDB, and HBase.

NoSQL Database Types

Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.

Graph stores are used to store information about networks, such as social connections. Graph stores include Neo4j and HyperGraphDB.

Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key"), together with its value. Examples of key-value stores are Riak and Voldemort. Some key-value stores, such as Redis, allow each value to have a type, such as "integer", which adds functionality.

Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
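
To make the four types more concrete, here is a rough sketch of the shape of data each one holds, using plain Python structures. It is an illustration only, not the storage format of any of the products named above; all keys, values, and the helper `friends_of_friends` are invented for the example.

```python
# Key-value store: an opaque value looked up by its key.
key_value = {"session:42": "user=ada;expires=1700000000"}

# Document store: each key maps to a nested, self-describing document.
documents = {
    "user:1": {"name": "Ada", "tags": ["admin"], "address": {"city": "London"}},
}

# Wide-column store (BigTable-style sketch): row key -> column family ->
# column -> value. Rows do not need to share the same set of columns.
wide_column = {
    "user:1": {"profile": {"name": "Ada"}, "activity": {"last_login": "2014-01-03"}},
    "user:2": {"profile": {"name": "Bob", "email": "bob@example.com"}},
}

# Graph store: nodes plus the edges that connect them.
graph_edges = {"ada": {"bob"}, "bob": {"ada", "carol"}, "carol": {"bob"}}


def friends_of_friends(graph, person):
    """Return people two hops away who are not already direct friends."""
    direct = graph.get(person, set())
    result = set()
    for friend in direct:
        result |= graph.get(friend, set())
    return result - direct - {person}


print(friends_of_friends(graph_edges, "ada"))  # {'carol'}
```

The last function hints at why a graph model suits social connections: the question is expressed as a traversal of edges rather than as a join between tables.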

Why use NoSQL databases?

1. NoSQL has Flexible Data Model to Capture Unstructured / Semi-structured Big Data

Data is becoming easier to capture and access through third parties such as Facebook, D&B, and others. Personal user information, geolocation data, social graphs, user-generated content, machine logging data, and sensor-generated data are just a few examples of the ever-expanding array of data being captured. It’s not surprising that developers want to enrich existing applications and create new ones made possible by it. The use of this data is rapidly changing the nature of communication, shopping, advertising, entertainment, and relationship management. Apps that don’t leverage it quickly will fall behind.

Developers want a very flexible database that easily accommodates new data types and isn’t disrupted by content structure changes from third-party data providers. Much of the new data is unstructured or semi-structured, so developers also need a database that is capable of storing it efficiently. Unfortunately, the rigidly defined, schema-based approach used by relational databases makes it very difficult to incorporate new types of data quickly, and is a poor fit for unstructured and semi-structured data. NoSQL provides a data model that maps better to these needs.

Many applications might gain from this unstructured data model: tools such as CRM, ERP, and BPM could use this flexibility to store their data without altering tables or creating generic columns in a database. These databases are also well suited to building prototypes or rapid applications, because this flexibility makes it easy to develop new features.

2. NoSQL is highly and easily scalable (Scale up vs Scale out)

If millions of users are using your app frequently and concurrently, you need to think about a scalable database technology instead of a traditional RDBMS. With relational technologies, many application developers find it difficult, or even impossible, to get the dynamic scalability and level of scale they need while also maintaining the performance users demand. In that situation, switching to a NoSQL database is worth considering.

For cloud applications, relational databases were originally the popular choice. Their use became increasingly problematic, however, because they are a centralized, shared-everything technology that scales up rather than out. This made them a poor fit for applications that require easy and dynamic scalability. NoSQL databases have been built from the ground up to be distributed, scale-out technologies and therefore fit better with the highly distributed nature of the three-tier Internet architecture.

Scale up vs Scale out

To deal with the increase in concurrent users (Big Users) and the amount of data (Big Data), applications and their underlying databases need to scale using one of two choices: scale up or scale out. Scaling up implies a centralized approach that relies on bigger and bigger servers. Scaling out implies a distributed approach that leverages many standard, commodity physical or virtual servers.

Scale up with relational technology: limitations at the database tier

At the web/application tier of the three-tier Internet architecture, a scale out approach has been the default for many years and worked extremely well. As more people use an application, more commodity servers are added to the web/application tier, performance is maintained by distributing load across an increased number of servers, and the cost scales linearly with the number of users.

Prior to NoSQL databases, the default scaling approach at the database tier was to scale up. This was dictated by the fundamentally centralized, shared-everything architecture of relational database technology. To support more concurrent users and/or store more data, you need a bigger and bigger server with more CPUs, more memory, and more disk storage to keep all the tables. Big servers tend to be highly complex, proprietary, and disproportionately expensive, unlike the low-cost, commodity hardware typically used so effectively at the web/application server tier.

Scale out with NoSQL technology at the database tier

NoSQL databases were developed from the ground up to be distributed, scale-out databases. They use a cluster of standard physical or virtual servers to store data and support database operations. To scale, additional servers are joined to the cluster, and the data and database operations are spread across the larger cluster. Since commodity servers are expected to fail from time to time, NoSQL databases are built to tolerate and recover from such failures, making them highly resilient.

NoSQL databases provide a much easier, linear approach to database scaling. If 10,000 new users start using your application, simply add another database server to your cluster. Add ten thousand more users and add another server. There’s no need to modify the application as you scale since the application always sees a single (distributed) database.

At scale, a distributed scale out approach also usually ends up being cheaper than the scale up alternative. This is a consequence of large, complex, fault tolerant servers being expensive to design, build and support. Licensing costs of commercial relational databases can also be prohibitive because they are priced with a single server in mind. NoSQL databases on the other hand are generally open source, priced to operate on a cluster of servers, and relatively inexpensive.

While implementations differ, NoSQL databases share some characteristics with respect to scaling and performance:

DYNAMIC SCHEMAS

Relational databases require that schemas be defined before you can add data. For example, you might want to store data about your customers such as phone numbers, first and last name, address, city and state – a SQL database needs to know what you are storing in advance.

This fits poorly with agile development approaches, because each time you complete new features, the schema of your database often needs to change. So if you decide, a few iterations into development, that you'd like to store customers' favorite items in addition to their addresses and phone numbers, you'll need to add that column to the database, and then migrate the entire database to the new schema.

If the database is large, this is a very slow process that involves significant downtime. If you are frequently changing the data your application stores – because you are iterating rapidly – this downtime may also be frequent. There's also no way, using a relational database, to effectively address data that's completely unstructured or unknown in advance.

NoSQL databases are built to allow the insertion of data without a predefined schema. That makes it easy to make significant application changes in real-time, without worrying about service interruptions – which means development is faster, code integration is more reliable, and less database administrator time is needed.
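
The contrast can be sketched with Python's built-in `sqlite3` module on the relational side and a plain list of dicts standing in for a document collection on the other. This is a toy comparison under stated assumptions: the table and field names are invented, and SQLite happens to add a column cheaply, so the downtime described above depends on the database engine and the table size rather than on anything shown here.

```python
import sqlite3

# Relational side: the schema must exist before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute("INSERT INTO customers (name, phone) VALUES (?, ?)", ("Ada", "555-0100"))

# A new attribute requires a schema change before it can be stored.
conn.execute("ALTER TABLE customers ADD COLUMN favorite_items TEXT")
conn.execute(
    "INSERT INTO customers (name, phone, favorite_items) VALUES (?, ?, ?)",
    ("Bob", "555-0101", "books"),
)
rows = conn.execute(
    "SELECT name, favorite_items FROM customers ORDER BY id"
).fetchall()
print(rows)  # [('Ada', None), ('Bob', 'books')]

# Document side: a list of dicts stands in for a schemaless collection.
collection = []
collection.append({"name": "Ada", "phone": "555-0100"})
# The new field simply appears on the documents that have it.
collection.append({"name": "Bob", "phone": "555-0101", "favorite_items": ["books"]})

# Readers must tolerate documents that lack the field.
favorites = [doc.get("favorite_items", []) for doc in collection]
print(favorites)  # [[], ['books']]
```

Note the trade-off visible in the last lines: without a schema, the responsibility for handling missing or differently shaped fields moves into the application code.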

AUTO-SHARDING

Because of the way they are structured, relational databases usually scale vertically: a single server has to host the entire database to ensure reliability and continuous availability of data. This gets expensive quickly, places limits on scale, and concentrates risk in a small number of failure points for the database infrastructure. The solution is to scale horizontally, by adding servers instead of concentrating more capacity in a single server.

"Sharding" a database across many server instances can be achieved with SQL databases, but usually is accomplished through SANs and other complex arrangements for making hardware act as a single server. Because the database does not provide this ability natively, development teams take on the work of deploying multiple relational databases across a number of machines. Data is stored in each database instance autonomously. Application code is developed to distribute the data, distribute queries, and aggregate the results of data across all of the database instances. Additional code must be developed to handle resource failures, to perform joins across the different databases, for data rebalancing, replication, and other requirements. Furthermore, many benefits of the relational database, such as transactional integrity, are compromised or eliminated when employing manual sharding.

NoSQL databases, on the other hand, usually support auto-sharding, meaning that they natively and automatically spread data across an arbitrary number of servers, without requiring the application to even be aware of the composition of the server pool. Data and query load are automatically balanced across servers, and when a server goes down, it can be quickly and transparently replaced with no application disruption.
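
One common technique for spreading keys across a changing pool of servers is consistent hashing. The sketch below is a minimal version of that idea; the text does not say which partitioning scheme any particular database uses, and the class `HashRing`, the server names, and the choice of 100 virtual nodes per server are assumptions made for the illustration.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash value, node name)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # md5 is used only as a stable, well-spread hash, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each server owns many small arcs of the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]


ring = HashRing(["db1", "db2", "db3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("db4")  # scale out by one server
after = {k: ring.node_for(k) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved to the new server")
```

The point of the design is that adding a server relocates only the keys that the new server takes over; every other key stays where it was, which is what lets a cluster grow without a full reshuffle. A real database would also have to move the data itself and keep serving requests while doing so.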

Cloud computing makes this significantly easier, with providers such as Amazon Web Services providing virtually unlimited capacity on demand, and taking care of all the necessary database administration tasks. Developers no longer need to construct complex, expensive platforms to support their applications, and can concentrate on writing application code. Commodity servers can provide the same processing and storage capabilities as a single high-end server for a fraction of the price.

“Sharding” a relational database can reduce, or eliminate in certain cases, the ability to perform complex data queries. NoSQL database systems retain their full query expressive power even when distributed across hundreds of servers.
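
To see why queries get harder once a relational database is sharded by hand, consider this small sketch, which uses three in-memory SQLite databases as stand-ins for three separate servers. The routing function `shard_for`, the `orders` table, and the data are all hypothetical.

```python
import sqlite3
import zlib

SHARD_COUNT = 3
# Three independent databases stand in for three database servers.
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for db in shards:
    db.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")


def shard_for(customer):
    # A stable hash, so the same customer always lands on the same shard.
    return shards[zlib.crc32(customer.encode("utf-8")) % SHARD_COUNT]


def insert_order(customer, amount):
    db = shard_for(customer)
    db.execute("INSERT INTO orders VALUES (?, ?)", (customer, amount))
    db.commit()


def total_revenue():
    # Scatter-gather: no single database can answer this question, so the
    # application queries every shard and combines the partial results.
    total = 0
    for db in shards:
        (partial,) = db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders"
        ).fetchone()
        total += partial
    return total


for customer, amount in [("ada", 30), ("bob", 20), ("carol", 50), ("ada", 5)]:
    insert_order(customer, amount)

print(total_revenue())  # 105
```

A sum is the easy case because partial sums can simply be added. A join between rows that live on different shards, or a transaction that spans them, has no such shortcut, and that work falls to the application.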

INTEGRATED CACHING

A number of products provide a caching tier for SQL database systems. These systems can improve read performance substantially, but they do not improve write performance, and they add complexity to system deployments. If your application is dominated by reads then a distributed cache should probably be considered, but if your application is dominated by writes or if you have a relatively even mix of reads and writes, then a distributed cache may not improve the overall experience of your end users.
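
The asymmetry between reads and writes can be shown with a small cache-aside sketch. The classes `SlowDatabase` and `CachedStore` are invented for the example; the counters simply record how often the underlying database is touched.

```python
class SlowDatabase:
    """Stand-in for a database; counts how often it is actually hit."""

    def __init__(self):
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.writes += 1
        self.rows[key] = value


class CachedStore:
    """Cache-aside: reads try the cache first, writes always hit the database."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit, database untouched
        value = self.db.read(key)  # cache miss, fall through to the database
        self.cache[key] = value
        return value

    def put(self, key, value):
        self.db.write(key, value)  # every write still reaches the database
        self.cache.pop(key, None)  # invalidate the stale cached copy


db = SlowDatabase()
store = CachedStore(db)

store.put("user:1", "Ada")
for _ in range(100):
    store.get("user:1")  # only the first of these reaches the database

for i in range(100):
    store.put("user:1", f"Ada v{i}")  # every one of these reaches the database

print(db.reads, db.writes)  # 1 101
```

One hundred reads cost a single database read, while one hundred writes cost one hundred database writes, which is why a cache helps a read-heavy workload far more than a write-heavy one.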

Many NoSQL database technologies have excellent integrated caching capabilities, keeping frequently-used data in system memory as much as possible and removing the need for a separate caching layer that must be maintained.

REPLICATION

Most NoSQL databases also support automatic replication, meaning that you get high availability and disaster recovery without involving separate applications to manage these tasks. The storage environment is essentially virtualized from the developer's perspective.
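
As a rough illustration of why replication gives high availability, the sketch below writes every value to all live replicas and reads from the first one that is still up. It is a deliberately simplified model with hypothetical names (`Replica`, `ReplicatedStore`, `node-a`); real systems also have to bring recovered replicas up to date and resolve conflicting writes, which is omitted here.

```python
class Replica:
    """One copy of the data; 'alive' simulates whether the server is up."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    def __init__(self, names):
        self.replicas = [Replica(name) for name in names]

    def put(self, key, value):
        # Write the value to every replica that is currently reachable.
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("no replica available")
        for replica in live:
            replica.data[key] = value

    def get(self, key):
        # Read from the first live replica; the caller never picks a server.
        for replica in self.replicas:
            if replica.alive:
                return replica.data.get(key)
        raise RuntimeError("no replica available")


store = ReplicatedStore(["node-a", "node-b", "node-c"])
store.put("cart:7", "book")

store.replicas[0].alive = False  # simulate a server failure
value = store.get("cart:7")
print(value)  # book
```

The caller uses the same `get` call before and after the failure, which is the sense in which the storage environment looks virtualized from the developer's side.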

Challenges of NoSQL

The promise of the NoSQL database has generated a lot of enthusiasm, but there are many obstacles to overcome before it can appeal to mainstream enterprises. Here are a few of the top challenges.

1. Maturity

RDBMS systems have been around for a long time. NoSQL advocates will argue that their advancing age is a sign of their obsolescence, but for most CIOs, the maturity of the RDBMS is reassuring. For the most part, RDBMS systems are stable and richly functional. In comparison, most NoSQL alternatives are in pre-production versions with many key features yet to be implemented.

Living on the technological leading edge is an exciting prospect for many developers, but enterprises should approach it with extreme caution.

2. Support

Enterprises want the reassurance that if a key system fails, they will be able to get timely and competent support. All RDBMS vendors go to great lengths to provide a high level of enterprise support.

In contrast, most NoSQL systems are open source projects, and although there are usually one or more firms offering support for each NoSQL database, these companies often are small start-ups without the global reach, support resources, or credibility of an Oracle, Microsoft, or IBM.

3. Analytics and business intelligence

NoSQL databases have evolved to meet the scaling demands of modern Web 2.0 applications. Consequently, most of their feature set is oriented toward the demands of these applications. However, data in an application has value to the business that goes beyond the insert-read-update-delete cycle of a typical Web application. Businesses mine information in corporate databases to improve their efficiency and competitiveness, and business intelligence (BI) is a key IT issue for all medium to large companies.

NoSQL databases offer few facilities for ad-hoc query and analysis. Even a simple query requires significant programming expertise, and commonly used BI tools do not provide connectivity to NoSQL.
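
The gap can be illustrated by answering the same question, total sales per region, in two ways: as a one-line SQL statement and as hand-written aggregation over documents. The data and names are invented for the example, a list of dicts stands in for a document store with no query language, and individual NoSQL products differ in what query facilities they offer.

```python
import sqlite3
from collections import defaultdict

sales = [("north", 120), ("south", 80), ("north", 40), ("east", 60)]

# Relational side: an ad-hoc question is a single declarative statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# Document side: without a query language, the same question is a small
# program that someone has to write, test, and maintain.
documents = [{"region": region, "amount": amount} for region, amount in sales]
totals = defaultdict(int)
for doc in documents:
    totals[doc["region"]] += doc["amount"]
doc_totals = dict(totals)

print(sql_totals == doc_totals)  # True
```

Both approaches give the same totals; the difference is who can ask the question. The SQL version is within reach of an analyst with a BI tool, while the second requires a programmer.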

Some relief is provided by the emergence of solutions such as Hive and Pig, which can provide easier access to data held in Hadoop clusters and perhaps, eventually, in other NoSQL databases. Quest Software has developed a product, Toad for Cloud Databases, that can provide ad-hoc query capabilities for a variety of NoSQL databases.

4. Administration

The design goals for NoSQL may be to provide a zero-admin solution, but the current reality falls well short of that goal. NoSQL today requires a lot of skill to install and a lot of effort to maintain.

5. Expertise

There are literally millions of developers throughout the world, and in every business segment, who are familiar with RDBMS concepts and programming. In contrast, almost every NoSQL developer is in learning mode. This situation will resolve itself naturally over time, but for now, it's far easier to find experienced RDBMS programmers or administrators than a NoSQL expert.

Conclusion

NoSQL databases are becoming an increasingly important part of the database landscape, and when used appropriately, can offer real benefits. However, enterprises should proceed with caution with full awareness of the legitimate limitations and issues that are associated with these databases.


NoSQL vs. RDBMS: When to Use It and When Not To

By Zhang Long (張龍), published on InfoQ, January 8, 2014

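Naresh Kumar is a software engineer and an enthusiastic blogger with a keen interest in programming and new things, who enjoys sharing his technical findings with other developers and programmers. Recently, Naresh wrote an article comparing NoSQL and RDBMS and describing in detail their respective characteristics and the scenarios each is suited to.

NoSQL is not a relational database management system. This article describes the differences between NoSQL databases and relational databases, and discusses in which scenarios NoSQL should be used and in which it should not. Because NoSQL is still a relatively new technology, it also faces many challenges.

Today the internet has billions of users. Big data and cloud computing have become technologies that many major internet applications are using or preparing to use, because the number of internet users grows every day and data is becoming more and more complex, with a great deal of it unstructured, which is hard to handle with a traditional relational database management system. NoSQL technology addresses this problem comparatively well; it is mainly used for unstructured big data and cloud computing. Seen from this angle, NoSQL is a whole new way of thinking about databases.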

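Why use NoSQL databases?

1. NoSQL has a flexible data model that can handle unstructured and semi-structured big data

Today we can easily obtain and access data through third parties such as Facebook and D&B: personal user information, geolocation data, social graphs, user-generated content, machine log data, sensor-generated data, and so on. The use of this data is rapidly changing the nature of communication, shopping, advertising, entertainment, and relationship management. Applications that do not make use of it will soon be forgotten by users.

Developers want a very flexible database that can easily accommodate new data types and is not disrupted by changes in the content structure of third-party data providers. Much of the new data is unstructured or semi-structured, so developers also need a database that can store such data efficiently. Unfortunately, the strictly defined, schema-based approach used by relational databases cannot quickly accommodate new data types, and it is an even poorer fit for unstructured or semi-structured data. The data model offered by NoSQL meets these needs well. Many applications can benefit from this unstructured data model, such as CRM, ERP, and BPM; they can use this flexibility to store data without altering tables or creating more columns. These databases are also well suited to building prototypes or rapid applications, because the flexibility makes developing new features very easy.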

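2. NoSQL scales easily (scale up vs. scale out)

If many users are using your application frequently and concurrently, you need to consider a scalable database technology rather than a traditional RDBMS. With relational technology, many application developers find dynamic scalability hard to achieve, and this is when switching to a NoSQL database should be considered. For cloud applications, relational databases were initially the common choice. In use, however, they ran into more and more problems, because they are centralized and scale up rather than out. This makes them unsuitable for applications that need simple, dynamic scalability. NoSQL databases have been distributed and scale-out from the start, so they fit the distributed nature of internet applications very well.

At the web/application tier of the three-tier internet architecture, scaling out has been the default approach for many years. As the number of users of an application surges, more servers are added, performance is maintained through load balancing, and the cost grows linearly with the number of users. Before NoSQL databases, the default way to scale the database tier was to scale up. To support more concurrent users and store more data, you need an ever bigger server, with more CPUs, more memory, and larger disks to hold all the tables. Bigger servers, however, are more complex, proprietary, and more expensive, in sharp contrast to the cheap hardware used at the web/application tier.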

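3. Dynamic schemas

A relational database requires the schema to be defined before data is added. For example, if you need to store customers' phone numbers, names, addresses, cities, and states, a SQL database needs to know in advance what you are going to store. This is a disaster for agile development, because each time a new feature is completed the database schema usually has to change. So if, during development, you want to add customers' favorite items to the database, you have to add that column to the table and then migrate the entire database to the new schema.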

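4. Auto-sharding

Because of the way they are structured, relational databases usually scale vertically: a single server has to hold the entire database to ensure reliability and continuous availability of the data. The price is that this is very expensive, limits scale, and makes the database infrastructure a point of failure. The solution is to scale horizontally, adding servers rather than adding capacity to a single server. NoSQL databases usually support auto-sharding, meaning that they natively and automatically distribute data across multiple servers without the application even being aware of it. Data and query load are balanced automatically across the servers, and when a server goes down it can be replaced quickly and transparently.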

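5. Replication

Most NoSQL databases also support automatic replication, which means you get high availability and disaster recovery. From the developer's perspective, the storage environment is essentially virtualized.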

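Challenges facing NoSQL databases

1. Maturity

RDBMS systems have been around for a long time. NoSQL advocates will say that the advanced age of the RDBMS is a sign of its decline, but for most CIOs the maturity of the RDBMS is reassuring. For the most part, RDBMS systems are stable and feature-rich. In comparison, most NoSQL databases still have many features yet to be implemented.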

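2. Support

Enterprises want the reassurance that if a key system fails they can get timely support. All RDBMS vendors spare no effort to provide good enterprise support. In contrast, most NoSQL systems are open source projects; although a few companies offer support for each database, most of them are small start-ups without global support resources or the reassuring credibility of an Oracle, Microsoft, or IBM.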

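3. Analytics and business intelligence

NoSQL databases emerged in the era of Web 2.0 applications, so most of their features are oriented toward the needs of those applications. However, the data in an application has value to the business that goes far beyond the CRUD cycle of a web application. The business information in corporate databases can help improve efficiency and competitiveness, and business intelligence is a key IT issue for medium and large companies.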

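4. Administration

The design goal of NoSQL is to provide a zero-administration solution, but today's reality is still far from that goal. NoSQL today takes a lot of skill to use well and considerable manpower and resources to maintain.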

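5. Expertise

There are many developers around the world, and every business segment has people familiar with RDBMS concepts and programming. In contrast, almost every NoSQL developer is in learning mode. This will change over time, but for now it is far easier to find an experienced RDBMS programmer or administrator than a NoSQL expert.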

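Conclusion

NoSQL databases are becoming an important force in the database field. Used appropriately, they can bring many benefits. However, enterprises should proceed carefully and be aware of the limitations and issues of these databases.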

Posted on 2014-01-13 12:12 by crazycy. Categories: JavaEE Technology, DBMS

cuiyi's blog（崔毅 crazycy）