4 minutes
Intro
Ya’ll know how it is
High level differences
SQL Databases | NoSQL Databases | |
Storage Model | Each row represents an entity while each column represents a data point. | Document: JSON documents
Key-value: key-value pairs
Wide-column: tables with rows and dynamic columns
Graph: nodes and edges |
Schema | Each row conforms to a strict, fixed schema. Columns must be decided and chosen beforehand before data entry can commence.
Rigid | Flexible |
Purpose | General Purpose | Document: General purpose
Key-value: Large amounts of data with simple lookup
Wide-column: Large amounts of data with predictable query patterns
Graph: Analysing and traversing relationships between connected graph-like data |
Real world examples | Oracle, MySQL, Microsoft SQL Server, and PostgreSQL | Document: MongoDB and CouchDB,
Key-value: Redis and DynamoDB,
Wide-column: Cassandra and HBase,
Graph: Neo4j and Amazon Neptune |
Scaling | Vertical | Horizontal |
ACID | Compliant | Not compliant.
Typically eventually consistent, but can do multi-record ACID transactions at the cost of latency |
History | Developed in the 1970s with a focus on the de-duplication of data. ie normalisation | Developed in the late 2000s with a focus on scaling and allowing for rapid application change. |
Benefits of SQL
Data is ACID compliant
Here’s a quick refresher on what that means:
Atomicity: All transactions must succeed or fail completely and cannot be left partially complete, even in the case of system failure.
Consistency: The database must follow rules that validate and prevent corruption at every step. Data is always in a consistent state.
Isolation: Concurrent transactions cannot affect each other. Achieved through locking of various database resources.
Durability: Transactions are final, and even system failure cannot “roll back” a complete transaction.
Benefits of NoSQL
- Flexible data models
- Horizontal Scaling
- High performance with simple queries
- Rapid development
The key driving force in whether you would use one or the other:
If you’ve read a few articles or watched a few videos on this topic, you will notice that these info pieces will often mention the fact that SQL is scaled vertically while NoSQL is scaled horizontally and move on to some other difference. But there is actually a very deep implication with this:
CAP theorem informs us that in a distributed system we can only achieve two out of three tenets when it comes to Consistency, availability, and Partition Tolerance. This applies to databases as well.
Truth bomb: The choice of what database you use will always be informed by what the most appropriate consistency model for your domain is.
We will explore this concept in depth in my article on CAP theory and relate it back to distributed noSQL databases. But in short here’s a quick reference you can use when choosing your database:
CA (Consistency & Availability)
- SQL RDBMS
- MSSQL
- MySQL
- PostgresQL
AP (Availability & Partition Tolerance)
- CouchDB
- CassandraDB
- Dynamodb
CP (Consistency & Availability)
- Redis
- MongoDB
sources
https://www.mongodb.com/nosql-explained/nosql-vs-sql
https://www.integrate.io/blog/the-sql-vs-nosql-difference
https://www.educative.io/courses/grokking-the-system-design-interview