Learning Intro to NoSQL

harshitrajlnctcse · September 4, 2023, 2:00pm

harshitrajlnctcse · September 4, 2023, 3:48pm

What is NoSQL?
Taxonomy of NoSQL
CAP theorem
NoSQL vs RDBMS
Benefits and Drawbacks of NoSQL

harshitrajlnctcse · September 4, 2023, 3:50pm

moni97kumar · September 5, 2023, 2:06pm

What is NoSQL?

   -> NoSQL stands for "Not Only SQL".
   -> It is a non-relational database.
   -> It is an approach to database design that enables the storage & querying of data outside the traditional structures found in relational databases, i.e. in formats other than tables.
   -> It is also a type of Database Management System (DBMS).
   -> It is designed to handle & store large volumes of unstructured & semi-structured data.
   -> It uses flexible data models that can adapt to changes in data structures & can scale horizontally to handle growing amounts of data.
   -> NoSQL databases have expanded to include a wide range of different database architectures and data models.

Taxonomy (Classification) of NoSQL:-

-> Types of NoSQL Databases:
        1. Document Databases -> Eg: MongoDB, Amazon SimpleDB, CouchDB
        2. Key-Value Databases -> Eg: Redis, Amazon DynamoDB
        3. Column-oriented Databases -> Eg: Apache Cassandra (Discord has used this DB), HBase etc.
        4. Graph Databases -> Eg: Neo4j, OrientDB, NebulaGraph etc.

CAP theorem

CAP Theorem:-
  1. C stands for Consistency:
       -> Here, a delay in propagating an update means data inconsistency: different users see different data for a while.
       -> Such a delay is acceptable where little is lost, like a Zoom call, WhatsApp video call etc.
       -> But it is not acceptable where a lot is at stake, mainly in scenarios involving money.
       -> Example 1: 
            => In any chat application, consistency is more important.
            => Ordering also matters, e.g. message2 from person B must appear after message1 from person A.
       -> Example 2:
            =>  IRCTC, Bank, GPay etc.
                    - Being unavailable for about 1 hour is okay.
                    - But 100% consistency should be there.
       -> How do we ensure that our application is highly consistent? By not replicating the data in multiple places (or, if it is replicated, by updating all copies before confirming a write).

  2. A stands for Availability
       -> Flipkart (mobile products) with a 4 sec delay: A or C? Availability, because people switch to Amazon or other apps to shop if Flipkart stops responding.
       -> Hospital: Availability matters more than consistency.
       -> Instagram is a big application, so we need to decide feature-wise.
              - Post, Story, Reels: Availability
              - Like or Comment (open public chat): Availability
              - Messenger/Chat: Consistency 
      -> How do we ensure that our application is highly available? Using replication.
      ->  Examples: 
          i) Video chat app
          ii) Voice Chat app
          iii) Live Streaming
          iv) Comments (open chat)

  3. P stands for Partition tolerance
      - How quickly can our application recover from a down phase to an up phase when nodes cannot reach each other?
      - Fault-oriented / partition-oriented
      - P will always be there, because network failures cannot be ruled out in a distributed system.
      - So the choice is only between C & A: we as senior engineers decide whether to implement C along with P or A along with P.
      - This is purely based on what kind of application/scenario we have.
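
The C-with-P versus A-with-P choice above can be illustrated with a toy two-node cluster. This is only a sketch under stated assumptions: `TinyCluster`, its `mode` switch and the `partitioned` flag are hypothetical names made up for this note, not the API of any real database.

```python
class Replica:
    """One copy of a single value, as held by one node."""

    def __init__(self, value):
        self.value = value


class TinyCluster:
    """Two replicas; `mode` is "CP" or "AP" (a hypothetical knob)."""

    def __init__(self, mode):
        self.mode = mode
        self.primary = Replica("v1")
        self.secondary = Replica("v1")
        self.partitioned = False  # True = the two nodes cannot talk to each other

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write: stay consistent, give up availability.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write on the reachable node only.
            self.primary.value = value
            return
        # Healthy network: update both copies.
        self.primary.value = value
        self.secondary.value = value

    def read_from_secondary(self):
        return self.secondary.value


ap = TinyCluster("AP")
ap.partitioned = True
ap.write("v2")                    # accepted, so the system stays available
stale = ap.read_from_secondary()  # but the other replica still holds the old value

cp = TinyCluster("CP")
cp.partitioned = True
try:
    cp.write("v2")
    cp_write_ok = True
except RuntimeError:
    cp_write_ok = False           # rejected, so both copies stay identical
```

In the AP run the write goes through but a reader on the other node sees old data; in the CP run nobody sees old data because the write is refused.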

NoSQL vs RDBMS

🚨Core Terminology Differences:-
     - SQL:
         => Relational Databases
         => Data is stored in tables.
         => A database consists of many "tables".
         => Rows of a table are called "Records".
         => Columns are called "Attributes".

     - NoSQL (document databases such as MongoDB):
         => Non-Relational Databases
         => Actual data is stored in documents (the equivalent of rows).
         => A database consists of many "collections".
         => Rows are called "Documents" (this is also the term MongoDB uses).
         => Columns are called "Fields".
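
To make the terminology concrete, here is a small side-by-side sketch. The relational half uses Python's built-in `sqlite3`; the document half just uses a list of dicts as a stand-in for a collection. The `users` table and its values are made-up sample data.

```python
import sqlite3

# Relational side: a table with columns (attributes) and rows (records).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users (id, name, city) VALUES (1, 'Asha', 'Pune')")
row = conn.execute("SELECT id, name, city FROM users WHERE id = 1").fetchone()

# Document side: a collection (list) of documents (dicts) with fields (keys).
users_collection = [
    {"_id": 1, "name": "Asha", "city": "Pune"},
]
document = users_collection[0]
```

The row comes back as a positional tuple whose meaning depends on the table's column order, while the document carries its field names with it.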

Benefits and Drawbacks of NoSQL

   -> Fast access
   -> Flexible schemas/tables/entities
   -> Easy to scale
   -> Avoids joins (related data can be stored together in one document)
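
A rough illustration of the "avoids join" point, assuming a made-up customers/orders example: the relational layout needs a JOIN to read a customer with their orders, while a document layout can embed the orders inside the customer.

```python
import sqlite3

# Relational layout: orders live in their own table, so reading a customer
# together with their orders needs a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'mouse');
""")
joined_items = [
    item
    for (item,) in conn.execute(
        "SELECT o.item FROM customers c JOIN orders o ON o.customer_id = c.id "
        "WHERE c.id = 1 ORDER BY o.id"
    )
]

# Document layout: the orders are embedded in the customer document,
# so one lookup returns everything and no join is needed.
customer_doc = {
    "_id": 1,
    "name": "Ravi",
    "orders": [{"item": "keyboard"}, {"item": "mouse"}],
}
embedded_items = [order["item"] for order in customer_doc["orders"]]
```

The trade-off is that embedded data is duplicated if the same order details are needed elsewhere, so embedding suits data that is usually read together.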

harshitrajlnctcse · September 6, 2023, 3:38am

nice notes @moni97kumar

knmsurajmishra001 · September 6, 2023, 1:07pm

Ans-:1 NoSQL, which stands for “Not Only SQL,” is a broad category of database management systems that are designed to store and retrieve data in ways that differ from traditional relational databases (SQL databases). NoSQL databases are typically used for handling large volumes of unstructured or semi-structured data, and they provide flexibility and scalability that can be advantageous in certain applications.

Key characteristics of NoSQL databases include:

1. Schema-less or Schema-flexible: NoSQL databases do not require a fixed schema, meaning you can add new fields or data structures without modifying the entire database structure. This flexibility is particularly useful in scenarios where data structures are subject to change.
2. Distributed and Scalable: Many NoSQL databases are designed to be distributed across multiple servers or nodes, making it easier to scale horizontally as data volumes grow. This distributed architecture improves performance and fault tolerance.
3. Non-relational: Unlike relational databases, NoSQL databases do not rely on structured tables with rows and columns. Instead, they use data models that better suit the specific requirements of the application.
NoSQL databases are often used in modern web and mobile applications, big data processing, real-time analytics, and scenarios where data structures are evolving rapidly. However, it’s important to choose the right type of NoSQL database based on the specific requirements of your application, as each type has its strengths and weaknesses.
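
As a small sketch of the schema-flexible characteristic, the snippet below keeps documents with different fields in one collection. A plain Python list of dicts stands in for the collection, and the product data is invented for illustration.

```python
# A "collection" here is just a list of dicts; no schema is declared up front.
products = [
    {"_id": 1, "name": "Laptop", "price": 55000},
    {"_id": 2, "name": "T-shirt", "price": 499, "size": "M"},  # extra field
]

# Adding a document with yet another new field needs no migration step.
products.append({"_id": 3, "name": "E-book", "price": 199, "format": "PDF"})


def names_with_field(collection, field):
    """Return the names of the documents that contain `field`."""
    return [doc["name"] for doc in collection if field in doc]


sized = names_with_field(products, "size")
# Readers must tolerate missing fields, e.g. by supplying a default.
formats = [doc.get("format", "n/a") for doc in products]
```

The flexibility moves work to the reading side: code that consumes the documents has to cope with fields that may be absent.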

Ans-:2
NoSQL databases can be categorized into several types based on their data models and the way they store and retrieve data. Here’s a taxonomy of NoSQL databases that classifies them into four main categories:

Document Stores:

Key Characteristics: Document stores store data in flexible, semi-structured documents. Each document can have a different structure, and they are typically represented in formats like JSON or BSON (binary JSON).
Use Cases: Document stores are suitable for applications where data can have varying structures, such as content management systems, e-commerce platforms, and user profiles.
Examples: MongoDB, Couchbase, CouchDB

Key-Value Stores:

Key Characteristics: Key-value stores store data as pairs of keys and values. They are simple and efficient for read and write operations but offer limited querying capabilities.
Use Cases: Key-value stores are often used for caching, session management, and scenarios where fast data retrieval is crucial.
Examples: Redis, Amazon DynamoDB, Riak
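
The caching and session use cases can be sketched with a tiny in-memory key-value store that supports expiry. `TinyKV` is a hypothetical class written for this note; it only mimics the get/set-with-TTL idea and is not how Redis or DynamoDB are implemented.

```python
import time


class TinyKV:
    """In-memory key-value store with optional per-key expiry (TTL)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily drop the expired entry
            return default
        return value


# Session-style usage: the entry disappears 30 minutes after it was set.
sessions = TinyKV()
sessions.set("session:42", {"user": "asha"}, ttl=1800)
current = sessions.get("session:42")
```

Note that the only way to find data is by its exact key, which is the "limited querying" trade-off mentioned above.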

Column-Family Stores:

Key Characteristics: Column-family stores group related columns into column families within each row, and different rows can hold different sets of columns. They are designed for handling large volumes of data with high write throughput and are often used in distributed systems.
Use Cases: Column-family stores are suitable for time-series data, sensor data, and scenarios where data needs to be stored and retrieved quickly.
Examples: Apache Cassandra, HBase, ScyllaDB
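
One simplified way to picture a column-family layout is a nested map: row key, then column family, then column name. The sketch below uses that picture for sensor readings. The `put` and `get_family` helpers and the sensor data are hypothetical, and real stores add sorting, on-disk storage and distribution on top.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    """Write one cell; rows need not share the same set of columns."""
    table[row_key][family][column] = value


def get_family(row_key, family):
    """Read one column family for a row without touching the others."""
    return dict(table[row_key][family])


# Time-series style usage: one row per sensor, one column per timestamp.
put("sensor-1", "readings", "2023-09-05T10:00", 21.5)
put("sensor-1", "readings", "2023-09-05T10:01", 21.7)
put("sensor-1", "meta", "location", "lab")
readings = get_family("sensor-1", "readings")
```

Each new reading is just one more column appended to the row, which hints at why this layout copes well with write-heavy time-series data.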

Graph Databases:

Key Characteristics: Graph databases are designed to model and query data with complex relationships. They use nodes and edges to represent entities and their connections in a graph structure.
Use Cases: Graph databases excel in applications like social networks, recommendation engines, fraud detection, and any scenario involving interrelated data.
Examples: Neo4j, Amazon Neptune, OrientDB
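
To show what "nodes and edges" buys you, here is a minimal friend-suggestion sketch using a breadth-first search over an adjacency list. The `friends` data and the `within_hops` helper are made up for this note; a graph database would express the same traversal in its own query language.

```python
from collections import deque

# Adjacency list: node -> set of directly connected nodes ("friend" edges).
friends = {
    "asha": {"ravi", "meena"},
    "ravi": {"asha", "kiran"},
    "meena": {"asha"},
    "kiran": {"ravi"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: nodes reachable from `start` in <= max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen - {start}


# Friends-of-friends that are not already direct friends.
suggestions = within_hops(friends, "meena", 2) - friends["meena"]
```

In a relational schema each extra hop would typically mean another self-join on a friendships table, which is why relationship-heavy queries are the usual argument for a graph model.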

Ans:-3
The CAP theorem, also known as Brewer’s theorem, is a concept in distributed computing that describes the trade-offs between three essential properties of a distributed system: Consistency, Availability, and Partition tolerance. Eric Brewer introduced this theorem in 2000, and it has since become a fundamental principle for understanding the behavior of distributed systems.

The CAP theorem states that, in a distributed system, you can achieve at most two out of the following three properties simultaneously:

Consistency (C): All nodes in the system see the same data at the same time. In other words, when a write operation is successful, all subsequent read operations will return the updated data. This ensures that there is no divergence in the data seen by different nodes. In terms of delay: a lag can be accepted in something like a match streamed on Hotstar, but no lag can be accepted in a satellite launch.
Availability (A): Every request (read or write) to the system receives a response, without a guarantee that it contains the most up-to-date data. In other words, the system is operational and responsive even if some nodes are unavailable or there’s a network partition. It doesn’t mean all nodes are necessarily available or that the data is consistent across all nodes. If responses are delayed, people switch to another option, and in emergency situations such as a hospital the system must always respond when needed.
Partition Tolerance (P): The system can continue to operate correctly, even in the presence of network partitions or communication failures that prevent some nodes from communicating with others. Network partitions can occur due to network failures or delays.

According to the CAP theorem, you must make a trade-off between these properties, and it’s impossible to simultaneously achieve all three in a distributed system. Here are some practical implications of this theorem:

If you prioritize Consistency and Partition Tolerance (CP), it means you want to ensure that your data remains consistent even when network partitions occur. This might lead to reduced availability during network partitions.
If you prioritize Availability and Partition Tolerance (AP), it means you want your system to remain operational and responsive even when network partitions occur. This may result in eventual consistency, where different nodes may have slightly different versions of data, but the system continues to function.
If you prioritize Consistency and Availability (CA), it means you are willing to sacrifice partition tolerance. This approach may work well in systems where network partitions are rare or unlikely.

In practice, different distributed databases and systems make different trade-offs based on their design and intended use cases. For example, some NoSQL databases prioritize AP (e.g., Cassandra), while others prioritize CP (e.g., HBase), and some databases offer tunable consistency levels to allow users to make their own trade-offs.
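
Tunable consistency is often explained with the quorum rule: with N replicas, a write acknowledged by W of them and a read that contacts R of them are guaranteed to overlap when R + W > N. The sketch below is a deliberately simplified model of that rule; `QuorumRegister` is a hypothetical class and does not reflect how any particular database implements replication.

```python
def overlaps(n, w, r):
    """True if every read quorum must share a replica with every write quorum."""
    return r + w > n


class QuorumRegister:
    """N replicas of one value; each replica stores (version, value)."""

    def __init__(self, n):
        self.replicas = [(0, None)] * n

    def write(self, value, w):
        # Simplification: the write reaches only the first `w` replicas.
        version = max(v for v, _ in self.replicas) + 1
        for i in range(w):
            self.replicas[i] = (version, value)

    def read(self, r):
        # Worst case for the reader: it contacts the *last* `r` replicas,
        # i.e. the ones least likely to have seen the write.
        contacted = self.replicas[len(self.replicas) - r:]
        return max(contacted, key=lambda pair: pair[0])[1]


reg = QuorumRegister(n=3)
reg.write("balance=100", w=2)
strong = reg.read(r=2)  # 2 + 2 > 3: the quorums overlap, newest value is seen
weak = reg.read(r=1)    # 1 + 2 <= 3: may hit only the stale replica
```

Choosing a small R and W favours availability and latency, while choosing them so that R + W > N favours consistency, which is the trade-off the user gets to tune.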

The choice of which properties to prioritize should be driven by the specific requirements and constraints of your application and the level of tolerance for potential trade-offs between consistency, availability, and partition tolerance.

Ans-:4
NoSQL (Not Only SQL) and RDBMS (Relational Database Management System) are two different types of database management systems, each with its own strengths, weaknesses, and best use cases. Here’s a comparison of NoSQL and RDBMS based on various factors:

Data Model:

NoSQL: NoSQL databases support various data models, including document, key-value, column-family, and graph. These databases are schema-flexible, allowing you to store unstructured or semi-structured data.
RDBMS: RDBMS uses a tabular structure with well-defined schemas consisting of tables, rows, and columns. Data must conform to the predefined schema.

Scalability:

NoSQL: NoSQL databases are designed for horizontal scalability. They can handle large amounts of data and high write throughput by adding more servers or nodes to the system.
RDBMS: Traditional RDBMS systems are typically scaled vertically by adding more resources (CPU, RAM) to a single server. Horizontal scaling can be challenging and may require complex sharding or partitioning strategies.
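
Horizontal scaling usually relies on spreading keys across nodes. A minimal sketch of hash-based sharding follows, with plain dicts standing in for servers. `shard_for` and `ShardedStore` are hypothetical names, and the modulo scheme is the simplest possible choice rather than what any specific database does.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash (not Python's hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Spreads keys over `num_shards` dicts standing in for servers."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})
sizes = [len(shard) for shard in store.shards]  # roughly even spread
```

With plain modulo, changing the number of shards remaps most keys; schemes such as consistent hashing exist to reduce that movement.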

Consistency:

NoSQL: NoSQL databases often provide options for tunable consistency. Depending on the chosen consistency level, they may prioritize availability and partition tolerance over strong consistency.
RDBMS: RDBMS systems typically offer strong consistency, ensuring that data remains consistent across all transactions. This may result in blocking or reduced availability during updates.

Query Language:

NoSQL: NoSQL databases often use specialized query languages or APIs tailored to their data models. These languages may not support complex SQL-like queries.
RDBMS: RDBMS systems use SQL (Structured Query Language), which is a powerful and standardized language for querying and manipulating data.

ACID Transactions:

NoSQL: NoSQL databases may provide eventual consistency by default, but they may also offer ACID (Atomicity, Consistency, Isolation, Durability) transactions in some cases, depending on the database type and configuration.
RDBMS: RDBMS systems typically offer strong support for ACID transactions, ensuring data integrity and consistency.
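
The atomicity part of ACID can be demonstrated with Python's built-in `sqlite3`: either both sides of a transfer are applied or neither is. The `accounts` table, the balances and the `transfer` helper are made-up sample material for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A", "B", 30)       # succeeds
failed = transfer(conn, "A", "B", 500)  # would overdraw A, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failing transfer the credit to B has already been executed when the debit violates the CHECK constraint, yet the rollback undoes it, so no money appears from nowhere.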

Use Cases:

NoSQL: NoSQL databases are well-suited for applications with rapidly changing data structures, high write loads, or where horizontal scalability is essential. Common use cases include real-time analytics, content management systems, IoT, and social media platforms.
RDBMS: RDBMS systems excel in applications with well-defined schemas, complex relationships between data, and where data consistency is critical. They are often used in financial systems, e-commerce, and traditional business applications.

Complex Relationships:

NoSQL: NoSQL databases can handle complex relationships, but graph databases are specifically designed for efficiently traversing and querying highly interconnected data.
RDBMS: RDBMS systems are well-suited for managing complex relationships through JOIN operations and foreign key constraints.

Examples:

NoSQL: Examples of NoSQL databases include MongoDB, Cassandra, Redis, and Neo4j.
RDBMS: Examples of RDBMS systems include MySQL, PostgreSQL, Oracle Database, and Microsoft SQL Server.

Core Terminology Differences:-
     - SQL:
         => Relational Databases
         => Data is stored in tables.
         => A database consists of many "tables".
         => Rows of a table are called "Records".
         => Columns are called "Attributes".

     - NoSQL (document databases such as MongoDB):
         => Non-Relational Databases
         => Actual data is stored in documents (the equivalent of rows).
         => A database consists of many "collections".
         => Rows are called "Documents" (this is also the term MongoDB uses).
         => Columns are called "Fields".

Ans-:5
Benefits of NoSQL Databases:

Schema flexibility
Scalability
High write throughput
Fast read and write operations
Support for various data models
Distribution and fault tolerance
Horizontal scaling

Drawbacks of NoSQL Databases:

Limited query capabilities
Consistency trade-offs
Learning curve
Data integrity challenges
Lack of standardization
Limited transactions in some cases
Maturity and ecosystem variation