DatabaseEngineering

1.Nested Join - nested for loop - O (N * N).

- easiest to implement - time consuming for large dataset

- better with either small datasets or index on join attributes

2.Merge Join - sort them first and compare - O(NlogN) + O(NlogN)

- Efficient for large datasets

- pre-join preparation required (sorting)

- scanning of relation happens once while joining

- can leverage indexes if available to make it faster

3.Hash Join - using hashing

- not good for range queries

- creating a hash table for the query you are searching ( user_id is the key)

- used for equi joins

- efficient for large datasets

- requires additional memory

- pre join preparation required

-> hash table construction - Hash function should distribute data evenly

SQL Engine: Take a look at data, look statistics across all table using cardinality

Databases SQL VS NOSQL:

what if your application logic is present in database like it we want to restart the application logic then the database restart is heavier

Is it database is file system? like that but file system is for not structure data and we can’t read one record of data and databases are persistent stores.

How do database store data?

Hardware: Magnetic tapes, solid state disk(exprensive then magnetic tape);

Algorithm used in Datastores: B+ trees, pages, Hash Tables, pointers.

How do datastores read data?

Hash Tables and B+ trees are uses indexes and fast lookup → hashtables and foreign keys (pointers)

what is graph database?

stores data internally node and edges. and used to perform graph queries efficiently. Some relation db perform graph operation like having a top layer for db which as graph operation layer and these ar enot optimizer and not efficiency.

what is time series database?

stores record that are part of the time series and aggregate and compress time-stamped data

if you are looking to store metric data or monitory data work well and most of these are return on top of sql db

what is object oriented databases?

designed to worked with complex data objects

what kind of database you choose?

PostgreSQL high consistency, durability and in terms of availability it doesn’t provide any of the shelf you need read replica.

Cassadra: as a cluster arch if one of the nodes down another nodes takes place and not provide much consistency but in-built fault tolerance.

noe4j: excellent for graph

As engineer database is abstract to provide read and writes