PostgreSQL Internals:
PostgreSQL is a SQL database with a structured schema of tables with rrows and columns.
PostgreSQL support advanced datatypes like JSON ,XML and JSONB.
Key Differences between JSON and JSONB
Storage Format:
JSON: Stores data as plain text.
JSONB: Stores data in a parsed, binary format.
Performance:
JSON: Slower for search operations because it must be parsed each time.
JSONB: Faster for search and indexing because the data is pre-parsed.
Flexibility:
JSON: Preserves the input format, including whitespace, and the order of keys.
JSONB: Removes extra whitespace and does not preserve the order of keys.
Indexing:
JSON: Does not support indexing of JSON fields.
JSONB: Supports indexing, making it faster for querying.
Connection Management:
Note: clients mean not user it is application
PostgreSQL implements ‘Process Per User’ Client/Server Model.
Core idea: Every Client Process Connects to Exactly one Backend Process.
it very expensive but it has it’s own advantages
PostMaster Process:
PostMaster Process is the main “Supervisor“ that accepts client Connections on the port (default 5432) and creates Backend Process to handle.
Backend Process is the one that parses process and executes the query, while postmaster continues to accept client connections.
Shared Buffers: PostgreSQL uses a shared memory area known as shared buffers, which caches frequently accessed data from the disk. This allows all processes to access the same cached data without needing to load it multiple times.
- When a process reads data from a table, it first checks the shared buffers. If the data is already cached there (from a previous read by another process), it can directly access it from memory. If not, the data is loaded from disk into the shared buffers and made available to all other processes.
These Backend Process communicate with each other through a number of shared memory regions.
we can observe Backend Process running in postgreSQL by Querying ‘pg-stat-activity’.
How are Tcp Connections shared with Backend Process?
[threads can share the memory but process are isolated]
Client Created Connection with Postmaster
Postmaster forked out a backend process.
Backend Process Executes the query.
How will Backend Process Send Response to Tcp Connection setUp in the Postmater?
Because child process inherits the file descriptors from parent process? And Because even socket = File , the child can access socket from any parent.
Why Process per user Model?
Cleaning up every process is catestrpoic
pgBouncer plays a role for the proxy
Fault Tolerance: if one connection / backend process leads to a fatal panic it would not affect other connection/query execution?
How the Databases Joins work? Nested Loops, Sort-Merge Joins, Hashtable join.
Postgres Query Parsing:
Postgres passes the query to understand what needs to be done.
parser stage: grammer is defined in the file scan.l and gram.y
Lexer(.l file)→ Tokenizers,Recoginzes identifiers and output “tokens“
parser → grammer rules with actions
when every query pass this lexer will identifiers the takens and pass into the parser
What gets executed is C code ,So Scan.l and gram.y are transformed to c
Scan.l ————> scan.c (flex [ unix tools]).
gram.y —————> gram.c (bison [ unix tools]).