PostgreSQL interesting facts

Introduction

This post highlights some powerful features of PostgreSQL, based on our team’s experience at Medici where we implemented architectures using both DynamoDB and PostgreSQL.

Features

JSON Support

One of the standout features of PostgreSQL is its support for JSON data through the json and jsonb types. This feature allows the flexibility of storing structured data without forcing a rigid schema. However, it’s still generally a good idea to prefer a normalized relational design for core data.

Consider this example:

create table people (
  id serial primary key,
  properties jsonb
);

You can insert data in JSON format like so:

insert into people (properties) values ('{"firstName":"Alan", "lastName": "Turing", "alive": false}');
insert into people (properties) values ('{"firstName":"Barbara", "lastName": "Liskov", "alive": true}');
insert into people (properties) values ('{"firstName":"Grace", "lastName": "Hopper", "alive": false}');

Note that jsonb will not preserve the original order of fields:

select * from people;

The result would be:

 id |                          properties
----+---------------------------------------------------------------
  1 | {"alive": false, "lastName": "Turing", "firstName": "Alan"}
  2 | {"alive": true, "lastName": "Liskov", "firstName": "Barbara"}
  3 | {"alive": false, "lastName": "Hopper", "firstName": "Grace"}
(3 rows)

You can also query JSON fields directly:

select id, properties->>'firstName' as first_name from people where properties->>'firstName'='Alan';

This kind of flexibility is powerful but can be computationally expensive if you’re relying on these queries for large data sets.

To improve performance, you can create GIN (Generalized Inverted Index) indexes on jsonb columns:

create index people_properties_idx on people using gin(properties);

With the GIN index in place, the queries involving JSON fields will become more efficient.

UUID

PostgreSQL has native UUID support. Since PostgreSQL 13, you can generate UUIDs without any extension:

create table people (
  id uuid default gen_random_uuid() primary key,
  properties jsonb
);

For older versions, use the uuid-ossp extension:

create extension if not exists "uuid-ossp";

create table people (
  id uuid default uuid_generate_v4() primary key,
  properties jsonb
);

UUIDs are useful for distributed systems since they can be generated without coordination. The tradeoff is storage size: 16 bytes versus 4 bytes for integer or 8 bytes for bigint.

Partial Indexes

Indexes in PostgreSQL consume disk space and memory. Partial indexes allow you to index only the portion of your data that you query frequently, reducing overhead.

If we extend our people table to include a deleted_at timestamp to indicate soft deletion:

create table people (
  id serial primary key,
  properties jsonb,
  deleted_at timestamptz
);

We can create an index that excludes deleted records:

create index active_people_idx on people (id) where deleted_at is null;

This reduces index size and improves query performance for active records.

PostGIS

PostGIS is the most widely used geographic extension for PostgreSQL. It provides spatial data types and functions for geographic queries.

Performance depends on how you use spatial operations. For example, ORDER BY ST_Distance(a, b) is slower than using the <-> operator, which can leverage spatial indexes.

Materialized Views

Materialized views let you store the result of complex, costly queries as a table. They can be refreshed concurrently, meaning you can keep data relatively up-to-date without locking the view during the refresh. However, there’s no built-in way to automatically refresh these views when the underlying tables change, so you’ll need to implement a mechanism for that.

GIN Indexes for Arrays

PostgreSQL supports GIN indexes for efficiently querying arrays. Imagine you have a chat application and store participants as an array:

create table chats (
  id serial primary key,
  participants text[]
);

With a GIN index on the participants column, you can efficiently query chats involving certain users.

Foreign Keys and MVCC

Foreign keys are useful for ensuring data integrity, particularly in cases involving financial data. However, they can be a performance bottleneck, especially for high write loads.

PostgreSQL uses Multi-Version Concurrency Control (MVCC), which allows readers and writers to operate concurrently without blocking each other. However, long-running transactions can cause table bloat and lock contention.

Miscellaneous Features

UPSERT (INSERT INTO ... ON CONFLICT DO ...) allows for conflict resolution during inserts, though it only works for one constraint at a time.
Queue Management: PostgreSQL supports queues through NOTIFY and LISTEN, or using extensions like pgq. While not the best choice for queues compared to specialized systems like RabbitMQ, it can be handy if you want to avoid adding another service.

Libraries for PostgreSQL

Python: psycopg - the standard PostgreSQL adapter (psycopg 3 is the current version, psycopg2 is still widely used)
Java: Standard JDBC driver, or JDBI for a lighter abstraction
Golang: pgx - the recommended PostgreSQL driver for Go

Conclusion

PostgreSQL offers a rich feature set including JSON support, partial indexes, spatial extensions, and strong transactional guarantees. These capabilities make it suitable for a wide range of applications, from simple CRUD systems to complex analytical workloads.

Introduction#

Features#

JSON Support#

UUID#

Partial Indexes#

PostGIS#

Materialized Views#

GIN Indexes for Arrays#

Foreign Keys and MVCC#

Miscellaneous Features#

Libraries for PostgreSQL#

Conclusion#