Introduction
This post highlights some powerful features of PostgreSQL, based on our team’s experience at Medici where we implemented architectures using both DynamoDB and PostgreSQL.
Features
JSON Support
One of the standout features of PostgreSQL is its support for JSON data through the json and jsonb types. This feature allows the flexibility of storing structured data without forcing a rigid schema. However, it’s still generally a good idea to prefer a normalized relational design for core data.
Consider this example:
create table people (
id serial primary key,
properties jsonb
);
You can insert data in JSON format like so:
insert into people (properties) values ('{"firstName":"Alan", "lastName": "Turing", "alive": false}');
insert into people (properties) values ('{"firstName":"Barbara", "lastName": "Liskov", "alive": true}');
insert into people (properties) values ('{"firstName":"Grace", "lastName": "Hopper", "alive": false}');
Note that jsonb will not preserve the original order of fields:
select * from people;
The result would be:
id | properties
----+---------------------------------------------------------------
1 | {"alive": false, "lastName": "Turing", "firstName": "Alan"}
2 | {"alive": true, "lastName": "Liskov", "firstName": "Barbara"}
3 | {"alive": false, "lastName": "Hopper", "firstName": "Grace"}
(3 rows)
You can also query JSON fields directly:
select id, properties->>'firstName' as first_name from people where properties->>'firstName'='Alan';
This kind of flexibility is powerful but can be computationally expensive if you’re relying on these queries for large data sets.
To improve performance, you can create GIN (Generalized Inverted Index) indexes on jsonb columns:
create index people_properties_idx on people using gin(properties);
With the GIN index in place, the queries involving JSON fields will become more efficient.
UUID
PostgreSQL has native UUID support. Since PostgreSQL 13, you can generate UUIDs without any extension:
create table people (
id uuid default gen_random_uuid() primary key,
properties jsonb
);
For older versions, use the uuid-ossp extension:
create extension if not exists "uuid-ossp";
create table people (
id uuid default uuid_generate_v4() primary key,
properties jsonb
);
UUIDs are useful for distributed systems since they can be generated without coordination. The tradeoff is storage size: 16 bytes versus 4 bytes for integer or 8 bytes for bigint.
Partial Indexes
Indexes in PostgreSQL consume disk space and memory. Partial indexes allow you to index only the portion of your data that you query frequently, reducing overhead.
If we extend our people table to include a deleted_at timestamp to indicate soft deletion:
create table people (
id serial primary key,
properties jsonb,
deleted_at timestamptz
);
We can create an index that excludes deleted records:
create index active_people_idx on people (id) where deleted_at is null;
This reduces index size and improves query performance for active records.
PostGIS
PostGIS is the most widely used geographic extension for PostgreSQL. It provides spatial data types and functions for geographic queries.
Performance depends on how you use spatial operations. For example, ORDER BY ST_Distance(a, b) is slower than using the <-> operator, which can leverage spatial indexes.
Materialized Views
Materialized views let you store the result of complex, costly queries as a table. They can be refreshed concurrently, meaning you can keep data relatively up-to-date without locking the view during the refresh. However, there’s no built-in way to automatically refresh these views when the underlying tables change, so you’ll need to implement a mechanism for that.
GIN Indexes for Arrays
PostgreSQL supports GIN indexes for efficiently querying arrays. Imagine you have a chat application and store participants as an array:
create table chats (
id serial primary key,
participants text[]
);
With a GIN index on the participants column, you can efficiently query chats involving certain users.
Foreign Keys and MVCC
Foreign keys are useful for ensuring data integrity, particularly in cases involving financial data. However, they can be a performance bottleneck, especially for high write loads.
PostgreSQL uses Multi-Version Concurrency Control (MVCC), which allows readers and writers to operate concurrently without blocking each other. However, long-running transactions can cause table bloat and lock contention.
Miscellaneous Features
- UPSERT (
INSERT INTO ... ON CONFLICT DO ...) allows for conflict resolution during inserts, though it only works for one constraint at a time. - Queue Management: PostgreSQL supports queues through
NOTIFYandLISTEN, or using extensions likepgq. While not the best choice for queues compared to specialized systems like RabbitMQ, it can be handy if you want to avoid adding another service.
Libraries for PostgreSQL
- Python: psycopg - the standard PostgreSQL adapter (psycopg 3 is the current version, psycopg2 is still widely used)
- Java: Standard JDBC driver, or JDBI for a lighter abstraction
- Golang: pgx - the recommended PostgreSQL driver for Go
Conclusion
PostgreSQL offers a rich feature set including JSON support, partial indexes, spatial extensions, and strong transactional guarantees. These capabilities make it suitable for a wide range of applications, from simple CRUD systems to complex analytical workloads.