CrashLoopBackOff

Dev, Ops, etc.

The Postgresql logo

Turning rows into nested objects

Beating object-relational impedance mismatch with Postgres

The data representation when querying a relational database often doesn’t match the representation we use inside our programming language of choice.

This challenge is called Object-relational impedance mismatch, and is apparent when we want to translate denormalised data in rows to nested objects and arrays, especially when joining across tables in a one-to-many or many-to-many scenario.

Fortunately, Postgres provides LATERAL subqueries coupled with native JSON processing to make this easy!

Don’t blindly trust the recommendations of a stranger on the internet!

If you are taking this approach - especially in your application’s hot path - then I’d suggest that you benchmark it against the alternatives (ie. returning joined rows and processing in the app).

I personally rely on this method when doing any investigation, especially when coupled with jq.

JOINing in

Given the following DDL the creates 3 tables with a ‘many-to-many’ relationship:

create table posts (
id uuid not null,
name text not null,
content text not null,
primary key(id)
);
create table tags (
id uuid not null,
name text not null,
primary key(id)
);
create table post_tags (
post_id uuid not null,
tag_id uuid not null,
primary key(post_id, tag_id),
foreign key(post_id) references posts(id),
foreign key(tag_id) references tags(id)
);

Which we seed this with some data:

insert into posts (id, name, content)
values
('bc963376-b5c4-4cf4-9705-999abd515c96', 'Chasing the Gopher', '...'),
('fd154bb5-d112-4a54-845e-bdc57d72ab08', 'Feeling Rusty', '...'),
('ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0', 'Taming the Python', '...')
;
insert into tags (id, name)
values
('a64b2f87-504d-4f3e-b989-916ffc0a507b', 'statically-typed'),
('31fad421-385a-484c-ad78-72dc4b9afaba', 'dynamically-typed'),
('dde3159b-f7bf-46e0-b096-a9ec123bec90', 'garbage-collected'),
('46b649c6-7e5e-4327-8a2a-610c86800f03', 'manual-memory-management')
;
insert into post_tags (post_id, tag_id)
values
(
'bc963376-b5c4-4cf4-9705-999abd515c96', -- Chasing the Gopher
'a64b2f87-504d-4f3e-b989-916ffc0a507b' -- statically-typed
),
(
'fd154bb5-d112-4a54-845e-bdc57d72ab08', -- Feeling Rusty
'a64b2f87-504d-4f3e-b989-916ffc0a507b' -- statically-typed
),
(
'ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0', -- Taming the Python
'31fad421-385a-484c-ad78-72dc4b9afaba' -- dynamically-typed
),
(
'bc963376-b5c4-4cf4-9705-999abd515c96', -- Chasing the Gopher
'dde3159b-f7bf-46e0-b096-a9ec123bec90' -- statically-typed
),
(
'ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0', -- Taming the Python
'dde3159b-f7bf-46e0-b096-a9ec123bec90' -- garbage-collected
),
(
'fd154bb5-d112-4a54-845e-bdc57d72ab08', -- Feeling Rusty
'46b649c6-7e5e-4327-8a2a-610c86800f03' -- manual-memory-management
)
;

Now we can query these tables using regular joins to attach tags to posts:

select * from posts
left join post_tags on posts.id = post_tags.post_id
left join tags on tags.id = post_tags.tag_id

Which results in the following…

idnamecontentpost_idtag_ididname
bc963376-b5c4-4cf4-9705-999abd515c96Chasing the Gopherbc963376-b5c4-4cf4-9705-999abd515c96a64b2f87-504d-4f3e-b989-916ffc0a507ba64b2f87-504d-4f3e-b989-916ffc0a507bstatically-typed
fd154bb5-d112-4a54-845e-bdc57d72ab08Feeling Rustyfd154bb5-d112-4a54-845e-bdc57d72ab08a64b2f87-504d-4f3e-b989-916ffc0a507ba64b2f87-504d-4f3e-b989-916ffc0a507bstatically-typed
ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0Taming the Pythonceb9082f-eb92-4cc6-b0f5-3f358ff71bd031fad421-385a-484c-ad78-72dc4b9afaba31fad421-385a-484c-ad78-72dc4b9afabadynamically-typed
bc963376-b5c4-4cf4-9705-999abd515c96Chasing the Gopherbc963376-b5c4-4cf4-9705-999abd515c96dde3159b-f7bf-46e0-b096-a9ec123bec90dde3159b-f7bf-46e0-b096-a9ec123bec90garbage-collected
ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0Taming the Pythonceb9082f-eb92-4cc6-b0f5-3f358ff71bd0dde3159b-f7bf-46e0-b096-a9ec123bec90dde3159b-f7bf-46e0-b096-a9ec123bec90garbage-collected
fd154bb5-d112-4a54-845e-bdc57d72ab08Feeling Rustyfd154bb5-d112-4a54-845e-bdc57d72ab0846b649c6-7e5e-4327-8a2a-610c86800f0346b649c6-7e5e-4327-8a2a-610c86800f03manual-memory-management

As you can see, each post is duplicated for every associate tag.

Moving LATERALy

Leveraging Postgres’ LATERAL subqueries, and json aggeregation functions such as json_agg, we can vastly simplify the resulting data:

create view posts_with_tags as
select
posts.*,
tags.tags
from posts
left join lateral (
select
coalesce(json_agg(tags.*), '[]'::json) as tags
from post_tags
inner join tags
on tags.id = post_tags.tag_id
where post_tags.post_id = posts.id
) tags on true;

And now, when we query this view:

select * from posts_with_tags;
idnamecontenttags
bc963376-b5c4-4cf4-9705-999abd515c96Chasing the Gopher[{"id":"a64b2f87-504d-4f3e-b989-916ffc0a507b","name":"statically-typed"},{"id":"dde3159b-f7bf-46e0-b096-a9ec123bec90","name":"garbage-collected"}]
fd154bb5-d112-4a54-845e-bdc57d72ab08Feeling Rusty[{"id":"a64b2f87-504d-4f3e-b989-916ffc0a507b","name":"statically-typed"},{"id":"46b649c6-7e5e-4327-8a2a-610c86800f03","name":"manual-memory-management"}]
ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0Taming the Python[{"id":"31fad421-385a-484c-ad78-72dc4b9afaba","name":"dynamically-typed"},{"id":"dde3159b-f7bf-46e0-b096-a9ec123bec90","name":"garbage-collected"}]

And we can even spit the whole thing out as JSON:

with data as (
select * from posts_with_tags
)
select json_agg(data) from data
[
{
"id": "bc963376-b5c4-4cf4-9705-999abd515c96",
"name": "Chasing the Gopher",
"content": "...",
"tags": [
{
"id": "a64b2f87-504d-4f3e-b989-916ffc0a507b",
"name": "statically-typed"
},
{
"id": "dde3159b-f7bf-46e0-b096-a9ec123bec90",
"name": "garbage-collected"
}
]
},
{
"id": "fd154bb5-d112-4a54-845e-bdc57d72ab08",
"name": "Feeling Rusty",
"content": "...",
"tags": [
{
"id": "a64b2f87-504d-4f3e-b989-916ffc0a507b",
"name": "statically-typed"
},
{
"id": "46b649c6-7e5e-4327-8a2a-610c86800f03",
"name": "manual-memory-management"
}
]
},
{
"id": "ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0",
"name": "Taming the Python",
"content": "...",
"tags": [
{
"id": "31fad421-385a-484c-ad78-72dc4b9afaba",
"name": "dynamically-typed"
},
{
"id": "dde3159b-f7bf-46e0-b096-a9ec123bec90",
"name": "garbage-collected"
}
]
}
]

A note on denormalisation

Another way of tackling this problem is through data denormalisation.

This approach involves storing related data together so that reads are simpler and faster due to lack of joins. For example, this might mean storing tags and comments as part of a post.

In Postgres you could achieve this with an array or JSON column, and heavily denormalised systems generally leverage dedicated document DBs (ie. MongoDB), completely foregoing relational DBs.

While this optimises for reads, it then becomes more costly and difficult to update, as you now have to update any related resources in every place it is stored. For example, if a tag was renamed from k8s to kubernetes, every post must then be updated individually to reflect this change.

One high profile instance of this was Instagram - source.