bug in postgresql’s GIN index

it’s a story without happy end. i’m absolutely sure there’s an index-corruption bug in PostgreSQL 13; since i have not seen any mention of it being fixed – it’s likely in the following releases as well. i’ve tried to get attention to it on the pgsql-bugs mailing list but i failed. likely because i’ve never found a way to reliably reproduce it. it happens on our production system, after few days or weeks full of intensive inserts and updates.

we have a table with structure as below:

CREATE TABLE public.entity (
    entity_id bigint NOT NULL,
    attribute_value_ids integer[] NOT NULL,
    attribute_name_ids integer[] NOT NULL,
[..]
);

ALTER TABLE ONLY public.entity ADD CONSTRAINT entity_pkey PRIMARY KEY
(entity_id);
CREATE INDEX entity_attribute_name_ids_gin ON public.entity USING gin
(attribute_name_ids public.gin__int_ops);
CREATE INDEX entity_attribute_value_ids_gin ON public.entity USING gin
(attribute_value_ids public.gin__int_ops);

such a table holds ~50M rows; we make few million of UPDATEs per day, all of them change values of columns covered with GIN indices.

after a while query that uses GIN index starts returning rows that don’t meet the criteria:

SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{4980}' )

would returns rows which DO NOT have 4980 in the attribute_name_ids array.

more embarrassingly – query as below, which should never return any row, returns some:

SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{4980}' )
AND NOT ( (attribute_name_ids||0) && '{4980}') ;

how do we live with this madness? every few hours we run below and pretend that problem isn’t that big.

REINDEX INDEX CONCURRENTLY entity_attribute_value_ids_gin

within few months we’ll upgrade our servers to PostgreSQL 15, turn off the reindexing cron job and test heavily to see if we can spot similar behavior.

Leave a Reply Cancel reply