it’s a story without happy end. i’m absolutely sure there’s an index-corruption bug in PostgreSQL 13; since i have not seen any mention of it being fixed – it’s likely in the following releases as well. i’ve tried to get attention to it on the pgsql-bugs mailing list but i failed. likely because i’ve never found a way to reliably reproduce it. it happens on our production system, after few days or weeks full of intensive inserts and updates.
we have a table with structure as below:
CREATE TABLE public.entity (
entity_id bigint NOT NULL,
attribute_value_ids integer[] NOT NULL,
attribute_name_ids integer[] NOT NULL,
[..]
);
ALTER TABLE ONLY public.entity ADD CONSTRAINT entity_pkey PRIMARY KEY
(entity_id);
CREATE INDEX entity_attribute_name_ids_gin ON public.entity USING gin
(attribute_name_ids public.gin__int_ops);
CREATE INDEX entity_attribute_value_ids_gin ON public.entity USING gin
(attribute_value_ids public.gin__int_ops);
such a table holds ~50M rows; we make few million of UPDATEs per day, all of them change values of columns covered with GIN indices.
after a while query that uses GIN index starts returning rows that don’t meet the criteria:
SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{4980}' )
would returns rows which DO NOT have 4980 in the attribute_name_ids array.
more embarrassingly – query as below, which should never return any row, returns some:
SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{4980}' )
AND NOT ( (attribute_name_ids||0) && '{4980}') ;
how do we live with this madness? every few hours we run below and pretend that problem isn’t that big.
REINDEX INDEX CONCURRENTLY entity_attribute_value_ids_gin
within few months we’ll upgrade our servers to PostgreSQL 15, turn off the reindexing cron job and test heavily to see if we can spot similar behavior.