{"id":3548,"date":"2023-09-04T19:31:20","date_gmt":"2023-09-04T18:31:20","guid":{"rendered":"https:\/\/kudzia.eu\/b\/?p=3548"},"modified":"2023-09-04T19:31:20","modified_gmt":"2023-09-04T18:31:20","slug":"bug-in-postgresqls-gin-index","status":"publish","type":"post","link":"https:\/\/kudzia.eu\/b\/2023\/09\/bug-in-postgresqls-gin-index\/","title":{"rendered":"bug in postgresql&#8217;s GIN index"},"content":{"rendered":"\n<p>it&#8217;s a story without happy end. i&#8217;m absolutely sure there&#8217;s an index-corruption bug in PostgreSQL 13; since i have not seen any mention of it being fixed &#8211; it&#8217;s likely in the following releases as well. i&#8217;ve <a href=\"https:\/\/www.postgresql.org\/message-id\/flat\/CAJYBUS9AaEgFTXnZnAWir993-MOVLnXefQmyLd9erXewZhXiHA%40mail.gmail.com#2feb57d1f21a201e0b391bfd40b0f9fb\">tried to get attention to it<\/a> on the pgsql-bugs mailing list but i failed. likely because i&#8217;ve never found a way to reliably reproduce it. it happens on our production system, after few days or weeks full of intensive inserts and updates.<\/p>\n\n\n\n<p>we have a table with structure as below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE public.entity (\r\n    entity_id bigint NOT NULL,\r\n    attribute_value_ids integer&#91;] NOT NULL,\r\n    attribute_name_ids integer&#91;] NOT NULL,\r\n&#91;..]\r\n);\r\n\r\nALTER TABLE ONLY public.entity ADD CONSTRAINT entity_pkey PRIMARY KEY\r\n(entity_id);\r\nCREATE INDEX entity_attribute_name_ids_gin ON public.entity USING gin\r\n(attribute_name_ids public.gin__int_ops);\r\nCREATE INDEX entity_attribute_value_ids_gin ON public.entity USING gin\r\n(attribute_value_ids public.gin__int_ops);\r\n\r<\/code><\/pre>\n\n\n\n<p>such a table holds ~50M rows; we make few million of UPDATEs per day, all of them change values of columns covered with GIN indices. <\/p>\n\n\n\n<p>after a while query that uses GIN index starts returning rows that don&#8217;t meet the criteria:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT entity_id FROM entity WHERE ( attribute_name_ids &amp;&amp; '{4980}' )\r\n<\/code><\/pre>\n\n\n\n<p>would returns rows which DO NOT have 4980 in the attribute_name_ids array.<\/p>\n\n\n\n<p>more embarrassingly &#8211; query as below, which should never return any row, returns some:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT entity_id FROM entity WHERE ( attribute_name_ids &amp;&amp; '{4980}' )\r\nAND NOT ( (attribute_name_ids||0) &amp;&amp; '{4980}') ;\n<\/code><\/pre>\n\n\n\n<p>how do we live with this madness? every few hours we run below and pretend that problem isn&#8217;t that big. <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>REINDEX INDEX CONCURRENTLY entity_attribute_value_ids_gin<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p>within few months we&#8217;ll upgrade our servers to PostgreSQL 15, turn off the reindexing cron job and test heavily to see if we can spot similar behavior.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>it&#8217;s a story without happy end. i&#8217;m absolutely sure there&#8217;s an index-corruption bug in PostgreSQL 13; since i have not seen any mention of it being fixed &#8211; it&#8217;s likely in the following releases as well. i&#8217;ve tried to get attention to it on the pgsql-bugs mailing list but i failed. likely because i&#8217;ve never [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[121,51],"tags":[],"class_list":["post-3548","post","type-post","status-publish","format-standard","hentry","category-postgresql","category-unimportant"],"_links":{"self":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3548","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/comments?post=3548"}],"version-history":[{"count":1,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3548\/revisions"}],"predecessor-version":[{"id":3549,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3548\/revisions\/3549"}],"wp:attachment":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/media?parent=3548"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/categories?post=3548"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/tags?post=3548"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}