BRIN Index

select version(); version ----------------------------------------------------------------------------------------------------- PostgreSQL 10beta1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18) 6.3.0 20170516, 64-bit (1 ligne) create unlogged table ventes(id integer generated by default as identity, dtv timestamp, mtv integer); CREATE TABLE insert into ventes(dtv,mtv) select generate_series(date_trunc('second', current_timestamp-interval '1 year'), date_trunc('second', current_timestamp), '1 second'), trunc(random() * 100000 + 1); INSERT 0 31536001 Time: 111241,341 ms (01:51,241) analyze ventes; ANALYZE Temps : 950,687 ms \d+ Liste des relations Schéma | Nom | Type | Propriétaire | Taille | Description --------+---------------+----------+--------------+------------+------------- public | ventes | table | postgres | 1570 MB | public | ventes_id_seq | séquence | postgres | 8192 bytes | (2 lignes) create index ventes_br1 on ventes using brin(dtv); CREATE INDEX Time: 14416,084 ms (00:14,416) explain select avg(mtv) from ventes where dtv >= to_timestamp('13/03/2017','DD/MM/YYYY') and dtv < to_timestamp('27/03/2017','DD/MM/YYYY'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Finalize Aggregate (cost=214048.09..214048.10 rows=1 width=32) -> Gather (cost=214047.87..214048.08 rows=2 width=32) Workers Planned: 2 -> Partial Aggregate (cost=213047.87..213047.88 rows=1 width=32) -> Parallel Bitmap Heap Scan on ventes (cost=325.42..211737.89 rows=523992 width=4) Recheck Cond: ((dtv >= to_timestamp('13/03/2017'::text, 'DD/MM/YYYY'::text)) AND (dtv < to_timestamp('27/03/2017'::text, 'DD/MM/YYYY'::text))) -> Bitmap Index Scan on ventes_br1 (cost=0.00..11.02 rows=1265457 width=0) Index Cond: ((dtv >= to_timestamp('13/03/2017'::text, 'DD/MM/YYYY'::text)) AND (dtv < to_timestamp('27/03/2017'::text, 'DD/MM/YYYY'::text))) (8 lignes) select avg(mtv) from ventes where dtv >= to_timestamp('13/03/2017','DD/MM/YYYY') and dtv < to_timestamp('27/03/2017','DD/MM/YYYY'); avg -------------------- 50005.625440298507 (1 ligne) Temps : 939,459 ms ... Temps : 893,945 ms ... Temps : 874,450 ms update ventes set dtv = dtv - (2*id || ' seconds')::interval; UPDATE 31536001 Time: 168223,692 ms (02:48,224) analyze ventes; ANALYZE Time: 1015,110 ms (00:01,015) explain select avg(mtv) from ventes where dtv >= to_timestamp('13/03/2016','DD/MM/YYYY') and dtv < to_timestamp('27/03/2016','DD/MM/YYYY'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Finalize Aggregate (cost=425256.12..425256.13 rows=1 width=32) -> Gather (cost=425255.90..425256.11 rows=2 width=32) Workers Planned: 2 -> Partial Aggregate (cost=424255.90..424255.91 rows=1 width=32) -> Parallel Bitmap Heap Scan on ventes (cost=594.99..421830.41 rows=970196 width=4) Recheck Cond: ((dtv >= to_timestamp('13/03/2016'::text, 'DD/MM/YYYY'::text)) AND (dtv < to_timestamp('27/03/2016'::text, 'DD/MM/YYYY'::text))) -> Bitmap Index Scan on ventes_br1 (cost=0.00..12.88 rows=2340290 width=0) Index Cond: ((dtv >= to_timestamp('13/03/2016'::text, 'DD/MM/YYYY'::text)) AND (dtv < to_timestamp('27/03/2016'::text, 'DD/MM/YYYY'::text))) (8 lignes) select avg(mtv) from ventes where dtv >= to_timestamp('13/03/2016','DD/MM/YYYY') and dtv < to_timestamp('27/03/2016','DD/MM/YYYY'); avg -------------------- 49998.774835482804 (1 ligne) Time: 15135,023 ms (00:15,135) ... Time: 14742,244 ms (00:14,742) ... Time: 13735,830 ms (00:13,736) reindex index ventes_br1; REINDEX Time: 20509,522 ms (00:20,510) select avg(mtv) from ventes where dtv >= to_timestamp('13/03/2016','DD/MM/YYYY') and dtv < to_timestamp('27/03/2016','DD/MM/YYYY'); avg -------------------- 49998.774835482804 (1 ligne) Temps : 927,762 ms ... Temps : 893,742 ms ... Temps : 893,876 ms drop table ventes; DROP TABLE Time: 1679,116 ms (00:01,679) create unlogged table ventes(id integer generated by default as identity, dtv timestamp, mtv integer); CREATE TABLE insert into ventes(dtv,mtv) select generate_series(date_trunc('second', current_timestamp-interval '1 year'), date_trunc('second', current_timestamp), '1 second'), trunc(random() * 100000 + 1); INSERT 0 31536001 Time: 111241,341 ms (01:51,241) analyze ventes; ANALYZE Temps : 665,498 ms create index ventes_i1 on ventes(dtv); CREATE INDEX Time: 42692,040 ms (00:42,692) explain select avg(mtv) from ventes where dtv >= to_timestamp('13/03/2017','DD/MM/YYYY') and dtv < to_timestamp('27/03/2017','DD/MM/YYYY'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Finalize Aggregate (cost=30209.76..30209.77 rows=1 width=32) -> Gather (cost=30209.55..30209.76 rows=2 width=32) Workers Planned: 2 -> Partial Aggregate (cost=29209.55..29209.56 rows=1 width=32) -> Parallel Index Scan using ventes_i1 on ventes (cost=0.57..27958.52 rows=500410 width=4) Index Cond: ((dtv >= to_timestamp('13/03/2017'::text, 'DD/MM/YYYY'::text)) AND (dtv < to_timestamp('27/03/2017'::text, 'DD/MM/YYYY'::text))) (6 lignes) select avg(mtv) from ventes where dtv >= to_timestamp('13/03/2017','DD/MM/YYYY') and dtv < to_timestamp('27/03/2017','DD/MM/YYYY'); avg -------------------- 50069.208669983416 (1 ligne) Temps : 474,721 ms ... Temps : 401,297 ms ... Temps : 391,558 ms update ventes set dtv = dtv - (2*id || ' seconds')::interval; UPDATE 31536001 Time: 299068,647 ms (04:59,069) analyze ventes; ANALYZE Time: 1055,081 ms (00:01,055) explain select avg(mtv) from ventes where dtv >= to_timestamp('13/03/2016','DD/MM/YYYY') and dtv < to_timestamp('27/03/2016','DD/MM/YYYY'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Finalize Aggregate (cost=63390.33..63390.34 rows=1 width=32) -> Gather (cost=63390.11..63390.32 rows=2 width=32) Workers Planned: 2 -> Partial Aggregate (cost=62390.11..62390.12 rows=1 width=32) -> Parallel Index Scan using ventes_i1 on ventes (cost=0.57..59857.68 rows=1012974 width=4) Index Cond: ((dtv >= to_timestamp('13/03/2016'::text, 'DD/MM/YYYY'::text)) AND (dtv < to_timestamp('27/03/2016'::text, 'DD/MM/YYYY'::text))) (6 lignes) select avg(mtv) from ventes where dtv >= to_timestamp('13/03/2016','DD/MM/YYYY') and dtv < to_timestamp('27/03/2016','DD/MM/YYYY'); avg -------------------- 49957.756269014550 (1 ligne) Temps : 477,896 ms ... Temps : 419,119 ms ... Temps : 419,120 ms

      Que pouvons-nous constater ?
      Tout d'abord, la présence de l’index BRIN a nettement moins ralenti la mise à jour que la présence de l’index B-Tree : 2min48s contre 4min59s.
      Les index B-Tree nécessitent rarement d’être reconstruits, bien qu’il existe des cas très précis où une réindexation est bénéfique. Après mise à jour, lors d’une recherche portant toujours sur une plage de 14 jours sur 365, l’index B-Tree conserve ici toute son efficacité avec des temps d'exécution restant sous la demi-seconde.
      En revanche l’index BRIN perd complètement son efficacité après l'update. Les temps d’exécution passent de moins de 1 seconde à 14 secondes. Après réindexation, l’index BRIN retrouve cependant son efficacité initiale avec des temps d'exécution repassant sous la seconde. A noter qu’un VACUUM de la table, manuel ou réalisé automatiquement par AUTOVACUUM, recalcule également les min/max évoqués précédemment et a donc le même effet positif.
      La vitesse de mise à jour a donc été privilégiée par rapport à la pertinence de l’index BRIN. Il convient donc, après une mise à jour importante affectant une colonne indexée par un Block Range Index, de considérer une réindexation voire une suppression de l’index. En effet, dans l’exemple fourni, la colonne DTV est toujours candidate a une indexation BRIN mais une mise à jour pourrait très bien entraîner le cas présenté dans la deuxième partie. Le DBA peut donner l’alerte mais il est déjà trop tard si un probléme d’indexation entraînant une perte de performance est découvert en production. L’indexation nécessite une connaissance fonctionnelle et doit avoir été réfléchie au niveau du développement.

Indexation : un BRIN de folie - partie 3

(sujet préalablement traité avec la version 9.5)

Mise à jour : 21/05/2017