PostgreSQL索引(2) – 接口

2020-10-01 孟庆钟 Comments 0 Comment

本文翻译自Egor Rogov的英文博客，且已征得Egor Rogov的同意。译者做了少量修改。

审校：纪昀红，中国人民大学信息学院在读博士生。

第一篇文章中提到，一个AM必须能够提供关于它自己的信息。下面介绍AM接口的结构。

1. 属性

AM的所有属性都存储在pg_am(“am”是access method的简称)系统表中。我们可以从这张系统表中查出所有可用的AM列表：

select amname from pg_am;

 amname
--------
 btree
 hash
 gist
 gin
 spgist
 brin
(6 rows)

虽然顺序扫描也是一种AM，但由于历史原因，它没有被列入其中。

在PG 9.5及之前，每个属性用pg_am表中的一个字段表示。从PG 9.6开始，需要使用特殊的函数查询这些属性，并且这些属性被分为三个层次。

AM层（使用这个AM的所有索引）的属性：pg_indexam_has_property
某个具体索引的属性：pg_index_has_property
某个具体索引中单个列的属性：pg_index_column_has_property

前两个层次主要为未来考虑，目前同一种AM的所有索引都具有相同的属性。

下面4个属性是AM层的属性（以btree为例）：

select a.amname, p.name, pg_indexam_has_property(a.oid,p.name)
from pg_am a,
     unnest(array[
         'can_order','can_unique','can_multi_col','can_exclude'
     ]) p(name)
where a.amname = 'btree'
order by a.amname;

 amname |     name      | pg_indexam_has_property
--------+---------------+-------------------------
 btree  | can_order     | t
 btree  | can_unique    | t
 btree  | can_multi_col | t
 btree  | can_exclude   | t
(4 rows)

can_order：AM是否可以按某种顺序返回结果（比如，从小到大，仅仅Btree支持）
can_unique：是否支持唯一性和主键（仅Btree支持）
can_multi_col：是否支持多列索引
can_exclude：是否支持排它约束

下面的属性是某个具体索引的属性：

select p.name, pg_index_has_property('t_a_idx'::regclass,p.name)
from unnest(array[
       'clusterable','index_scan','bitmap_scan','backward_scan'
     ]) p(name);

     name      | pg_index_has_property
---------------+-----------------------
 clusterable   | t
 index_scan    | t
 bitmap_scan   | t
 backward_scan | t
(4 rows)

clusterable：是否可以使用这个索引进行cluster（将表中的数据按照此索引的顺序存放）
index_scan：是否支持Index Scan，这个属性看起来有些奇怪，其实，并不是所有的索引都可以依次返回每个TID —— 有的索引只能一次性返回所有结果，即仅支持位图扫描
bitmap_scan：是否支持位图扫描
backward_scan：是否支持反向扫描，即按照与建索引时相反的顺序返回结果

下面是具体索引中某个列的属性：

select p.name,
     pg_index_column_has_property('t_a_idx'::regclass,1,p.name)
from unnest(array[
       'asc','desc','nulls_first','nulls_last','orderable',
       'distance_orderable','returnable','search_array',
       'search_nulls'
     ]) p(name);

        name        | pg_index_column_has_property
--------------------+------------------------------
 asc                | t
 desc               | f
 nulls_first        | f
 nulls_last         | t
 orderable          | t
 distance_orderable | f
 returnable         | t
 search_array       | t
 search_nulls       | t
(9 rows)

asc, desc, nulls_first, nulls_last, orderable：这些属性与对列值的排序相关（将在介绍Btree索引时展开）
distance_orderable：是否可以按操作符所决定的顺序返回结果（仅GiST和RUM索引支持）
returnable：是否支持Index Only Scan
search_array：是否可以直接将数组作为查询条件，即使用indexed-field IN (list_of_constants)和indexed-field = ANY(array_of_constants)中的数组，例如a in (1,2,3,4,5)中的(1,2,3,4,5)，仅Btree索引支持
search_nulls：是否支持使用IS NULL或IS NOT NULL作为条件查询

我们已经详细地讨论了一些属性。有一些属性与具体的AM相关，我们将在讨论各种AM时继续介绍。

操作符类和操作符族

除了上文介绍的AM的接口之后，还需要知道一个AM可以支持哪些数据类型和操作符。为了达到这个目的，PG引入了操作符类（operator class）和操作符族（operator family）的概念。

一个操作符类包含一个最小的操作符（可能还包含一些辅助函数）的集合，使得索引可以操作某个特定的数据类型。

一个操作符类被某些操作符族包含。更确切的说，一个操作符族可以包含多个操作符类，如果它们具有相同的语义。例如，integer_ops族包含int8_ops、int4_ops、int2_ops三个类。这三个类分别对应bigint、integer、smallint三个数据类型, 它们的精度不同，但是语义相同：

select opfname, opcname, opcintype::regtype
from pg_opclass opc, pg_opfamily opf
where opf.opfname = 'integer_ops'
and opc.opcfamily = opf.oid
and opf.opfmethod = (
      select oid from pg_am where amname = 'btree'
    );

   opfname   | opcname  | opcintype
-------------+----------+-----------
 integer_ops | int2_ops | smallint
 integer_ops | int4_ops | integer
 integer_ops | int8_ops | bigint
(3 rows)

再举一例：datetime_ops族包含操作日期的操作符类

select opfname, opcname, opcintype::regtype
from pg_opclass opc, pg_opfamily opf
where opf.opfname = 'datetime_ops'
and opc.opcfamily = opf.oid
and opf.opfmethod = (
      select oid from pg_am where amname = 'btree'
    );

   opfname    |     opcname     |          opcintype          
--------------+-----------------+-----------------------------
 datetime_ops | date_ops        | date
 datetime_ops | timestamptz_ops | timestamp with time zone
 datetime_ops | timestamp_ops   | timestamp without time zone
(3 rows)

一个操作符族可以包含多个操作符类用来比较不同类型的值。将操作符类按操作符族分组，可以使得优化器在谓词条件中的数据类型与索引列的类型不匹配时照样可以使用索引。操作符族还可以包含其它辅助函数。

大多数情况下，我们无需了解操作符类、操作符族。通常情况下，我们只需使用默认的操作符创建索引即可。

然而，有时需要显式指定具体的操作符类。下面给出一个需要显式指定的例子：在collation不是C的数据库中，常规的索引不支持LIKE操作：

show lc_collate;

 lc_collate 
-------------
 en_US.UTF-8
(1 row)

explain (costs off) select * from t where b like 'A%';

         QUERY PLAN          
-----------------------------
 Seq Scan on t
   Filter: (b ~~ 'A%'::text)
(2 rows)

我们可以使用text_pattern_ops操作符类创建索引，从而消除这个限制。

create index on t(b text_pattern_ops);
explain (costs off) select * from t where b like 'A%';

                           QUERY PLAN                          
----------------------------------------------------------------
 Bitmap Heap Scan on t
   Filter: (b ~~ 'A%'::text)
   ->  Bitmap Index Scan on t_b_idx1
         Index Cond: ((b ~>=~ 'A'::text) AND (b ~<~ 'B'::text))
(4 rows)

系统表

作为本文的总结，下面列出一个简化的系统表的关系图，它展示了与操作符类、操作符族直接相关的一些系统表之间的关系。

PG官方对这些系统表做了详细的介绍。

使用这些系统表，我们不用查询文档，就可以找到很多问题的答案。例如，Btree索引支持什么数据类型？

select opcname, opcintype::regtype
from pg_opclass
where opcmethod = (select oid from pg_am where amname = 'btree')
order by opcintype::regtype::text;

       opcname       |          opcintype          
---------------------+-----------------------------
 abstime_ops         | abstime
 array_ops           | anyarray
 enum_ops            | anyenum
 range_ops           | anyrange
 int8_ops            | bigint
 bit_ops             | bit
 varbit_ops          | bit varying
 bool_ops            | boolean
 bytea_ops           | bytea
 char_ops            | "char"
 bpchar_pattern_ops  | character
 bpchar_ops          | character
 date_ops            | date
 float8_ops          | double precision
 inet_ops            | inet
 cidr_ops            | inet
 int4_ops            | integer
 interval_ops        | interval
 jsonb_ops           | jsonb
 macaddr_ops         | macaddr
 macaddr8_ops        | macaddr8
 money_ops           | money
 name_ops            | name
 numeric_ops         | numeric
 oid_ops             | oid
 oidvector_ops       | oidvector
 pg_lsn_ops          | pg_lsn
 float4_ops          | real
 record_image_ops    | record
 record_ops          | record
 reltime_ops         | reltime
 int2_ops            | smallint
 varchar_pattern_ops | text
 text_pattern_ops    | text
 text_ops            | text
 varchar_ops         | text
 tid_ops             | tid
 timestamp_ops       | timestamp without time zone
 timestamptz_ops     | timestamp with time zone
 time_ops            | time without time zone
 timetz_ops          | time with time zone
 tinterval_ops       | tinterval
 tsquery_ops         | tsquery
 tsvector_ops        | tsvector
 uuid_ops            | uuid
(45 rows)

一个操作符类包含哪些操作符（使得条件中包含这个操作符时可以使用这个索引）？

select amop.amopopr::regoperator
from pg_opclass opc, pg_opfamily opf, pg_am am, pg_amop amop
where opc.opcname = 'int4_ops'
and opf.oid = opc.opcfamily
and am.oid = opf.opfmethod
and amop.amopfamily = opc.opcfamily
and am.amname = 'btree'
and amop.amoplefttype = opc.opcintype;

       amopopr        
----------------------
 <(integer,bigint)
 <=(integer,bigint)
 =(integer,bigint)
 >=(integer,bigint)
 >(integer,bigint)
 <(integer,smallint)
 <=(integer,smallint)
 =(integer,smallint)
 >=(integer,smallint)
 >(integer,smallint)
 <(integer,integer)
 <=(integer,integer)
 =(integer,integer)
 >=(integer,integer)
 >(integer,integer)
(15 rows)

数据库内核研究

MengQingzhong.com

PostgreSQL索引(2) – 接口

2020-10-01 孟庆钟 Comments 0 Comment

1. 属性

操作符类和操作符族

系统表

发表回复取消回复

1. 属性

操作符类和操作符族

系统表

发表回复 取消回复

发表回复取消回复