How to Use Bloom Indexes in PostgreSQL

This 丸趣 TV article explains how to use Bloom indexes in PostgreSQL by comparing one against a multi-column B-tree index on the same table.

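Introduction
The Bloom index is based on the Bloom filter. A Bloom filter uses very little space to decide quickly whether a value belongs to a set. Its drawback is false positives: when the filter says a value may be present, the actual data must be rechecked to confirm it. A Bloom filter never produces false negatives, though: if it reports that a value is absent, the value is definitely absent.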
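To make the "false positives but no false negatives" property concrete, here is a minimal Python sketch of a generic Bloom filter. It is not the PostgreSQL extension's code: the bit-array size `m_bits`, the number of hash positions `k`, and the use of salted SHA-256 to derive positions are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array and k hash positions per value."""

    def __init__(self, m_bits, k):
        self.m = m_bits
        self.k = k
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, value):
        # Derive k positions by salting a single hash function with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present, recheck".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


bf = BloomFilter(m_bits=1024, k=4)
for v in range(0, 200, 2):
    bf.add(v)

print(all(bf.might_contain(v) for v in range(0, 200, 2)))  # True: no false negatives
print(sum(bf.might_contain(v) for v in range(1, 200, 2)))   # odd values were never added; any hits are false positives
```

Every value that was added is always reported as possibly present. Values that were never added usually return False but occasionally return True, and those are exactly the false positives that a Recheck has to filter out.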

Structure
The index is organized as shown in the figure below:

The first page is the metapage. After it, every table row has one index entry consisting of a bit array (the signature) and the row's TID.
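The Python sketch below models that layout under stated assumptions. Each index entry is a `(signature, tid)` pair, each indexed column sets `nbits` bits in a `length`-bit signature, and a scan compares a query signature against every entry before rechecking the candidates against the heap. The helper names (`column_bits`, `row_signature`, `bloom_scan`) and the hashing scheme are hypothetical; the extension uses its own hash and bit-selection code.

```python
import hashlib
import random


def column_bits(col_no, value, length, nbits):
    """Bit mask contributed by one column value (positions seeded by a hash)."""
    digest = hashlib.sha256(f"{col_no}:{value}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    mask = 0
    for _ in range(nbits):
        mask |= 1 << rng.randrange(length)
    return mask


def row_signature(row, columns, length, nbits):
    """OR together the bits of every indexed column of one row."""
    sig = 0
    for col_no, name in enumerate(columns):
        sig |= column_bits(col_no, row[name], length, nbits)
    return sig


def bloom_scan(index, heap, conds, columns, length, nbits):
    """Return (matching TIDs, count of false positives removed by recheck)."""
    query_sig = 0
    for name, value in conds.items():
        query_sig |= column_bits(columns.index(name), value, length, nbits)
    # Index pass: read every (signature, tid) entry and keep those whose
    # signature contains all bits of the query signature.
    candidates = [tid for sig, tid in index if (sig & query_sig) == query_sig]
    # Recheck: compare the real column values stored in the heap.
    matches = [tid for tid in candidates
               if all(heap[tid][name] == value for name, value in conds.items())]
    return matches, len(candidates) - len(matches)


# Hypothetical demo: 5,000 rows, 3 indexed columns, 64-bit signatures, 4 bits per column.
columns = ["id", "dept", "zipcode"]
heap = [{"id": i, "dept": i % 7, "zipcode": 10000 + i % 50} for i in range(5000)]
index = [(row_signature(row, columns, 64, 4), tid) for tid, row in enumerate(heap)]
matches, removed = bloom_scan(index, heap, {"dept": 3, "zipcode": 10010}, columns, 64, 4)
print(len(matches), removed)
```

In this model the index pass reads every entry regardless of which columns the query uses; what the signatures save is heap access for rows they rule out. That matches the shape of the Bloom plans in the example: a Bitmap Index Scan followed by a Recheck on the heap.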

Example
Create the table and insert data

testdb=# drop table if exists t_bloom;
DROP TABLE
testdb=# CREATE TABLE t_bloom (id int, dept int, id2 int, id3 int, id4 int, id5 int,id6 int,id7 int,details text, zipcode int);
CREATE TABLE
testdb=# 
testdb=# INSERT INTO t_bloom 
testdb-# SELECT (random() * 1000000)::int, (random() * 1000000)::int,
testdb-# (random() * 1000000)::int,(random() * 1000000)::int,(random() * 1000000)::int,(random() * 1000000)::int, 
testdb-# (random() * 1000000)::int,(random() * 1000000)::int,md5(g::text), floor(random()* (20000-9999 + 1) + 9999) 
testdb-# from generate_series(1,16*1024*1024) g;
INSERT 0 16777216
testdb=# 
testdb=# analyze t_bloom;
ANALYZE
testdb=# 
testdb=# select pg_size_pretty(pg_table_size('t_bloom'));
 pg_size_pretty 
----------------
 1619 MB
(1 row)

Create a B-tree index

testdb=# 
testdb=# create index idx_t_bloom_btree on t_bloom using btree(id,dept,id2,id3,id4,id5,id6,id7,zipcode);
CREATE INDEX
testdb=# \di+ idx_t_bloom_btree
 List of relations
 Schema | Name | Type | Owner | Table | Size | Description 
--------+-------------------+-------+-------+---------+--------+-------------
 public | idx_t_bloom_btree | index | pg12 | t_bloom | 940 MB | 
(1 row)

Run the queries

testdb=# EXPLAIN ANALYZE select * from t_bloom where id4 = 305294 and zipcode = 13266;
 QUERY PLAN 
---------------------------------------------------------------------------------------------------------
 Index Scan using idx_t_bloom_btree on t_bloom (cost=0.56..648832.73 rows=1 width=69) (actual time=2648.215..2648.215 rows=0 loops=1)
 Index Cond: ((id4 = 305294) AND (zipcode = 13266))
 Planning Time: 3.244 ms
 Execution Time: 2659.804 ms
(4 rows)
testdb=# EXPLAIN ANALYZE select * from t_bloom where id5 = 241326 and id6 = 354198;
 QUERY PLAN 
---------------------------------------------------------------------------------------------------------
 Index Scan using idx_t_bloom_btree on t_bloom (cost=0.56..648832.73 rows=1 width=69) (actual time=2365.533..2365.533 rows=0 loops=1)
 Index Cond: ((id5 = 241326) AND (id6 = 354198))
 Planning Time: 1.918 ms
 Execution Time: 2365.629 ms
(4 rows)
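Both plans use idx_t_bloom_btree, but neither Index Cond constrains `id`, the index's leading column. A toy Python model of a composite index as a sorted list of tuples shows why that matters: a condition on the leading column narrows the search with a binary search, while a condition only on a later column forces a pass over every entry. This is a simplification, not PostgreSQL's B-tree implementation; the data and function names are hypothetical.

```python
import bisect
import random

rng = random.Random(1)
# Hypothetical table of 10,000 rows with two columns (id, id4).
rows = [(rng.randrange(1000), rng.randrange(1000)) for _ in range(10_000)]
# Composite "index" on (id, id4): entries sorted by id first, then id4.
index = sorted((id_, id4, tid) for tid, (id_, id4) in enumerate(rows))


def lookup_leading(id_value):
    """Condition on the leading column: binary search finds one contiguous range."""
    lo = bisect.bisect_left(index, (id_value,))
    hi = bisect.bisect_left(index, (id_value + 1,))
    return sorted(tid for _, _, tid in index[lo:hi]), hi - lo  # (tids, entries in range)


def lookup_non_leading(id4_value):
    """Condition only on id4: matches are scattered, so every entry is read."""
    return sorted(tid for _, id4, tid in index if id4 == id4_value), len(index)


tids_a, read_a = lookup_leading(42)
tids_b, read_b = lookup_non_leading(42)
print(read_a, read_b)
```

Here `lookup_non_leading` always reads all 10,000 entries, which mirrors a B-tree scan that has to walk the whole 940 MB index when the query has no condition on the leading column.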

Create a Bloom index

testdb=# create extension bloom;
CREATE EXTENSION
testdb=# CREATE INDEX idx_t_bloom_bloom ON t_bloom USING bloom(id, dept, id2, id3, id4, id5, id6, id7, zipcode) 
testdb-# WITH (length=64, col1=4, col2=4, col3=4, col4=4, col5=4, col6=4, col7=4, col8=4, col9=4);
CREATE INDEX
testdb=# \di+ idx_t_bloom_bloom
 List of relations
 Schema | Name | Type | Owner | Table | Size | Description 
--------+-------------------+-------+-------+---------+--------+-------------
 public | idx_t_bloom_bloom | index | pg12 | t_bloom | 225 MB | 
(1 row)
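A back-of-the-envelope estimate suggests where the 225 MB comes from. It assumes each Bloom index entry holds a 6-byte heap TID plus a `length`-bit signature, and it ignores page headers and any padding; those storage details are assumptions here, not taken from the article.

```python
ROWS = 16 * 1024 * 1024   # rows inserted by generate_series above
TID_BYTES = 6             # assumed size of a heap tuple pointer (block + offset)
LENGTH_BITS = 64          # WITH (length=64)
COLS, BITS_PER_COL = 9, 4 # nine indexed columns, col1..col9 = 4

entry_bytes = TID_BYTES + LENGTH_BITS // 8
estimate_mib = ROWS * entry_bytes / (1024 * 1024)
max_bits_set_per_row = COLS * BITS_PER_COL
print(entry_bytes, estimate_mib)  # 14 224.0
print(max_bits_set_per_row)       # 36
```

Roughly 224 MiB before page overhead is close to the 225 MB that \di+ reports. Unlike the B-tree, whose entries carry all nine key values, the signature size is fixed by `length` no matter how many columns are indexed; the `colN` settings only control how many of those bits each column sets.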

Run the queries

testdb=# EXPLAIN ANALYZE select * from t_bloom where id4 = 305294 and zipcode = 13266;
 QUERY PLAN 
-------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t_bloom (cost=283084.16..283088.18 rows=1 width=69) (actual time=998.727..998.727 rows=0 loops=1)
 Recheck Cond: ((id4 = 305294) AND (zipcode = 13266))
 Rows Removed by Index Recheck: 12597
 Heap Blocks: exact=12235
 ->  Bitmap Index Scan on idx_t_bloom_bloom (cost=0.00..283084.16 rows=1 width=0) (actual time=234.893..234.893 rows=12597 loops=1)
 Index Cond: ((id4 = 305294) AND (zipcode = 13266))
 Planning Time: 31.482 ms
 Execution Time: 998.975 ms
(8 rows)
testdb=# EXPLAIN ANALYZE select * from t_bloom where id5 = 241326 and id6 = 354198;
 QUERY PLAN 
-------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t_bloom (cost=283084.16..283088.18 rows=1 width=69) (actual time=1019.621..1019.621 rows=0 loops=1)
 Recheck Cond: ((id5 = 241326) AND (id6 = 354198))
 Rows Removed by Index Recheck: 13033
 Heap Blocks: exact=12633
 ->  Bitmap Index Scan on idx_t_bloom_bloom (cost=0.00..283084.16 rows=1 width=0) (actual time=204.873..204.873 rows=13033 loops=1)
 Index Cond: ((id5 = 241326) AND (id6 = 354198))
 Planning Time: 0.441 ms
 Execution Time: 1019.811 ms
(8 rows)
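The "Rows Removed by Index Recheck" lines are the Bloom false positives: about 12,600 and 13,000 rows passed the signature test but failed the real comparison. A rough Python estimate using the textbook Bloom-filter approximation, with length=64, 4 bits per column, 9 indexed columns and a 2-column query, gives the expected order of magnitude. It assumes independent, uniformly distributed bit positions, which the extension only approximates.

```python
ROWS = 16 * 1024 * 1024
LENGTH = 64        # signature bits, WITH (length=64)
BITS_PER_COL = 4   # col1..col9 = 4
INDEXED_COLS = 9
QUERY_COLS = 2     # e.g. id4 and zipcode

# Probability that a given signature bit is set in an unrelated row.
bits_set_per_row = INDEXED_COLS * BITS_PER_COL
p_bit = 1 - (1 - 1 / LENGTH) ** bits_set_per_row
# A non-matching row passes the index only if every query bit is set in its signature.
p_false_positive = p_bit ** (QUERY_COLS * BITS_PER_COL)
expected_rechecked = ROWS * p_false_positive
print(round(p_false_positive, 5), round(expected_rechecked))
```

This yields a false-positive probability of about 0.0012, or roughly 20,000 of the 16.7 million rows, the same order of magnitude as the observed counts. Raising `length` lowers that rate at the cost of a larger index.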

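The results show that when a query filters on an arbitrary combination of columns that does not include the B-tree's leading column (id in this example), the Bloom index outperforms the B-tree index: about 1.0 s versus 2.4 to 2.7 s for the two queries here, with an index roughly a quarter of the size (225 MB versus 940 MB).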
