当前位置:Gxlcms > 数据库问题 > 转载:postgresql分区与优化

转载:postgresql分区与优化

时间:2021-07-01 10:21:17 帮助过:38人阅读


--对于分区表constraint_exclusion 这个参数需要配置为partition或on postgres=# show constraint_exclusion ; constraint_exclusion ---------------------- partition --创建父子表, 用于存储分区数据 create table t(id int primary key); create table t1(like t including all) inherits(t); create table t2(like t including all) inherits(t); create table t3(like t including all) inherits(t); create table t4(like t including all) inherits(t); --PostgreSQL的子表和子表之间的约束是没有任何关系的, 所以也可以有重叠, 即非全局约束. alter table t1 add constraint ck_t1_1 check(id<0); alter table t2 add constraint ck_t2_1 check(id>=0 and id<100); alter table t3 add constraint ck_t3_1 check(id>=100 and id<200); alter table t4 add constraint ck_t4_1 check(id>=200); --分区字段传入常量, 执行时扫描的是父表和约束对应的子表 : postgres=# explain select * from t where id=10; QUERY PLAN ----------------------------------------------------------------------------- Append (cost=0.00..8.17 rows=2 width=4) -> Seq Scan on t (cost=0.00..0.00 rows=1 width=4) Filter: (id = 10) -> Index Only Scan using t2_pkey on t2 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 10) (5 rows) --分区字段传入常量, 执行时扫描的是父表和约束对应的子表; postgres=# prepare p_test as select * from t where id=$1; PREPARE postgres=# explain execute p_test(1); QUERY PLAN ----------------------------------------------------------------------------- Append (cost=0.00..8.17 rows=2 width=4) -> Seq Scan on t (cost=0.00..0.00 rows=1 width=4) Filter: (id = 1) -> Index Only Scan using t2_pkey on t2 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 1) (5 rows) --子句查询, 执行时扫描的是父表和所有子表, 注意这里使用的子查询是子表的查询, 理论上应该是扫描父表和该子表 postgres=# explain select * from t where id=(select id from t1 limit 1); QUERY PLAN ----------------------------------------------------------------------------- Append (cost=0.01..32.70 rows=5 width=4) InitPlan 1 (returns $0) -> Limit (cost=0.00..0.01 rows=1 width=4) -> Seq Scan on t1 t1_1 (cost=0.00..34.00 rows=2400 width=4) -> Seq Scan on t (cost=0.00..0.00 rows=1 width=4) Filter: (id = $0) -> Index Only Scan using t1_pkey on t1 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = $0) -> Index Only Scan using t2_pkey on t2 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = $0) -> Index Only Scan using t3_pkey on t3 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = $0) -> Index Only Scan using t4_pkey on t4 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = $0) (14 rows) --综上可知在对分区表进行查询时最好使用字面常量,而不要使用子查询之类复杂的sql --如果子表上约束删除,则pg不得不把删除约束的子表也加入到查询中(即使子表可以忽略) alter table t4 drop constraint ck_t4_1; postgres=# explain select * from t where id=10; QUERY PLAN ----------------------------------------------------------------------------- Append (cost=0.00..16.34 rows=3 width=4) -> Seq Scan on t (cost=0.00..0.00 rows=1 width=4) Filter: (id = 10) -> Index Only Scan using t2_pkey on t2 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 10) -> Index Only Scan using t4_pkey on t4 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 10) (7 rows) --如果constraint_exclusion设置为off,pg不得不进行全表扫描 postgres=# set constraint_exclusion=off; SET postgres=# explain select * from t where id=10; QUERY PLAN ----------------------------------------------------------------------------- Append (cost=0.00..32.69 rows=5 width=4) -> Seq Scan on t (cost=0.00..0.00 rows=1 width=4) Filter: (id = 10) -> Index Only Scan using t1_pkey on t1 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 10) -> Index Only Scan using t2_pkey on t2 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 10) -> Index Only Scan using t3_pkey on t3 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 10) -> Index Only Scan using t4_pkey on t4 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 10) (11 rows) --分区表上一般针对分区建立相对应的分区索引 --建在父表的索引为全局索引,但如果你表没有数据要查询子表时,则分区表要进行全表扫描 --父表建立的全局索引 postgres=# \d+ p Table "public.p" Column | Type | Modifiers | Storage | Stats target | Description -----------+--------------------------------+-----------+---------+--------------+------------- city_id | integer | not null | plain | | logtime | timestamp(0) without time zone | not null | plain | | peaktemp | integer | | plain | | unitsales | integer | | plain | | Indexes: "idx_city_id" btree (city_id) "idx_p_logtime" btree (logtime) Triggers: delete_p_trigger BEFORE DELETE ON p FOR EACH ROW EXECUTE PROCEDURE p_delete_trigger() insert_p_trigger BEFORE INSERT ON p FOR EACH ROW EXECUTE PROCEDURE p_insert_trigger() Child tables: p_201201, p_201202, p_201203, p_201204, p_201205, p_201206, p_201207, p_201208, p_201209, p_201210, p_201211, p_201212, p_default Has OIDs: no --分区没有索引,不能使用父表索引 postgres=# explain select * from p_201202 where city_id=2 and logtime=timestamp ‘2012-02-02 12:59:59‘; QUERY PLAN ---------------------------------------------------------------------------------------------- Seq Scan on p_201202 (cost=0.00..214.01 rows=2 width=20) Filter: ((city_id = 2) AND (logtime = ‘2012-02-02 12:59:59‘::timestamp without time zone)) (2 rows) --建立分区索引,可以使用分区索引 postgres=# CREATE INDEX idx_p_201202_city_id ON p_201202 (city_id); CREATE INDEX postgres=# explain select * from p_201202 where city_id=2 and logtime=timestamp ‘2012-02-02 12:59:59‘; QUERY PLAN -------------------------------------------------------------------------------------- Index Scan using idx_p_201202_city_id on p_201202 (cost=0.29..8.33 rows=2 width=20) Index Cond: (city_id = 2) Filter: (logtime = ‘2012-02-02 12:59:59‘::timestamp without time zone) --也可以指定只查询父表的数据 postgres=# select * from only p; city_id | logtime | peaktemp | unitsales ---------+---------+----------+----------- (0 rows) --如果一个分区表,父子表之间不再有继承关系,则查询父表时不再过滤到子表 postgres=# alter table t3 no inherit t; ALTER TABLE postgres=# explain select count(*) from t; QUERY PLAN ------------------------------------------------------------------ Aggregate (cost=73.50..73.51 rows=1 width=0) -> Append (cost=0.00..62.80 rows=4281 width=0) -> Seq Scan on t (cost=0.00..0.00 rows=1 width=0) -> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=0) -> Seq Scan on t2 (cost=0.00..31.40 rows=2140 width=0) (5 rows) --再次添加继承,查询父表可以过滤到子表 postgres=# alter table t3 inherit t; ALTER TABLE postgres=# explain select count(*) from t; QUERY PLAN ------------------------------------------------------------------ Aggregate (cost=110.25..110.26 rows=1 width=0) -> Append (cost=0.00..94.20 rows=6421 width=0) -> Seq Scan on t (cost=0.00..0.00 rows=1 width=0) -> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=0) -> Seq Scan on t2 (cost=0.00..31.40 rows=2140 width=0) -> Seq Scan on t3 (cost=0.00..31.40 rows=2140 width=0) (6 rows) --以下为p表测试数据代码 CREATE TABLE p ( city_id int not null, logtime timestamp(0) not null, peaktemp int, unitsales int ); CREATE INDEX idx_p_logtime ON p (logtime); CREATE TABLE p_201201 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_201202 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_201203 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_201204 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_201205 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_201206 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_201207 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_201208 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_201209 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_201210 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_201211 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_201212 (LIKE p INCLUDING all) INHERITS (p); CREATE TABLE p_default (LIKE p INCLUDING all) INHERITS (p); CREATE OR REPLACE FUNCTION p_insert_trigger() RETURNS TRIGGER AS $$ BEGIN IF ( NEW.logtime >= DATE ‘2012-01-01‘ AND NEW.logtime < DATE ‘2012-02-01‘ ) THEN INSERT INTO p_201201 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2012-02-01‘ AND NEW.logtime < DATE ‘2012-03-01‘ ) THEN INSERT INTO p_201202 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2012-03-01‘ AND NEW.logtime < DATE ‘2012-04-01‘ ) THEN INSERT INTO p_201203 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2012-04-01‘ AND NEW.logtime < DATE ‘2012-05-01‘ ) THEN INSERT INTO p_201204 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2012-05-01‘ AND NEW.logtime < DATE ‘2012-06-01‘ ) THEN INSERT INTO p_201205 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2012-06-01‘ AND NEW.logtime < DATE ‘2012-07-01‘ ) THEN INSERT INTO p_201206 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2012-07-01‘ AND NEW.logtime < DATE ‘2012-08-01‘ ) THEN INSERT INTO p_201207 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2012-08-01‘ AND NEW.logtime < DATE ‘2012-09-01‘ ) THEN INSERT INTO p_201208 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2012-09-01‘ AND NEW.logtime < DATE ‘2012-10-01‘ ) THEN INSERT INTO p_201209 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2012-10-01‘ AND NEW.logtime < DATE ‘2012-11-01‘ ) THEN INSERT INTO p_201210 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2012-11-01‘ AND NEW.logtime < DATE ‘2012-12-01‘ ) THEN INSERT INTO p_201211 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2012-12-01‘ AND NEW.logtime < DATE ‘2013-01-01‘ ) THEN INSERT INTO p_201212 VALUES (NEW.*); ELSIF ( NEW.logtime >= DATE ‘2013-01-01‘ OR NEW.logtime < DATE ‘2012-01-01‘ ) THEN INSERT INTO p_default VALUES (NEW.*); ELSE RAISE EXCEPTION ‘Date out of range. Fix the p_insert_trigger() function!‘; END IF; RETURN NULL; END; $$ LANGUAGE plpgsql; CREATE OR REPLACE FUNCTION p_delete_trigger() RETURNS TRIGGER AS $$ BEGIN IF ( OLD.logtime >= DATE ‘2012-01-01‘ AND OLD.logtime < DATE ‘2012-02-01‘ ) THEN DELETE FROM p_201201 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2012-02-01‘ AND OLD.logtime < DATE ‘2012-03-01‘ ) THEN DELETE FROM p_201202 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2012-03-01‘ AND OLD.logtime < DATE ‘2012-04-01‘ ) THEN DELETE FROM p_201203 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2012-04-01‘ AND OLD.logtime < DATE ‘2012-05-01‘ ) THEN DELETE FROM p_201204 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2012-05-01‘ AND OLD.logtime < DATE ‘2012-06-01‘ ) THEN DELETE FROM p_201205 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2012-06-01‘ AND OLD.logtime < DATE ‘2012-07-01‘ ) THEN DELETE FROM p_201206 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2012-07-01‘ AND OLD.logtime < DATE ‘2012-08-01‘ ) THEN DELETE FROM p_201207 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2012-08-01‘ AND OLD.logtime < DATE ‘2012-09-01‘ ) THEN DELETE FROM p_201208 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2012-09-01‘ AND OLD.logtime < DATE ‘2012-10-01‘ ) THEN DELETE FROM p_201209 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2012-10-01‘ AND OLD.logtime < DATE ‘2012-11-01‘ ) THEN DELETE FROM p_201210 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2012-11-01‘ AND OLD.logtime < DATE ‘2012-12-01‘ ) THEN DELETE FROM p_201211 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2012-12-01‘ AND OLD.logtime < DATE ‘2013-01-01‘ ) THEN DELETE FROM p_201212 WHERE logtime=OLD.logtime; ELSIF ( OLD.logtime >= DATE ‘2013-01-01‘ OR OLD.logtime < DATE ‘2012-01-01‘ ) THEN DELETE FROM p_default WHERE logtime=OLD.logtime; ELSE RAISE EXCEPTION ‘Date out of range. Fix the p_insert_trigger() function!‘; END IF; RETURN NULL; END; $$ LANGUAGE plpgsql; CREATE TRIGGER insert_p_trigger BEFORE INSERT ON p FOR EACH ROW EXECUTE PROCEDURE p_insert_trigger(); CREATE TRIGGER delete_p_trigger BEFORE DELETE ON p FOR EACH ROW EXECUTE PROCEDURE p_delete_trigger(); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (1, timestamp ‘2012-01-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (2, timestamp ‘2012-02-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (3, timestamp ‘2012-03-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (4, timestamp ‘2012-04-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (5, timestamp ‘2012-05-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (6, timestamp ‘2012-06-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (7, timestamp ‘2012-07-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (8, timestamp ‘2012-08-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (9, timestamp ‘2012-09-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (10, timestamp ‘2012-10-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (11, timestamp ‘2012-11-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (12, timestamp ‘2012-12-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (13, timestamp ‘2013-01-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (14, timestamp ‘2011-12-02 12:59:59‘, 20, 10); INSERT INTO p (city_id, logtime, peaktemp, unitsales) select m, timestamp ‘2012-02-02 12:59:59‘, 20, 10 from generate_series(1,10000) m; explain select * from p_201202 where city_id=2 and logtime=timestamp ‘2012-02-02 12:59:59‘;

转载:https://yq.aliyun.com/articles/2637?spm=5176.100240.searchblog.12.59Jibq#

 

转载:postgresql分区与优化

标签:

人气教程排行