Compare commits

5 Commits

Author SHA1 Message Date
Tom Lane
bad0763a4d Fix erroneous -Werror=missing-braces on old GCC.
In the same spirit as 5e0c761d0 and some earlier commits,
suppress a chorus of buildfarm warnings about braces in
these initializers.

Richard Guo

Discussion: https://postgr.es/m/CAMbWs48GzM-Ff7vr=_CeqaXxFBB9UntqtaW1cjU8hOo62AbOOg@mail.gmail.com
2023-12-24 23:36:33 -05:00
Alexander Korotkov
0a93f803f4 Fix a comment for remove_self_joins_recurse()
Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
2023-12-25 01:33:34 +02:00
Alexander Korotkov
b5fb6736ed Don't constrain self-join removal due to PHVs
Self-join removal appears to be safe to apply with placeholder variables
as long as we handle PlaceHolderVar in replace_varno_walker() and replace
relid in phinfo->ph_lateral.

Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
2023-12-25 01:33:26 +02:00
Alexander Korotkov
8a8ed916f7 Handle PlaceHolderVar case in replace_varno_walker
This commit also retires sje_walker.  This increases the generality of replacing
varno in the parse tree and simplifies the code.

Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
2023-12-25 01:33:08 +02:00
Alexander Korotkov
12915a58ee Enhance checkpointer restartpoint statistics
This commit introduces enhancements to the pg_stat_checkpointer view by adding
three new columns: restartpoints_timed, restartpoints_req, and
restartpoints_done. These additions aim to improve the visibility and
monitoring of restartpoint processes on replicas.

Previously, it was challenging to differentiate between successful and failed
restartpoint requests. This limitation arises because restartpoints on replicas
are dependent on checkpoint records from the primary, and cannot occur more
frequently than these checkpoints.

The new columns allow for clear distinction and tracking of restartpoint
requests, their triggers, and successful completions.  This enhancement aids
database administrators and developers in better understanding and diagnosing
issues related to restartpoint behavior, particularly in scenarios where
restartpoint requests may fail.

System catalog is changed.  Catversion is bumped.

Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru
Author: Anton A. Melnikov
Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov
2023-12-25 01:12:36 +02:00
14 changed files with 190 additions and 64 deletions

@ -2982,6 +2982,33 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>restartpoints_timed</structfield> <type>bigint</type>
</para>
<para>
Number of scheduled restartpoints due to timeout or after a failed attempt to perform it
</para></entry>
</row>
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>restartpoints_req</structfield> <type>bigint</type>
</para>
<para>
Number of requested restartpoints
</para></entry>
</row>
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>restartpoints_done</structfield> <type>bigint</type>
</para>
<para>
Number of restartpoints that have been performed
</para></entry>
</row>
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>write_time</structfield> <type>double precision</type>

@ -655,14 +655,41 @@
directory.
Restartpoints can't be performed more frequently than checkpoints on the
primary because restartpoints can only be performed at checkpoint records.
A restartpoint is triggered when a checkpoint record is reached if at
least <varname>checkpoint_timeout</varname> seconds have passed since the last
restartpoint, or if WAL size is about to exceed
<varname>max_wal_size</varname>. However, because of limitations on when a
restartpoint can be performed, <varname>max_wal_size</varname> is often exceeded
during recovery, by up to one checkpoint cycle's worth of WAL.
A restartpoint can be triggered either by schedule or by an external request.
The <structfield>restartpoints_timed</structfield> counter in the
<link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>
view counts the former, while <structfield>restartpoints_req</structfield>
counts the latter.
A restartpoint is triggered by schedule when a checkpoint record is reached
if at least <xref linkend="guc-checkpoint-timeout"/> seconds have passed since
the last performed restartpoint, or if the previous attempt to perform
a restartpoint failed. In the latter case, the next restartpoint
will be scheduled 15 seconds later.
A restartpoint is triggered by request for reasons similar to those for checkpoints,
most commonly when the WAL size is about to exceed <xref linkend="guc-max-wal-size"/>.
However, because of limitations on when a restartpoint can be performed,
<varname>max_wal_size</varname> is often exceeded during recovery,
by up to one checkpoint cycle's worth of WAL.
(<varname>max_wal_size</varname> is never a hard limit anyway, so you should
always leave plenty of headroom to avoid running out of disk space.)
The <structfield>restartpoints_done</structfield> counter in the
<link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>
view counts the restartpoints that have actually been performed.
</para>
<para>
In some cases, when the WAL size on the primary increases quickly,
for instance during a massive INSERT,
the <structfield>restartpoints_req</structfield> counter on the standby
may show a sharp rise.
This happens because requests to create a new restartpoint due to increased
WAL consumption cannot be satisfied until a new safe checkpoint record,
written after the last restartpoint, has been replayed on the standby.
This behavior is normal and does not increase system resource
consumption.
Of the restartpoint-related counters, only <structfield>restartpoints_done</structfield>
indicates that noticeable system
resources have been spent.
</para>
<para>

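The paragraph above implies a simple check. Over an interval, compare how many restartpoints were attempted (timed or requested) with how many were done. The sketch below is minimal and makes some assumptions. `RestartpointReading` and `restartpoints_not_done()` are hypothetical names, and the struct's fields hold two readings of pg_stat_checkpointer. The sketch also assumes no statistics reset happened between the readings. Finally, it assumes each checkpointer cycle is counted as at most one of timed or requested.

```c
#include <assert.h>

/* Hypothetical container for one reading of pg_stat_checkpointer. */
typedef struct RestartpointReading
{
    long restartpoints_timed;
    long restartpoints_req;
    long restartpoints_done;
} RestartpointReading;

/*
 * Restartpoint attempts (timed or requested) between two readings that did
 * not end in a completed restartpoint, e.g. because no new safe checkpoint
 * record had been replayed yet.  Assumes no stats reset in between.
 */
static long
restartpoints_not_done(const RestartpointReading *before,
                       const RestartpointReading *after)
{
    long attempts = (after->restartpoints_timed - before->restartpoints_timed)
                  + (after->restartpoints_req - before->restartpoints_req);
    long done = after->restartpoints_done - before->restartpoints_done;

    return attempts - done;
}
```

Consider a burst of 35 requests during which only 2 restartpoints complete. The result is large, which by the documentation's reasoning is harmless. Resource usage tracks the `restartpoints_done` delta alone.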
@ -1141,6 +1141,9 @@ CREATE VIEW pg_stat_checkpointer AS
SELECT
pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
pg_stat_get_checkpointer_write_time() AS write_time,
pg_stat_get_checkpointer_sync_time() AS sync_time,
pg_stat_get_checkpointer_buffers_written() AS buffers_written,

@ -453,7 +453,7 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
Assert(!bms_is_member(relid, phinfo->ph_lateral));
Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
!bms_is_member(ojrelid, phinfo->ph_eval_at))
@ -472,6 +472,8 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
/* ph_lateral might or might not be empty */
phv->phrels = replace_relid(phv->phrels, relid, subst);
phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
@ -1456,7 +1458,16 @@ replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
}
return false;
}
if (IsA(node, RestrictInfo))
else if (IsA(node, PlaceHolderVar))
{
PlaceHolderVar *phv = (PlaceHolderVar *) node;
phv->phrels = replace_relid(phv->phrels, ctx->from, ctx->to);
phv->phnullingrels = replace_relid(phv->phnullingrels, ctx->from, ctx->to);
/* fall through to recurse into the placeholder's expression */
}
else if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
int relid = -1;
@ -1641,26 +1652,6 @@ update_eclasses(EquivalenceClass *ec, int from, int to)
ec->ec_relids = replace_relid(ec->ec_relids, from, to);
}
static bool
sje_walker(Node *node, ReplaceVarnoContext *ctx)
{
if (node == NULL)
return false;
if (IsA(node, Var))
{
Var *var = (Var *) node;
if (var->varno == ctx->from)
{
var->varno = ctx->to;
var->varnosyn = ctx->to;
}
return false;
}
return expression_tree_walker(node, sje_walker, (void *) ctx);
}
/*
* Remove a relation after we have proven that it participates only in an
* unneeded unique self join.
@ -1868,7 +1859,8 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
}
/* Replace varno in all the query structures */
query_tree_walker(root->parse, sje_walker, &ctx, QTW_EXAMINE_SORTGROUP);
query_tree_walker(root->parse, replace_varno_walker, &ctx,
QTW_EXAMINE_SORTGROUP);
/* Replace links in the planner info */
remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
@ -2125,20 +2117,8 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
joinrelids = bms_add_member(joinrelids, k);
/*
* Be safe to do not remove tables participated in complicated PH
* PHVs should not impose any constraints on removing self joins.
*/
foreach(lc, root->placeholder_list)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
/* there isn't any other place to eval PHV */
if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
bms_is_subset(phinfo->ph_needed, joinrelids) ||
bms_is_member(r, phinfo->ph_lateral))
break;
}
if (lc)
continue;
/*
* At this stage, joininfo lists of inner and outer can contain
@ -2206,7 +2186,7 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
/*
* Gather indexes of base relations from the joinlist and try to eliminate self
* joins. To avoid complexity, limit the max power of this set by a GUC.
* joins.
*/
static Relids
remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)

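The two PlaceHolderVar commits depend on rewriting relid references in two places. One is Var nodes. The other is placeholder relid sets such as `phrels` and `ph_lateral`.

The sketch below works on a toy tree and makes some assumptions. A 64-bit mask stands in for Bitmapset. `ToyNode`, `toy_replace_relid()` and `toy_replace_varno()` are hypothetical names, not PostgreSQL APIs. `toy_replace_relid()` follows the `replace_relid()` calls in the diff: the old relid is swapped for the new one only when it is present.

Like the PlaceHolderVar branch added to `replace_varno_walker()`, the PHV case updates the placeholder's relids and then descends into its expression.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit stand-in for Relids; relids must be in 0..63. */
typedef uint64_t ToyRelids;

#define TOY_BIT(id) ((ToyRelids) 1 << (id))

/* Swap old_id for new_id if old_id is a member; new_id < 0 only removes. */
static ToyRelids
toy_replace_relid(ToyRelids relids, int old_id, int new_id)
{
    if (old_id < 0 || (relids & TOY_BIT(old_id)) == 0)
        return relids;
    relids &= ~TOY_BIT(old_id);
    if (new_id >= 0)
        relids |= TOY_BIT(new_id);
    return relids;
}

/* Hypothetical toy tree; the real Var and PlaceHolderVar are richer. */
typedef enum { TOY_VAR, TOY_PHV, TOY_OP } ToyTag;

typedef struct ToyNode
{
    ToyTag          tag;
    int             varno;   /* TOY_VAR: range-table index of the rel */
    ToyRelids       phrels;  /* TOY_PHV: rels the placeholder belongs to */
    struct ToyNode *left;    /* TOY_PHV: contained expr; TOY_OP: arg 1 */
    struct ToyNode *right;   /* TOY_OP: arg 2 */
} ToyNode;

/* Rewrite every reference to rel `from` into rel `to`, in place. */
static void
toy_replace_varno(ToyNode *node, int from, int to)
{
    if (node == NULL)
        return;
    switch (node->tag)
    {
        case TOY_VAR:
            if (node->varno == from)
                node->varno = to;
            break;
        case TOY_PHV:
            node->phrels = toy_replace_relid(node->phrels, from, to);
            /* then recurse into the placeholder's expression */
            toy_replace_varno(node->left, from, to);
            break;
        case TOY_OP:
            toy_replace_varno(node->left, from, to);
            toy_replace_varno(node->right, from, to);
            break;
    }
}
```

Handling the placeholder in the same walker as plain Vars is what let the commits drop the separate `sje_walker`. It also let them drop the PHV-based bail-out in `remove_self_joins_one_group()`.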
@ -340,6 +340,8 @@ CheckpointerMain(void)
pg_time_t now;
int elapsed_secs;
int cur_timeout;
bool chkpt_or_rstpt_requested = false;
bool chkpt_or_rstpt_timed = false;
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
@ -358,7 +360,7 @@ CheckpointerMain(void)
if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags)
{
do_checkpoint = true;
PendingCheckpointerStats.num_requested++;
chkpt_or_rstpt_requested = true;
}
/*
@ -372,7 +374,7 @@ CheckpointerMain(void)
if (elapsed_secs >= CheckPointTimeout)
{
if (!do_checkpoint)
PendingCheckpointerStats.num_timed++;
chkpt_or_rstpt_timed = true;
do_checkpoint = true;
flags |= CHECKPOINT_CAUSE_TIME;
}
@ -408,6 +410,24 @@ CheckpointerMain(void)
if (flags & CHECKPOINT_END_OF_RECOVERY)
do_restartpoint = false;
if (chkpt_or_rstpt_timed)
{
chkpt_or_rstpt_timed = false;
if (do_restartpoint)
PendingCheckpointerStats.restartpoints_timed++;
else
PendingCheckpointerStats.num_timed++;
}
if (chkpt_or_rstpt_requested)
{
chkpt_or_rstpt_requested = false;
if (do_restartpoint)
PendingCheckpointerStats.restartpoints_requested++;
else
PendingCheckpointerStats.num_requested++;
}
/*
* We will warn if (a) too soon since last checkpoint (whatever
* caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
@ -471,6 +491,9 @@ CheckpointerMain(void)
* checkpoints happen at a predictable spacing.
*/
last_checkpoint_time = now;
if (do_restartpoint)
PendingCheckpointerStats.restartpoints_performed++;
}
else
{

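The attribution logic added to `CheckpointerMain()` can be condensed into one function. The sketch below is minimal and rests on a few assumptions. `CkptCounters` and `account_cycle()` are hypothetical. In the real code the two flags are local variables and the counters live in `PendingCheckpointerStats`. There, `restartpoints_performed` is only bumped on the path where the restartpoint succeeded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the counters touched in CheckpointerMain(). */
typedef struct CkptCounters
{
    long num_timed;
    long num_requested;
    long restartpoints_timed;
    long restartpoints_requested;
    long restartpoints_performed;
} CkptCounters;

/*
 * Charge one checkpointer cycle.  A pending request takes precedence over
 * the timeout, so a cycle counts as timed or requested, never both.
 * During recovery (do_restartpoint) the restartpoint counters are used;
 * restartpoints_performed grows only if the restartpoint completed.
 */
static void
account_cycle(CkptCounters *c, bool timeout_elapsed, bool requested,
              bool do_restartpoint, bool completed)
{
    bool timed = timeout_elapsed && !requested;

    if (timed)
    {
        if (do_restartpoint)
            c->restartpoints_timed++;
        else
            c->num_timed++;
    }
    if (requested)
    {
        if (do_restartpoint)
            c->restartpoints_requested++;
        else
            c->num_requested++;
    }
    if (do_restartpoint && completed)
        c->restartpoints_performed++;
}
```

Take a standby cycle that fires on the timeout but finds no new checkpoint record to restart from. It bumps `restartpoints_timed` without bumping `restartpoints_performed`.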
@ -49,6 +49,9 @@ pgstat_report_checkpointer(void)
#define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld
CHECKPOINTER_ACC(num_timed);
CHECKPOINTER_ACC(num_requested);
CHECKPOINTER_ACC(restartpoints_timed);
CHECKPOINTER_ACC(restartpoints_requested);
CHECKPOINTER_ACC(restartpoints_performed);
CHECKPOINTER_ACC(write_time);
CHECKPOINTER_ACC(sync_time);
CHECKPOINTER_ACC(buffers_written);
@ -116,6 +119,9 @@ pgstat_checkpointer_snapshot_cb(void)
#define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld;
CHECKPOINTER_COMP(num_timed);
CHECKPOINTER_COMP(num_requested);
CHECKPOINTER_COMP(restartpoints_timed);
CHECKPOINTER_COMP(restartpoints_requested);
CHECKPOINTER_COMP(restartpoints_performed);
CHECKPOINTER_COMP(write_time);
CHECKPOINTER_COMP(sync_time);
CHECKPOINTER_COMP(buffers_written);

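`CHECKPOINTER_ACC` and `CHECKPOINTER_COMP` are the two halves of one pattern. Pending counts are added into cumulative shared totals, and a snapshot subtracts the values saved at the last reset.

The sketch below is minimal and makes several assumptions. As the `reset.fld` term suggests, it assumes a reset stores a copy of the totals rather than zeroing them. `Counters`, `SharedStats` and the function names are hypothetical. The locking the real shared-memory code needs is omitted.

```c
#include <assert.h>

/* Hypothetical subset of the checkpointer counters. */
typedef struct Counters
{
    long restartpoints_timed;
    long restartpoints_requested;
    long restartpoints_performed;
} Counters;

/* Hypothetical shared-memory entry; the real one also needs a lock. */
typedef struct SharedStats
{
    Counters stats;        /* cumulative totals, only ever added to */
    Counters reset_offset; /* copy of stats taken at the last reset */
} SharedStats;

/* Fold pending counts into the shared totals, then clear them. */
static void
flush_pending(SharedStats *shm, Counters *pending)
{
#define ACC(fld) shm->stats.fld += pending->fld
    ACC(restartpoints_timed);
    ACC(restartpoints_requested);
    ACC(restartpoints_performed);
#undef ACC
    *pending = (Counters) {0};
}

/* A reset remembers the current totals instead of zeroing them. */
static void
reset_stats(SharedStats *shm)
{
    shm->reset_offset = shm->stats;
}

/* A snapshot reports what has accumulated since the last reset. */
static Counters
take_snapshot(const SharedStats *shm)
{
    Counters snap = shm->stats;

#define COMP(fld) snap.fld -= shm->reset_offset.fld
    COMP(restartpoints_timed);
    COMP(restartpoints_requested);
    COMP(restartpoints_performed);
#undef COMP
    return snap;
}
```

This is why adding a counter, as the commit does, touches both macro lists. A field missing from the subtraction list would keep reporting values accumulated before the last reset.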
@ -1193,6 +1193,24 @@ pg_stat_get_checkpointer_num_requested(PG_FUNCTION_ARGS)
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->num_requested);
}
Datum
pg_stat_get_checkpointer_restartpoints_timed(PG_FUNCTION_ARGS)
{
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->restartpoints_timed);
}
Datum
pg_stat_get_checkpointer_restartpoints_requested(PG_FUNCTION_ARGS)
{
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->restartpoints_requested);
}
Datum
pg_stat_get_checkpointer_restartpoints_performed(PG_FUNCTION_ARGS)
{
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->restartpoints_performed);
}
Datum
pg_stat_get_checkpointer_buffers_written(PG_FUNCTION_ARGS)
{

@ -265,7 +265,7 @@ BlockRefTableSetLimitBlock(BlockRefTable *brtab,
BlockNumber limit_block)
{
BlockRefTableEntry *brtentry;
BlockRefTableKey key = {0}; /* make sure any padding is zero */
BlockRefTableKey key = {{0}}; /* make sure any padding is zero */
bool found;
memcpy(&key.rlocator, rlocator, sizeof(RelFileLocator));
@ -300,7 +300,7 @@ BlockRefTableMarkBlockModified(BlockRefTable *brtab,
BlockNumber blknum)
{
BlockRefTableEntry *brtentry;
BlockRefTableKey key = {0}; /* make sure any padding is zero */
BlockRefTableKey key = {{0}}; /* make sure any padding is zero */
bool found;
#ifndef FRONTEND
MemoryContext oldcontext = MemoryContextSwitchTo(brtab->mcxt);
@ -340,7 +340,7 @@ BlockRefTableEntry *
BlockRefTableGetEntry(BlockRefTable *brtab, const RelFileLocator *rlocator,
ForkNumber forknum, BlockNumber *limit_block)
{
BlockRefTableKey key = {0}; /* make sure any padding is zero */
BlockRefTableKey key = {{0}}; /* make sure any padding is zero */
BlockRefTableEntry *entry;
Assert(limit_block != NULL);
@ -517,7 +517,7 @@ WriteBlockRefTable(BlockRefTable *brtab,
for (i = 0; i < brtab->hash->members; ++i)
{
BlockRefTableSerializedEntry *sentry = &sdata[i];
BlockRefTableKey key = {0}; /* make sure any padding is zero */
BlockRefTableKey key = {{0}}; /* make sure any padding is zero */
unsigned j;
/* Write the serialized entry itself. */

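The `{0}` to `{{0}}` change in the BlockRefTable hunks is purely syntactic. `BlockRefTableKey` has an `rlocator` member. The diff implies it is the first member and is itself a struct, which is why the inner braces matter.

The sketch below is minimal. Hypothetical `ExampleLocator` and `ExampleKey` types stand in for the real `RelFileLocator` and `BlockRefTableKey`. Both initializer spellings zero every member. `{{0}}` simply makes the nested braces explicit, which keeps older GCC's `-Wmissing-braces` quiet.

```c
#include <assert.h>

/* Hypothetical stand-ins; the real RelFileLocator/BlockRefTableKey differ. */
typedef struct ExampleLocator
{
    unsigned int spcOid;
    unsigned int dbOid;
    unsigned int relNumber;
} ExampleLocator;

typedef struct ExampleKey
{
    ExampleLocator rlocator;    /* first member is itself a struct */
    int            forknum;
} ExampleKey;

static ExampleKey
make_zeroed_key(void)
{
    /*
     * "{0}" would also zero every member, but some older GCC versions warn
     * under -Wmissing-braces because the 0 initializes the nested struct
     * without its own braces.  "{{0}}" makes the nesting explicit.
     */
    ExampleKey key = {{0}};

    return key;
}
```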
@ -57,6 +57,6 @@
*/
/* yyyymmddN */
#define CATALOG_VERSION_NO 202312211
#define CATALOG_VERSION_NO 202312251
#endif

@ -5721,6 +5721,21 @@
proname => 'pg_stat_get_checkpointer_num_requested', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_num_requested' },
{ oid => '8743',
descr => 'statistics: number of timed restartpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_restartpoints_timed', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_timed' },
{ oid => '8744',
descr => 'statistics: number of backend requested restartpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_restartpoints_requested', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_requested' },
{ oid => '8745',
descr => 'statistics: number of backend performed restartpoints',
proname => 'pg_stat_get_checkpointer_restartpoints_performed', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_performed' },
{ oid => '2771',
descr => 'statistics: number of buffers written by the checkpointer',
proname => 'pg_stat_get_checkpointer_buffers_written', provolatile => 's',

@ -262,6 +262,9 @@ typedef struct PgStat_CheckpointerStats
{
PgStat_Counter num_timed;
PgStat_Counter num_requested;
PgStat_Counter restartpoints_timed;
PgStat_Counter restartpoints_requested;
PgStat_Counter restartpoints_performed;
PgStat_Counter write_time; /* times in milliseconds */
PgStat_Counter sync_time;
PgStat_Counter buffers_written;

@ -6821,20 +6821,37 @@ on true;
Filter: (id IS NOT NULL)
(8 rows)
-- Check that SJE does not remove self joins if a PHV references the removed
-- rel laterally.
explain (costs off)
-- Check that PHVs do not impose any constraints on removing self joins
explain (verbose, costs off)
select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
QUERY PLAN
---------------------------------------------------
QUERY PLAN
----------------------------------------------------------
Nested Loop Left Join
-> Nested Loop
-> Seq Scan on emp1 t1
-> Index Scan using emp1_pkey on emp1 t2
Index Cond: (id = t1.id)
-> Function Scan on generate_series t3
(6 rows)
Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
-> Seq Scan on public.emp1 t2
Output: t2.id, t2.code
Filter: (t2.id IS NOT NULL)
-> Function Scan on pg_catalog.generate_series t3
Output: t3.t3, t2.id
Function Call: generate_series(1, 1)
(8 rows)
explain (verbose, costs off)
select * from generate_series(1,10) t1(id) left join
lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
on true;
QUERY PLAN
------------------------------------------------------
Nested Loop Left Join
Output: t1.id, (t1.id), t3.id
-> Function Scan on pg_catalog.generate_series t1
Output: t1.id
Function Call: generate_series(1, 10)
-> Seq Scan on public.emp1 t3
Output: t3.id, t1.id
Filter: (t3.id IS NOT NULL)
(8 rows)
-- We can remove the join even if we find the join can't duplicate rows and
-- the base quals of each side are different. In the following case we end up

@ -1822,6 +1822,9 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_buf_written_clean() AS buffers_cle
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
pg_stat_checkpointer| SELECT pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
pg_stat_get_checkpointer_write_time() AS write_time,
pg_stat_get_checkpointer_sync_time() AS sync_time,
pg_stat_get_checkpointer_buffers_written() AS buffers_written,

@ -2600,12 +2600,16 @@ select * from emp1 t1 left join
on true)
on true;
-- Check that SJE does not remove self joins if a PHV references the removed
-- rel laterally.
explain (costs off)
-- Check that PHVs do not impose any constraints on removing self joins
explain (verbose, costs off)
select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
explain (verbose, costs off)
select * from generate_series(1,10) t1(id) left join
lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
on true;
-- We can remove the join even if we find the join can't duplicate rows and
-- the base quals of each side are different. In the following case we end up
-- moving quals over to s1 to make it so it can't match any rows.