By enabling slot synchronization, all the failover logical replication
slots on the primary (assuming configurations are appropriate) are
automatically created on the physical standbys and are synced
periodically. The slot sync worker on the standby server pings the primary
server at regular intervals to get the necessary failover logical slots
information and create/update the slots locally. The slots that no longer
require synchronization are automatically dropped by the worker.
The nap time of the worker is tuned according to the activity on the
primary. The slot sync worker waits for some time before the next
synchronization, with the duration varying based on whether any slots were
updated during the last cycle.
A new parameter sync_replication_slots enables or disables this new
process.
On promotion, the slot sync worker is shut down by the startup process to
drop any temporary slots acquired by the slot sync worker and to prevent
the worker from trying to fetch the failover slots.
A functionality to allow logical walsenders to wait for the physical will
be done in a subsequent commit.
Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
Presently, we rely on each individual signal handler to save the
initial value of errno and then restore it before returning if
needed. This is easily forgotten and, if missed, often goes
undetected for a long time.
In commit 3b00fdba9f, we introduced a wrapper signal handler
function that checks whether MyProcPid matches getpid(). This
commit moves the aforementioned errno restoration code from the
individual signal handlers to the new wrapper handler so that we no
longer need to worry about missing it.
Reviewed-by: Andres Freund, Noah Misch
Discussion: https://postgr.es/m/20231121212008.GA3742740%40nathanxps13
Commit 16671ba6e7 moved the code that sends "sorry, too many clients
already" and other such messages, but it had the effect that we would
send that error even if the the startup packet processing failed, e.g.
because the client sent an invalid startup packet. That was not
intentional.
Spotted while reading the code again.
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI. For each relation, it stores a
"limit block" which is 0 if a relation is created or destroyed within
a certain range of WAL records, or otherwise the shortest length to
which the relation was truncated during that range of WAL records, or
otherwise InvalidBlockNumber. In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.
In other words, it tells us which blocks need to copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.
A new parameter summarize_wal enables or disables this new background
process. The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.
Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
After commit fd5e8b440d, InitProcess() is called later in the
EXEC_BACKEND startup sequence, so it's enough to set the
am_autovacuum_[launcher|worker] variables at the same place as in the
!EXEC_BACKEND case.
The order of process initialization steps is now more consistent
between !EXEC_BACKEND and EXEC_BACKEND modes. InitProcess() is called
at the same place in either mode. We can now also move the
AttachSharedMemoryStructs() call into InitProcess() itself. This
reduces the number of "#ifdef EXEC_BACKEND" blocks.
Reviewed-by: Tristan Partin, Andres Freund, Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
The BackgroundWorker struct is now passed the same way as the Port
struct. Seems more consistent.
This makes it possible to move InitProcess later in SubPostmasterMain
(in next commit), as we no longer need to access shared memory to read
background worker entry.
Reviewed-by: Tristan Partin, Andres Freund, Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
For clarity, have separate functions for *creating* the shared memory
and semaphores at postmaster or single-user backend startup, and
for *attaching* to existing shared memory structures in EXEC_BACKEND
case. CreateSharedMemoryAndSemaphores() is now called only at
postmaster startup, and a new AttachSharedMemoryStructs() function is
called at backend startup in EXEC_BACKEND mode.
Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
The padding bytes written to the backend params file were
uninitialized. That's harmless because we don't access the padding
bytes when we read the file back in, but Valgrind doesn't know
that. In any case, clear the padding bytes to make Valgrind happy.
Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/014768ed-8b39-c44f-b07c-098c87b1644c@gmail.com
A WaitEventSet holds file descriptors or event handles (on Windows).
If FreeWaitEventSet is not called, those fds or handles are leaked.
Use ResourceOwners to track WaitEventSets, to clean those up
automatically on error.
This was a live bug in async Append nodes, if a FDW's
ForeignAsyncRequest function failed. (In back branches, I will apply a
more localized fix for that based on PG_TRY-PG_FINALLY.)
The added test doesn't check for leaking resources, so it passed even
before this commit. But at least it covers the code path.
In the passing, fix misleading comment on what the 'nevents' argument
to WaitEventSetWait means.
Report by Alexander Lakhin, analysis and suggestion for the fix by
Tom Lane. Fixes bug #17828.
Reviewed-by: Alexander Lakhin, Thomas Munro
Discussion: https://www.postgresql.org/message-id/472235.1678387869@sss.pgh.pa.us
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
This adds a new option called BGWORKER_BYPASS_ROLELOGINCHECK to the
flags available to BackgroundWorkerInitializeConnection() and
BackgroundWorkerInitializeConnectionByOid().
This gives the possibility to bgworkers to bypass the role login check,
making possible the use of a role that has no login rights while not
being a superuser. PostgresInit() gains a new flag called
INIT_PG_OVERRIDE_ROLE_LOGIN, taking advantage of the refactoring done in
4800a5dfb4c4.
Regression tests are added to worker_spi to check the behavior of this
new option with bgworkers.
Author: Bertrand Drouvot
Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/bcc36259-7850-4882-97ef-d6b905d2fc51@gmail.com
The check is not about processing the startup packet, so the calling
function seems like a more natural place. I'm also working on a patch
that moves 'canAcceptConnections' out of the Port struct, and this
makes that refactoring more convenient.
Reviewed-by: Tristan Partin
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
InitPostgres() has been using a set of boolean arguments to control its
behavior, and a patch under discussion was aiming at expanding it with a
third one. In preparation for expanding this area, this commit switches
all the current boolean arguments of this routine to a single bits32
argument instead. Two values are currently supported for the flags:
- INIT_PG_LOAD_SESSION_LIBS to load [session|local]_preload_libraries at
startup.
- INIT_PG_OVERRIDE_ALLOW_CONNS to allow connection to a database even if
it has !datallowconn. This is used by bgworkers.
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZSTn66_BXRZCeaqS@paquier.xyz
The comment claimed that it is "called from postmaster", but it is
actually called in the child process, pretty early in the process
initialization. I guess you could interpret "called from postmaster"
to mean that, but it seems wrong to me. Rename the function to be
consistent with other functions with similar role.
Reviewed-by: Thomas Munro
Discussion: https://www.postgresql.org/message-id/4f95c1fc-ad3c-7974-3a8c-6faa3931804c@iki.fi
Keep track of the used size of the array. That avoids looping through
the whole array in a few places. It doesn't matter from a performance
point of view since the array is small anyway, but this feels less
surprising and is a little less code. Now that we have an explicit
NumListenSockets variable that is statically initialized to 0, we
don't need the loop to initialize the array.
Allocate the array in PostmasterContext. The array isn't needed in
child processes, so this allows reusing that memory. We could easily
make the array resizable now, but we haven't heard any complaints
about the current 64 sockets limit.
Discussion: https://www.postgresql.org/message-id/7bb7ad65-a018-2419-742f-fa5fd877d338@iki.fi
Presently, frontend code that needs to use these macros must either
include storage/fd.h, which declares several frontend-unsafe
functions, or duplicate the macros. This commit moves these macros
to common/file_utils.h, which is safe for both frontend and backend
code. Consequently, we can also remove the duplicated macros in
pg_checksums and stop including storage/fd.h in pg_rewind.
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/ZOP5qoUualu5xl2Z%40paquier.xyz
After commit b0bea38705, syslogger prints 63 warnings about failing to
close a listen socket at postmaster startup. That's because the
syslogger process forks before the ListenSockets array is initialized,
so ClosePostmasterPorts() calls "close(0)" 64 times. The first call
succeeds, because fd 0 is stdin.
This has been like this since commit 9a86f03b4e in version 13, which
moved the SysLogger_Start() call to before initializing ListenSockets.
We just didn't notice until commit b0bea38705 added the LOG message.
Reported by Michael Paquier and Jeff Janes.
Author: Michael Paquier
Discussion: https://www.postgresql.org/message-id/ZOvvuQe0rdj2slA9%40paquier.xyz
Discussion: https://www.postgresql.org/message-id/ZO0fgDwVw2SUJiZx@paquier.xyz#482670177eb4eaf4c9f03c1eed963e5f
Backpatch-through: 13
It's good hygiene if e.g. an extension launches a subprogram when
being loaded. We went through some effort to close them in the child
process in EXEC_BACKEND mode, but it's better to not hand them down to
the child process in the first place. We still need to close them
after fork when !EXEC_BACKEND, but it's a little simpler.
In the passing, LOG a message if closing the client connection or
listen socket fails. Shouldn't happen, but if it does, would be nice
to know.
Reviewed-by: Tristan Partin, Andres Freund, Thomas Munro
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
This commit introduces descriptively-named macros for the
identifiers used in wire protocol messages. These new macros are
placed in a new header file so that they can be easily used by
third-party code.
Author: Dave Cramer
Reviewed-by: Alvaro Herrera, Tatsuo Ishii, Peter Smith, Robert Haas, Tom Lane, Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/CADK3HHKbBmK-PKf1bPNFoMC%2BoBt%2BpD9PH8h5nvmBQskEHm-Ehw%40mail.gmail.com
Unlike commands such as CREATE DATABASE, ProcessStartupPacket()
does not perform multibyte-aware truncation of overlength names.
This means that connection attempts might fail even if the user
provides the same overlength names that were used in CREATE
DATABASE, CREATE ROLE, etc. Ideally, we'd do the same multibyte-
aware truncation in both code paths, but it doesn't seem worth the
added complexity of trying to discover the encoding of the names.
Instead, let's simply skip truncating the names in the startup
packet and let the user/database lookup fail later on. With this
change, users must provide the exact names stored in the catalogs,
even if the names were truncated.
This reverts commit d18c1d1f51.
Author: Bertrand Drouvot
Reviewed-by: Kyotaro Horiguchi, Tom Lane
Discussion: https://postgr.es/m/07436793-1426-29b2-f924-db7422a05fb7%40gmail.com
Further to commit 6a9229da, checking for NULL is now redundant. An "out
of memory" error would have been thrown already by palloc() and treated
as FATAL, so we can delete a few more lines.
Back-patch to all releases, like those other commits.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/4040668.1679013388%40sss.pgh.pa.us
A comment was left behind claiming that we needed to use malloc() rather
than palloc() because the corresponding free would run in another
thread, but that's not true anymore. Remove that comment. And, with
the reason being gone, we might as well actually use palloc().
Back-patch to supported releases, like d41a178b.
Discussion: https://postgr.es/m/CA%2BhUKG%2BpdM9v3Jv4tc2BFx2jh_daY3uzUyAGBhtDkotEQDNPYw%40mail.gmail.com
Our waitpid() emulation didn't prevent a PID from being recycled by the
OS before the call to waitpid(). The postmaster could finish up
tracking more than one child process with the same PID, and confuse
them.
Fix, by moving the guts of pgwin32_deadchild_callback() into waitpid(),
so that resources are released synchronously. The process and PID
continue to exist until we close the process handle, which only happens
once we're ready to adjust our book-keeping of running children.
This seems to explain a couple of failures on CI. It had never been
reported before, despite the code being as old as the Windows port.
Perhaps Windows started recycling PIDs more rapidly, or perhaps timing
changes due to commit 7389aad6 made it more likely to break.
Thanks to Alexander Lakhin for analysis and Andres Freund for tracking
down the root cause.
Back-patch to all supported branches.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20230208012852.bvkn2am4h4iqjogq%40awork3.anarazel.de
In the 90s we needed to deal with computers that still had the
pre-standard signal masking APIs. That hasn't been relevant for a very
long time on Unix systems, and c94ae9d8 got rid of a remaining
dependency in our Windows porting code. PG_SETMASK didn't expose
save/restore functionality, so we'd already started using sigprocmask()
directly in places, creating the visual distraction of having two ways
to spell it. It's not part of the API that extensions are expected to
be using (but if they are, the change will be trivial). It seems like a
good time to drop the old macro and just call the standard POSIX
function.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BKfQgrhHP2DLTohX1WwubaCBHmTzGnAEDPZ-Gug-Xskg%40mail.gmail.com
Several flags were marked volatile and in some cases used sig_atomic_t
because they were accessed from signal handlers. After commit 7389aad6,
we can just use unqualified bool.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGLMoeZNZY6gYdLUQmuoW_a8bKyLvtuZkd_zHcGVOfDzBA%40mail.gmail.com
We'd like to use TimestampDifferenceMilliseconds with the stop_time
possibly being TIMESTAMP_INFINITY, but up to now it's disclaimed
responsibility for overflow cases. Define it to clamp its output to
the range [0, INT_MAX], handling overflow correctly. (INT_MAX rather
than LONG_MAX seems appropriate, because the function is already
described as being intended for calculating wait times for WaitLatch
et al, and that infrastructure only handles waits up to INT_MAX.
Also, this choice gets rid of cross-platform behavioral differences.)
Having done that, we can replace some ad-hoc code in walreceiver.c
with a simple call to TimestampDifferenceMilliseconds.
While at it, fix some buglets in existing callers of
TimestampDifferenceMilliseconds: basebackup_copy.c had not read the
memo about TimestampDifferenceMilliseconds never returning a negative
value, and postmaster.c had not read the memo about Min() and Max()
being macros with multiple-evaluation hazards. Neither of these
quite seem worth back-patching.
Patch by me; thanks to Nathan Bossart for review.
Discussion: https://postgr.es/m/3126727.1674759248@sss.pgh.pa.us
Modify the new event loop code from commit 7389aad6 so that it checks
for work requested by signal handlers even if it doesn't see a latch
event yet.
This gives priority to shutdown and reload requests where the latch will
be reported later in the event array, or in a later call to
WaitEventSetWait(), due to scheduling details. In particular, this
guarantees that a SIGHUP-then-connect sequence (as seen in
authentication tests) causes the postmaster to process the reload before
accepting the connection. If the WaitEventSetWait() call saw the socket
as ready, and the reload signal was generated before the connection,
then the latest time the signal handler should be able to run is after
poll/epoll_wait/kevent returns but before we check the
pending_pm_reload_request flag.
While here, also shift the handling of child exit below reload requests,
per Tom Lane's observation that that might start new processes, so we
should make sure we pick up new settings first.
This probably explains the one-off failure of build farm animal
malleefowl.
Reported-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/OS0PR01MB57163D3BF2AB42ECAA94E5C394C29%40OS0PR01MB5716.jpnprd01.prod.outlook.com
When the length was too short, the server read outside the allocation.
That yielded the same log noise as sending the correct length with
(backendPID,cancelAuthCode) matching nothing. Change to a message about
the unexpected length. Given the attacker's lack of control over the
memory layout and the general lack of diversity in memory layouts at the
code in question, we doubt a would-be attacker could cause a segfault.
Hence, while the report arrived via security@postgresql.org, this is not
a vulnerability. Back-patch to v11 (all supported versions).
Andrey Borodin, reviewed by Tom Lane. Reported by Andrey Borodin.
This will ease a follow-up move that will generate automatically this
code. The C file is renamed, for consistency with the node-related
files whose code are generated by gen_node_support.pl:
- queryjumble.c -> queryjumblefuncs.c
- utils/queryjumble.h -> nodes/queryjumble.h
Per a suggestion from Peter Eisentraut.
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/Y5BHOUhX3zTH/ig6@paquier.xyz
This provides a way to reserve connection slots for non-superusers.
The slots reserved via the new GUC are available only to users who
have the new predefined role pg_use_reserved_connections.
superuser_reserved_connections remains as a final reserve in case
reserved_connections has been exhausted.
Patch by Nathan Bossart. Reviewed by Tushar Ahuja and by me.
Discussion: http://postgr.es/m/20230119194601.GA4105788@nathanxps13
This is in preparation for adding a new reserved_connections GUC,
but aligning the GUC name with the variable name is also a good
idea on general principle.
Patch by Nathan Bossart. Reviewed by Tushar Ahuja and by me.
Discussion: http://postgr.es/m/20230119194601.GA4105788@nathanxps13
Switch to a design similar to regular backends, instead of the previous
arrangement where signal handlers did non-trivial state management and
called fork(). The main changes are:
* The postmaster now has its own local latch to wait on. (For now, we
don't want other backends setting its latch directly, but that could
probably be made to work with more research on robustness.)
* The existing signal handlers are cut in two: a handle_pm_XXX() part
that just sets pending_pm_XXX flags and the latch, and a
process_pm_XXX() part that runs later when the latch is seen.
* Signal handlers are now installed with the regular pqsignal()
function rather than the special pqsignal_pm() function; historical
portability concerns about the effect of SA_RESTART on select() are no
longer relevant, and we don't need to block signals anymore.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
Logical WAL senders display now as follows, gaining a database name:
postgres: walsender USER DATABASE HOST(PORT) STATE
Physical WAL senders show up the same, as of:
postgres: walsender USER HOST(PORT) STATE
This information was missing, hence it was not possible to know from ps
if a WAL sender was a logical or a physical one, and on which database
it is connected when it is logical.
Author: Tatsuhiro Nakamori
Reviewed-by: Fujii Masao, Bharath Rupireddy
Discussion: https://postgr.es/m/36a3b137e82e0ea9fe7e4234f03b64a1@oss.nttdata.com
The postmaster normally sends SIGQUIT to force-terminate its
child processes after a child crash or immediate-stop request.
If that doesn't result in child exit within a few seconds,
we follow it up with SIGKILL. This patch provides GUC flags
that allow either of these signals to be replaced with SIGABRT.
On typically-configured Unix systems, that will result in a
core dump being produced for each such child. This can be
useful for debugging problems, although it's not something you'd
want to have on in production due to the risk of disk space
bloat from lots of core files.
The old postmaster -T switch, which sent SIGSTOP in place of
SIGQUIT, is changed to be the same as send_abort_for_crash.
As far as I can tell from the code comments, the intent of
that switch was just to block things for long enough to force
core dumps manually, which seems like an unnecessary extra step.
(Maybe at the time, there was no way to get most kernels to
produce core files with per-PID names, requiring manual core
file renaming after each one. But now it's surely the hard way.)
I also took the opportunity to remove the old postmaster -n
(skip shmem reinit) switch, which hasn't actually done anything
in decades, though the documentation still claimed it did.
Discussion: https://postgr.es/m/2251016.1668797294@sss.pgh.pa.us
Commit 9abb2bfc left behind code to block signals inside signal
handlers on Windows, because our signal porting layer didn't have
sigaction(). Provide a minimal implementation that is capable of
blocking signals, to get rid of platform differences. See also related
commit c94ae9d8.
Discussion: https://postgr.es/m/CA%2BhUKGKKKfcgx6jzok9AYenp2TNti_tfs8FMoJpL8%2B0Gsy%3D%3D_A%40mail.gmail.com
This is similar to 7d25958, and this commit takes care of all the
remaining inconsistencies between the initial value used in the C
variable associated to a GUC and its default value stored in the GUC
tables (as of pg_settings.boot_val).
Some of the initial values of the GUCs updated rely on a compile-time
default. These are refactored so as the GUC table and its C declaration
use the same values. This makes everything consistent with other
places, backend_flush_after, bgwriter_flush_after, port,
checkpoint_flush_after doing so already, for example.
Extracted from a larger patch by Peter Smith. The spots updated in the
modules are from me.
Author: Peter Smith, Michael Paquier
Reviewed-by: Nathan Bossart, Tom Lane, Justin Pryzby
Discussion: https://postgr.es/m/CAHut+PtHE0XSfjjRQ6D4v7+dqzCw=d+1a64ujra4EX8aoc_Z+w@mail.gmail.com