pl/pgsql's notion of an "expression" is very broad, encompassing
any SQL SELECT query that returns a single column and no more than
one row. So there are cases, for example evaluation of an aggregate
function, where the query involves significant work and it'd be useful
to run it with parallel workers. This used to be possible, but
commits 3eea7a0c9 et al unintentionally disabled it.
The simplest fix is to make exec_eval_expr() pass maxtuples = 0
rather than 2 to exec_run_select(). This avoids the new rule that
we will never use parallelism when a nonzero "count" limit is passed
to ExecutorRun(). (Note that the pre-3eea7a0c9 behavior was indeed
unsafe, so reverting that rule is not in the cards.) The reason
for passing 2 before was that exec_eval_expr() will throw an error
if it gets more than one returned row, so we figured that as soon
as we have two rows we know that will happen and we might as well
stop running the query. That choice was cost-free when it was made;
but disabling parallelism is far from cost-free, so now passing 2
amounts to optimizing a failure case at the expense of useful cases.
An expression query that can return more than one row is certainly
broken. People might now need to wait a bit longer to discover such
breakage; but hopefully few will use enormously expensive cases as
their first test of new pl/pgsql logic.
Author: Dipesh Dhameliya <dipeshdhameliya125@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CABgZEgdfbnq9t6xXJnmXbChNTcWFjeM_6nuig41tm327gYi2ig@mail.gmail.com
Backpatch-through: 13
libxml2 has deprecated the members of xmlBuffer, and it is recommended
to access them with dedicated routines. We have only one case in the
tree where this shows an impact: xml2/xpath.c where "content" was
getting directly accessed. The rest of the code looked fine, checking
the PostgreSQL code with libxml2 close to the top of its "2.14" branch.
xmlBufferContent() exists since year 2000 based on a check of the
upstream libxml2 tree, so let's switch to it.
Like 400928b83bd2, backpatch all the way down as this can have an impact
on all the branches already released once newer versions of libxml2 get
more popular.
Reported-by: Walid Ibrahim <walidib@amazon.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aGdSdcR4QTjEHX6s@paquier.xyz
Backpatch-through: 13
With tables defined like this,
CREATE TABLE ip (id int PRIMARY KEY);
CREATE TABLE ic (id int) INHERITS (ip);
ALTER TABLE ic ALTER id DROP NOT NULL;
pg_upgrade fails during the schema restore phase due to this error:
ERROR: column "id" in child table must be marked NOT NULL
This can only be fixed by marking the child column as NOT NULL before
the upgrade, which could take an arbitrary amount of time (because ic's
data must be scanned). Have pg_upgrade's check mode warn if that
condition is found, so that users know what to adjust before running the
upgrade for real.
Author: Ali Akbar <the.apaan@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CACQjQLoMsE+1pyLe98pi0KvPG2jQQ94LWJ+PTiLgVRK4B=i_jg@mail.gmail.com
Attempting to use commit timestamps during bootstrapping leads to an
assertion failure, that can be reached for example with an initdb -c
that enables track_commit_timestamp. It makes little sense to register
a commit timestamp for a BootstrapTransactionId, so let's disable the
activation of the module in this case.
This problem has been independently reported once by each author of this
commit. Each author has proposed basically the same patch, relying on
IsBootstrapProcessingMode() to skip the use of commit_ts during
bootstrap. The test addition is a suggestion by me, and is applied down
to v16.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Andy Fan <zhihuifan1213@163.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/OSCPR01MB14966FF9E4C4145F37B937E52F5102@OSCPR01MB14966.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/87plejmnpy.fsf@163.com
Backpatch-through: 13
Commits 8319e5cb5 et al missed the fact that ATPostAlterTypeCleanup
contains three calls to ATPostAlterTypeParse, and the other two
also need protection against passing a relid that we don't yet
have lock on. Add similar logic to those code paths, and add
some test cases demonstrating the need for it.
In v18 and master, the test cases demonstrate that there's a
behavioral discrepancy between stored generated columns and virtual
generated columns: we disallow changing the expression of a stored
column if it's used in any composite-type columns, but not that of
a virtual column. Since the expression isn't actually relevant to
either sort of composite-type usage, this prohibition seems
unnecessary; but changing it is a matter for separate discussion.
For now we are just documenting the existing behavior.
Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: CACJufxGKJtGNRRSXfwMW9SqVOPEMdP17BJ7DsBf=tNsv9pWU9g@mail.gmail.com
Backpatch-through: 13
The documentation previously stated that the wal_status column is NULL
if restart_lsn is NULL in the pg_replication_slots view. This is incorrect,
and wal_status can be "lost" even when restart_lsn is NULL.
This commit removes the incorrect description.
Back-patched to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Discussion: https://postgr.es/m/c9d23cdc-b5dd-455a-8ee9-f1f24d701d89@oss.nttdata.com
Backpatch-through: 13
The existing code assumed that O_RDONLY is defined as 0, but this is
not required by POSIX and is not true on GNU Hurd. We can avoid
the assumption by relying on O_ACCMODE to mask the fcntl() result.
(Hopefully, all supported platforms define that.)
Author: Michael Banck <mbanck@gmx.net>
Co-authored-by: Samuel Thibault
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6862e8d1.050a0220.194b8d.76fa@mx.google.com
Discussion: https://postgr.es/m/68480868.5d0a0220.1e214d.68a6@mx.google.com
Backpatch-through: 13
These functions have been around for some time, but commits
48b5aa3143 and 15afb7d61c were only back-patched to v16. Let's
back-patch them to all supported versions now.
Reported-by: David Rowley <dgrowleyml@gmail.com> (commit 48b5aa3143)
Author: Bruce Momjian <bruce@momjian.us> (commit 48b5aa3143)
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> (commit 15afb7d61c)
Reviewed-by: Michael Paquier <michael@paquier.xyz> (commit 15afb7d61c)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/aGMCxHxLfeMdQk8m%40nathan
Backpatch-through: 13-15
Sometimes a table's constraint may depend on a column of another
table, so that we have to update the constraint when changing the
referenced column's type. We need to have lock on the constraint's
table to do that. ATPostAlterTypeCleanup believed that this case
was only possible for FOREIGN KEY constraints, but it's wrong at
least for CHECK and EXCLUDE constraints; and in general, we'd
probably need exclusive lock to alter any sort of constraint.
So just remove the contype check and acquire lock for any other
table. This prevents a "you don't have lock" assertion failure,
though no ill effect is observed in production builds.
We'll error out later anyway because we don't presently support
physically altering column types within stored composite columns.
But the catalog-munging is basically all there, so we may as well
make that part work.
Bug: #18970
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: jian he <jian.universality@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18970-a7d1cfe1f8d5d8d9@postgresql.org
Backpatch-through: 13
This is purely cosmetic, as dsm_attach() interprets its argument as
a dsm_handle (i.e., an unsigned integer), but we might as well fix
it.
Oversight in commit 4db3744f1f.
Author: Jianghua Yang <yjhjstz@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAAZLFmRxkUD5jRs0W3K%3DUe4_ZS%2BRcAb0PCE1S0vVJBn3sWH2UQ%40mail.gmail.com
Backpatch-through: 13
This fixes two issues with the handling of VacuumParams in vacuum_rel().
This code path has the idea to change the passed-in pointer of
VacuumParams for the "truncate" and "index_cleanup" options for the
relation worked on, impacting the two following scenarios where
incorrect options may be used because a VacuumParams pointer is shared
across multiple relations:
- Multiple relations in a single VACUUM command.
- TOAST relations vacuumed with their main relation.
The problem is avoided by providing to the two callers of vacuum_rel()
copies of VacuumParams, before the pointer is updated for the "truncate"
and "index_cleanup" options.
The refactoring of the VACUUM option and parameters done in 0d831389749a
did not introduce an issue, but it has encouraged the problem we are
dealing with in this commit, with b84dbc8eb80b for "truncate" and
a96c41feec6b for "index_cleanup" that have been added a couple of years
after the initial refactoring. HEAD will be improved with a different
patch that hardens the uses of VacuumParams across the tree. This
cannot be backpatched as it introduces an ABI breakage.
The backend portion of the patch has been authored by Nathan, while I
have implemented the tests. The tests rely on injection points to check
the option values, making them faster, more reliable than the tests
originally proposed by Shihao, and they also provide more coverage.
This part can only be backpatched down to v17.
Reported-by: Shihao Zhong <zhong950419@gmail.com>
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAGRkXqTo+aK=GTy5pSc-9cy8H2F2TJvcrZ-zXEiNJj93np1UUw@mail.gmail.com
Backpatch-through: 13
The regression test added in commit 1230be12f failed with
CLOBBER_CACHE_ALWAYS builds as it depends on cache behavior. This test
failure occurred only on v13 because the original data loss problem
was fixed differently in v13 compared to v14 and later versions,
resulting in different expected-output files.
This commit adds an extra expected-output file to cover both regular
and CLOBBER_CACHE_ALWAYS build cases.
Oversight in 1230be12f.
Per buildfarm member trilobite.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/dbf561f7-465e-4086-adfa-733b9b9a34b3@gmail.com
The link returns 404 and no replacement is available in the project
on Sourceforge where the content once was. Since we already link to
resources for both beginner and experienced docs hackers, remove the
the dead link.
Backpatch to all supported versions as the link was added in 8.1.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Magnus Hagander <magnus@hagander.net>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reported-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxH=YzQPDOe+2WuYZ7seD-BOyjCBmP6JiErpoSiVZWDRnw@mail.gmail.com
Backpatch-through: 13
Previously, the UUID functions documentation defined the "UUID" index entry
to link to the UUID data type page, even though that entry already exists there.
Instead, the UUID functions page should define its own index entry linking
to itself.
This commit updates the UUID index entry in the UUID functions documentation
to point to the correct section, improving navigation and avoiding duplication.
Back-patch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/f33e0493-5773-4296-87c5-7ce459054cfe@oss.nttdata.com
Backpatch-through: 13
Specify whether the bucket bounds are inclusive or exclusive,
and improve some other vague language. Explain the behavior that
occurs when the "low" bound is greater than the "high" bound.
Make width_bucket_numeric's comment more like that for
width_bucket_float8, in particular noting that infinite
bounds are rejected (since they became possible in v14).
Reported-by: Ben Peachey Higdon <bpeacheyhigdon@gmail.com>
Author: Robert Treat <rob@xzilla.net>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/2BD74F86-5B89-4AC1-8F13-23CED3546AC1@gmail.com
Backpatch-through: 13
While choosing an autogenerated name for an index, look for
pre-existing relations using a SnapshotDirty snapshot, instead of the
previous behavior that considered only committed-good pg_class rows.
This allows us to detect and avoid conflicts against indexes that are
still being built.
It's still possible to fail due to a race condition, but the window
is now just the amount of time that it takes DefineIndex to validate
all its parameters, call smgrcreate(), and enter the index's pg_class
row. Formerly the race window covered the entire time needed to
create and fill an index, which could be very long if the table is
large. Worse, if the conflicting index creation is part of a larger
transaction, it wouldn't be visible till COMMIT.
So this isn't a complete solution, but it should greatly ameliorate
the problem, and the patch is simple enough to be back-patchable.
It might at some point be useful to do the same for pg_constraint
entries (cf. ChooseConstraintName, ConstraintNameExists, and related
functions). However, in the absence of field complaints, I'll leave
that alone for now. The relation-name test should be good enough for
index-based constraints, while foreign-key constraints seem to be okay
since they require exclusive locks to create.
Bug: #18959
Reported-by: Maximilian Chrzan <maximilian.chrzan@here.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/18959-f63b53b864bb1417@postgresql.org
Backpatch-through: 13
Commit 4909b38af0 introduced logic to distribute invalidation messages
from catalog-modifying transactions to all concurrent in-progress
transactions. However, since each transaction distributes not only its
original invalidation messages but also previously distributed
messages to other transactions, this leads to an exponential increase
in allocation request size for invalidation messages, ultimately
causing memory allocation failure.
This commit fixes this issue by tracking distributed invalidation
messages separately per decoded transaction and not redistributing
these messages to other in-progress transactions. The maximum size of
distributed invalidation messages that one transaction can store is
limited to MAX_DISTR_INVAL_MSG_PER_TXN (8MB). Once the size of the
distributed invalidation messages exceeds this threshold, we
invalidate all caches in locations where distributed invalidation
messages need to be executed.
Back-patch to all supported versions where we introduced the fix by
commit 4909b38af0.
Note that this commit adds two new fields to ReorderBufferTXN to store
the distributed transactions. This change breaks ABI compatibility in
back branches, affecting third-party extensions that depend on the
size of the ReorderBufferTXN struct, though this scenario seems
unlikely.
Additionally, it adds a new flag to the txn_flags field of
ReorderBufferTXN to indicate distributed invalidation message
overflow. This should not affect existing implementations, as it is
unlikely that third-party extensions use unused bits in the txn_flags
field.
Bug: #18938#18942
Author: vignesh C <vignesh21@gmail.com>
Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
Reported-by: John Hutchins <john.hutchins@wicourts.gov>
Reported-by: Laurence Parry <greenreaper@hotmail.com>
Reported-by: Max Madden <maxmmadden@gmail.com>
Reported-by: Braulio Fdo Gonzalez <brauliofg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/680bdaf6-f7d1-4536-b580-05c2760c67c6@deepbluecap.com
Discussion: https://postgr.es/m/18942-0ab1e5ae156613ad@postgresql.org
Discussion: https://postgr.es/m/18938-57c9a1c463b68ce0@postgresql.org
Discussion: https://postgr.es/m/CAD1FGCT2sYrP_70RTuo56QTizyc+J3wJdtn2gtO3VttQFpdMZg@mail.gmail.com
Discussion: https://postgr.es/m/CANO2=B=2BT1hSYCE=nuuTnVTnjidMg0+-FfnRnqM6kd23qoygg@mail.gmail.com
Backpatch-through: 13
The patch fixes the issue with the unexpected removal of old WAL segments
after checkpoint, followed by an immediate restart. The issue occurs when
a slot is advanced after the start of the checkpoint and before old WAL
segments are removed at the end of the checkpoint.
The idea of the patch is to get the minimal restart_lsn at the beginning
of checkpoint (or restart point) creation and use this value when calculating
the oldest LSN for WAL segments removal at the end of checkpoint. This idea
was proposed by Tomas Vondra in the discussion. Unlike 291221c46575, this
fix doesn't affect ABI and is intended for back branches.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Teach nbtree's _bt_killitems to leave the so->currPos page that it sets
LP_DEAD items on in whatever state it was in when _bt_killitems was
called. In particular, make sure that so->dropPin scans don't acquire a
pin whose reference is saved in so->currPos.buf.
Allowing _bt_killitems to change so->currPos.buf like this is wrong.
The immediate consequence of allowing it is that code in _bt_steppage
(that copies so->currPos into so->markPos) will behave as if the scan is
a !so->dropPin scan. so->markPos will therefore retain the buffer pin
indefinitely, even though _bt_killitems only needs to acquire a pin
(along with a lock) for long enough to mark known-dead items LP_DEAD.
This issue came to light following a report of a failure of an assertion
from recent commit e6eed40e. The test case in question involves the use
of mark and restore. An initial call to _bt_killitems takes place that
leaves so->currPos.buf in a state that is inconsistent with the scan
being so->dropPin. A subsequent call to _bt_killitems for the same
position (following so->currPos being saved in so->markPos, and then
restored as so->currPos) resulted in the failure of an assertion that
tests that so->currPos.buf is InvalidBuffer when the scan is so->dropPin
(non-assert builds got a "resource was not closed" WARNING instead).
The same problem exists on earlier releases, though the issue is far
more subtle there. Recent commit e6eed40e introduced the so->dropPin
field as a partial replacement for testing so->currPos.buf directly.
Earlier releases won't get an assertion failure (or buffer pin leak),
but they will allow the second _bt_killitems call from the test case to
behave as if a buffer pin was consistently held since the original call
to _bt_readpage. This is wrong; there will have been an initial window
during which no pin was held on the so->currPos page, and yet the second
_bt_killitems call will neglect to check if so->currPos.lsn continues to
match the page's now-current LSN.
As a result of all this, it's just about possible that _bt_killitems
will set the wrong items LP_DEAD (on release branches). This could only
happen with merge joins (the sole user of nbtree mark/restore support),
when a concurrently inserted index tuple used a recently-recycled TID
(and only when the new tuple was inserted onto the same page as a
distinct concurrently-removed tuple with the same TID). This is exactly
the scenario that _bt_killitems' check of the page's now-current LSN
against the LSN stashed in currPos was supposed to prevent.
A follow-up commit will make nbtree completely stop conditioning whether
or not a position's pin needs to be dropped on whether the 'buf' field
is set. All call sites that might need to drop a still-held pin will be
taught to rely on the scan-level so->dropPin field recently introduced
by commit e6eed40e. That will make bugs of the same general nature as
this one impossible (or make them much easier to detect, at least).
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/545be1e5-3786-439a-9257-a90d30f8b849@gmail.com
Backpatch-through: 13
Traditionally, libpq's pqPutMsgEnd has rounded down the amount-to-send
to be a multiple of 8K when it is eagerly writing some data. This
still seems like a good idea when sending through a Unix socket, as
pipes typically have a buffer size of 8K or some fraction/multiple of
that. But there's not much argument for it on a TCP connection, since
(a) standard MTU values are not commensurate with that, and (b) the
kernel typically applies its own packet splitting/merging logic.
Worse, our SSL and GSSAPI code paths both have API stipulations that
if they fail to send all the data that was offered in the previous
write attempt, we mustn't offer less data in the next attempt; else
we may get "SSL error: bad length" or "GSSAPI caller failed to
retransmit all data needing to be retried". The previous write
attempt might've been pqFlush attempting to send everything in the
buffer, so pqPutMsgEnd can't safely write less than the full buffer
contents. (Well, we could add some more state to track exactly how
much the previous write attempt was, but there's little value evident
in such extra complication.) Hence, apply the round-down only on
AF_UNIX sockets, where we never use SSL or GSSAPI.
Interestingly, we had a very closely related bug report before,
which I attempted to fix in commit d053a879b. But the test case
we had then seemingly didn't trigger this pqFlush-then-pqPutMsgEnd
scenario, or at least we failed to recognize this variant of the bug.
Bug: #18907
Reported-by: Dorjpalam Batbaatar <htgn.dbat.95@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18907-d41b9bcf6f29edda@postgresql.org
Backpatch-through: 13
The documentation for the pg_authid system catalog and the
pg_shadow system view indicates that passwords might be stored in
cleartext, but that hasn't been possible for some time.
Oversight in commit eb61136dc7.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aD2yKkZro4nbl5ol%40nathan
Backpatch-through: 13
We store values for these options as array elements with the syntax
"name=value", hence a name containing "=" confuses matters when
it's time to read the array back in. Since validation of the
options is often done (long) after this conversion to array format,
that leads to confusing and off-point error messages. We can
improve matters by rejecting names containing "=" up-front.
(Probably a better design would have involved pairs of array
elements, but it's too late now --- and anyway, there's no
evident use-case for option names like this. We already
reject such names in some other contexts such as GUCs.)
Reported-by: Chapman Flack <jcflack@acm.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chapman Flack <jcflack@acm.org>
Discussion: https://postgr.es/m/6830EB30.8090904@acm.org
Backpatch-through: 13
PLy_elog_impl and its subroutine PLy_traceback intended to avoid
leaking any PyObject reference counts, but their coverage of the
matter was sadly incomplete. In particular, out-of-memory errors
in most of the string-construction subroutines could lead to
reference count leaks, because those calls were outside the
PG_TRY blocks responsible for dropping reference counts.
Fix by (a) adjusting the scopes of the PG_TRY blocks, and
(b) moving the responsibility for releasing the reference counts
of the traceback-stack objects to PLy_elog_impl. This requires
some additional "volatile" markers, but not too many.
In passing, fix an ancient thinko: use of the "e_module_o" PyObject
was guarded by "if (e_type_s)", where surely "if (e_module_o)"
was meant. This would only have visible consequences if the
"__name__" attribute were present but the "__module__" attribute
wasn't, which apparently never happens; but someday it might.
Rearranging the PG_TRY blocks requires indenting a fair amount
of code one more tab stop, which I'll do separately for clarity.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2954090.1748723636@sss.pgh.pa.us
Backpatch-through: 13
A few places that access system catalogs don't set up an active
snapshot before potentially accessing their TOAST tables. To fix,
push an active snapshot just before each section of code that might
require accessing one of these TOAST tables, and pop it shortly
afterwards. While at it, this commit adds some rather strict
assertions in an attempt to prevent such issues in the future.
Commit 16bf24e0e4 recently removed pg_replication_origin's TOAST
table in order to fix the same problem for that catalog. On the
back-branches, those bugs are left in place. We cannot easily
remove a catalog's TOAST table on released major versions, and only
replication origins with extremely long names are affected. Given
the low severity of the issue, fixing older versions doesn't seem
worth the trouble of significantly modifying the patch.
Also, on v13 and v14, the aforementioned strict assertions have
been omitted because commit 2776922201, which added
HaveRegisteredOrActiveSnapshot(), was not back-patched. While we
could probably back-patch it now, I've opted against it because it
seems unlikely that new TOAST snapshot issues will be introduced in
the oldest supported versions.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org
Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
Backpatch-through: 13
postgres_fdw tries to use PG_TRY blocks to ensure that it will
eventually free the PGresult created by the remote modify command.
However, it's fundamentally impossible for this scheme to work
reliably when there's RETURNING data, because the query could fail
in between invocations of postgres_fdw's DirectModify methods.
There is at least one instance of exactly this situation in the
regression tests, and the ensuing session-lifespan leak is visible
under Valgrind.
We can improve matters by using a memory context reset callback
attached to the ExecutorState context. That ensures that the
PGresult will be freed when the ExecutorState context is torn
down, even if control never reaches postgresEndDirectModify.
I have little faith that there aren't other potential PGresult
leakages in the backend modules that use libpq. So I think it'd
be a good idea to apply this concept universally by creating
infrastructure that attaches a reset callback to every PGresult
generated in the backend. However, that seems too invasive for
v18 at this point, let alone the back branches. So for the
moment, apply this narrow fix that just makes DirectModify safe.
I have a patch in the queue for the more general idea, but it
will have to wait for v19.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us
Backpatch-through: 13
Our GSSAPI code only allows packet sizes up to 16kB. However it
emerges that during authentication, larger packets might be needed;
various authorities suggest 48kB or 64kB as the maximum packet size.
This limitation caused login failure for AD users who belong to many
AD groups. To add insult to injury, we gave an unintelligible error
message, typically "GSSAPI context establishment error: The routine
must be called again to complete its function: Unknown error".
As noted in code comments, the 16kB packet limit is effectively a
protocol constant once we are doing normal data transmission: the
GSSAPI code splits the data stream at those points, and if we change
the limit then we will have cross-version compatibility problems
due to the receiver's buffer being too small in some combinations.
However, during the authentication exchange the packet sizes are
not determined by us, but by the underlying GSSAPI library. So we
might as well just try to send what the library tells us to.
An unpatched recipient will fail on a packet larger than 16kB,
but that's not worse than the sender failing without even trying.
So this doesn't introduce any meaningful compatibility problem.
We still need a buffer size limit, but we can easily make it be
64kB rather than 16kB until transport negotiation is complete.
(Larger values were discussed, but don't seem likely to add
anything.)
Reported-by: Chris Gooch <cgooch@bamfunds.com>
Fix-suggested-by: Jacob Champion <jacob.champion@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/DS0PR22MB5971A9C8A3F44BCC6293C4DABE99A@DS0PR22MB5971.namprd22.prod.outlook.com
Backpatch-through: 13
Previously, XactLockTableWait() and ConditionalXactLockTableWait() could enter
a non-interruptible loop when they successfully acquired a lock on a transaction
but the transaction still appeared to be running. Since this loop continued
until the transaction completed, it could result in long, uninterruptible waits.
Although this scenario is generally unlikely since XactLockTableWait() and
ConditionalXactLockTableWait() can basically acquire a transaction lock
only when the transaction is not running, it can occur in a hot standby.
In such cases, the transaction may still appear active due to
the KnownAssignedXids list, even while no lock on the transaction exists.
For example, this situation can happen when creating a logical replication
slot on a standby.
The cause of the non-interruptible loop was the absence of CHECK_FOR_INTERRUPTS()
within it. This commit adds CHECK_FOR_INTERRUPTS() to the loop in both functions,
ensuring they can be interrupted safely.
Back-patch to all supported branches.
Author: Kevin K Biju <kevinkbiju@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAM45KeELdjhS-rGuvN=ZLJ_asvZACucZ9LZWVzH7bGcD12DDwg@mail.gmail.com
Backpatch-through: 13
Commits 53af9491a043 and 2d5fe514052a fixed a number of problems with
foreign keys that reference partitioned tables, and a query to detect
already broken FKs was supplied with the release notes for 17.1, 16.5,
15.9, 14.14, 13.17. However, that query has a bug that causes it to
wrongly report self-referential foreign keys even when they are correct,
so if the user was to drop and rebuild the FKs as indicated, the query
would continue to report them as needing to be repaired. Here we fix
the query to not have that problem.
Reported-by: Paul Foerster <paul.foerster@gmail.com>
Discussion: https://postgr.es/m/5456A1D0-CD47-4315-9C65-71B27E7A2906@gmail.com
Backpatch-through: 13-17
If we hit out-of-memory between creating the PGconn and inserting
it into dblink's hashtable, we'd lose track of the PGconn, which
is quite bad since it represents a live connection to a remote DB.
Fix by rearranging things so that we create the hashtable entry
first.
Also reduce the number of states we have to deal with by getting rid
of the separately-allocated remoteConn object, instead allocating it
in-line in the hashtable entries. (That incidentally removes a
session-lifespan memory leak observed in the regression tests.)
There is an apparently-irreducible remaining OOM hazard, which
is that if the connection fails at the libpq level (ie it's
CONNECTION_BAD) then we have to pstrdup the PGconn's error message
before we can release it, and theoretically that could fail. However,
in such cases we're only leaking memory not a live remote connection,
so I'm not convinced that it's worth sweating over.
This is a pretty low-probability failure mode of course, but losing
a live connection seems bad enough to justify back-patching.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/1346940.1748381911@sss.pgh.pa.us
Backpatch-through: 13
pg_stat_statements anticipates that certain constant locations may be
recorded multiple times and attempts to avoid calculating a length for
these locations in fill_in_constant_lengths().
However, during generate_normalized_query() where normalized query
strings are generated, these locations are not excluded from
consideration. This could increment the parameter number counter for
every recorded occurrence at such a location, leading to an incorrect
normalization in certain cases with gaps in the numbers reported.
For example, take this query:
SELECT WHERE '1' IN ('2'::int, '3'::int::text)
Before this commit, it would be normalized like that, with gaps in the
parameter numbers:
SELECT WHERE $1 IN ($3::int, $4::int::text)
However the correct, less confusing one should be like that:
SELECT WHERE $1 IN ($2::int, $3::int::text)
This commit fixes the computation of the parameter numbers to track the
number of constants replaced with an $n by a separate counter instead of
the iterator used to loop through the list of locations.
The underlying query IDs are not changed, neither are the normalized
strings for existing PGSS hash entries. New entries with fresh
normalized queries would automatically get reshaped based on the new
parameter numbering.
Issue discovered while discussing a separate problem for HEAD, but this
affects all the stable branches.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0tzxvWXsacGyxrixdhy3tTTDfJQqxyFBRFh31nNHBQ5qA@mail.gmail.com
Backpatch-through: 13
As written, the test was throwing an error because of an unbalanced
parenthesis. The regex used in the test is adjusted to not fail and to
test the case of an opening parenthesis in a character class after some
nested square brackets.
Oversight in d46911e584d4.
Discussion: https://postgr.es/m/16ab039d1af455652bdf4173402ddda145f2c73b.camel@cybertec.at
The code that translates SIMILAR TO pattern matching expressions to
POSIX-style regular expressions did not consider that square brackets
can be nested. For example, in an expression like [[:alpha:]%_], the
logic replaced the placeholders '_' and '%' but it should not.
This commit fixes the conversion logic by tracking the nesting level of
square brackets marking character class areas, while considering that
in expressions like []] or [^]] the first closing square bracket is a
regular character. Multiple tests are added to show how the conversions
should or should not apply applied while in a character class area, with
specific cases added for all the characters converted outside character
classes like an opening parenthesis '(', dollar sign '$', etc.
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/16ab039d1af455652bdf4173402ddda145f2c73b.camel@cybertec.at
Backpatch-through: 13
The documentation for exported snapshots in logical decoding previously
stated that snapshot creation may fail on a hot standby. This is no longer
accurate, as snapshot exporting on standbys has been supported since
PostgreSQL 10. This commit removes the outdated description.
Additionally, the docs referred to the NOEXPORT_SNAPSHOT option to
suppress snapshot exporting in CREATE_REPLICATION_SLOT. However,
since PostgreSQL 15, NOEXPORT_SNAPSHOT is considered legacy syntax
and retained only for backward compatibility. This commit updates
the documentation for v15 and later to use the modern equivalent:
SNAPSHOT 'nothing'. The older syntax is preserved in documentation for
v14 and earlier.
Back-patched to all supported branches.
Reported-by: Kevin K Biju <kevinkbiju@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Kevin K Biju <kevinkbiju@gmail.com>
Discussion: https://postgr.es/m/174791480466.798.17122832105389395178@wrigleys.postgresql.org
Backpatch-through: 13
Invalid indexes are suffixed with "_ccnew" or "_ccold". The
documentation missed to mention the initial underscore.
ChooseRelationName() may also append an extra number if indexes with a
similar name already exist; let's add a note about that too.
Author: Alec Cozens <acozens@pixelpower.com>
Discussion: https://postgr.es/m/174733277404.1455388.11471370288789479593@wrigleys.postgresql.org
Backpatch-through: 13
In the grammar, <expr> is a c_expr, which accepts only a limited set
of simple constants and expressions without parens. The deparsing
logic didn't quite match the grammar rule, and failed to use parens
e.g. for "5::bigint".
To fix, always surround the expression with parens. Would be nice to
omit the parens in simple cases, but unfortunately it's non-trivial to
detect such simple cases. Even if the expression is a simple literal
123 in the original query, after parse analysis it becomes a FuncExpr
with COERCE_IMPLICIT_CAST rather than a simple Const.
Reported-by: yonghao lee
Backpatch-through: 13
Discussion: https://www.postgresql.org/message-id/18929-077d6b7093b176e2@postgresql.org
Prevent moving the confirmed_flush backwards, as this could lead to data
duplication issues caused by replicating already replicated changes.
This can happen when a client acknowledges an LSN it doesn't have to do
anything for, and thus didn't store persistently. After a restart, the
client can send the prior LSN that it stored persistently as an
acknowledgement, but we need to ignore such an LSN to avoid retreating
confirm_flush LSN.
Diagnosed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Author: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Tested-by: Nisha Moond <nisha.moond412@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAJpy0uDZ29P=BYB1JDWMCh-6wXaNqMwG1u1mB4=10Ly0x7HhwQ@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB57164AB5716AF2E477D53F6F9489A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Per the letter of the C11 standard, one must #define
__STDC_WANT_LIB_EXT1__ as 1 before including <string.h> in order to
have access to memset_s(). It appears that many platforms are lenient
about this, because we weren't doing it and yet the code appeared to
work anyway. But we now find that with -std=c11, macOS is strict and
doesn't declare memset_s, leading to compile failures since we try to
use it anyway. (Given the lack of prior reports, perhaps this is new
behavior in the latest SDK? No matter, we're clearly in the wrong.)
In addition to the immediate problem, which could be fixed merely by
adding the needed #define to explicit_bzero.c, it seems possible that
our configure-time probe for memset_s() could fail in case a platform
implements the function in some odd way due to this spec requirement.
This concern can be fixed in largely the same way that we dealt with
strchrnul() in 6da2ba1d8: switch to using a declaration-based
configure probe instead of a does-it-link probe.
Back-patch to v13 where we started using memset_s().
Reported-by: Lakshmi Narayana Velayudam <dev.narayana.v@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAA4pTnLcKGG78xeOjiBr5yS7ZeE-Rh=FaFQQGOO=nPzA1L8yEA@mail.gmail.com
Backpatch-through: 13
This aligns the copyright and legal notice wordig with commit
a233a603bab8 and pgweb commit 2d764dbc083ab8. Backpatch down
to all supported versions.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Dave Page <dpage@pgadmin.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/744E414E-3F52-404C-97FB-ED9B3AA37DC8@yesql.se
Backpatch-through: 13
With GB18030 as source encoding, applications could crash the server via
SQL functions convert() or convert_from(). Applications themselves
could crash after passing unterminated GB18030 input to libpq functions
PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or
PQescapeString(). Extension code could crash by passing unterminated
GB18030 input to jsonapi.h functions. All those functions have been
intended to handle untrusted, unterminated input safely.
A crash required allocating the input such that the last byte of the
allocation was the last byte of a virtual memory page. Some malloc()
implementations take measures against that, making the SIGSEGV hard to
reach. Back-patch to v13 (all supported versions).
Author: Noah Misch <noah@leadboat.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207
Start the file with static functions not specific to pe_test_vectors
tests. This way, new tests can use them without disrupting the file's
layout. Change report_result() PQExpBuffer arguments to plain strings.
Back-patch to v13 (all supported versions), for the next commit.
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207