61052 Commits

Author SHA1 Message Date
Tomas Vondra
1aff1dc8df Revert "Improve accounting for memory used by shared hash tables"
This reverts commit f5930f9a98ea65d659d41600a138e608988ad122.

This broke the expansion of private hash tables, which reallocates the
directory. But that's impossible when it's allocated together with the
other fields, and dir_realloc() failed with BogusFree. Clearly, this
needs rethinking.

Discussion: https://postgr.es/m/CAApHDvriCiNkm=v521AP6PKPfyWkJ++jqZ9eqX4cXnhxLv8w-A@mail.gmail.com
2025-04-04 04:43:50 +02:00
Amit Langote
88f55bc976 Make derived clause lookup in EquivalenceClass more efficient
Derived clauses are stored in ec_derives, a List of RestrictInfos.
These clauses are later looked up by matching the left and right
EquivalenceMembers along with the clause's parent EC.

This linear search becomes expensive in queries with many joins or
partitions, where ec_derives may contain thousands of entries. In
particular, create_join_clause() can spend significant time scanning
this list.

To improve performance, introduce a hash table (ec_derives_hash) that
is built when the list reaches 32 entries -- the same threshold used
for join_rel_hash. The original list is retained alongside the hash
table to support EC merging and serialization
(_outEquivalenceClass()).

Each clause is stored in the hash table using a canonicalized key: the
EquivalenceMember with the lower memory address is placed in the key
before the one with the higher memory address. This avoids storing or
searching for both permutations of the same clause. For clauses
involving a constant EM, the key places NULL in the first slot and the
non-constant EM in the second.

The hash table is initialized using list_length(ec_derives_list) as
the size hint. simplehash internally adjusts this to the next power of
two after dividing by the fillfactor, so this typically results in at
least 64 buckets near the threshold -- avoiding immediate resizing
while adapting to the actual number of entries.

The lookup logic for derived clauses is now centralized in
ec_search_derived_clause_for_ems(), which consults the hash table when
available and falls back to the list otherwise.

The new ec_clear_derived_clauses() always frees ec_derives_list, even
though some of the original code paths that cleared the old
ec_derives field did not. This ensures consistent cleanup and avoids
leaking memory when large lists are discarded.

An assertion originally placed in find_derived_clause_for_ec_member()
is moved into ec_search_derived_clause_for_ems() so that it is
enforced consistently, regardless of whether the hash table or list is
used for lookup.

This design incorporates suggestions by David Rowley, who proposed
both the key canonicalization and the initial sizing approach to
balance memory usage and CPU efficiency.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Tested-by: Amit Langote <amitlangote09@gmail.com>
Tested-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAExHW5vZiQtWU6moszLP5iZ8gLX_ZAUbgEX0DxGLx9PGWCtqUg@mail.gmail.com
2025-04-04 10:45:05 +09:00
Amit Langote
887160d1be Add assertion to verify derived clause has constant RHS
find_derived_clause_for_ec_member() searches for a previously-derived
clause that equates a non-constant EquivalenceMember to a constant.
It is only called for EquivalenceClasses with ec_has_const set, and
with a non-constant member the EquivalenceMember to search for.

The matched clause is expected to have the non-constant member on the
left-hand side and the constant EquivalenceMember on the right.

Assert that the RHS is indeed a constant, to catch violations of this
structure and enforce assumptions made by
generate_base_implied_equalities_const().

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CAExHW5scMxyFRqOFE6ODmBiW2rnVBEmeEcA-p4W_CyuEikURdA@mail.gmail.com
2025-04-04 10:45:05 +09:00
Melanie Plageman
67be093562 Use AIO batchmode for bitmap heap scans
Previously bitmap heap scan was not AIO batchmode safe because of the
visibility map reads potentially done for the "skip fetch" optimization
(which skipped fetching tuples from the heap if the pages were all
visible and none of the columns were used in the query).

The skip fetch optimization implementation was found to have bugs and
was removed in 459e7bf8e2f8, so we can safely enable batchmode for
bitmap heap scans.
2025-04-03 18:23:02 -04:00
Melanie Plageman
54a3615f15 Remove misleading read stream asserts in a few users
Several read stream users asserted that the read stream was exhausted
after looping on that very condition. It was pointed out in an a
review of an as-of-yet uncommitted read stream user [1] that this was
confusing and could lead the reader to think there was a possibility of
some kind of race condition. Remove these asserts.

[1] https://postgr.es/m/F9ACE8D0-B807-4A17-B6BD-87EF0717983D%40yesql.se
2025-04-03 18:22:37 -04:00
Tom Lane
dbd437e670 Fix oversight in commit 0dca5d68d.
As coded, fmgr_sql() would get an assertion failure for a SQL function
that has an empty body and is declared to return some type other than
VOID.  Typically you'd never get that far because fmgr_sql_validator()
would reject such a definition (I suspect that's how come I managed to
miss the bug).  But if check_function_bodies is off or the function is
polymorphic, the validation check wouldn't get made.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/0fde377a-3870-4d18-946a-ce008ee5bb88@gmail.com
2025-04-03 16:03:12 -04:00
Daniel Gustafsson
46c4c7cbc6 oauth: Remove timeout from t/002_client when not needed
The connect_timeout=1 setting for the --hang-forever test was left in
place and used by later tests, causing unexpected timeouts on slower
buildfarm animals. Remove it when no longer needed.

Per buildfarm member skink, reported by Andres on Discord.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Andres Freund <andres@anarazel.de>
2025-04-03 20:41:09 +02:00
Daniel Gustafsson
8ae0a37932 oauth: Fix build on platforms without epoll/kqueue
register_socket() missed a variable declaration if neither
HAVE_SYS_EPOLL_H nor HAVE_SYS_EVENT_H was defined.

While we're fixing that, adjust the tests to check pg_config.h for one
of the multiplexer implementations, rather than assuming that Windows is
the only platform without support. (Christoph reported this on
hurd-amd64, an experimental Debian.)

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/Z-sPFl27Y0ZC-VBl%40msg.df7cb.de
2025-04-03 20:37:52 +02:00
Jeff Davis
945126234b Fix unintentional 'NULL' string literal in pg_upgrade.
Introduced in 2a083ab807.

Discussion: https://postgr.es/m/e852442da35b4f31acc600ed98bbee0f12e65e0c.camel@j-davis.com
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2025-04-03 11:04:37 -07:00
Jeff Davis
b81ffa13e3 pg_upgrade check for Unicode-dependent relations.
This check will not cause an upgrade failure, only a warning.

Discussion: https://postgr.es/m/ef03d678b39a64392f4b12e0f59d1495c740969e.camel%40j-davis.com
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
2025-04-03 10:45:38 -07:00
Masahiko Sawada
fd09c1316b Restrict copying of invalidated replication slots.
Previously, invalidated logical and physical replication slots could
be copied using the pg_copy_logical_replication_slot and
pg_copy_physical_replication_slot functions. Replication slots that
were invalidated for reasons other than WAL removal retained their
restart_lsn. This meant that a new slot copied from an invalidated
slot could have a restart_lsn pointing to a WAL segment that might
have already been removed.

This commit restricts the copying of invalidated replication slots.

Backpatch to v16, where slots could retain their restart_lsn when
invalidated for reasons other than WAL removal.

For v15 and earlier, this check is not required since slots can only
be invalidated due to WAL removal, and existing checks already handle
this issue.

Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CANhcyEU65aH0VYnLiu%3DOhNNxhnhNhwcXBeT-jvRe1OiJTo_Ayg%40mail.gmail.com
Backpatch-through: 16
2025-04-03 10:30:00 -07:00
Álvaro Herrera
f104192e52
Remove duplicate set of print_notnull
I inserted the second one by mistake in commit 14e87ffa5c54.

Reported-by: jian he <jian.universality@gmail.com>
Confirmed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CACJufxFqckBFxPfCixHHbOr0zMLksviTj2m3o12-tErfx_PvTg@mail.gmail.com
2025-04-03 17:34:25 +02:00
Daniel Gustafsson
b82e7eddb0 Add missing declarations to pg_config.h.in
Add missing pg_config.h.in declarations from 09be39112654
where the corresponding autoconf/meson declarations were
added.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/70145721-6949-4ABF-BB54-63F866488DF8@yesql.se
2025-04-03 13:57:27 +02:00
Daniel Gustafsson
2da74d8d64 libpq: Add support for dumping SSL key material to file
This adds a new connection parameter which instructs libpq to
write out keymaterial clientside into a file in order to make
connection debugging with Wireshark and similar tools possible.
The file format used is the standardized NSS format.

Author: Abhishek Chanda <abhishek.becs@gmail.com>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/CAKiP-K85C8uQbzXKWf5wHQPkuygGUGcufke713iHmYWOe9q2dA@mail.gmail.com
2025-04-03 13:16:43 +02:00
Heikki Linnakangas
e4309f73f6 Add support for sorted gist index builds to btree_gist
This enables sortsupport in the btree_gist extension for faster builds
of gist indexes.

Sorted gist index build strategy is the new default now. Regression
tests are unchanged (except for one small change in the 'enum' test to
add coverage for enum values added later) and are using the sorted
build strategy instead.

One version of this was committed a long time ago already, in commit
9f984ba6d2, but it was quickly reverted because of buildfarm
failures. The failures were presumably caused by some small bugs, but
we never got around to debug and commit it again. This patch was
written from scratch, implementing the same idea, with some fragments
and ideas from the original patch.

Author: Bernd Helmle <mailings@oopsware.de>
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://www.postgresql.org/message-id/64d324ce2a6d535d3f0f3baeeea7b25beff82ce4.camel@oopsware.de
2025-04-03 13:46:35 +03:00
Heikki Linnakangas
9370978da8 Fix boilerplate comments in btree_gist
A few of these were copy-pasted wrong, like the comment "Bytea ops" in
btree_numeric.c. Instead of fixing the incorrect ones, replace them
all with generic comment "GiST support functions".

Also tidy up the inconsistent newlines between various functions while
we're at it.
2025-04-03 13:39:33 +03:00
Peter Eisentraut
82a46cca99 Update Unicode data to Unicode 16.0.0
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/flat/146349e4-4687-4321-91af-f235572490a8@eisentraut.org
2025-04-03 12:00:09 +02:00
Peter Eisentraut
231064aa0f plpython: Add test for returning Python set from SETOF function
This is claimed in the documentation but there was a no test case for
it.

Reported-by: Bogdan Grigorenko <gri.bogdan.2020@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/173543330569.680.6706329879058172623%40wrigleys.postgresql.org
2025-04-03 11:09:50 +02:00
Amit Kapila
d1d83827ba Doc: Improve -R option added in e5aeed4b80.
Author: Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAHut+PvJPnaL=70SbBe3fYg2nq74Z=Yv4X=zRpUWYfOi-q6=2w@mail.gmail.com
2025-04-03 14:27:13 +05:30
Álvaro Herrera
8806e4e8de
002_pg_upgrade.pl: Move pg_dump test code for better stability
The alleged "statistics pg_dump bug" that prevented us from enabling
stats dumping in commit 172259afb563 wasn't a pg_dump bug after all: it
was just a side effect of not running pg_dump at the right time (namely,
before giving autovacuum some time to do its thing and then disabling it
to stabilize things).  Move the code around to fix this problem and
enable statistics dumping.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Diagnosed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/5f3703fd7f27da62a8f3615218f937507f522347.camel@j-davis.com
Discussion: https://postgr.es/m/CAExHW5sDm+aGb7A4EXK=X9rkrmSPDgc03EdADt=wWkdMO=XPSA@mail.gmail.com
2025-04-03 10:16:24 +02:00
Álvaro Herrera
abe56227b2
002_pg_upgrade.pl: rename some variables for clarity
This renames %node_params to %old_node_params, @initdb_params to
@old_initdb_params, and adds separate @new_initdb_params and
%new_node_params rather than reusing the former in confusing ways.

Extracted from a larger patch from the same author.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5sDm+aGb7A4EXK=X9rkrmSPDgc03EdADt=wWkdMO=XPSA@mail.gmail.com
2025-04-03 09:56:58 +02:00
Richard Guo
ea5d3f5233 Remove duplicated comment in get_relation_constraints
The check for non-inheritable constraints is performed later, and the
same comment is included at that point.

While we're here, remove one extraneous blank line.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CACJufxETi6x86S8EkH8mRfOcm2AenoE9t1pyCFVMpU34gVhF3w@mail.gmail.com
2025-04-03 16:43:53 +09:00
Peter Eisentraut
84fea854c9 Update Unicode data to CLDR 47
No actual changes result.
2025-04-03 09:20:25 +02:00
Peter Eisentraut
bbf24fe2f1 Update code comment
Commit 4e7f62bc386 added a new input file to a script but didn't
update the comment listing the input files.
2025-04-03 09:20:25 +02:00
Peter Eisentraut
34f04aa653 Fix update-unicode make target
The addition of SpecialCasing.txt by commit 286a365b9c2 was not added
to the make target dependencies, so the invoked script would fail
because the required file wasn't downloaded first.  (The meson version
appears to work correctly.)
2025-04-03 09:20:25 +02:00
Amit Kapila
4868c96bc8 Fix slot synchronization for two_phase enabled slots.
The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To address the issue on HEAD, the two_phase_at field of the slot is
exposed by the pg_replication_slots view and allows the slot
synchronization to copy this value to the corresponding synced slot on the
standby server.

This bug is likely to occur if the user toggles the two_phase option to
true after initial slot creation. Given that altering the two_phase option
of a replication slot is not allowed in PostgreSQL 17, this bug is less
likely to occur. We can't change the view/function definition in
backbranch so we can't push the same fix but we are brainstorming an
appropriate solution for PG17.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/TYAPR01MB5724CC7C288535BBCEEE65DA94A72@TYAPR01MB5724.jpnprd01.prod.outlook.com
2025-04-03 12:26:54 +05:30
Tom Lane
a7187c3723 Remove unnecessary type violation in tsvectorrecv().
compareentry() is declared to work on WordEntryIN structs, but
tsvectorrecv() is using it in two places to work on WordEntry
structs.  This is almost okay, since WordEntry is the first
field of WordEntryIN.  But on machines with 8-byte pointers,
WordEntryIN will have a larger alignment spec than WordEntry,
and it's at least theoretically possible that the compiler
could generate code that depends on the larger alignment.

Given the lack of field reports, this may be just a hypothetical bug
that upsets nothing except sanitizer tools.  Or it may be real on
certain hardware but nobody's tried to use tsvectorrecv() on such
hardware.  In any case we should fix it, and the fix is trivial:
just change compareentry() so that it works on WordEntry without any
mention of WordEntryIN.  We can also get rid of the quite-useless
intermediate function WordEntryCMP.

Bug: #18875
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18875-07a29c49c825a608@postgresql.org
Backpatch-through: 13
2025-04-02 16:17:43 -04:00
Andres Freund
24da5b239a Add test for HeapBitmapScan's broken skip_fetch optimization
In the previous commit HeapBitmapScan's skip_fetch optimization was removed,
due to being broken in not easily fixable ways. Add a test that verifies we
don't re-introduce this bug if somebody tries to re-add the feature.

Only add the test to master for now, it's possible it's not entirely
stable. That seems sufficient, as we're not going to re-introduce the feature
on the backbranches. I did verify that the test passes on all branches. If the
test turns out to be unproblematic, we can backpatch it later, should we feel
a need to do so.

Discussion: https://postgr.es/m/CAEze2Wg3gXXZTr6_rwC+s4-o2ZVFB5F985uUSgJTsECx6AmGcQ@mail.gmail.com
2025-04-02 14:58:39 -04:00
Andres Freund
459e7bf8e2 Remove HeapBitmapScan's skip_fetch optimization
The optimization does not take the removal of TIDs by a concurrent vacuum into
account. The concurrent vacuum can remove dead TIDs and make pages ALL_VISIBLE
while those dead TIDs are referenced in the bitmap. This can lead to a
skip_fetch scan returning too many tuples.

It likely would be possible to implement this optimization safely, but we
don't have the necessary infrastructure in place. Nor is it clear that it's
worth building that infrastructure, given how limited the skip_fetch
optimization is.

In the backbranches we just disable the optimization by always passing
need_tuples=true to table_beginscan_bm(). We can't perform API/ABI changes in
the backbranches and we want to make the change as minimal as possible.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reported-By: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/CAEze2Wg3gXXZTr6_rwC+s4-o2ZVFB5F985uUSgJTsECx6AmGcQ@mail.gmail.com
Backpatch-through: 13
2025-04-02 14:54:20 -04:00
Tom Lane
0dca5d68d7 Change SQL-language functions to use the plan cache.
In the historical implementation of SQL functions (if they don't get
inlined), we built plans for all the contained queries at first call
within an outer query, and then re-used those plans for the duration
of the outer query, and then forgot everything.  This was not ideal,
not least because the plans could not be customized to specific values
of the function's parameters.  Our plancache infrastructure seems
mature enough to be used here.  That will solve both the problem with
not being able to build custom plans and the problem with not being
able to share work across successive outer queries.

Aside from those performance concerns, this change fixes a
longstanding bugaboo with SQL functions: you could not write DDL that
would affect later statements in the same function.  That's mostly
still true with new-style SQL functions, since the results of parse
analysis are baked into the stored query trees (and protected by
dependency records).  But for old-style SQL functions, it will now
work much as it does with PL/pgSQL functions, because we delay parse
analysis and planning of each query until we're ready to run it.
Some edge cases that require replanning are now handled better too;
see for example the new rowsecurity test, where we now detect an RLS
context change that was previously missed.

One other edge-case change that might be worthy of a release note
is that we now insist that a SQL function's result be generated
by the physically-last query within it.  Previously, if the last
original query was deleted by a DO INSTEAD NOTHING rule, we'd be
willing to take the result from the preceding query instead.
This behavior was undocumented except in source-code comments,
and it seems hard to believe that anyone's relying on it.

Along the way to this feature, we needed a few infrastructure changes:

* The plancache can now take either a raw parse tree or an
analyzed-but-not-rewritten Query as the starting point for a
CachedPlanSource.  If given a Query, it is caller's responsibility
that nothing will happen to invalidate that form of the query.
We use this for new-style SQL functions, where what's in pg_proc is
serialized Query(s) and we trust the dependency mechanism to disallow
DDL that would break those.

* The plancache now offers a way to invoke a post-rewrite callback
to examine/modify the rewritten parse tree when it is rebuilding
the parse trees after a cache invalidation.  We need this because
SQL functions sometimes adjust the parse tree to make its output
exactly match the declared result type; if the plan gets rebuilt,
that has to be re-done.

* There is a new backend module utils/cache/funccache.c that
abstracts the idea of caching data about a specific function
usage (a particular function and set of input data types).
The code in it is moved almost verbatim from PL/pgSQL, which
has done that for a long time.  We use that logic now for
SQL-language functions too, and maybe other PLs will have use
for it in the future.

Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://postgr.es/m/8216639.NyiUUSuA9g@aivenlaptop
2025-04-02 14:06:02 -04:00
Heikki Linnakangas
e9e7b66044 Add GiST and btree sortsupport routines for range types
For GiST, having a sortsupport function allows building the index
using the "sorted build" method, which is much faster.

For b-tree, the sortsupport routine doesn't give any new
functionality, but speeds up sorting a tiny bit. The difference is not
very significant, about 2% in cursory testing on my laptop, because
the range type comparison function has quite a lot of overhead from
detoasting. In any case, since we have the function for GiST anyway,
we might as well register it for the btree opfamily too.

Author: Bernd Helmle <mailings@oopsware.de>
Discussion: https://www.postgresql.org/message-id/64d324ce2a6d535d3f0f3baeeea7b25beff82ce4.camel@oopsware.de
2025-04-02 19:51:28 +03:00
Heikki Linnakangas
ea3f9b6da3 docs: Fix column count attribute in table
Nothing seems to actually depend on the attribute, as the docs built
successfully, but let's be tidy.

Reported offlist by Matthias van de Meent
2025-04-02 18:21:07 +03:00
Tomas Vondra
46df9487d9 Improve accounting for PredXactList, RWConflictPool and PGPROC
Various places allocated shared memory by first allocating a small chunk
using ShmemInitStruct(), followed by ShmemAlloc() calls to allocate more
memory. Unfortunately, ShmemAlloc() does not update ShmemIndex, so this
affected pg_shmem_allocations - it only shown the initial chunk.

This commit modifies the following allocations, to allocate everything
as a single chunk, and then split it internally.

- PredXactList
- RWConflictPool
- PGPROC structures
- Fast-Path Lock Array

The fast-path lock array is allocated separately, not as a part of the
PGPROC structures allocation.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2L28vHzRankszhqz7deXURxKncxfirnuW68zD7+hVAqaS5GQ@mail.gmail.com
2025-04-02 17:14:28 +02:00
Tomas Vondra
f5930f9a98 Improve accounting for memory used by shared hash tables
pg_shmem_allocations tracks the memory allocated by ShmemInitStruct(),
but for shared hash tables that covered only the header and hash
directory.  The remaining parts (segments and buckets) were allocated
later using ShmemAlloc(), which does not update the shmem accounting.
Thus, these allocations were not shown in pg_shmem_allocations.

This commit improves the situation by allocating all the hash table
parts at once, using a single ShmemInitStruct() call. This way the
ShmemIndex entries (and thus pg_shmem_allocations) better reflect the
proper size of the hash table.

This affects allocations for private (non-shared) hash tables too, as
the hash_create() code is shared. For non-shared tables this however
makes no practical difference.

This changes the alignment a bit. ShmemAlloc() aligns the chunks using
CACHELINEALIGN(), which means some parts (header, directory, segments)
were aligned this way. Allocating all parts as a single chunk removes
this (implicit) alignment. We've considered adding explicit alignment,
but we've decided not to - it seems to be merely a coincidence due to
using the ShmemAlloc() API, not due to necessity.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2L28vHzRankszhqz7deXURxKncxfirnuW68zD7+hVAqaS5GQ@mail.gmail.com
2025-04-02 17:14:28 +02:00
Tom Lane
bd178960c6 Need to do CommandCounterIncrement after StoreAttrMissingVal.
Without this, an additional change to the same pg_attribute row
within the same command will fail.  This is possible at least with
ALTER TABLE ADD COLUMN on a multiple-inheritance-pathway structure.
(Another potential hazard is that immediately-following operations
might not see the missingval.)

Introduced by 95f650674, which split the former coding that
used a single pg_attribute update to change both atthasdef and
atthasmissing/attmissingval into two updates, but missed that
this should entail two CommandCounterIncrements as well.  Like
that fix, back-patch through v13.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/025a3ffa-5eff-4a88-97fb-8f583b015965@gmail.com
Backpatch-through: 13
2025-04-02 11:13:01 -04:00
Heikki Linnakangas
b05751220b docs: Add a new section and a table listing protocol versions
Move the discussion on protocol versions and version negotiation to a
new "Protocol versions" section. Add a table listing all the different
protocol versions, starting from the obsolete protocol version 2, and
the PostgreSQL versions that support each.

Discussion: https://www.postgresql.org/message-id/69f53970-1d55-4165-9151-6fb524e36af9@iki.fi
2025-04-02 16:41:51 +03:00
Heikki Linnakangas
a460251f0a Make cancel request keys longer
Currently, the cancel request key is a 32-bit token, which isn't very
much entropy. If you want to cancel another session's query, you can
brute-force it. In most environments, an unauthorized cancellation of
a query isn't very serious, but it nevertheless would be nice to have
more protection from it. Hence make the key longer, to make it harder
to guess.

The longer cancellation keys are generated when using the new protocol
version 3.2. For connections using version 3.0, short 4-bytes keys are
still used.

The new longer key length is not hardcoded in the protocol anymore,
the client is expected to deal with variable length keys, up to 256
bytes. This flexibility allows e.g. a connection pooler to add more
information to the cancel key, which might be useful for finding the
connection.

Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Discussion: https://www.postgresql.org/message-id/508d0505-8b7a-4864-a681-e7e5edfe32aa@iki.fi
2025-04-02 16:41:48 +03:00
Heikki Linnakangas
285613c60a libpq: Add min/max_protocol_version connection options
All supported version of the PostgreSQL server send the
NegotiateProtocolVersion message when an unsupported minor protocol
version is requested by a client. But many other applications that
implement the PostgreSQL protocol (connection poolers, or other
databases) do not, and the same is true for PostgreSQL server versions
older than 9.3. Connecting to such other applications thus fails if a
client requests a protocol version different than 3.0.

This patch adds a max_protocol_version connection option to libpq that
specifies the protocol version that libpq should request from the
server. Currently only 3.0 is supported, but that will change in a
future commit that bumps the protocol version. Even after that version
bump the default will likely stay 3.0 for the time being. Once more of
the ecosystem supports the NegotiateProtocolVersion message we might
want to change the default to the latest minor version.

This also adds the similar min_protocol_version connection option, to
allow the client to specify that connecting should fail if a lower
protocol version is attempted by the server. This can be used to
ensure that certain protocol features are used, which can be
particularly useful if those features impact security.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Discussion: https://www.postgresql.org/message-id/CAGECzQTfc_O%2BHXqAo5_-xG4r3EFVsTefUeQzSvhEyyLDba-O9w@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAGECzQRbAGqJnnJJxTdKewTsNOovUt4bsx3NFfofz3m2j-t7tA@mail.gmail.com
2025-04-02 16:41:45 +03:00
Heikki Linnakangas
5070349102 libpq: Handle NegotiateProtocolVersion message differently
Previously libpq would always error out if the server sends a
NegotiateProtocolVersion message. This was fine because libpq only
supported a single protocol version and did not support any protocol
parameters. But in the upcoming commits, we will introduce a new
protocol version and the NegotiateProtocolVersion message starts to
actually be used.

This patch modifies the client side checks to allow a range of
supported protocol versions, instead of only allowing the exact
version that was requested. Currently this "range" only contains the
3.0 version, but in a future commit we'll change this.

Also clarify the error messages, making them suitable for the world
where libpq will support multiple protocol versions and protocol
extensions.

Note that until the later commits that introduce new protocol version,
this change does not have any behavioural effect, because libpq will
only request version 3.0 and will never send protocol parameters, and
therefore will never receive a NegotiateProtocolVersion message from
the server.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Discussion: https://www.postgresql.org/message-id/CAGECzQTfc_O%2BHXqAo5_-xG4r3EFVsTefUeQzSvhEyyLDba-O9w@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAGECzQRbAGqJnnJJxTdKewTsNOovUt4bsx3NFfofz3m2j-t7tA@mail.gmail.com
2025-04-02 16:41:42 +03:00
Peter Eisentraut
748e98d05b Fix code comment
The changes made in commit d2b4b4c2259 contained incorrect comments:
They said that certain forward declarations were necessary to "avoid
including pathnodes.h here", but the file is itself pathnodes.h!  So
change the comment to just say it's a forward declaration in one case,
and in the other case we don't need the declaration at all because it
already appeared earlier in the file.
2025-04-02 14:46:47 +02:00
Heikki Linnakangas
09be391126 Add timingsafe_bcmp(), for constant-time memory comparison
timingsafe_bcmp() should be used instead of memcmp() or a naive
for-loop, when comparing passwords or secret tokens, to avoid leaking
information about the secret token by timing. This commit just
introduces the function but does not change any existing code to use
it yet.

Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/7b86da3b-9356-4e50-aa1b-56570825e234@iki.fi
2025-04-02 15:32:40 +03:00
Heikki Linnakangas
85d799ba8a docs: Update phrase on message lengths in the protocol
The reasoning for why all the message formats are parseable without
the explicit message length field is anachronistic; the real reason is
that protocol version 2 did not have a message length field. There's
nothing wrong with relying on the message length, like we do in the
CopyData messags, even though it often still makes sense to have
length fields for individual parts in messages.

Discussion: https://www.postgresql.org/message-id/02a4eed2-98f0-4796-9d4f-12128ff44fe0@iki.fi
2025-04-02 15:32:33 +03:00
Andres Freund
a6285b150a tests: Fix incompatibility of test_aio with *_FORCE_RELEASE
The test added in 93bc3d75d8e failed in a build with RELCACHE_FORCE_RELEASE
and CATCACHE_FORCE_RELEASE defined. The test intentionally forgets to exit
batchmode - normally that would trigger an error at the end of the
transaction, which the test verifies.  However, with RELCACHE_FORCE_RELEASE
and CATCACHE_FORCE_RELEASE defined, we get other code (output function lookup)
entering batchmode and erroring out because batchmode isn't allowed to be
entered recursively.

Fix that by changing the queries in question to not output any rows. That's
not exactly pretty, but seems to avoid the problem reliably.

Eventually we might want to make RELCACHE_FORCE_RELEASE and
CATCACHE_FORCE_RELEASE GUCs, so we can disable them where necessary - this
isn't the first test having difficulty with those debug options. But that's
for later.

Per buildfarm member prion.

Discussion: https://postgr.es/m/uc62i6vi5gd4bi6wtjj5poadqxolgy55e7ihkmf3mthjegb6zl@zqo7xez7sc2r
2025-04-02 07:57:11 -04:00
Andres Freund
43dca8a116 tests: Cope with WARNINGs during failed CREATE DB on windows
The test added in 93bc3d75d8e sometimes fails on windows, due to warnings like
WARNING:  some useless files may be left behind in old database directory "base/16514"

The reason for that is createdb_failure_callback() does not ensure that there
are no open file descriptors for files in the partially created,
to-be-dropped, database. We do take care in dropdb(), but that involves
waiting for checkpoints and a ProcSignalBarrier, which we probably don't want
to do in an error callback.  This should probably be fixed one day, but for
now 001_aio.pl needs to cope.

Per buildfarm animals fairywren and drongo.

Discussion: https://postgr.es/m/uc62i6vi5gd4bi6wtjj5poadqxolgy55e7ihkmf3mthjegb6zl@zqo7xez7sc2r
2025-04-02 07:51:48 -04:00
Peter Eisentraut
eec0040c4b Add support for NOT ENFORCED in foreign key constraints
This expands the NOT ENFORCED constraint flag, previously only
supported for CHECK constraints (commit ca87c415e2f), to foreign key
constraints.

Normally, when a foreign key constraint is created on a table, action
and check triggers are added to maintain data integrity.  With this
patch, if a constraint is marked as NOT ENFORCED, integrity checks are
no longer required, making these triggers unnecessary.  Consequently,
when creating a NOT ENFORCED foreign key constraint, triggers will not
be created, and the constraint will be marked as NOT VALID.
Similarly, if an existing foreign key constraint is changed to NOT
ENFORCED, the associated triggers will be dropped, and the constraint
will also be marked as NOT VALID.  Conversely, if a NOT ENFORCED
foreign key constraint is changed to ENFORCED, the necessary triggers
will be created, and the will be changed to VALID by performing
necessary validation.

Since not-enforced foreign key constraints have no triggers, the
shortcut used for example in psql and pg_dump to skip looking for
foreign keys if the relation is known not to have triggers no longer
applies.  (It already didn't work for partitioned tables.)

Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Tested-by: Triveni N <triveni.n@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
2025-04-02 13:36:44 +02:00
Andres Freund
327d987df1 tests: Cope with io_method in TEMP_CONFIG in test_aio
If io_method is set in TEMP_CONFIG the test added in 93bc3d75d8e fails,
because it assumes the io_method specified at initdb is actually used.

Fix that by appending the io_method again, after initdb (and thus after
TEMP_CONFIG has been added by Cluster.pm).

Per buildfarm animal bumblebee

Discussion: https://postgr.es/m/zh5u22wbpcyfw2ddl3lsvmsxf4yvsrvgxqwwmfjddc4c2khsgp@gfysyjsaelr5
2025-04-02 07:00:40 -04:00
Alexander Korotkov
bc22dc0e0d Get rid of WALBufMappingLock
Allow multiple backends to initialize WAL buffers concurrently.  This way
`MemSet((char *) NewPage, 0, XLOG_BLCKSZ);` can run in parallel without
taking a single LWLock in exclusive mode.

The new algorithm works as follows:
 * reserve a page for initialization using XLogCtl->InitializeReserved,
 * ensure the page is written out,
 * once the page is initialized, try to advance XLogCtl->InitializedUpTo and
   signal to waiters using XLogCtl->InitializedUpToCondVar condition
   variable,
 * repeat previous steps until we reserve initialization up to the target
   WAL position,
 * wait until concurrent initialization finishes using a
   XLogCtl->InitializedUpToCondVar.

Now, multiple backends can, in parallel, concurrently reserve pages,
initialize them, and advance XLogCtl->InitializedUpTo to point to the latest
initialized page.

Author: Yura Sokolov <y.sokolov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Tested-by: Michael Paquier <michael@paquier.xyz>
2025-04-02 12:44:24 +03:00
Fujii Masao
b53b88109f Improve error message when standby does accept connections.
Even after reaching the minimum recovery point, if there are long-lived
write transactions with 64 subtransactions on the primary, the recovery
snapshot may not yet be ready for hot standby, delaying read-only
connections on the standby. Previously, when read-only connections were
not accepted due to this condition, the following error message was logged:

    FATAL:  the database system is not yet accepting connections
    DETAIL:  Consistent recovery state has not been yet reached.

This DETAIL message was misleading because the following message was
already logged in this case:

    LOG:  consistent recovery state reached

This contradiction, i.e., indicating that the recovery state was consistent
while also stating it wasn’t, caused confusion.

This commit improves the error message to better reflect the actual state:

    FATAL: the database system is not yet accepting connections
    DETAIL: Recovery snapshot is not yet ready for hot standby.
    HINT: To enable hot standby, close write transactions with more than 64 subtransactions on the primary server.

To implement this, the commit introduces a new postmaster signal,
PMSIGNAL_RECOVERY_CONSISTENT. When the startup process reaches
a consistent recovery state, it sends this signal to the postmaster,
allowing it to correctly recognize that state.

Since this is not a clear bug, the change is applied only to the master
branch and is not back-patched.

Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/02db8cd8e1f527a8b999b94a4bee3165@oss.nttdata.com
2025-04-02 15:13:01 +09:00
David Rowley
121d774cae Doc: add information about partition locking
The documentation around locking of partitions for the executor startup
phase of run-time partition pruning wasn't clear about which partitions
were being locked.  Fix that.

Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAApHDvp738G75HfkKcfXaf3a8s%3D6mmtOLh46tMD0D2hAo1UCzA%40mail.gmail.com
Backpatch-through: 13
2025-04-02 14:02:44 +13:00
Melanie Plageman
b3219c69fc aio: Add errcontext for processing I/Os for another backend
Push an ErrorContextCallback adding additional detail about the process
performing the I/O and the owner of the I/O when those are not the same.

For io_method worker, this adds context specifying which process owns
the I/O that the I/O worker is processing.

For io_method io_uring, this adds context only when a backend is
*completing* I/O for another backend. It specifies the pid of the owning
process.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/rdml3fpukrqnas7qc5uimtl2fyytrnu6ymc2vjf2zuflbsjuul%40hyizyjsexwmm
2025-04-01 19:53:07 -04:00