51efe38cb92f introduced bunch of tests for idle_in_transaction_session_timeout,
transaction_timeout and statement_timeout. These tests were too flaky on some
slow buildfarm machines, so we plan to replace them with TAP tests using
injection points. This commit removes flaky tests.
Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
Author: Andrey Borodin
This commit introduces a new field 'sublevels_up' in ReplaceVarnoContext,
and enhances replace_varno_walker() to:
1) recurse into subselects with sublevels_up increased, and
2) perform the replacement only when varlevelsup is equal to sublevels_up.
This commit also fixes some outdated comments. And besides adding relevant
test cases, it makes some unification over existing SJE test cases.
Discussion: https://postgr.es/m/CAMbWs4-%3DPO6Mm9gNnySbx0VHyXjgnnYYwbN9dth%3DTLQweZ-M%2Bg%40mail.gmail.com
Author: Richard Guo
Reviewed-by: Andrei Lepikhov, Alexander Korotkov
build_child_join_sjinfo creates a derived SpecialJoinInfo in
the short-lived GEQO context, but afterwards the semi_rhs_exprs
from that may be used in a UniquePath for a child base relation.
This breaks the expectation that all base-relation-level structures
are in the planning-lifespan context, leading to use of a dangling
pointer with probable ensuing crash later on in create_unique_plan.
To fix, copy the expression trees when making a UniquePath.
Per bug #18360 from Alexander Lakhin. This has been broken since
partitionwise joins were added, so back-patch to all supported
branches.
Discussion: https://postgr.es/m/18360-a23caf3157f34e62@postgresql.org
The new facility makes it easier to optimize bulk loading, as the
logic for buffering, WAL-logging, and syncing the relation only needs
to be implemented once. It's also less error-prone: We have had a
number of bugs in how a relation is fsync'd - or not - at the end of a
bulk loading operation. By centralizing that logic to one place, we
only need to write it correctly once.
The new facility is faster for small relations: Instead of of calling
smgrimmedsync(), we register the fsync to happen at next checkpoint,
which avoids the fsync latency. That can make a big difference if you
are e.g. restoring a schema-only dump with lots of relations.
It is also slightly more efficient with large relations, as the WAL
logging is performed multiple pages at a time. That avoids some WAL
header overhead. The sorted GiST index build did that already, this
moves the buffering to the new facility.
The changes to pageinspect GiST test needs an explanation: Before this
patch, the sorted GiST index build set the LSN on every page to the
special GistBuildLSN value, not the LSN of the WAL record, even though
they were WAL-logged. There was no particular need for it, it just
happened naturally when we wrote out the pages before WAL-logging
them. Now we WAL-log the pages first, like in B-tree build, so the
pages are stamped with the record's real LSN. When the build is not
WAL-logged, we still use GistBuildLSN. To make the test output
predictable, use an unlogged index.
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi
943f7ae1c869 has changed GetSlotInvalidationCause() so as it would
return the last element of SlotInvalidationCauses[] when an incorrect
conflict reason name is given by a caller, but this should return
RS_INVAL_NONE in such cases, even if such a state should never be
reached in practice.
Per gripe from Peter Smith.
Reviewed-by: Bharath Rupireddy
Discussion: https://postgr.es/m/CAHut+PtsrSWxczpGkSaSVtJo+BXrvJ3Hwp5gES14bbL-G+HL7A@mail.gmail.com
By enabling slot synchronization, all the failover logical replication
slots on the primary (assuming configurations are appropriate) are
automatically created on the physical standbys and are synced
periodically. The slot sync worker on the standby server pings the primary
server at regular intervals to get the necessary failover logical slots
information and create/update the slots locally. The slots that no longer
require synchronization are automatically dropped by the worker.
The nap time of the worker is tuned according to the activity on the
primary. The slot sync worker waits for some time before the next
synchronization, with the duration varying based on whether any slots were
updated during the last cycle.
A new parameter sync_replication_slots enables or disables this new
process.
On promotion, the slot sync worker is shut down by the startup process to
drop any temporary slots acquired by the slot sync worker and to prevent
the worker from trying to fetch the failover slots.
A functionality to allow logical walsenders to wait for the physical will
be done in a subsequent commit.
Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
The reason was that the ALTER SUBSCRIPTION .. SET PUBLICATION will lead to
the restarting of apply worker and after the restart, the apply worker
will use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before restart the origin has not
been updated and the WAL start location points to a location before where
PUBLICATION exists which can lead to the error "publication ... does not
exist".
Fix it by recreating the subscription as a newly created subscription will
start processing WAL from the recent WAL location and will see the
required publication.
This behavior has existed from the time logical replication was introduced
but is exposed by this test and we have started a discussion for a better
fix for this problem.
As per Buildfarm
Diagnosed-by: Amit Kapila
Author: Vignesh C
Discussion: https://postgr.es/m/3307255.1706911634@sss.pgh.pa.us
This is part of an effort to reduce the number of special cases in the
automatically generated node support functions.
Allegedly, only certain fields of the Constraint node are valid based
on contype. But this has historically not been kept up to date in the
read/write functions. The Constraint node is only used for debugging
DDL statements, so there are no strong requirements for its output,
and there is no enforcement for its correctness. (There was no read
support before a6bc3301925.) Commits e7a552f303c and abf46ad9c7b are
examples of where omissions were fixed.
This patch just removes the custom read/write implementations for the
Constraint node type. Now we just output all the fields, which is a
bit more than before, but at least we don't have to maintain these
functions anymore. Also, we lose the string representation of the
contype field, but for this marginal use case that seems tolerable.
This patch also changes the documentation of the Constraint struct to
put less emphasis on grouping fields by constraint type but rather
document for each field how it's used.
Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org
Since the size of the string representation of an uuid is fixed, there
is no benefit in using a StringInfo. This commit simplifies uuid_oud()
to not rely on a StringInfo, where avoiding the overhead of the string
manipulation makes the function substantially faster.
A COPY TO on a relation with one UUID attribute can show up to a 40%
speedup when the bottleneck is the COPY computation with uuid_out()
showing up at the top of the profiles (numbered measure here, Laurenz
has mentioned something closer to 20% faster runtimes), for example when
the data is fully in shared buffers or the OS cache.
Author: Laurenz Albe
Reviewed-by: Andres Freund, Michael Paquier
Description: https://postgr.es/m/679d5455cbbb0af667ccb753da51a475bae1eaed.camel@cybertec.at
This commit switches the handling of the conflict cause strings for
replication slots to use a table rather than being explicitly listed,
using a C99-designated initializer syntax for the array elements. This
makes the whole more readable while easing future maintenance with less
areas to update when adding a new conflict reason.
This is similar to 74a730631065, but the scale of the change is smaller
as there are less conflict causes than LWLock builtin tranche names.
Author: Bharath Rupireddy
Reviewed-by: Jelte Fennema-Nio
Discussion: https://postgr.es/m/CALj2ACUxSLA91QGFrJsWNKs58KXb1C03mbuwKmzqqmoAKLwJaw@mail.gmail.com
Verify that a user running MERGE with a DO NOTHING clause has
privileges to read the table, even if no columns are referenced. Such
privileges were already required if the ON clause or any of the WHEN
conditions referenced any column at all, so there's no functional change
in practice.
This change fixes an assertion failure in the case where no column is
referenced by the command and the WHEN clauses are all DO NOTHING.
Backpatch to 15, where MERGE was introduced.
Reported-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/4d65a385-7efa-4436-a825-0869f89d9d92@postgrespro.ru
This option is useful to bypass the default behavior of init() which
would create the data folder of a new cluster by copying it from a
template previously initdb'd, if any. Copying the data folder is much
cheaper than running initdb, but some tests may want to force that. For
example, one scenario of pg_combinebackup updated in this commit needs a
different system ID for two nodes.
Previously, this could only be achieved by unsetting
$ENV{'INITDB_TEMPLATE'}, which could become a problem in complex node
setups by making tests less efficient.
Author: Amul Sul
Reviewed-by: Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/Zc1tX9lLonLGu6oH@paquier.xyz
The explanation of interval's behavior in datatype.sgml wasn't wrong
exactly, but it was unclear, partly because it buried the lede about
there being three internal fields. Rearrange and wordsmith for more
clarity.
The discussion of extract() claimed that input of type date was
handled by casting, but actually there's been a separate SQL function
taking date for a very long time. Also, it was mostly silent about
how interval inputs are handled, but there are several field types
for which it seems useful to be specific.
Improve discussion of justify_days()/justify_hours() too.
In passing, remove vertical space in some groups of examples,
as there was little consistency about whether to have such space
or not. (I only did this within the datetime functions section;
there are some related inconsistencies elsewhere.)
Per discussion of bug #18348 from Michael Bondarenko. There
may be some code changes coming out of that discussion too,
but we likely won't back-patch them. This docs-only patch
seems useful to back-patch, though I only carried it back to
v13 because it didn't apply easily in v12.
Discussion: https://postgr.es/m/18348-b097a3587dfde8a4@postgresql.org
When the partition pruning code finds an OpExpr with an operator that
does not belong to the partition key's opfamily, the code checks to see
if the negator of the operator is the opfamily's BTEqualStrategyNumber
operator so that partition pruning can support that operator and invert
the matching partitions. Doing this only works for LIST partitioned
tables.
Here we fix a minor correctness issue where when we discover we're not
pruning for a LIST partitioned table, we return PARTCLAUSE_NOMATCH.
PARTCLAUSE_NOMATCH is only meant to be used when the clause may match
another partitioned key column. For this case, the clause is not going
to be any more useful to another partitioned key as the partition strategy
is not going to change from one key to the next.
Noticed while working 4c2369ac5. No backpatch because returning
PARTCLAUSE_NOMATCH instead of PARTCLAUSE_UNSUPPORTED mostly just causes
wasted effort checking subsequent partition keys against a clause that
will never be used for pruning.
In passing, correct a comment for get_matching_range_bounds() which
mentions that an 'opstrategy' of 0 is supported. It's not, so fix the
comment. This was pointed out by Alexander Lakhin.
Discussion: https://postgr.es/m/CAApHDvqriy8mPOFJ_Bd66YGXJ4+XULpv-4YdB+ePdCQFztyisA@mail.gmail.com
Discussion: https://postgr.es/m/312fb507-9b5e-cf83-d8ed-cd0da72a902c@gmail.com
The invalidation of an active slot is done in two steps:
- Termination of the backend holding it, if any.
- Report that the slot is obsolete, with a conflict cause depending on
the slot's data.
This can be racy because between these two steps the slot mutex would be
released while doing system calls, which means that the effective_xmin
and effective_catalog_xmin could advance during that time, detecting a
conflict cause different than the one originally wanted before the
process owning a slot is terminated.
Holding the mutex longer is not an option, so this commit changes the
code to record the LSNs stored in the slot during the termination of the
process owning the slot.
Bonus thanks to Alexander Lakhin for the various tests and the analysis.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/ZaTjW2Xh+TQUCOH0@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 16
Partition pruning wrongly assumed that, for a table partitioned on a
boolean column, a clause in the form "boolcol IS NOT false" and "boolcol
IS NOT true" could be inverted to correspondingly become "boolcol IS true"
and "boolcol IS false". These are not equivalent as the NOT version
matches the opposite boolean value *and* NULLs. This incorrect assumption
meant that partition pruning pruned away partitions that could contain
NULL values.
Here we fix this by correctly not pruning partitions which could store
NULLs.
To be affected by this, the table must be partitioned by a NULLable boolean
column and queries would have to contain "boolcol IS NOT false" or "boolcol
IS NOT true". This could result in queries filtering out NULL values
with a LIST partitioned table and "ERROR: invalid strategy number 0"
for RANGE and HASH partitioned tables.
Reported-by: Alexander Lakhin
Bug: #18344
Discussion: https://postgr.es/m/18344-8d3f00bada6d09c6@postgresql.org
Backpatch-through: 12
Before the previous commit, the test could hang until
LOG_SNAPSHOT_INTERVAL_MS (15s), until checkpoint_timeout (300s), or
indefinitely. An indefinite hang was awfully improbable. It entailed
the test reaching checkpoint_timeout before the
DecodingContextFindStartpoint() of a CREATE SUBSCRIPTION, yet after the
preceding WAL record. Back-patch to v16, which introduced the test.
Bertrand Drouvot, reported by Noah Misch.
Discussion: https://postgr.es/m/20240211010227.a2.nmisch@google.com
One IPC::Run::start() used an IPC::Run::timer() without checking for
expiration. The other used no timeout or timer. Back-patch to v16,
which introduced the test.
Reviewed by Bertrand Drouvot.
Discussion: https://postgr.es/m/20240211010227.a2.nmisch@google.com
intoasc(), a wrapper for PGTYPESinterval_to_asc that converts an
interval to its textual representation, used a plain memcpy() when
copying its result. This could miss a zero-termination in the result
string, leading to an incorrect result.
The routines in informix.c do not provide the length of their result
buffer, which would allow a replacement of strcpy() to safer strlcpy()
calls, but this requires an ABI breakage and that cannot happen in
back-branches.
Author: Oleg Tselebrovskiy
Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/bf47888585149f83b276861a1662f7e4@postgrespro.ru
Backpatch-through: 12
pgtypes_alloc() can return NULL when failing an allocation, which is
something that PGTYPEStimestamp_defmt_asc() has forgotten about when
translating a timestamp for 'D', 'r', 'R' and 'T' as these require a
temporary allocation.
This is unlikely going to be a problem in practice, so no backpatch is
done.
Author: Oleg Tselebrovskiy
Discussion: https://postgr.es/m/bf47888585149f83b276861a1662f7e4@postgrespro.ru
Commit 6b80394781 introduced integer comparison functions designed
to be as efficient as possible while avoiding overflow. This
commit makes use of these functions in many of the in-tree qsort()
comparators to help ensure transitivity. Many of these comparator
functions should also see a small performance boost.
Author: Mats Kindahl
Reviewed-by: Andres Freund, Fabrízio de Royes Mello
Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com
This commit adds integer comparison functions that are designed to
be as efficient as possible while avoiding overflow. A follow-up
commit will make use of these functions in many of the in-tree
qsort() comparators. The new functions are not better in all cases
(e.g., when the comparator function is inlined), so it is important
to consider the context before using them.
Author: Mats Kindahl
Reviewed-by: Tom Lane, Heikki Linnakangas, Andres Freund, Thomas Munro, Andrey Borodin, Fabrízio de Royes Mello
Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com
Previously, some callers requested XLOG_BLCKSZ bytes
unconditionally. While this did not cause a problem, because the extra
bytes are ignored, it's confusing and makes it harder to add safety
checks. Additionally, the comment about zero padding was incorrect.
With this commit, all callers request the number of bytes they
actually need.
Author: Bharath Rupireddy
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CALj2ACWBRFac2TingD3PE3w2EBHXUHY3=AEEZPJmqhpEOBGExg@mail.gmail.com
A child table can specify a compression or storage method different
from its parents. This was previously an error. (But this was
inconsistently enforced because for example the settings could be
changed later using ALTER TABLE.) This now also allows an explicit
override if multiple parents have different compression or storage
settings, which was previously an error that could not be overridden.
The compression and storage properties remains unchanged in a child
inheriting from parent(s) after its creation, i.e., when using ALTER
TABLE ... INHERIT. (This is not changed.)
Before this change, the error detail would mention the first pair of
conflicting parent compression or storage methods. But with this
change it waits till the child specification is considered by which
time we may have encountered many such conflicting pairs. Hence the
error detail after this change does not include the conflicting
compression/storage methods. Those can be obtained from parent
definitions if necessary. The code to maintain list of all
conflicting methods or even the first conflicting pair does not seem
worth the convenience it offers. This change is inline with what we
do with conflicting default values.
Before this commit, the specified storage method could be stored in
ColumnDef::storage (CREATE TABLE ... LIKE) or ColumnDef::storage_name
(CREATE TABLE ...). This caused the MergeChildAttribute() and
MergeInheritedAttribute() to ignore a storage method specified in the
child definition since it looked only at ColumnDef::storage. This
commit removes ColumnDef::storage and instead uses
ColumnDef::storage_name to save any storage method specification. This
is similar to how compression method specification is handled.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7@eisentraut.org