PostgreSQL

mirror of https://github.com/postgres/postgres.git synced 2025-07-13 00:01:36 -04:00

Author	SHA1	Message	Date
Peter Geoghegan	0b018fabaa	Set relfrozenxid to oldest extant XID seen by VACUUM. When VACUUM set relfrozenxid before now, it set it to whatever value was used to determine which tuples to freeze -- the FreezeLimit cutoff. This approach was very naive. The relfrozenxid invariant only requires that new relfrozenxid values be <= the oldest extant XID remaining in the table (at the point that the VACUUM operation ends), which in general might be much more recent than FreezeLimit. VACUUM now carefully tracks the oldest remaining XID/MultiXactId as it goes (the oldest remaining values _after_ lazy_scan_prune processing). The final values are set as the table's new relfrozenxid and new relminmxid in pg_class at the end of each VACUUM. The oldest XID might come from a tuple's xmin, xmax, or xvac fields. It might even come from one of the table's remaining MultiXacts. Final relfrozenxid values must still be >= FreezeLimit in an aggressive VACUUM (FreezeLimit still acts as a lower bound on the final value that aggressive VACUUM can set relfrozenxid to). Since standard VACUUMs still make no guarantees about advancing relfrozenxid, they might as well set relfrozenxid to a value from well before FreezeLimit when the opportunity presents itself. In general standard VACUUMs may now set relfrozenxid to any value > the original relfrozenxid and <= OldestXmin. Credit for the general idea of using the oldest extant XID to set pg_class.relfrozenxid at the end of VACUUM goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com	2022-04-03 09:57:21 -07:00
Peter Geoghegan	74388a1ac3	Avoid VACUUM reltuples distortion. Add a heuristic that avoids distortion in the pg_class.reltuples estimates used by VACUUM. Without the heuristic, successive manually run VACUUM commands (run against a table that is never modified after initial bulk loading) will scan the same page in each VACUUM operation. Eventually pg_class.reltuples may reach the point where one single heap page is accidentally considered highly representative of the entire table. This is likely to be completely wrong, since the last heap page typically has fewer tuples than average for the table. It's not obvious that this was a problem prior to commit 44fa8488, which made vacuumlazy.c consistently scan the last heap page (even when it is all-visible in the visibility map). It seems possible that there were more subtle variants of the same problem that went unnoticed for quite some time, though. Commit 44fa8488 simplified certain aspects of when and how relation truncation was considered, but it did not introduce the "scan the last page" behavior. Essentially the same behavior was introduced much earlier, in commit e8429082. It was conditioned on whether or not truncation looked promising towards the end of the initial heap pass by VACUUM until recently, which was at least somewhat protective. That doesn't seem like something that we should be relying on, though. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkNKORurux459M64mR63Aw4Jq7MBRVcX=CvALqN3A88WA@mail.gmail.com	2022-02-16 17:15:50 -08:00
Peter Geoghegan	efa4a9462a	Consolidate VACUUM xid cutoff logic. Push the logic for determining whether or not a VACUUM operation will be aggressive down into vacuum_set_xid_limits(). This makes the function's signature significantly simpler, and seems clearer overall. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com	2022-02-11 18:26:15 -08:00
Peter Geoghegan	872770fd6c	Add VACUUM instrumentation for scanned pages, relfrozenxid. Report on scanned pages within VACUUM VERBOSE and autovacuum logging. These are pages that were physically examined during the VACUUM operation. Note that this can include a small number of pages that were marked all-visible in the visibility map by some earlier VACUUM operation. VACUUM won't skip all-visible pages that aren't part of a range of all-visible pages that's at least 32 blocks in length (partly to avoid missing out on opportunities to advance relfrozenxid during non-aggressive VACUUMs). Commit 44fa8488 simplified the definition of scanned pages. It became the complement of the pages (of those pages from rel_pages) that were skipped using the visibility map. And so scanned pages precisely indicates how effective the visibility map was at saving work. (Before now we displayed the number of pages skipped via the visibility map when happened to be frozen pages, but not when they were merely all-visible, which was less useful to users.) Rename the user-visible OldestXmin output field to "removal cutoff", and show some supplementary information: how far behind the cutoff is (number of XIDs behind) by the time the VACUUM operation finished. This will help users to figure out what's _not_ working in extreme cases where VACUUM is fundamentally unable to remove dead tuples or freeze older tuples (e.g., due to a leaked replication slot). Also report when relfrozenxid is advanced by VACUUM in output that immediately follows "removal cutoff". This structure is intended to highlight the relationship between the new relfrozenxid value for the table, and the VACUUM operation's removal cutoff. Finally, add instrumentation of "missed dead tuples", and the number of pages that had at least one such tuple. These are fully DEAD (not just RECENTLY_DEAD) tuples with storage that could not be pruned due to failure to acquire a cleanup lock on a heap page. This is a replacement for the "skipped due to pin" instrumentation removed by commit 44fa8488. It shows more details than before for pages where failing to get a cleanup lock actually resulted in VACUUM missing out on useful work, but usually shows nothing at all instead (the mere fact that we couldn't get a cleanup lock is usually of no consequence whatsoever now). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wznp=c=Opj8Z7RMR3G=ec3_JfGYMN_YvmCEjoPCHzWbx0g@mail.gmail.com	2022-02-11 16:48:40 -08:00
Tom Lane	62e28097ce	Remove useless inline marker. Putting "inline" on a function that's not used anywhere in its own file is useless unless the linker is doing global optimization, a method we don't generally enable. Moreover, it draws warnings from some buildfarm members (curculio at least). Looks like this was sloppiness in cc8b25712, which moved the function from somewhere else where the inline marker was more appropriate.	2022-01-22 17:11:33 -05:00
Peter Geoghegan	49c9d9fcfa	Unify VACUUM VERBOSE and autovacuum logging. The log_autovacuum_min_duration instrumentation used its own dedicated code for logging, which was not reused by VACUUM VERBOSE. This was highly duplicative, and sometimes led to each code path using slightly different accounting for essentially the same information. Clean things up by making VACUUM VERBOSE reuse the same instrumentation code. This code restructuring changes the structure of the VACUUM VERBOSE output itself, but that seems like an overall improvement. The most noticeable change in VACUUM VERBOSE output is that it no longer outputs a distinct message per index per round of index vacuuming. Most of the same information (about each index) is now shown in its new per-operation summary message. This is far more legible. A few details are no longer displayed by VACUUM VERBOSE, but that's no real loss in practice, especially in the common case where we don't need multiple index scans/rounds of vacuuming. This super fine-grained information is still available via DEBUG2 messages, which might still be useful in debugging scenarios. VACUUM VERBOSE now shows new instrumentation, which is typically very useful: all of the log_autovacuum_min_duration instrumentation that it missed out on before now. This includes information about WAL overhead, buffers hit/missed/dirtied information, and I/O timing information. VACUUM VERBOSE still retains a few INFO messages of its own. This is limited to output concerning the progress of heap rel truncation, as well as some basic information about parallel workers. These details are still potentially quite useful. They aren't a good fit for the log output, which must summarize the whole operation. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-WzmW4Me7_qR4X4ka7pxP-jGmn7=Npma_-Z-9Y1eD0MQRLw@mail.gmail.com	2022-01-14 16:50:34 -08:00
Bruce Momjian	27b77ecf9f	Update copyright for 2022 Backpatch-through: 10	2022-01-07 19:04:57 -05:00
Amit Kapila	8e1fae1938	Move parallel vacuum code to vacuumparallel.c. This commit moves parallel vacuum related code to a new file commands/vacuumparallel.c so that any table AM supporting indexes can utilize parallel vacuum in order to call index AM callbacks (ambulkdelete and amvacuumcleanup) with parallel workers. Another reason for this refactoring is that the parallel vacuum isn't specific to heap so it doesn't make sense to keep this code in heap/vacuumlazy.c. Author: Masahiko Sawada, based on suggestion from Andres Freund Reviewed-by: Hou Zhijie, Amit Kapila, Haiying Tang Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de	2021-12-23 11:42:52 +05:30
Amit Kapila	cc8b25712b	Move index vacuum routines to vacuum.c. An upcoming patch moves parallel vacuum code out of vacuumlazy.c. This code restructuring will allow both lazy vacuum and parallel vacuum to use index vacuum functions. Author: Masahiko Sawada Reviewed-by: Hou Zhijie, Amit Kapila Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de	2021-12-22 07:55:14 +05:30
Peter Geoghegan	3499df0dee	Support disabling index bypassing by VACUUM. Generalize the INDEX_CLEANUP VACUUM parameter (and the corresponding reloption): make it into a ternary style boolean parameter. It now exposes a third option, "auto". The "auto" option (which is now the default) enables the "bypass index vacuuming" optimization added by commit 1e55e7d1. "VACUUM (INDEX_CLEANUP TRUE)" is redefined to once again make VACUUM simply do any required index vacuuming, regardless of how few dead tuples are encountered during the first scan of the target heap relation (unless there are exactly zero). This gives users a way of opting out of the "bypass index vacuuming" optimization, if for whatever reason that proves necessary. It is also expected to be used by PostgreSQL developers as a testing option from time to time. "VACUUM (INDEX_CLEANUP FALSE)" does the same thing as it always has: it forcibly disables both index vacuuming and index cleanup. It's not expected to be used much in PostgreSQL 14. The failsafe mechanism added by commit 1e55e7d1 addresses the same problem in a simpler way. INDEX_CLEANUP can now be thought of as a testing and compatibility option. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CAH2-WznrBoCST4_Gxh_G9hA8NzGUbeBGnOUC8FcXcrhqsv6OHQ@mail.gmail.com	2021-06-18 20:04:07 -07:00
Amit Kapila	0734b0e983	Improve docs and error messages for parallel vacuum. The error messages, docs, and one of the options were using 'parallel degree' to indicate parallelism used by vacuum command. We normally use 'parallel workers' at other places so change it for parallel vacuum accordingly. Author: Bharath Rupireddy Reviewed-by: Dilip Kumar, Amit Kapila Backpatch-through: 13 Discussion: https://postgr.es/m/CALj2ACWz=PYrrFXVsEKb9J1aiX4raA+UBe02hdRp_zqDkrWUiw@mail.gmail.com	2021-05-25 09:26:53 +05:30
Tom Lane	def5b065ff	Initial pgindent and pgperltidy run for v14. Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.	2021-05-12 13:14:10 -04:00
Peter Geoghegan	1e55e7d175	Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com	2021-04-07 12:37:45 -07:00
Peter Geoghegan	b4af70cb21	Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com	2021-04-05 13:21:44 -07:00
Stephen Frost	4753ef37e0	Use a WaitLatch for vacuum/autovacuum sleeping Instead of using pg_usleep() in vacuum_delay_point(), use a WaitLatch. This has the advantage that we will realize if the postmaster has been killed since the last time we decided to sleep while vacuuming. Reviewed-by: Thomas Munro Discussion: https://postgr.es/m/CAFh8B=kcdk8k-Y21RfXPu5dX=bgPqJ8TC3p_qxR_ygdBS=JN5w@mail.gmail.com	2021-03-30 12:52:56 -04:00
Thomas Munro	c7ecd6af01	ReadNewTransactionId() -> ReadNextTransactionId(). The new name conveys the effect better, is more consistent with similar functions ReadNextMultiXactId(), ReadNextFullTransactionId(), and matches the name of the variable that it reads. Reported-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzmVR4SakBXQUdhhPpMf1aYvZCnna5%3DHKa7DAgEmBAg%2B8g%40mail.gmail.com	2021-02-15 13:17:02 +13:00
Michael Paquier	7cb3048f38	Add option PROCESS_TOAST to VACUUM This option controls if toast tables associated with a relation are vacuumed or not when running a manual VACUUM. It was already possible to trigger a manual VACUUM on a toast relation without processing its main relation, but a manual vacuum on a main relation always forced a vacuum on its toast table. This is useful in scenarios where the level of bloat or transaction age of the main and toast relations differs a lot. This option is an extension of the existing VACOPT_SKIPTOAST that was used by autovacuum to control if toast relations should be skipped or not. This internal flag is renamed to VACOPT_PROCESS_TOAST for consistency with the new option. A new option switch, called --no-process-toast, is added to vacuumdb. Author: Nathan Bossart Reviewed-by: Kirk Jamison, Michael Paquier, Justin Pryzby Discussion: https://postgr.es/m/BA8951E9-1524-48C5-94AF-73B1F0D7857F@amazon.com	2021-02-09 14:13:57 +09:00
Michael Paquier	a3dc926009	Refactor option handling of CLUSTER, REINDEX and VACUUM This continues the work done in b5913f6. All the options of those commands are changed to use hex values rather than enums to reduce the risk of compatibility bugs when introducing new options. Each option set is moved into a new structure that can be extended with more non-boolean options (this was already the case of VACUUM). The code of REINDEX is restructured so as manual REINDEX commands go through a single routine from utility.c, like VACUUM, to ease the allocation handling of option parameters when a command needs to go through multiple transactions. This can be used as a base infrastructure for future patches related to those commands, including reindex filtering and tablespace support. Per discussion with people mentioned below, as well as Alvaro Herrera and Peter Eisentraut. Author: Michael Paquier, Justin Pryzby Reviewed-by: Alexey Kondratov, Justin Pryzby Discussion: https://postgr.es/m/X8riynBLwxAD9uKk@paquier.xyz	2021-01-18 14:03:10 +09:00
Bruce Momjian	ca3b37487b	Update copyright for 2021 Backpatch-through: 9.5	2021-01-02 13:06:25 -05:00
Michael Paquier	947789f1f5	Avoid using tuple from syscache for update of pg_database.datfrozenxid pg_database.datfrozenxid gets updated using an in-place update at the end of vacuum or autovacuum. Since 96cdeae, as pg_database has a toast relation, it is possible for a pg_database tuple to have toast values if there is a large set of ACLs in place. In such a case, the in-place update would fail because of the flattening of the toast values done for the catcache entry fetched. Instead of using a copy from the catcache, this changes the logic to fetch the copy of the tuple by directly scanning pg_database. Per the lack of complaints on the matter, no backpatch is done. Note that before 96cdeae, attempting to insert such a tuple to pg_database would cause a "row is too big" error, so the end-of-vacuum problem was not reachable. Author: Ashwin Agrawal, Junfeng Yang Discussion: https://postgr.es/m/DM5PR0501MB38800D9E4605BCA72DD35557CCE10@DM5PR0501MB3880.namprd05.prod.outlook.com	2020-12-08 12:13:19 +09:00
Alvaro Herrera	dcfff74fb1	Restore lock level to update statusFlags Reverts 27838981be9d (some comments are kept). Per discussion, it does not seem safe to relax the lock level used for this; in order for it to be safe, there would have to be memory barriers between the point we set the flag and the point we set the trasaction Xid, which perhaps would not be so bad; but there would also have to be barriers at the readers' side, which from a performance perspective might be bad. Now maybe this analysis is wrong and it is safe for some reason, but proof of that is not trivial. Discussion: https://postgr.es/m/20201118190928.vnztes7c2sldu43a@alap3.anarazel.de	2020-11-26 12:30:48 -03:00
Alvaro Herrera	27838981be	Relax lock level for setting PGPROC->statusFlags We don't actually need a lock to set PGPROC->statusFlags itself; what we do need is a shared lock on either XidGenLock or ProcArrayLock in order to ensure MyProc->pgxactoff keeps still while we modify the mirror array in ProcGlobal->statusFlags. Some places were using an exclusive lock for that, which is excessive. Relax those to use shared lock only. procarray.c has a couple of places with somewhat brittle assumptions about PGPROC changes: ProcArrayEndTransaction uses only shared lock, so it's permissible to change MyProc only. On the other hand, ProcArrayEndTransactionInternal also changes other procs, so it must hold exclusive lock. Add asserts to ensure those assumptions continue to hold. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20201117155501.GA13805@alvherre.pgsql	2020-11-18 13:24:22 -03:00
Alvaro Herrera	cd9c1b3e19	Rename PGPROC->vacuumFlags to statusFlags With more flags associated to a PGPROC entry that are not related to vacuum (currently existing or planned), the name "statusFlags" describes its purpose better. (The same is done to the mirroring PROC_HDR->vacuumFlags.) No functional changes in this commit. This was suggested first by Hari Babu Kommi in [1] and then by Michael Paquier at [2]. [1] https://postgr.es/m/CAJrrPGcsDC-oy1AhqH0JkXYa0Z2AgbuXzHPpByLoBGMxfOZMEQ@mail.gmail.com [2] https://postgr.es/m/20200820060929.GB3730@paquier.xyz Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20201116182446.qcg3o6szo2zookyr@localhost	2020-11-16 19:42:55 -03:00
Andres Freund	94bc27b576	Centralize horizon determination for temp tables, fixing bug due to skew. This fixes a bug in the edge case where, for a temp table, heap_page_prune() can end up with a different horizon than heap_vacuum_rel(). Which can trigger errors like "ERROR: cannot freeze committed xmax ...". The bug was introduced due to interaction of a7212be8b9e "Set cutoff xmin more aggressively when vacuuming a temporary table." with dc7420c2c92 "snapshot scalability: Don't compute global horizons while building snapshots.". The problem is caused by lazy_scan_heap() assuming that the only reason its HeapTupleSatisfiesVacuum() call would return HEAPTUPLE_DEAD is if the tuple is a HOT tuple, or if the tuple's inserting transaction has aborted since the heap_page_prune() call. But after a7212be8b9e that was also possible in other cases for temp tables, because heap_page_prune() uses a different visibility test after dc7420c2c92. The fix is fairly simple: Move the special case logic for temp tables from vacuum_set_xid_limits() to the infrastructure introduced in dc7420c2c92. That ensures that the horizon used for pruning is at least as aggressive as the one used by lazy_scan_heap(). The concrete horizon used for temp tables is slightly different than the logic in dc7420c2c92, but should always be as aggressive as before (see comments). A significant benefit to centralizing the logic procarray.c is that now the more aggressive horizons for temp tables does not just apply to VACUUM but also to e.g. HOT pruning and the nbtree killtuples logic. Because isTopLevel is not needed by vacuum_set_xid_limits() anymore, I undid the the related changes from a7212be8b9e. This commit also adds an isolation test ensuring that the more aggressive vacuuming and pruning of temp tables keeps working. Debugged-By: Amit Kapila <amit.kapila16@gmail.com> Debugged-By: Tom Lane <tgl@sss.pgh.pa.us> Debugged-By: Ashutosh Sharma <ashu.coek88@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20201014203103.72oke6hqywcyhx7s@alap3.anarazel.de Discussion: https://postgr.es/m/20201015083735.derdzysdtqdvxshp@alap3.anarazel.de	2020-10-28 18:02:31 -07:00
Tom Lane	a7212be8b9	Set cutoff xmin more aggressively when vacuuming a temporary table. Since other sessions aren't allowed to look into a temporary table of our own session, we do not need to worry about the global xmin horizon when setting the vacuum XID cutoff. Indeed, if we're not inside a transaction block, we may set oldestXmin to be the next XID, because there cannot be any in-doubt tuples in a temp table, nor any tuples that are dead but still visible to some snapshot of our transaction. (VACUUM, of course, is never inside a transaction block; but we need to test that because CLUSTER shares the same code.) This approach allows us to always clean out a temp table completely during VACUUM, independently of concurrent activity. Aside from being useful in its own right, that simplifies building reproducible test cases. Discussion: https://postgr.es/m/3490536.1598629609@sss.pgh.pa.us	2020-09-01 18:40:43 -04:00
Tom Lane	3d351d916b	Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE. Historically, we've considered the state with relpages and reltuples both zero as indicating that we do not know the table's tuple density. This is problematic because it's impossible to distinguish "never yet vacuumed" from "vacuumed and seen to be empty". In particular, a user cannot use VACUUM or ANALYZE to override the planner's normal heuristic that an empty table should not be believed to be empty because it is probably about to get populated. That heuristic is a good safety measure, so I don't care to abandon it, but there should be a way to override it if the table is indeed intended to stay empty. Hence, represent the initial state of ignorance by setting reltuples to -1 (relpages is still set to zero), and apply the minimum-ten-pages heuristic only when reltuples is still -1. If the table is empty, VACUUM or ANALYZE (but not CREATE INDEX) will override that to reltuples = relpages = 0, and then we'll plan on that basis. This requires a bunch of fiddly little changes, but we can get rid of some ugly kluges that were formerly needed to maintain the old definition. One notable point is that FDWs' GetForeignRelSize methods will see baserel->tuples = -1 when no ANALYZE has been done on the foreign table. That seems like a net improvement, since those methods were formerly also in the dark about what baserel->tuples = 0 really meant. Still, it is an API change. I bumped catversion because code predating this change would get confused by seeing reltuples = -1. Discussion: https://postgr.es/m/F02298E0-6EF4-49A1-BCB6-C484794D9ACC@thebuild.com	2020-08-30 12:21:51 -04:00
Noah Misch	676a9c3cc4	Correct several behavior descriptions in comments. Reuse cautionary language from src/test/ssl/README in src/test/kerberos/README. SLRUs have had access to six-character segments names since commit 73c986adde5d73a5e2555da9b5c8facedb146dcd, and recovery stopped calling HeapTupleHeaderAdvanceLatestRemovedXid() in commit 558a9165e081d1936573e5a7d576f5febd7fb55a. The other corrections are more self-evident.	2020-08-15 20:21:52 -07:00
Noah Misch	566372b3d6	Prevent concurrent SimpleLruTruncate() for any given SLRU. The SimpleLruTruncate() header comment states the new coding rule. To achieve this, add locktype "frozenid" and two LWLocks. This closes a rare opportunity for data loss, which manifested as "apparent wraparound" or "could not access status of transaction" errors. Data loss is more likely in pg_multixact, due to released branches' thin margin between multiStopLimit and multiWrapLimit. If a user's physical replication primary logged ": apparent wraparound" messages, the user should rebuild standbys of that primary regardless of symptoms. At less risk is a cluster having emitted "not accepting commands" errors or "must be vacuumed" warnings at some point. One can test a cluster for this data loss by running VACUUM FREEZE in every database. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com	2020-08-15 10:15:53 -07:00
Andres Freund	5788e258bb	snapshot scalability: Move PGXACT->vacuumFlags to ProcGlobal->vacuumFlags. Similar to the previous commit this increases the chance that data frequently needed by GetSnapshotData() stays in l2 cache. As we now take care to not unnecessarily write to ProcGlobal->vacuumFlags, there should be very few modifications to the ProcGlobal->vacuumFlags array. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de	2020-08-14 15:33:35 -07:00
Andres Freund	941697c3c1	snapshot scalability: Introduce dense array of in-progress xids. The new array contains the xids for all connected backends / in-use PGPROC entries in a dense manner (in contrast to the PGPROC/PGXACT arrays which can have unused entries interspersed). This improves performance because GetSnapshotData() always needs to scan the xids of all live procarray entries and now there's no need to go through the procArray->pgprocnos indirection anymore. As the set of running top-level xids changes rarely, compared to the number of snapshots taken, this substantially increases the likelihood of most data required for a snapshot being in l2 cache. In read-mostly workloads scanning the xids[] array will sufficient to build a snapshot, as most backends will not have an xid assigned. To keep the xid array dense ProcArrayRemove() needs to move entries behind the to-be-removed proc's one further up in the array. Obviously moving array entries cannot happen while a backend sets it xid. I.e. locking needs to prevent that array entries are moved while a backend modifies its xid. To avoid locking ProcArrayLock in GetNewTransactionId() - a fairly hot spot already - ProcArrayAdd() / ProcArrayRemove() now needs to hold XidGenLock in addition to ProcArrayLock. Adding / Removing a procarray entry is not a very frequent operation, even taking 2PC into account. Due to the above, the dense array entries can only be read or modified while holding ProcArrayLock and/or XidGenLock. This prevents a concurrent ProcArrayRemove() from shifting the dense array while it is accessed concurrently. While the new dense array is very good when needing to look at all xids it is less suitable when accessing a single backend's xid. In particular it would be problematic to have to acquire a lock to access a backend's own xid. Therefore a backend's xid is not just stored in the dense array, but also in PGPROC. This also allows a backend to only access the shared xid value when the backend had acquired an xid. The infrastructure added in this commit will be used for the remaining PGXACT fields in subsequent commits. They are kept separate to make review easier. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de	2020-08-14 15:33:35 -07:00
Andres Freund	dc7420c2c9	snapshot scalability: Don't compute global horizons while building snapshots. To make GetSnapshotData() more scalable, it cannot not look at at each proc's xmin: While snapshot contents do not need to change whenever a read-only transaction commits or a snapshot is released, a proc's xmin is modified in those cases. The frequency of xmin modifications leads to, particularly on higher core count systems, many cache misses inside GetSnapshotData(), despite the data underlying a snapshot not changing. That is the most significant source of GetSnapshotData() scaling poorly on larger systems. Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons / thresholds as it has so far. But we don't really have to: The horizons don't actually change that much between GetSnapshotData() calls. Nor are the horizons actually used every time a snapshot is built. The trick this commit introduces is to delay computation of accurate horizons until there use and using horizon boundaries to determine whether accurate horizons need to be computed. The use of RecentGlobal[Data]Xmin to decide whether a row version could be removed has been replaces with new GlobalVisTest* functions. These use two thresholds to determine whether a row can be pruned: 1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed are definitely still visible. 2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can definitely be removed GetSnapshotData() updates definitely_needed to be the xmin of the computed snapshot. When testing whether a row can be removed (with GlobalVisTestIsRemovableXid()) and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID < definitely_needed) the boundaries can be recomputed to be more accurate. As it is not cheap to compute accurate boundaries, we limit the number of times that happens in short succession. As the boundaries used by GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by GetSnapshotData()), it is likely that further test can benefit from an earlier computation of accurate horizons. To avoid regressing performance when old_snapshot_threshold is set (as that requires an accurate horizon to be computed), heap_page_prune_opt() doesn't unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the computation of the limited horizon, and the triggering of errors (with SetOldSnapshotThresholdTimestamp()) is now only done when necessary to remove tuples. This commit just removes the accesses to PGXACT->xmin from GetSnapshotData(), but other members of PGXACT residing in the same cache line are accessed. Therefore this in itself does not result in a significant improvement. Subsequent commits will take advantage of the fact that GetSnapshotData() now does not need to access xmins anymore. Note: This contains a workaround in heap_page_prune_opt() to keep the snapshot_too_old tests working. While that workaround is ugly, the tests currently are not meaningful, and it seems best to address them separately. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de	2020-08-12 16:03:49 -07:00
Andres Freund	a9a4a7ad56	code: replace most remaining uses of 'master'. Author: Andres Freund Reviewed-By: David Steele Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de	2020-07-08 13:24:35 -07:00
Peter Eisentraut	0fd2a79a63	Spelling adjustments	2020-06-07 15:06:51 +02:00
Amit Kapila	24d2d38b1e	Fix the usage of parallel and full options of vacuum command. Earlier we were inconsistent in allowing the usage of parallel and full options. Change it such that we disallow them only when they are combined in a way that we don't support. In passing, improve the comments in some of the existing tests of parallel vacuum. Reported-by: Tushar Ahuja Author: Justin Pryzby, Amit Kapila Reviewed-by: Sawada Masahiko, Michael Paquier, Mahendra Singh Thalor and Amit Kapila Discussion: https://postgr.es/m/58c8d171-e665-6fa3-a9d3-d9423b694dae%40enterprisedb.com	2020-04-16 10:55:02 +05:30
Amit Kapila	a6fea120a7	Comments and doc fixes for commit 40d964ec99. Reported-by: Justin Pryzby Author: Justin Pryzby, with few changes by me Reviewed-by: Amit Kapila and Sawada Masahiko Discussion: https://postgr.es/m/20200322021801.GB2563@telsasoft.com	2020-04-14 08:10:27 +05:30
Amit Kapila	2401d93718	Fix coverity complaint about commit 40d964ec99. The coverity complained that dividing integer expressions and then converting the integer quotient to type "double" would lose fractional part. Typecasting one of the arguments of expression with double should fix the report. Author: Mahendra Singh Thalor Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/20200329224818.6phnhv7o2q2rfovf@alap3.anarazel.de	2020-04-01 09:28:13 +05:30
Andres Freund	cedffbdb8b	Report wait event for cost-based vacuum delay. Author: Justin Pryzby Discussion: https://postgr.es/m/20200321040750.GD13662@telsasoft.com	2020-03-23 22:53:22 -07:00
Amit Kapila	40d964ec99	Allow vacuum command to process indexes in parallel. This feature allows the vacuum to leverage multiple CPUs in order to process indexes. This enables us to perform index vacuuming and index cleanup with background workers. This adds a PARALLEL option to VACUUM command where the user can specify the number of workers that can be used to perform the command which is limited by the number of indexes on a table. Specifying zero as a number of workers will disable parallelism. This option can't be used with the FULL option. Each index is processed by at most one vacuum process. Therefore parallel vacuum can be used when the table has at least two indexes. The parallel degree is either specified by the user or determined based on the number of indexes that the table has, and further limited by max_parallel_maintenance_workers. The index can participate in parallel vacuum iff it's size is greater than min_parallel_index_scan_size. Author: Masahiko Sawada and Amit Kapila Reviewed-by: Dilip Kumar, Amit Kapila, Robert Haas, Tomas Vondra, Mahendra Singh and Sergei Kornilov Tested-by: Mahendra Singh and Prabhat Sahu Discussion: https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com	2020-01-20 07:57:49 +05:30
Bruce Momjian	7559d8ebfa	Update copyrights for 2020 Backpatch-through: update all files in master, backpatch legal files through 9.4	2020-01-01 12:21:45 -05:00
Michael Paquier	7854e07f25	Revert "Rename files and headers related to index AM" This follows multiple complains from Peter Geoghegan, Andres Freund and Alvaro Herrera that this issue ought to be dug more before actually happening, if it happens. Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql	2019-12-27 08:09:00 +09:00
Michael Paquier	8ce3aa9b59	Rename files and headers related to index AM The following renaming is done so as source files related to index access methods are more consistent with table access methods (the original names used for index AMs ware too generic, and could be confused as including features related to table AMs): - amapi.h -> indexam.h. - amapi.c -> indexamapi.c. Here we have an equivalent with backend/access/table/tableamapi.c. - amvalidate.c -> indexamvalidate.c. - amvalidate.h -> indexamvalidate.h. - genam.c -> indexgenam.c. - genam.h -> indexgenam.h. This has been discussed during the development of v12 when table AM was worked on, but the renaming never happened. Author: Michael Paquier Reviewed-by: Fabien Coelho, Julien Rouhaud Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz	2019-12-25 10:23:39 +09:00
Peter Eisentraut	604bd36711	PG_FINALLY This gives an alternative way of catching exceptions, for the common case where the cleanup code is the same in the error and non-error cases. So instead of PG_TRY(); { ... code that might throw ereport(ERROR) ... } PG_CATCH(); { cleanup(); PG_RE_THROW(); } PG_END_TRY(); cleanup(); one can write PG_TRY(); { ... code that might throw ereport(ERROR) ... } PG_FINALLY(); { cleanup(); } PG_END_TRY(); Discussion: https://www.postgresql.org/message-id/flat/95a822c3-728b-af0e-d7e5-71890507ae0c%402ndquadrant.com	2019-11-01 11:18:03 +01:00
Thomas Munro	6bda2af039	Fix bug that could try to freeze running multixacts. Commits 801c2dc7 and 801c2dc7 made it possible for vacuum to try to freeze a multixact that is still running. That was prevented by a check, but raised an error. Repair. Back-patch all the way. Author: Nathan Bossart, Jeremy Schneider Reported-by: Jeremy Schneider Reviewed-by: Jim Nasby, Thomas Munro Discussion: https://postgr.es/m/DAFB8AFF-2F05-4E33-AD7F-FF8B0F760C17%40amazon.com	2019-10-17 09:59:21 +13:00
Michael Paquier	c96581abe4	Fix inconsistencies and typos in the tree, take 11 This fixes various typos in docs and comments, and removes some orphaned definitions. Author: Alexander Lakhin Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com	2019-08-19 16:21:39 +09:00
Tom Lane	cabe0f298e	Fix "ANALYZE t, t" inside a transaction block. This failed with either "tuple already updated by self" or "duplicate key value violates unique constraint", depending on whether the table had previously been analyzed or not. The reason is that ANALYZE tried to insert or update the same pg_statistic rows twice, and there was no CommandCounterIncrement between. So add one. The same case works fine outside a transaction block, because then there's a whole transaction boundary between, as a consequence of the way VACUUM works. This issue has been latent all along, but the problem was unreachable before commit 11d8d72c2 added the ability to specify multiple tables in ANALYZE. We could, perhaps, alternatively fix it by adding code to de-duplicate the list of VacuumRelations --- but that would add a lot of overhead to work around dumb commands, so it's not attractive. Per bug #15946 from Yaroslav Schekin. Back-patch to v11. (Note: in v11 I also back-patched the test added by commit 23224563d; otherwise the problem doesn't manifest in the test I added, because "vactst" is empty when the tests for multiple ANALYZE targets are reached. That seems like not a very good thing anyway, so I did this rather than rethinking the choice of test case.) Discussion: https://postgr.es/m/15946-5c7570a2884a26cf@postgresql.org	2019-08-10 11:30:11 -04:00
Michael Paquier	8548ddc61b	Fix inconsistencies and typos in the tree, take 9 This addresses more issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com	2019-08-05 12:14:58 +09:00
Tom Lane	8255c7a5ee	Phase 2 pgindent run for v12. Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com	2019-05-22 13:04:48 -04:00
Tom Lane	be76af171c	Initial pgindent run for v12. This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us	2019-05-22 12:55:34 -04:00
Fujii Masao	b84dbc8eb8	Add TRUNCATE parameter to VACUUM. This commit adds new parameter to VACUUM command, TRUNCATE, which specifies that VACUUM should attempt to truncate off any empty pages at the end of the table and allow the disk space for the truncated pages to be returned to the operating system. This parameter, if specified, overrides the vacuum_truncate reloption. If neither the reloption nor the VACUUM option is used, the default is true, as before. Author: Fujii Masao Reviewed-by: Julien Rouhaud, Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com	2019-05-08 02:10:33 +09:00
Andres Freund	fdc7efcc30	Allow pg_class xid & multixid horizons to not be set. This allows table AMs that don't need these horizons. This was already documented in the tableam relation_set_new_filenode callback, but an assert prevented if from actually working (the test AM code contained the change itself). Defang the asserts in the general code, and move the stronger ones into heap AM. Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid / relminmxid. Change the table_relation_copy_for_cluster() interface to allow the AM to overwrite the horizons that get set on the pg_class entry. This'd also in the future allow AMs like heap to compute a relfrozenxid during rewrite that's the table's actual minimum rather than a pre-determined value. Arguably it'd have been better to move the whole computation / setting of those values into the callback, but it seems likely that for other reasons it'd be better to be able to use one value to vacuum/cluster multiple tables (e.g. a toast's horizon shouldn't be different than the table's). Reported-By: Heikki Linnakangas Author: Andres Freund Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi	2019-04-23 21:42:12 -07:00

1 2 3 4 5 ...

564 Commits