34053 Commits

Author SHA1 Message Date
Simon Riggs
edfc84b878 Cleanup VirtualXact at end of Hot Standby
Resolves bug 7572 reported by Daniele Varrazzo
2012-11-29 22:17:15 +00:00
Simon Riggs
fdac4e2ba2 Correctly init fast path fields on PGPROC 2012-11-29 22:12:44 +00:00
Michael Meskes
44fe8ae9f9 When processing nested structure pointer variables, ecpg always expected an
array datatype, which of course is wrong.

Applied patch by Muhammad Usama <m.usama@gmail.com> to fix this.
2012-11-29 17:14:49 +01:00
Tom Lane
94c014b532 Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY.
Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation.  The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY.  This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions.  Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.

To fix, make the final state be indisvalid = true and indisready = false,
which is otherwise nonsensical.  This is pretty ugly but we can't add
another column without forcing initdb, and it's too late for that in 9.2.
(There's a cleaner fix in HEAD.)

In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update.  The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption.  This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.

In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate.  These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.

Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation.  Previously we
could have been left with stale values of some fields in an index relcache
entry.  It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.

In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.

Portions of this need to be back-patched even further, but I'll work
on that separately.

Problem reported by Amit Kapila, diagnosis by Pavan Deolasee,
fix by Tom Lane and Andres Freund.
2012-11-29 10:37:13 -05:00
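
A minimal sketch of the kind of in-place flag update described above (variable setup simplified; names are illustrative, not the exact code added by the commit). heap_inplace_update() overwrites the existing pg_index tuple rather than creating a new tuple version, so concurrent SnapshotNow scans cannot find the row twice or miss it.

    Relation    pg_index;
    HeapTuple   indexTuple;
    Form_pg_index indexForm;

    pg_index = heap_open(IndexRelationId, RowExclusiveLock);

    indexTuple = SearchSysCacheCopy1(INDEXRELID, ObjectIdGetDatum(indexId));
    if (!HeapTupleIsValid(indexTuple))
        elog(ERROR, "cache lookup failed for index %u", indexId);
    indexForm = (Form_pg_index) GETSTRUCT(indexTuple);

    /* e.g. mark a concurrently-built index as ready for inserts */
    indexForm->indisready = true;

    /* overwrite the tuple in place: no new tuple version is created */
    heap_inplace_update(pg_index, indexTuple);

    heap_close(pg_index, RowExclusiveLock);
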
Heikki Linnakangas
ffc3172e4e If we don't have a backup-end-location, don't claim we've reached it.
This was apparently a typo, which caused recovery to think that it
immediately reached the end of backup, and allowed the database to start
up too early.

Reported by Jeff Janes. Backpatch to 9.2, where this code was introduced.
2012-11-28 15:14:19 +02:00
Tom Lane
786afc1ce5 Revert patch for taking fewer snapshots.
This reverts commit d573e239f03506920938bf0be56c868d9c3416da, "Take fewer
snapshots".  While that seemed like a good idea at the time, it caused
execution to use a snapshot that had been acquired before locking any of
the tables mentioned in the query.  This created user-visible anomalies
that were not present in any prior release of Postgres, as reported by
Tomas Vondra.  While this whole area could do with a redesign (since there
are related cases that have anomalies anyway), it doesn't seem likely that
any future patch would be reasonably back-patchable; and we don't want 9.2
to exhibit a behavior that's subtly unlike either past or future releases.
Hence, revert to prior code while we rethink the problem.
2012-11-26 15:55:51 -05:00
Tom Lane
eea6ada926 Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance trees.
In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is
pretty useless (there being only one output row), but nonetheless it
shouldn't fail.  But it could fail if "tab" is an inheritance parent,
because planagg.c's code for fixing up equivalence classes after making the
index-optimized MIN/MAX transformation wasn't prepared to find child-table
versions of the aggregate expression.  The least ugly fix seems to be
to add an option to mutate_eclass_expressions() to skip child-table
equivalence class members, which aren't used anymore at this stage of
planning so it's not really necessary to fix them.  Since child members
are ignored in many cases already, it seems plausible for
mutate_eclass_expressions() to have an option to ignore them too.

Per bug #7703 from Maxim Boguk.

Back-patch to 9.1.  Although the same code exists before that, it cannot
encounter child-table aggregates AFAICS, because the index optimization
transformation cannot succeed on inheritance trees before 9.1 (for lack
of MergeAppend).
2012-11-26 12:58:08 -05:00
Heikki Linnakangas
3f7b04d6f6 pg_stat_replication.sync_state was displayed incorrectly at page boundary.
XLogRecPtrIsInvalid() only checks the xrecoff field, which is correct when
checking if a WAL record could legally begin at the given position, but WAL
sending can legally be paused at a page boundary, in which case xrecoff is
0. Use XLByteEQ(..., InvalidXLogRecPtr) instead, which checks that both
xlogid and xrecoff are 0.

9.3 doesn't have this problem because XLogRecPtr is now a single 64-bit
integer, so XLogRecPtrIsInvalid() does the right thing. Apply to 9.2, and
9.1 where pg_stat_replication view was introduced.

Kyotaro HORIGUCHI, reviewed by Fujii Masao.
2012-11-23 19:14:54 +02:00
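
The difference is easiest to see from the pre-9.3 definitions (paraphrased below; the macros are simplified and the final test uses an illustrative variable name). A WAL position paused exactly on a page boundary has xrecoff == 0 but a nonzero xlogid, which the one-field test misreads as invalid.

    typedef struct XLogRecPtr
    {
        uint32      xlogid;     /* log file number */
        uint32      xrecoff;    /* byte offset within the log file */
    } XLogRecPtr;

    /* only looks at xrecoff: wrong for "is any WAL position set at all?" */
    #define XLogRecPtrIsInvalid(r)  ((r).xrecoff == 0)

    /* compares both fields, so the fix tests against InvalidXLogRecPtr */
    #define XLByteEQ(a, b)  ((a).xlogid == (b).xlogid && \
                             (a).xrecoff == (b).xrecoff)

    if (XLByteEQ(sentPtr, InvalidXLogRecPtr))
        /* nothing sent yet; otherwise sync_state can be reported */ ;
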
Tom Lane
430b47f382 Fix pg_resetxlog to use correct path to postmaster.pid.
Since we've already chdir'd into the data directory, the file should
be referenced as just "postmaster.pid", without prefixing the directory
path.  This is harmless in the normal case where an absolute PGDATA path
is used, but quite dangerous if a relative path is specified, since the
program might then fail to notice an active postmaster.

Reported by Hari Babu.  This got broken in my commit
eb5949d190e80360386113fde0f05854f0c9824d, so patch all active versions.
2012-11-22 11:24:46 -05:00
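
A sketch of the corrected reference (simplified, with error reporting trimmed): once the program has chdir'd into the data directory, the lock file must be opened by its bare name.

    if (chdir(DataDir) < 0)
    {
        fprintf(stderr, "could not change directory to \"%s\"\n", DataDir);
        exit(1);
    }

    /* correct: relative to the data directory we just entered */
    if ((fd = open("postmaster.pid", O_RDONLY, 0)) < 0)
    {
        if (errno != ENOENT)
        {
            /* report the unexpected error */
        }
        /* ENOENT: no active postmaster, safe to continue */
    }
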
Heikki Linnakangas
dda8b87b6a Avoid bogus "out-of-sequence timeline ID" errors in standby-mode.
When the startup process opens a WAL segment after replaying part of it, it
validates the first page of the WAL segment, even though the page it's
really interested in is later in the file. As part of the validation, it checks
that the TLI on the page header is >= the TLI it saw on the last page it
read. If the segment contains a timeline switch, and we have already
replayed it, and then re-open the WAL segment (because streaming
replication got disconnected and reconnected, for example), the TLI check
will fail when the first page is validated. Fix that by relaxing the TLI
check when re-opening a WAL segment.

Backpatch to 9.0. Earlier versions had the same code, but before standby
mode was introduced in 9.0, recovery never tried to re-read a segment after
partially replaying it.

Reported by Amit Kapila, while testing a new feature.
2012-11-22 11:35:56 +02:00
Tom Lane
8af60da9dd Don't launch new child processes after we've been told to shut down.
Once we've received a shutdown signal (SIGINT or SIGTERM), we should not
launch any more child processes, even if we get signals requesting such.
The normal code path for spawning backends has always understood that,
but the postmaster's infrastructure for hot standby and autovacuum didn't
get the memo.  As reported by Hari Babu in bug #7643, this could lead to
failure to shut down at all in some cases, such as when SIGINT is received
just before the startup process sends PMSIGNAL_RECOVERY_STARTED: we'd
launch a bgwriter and checkpointer, and then those processes would have no
idea that they ought to quit.  Similarly, launching a new autovacuum worker
would result in waiting till it finished before shutting down.

Also, switch the order of the code blocks in reaper() that detect startup
process crash versus shutdown termination.  Once we've sent it a signal,
we should not consider that exit(1) is surprising.  This is just a cosmetic
fix since shutdown occurs correctly anyway, but better not to log a phony
complaint about startup process crash.

Back-patch to 9.0.  Some parts of this might be applicable before that,
but given the lack of prior complaints I'm not going to worry too much
about older branches.
2012-11-21 15:18:43 -05:00
Tom Lane
278b60598c Improve handling of INT_MIN / -1 and related cases.
Some platforms throw an exception for this division, rather than returning
a necessarily-overflowed result.  Since we were testing for overflow after
the fact, an exception isn't nice.  We can avoid the problem by treating
division by -1 as negation.

Add some regression tests so that we'll find out if any compilers try to
optimize away the overflow check conditions.

Back-patch of commit 1f7cb5c30983752ff8de833de30afcaee63536d0.

Per discussion with Xi Wang, though this is different from the patch he
submitted.
2012-11-19 21:21:28 -05:00
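
A standalone illustration (not PostgreSQL source) of the approach described above: on many platforms INT_MIN / -1 raises SIGFPE rather than merely overflowing, so division by -1 is handled as negation with an explicit overflow check.

    #include <limits.h>
    #include <stdbool.h>
    #include <stdio.h>

    static int
    safe_div(int arg1, int arg2, bool *overflow)
    {
        *overflow = false;
        if (arg2 == -1)
        {
            /* -INT_MIN is not representable; flag overflow, skip the divide */
            if (arg1 == INT_MIN)
            {
                *overflow = true;
                return 0;
            }
            return -arg1;       /* negation never reaches the hardware divide */
        }
        return arg1 / arg2;
    }

    int
    main(void)
    {
        bool    overflow;
        int     r = safe_div(INT_MIN, -1, &overflow);

        printf("result=%d overflow=%d\n", r, (int) overflow);
        return 0;
    }
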
Tom Lane
83d48a81f8 Limit values of archive_timeout, post_auth_delay, auth_delay.milliseconds.
The previous definitions of these GUC variables allowed them to range
up to INT_MAX, but in point of fact the underlying code would suffer
overflows or other errors with large values.  Reduce the maximum values
to something that won't misbehave.  There's no apparent value in working
harder than this, since very large delays aren't sensible for any of
these.  (Note: the risk with archive_timeout is that if we're late
checking the state, the timestamp difference it's being compared to
might overflow.  So we need some amount of slop; the choice of INT_MAX/2
is arbitrary.)

Per followup investigation of bug #7670.  Although this isn't a very
significant fix, might as well back-patch.
2012-11-18 17:15:11 -05:00
Tom Lane
89067bc16a Fix syslogger to not fail when log_rotation_age exceeds 2^31 milliseconds.
We need to avoid calling WaitLatch with timeouts exceeding INT_MAX.
Fortunately a simple clamp will do the trick, since no harm is done if
the wait times out before it's really time to rotate the log file.
Per bug #7670 (probably bug #7545 is the same thing, too).

In passing, fix bogus definition of log_rotation_age's maximum value in
guc.c --- it was numerically right, but only because MINS_PER_HOUR and
SECS_PER_MINUTE have the same value.

Back-patch to 9.2.  Before that, syslogger wasn't using WaitLatch.
2012-11-18 16:16:47 -05:00
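
A sketch of the clamp (variable and latch names are placeholders, not the syslogger's actual ones). WaitLatch() takes a millisecond timeout; waking before the rotation is due is harmless because the rotation condition is re-checked after every wait.

    long    cur_timeout = ms_until_next_rotation;  /* computed elsewhere */

    if (cur_timeout > (long) INT_MAX)
        cur_timeout = INT_MAX;

    (void) WaitLatch(&myLatch, WL_LATCH_SET | WL_TIMEOUT, cur_timeout);
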
Tom Lane
f0461cd861 Improve check_partial_indexes() to consider join clauses in proof attempts.
Traditionally check_partial_indexes() has only looked at restriction
clauses while trying to prove partial indexes usable in queries.  However,
join clauses can also be used in some cases; mainly, that a strict operator
on "x" proves an "x IS NOT NULL" index predicate, even if the operator is
in a join clause rather than a restriction clause.  Adding this code fixes
a regression in 9.2, because previously we would take join clauses into
account when considering whether a partial index could be used in a
nestloop inner indexscan path.  9.2 doesn't handle nestloop inner
indexscans in the same way, and this consideration was overlooked in the
rewrite.  Moving the work to check_partial_indexes() is a better solution
anyway, since the proof applies whether or not we actually use the index
in that particular way, and we don't have to do it over again for each
possible outer relation.  Per report from Dave Cramer.
2012-11-15 19:29:12 -05:00
Tom Lane
3b4db79e35 Fix the int8 and int2 cases of (minimum possible integer) % (-1).
The correct answer for this (or any other case with arg2 = -1) is zero,
but some machines throw a floating-point exception instead of behaving
sanely.  Commit f9ac414c35ea084ff70c564ab2c32adb06d5296f dealt with this
in int4mod, but overlooked the fact that it also happens in int8mod
(at least on my Linux x86_64 machine).  Protect int2mod as well; it's
not clear whether any machines fail there (mine does not) but since the
test is so cheap it seems better safe than sorry.  While at it, simplify
the original guard in int4mod: we need only check for arg2 == -1, we
don't need to check arg1 explicitly.

Xi Wang, with some editing by me.
2012-11-14 17:30:04 -05:00
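
A standalone illustration (not PostgreSQL source) of the simplified guard: when the divisor is -1 the result is always zero, so returning early sidesteps the hardware modulo that can trap on (minimum integer) % (-1), and no check on the first argument is needed. Division by zero is assumed to be handled separately, as it is in the real functions.

    #include <limits.h>
    #include <stdio.h>

    static long long
    safe_mod(long long arg1, long long arg2)
    {
        /* x % -1 is always 0; returning early avoids a possible SIGFPE */
        if (arg2 == -1)
            return 0;
        return arg1 % arg2;
    }

    int
    main(void)
    {
        printf("%lld\n", safe_mod(LLONG_MIN, -1));  /* prints 0, no trap */
        return 0;
    }
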
Tom Lane
5355e39cf5 Fix memory leaks in record_out() and record_send().
record_out() leaks memory: it fails to free the strings returned by the
per-column output functions, and also is careless about detoasted values.
This results in a query-lifespan memory leakage when returning composite
values to the client, because printtup() runs the output functions in the
query-lifespan memory context.  Fix it to handle these issues the same way
printtup() does.  Also fix a similar leakage in record_send().

(At some point we might want to try to run output functions in
shorter-lived memory contexts, so that we don't need a zero-leakage policy
for them.  But that would be a significantly more invasive patch, which
doesn't seem like material for back-patching.)

In passing, use appendStringInfoCharMacro instead of appendStringInfoChar
in the innermost data-copying loop of record_out, to try to shave a few
cycles from this function's runtime.

Per trouble report from Carlos Henrique Reimer.  Back-patch to all
supported versions.
2012-11-13 14:45:43 -05:00
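
A sketch of the per-column cleanup pattern borrowed from printtup() (heavily simplified; the real loop also handles NULLs and quoting, and the surrounding declarations are omitted). Both the string returned by the type's output function and any detoasted copy of the value are freed before moving to the next column, so nothing accumulates in the query-lifespan context.

    Datum   attr;
    char   *value;

    /* detoast if necessary; this may allocate a copy */
    attr = PointerGetDatum(PG_DETOAST_DATUM(origattr));

    value = OutputFunctionCall(&column_info->proc, attr);
    appendStringInfoString(&buf, value);

    pfree(value);                       /* free the output function's result */
    if (DatumGetPointer(attr) != DatumGetPointer(origattr))
        pfree(DatumGetPointer(attr));   /* free the detoasted copy, if any */
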
Simon Riggs
31541778b6 Skip searching for subxact locks at commit.
At commit all standby locks are released
for the top-level transaction, so searching
for locks for each subtransaction is both
pointless and costly (N^2) in the presence
of many AccessExclusiveLocks.
2012-11-13 16:12:20 -03:00
Simon Riggs
f66e7ab6db Clarify docs on hot standby lock release
Andres Freund and Simon Riggs
2012-11-13 15:56:28 -03:00
Tom Lane
8805ff6580 Fix multiple problems in WAL replay.
Most of the replay functions for WAL record types that modify more than
one page failed to ensure that those pages were locked correctly to ensure
that concurrent queries could not see inconsistent page states.  This is
a hangover from coding decisions made long before Hot Standby was added,
when it was hardly necessary to acquire buffer locks during WAL replay
at all, let alone hold them for carefully-chosen periods.

The key problem was that RestoreBkpBlocks was written to hold lock on each
page restored from a full-page image for only as long as it took to update
that page.  This was guaranteed to break any WAL replay function in which
there was any update-ordering constraint between pages, because even if the
nominal order of the pages is the right one, any mixture of full-page and
non-full-page updates in the same record would result in out-of-order
updates.  Moreover, it wouldn't work for situations where there's a
requirement to maintain lock on one page while updating another.  Failure
to honor an update ordering constraint in this way is thought to be the
cause of bug #7648 from Daniel Farina: what seems to have happened there
is that a btree page being split was rewritten from a full-page image
before the new right sibling page was written, and because lock on the
original page was not maintained it was possible for hot standby queries to
try to traverse the page's right-link to the not-yet-existing sibling page.

To fix, get rid of RestoreBkpBlocks as such, and instead create a new
function RestoreBackupBlock that restores just one full-page image at a
time.  This function can be invoked by WAL replay functions at the points
where they would otherwise perform non-full-page updates; in this way, the
physical order of page updates remains the same no matter which pages are
replaced by full-page images.  We can then further adjust the logic in
individual replay functions if it is necessary to hold buffer locks
for overlapping periods.  A side benefit is that we can simplify the
handling of concurrency conflict resolution by moving that code into the
record-type-specific functions; there's no more need to contort the code
layout to keep conflict resolution in front of the RestoreBkpBlocks call.

In connection with that, standardize on zero-based numbering rather than
one-based numbering for referencing the full-page images.  In HEAD, I
removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4.  They are
still there in the header files in previous branches, but are no longer
used by the code.

In addition, fix some other bugs identified in the course of making these
changes:

spgRedoAddNode could fail to update the parent downlink at all, if the
parent tuple is in the same page as either the old or new split tuple and
we're not doing a full-page image: it would get fooled by the LSN having
been advanced already.  This would result in permanent index corruption,
not just transient failure of concurrent queries.

Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old
tail page as a candidate for a full-page image; in the worst case this
could result in torn-page corruption.

heap_xlog_freeze() was inconsistent about using a cleanup lock or plain
exclusive lock: it did the former in the normal path but the latter for a
full-page image.  A plain exclusive lock seems sufficient, so change to
that.

Also, remove gistRedoPageDeleteRecord(), which has been dead code since
VACUUM FULL was rewritten.

Back-patch to 9.0, where hot standby was introduced.  Note however that 9.0
had a significantly different WAL-logging scheme for GIST index updates,
and it doesn't appear possible to make that scheme safe for concurrent hot
standby queries, because it can leave inconsistent states in the index even
between WAL records.  Given the lack of complaints from the field, we won't
work too hard on fixing that branch.
2012-11-12 22:05:14 -05:00
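
A sketch of the per-record pattern this establishes (simplified; the record-specific page manipulation is elided and the exact argument list should be taken as approximate). The point is that the full-page-image case and the normal case are handled at the same spot in the replay function, and the buffer comes back locked either way, so the caller decides how long the lock is held.

    Buffer  buffer;
    Page    page;

    if (record->xl_info & XLR_BKP_BLOCK(0))
        buffer = RestoreBackupBlock(lsn, record, 0, false, true);
    else
    {
        buffer = XLogReadBuffer(rnode, blkno, false);
        if (BufferIsValid(buffer))
        {
            page = (Page) BufferGetPage(buffer);

            /* ... apply the logged change to the page here ... */

            PageSetLSN(page, lsn);
            MarkBufferDirty(buffer);
        }
    }

    /* hold the lock as long as update ordering requires, then release */
    if (BufferIsValid(buffer))
        UnlockReleaseBuffer(buffer);
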
Tom Lane
454edf1da9 Check for stack overflow in transformSetOperationTree().
Since transformSetOperationTree() recurses, it can be driven to stack
overflow with enough UNION/INTERSECT/EXCEPT clauses in a query.  Add a
check to ensure it fails cleanly instead of crashing.  Per report from
Matthew Gerber (though it's not clear whether this is the only thing
going wrong for him).

Historical note: I think the reasoning behind not putting a check here in
the beginning was that the check in transformExpr() ought to be sufficient
to guard the whole parser.  However, because transformSetOperationTree()
recurses all the way to the bottom of the set-operation tree before doing
any analysis of the statement's expressions, that check doesn't save it.
2012-11-11 19:56:16 -05:00
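
The added guard amounts to a call like the sketch below at the top of the function (signature simplified). check_stack_depth() raises an ordinary ERROR once the stack approaches max_stack_depth, so a deeply nested UNION fails cleanly instead of crashing.

    static Node *
    transformSetOperationTree(ParseState *pstate, SelectStmt *stmt)
    {
        /* guard against stack overflow due to overly complex set-expressions */
        check_stack_depth();

        /* ... recurse into stmt->larg and stmt->rarg as before ... */
    }
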
Peter Eisentraut
9b06d91ac8 XSLT stylesheet: Add slash to directory name
Some versions of the XSLT stylesheets don't handle the missing slash
correctly (they concatenate directory and file name without the slash).
This might never have worked correctly.
2012-11-08 23:58:05 -05:00
Tom Lane
8354ce9216 Fix WaitLatch() to return promptly when the requested timeout expires.
If the sleep is interrupted by a signal, we must recompute the remaining
time to wait; otherwise, a steady stream of non-wait-terminating interrupts
could delay return from WaitLatch indefinitely.  This has been shown to be
a problem for the autovacuum launcher, and there may well be other places
now or in the future with similar issues.  So we'd better make the function
robust, even though this'll add at least one gettimeofday call per wait.

Back-patch to 9.2.  We might eventually need to fix 9.1 as well, but the
code is quite different there, and the usage of WaitLatch in 9.1 is so
limited that it's not clearly important to do so.

Reported and diagnosed by Jeff Janes, though I rewrote his patch rather
heavily.
2012-11-08 20:04:54 -05:00
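
A sketch of the retry logic (simplified; pfd stands in for the descriptor set the real implementation polls, and most error handling is trimmed). The key change is that after an interrupted sleep the remaining time is recomputed from the original start time, instead of restarting with the full timeout.

    struct pollfd   pfd;        /* self-pipe / socket setup omitted */
    instr_time      start_time,
                    cur_time;
    long            cur_timeout = timeout;  /* requested wait, in ms */
    int             rc;

    INSTR_TIME_SET_CURRENT(start_time);

    for (;;)
    {
        rc = poll(&pfd, 1, (int) cur_timeout);

        if (rc > 0)
            break;              /* latch set or socket ready */
        if (rc < 0 && errno != EINTR)
            break;              /* genuine error */

        /* interrupted by a signal (or woke early): how much time is left? */
        INSTR_TIME_SET_CURRENT(cur_time);
        INSTR_TIME_SUBTRACT(cur_time, start_time);
        cur_timeout = timeout - (long) INSTR_TIME_GET_MILLISEC(cur_time);
        if (cur_timeout <= 0)
            break;              /* requested timeout has expired */
    }
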
Tom Lane
03787f6392 Don't trash input list structure in does_not_exist_skipping().
The trigger and rule cases need to split up the input name list, but
they mustn't corrupt the passed-in data structure, since it could be part
of a cached utility-statement parsetree.  Per bug #7641.
2012-11-08 11:34:37 -05:00
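
A sketch of the safe pattern (names illustrative): split a copy of the qualified-name list rather than truncating the caller's list in place, since that list may belong to a cached utility-statement parse tree.

    /* trigger/rule name is the last element; the rest names the relation */
    List   *copy = list_copy(objname);

    name = strVal(llast(copy));
    relname = list_truncate(copy, list_length(copy) - 1);
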
Alvaro Herrera
9eb80f2ca7 Don't try to use an unopened relation
Commit 4c9d0901 mistakenly introduced a call to
TransferPredicateLocksToHeapRelation() on an index relation that had
been closed a few lines above.  Moving up an index_open() call that's
below is enough to fix the problem.

Discovered by me while testing an unrelated patch.
2012-11-07 16:24:20 -03:00
Bruce Momjian
663c68f0e9 In pg_upgrade docs, mention using base backup as part of rsync for
logical replication upgrades.

Backpatch to 9.2.
2012-11-07 13:36:08 -05:00
Bruce Momjian
3cef201c19 In pg_upgrade, set synchronous_commit=off for the new cluster, to
improve performance when restoring the schema from the old cluster.

Backpatch to 9.2.
2012-11-06 14:28:54 -05:00
Tom Lane
329057fd8f Fix handling of inherited check constraints in ALTER COLUMN TYPE.
This case got broken in 8.4 by the addition of an error check that
complains if ALTER TABLE ONLY is used on a table that has children.
We do use ONLY for this situation, but it's okay because the necessary
recursion occurs at a higher level.  So we need to have a separate
flag to suppress recursion without making the error check.

Reported and patched by Pavan Deolasee, with some editorial adjustments by
me.  Back-patch to 8.4, since this is a regression of functionality that
worked in earlier branches.
2012-11-05 13:36:21 -05:00
Tom Lane
2f5390aacc Fix bogus handling of $(X) (i.e., ".exe") in isolationtester Makefile.
I'm not sure why commit 1eb1dde049ccfffc42c80c2bcec14155c58bcc1f seems
to have made this start to fail on Cygwin when it never did before ---
but nonetheless, the coding was pretty bogus, and unlike the way we
handle $(X) anywhere else.  Per buildfarm.
2012-11-01 19:48:58 -04:00
Tom Lane
ca2d6a6cef Limit the number of rel sets considered in consider_index_join_outer_rels.
In bug #7626, Brian Dunavant exposes a performance problem created by
commit 3b8968f25232ad09001bf35ab4cc59f5a501193e: that commit attempted to
consider *all* possible combinations of indexable join clauses, but if said
clauses join to enough different relations, there's an exponential increase
in the number of outer-relation sets considered.

In Brian's example, all the clauses come from the same equivalence class,
which means it's redundant to use more than one of them in an indexscan
anyway.  So we can prevent the problem in this class of cases (which is
probably the majority of real examples) by rejecting combinations that
would only serve to add a known-redundant clause.

But that still leaves us exposed to exponential growth of planning time
when the query has a lot of non-equivalence join clauses that are usable
with the same index.  I chose to prevent such cases by setting an upper
limit on the number of relation sets considered, equal to ten times the
number of index clauses considered so far.  (This sliding limit still
allows new relsets to be added on as we move to additional index columns,
which is probably more important than considering even more combinations of
clauses for the previous column.)  This should keep the amount of work done
roughly linear rather than exponential in the apparent query complexity.
This part of the fix is pretty ad-hoc; but without a clearer idea of
real-world cases for which this would result in markedly inferior plans,
it's hard to see how to do better.
2012-11-01 14:08:48 -04:00
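
The second part of the fix boils down to a cap like the fragment below (paraphrased from the description above): once ten times as many outer-relation sets as index clauses have been examined, stop generating new ones, which keeps the work roughly linear in query complexity.

    /* sliding limit: at most 10 relid sets per index clause considered */
    if (list_length(*considered_relids) >= 10 * considered_clauses)
        break;
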
Tom Lane
ec397c9099 Document that TCP keepalive settings read as 0 on Unix-socket connections.
Per bug #7631 from Rob Johnson.  The code is operating as designed, but the
docs didn't explain it.
2012-10-31 14:26:33 -04:00
Alvaro Herrera
fb3590dde0 Fix ALTER EXTENSION / SET SCHEMA
In its original conception, it was leaving some objects behind in the old
schema, but without their proper pg_depend entries; this meant that the
old schema could be dropped, causing future pg_dump calls to fail on the
affected database.  This was originally reported by Jeff Frost as #6704;
there have been other complaints elsewhere that can probably be traced
to this bug.

To fix, be more consistent about altering a table's subsidiary objects
along with the table itself; this requires some restructuring in how tables
are relocated when altering an extension -- hence the new
AlterTableNamespaceInternal routine which encapsulates it for both the
ALTER TABLE and the ALTER EXTENSION cases.

There was another bug lurking here, which was unmasked after fixing the
previous one: certain objects would be reached twice via the dependency
graph, and the second attempt to move them would cause the entire
operation to fail.  Per discussion, it seems the best fix for this is to
do more careful tracking of objects already moved: we now maintain a
list of moved objects, to avoid attempting to do it twice for the same
object.

Authors: Alvaro Herrera, Dimitri Fontaine
Reviewed by Tom Lane
2012-10-31 10:48:41 -03:00
Tom Lane
7e951ba6e1 Prefer actual constants to pseudo-constants in equivalence class machinery.
generate_base_implied_equalities_const() should prefer plain Consts over
other em_is_const eclass members when choosing the "pivot" value that
all the other members will be equated to.  This makes it more likely that
the generated equalities will be useful in constraint-exclusion proofs.
Per report from Rushabh Lathia.
2012-10-26 14:19:39 -04:00
Tom Lane
725fa25e20 In pg_dump, dump SEQUENCE SET items in the data not pre-data section.
Represent a sequence's current value as a separate TableDataInfo dumpable
object, so that it can be dumped within the data section of the archive
rather than in pre-data.  This fixes an undesirable inconsistency between
the meanings of "--data-only" and "--section=data", and also fixes dumping
of sequences that are marked as extension configuration tables, as per a
report from Marko Kreen back in July.  The main cost is that we do one more
SQL query per sequence, but that's probably not very meaningful in most
databases.

Back-patch to 9.1, since it has the extension configuration issue even
though not the --section switch.
2012-10-26 12:12:48 -04:00
Tom Lane
1dec7c7c6c Prevent parser from believing that views have system columns.
Views should not have any pg_attribute entries for system columns.
However, we forgot to remove such entries when converting a table to a
view.  This could lead to crashes later on, if someone attempted to
reference such a column, as reported by Kohei KaiGai.

This problem is corrected properly in HEAD (by removing the pg_attribute
entries during conversion), but in the back branches we need to defend
against existing mis-converted views.  This fix costs us an extra syscache
lookup per system column reference, which is annoying but probably not
really measurable in the big scheme of things.
2012-10-24 14:53:49 -04:00
Kevin Grittner
523ecaf404 Correct predicate locking for DROP INDEX CONCURRENTLY.
For the non-concurrent case there is an AccessExclusiveLock
on both the index and the heap at a time during which no other
process is using either, before which the index is maintained and
used for scans, and after which the index is no longer used or
maintained.  Predicate locks can safely be moved from the index to
the related heap relation under the protection of these locks.
This was done prior to the introduction of DROP INDEX CONCURRENTLY
and continues to be done for non-concurrent index drops.

For concurrent index drops, the predicate locks must be moved when
there are no index scans in progress on that index and no more can
subsequently start, and before heap inserts stop maintaining the
index.  As long as these conditions are guaranteed when the
TransferPredicateLocksToHeapRelation() function is called,
stronger locks are not needed for correctness.

Kevin Grittner based on questions by Tom Lane in reviewing the
DROP INDEX CONCURRENTLY patch and in cooperation with Andres
Freund and Simon Riggs.

Back-patch of commit 4c9d0901f135d724a9f3cfa4140a5afd44b10f08
2012-10-21 17:26:32 -05:00
Tom Lane
a4ef1f09bd Fix pg_dump's handling of DROP DATABASE commands in --clean mode.
In commit 4317e0246c645f60c39e6572644cff1cb03b4c65, I accidentally broke
this behavior while rearranging code to ensure that --create wouldn't
affect whether a DATABASE entry gets put into archive-format output.
Thus, 9.2 would issue a DROP DATABASE command in --clean mode, which is
either useless or dangerous depending on the usage scenario.
It should not do that, and no longer does.

A bright spot is that this refactoring makes it easy to allow the
combination of --clean and --create to work sensibly, ie, emit DROP
DATABASE then CREATE DATABASE before reconnecting.  Ordinarily we'd
consider that a feature addition and not back-patch it, but it seems
silly to not include the extra couple of lines required in the 9.2
version of the code.

Per report from Guillaume Lelarge, though this is slightly more extensive
than his proposed patch.
2012-10-20 16:58:42 -04:00
Tom Lane
12b721a7f0 Fix UtilityContainsQuery() to handle CREATE TABLE AS EXECUTE correctly.
The code seems to have been written to handle the pre-parse-analysis
representation, where an ExecuteStmt would appear directly under
CreateTableAsStmt.  But in reality the function is only run on
already-parse-analyzed statements, so there will be a Query node in
between.  We'd not noticed the bug because the function is generally
not used at all except in extended query protocol.

Per report from Robert Haas and Rushabh Lathia.
2012-10-19 18:33:53 -04:00
Tom Lane
8a3249f124 Fix hash_search to avoid corruption of the hash table on out-of-memory.
An out-of-memory error during expand_table() on a palloc-based hash table
would leave a partially-initialized entry in the table.  This would not be
harmful for transient hash tables, since they'd get thrown away anyway at
transaction abort.  But for long-lived hash tables, such as the relcache
hash, this would effectively corrupt the table, leading to crash or other
misbehavior later.

To fix, rearrange the order of operations so that table enlargement is
attempted before we insert a new entry, rather than after adding it
to the hash table.

Problem discovered by Hitoshi Harada, though this is a bit different
from his proposed patch.
2012-10-19 15:24:10 -04:00
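
A paraphrased sketch of the reordering in dynahash's HASH_ENTER path (field and helper names are recalled approximately, so treat them as illustrative): enlargement is attempted before the new element is allocated and linked in, so a palloc failure inside expand_table() leaves the table untouched.

    /* split a bucket first, if the fill factor has been reached */
    if (hctl->nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor)
        (void) expand_table(hashp);     /* may fail; table still consistent */

    /* only now allocate the new element and link it into its bucket */
    newElement = get_hash_entry(hashp);
    if (newElement == NULL)
    {
        /* out of memory for the element itself: report and bail out */
    }
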
Tom Lane
645984e40c Fix ruleutils to print "INSERT INTO foo DEFAULT VALUES" correctly.
Per bug #7615 from Marko Tiikkaja.  Apparently nobody ever tried this
case before ...
2012-10-19 13:39:57 -04:00
Simon Riggs
c567535742 Fix orphan on cancel of drop index concurrently.
Canceling DROP INDEX CONCURRENTLY during
wait could allow an orphaned index to be
left behind which could not be dropped.

Backpatch to 9.2

Andres Freund, tested by Abhijit Menon-Sen
2012-10-19 09:57:32 +01:00
Andrew Dunstan
f61013a438 Use a more portable platform test. 2012-10-18 16:15:49 -04:00
Heikki Linnakangas
e7bab081a9 Further tweaking of the readfile() function in pg_ctl.
Don't leak a file descriptor if the file is empty or we can't read its size.

Expect there to be a newline at the end of the last line, too. If there
isn't, ignore anything after the last newline. This makes it a tiny bit
more robust in case the file is appended to concurrently, so that we don't
return the last line if it hasn't been fully written yet. And this makes
the code a bit less obscure, anyway. Per Tom Lane's suggestion.

Backpatch to all supported branches.
2012-10-18 22:30:38 +03:00
Simon Riggs
623e49c0c0 Isolation test for DROP INDEX CONCURRENTLY
for recent concurrent changes.

Abhijit Menon-Sen
2012-10-18 19:44:13 +01:00
Simon Riggs
5da1c4b7cc Re-think guts of DROP INDEX CONCURRENTLY.
Concurrent behaviour was flawed when using
a two-step process, so add an additional
phase of processing to ensure concurrency
for both SELECTs and INSERT/UPDATE/DELETEs.

Backpatch to 9.2

Andres Freund, tweaked by me
2012-10-18 19:05:14 +01:00
Tom Lane
0237b39452 Fix planning of non-strict equivalence clauses above outer joins.
If a potential equivalence clause references a variable from the nullable
side of an outer join, the planner needs to take care that derived clauses
are not pushed to below the outer join; else they may use the wrong value
for the variable.  (The problem arises only with non-strict clauses, since
if an upper clause can be proven strict then the outer join will get
simplified to a plain join.)  The planner attempted to prevent this type
of error by checking that potential equivalence clauses aren't
outerjoin-delayed as a whole, but actually we have to check each side
separately, since the two sides of the clause will get moved around
separately if it's treated as an equivalence.  Bugs of this type can be
demonstrated as far back as 7.4, even though releases before 8.3 had only
a very ad-hoc notion of equivalence clauses.

In addition, we neglected to account for the possibility that such clauses
might have nonempty nullable_relids even when not outerjoin-delayed; so the
equivalence-class machinery lacked logic to compute correct nullable_relids
values for clauses it constructs.  This oversight was harmless before 9.2
because we were only using RestrictInfo.nullable_relids for OR clauses;
but as of 9.2 it could result in pushing constructed equivalence clauses
to incorrect places.  (This accounts for bug #7604 from Bill MacArthur.)

Fix the first problem by adding a new test check_equivalence_delay() in
distribute_qual_to_rels, and fix the second one by adding code in
equivclass.c and called functions to set correct nullable_relids for
generated clauses.  Although I believe the second part of this is not
currently necessary before 9.2, I chose to back-patch it anyway, partly to
keep the logic similar across branches and partly because it seems possible
we might find other reasons why we need valid values of nullable_relids in
the older branches.

Add regression tests illustrating these problems.  In 9.0 and up, also
add test cases checking that we can push constants through outer joins,
since we've broken that optimization before and I nearly broke it again
with an overly simplistic patch for this problem.
2012-10-18 12:30:25 -04:00
Simon Riggs
1260912703 Revert tests for drop index concurrently. 2012-10-18 15:26:02 +01:00
Simon Riggs
d4412fa0e5 Add isolation tests for DROP INDEX CONCURRENTLY.
Backpatch to 9.2 to ensure bugs are fixed.

Abhijit Menon-Sen
2012-10-18 13:40:10 +01:00
Tom Lane
d7598aeea9 Close un-owned SMgrRelations at transaction end.
If an SMgrRelation is not "owned" by a relcache entry, don't allow it to
live past transaction end.  This design allows the same SMgrRelation to be
used for blind writes of multiple blocks during a transaction, but ensures
that we don't hold onto such an SMgrRelation indefinitely.  Because an
SMgrRelation typically corresponds to open file descriptors at the fd.c
level, leaving it open when there's no corresponding relcache entry can
mean that we prevent the kernel from reclaiming deleted disk space.
(While CacheInvalidateSmgr messages usually fix that, there are cases
where they're not issued, such as DROP DATABASE.  We might want to add
some more sinval messaging for that, but I'd be inclined to keep this
type of logic anyway, since allowing VFDs to accumulate indefinitely
for blind-written relations doesn't seem like a good idea.)

This code replaces a previous attempt towards the same goal that proved
to be unreliable.  Back-patch to 9.1 where the previous patch was added.
2012-10-17 12:38:28 -04:00
Tom Lane
a1f064fc2b Revert "Use "transient" files for blind writes, take 2".
This reverts commit fba105b1099f4f5fa7283bb17cba6fed2baa8d0c.
That approach had problems with the smgr-level state not tracking what
we really want to happen, and with the VFD-level state not tracking the
smgr-level state very well either.  In consequence, it was still possible
to hold kernel file descriptors open for long-gone tables (as in recent
report from Tore Halset), and yet there were also cases of FDs being closed
undesirably soon.  A replacement implementation will follow.
2012-10-17 12:37:15 -04:00