This patch adds the server infrastructure to support extensions.
There is still one significant loose end, namely how to make it play nice
with pg_upgrade, so I am not yet committing the changes that would make
all the contrib modules depend on this feature.
In passing, fix a disturbingly large amount of breakage in
AlterObjectNamespace() and callers.
Dimitri Fontaine, reviewed by Anssi Kääriäinen,
Itagaki Takahiro, Tom Lane, and numerous others
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
FK constraints that are marked NOT VALID may later be VALIDATED, which uses an
ShareUpdateExclusiveLock on constraint table and RowShareLock on referenced
table. Significantly reduces lock strength and duration when adding FKs.
New state visible from psql.
Simon Riggs, with reviews from Marko Tiikkaja and Robert Haas
There isn't any need to track this state on a table-wide basis, and trying
to do so introduces undesirable semantic fuzziness. Move the flag to
pg_index, where it clearly describes just a single index and can be
immutable after index creation.
Toast tables have identical pg_class.oid and pg_class.relfilenode, but
for clarity it is good to preserve the pg_class.oid.
Update comments regarding what is preserved, and do some
variable/function renaming for clarity.
Foreign tables are a core component of SQL/MED. This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried. Support for foreign table scans will need to
be added in a future patch. However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.
Shigeru Hanada, heavily revised by Robert Haas
The contents of an unlogged table are WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery. Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.
This commit replaces pg_class.relistemp with pg_class.relpersistence;
and also modifies the RangeVar node type to carry relpersistence rather
than istemp. It also removes removes rd_istemp from RelationData and
instead performs the correct computation based on relpersistence.
For clarity, we add three new macros: RelationNeedsWAL(),
RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we
can clarify the purpose of each check that previous depended on
rd_istemp.
This is intended as infrastructure for the upcoming unlogged tables
patch, as well as for future possible work on global temporary tables.
After a SQL object is created, we provide an opportunity for security
or logging plugins to get control; for example, a security label provider
could use this to assign an initial security label to newly created
objects. The basic infrastructure is (hopefully) reusable for other types
of events that might require similar treatment.
KaiGai Kohei, with minor adjustments.
This allows us to reliably remove all leftover temporary relation
files on cluster startup without reference to system catalogs or WAL;
therefore, we no longer include temporary relations in XLOG_XACT_COMMIT
and XLOG_XACT_ABORT WAL records.
Since these changes require including a backend ID in each
SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id
field has been reduced from two bytes to one, and the maximum number
of connections has been reduced from INT_MAX / 4 to 2^23-1. It would
be possible to remove these restrictions by increasing the size of
SharedInvalidationMessage by 4 bytes, but right now that doesn't seem
like a good trade-off.
Review by Jaime Casanova and Tom Lane.
be added during GRANT and can only be removed during REVOKE; and fix its
callers to not lie to it about the existing set of dependencies when
instantiating a formerly-default ACL. The previous coding accidentally failed
to malfunction so long as default ACLs contain only references to the object's
owning role, because that role is ignored by updateAclDependencies. However
this is obviously pretty fragile, as well as being an undocumented assumption.
The new coding is a few lines longer but IMO much clearer.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4). This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.
Design and review by Tom Lane.
of shared or nailed system catalogs. This has two key benefits:
* The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.
* We no longer have to use an unsafe reindex-in-place approach for reindexing
shared catalogs.
CLUSTER on nailed catalogs now works too, although I left it disabled on
shared catalogs because the resulting pg_index.indisclustered update would
only be visible in one database.
Since reindexing shared system catalogs is now fully transactional and
crash-safe, the former special cases in REINDEX behavior have been removed;
shared catalogs are treated the same as non-shared.
This commit does not do anything about the recently-discussed problem of
deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
concurrent queries; will address that in a separate patch. As a stopgap,
parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
such failures during the regression tests.
the relfilenode of currently-not-relocatable system catalogs.
1. Get rid of inval.c's dependency on relfilenode, by not having it emit
smgr invalidations as a result of relcache flushes. Instead, smgr sinval
messages are sent directly from smgr.c when an actual relation delete or
truncate is done. This makes considerably more structural sense and allows
elimination of a large number of useless smgr inval messages that were
formerly sent even in cases where nothing was changing at the
physical-relation level. Note that this reintroduces the concept of
nontransactional inval messages, but that's okay --- because the messages
are sent by smgr.c, they will be sent in Hot Standby slaves, just from a
lower logical level than before.
2. Move setNewRelfilenode out of catalog/index.c, where it never logically
belonged, into relcache.c; which is a somewhat debatable choice as well but
better than before. (I considered catalog/storage.c, but that seemed too
low level.) Rename to RelationSetNewRelfilenode.
3. Cosmetic cleanups of some other relfilenode manipulations.
Attributes can now have options, just as relations and tablespaces do, and
the reloptions code is used to parse, validate, and store them. For
simplicity and because these options are not performance critical, we store
them in a separate cache rather than the main relcache.
Thanks to Alex Hunsaker for the review.
and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.
autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
Modify pg_dump --binary-upgrade and add backend support routines to
support the preservation of pg_type oids when doing a binary upgrade.
This allows user-defined composite types and arrays to be binary
upgraded.
support any indexable commutative operator, not just equality. Two rows
violate the exclusion constraint if "row1.col OP row2.col" is TRUE for
each of the columns in the constraint.
Jeff Davis, reviewed by Robert Haas
the privileges that will be applied to subsequently-created objects.
Such adjustments are always per owning role, and can be restricted to objects
created in particular schemas too. A notable benefit is that users can
override the traditional default privilege settings, eg, the PUBLIC EXECUTE
privilege traditionally granted by default for functions.
Petr Jelinek
hand-assigned rowtype OIDs, even when they are not "bootstrapped" catalogs
that have handmade type rows in pg_type.h. Give pg_database such an OID.
Restore the availability of C macros for the rowtype OIDs of the bootstrapped
catalogs. (These macros are now in the individual catalogs' .h files,
though, not in pg_type.h.)
This commit doesn't do anything especially useful by itself, but it's
necessary infrastructure for reverting some ill-considered changes in
relcache.c.
or previously truncated in the current (sub)transaction. This is safe since
if the (sub)transaction later rolls back, we'd just discard the rel's current
physical file anyway. This avoids unreasonable growth in the number of
transient files when a relation is repeatedly truncated. Per a performance
gripe a couple weeks ago from Todd Cook.
This was foreseen to be a good idea long ago, but nobody had got round
to doing it. The recent patch for deferred unique constraints made
transformConstraintAttrs() ugly enough that I decided it was time.
This change will also greatly simplify parsing of deferred CHECK constraints,
if anyone ever gets around to implementing that.
While at it, add a location field to Constraint, and use that to provide
an error cursor for some of the constraint-related error messages.
conindid is the index supporting a constraint. We can use this not only for
unique/primary-key constraints, but also foreign-key constraints, which
depend on the unique index that constrains the referenced columns.
tgconstrindid is just copied from the constraint's conindid field, or is
zero for triggers not associated with constraints.
This is mainly intended as infrastructure for upcoming patches, but it has
some virtue in itself, since it exposes a relationship that you formerly
had to grovel in pg_depend to determine. I simplified one information_schema
view accordingly. (There is a pg_dump query that could also use conindid,
but I left it alone because it wasn't clear it'd get any faster.)
Stefan Kaltenbrunner. The most reasonable behavior (at least for the near
term) seems to be to ignore the PlaceHolderVar and examine its argument
instead. In support of this, change the API of pull_var_clause() to allow
callers to request recursion into PlaceHolderVars. Currently
estimate_num_groups() is the only customer for that behavior, but where
there's one there may be others.
relations (including a temp table's indexes and toast table/index), and
false for normal relations. For ease of checking, this commit just adds
the column and fills it correctly --- revising the relation access machinery
to use it will come separately.
TABLE: if the command is executed by someone other than the table owner (eg,
a superuser) and the table has a toast table, the toast table's pg_type row
ends up with the wrong typowner, ie, the command issuer not the table owner.
This is quite harmless for most purposes, since no interesting permissions
checks consult the pg_type row. However, it could lead to unexpected failures
if one later tries to drop the role that issued the command (in 8.1 or 8.2),
or strange warnings from pg_dump afterwards (in 8.3 and up, which will allow
the DROP ROLE because we don't create a "redundant" owner dependency for table
rowtypes). Problem identified by Cott Lang.
Back-patch to 8.1. The problem is actually far older --- the CLUSTER variant
can be demonstrated in 7.0 --- but it's mostly cosmetic before 8.1 because we
didn't track ownership dependencies before 8.1. Also, fixing it before 8.1
would require changing the call signature of heap_create_with_catalog(), which
seems to carry a nontrivial risk of breaking add-on modules.
truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To
make that cleaner from modularity point of view, move the WAL-logging one
level up to RelationTruncate, and move RelationTruncate and all the
related WAL-logging to new src/backend/catalog/storage.c file. Introduce
new RelationCreateStorage and RelationDropStorage functions that are used
instead of calling smgrcreate/smgrscheduleunlink directly. Move the
pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new
functions. This leaves smgr.c as a thin wrapper around md.c; all the
transactional stuff is now in storage.c.
This will make it easier to add new forks with similar truncation logic,
like the visibility map.
heap_form_tuple. Since this removes the last remaining caller of
heap_addheader, remove it.
Extracted from the column privileges patch from Stephen Frost, with further
code cleanups by me.
("there might be triggers") rather than an exact count. This is necessary
catalog infrastructure for the upcoming patch to reduce the strength of
locking needed for trigger addition/removal. Split out and committed
separately for ease of reviewing/testing.
In passing, also get rid of the unused pg_class columns relukeys, relfkeys,
and relrefs, which haven't been maintained in many years and now have no
chance of ever being maintained (because of wishing to avoid locking).
Simon Riggs
and heap_deformtuple in favor of the newer functions heap_form_tuple et al
(which do the same things but use bool control flags instead of arbitrary
char values). Eliminate the former duplicate coding of these functions,
reducing the deprecated functions to mere wrappers around the newer ones.
We can't get rid of them entirely because add-on modules probably still
contain many instances of the old coding style.
Kris Jurka
lengthof(SysAtt) not FirstLowInvalidHeapAttributeNumber, for consistency with
the other uses of the SysAtt array, and to make it clearer that it doesn't
walk off the end of that array.
free space information is stored in a dedicated FSM relation fork, with each
relation (except for hash indexes; they don't use FSM).
This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
trace of them from the backend, initdb, and documentation.
Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
introduce a new variant of the get_raw_page(regclass, int4, int4) function in
contrib/pageinspect that let's you to return pages from any relation fork, and
a new fsm_page_contents() function to inspect the new FSM pages.
most node types used in expression trees (both before and after parse
analysis). This allows us to place an error cursor in many situations
where we formerly could not, because the information wasn't available
beyond the very first level of parse analysis. There's a fair amount
of work still to be done to persuade individual ereport() calls to actually
include an error location, but this gets the initdb-forcing part of the
work out of the way; and the situation is already markedly better than
before for complaints about unimplementable implicit casts, such as
CASE and UNION constructs with incompatible alternative data types.
Per my proposal of a few days ago.
into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside
the backend. There's probably more that should be done along this line,
but this is a start anyway.