Compare commits


9 Commits

Author SHA1 Message Date
David Rowley
f0efa5aec1 Introduce the concept of read-only StringInfos
There were various places in our codebase which conjured up a StringInfo
by manually assigning the StringInfo fields and setting the data field
to point to some existing buffer.  There wasn't much consistency as to
what fields like maxlen got set to, and in one location we didn't ensure
that the buffer was correctly NUL terminated at len bytes, as documented
as required in stringinfo.h.

Here we introduce two new functions to initialize StringInfos.  One allows
callers to initialize a StringInfo passing along a buffer that is
already allocated by palloc.  Here the StringInfo code uses this buffer
directly rather than doing any memcpying into a new allocation.  Having
this as a function allows us to verify the buffer is correctly NUL
terminated.  StringInfos initialized this way can be appended to and
reset just like any other normal StringInfo.

The other new initialization function also accepts an existing buffer,
but the given buffer does not need to be a pointer to a palloc'd chunk.
This buffer could be a pointer pointing partway into some palloc'd chunk
or may not even be palloc'd at all.  StringInfos initialized this way
are deemed as "read-only".  This means that it's not possible to
append to them or reset them.

For the latter of the two new initialization functions mentioned above,
we relax the requirement that the data buffer must be NUL terminated.
Relaxing this requirement is convenient in a few places, as it can save
us from having to allocate an entire new buffer just to add the NUL
terminator, or from having to temporarily overwrite a character with a
NUL only to restore the original character later.
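
As a rough illustration of the difference between the two initializers,
here is a hedged usage sketch (not part of the patch; the wrapper function
and its arguments are hypothetical, while initStringInfoFromString and
initReadOnlyStringInfo are the functions this commit adds):

    #include "postgres.h"
    #include "lib/stringinfo.h"

    /* Hypothetical helper demonstrating the two new initializers. */
    static void
    stringinfo_init_examples(char *palloced_buf, int buf_len,
                             char *raw_bytes, int raw_len)
    {
        StringInfoData owned;
        StringInfoData ro;

        /*
         * palloced_buf must be palloc'd and NUL terminated at buf_len; the
         * StringInfo adopts it directly, with no memcpy, and behaves like
         * any other StringInfo afterwards (it can be appended to and reset).
         */
        initStringInfoFromString(&owned, palloced_buf, buf_len);
        appendStringInfoString(&owned, " more text");

        /*
         * raw_bytes may point partway into some chunk and need not be NUL
         * terminated; the resulting StringInfo is read-only, so appending
         * to it or resetting it is not allowed.
         */
        initReadOnlyStringInfo(&ro, raw_bytes, raw_len);
    }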

Incompatibility note:

Here we also forgo adding the NUL in a few places where it does not
seem to be required.  These locations are passing the given StringInfo
into a type's receive function.  It does not seem like any of our
built-in receive functions require this, but perhaps there's some UDT
out there in the wild which does require this.  It is likely worthy of
a mention in the release notes that a UDT's receive function mustn't rely
on the input StringInfo being NUL terminated.
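
For UDT authors, a hedged sketch of a receive function that follows this
rule (my_udt_recv is hypothetical; the pqformat routines are the usual
ones from libpq/pqformat.h): it consumes exactly the bytes it is given
and never assumes buf->data[buf->len] == '\0'.

    #include "postgres.h"
    #include "fmgr.h"
    #include "libpq/pqformat.h"

    PG_FUNCTION_INFO_V1(my_udt_recv);

    Datum
    my_udt_recv(PG_FUNCTION_ARGS)
    {
        StringInfo  buf = (StringInfo) PG_GETARG_POINTER(0);
        int         nbytes = buf->len - buf->cursor;
        bytea      *result = (bytea *) palloc(nbytes + VARHDRSZ);

        /* Copy out the declared number of bytes; no NUL is consulted. */
        SET_VARSIZE(result, nbytes + VARHDRSZ);
        memcpy(VARDATA(result), pq_getmsgbytes(buf, nbytes), nbytes);
        PG_RETURN_BYTEA_P(result);
    }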

Author: David Rowley
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvorfO3iBZ%3DxpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ%40mail.gmail.com
2023-10-26 16:31:48 +13:00
Amit Langote
01575ad788 Prevent duplicate RTEPermissionInfo for plain-inheritance parents
Currently, expand_single_inheritance_child() doesn't reset
perminfoindex in a plain-inheritance parent's child RTE, because
prior to 387f9ed0a0, the executor would use the first child RTE to
locate the parent's RTEPermissionInfo.  That in turn causes
add_rte_to_flat_rtable() to create an extra RTEPermissionInfo
belonging to the parent's child RTE with the same content as the one
belonging to the parent's original ("root") RTE.

In 387f9ed0a0, we changed things so that the executor can now use the
parent's "root" RTE for locating its RTEPermissionInfo instead of the
child RTE, so the latter's perminfoindex need not be set anymore, so
make it so.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/839708.1698174464@sss.pgh.pa.us
Backpatch-through: 16
2023-10-26 11:53:56 +09:00
Amit Kapila
29d0a77fa6 Migrate logical slots to the new node during an upgrade.
While reading information from the old cluster, a list of logical
slots is fetched. In a later part of the upgrade, pg_upgrade revisits the
list and restores the slots by executing pg_create_logical_replication_slot()
on the new cluster. Migration of logical replication slots is only
supported when the old cluster is version 17.0 or later.

If the old node has invalid slots or slots with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

The significant advantage of this commit is that it makes it easy to
continue logical replication even after upgrading the publisher node.
Previously, pg_upgrade allowed copying publications to a new node. With
this patch, adjusting the connection string to the new publisher will
cause the apply worker on the subscriber to connect to the new publisher
automatically. This enables seamless continuation of logical replication,
even after an upgrade.

Author: Hayato Kuroda, Hou Zhijie
Reviewed-by: Peter Smith, Bharath Rupireddy, Dilip Kumar, Vignesh C, Shlok Kyal
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
Discussion: http://postgr.es/m/CAA4eK1+t7xYcfa0rEQw839=b2MzsfvYDPz3xbD+ZqOdP3zpKYg@mail.gmail.com
2023-10-26 07:06:55 +05:30
Tom Lane
bddc2f7480 Doc: remove misleading info about ecpg's CONNECT/DISCONNECT DEFAULT.
As far as I can see, ecpg has no notion of a "default" open
connection.  You can do "CONNECT TO DEFAULT" but that just specifies
letting libpq use all its default connection parameters --- the
resulting connection is not special subsequently.  In particular,
SET CONNECTION = DEFAULT and DISCONNECT DEFAULT simply act on a
connection named DEFAULT, if you've made one; they do not have
special lookup rules.  But the documentation of these commands
makes it look like they do.

Simplest fix, I think, is just to remove the paras suggesting that
DEFAULT is special here.

Also, SET CONNECTION *does* have one special lookup rule, which
is that it recognizes CURRENT as an alias for the currently selected
connection.  SET CONNECTION = CURRENT is a no-op, so it's pretty
useless, but nonetheless it does something different from selecting
a connection by name; so we'd better document it.
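
As a minimal ecpg sketch of the difference (testdb and testuser are
placeholders, in the style of the manual's own example):

    int
    main(void)
    {
        EXEC SQL CONNECT TO testdb AS con1 USER testuser;
        EXEC SQL CONNECT TO testdb AS con2 USER testuser;

        EXEC SQL SET CONNECTION = con1;     /* select con1 by name */
        EXEC SQL SET CONNECTION = CURRENT;  /* no-op: con1 stays selected */

        EXEC SQL DISCONNECT ALL;
        return 0;
    }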

Per report from Sylvain Frandaz.  Back-patch to all supported
versions.

Discussion: https://postgr.es/m/169824721149.1769274.1553568436817652238@wrigleys.postgresql.org
2023-10-25 17:34:51 -04:00
Nathan Bossart
fdeb6e6a74 Remove dead code in pg_ctl.c.
Missed in 39969e2a1e.

Author: David Steele
Discussion: https://postgr.es/m/0c742f0c-d663-419d-b5a7-4fe867f5566c%40pgmasters.net
2023-10-25 16:26:59 -05:00
Jeff Davis
e9d12a5e22 Doc fix: Interfacing Extensions to Indexes
Refer to CREATE ACCESS METHOD rather than suggesting direct changes to
pg_am. Also corrects index-specific language that predated table
access methods.

Discussion: https://postgr.es/m/20231025172551.685b7799455f9a6addcf5afa@sraoss.co.jp
Reported-by: Yugo NAGATA <nagata@sraoss.co.jp>
2023-10-25 13:26:11 -07:00
Alexander Korotkov
9ba9c7074f Fix some regression tests for d3d55ce57136
Add missing (costs off) to EXPLAIN.
2023-10-25 14:53:14 +03:00
Alexander Korotkov
d3d55ce571 Remove useless self-joins
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if it is proved that the join can be replaced with
a scan without impacting the query result.  The inner relation is replaced
with the outer one in the query tree, equivalence classes, and planner info
structures.  Also, the inner restrictlist moves to the outer one, with
duplicated clauses removed.  Thus, this optimization reduces the length of the
range table list (this especially makes sense for partitioned relations),
reduces the number of restriction clauses and hence of selectivity
estimations, and can potentially improve overall planner estimates for the
query.

The SJE proof is based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
 1. At most one inner row matches the join clause.
 2. Each matched inner row must be (physically) the same row as the outer one.

In this patch we use the following approach to identify a self-join:
 1. Collect all merge-joinable join quals which look like a.x = b.x.
 2. Add to the list above the baserestrictinfo of the inner table.
 3. Check innerrel_is_unique() for the qual list.  If it returns false, skip
    this pair of joining tables.
 4. Check uniqueness as proved by the baserestrictinfo clauses.  To prove
    that self-join elimination is possible, the inner and outer clauses must
    match exactly (see the sketch below).
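
A hedged caller sketch for step 4, based on the
relation_has_unique_index_ext() signature added in this commit (root,
innerrel, and restrictlist are assumed to be in scope):

    List       *extra_clauses = NIL;

    if (relation_has_unique_index_ext(root, innerrel, restrictlist,
                                      NIL, NIL, &extra_clauses))
    {
        /*
         * Uniqueness is proven.  extra_clauses now holds the
         * baserestrictinfo clauses the proof relied on; for self-join
         * elimination the outer side must have exactly matching clauses.
         */
    }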

The relation replacement procedure is not trivial, and it is partly combined
with the one used to remove useless left joins.  Tests covering this feature
were added to join.sql.  Some regression tests changed due to the self-join
removal logic.

Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
2023-10-25 12:59:16 +03:00
Daniel Gustafsson
8f0fd47fa3 Use snprintf instead of sprintf in pg_regress.
To avoid static analyzers sounding the alarm, move to using snprintf
instead of sprintf. This was an oversight in 66d6086cbcbfc8dee789a6.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/849588.1698179694@sss.pgh.pa.us
2023-10-25 10:53:11 +02:00
45 changed files with 3529 additions and 218 deletions

View File

@ -5306,6 +5306,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
<varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
<term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
<indexterm>
<primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Enables or disables the query planner's optimization which analyses
the query tree and replaces self joins with semantically equivalent
single scans. Only plain tables are taken into consideration.
The default is <literal>on</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>

View File

@ -412,12 +412,6 @@ EXEC SQL DISCONNECT <optional><replaceable>connection</replaceable></optional>;
</simpara>
</listitem>
<listitem>
<simpara>
<literal>DEFAULT</literal>
</simpara>
</listitem>
<listitem>
<simpara>
<literal>CURRENT</literal>
@ -7130,7 +7124,6 @@ EXEC SQL DEALLOCATE DESCRIPTOR mydesc;
<synopsis>
DISCONNECT <replaceable class="parameter">connection_name</replaceable>
DISCONNECT [ CURRENT ]
DISCONNECT DEFAULT
DISCONNECT ALL
</synopsis>
</refsynopsisdiv>
@ -7171,15 +7164,6 @@ DISCONNECT ALL
</listitem>
</varlistentry>
<varlistentry id="ecpg-sql-disconnect-default">
<term><literal>DEFAULT</literal></term>
<listitem>
<para>
Close the default connection.
</para>
</listitem>
</varlistentry>
<varlistentry id="ecpg-sql-disconnect-all">
<term><literal>ALL</literal></term>
<listitem>
@ -7198,13 +7182,11 @@ DISCONNECT ALL
int
main(void)
{
EXEC SQL CONNECT TO testdb AS DEFAULT USER testuser;
EXEC SQL CONNECT TO testdb AS con1 USER testuser;
EXEC SQL CONNECT TO testdb AS con2 USER testuser;
EXEC SQL CONNECT TO testdb AS con3 USER testuser;
EXEC SQL DISCONNECT CURRENT; /* close con3 */
EXEC SQL DISCONNECT DEFAULT; /* close DEFAULT */
EXEC SQL DISCONNECT ALL; /* close con2 and con1 */
return 0;
@ -7772,11 +7754,11 @@ SET CONNECTION [ TO | = ] <replaceable class="parameter">connection_name</replac
</listitem>
</varlistentry>
<varlistentry id="ecpg-sql-set-connection-default">
<term><literal>DEFAULT</literal></term>
<varlistentry id="ecpg-sql-set-connection-current">
<term><literal>CURRENT</literal></term>
<listitem>
<para>
Set the connection to the default connection.
Set the connection to the current connection (thus, nothing happens).
</para>
</listitem>
</varlistentry>

View File

@ -383,6 +383,79 @@ make prefix=/usr/local/pgsql.new install
</para>
</step>
<step>
<title>Prepare for publisher upgrades</title>
<para>
<application>pg_upgrade</application> attempts to migrate logical
slots. This helps avoid the need to manually define the same
logical slots on the new publisher. Migration of logical slots is
only supported when the old cluster is version 17.0 or later.
Logical slots on clusters before version 17.0 will silently be
ignored.
</para>
<para>
Before you start upgrading the publisher cluster, ensure that the
subscription is temporarily disabled, by executing
<link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
Re-enable the subscription after the upgrade.
</para>
<para>
There are some prerequisites for <application>pg_upgrade</application> to
be able to upgrade the logical slots. If these are not met, an error
will be reported.
</para>
<itemizedlist>
<listitem>
<para>
The new cluster must have
<link linkend="guc-wal-level"><varname>wal_level</varname></link> as
<literal>logical</literal>.
</para>
</listitem>
<listitem>
<para>
The new cluster must have
<link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
configured to a value greater than or equal to the number of slots
present in the old cluster.
</para>
</listitem>
<listitem>
<para>
The output plugins referenced by the slots on the old cluster must be
installed in the new PostgreSQL executable directory.
</para>
</listitem>
<listitem>
<para>
The old cluster must have replicated all transactions and logical
decoding messages to the subscribers.
</para>
</listitem>
<listitem>
<para>
All slots on the old cluster must be usable, i.e., there must be no
slots whose
<link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>conflicting</structfield>
is <literal>true</literal>.
</para>
</listitem>
<listitem>
<para>
The new cluster must not have permanent logical slots, i.e.,
there must be no slots where
<link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>temporary</structfield>
is <literal>false</literal>.
</para>
</listitem>
</itemizedlist>
</step>
<step>
<title>Stop both servers</title>
@ -650,8 +723,9 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
Configure the servers for log shipping. (You do not need to run
<function>pg_backup_start()</function> and <function>pg_backup_stop()</function>
or take a file system backup as the standbys are still synchronized
with the primary.) Replication slots are not copied and must
be recreated.
with the primary.) Only logical slots on the primary are copied to the
new standby, but other slots on the old standby are not copied, so they
must be recreated manually.
</para>
</step>

View File

@ -29,14 +29,11 @@
<title>Index Methods and Operator Classes</title>
<para>
The <classname>pg_am</classname> table contains one row for every
index method (internally known as access method). Support for
regular access to tables is built into
<productname>PostgreSQL</productname>, but all index methods are
described in <classname>pg_am</classname>. It is possible to add a
new index access method by writing the necessary code and
then creating an entry in <classname>pg_am</classname> &mdash; but that is
beyond the scope of this chapter (see <xref linkend="indexam"/>).
Operator classes are associated with an index access method, such
as <link linkend="btree">B-Tree</link>
or <link linkend="gin">GIN</link>. A custom index access method may be
defined with <xref linkend="sql-create-access-method"/>. See
<xref linkend="indexam"/> for details.
</para>
<para>

View File

@ -3494,6 +3494,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
{
return relation_has_unique_index_ext(root, rel, restrictlist,
exprlist, oprlist, NULL);
}
/*
* relation_has_unique_index_ext
* Same as relation_has_unique_index_for(), but supports extra_clauses
* parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
* which were used to derive uniqueness.
*/
bool
relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist,
List **extra_clauses)
{
ListCell *ic;
@ -3549,6 +3565,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@ -3600,6 +3617,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
{
MemoryContext oldMemCtx =
MemoryContextSwitchTo(root->planner_cxt);
/*
* Add the filter clause to a list, allowing the caller to
* know whether uniqueness was derived not only from join
* clauses.
*/
Assert(bms_is_empty(rinfo->left_relids) ||
bms_is_empty(rinfo->right_relids));
if (extra_clauses)
exprs = lappend(exprs, rinfo);
MemoryContextSwitchTo(oldMemCtx);
}
break;
}
}
@ -3642,7 +3677,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
{
if (extra_clauses)
*extra_clauses = exprs;
return true;
}
}
return false;

File diff suppressed because it is too large Load Diff

View File

@ -231,6 +231,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
/*
* Remove self joins on a unique column.
*/
joinlist = remove_useless_self_joins(root, joinlist);
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a

View File

@ -494,13 +494,8 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
childrte->inh = false;
childrte->securityQuals = NIL;
/*
* No permission checking for the child RTE unless it's the parent
* relation in its child role, which only applies to traditional
* inheritance.
*/
if (childOID != parentOID)
childrte->perminfoindex = 0;
/* No permission checking for child RTEs. */
childrte->perminfoindex = 0;
/* Link not-yet-fully-filled child RTE into data structures */
parse->rtable = lappend(parse->rtable, childrte);

View File

@ -774,10 +774,7 @@ LogicalParallelApplyLoop(shm_mq_handle *mqh)
if (len == 0)
elog(ERROR, "invalid message length");
s.cursor = 0;
s.maxlen = -1;
s.data = (char *) data;
s.len = len;
initReadOnlyStringInfo(&s, data, len);
/*
* The first byte of messages sent from leader apply worker to

View File

@ -600,12 +600,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(r), buf->origptr);
/*
* If we don't have snapshot or we are just fast-forwarding, there is no
* point in decoding messages.
*/
if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
ctx->fast_forward)
/* If we don't have snapshot, there is no point in decoding messages */
if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
return;
message = (xl_logical_message *) XLogRecGetData(r);
@ -622,6 +618,26 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
SnapBuildXactNeedsSkip(builder, buf->origptr)))
return;
/*
* We also skip decoding in fast_forward mode. This check must be last
* because we don't want to set the processing_required flag unless we
* have a decodable message.
*/
if (ctx->fast_forward)
{
/*
* We need to set processing_required flag to notify the message's
* existence to the caller. Usually, the flag is set when either the
* COMMIT or ABORT records are decoded, but this must be turned on
* here because the non-transactional logical message is decoded
* without waiting for these records.
*/
if (!message->transactional)
ctx->processing_required = true;
return;
}
/*
* If this is a non-transactional change, get the snapshot we're expected
* to use. We only get here when the snapshot is consistent, and the
@ -1286,7 +1302,21 @@ static bool
DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
Oid txn_dbid, RepOriginId origin_id)
{
return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
ctx->fast_forward || FilterByOrigin(ctx, origin_id));
if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
(txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
FilterByOrigin(ctx, origin_id))
return true;
/*
* We also skip decoding in fast_forward mode. In passing set the
* processing_required flag to indicate that if it were not for
* fast_forward mode, processing would have been required.
*/
if (ctx->fast_forward)
{
ctx->processing_required = true;
return true;
}
return false;
}

View File

@ -29,6 +29,7 @@
#include "postgres.h"
#include "access/xact.h"
#include "access/xlogutils.h"
#include "access/xlog_internal.h"
#include "fmgr.h"
#include "miscadmin.h"
@ -41,6 +42,7 @@
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
#include "utils/inval.h"
#include "utils/memutils.h"
/* data for errcontext callback */
@ -1949,3 +1951,76 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
rb->totalTxns = 0;
rb->totalBytes = 0;
}
/*
* Read up to the end of WAL starting from the decoding slot's restart_lsn.
* Return true if any meaningful/decodable WAL records are encountered,
* otherwise false.
*/
bool
LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
{
bool has_pending_wal = false;
Assert(MyReplicationSlot);
PG_TRY();
{
LogicalDecodingContext *ctx;
/*
* Create our decoding context in fast_forward mode, passing start_lsn
* as InvalidXLogRecPtr, so that we start processing from the slot's
* confirmed_flush.
*/
ctx = CreateDecodingContext(InvalidXLogRecPtr,
NIL,
true, /* fast_forward */
XL_ROUTINE(.page_read = read_local_xlog_page,
.segment_open = wal_segment_open,
.segment_close = wal_segment_close),
NULL, NULL, NULL);
/*
* Start reading at the slot's restart_lsn, which we know points to a
* valid record.
*/
XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);
/* Invalidate non-timetravel entries */
InvalidateSystemCaches();
/* Loop until the end of WAL or some changes are processed */
while (!has_pending_wal && ctx->reader->EndRecPtr < end_of_wal)
{
XLogRecord *record;
char *errm = NULL;
record = XLogReadRecord(ctx->reader, &errm);
if (errm)
elog(ERROR, "could not find record for logical decoding: %s", errm);
if (record != NULL)
LogicalDecodingProcessRecord(ctx, ctx->reader);
has_pending_wal = ctx->processing_required;
CHECK_FOR_INTERRUPTS();
}
/* Clean up */
FreeDecodingContext(ctx);
InvalidateSystemCaches();
}
PG_CATCH();
{
/* clear all timetravel entries */
InvalidateSystemCaches();
PG_RE_THROW();
}
PG_END_TRY();
return has_pending_wal;
}

View File

@ -879,6 +879,7 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple)
/* Read the data */
for (i = 0; i < natts; i++)
{
char *buff;
char kind;
int len;
StringInfo value = &tuple->colvalues[i];
@ -899,19 +900,18 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple)
len = pq_getmsgint(in, 4); /* read length */
/* and data */
value->data = palloc(len + 1);
pq_copymsgbytes(in, value->data, len);
buff = palloc(len + 1);
pq_copymsgbytes(in, buff, len);
/*
* Not strictly necessary for LOGICALREP_COLUMN_BINARY, but
* per StringInfo practice.
* NUL termination is required for LOGICALREP_COLUMN_TEXT mode
* as input functions require that. For
* LOGICALREP_COLUMN_BINARY it's not technically required, but
* it's harmless.
*/
value->data[len] = '\0';
buff[len] = '\0';
/* make StringInfo fully valid */
value->len = len;
value->cursor = 0;
value->maxlen = len;
initStringInfoFromString(value, buff, len);
break;
default:
elog(ERROR, "unrecognized data representation type '%c'", kind);

View File

@ -3582,10 +3582,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
/* Ensure we are reading the data into our memory context. */
MemoryContextSwitchTo(ApplyMessageContext);
s.data = buf;
s.len = len;
s.cursor = 0;
s.maxlen = -1;
initReadOnlyStringInfo(&s, buf, len);
c = pq_getmsgbyte(&s);

View File

@ -1423,6 +1423,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
SpinLockRelease(&s->mutex);
/*
* The logical replication slots shouldn't be invalidated as
* max_slot_wal_keep_size GUC is set to -1 during the upgrade.
*
* The following is just a sanity check.
*/
if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
{
ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("replication slots must not be invalidated during the upgrade"),
errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));
}
if (active_pid != 0)
{
/*

View File

@ -1817,23 +1817,19 @@ exec_bind_message(StringInfo input_message)
if (!isNull)
{
const char *pvalue = pq_getmsgbytes(input_message, plength);
char *pvalue;
/*
* Rather than copying data around, we just set up a phony
* Rather than copying data around, we just initialize a
* StringInfo pointing to the correct portion of the message
* buffer. We assume we can scribble on the message buffer so
* as to maintain the convention that StringInfos have a
* trailing null. This is grotty but is a big win when
* dealing with very large parameter strings.
* buffer. We assume we can scribble on the message buffer to
* add a trailing NUL which is required for the input function
* call.
*/
pbuf.data = unconstify(char *, pvalue);
pbuf.maxlen = plength + 1;
pbuf.len = plength;
pbuf.cursor = 0;
csave = pbuf.data[plength];
pbuf.data[plength] = '\0';
pvalue = unconstify(char *, pq_getmsgbytes(input_message, plength));
csave = pvalue[plength];
pvalue[plength] = '\0';
initReadOnlyStringInfo(&pbuf, pvalue, plength);
}
else
{

View File

@ -784,7 +784,6 @@ array_agg_deserialize(PG_FUNCTION_ARGS)
{
int itemlen;
StringInfoData elem_buf;
char csave;
if (result->dnulls[i])
{
@ -799,28 +798,19 @@ array_agg_deserialize(PG_FUNCTION_ARGS)
errmsg("insufficient data left in message")));
/*
* Rather than copying data around, we just set up a phony
* StringInfo pointing to the correct portion of the input buffer.
* We assume we can scribble on the input buffer so as to maintain
* the convention that StringInfos have a trailing null.
* Rather than copying data around, we just initialize a
* StringInfo pointing to the correct portion of the message
* buffer.
*/
elem_buf.data = &buf.data[buf.cursor];
elem_buf.maxlen = itemlen + 1;
elem_buf.len = itemlen;
elem_buf.cursor = 0;
initReadOnlyStringInfo(&elem_buf, &buf.data[buf.cursor], itemlen);
buf.cursor += itemlen;
csave = buf.data[buf.cursor];
buf.data[buf.cursor] = '\0';
/* Now call the element's receiveproc */
result->dvalues[i] = ReceiveFunctionCall(&iodata->typreceive,
&elem_buf,
iodata->typioparam,
-1);
buf.data[buf.cursor] = csave;
}
}

View File

@ -1475,7 +1475,6 @@ ReadArrayBinary(StringInfo buf,
{
int itemlen;
StringInfoData elem_buf;
char csave;
/* Get and check the item length */
itemlen = pq_getmsgint(buf, 4);
@ -1494,21 +1493,13 @@ ReadArrayBinary(StringInfo buf,
}
/*
* Rather than copying data around, we just set up a phony StringInfo
* pointing to the correct portion of the input buffer. We assume we
* can scribble on the input buffer so as to maintain the convention
* that StringInfos have a trailing null.
* Rather than copying data around, we just initialize a StringInfo
* pointing to the correct portion of the message buffer.
*/
elem_buf.data = &buf->data[buf->cursor];
elem_buf.maxlen = itemlen + 1;
elem_buf.len = itemlen;
elem_buf.cursor = 0;
initReadOnlyStringInfo(&elem_buf, &buf->data[buf->cursor], itemlen);
buf->cursor += itemlen;
csave = buf->data[buf->cursor];
buf->data[buf->cursor] = '\0';
/* Now call the element's receiveproc */
values[i] = ReceiveFunctionCall(receiveproc, &elem_buf,
typioparam, typmod);
@ -1520,8 +1511,6 @@ ReadArrayBinary(StringInfo buf,
(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
errmsg("improper binary format in array element %d",
i + 1)));
buf->data[buf->cursor] = csave;
}
/*

View File

@ -17,6 +17,7 @@
#include "catalog/pg_type.h"
#include "commands/extension.h"
#include "miscadmin.h"
#include "replication/logical.h"
#include "utils/array.h"
#include "utils/builtins.h"
@ -261,3 +262,46 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
/*
* Verify the given slot has already consumed all the WAL changes.
*
* Returns true if there are no decodable WAL records after the
* confirmed_flush_lsn. Otherwise false.
*
* This is a special purpose function to ensure that the given slot can be
* upgraded without data loss.
*/
Datum
binary_upgrade_logical_slot_has_caught_up(PG_FUNCTION_ARGS)
{
Name slot_name;
XLogRecPtr end_of_wal;
bool found_pending_wal;
CHECK_IS_BINARY_UPGRADE;
/* We must check before dereferencing the argument */
if (PG_ARGISNULL(0))
elog(ERROR, "null argument to binary_upgrade_logical_slot_has_caught_up is not allowed");
CheckSlotPermissions();
slot_name = PG_GETARG_NAME(0);
/* Acquire the given slot */
ReplicationSlotAcquire(NameStr(*slot_name), true);
Assert(SlotIsLogical(MyReplicationSlot));
/* Slots must be valid as otherwise we won't be able to scan the WAL */
Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
end_of_wal = GetFlushRecPtr(NULL);
found_pending_wal = LogicalReplicationSlotHasPendingWal(end_of_wal);
/* Clean up */
ReplicationSlotRelease();
PG_RETURN_BOOL(!found_pending_wal);
}

View File

@ -569,7 +569,6 @@ record_recv(PG_FUNCTION_ARGS)
int itemlen;
StringInfoData item_buf;
StringInfo bufptr;
char csave;
/* Ignore dropped columns in datatype, but fill with nulls */
if (att->attisdropped)
@ -619,25 +618,19 @@ record_recv(PG_FUNCTION_ARGS)
/* -1 length means NULL */
bufptr = NULL;
nulls[i] = true;
csave = 0; /* keep compiler quiet */
}
else
{
char *strbuff;
/*
* Rather than copying data around, we just set up a phony
* StringInfo pointing to the correct portion of the input buffer.
* We assume we can scribble on the input buffer so as to maintain
* the convention that StringInfos have a trailing null.
* Rather than copying data around, we just initialize a
* StringInfo pointing to the correct portion of the message
* buffer.
*/
item_buf.data = &buf->data[buf->cursor];
item_buf.maxlen = itemlen + 1;
item_buf.len = itemlen;
item_buf.cursor = 0;
strbuff = &buf->data[buf->cursor];
buf->cursor += itemlen;
csave = buf->data[buf->cursor];
buf->data[buf->cursor] = '\0';
initReadOnlyStringInfo(&item_buf, strbuff, itemlen);
bufptr = &item_buf;
nulls[i] = false;
@ -667,8 +660,6 @@ record_recv(PG_FUNCTION_ARGS)
(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
errmsg("improper binary format in record column %d",
i + 1)));
buf->data[buf->cursor] = csave;
}
}

View File

@ -1027,6 +1027,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
{
{"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enable removal of unique self-joins."),
NULL,
GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
},
&enable_self_join_removal,
true,
NULL, NULL, NULL
},
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),

View File

@ -96,7 +96,6 @@ static time_t start_time;
static char postopts_file[MAXPGPATH];
static char version_file[MAXPGPATH];
static char pid_file[MAXPGPATH];
static char backup_file[MAXPGPATH];
static char promote_file[MAXPGPATH];
static char logrotate_file[MAXPGPATH];
@ -2447,7 +2446,6 @@ main(int argc, char **argv)
snprintf(postopts_file, MAXPGPATH, "%s/postmaster.opts", pg_data);
snprintf(version_file, MAXPGPATH, "%s/PG_VERSION", pg_data);
snprintf(pid_file, MAXPGPATH, "%s/postmaster.pid", pg_data);
snprintf(backup_file, MAXPGPATH, "%s/backup_label", pg_data);
/*
* Set mask based on PGDATA permissions,

View File

@ -3,6 +3,9 @@
PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
PGAPPICON = win32
# required for 003_upgrade_logical_replication_slots.pl
EXTRA_INSTALL=contrib/test_decoding
subdir = src/bin/pg_upgrade
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global

View File

@ -33,6 +33,8 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
static void check_for_pg_role_prefix(ClusterInfo *cluster);
static void check_for_new_tablespace_dir(void);
static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
static void check_new_cluster_logical_replication_slots(void);
static void check_old_cluster_for_valid_slots(bool live_check);
/*
@ -89,8 +91,11 @@ check_and_dump_old_cluster(bool live_check)
if (!live_check)
start_postmaster(&old_cluster, true);
/* Extract a list of databases and tables from the old cluster */
get_db_and_rel_infos(&old_cluster);
/*
* Extract a list of databases, tables, and logical replication slots from
* the old cluster.
*/
get_db_rel_and_slot_infos(&old_cluster, live_check);
init_tablespaces();
@ -107,6 +112,13 @@ check_and_dump_old_cluster(bool live_check)
check_for_reg_data_type_usage(&old_cluster);
check_for_isn_and_int8_passing_mismatch(&old_cluster);
/*
* Logical replication slots can be migrated since PG17. See comments atop
* get_old_cluster_logical_slot_infos().
*/
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
check_old_cluster_for_valid_slots(live_check);
/*
* PG 16 increased the size of the 'aclitem' type, which breaks the
* on-disk format for existing data.
@ -200,7 +212,7 @@ check_and_dump_old_cluster(bool live_check)
void
check_new_cluster(void)
{
get_db_and_rel_infos(&new_cluster);
get_db_rel_and_slot_infos(&new_cluster, false);
check_new_cluster_is_empty();
@ -223,6 +235,8 @@ check_new_cluster(void)
check_for_prepared_transactions(&new_cluster);
check_for_new_tablespace_dir();
check_new_cluster_logical_replication_slots();
}
@ -1451,3 +1465,151 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
else
check_ok();
}
/*
* check_new_cluster_logical_replication_slots()
*
* Verify that there are no logical replication slots on the new cluster and
* that the parameter settings necessary for creating slots are sufficient.
*/
static void
check_new_cluster_logical_replication_slots(void)
{
PGresult *res;
PGconn *conn;
int nslots_on_old;
int nslots_on_new;
int max_replication_slots;
char *wal_level;
/* Logical slots can be migrated since PG17. */
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
return;
nslots_on_old = count_old_cluster_logical_slots();
/* Quick return if there are no logical slots to be migrated. */
if (nslots_on_old == 0)
return;
conn = connectToServer(&new_cluster, "template1");
prep_status("Checking for new cluster logical replication slots");
res = executeQueryOrDie(conn, "SELECT count(*) "
"FROM pg_catalog.pg_replication_slots "
"WHERE slot_type = 'logical' AND "
"temporary IS FALSE;");
if (PQntuples(res) != 1)
pg_fatal("could not count the number of logical replication slots");
nslots_on_new = atoi(PQgetvalue(res, 0, 0));
if (nslots_on_new)
pg_fatal("Expected 0 logical replication slots but found %d.",
nslots_on_new);
PQclear(res);
res = executeQueryOrDie(conn, "SELECT setting FROM pg_settings "
"WHERE name IN ('wal_level', 'max_replication_slots') "
"ORDER BY name DESC;");
if (PQntuples(res) != 2)
pg_fatal("could not determine parameter settings on new cluster");
wal_level = PQgetvalue(res, 0, 0);
if (strcmp(wal_level, "logical") != 0)
pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
wal_level);
max_replication_slots = atoi(PQgetvalue(res, 1, 0));
if (nslots_on_old > max_replication_slots)
pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
"logical replication slots (%d) on the old cluster",
max_replication_slots, nslots_on_old);
PQclear(res);
PQfinish(conn);
check_ok();
}
/*
* check_old_cluster_for_valid_slots()
*
* Verify that all the logical slots are valid and have consumed all the WAL
* before shutdown.
*/
static void
check_old_cluster_for_valid_slots(bool live_check)
{
char output_path[MAXPGPATH];
FILE *script = NULL;
prep_status("Checking for valid logical replication slots");
snprintf(output_path, sizeof(output_path), "%s/%s",
log_opts.basedir,
"invalid_logical_replication_slots.txt");
for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
{
LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
{
LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
/* Is the slot usable? */
if (slot->invalid)
{
if (script == NULL &&
(script = fopen_priv(output_path, "w")) == NULL)
pg_fatal("could not open file \"%s\": %s",
output_path, strerror(errno));
fprintf(script, "The slot \"%s\" is invalid\n",
slot->slotname);
continue;
}
/*
* Do additional check to ensure that all logical replication
* slots have consumed all the WAL before shutdown.
*
* Note: This can be satisfied only when the old cluster has been
* shut down, so we skip this for live checks.
*/
if (!live_check && !slot->caught_up)
{
if (script == NULL &&
(script = fopen_priv(output_path, "w")) == NULL)
pg_fatal("could not open file \"%s\": %s",
output_path, strerror(errno));
fprintf(script,
"The slot \"%s\" has not consumed the WAL yet\n",
slot->slotname);
}
}
}
if (script)
{
fclose(script);
pg_log(PG_REPORT, "fatal");
pg_fatal("Your installation contains logical replication slots that can't be upgraded.\n"
"You can remove invalid slots and/or consume the pending WAL for other slots,\n"
"and then restart the upgrade.\n"
"A list of the problematic slots is in the file:\n"
" %s", output_path);
}
check_ok();
}

View File

@ -46,7 +46,9 @@ library_name_compare(const void *p1, const void *p2)
/*
* get_loadable_libraries()
*
* Fetch the names of all old libraries containing C-language functions.
* Fetch the names of all old libraries that either contain C-language
* functions or correspond to logical replication output plugins.
*
* We will later check that they all exist in the new installation.
*/
void
@ -55,6 +57,7 @@ get_loadable_libraries(void)
PGresult **ress;
int totaltups;
int dbnum;
int n_libinfos;
ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
totaltups = 0;
@ -81,7 +84,12 @@ get_loadable_libraries(void)
PQfinish(conn);
}
os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
/*
* Allocate memory for required libraries and logical replication output
* plugins.
*/
n_libinfos = totaltups + count_old_cluster_logical_slots();
os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) * n_libinfos);
totaltups = 0;
for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
@ -89,6 +97,7 @@ get_loadable_libraries(void)
PGresult *res = ress[dbnum];
int ntups;
int rowno;
LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
ntups = PQntuples(res);
for (rowno = 0; rowno < ntups; rowno++)
@ -101,6 +110,23 @@ get_loadable_libraries(void)
totaltups++;
}
PQclear(res);
/*
* Store the names of output plugins as well. Duplicate plugins may be
* stored, but the consumer function check_loadable_libraries() avoids
* checking the same library twice, so we do not have to worry about
* their uniqueness here.
*/
for (int slotno = 0; slotno < slot_arr->nslots; slotno++)
{
if (slot_arr->slots[slotno].invalid)
continue;
os_info.libraries[totaltups].name = pg_strdup(slot_arr->slots[slotno].plugin);
os_info.libraries[totaltups].dbnum = dbnum;
totaltups++;
}
}
pg_free(ress);

View File

@ -26,6 +26,8 @@ static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
static void free_rel_infos(RelInfoArr *rel_arr);
static void print_db_infos(DbInfoArr *db_arr);
static void print_rel_infos(RelInfoArr *rel_arr);
static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
/*
@ -266,13 +268,15 @@ report_unmatched_relation(const RelInfo *rel, const DbInfo *db, bool is_new_db)
}
/*
* get_db_and_rel_infos()
* get_db_rel_and_slot_infos()
*
* higher level routine to generate dbinfos for the database running
* on the given "port". Assumes that server is already running.
*
* live_check is used only when the target is the old cluster.
*/
void
get_db_and_rel_infos(ClusterInfo *cluster)
get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
{
int dbnum;
@ -283,7 +287,17 @@ get_db_and_rel_infos(ClusterInfo *cluster)
get_db_infos(cluster);
for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
{
DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];
get_rel_infos(cluster, pDbInfo);
/*
* Retrieve the logical replication slots infos for the old cluster.
*/
if (cluster == &old_cluster)
get_old_cluster_logical_slot_infos(pDbInfo, live_check);
}
if (cluster == &old_cluster)
pg_log(PG_VERBOSE, "\nsource databases:");
@ -600,6 +614,125 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
dbinfo->rel_arr.nrels = num_rels;
}
/*
* get_old_cluster_logical_slot_infos()
*
* Gets the LogicalSlotInfos for all the logical replication slots of the
* database referred to by "dbinfo". The status of each logical slot is
* fetched here, but it is used later, at the checking phase. See
* check_old_cluster_for_valid_slots().
*
* Note: This function will not do anything if the old cluster is pre-PG17.
* This is because, before that version, logical slots were not saved at
* shutdown, so there is no guarantee that the latest confirmed_flush_lsn is
* saved to disk, which can lead to data loss. This is still not guaranteed
* for manually created slots in PG17, so the subsequent checks done in
* check_old_cluster_for_valid_slots() would raise a FATAL error if such slots
* are included.
*/
static void
get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
{
PGconn *conn;
PGresult *res;
LogicalSlotInfo *slotinfos = NULL;
int num_slots = 0;
/* Logical slots can be migrated since PG17. */
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
{
dbinfo->slot_arr.slots = slotinfos;
dbinfo->slot_arr.nslots = num_slots;
return;
}
conn = connectToServer(&old_cluster, dbinfo->db_name);
/*
* Fetch the logical replication slot information. The check whether the
* slot is considered caught up is done by an upgrade function. This
* regards the slot as caught up if we don't find any decodable changes.
* See binary_upgrade_logical_slot_has_caught_up().
*
* Note that we can't ensure whether the slot is caught up during
* live_check as the new WAL records could be generated.
*
* We intentionally skip checking the WALs for invalidated slots as the
* corresponding WALs could have been removed for such slots.
*
* The temporary slots are explicitly ignored while checking because such
* slots cannot exist after the upgrade. During the upgrade, clusters are
* started and stopped several times causing any temporary slots to be
* removed.
*/
res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, "
"%s as caught_up, conflicting as invalid "
"FROM pg_catalog.pg_replication_slots "
"WHERE slot_type = 'logical' AND "
"database = current_database() AND "
"temporary IS FALSE;",
live_check ? "FALSE" :
"(CASE WHEN conflicting THEN FALSE "
"ELSE (SELECT pg_catalog.binary_upgrade_logical_slot_has_caught_up(slot_name)) "
"END)");
num_slots = PQntuples(res);
if (num_slots)
{
int i_slotname;
int i_plugin;
int i_twophase;
int i_caught_up;
int i_invalid;
slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
i_slotname = PQfnumber(res, "slot_name");
i_plugin = PQfnumber(res, "plugin");
i_twophase = PQfnumber(res, "two_phase");
i_caught_up = PQfnumber(res, "caught_up");
i_invalid = PQfnumber(res, "invalid");
for (int slotnum = 0; slotnum < num_slots; slotnum++)
{
LogicalSlotInfo *curr = &slotinfos[slotnum];
curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0);
curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0);
}
}
PQclear(res);
PQfinish(conn);
dbinfo->slot_arr.slots = slotinfos;
dbinfo->slot_arr.nslots = num_slots;
}
/*
* count_old_cluster_logical_slots()
*
* Returns the number of logical replication slots for all databases.
*
* Note: this function always returns 0 if the old_cluster is PG16 and prior
* because we gather slot information only for cluster versions greater than or
* equal to PG17. See get_old_cluster_logical_slot_infos().
*/
int
count_old_cluster_logical_slots(void)
{
int slot_count = 0;
for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
return slot_count;
}
static void
free_db_and_rel_infos(DbInfoArr *db_arr)
@ -642,8 +775,11 @@ print_db_infos(DbInfoArr *db_arr)
for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
{
pg_log(PG_VERBOSE, "Database: \"%s\"", db_arr->dbs[dbnum].db_name);
print_rel_infos(&db_arr->dbs[dbnum].rel_arr);
DbInfo *pDbInfo = &db_arr->dbs[dbnum];
pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
print_rel_infos(&pDbInfo->rel_arr);
print_slot_infos(&pDbInfo->slot_arr);
}
}
@ -660,3 +796,23 @@ print_rel_infos(RelInfoArr *rel_arr)
rel_arr->rels[relnum].reloid,
rel_arr->rels[relnum].tablespace);
}
static void
print_slot_infos(LogicalSlotInfoArr *slot_arr)
{
/* Quick return if there are no logical slots. */
if (slot_arr->nslots == 0)
return;
pg_log(PG_VERBOSE, "Logical replication slots within the database:");
for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
{
LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
pg_log(PG_VERBOSE, "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
slot_info->slotname,
slot_info->plugin,
slot_info->two_phase ? "true" : "false");
}
}

View File

@ -42,6 +42,7 @@ tests += {
'tests': [
't/001_basic.pl',
't/002_pg_upgrade.pl',
't/003_upgrade_logical_replication_slots.pl',
],
'test_kwargs': {'priority': 40}, # pg_upgrade tests are slow
},

View File

@ -59,6 +59,7 @@ static void copy_xact_xlog_xid(void);
static void set_frozenxids(bool minmxid_only);
static void make_outputdirs(char *pgdata);
static void setup(char *argv0, bool *live_check);
static void create_logical_replication_slots(void);
ClusterInfo old_cluster,
new_cluster;
@ -188,6 +189,21 @@ main(int argc, char **argv)
new_cluster.pgdata);
check_ok();
/*
* Migrate the logical slots to the new cluster. Note that we need to do
* this after resetting WAL because otherwise the required WAL would be
* removed and slots would become unusable. There is a possibility that
* background processes might generate some WAL before we could create the
* slots in the new cluster but we can ignore that WAL as that won't be
* required downstream.
*/
if (count_old_cluster_logical_slots())
{
start_postmaster(&new_cluster, true);
create_logical_replication_slots();
stop_postmaster(false);
}
if (user_opts.do_sync)
{
prep_status("Sync data directory to disk");
@ -593,7 +609,7 @@ create_new_objects(void)
set_frozenxids(true);
/* update new_cluster info now that we have objects in the databases */
get_db_and_rel_infos(&new_cluster);
get_db_rel_and_slot_infos(&new_cluster, false);
}
/*
@ -862,3 +878,59 @@ set_frozenxids(bool minmxid_only)
check_ok();
}
/*
* create_logical_replication_slots()
*
* Similar to create_new_objects() but only restores logical replication slots.
*/
static void
create_logical_replication_slots(void)
{
prep_status_progress("Restoring logical replication slots in the new cluster");
for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
{
DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
LogicalSlotInfoArr *slot_arr = &old_db->slot_arr;
PGconn *conn;
PQExpBuffer query;
/* Skip this database if there are no slots */
if (slot_arr->nslots == 0)
continue;
conn = connectToServer(&new_cluster, old_db->db_name);
query = createPQExpBuffer();
pg_log(PG_STATUS, "%s", old_db->db_name);
for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
{
LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
/* Construct a query for creating the logical replication slot */
appendPQExpBuffer(query,
"SELECT * FROM "
"pg_catalog.pg_create_logical_replication_slot(");
appendStringLiteralConn(query, slot_info->slotname, conn);
appendPQExpBuffer(query, ", ");
appendStringLiteralConn(query, slot_info->plugin, conn);
appendPQExpBuffer(query, ", false, %s);",
slot_info->two_phase ? "true" : "false");
PQclear(executeQueryOrDie(conn, "%s", query->data));
resetPQExpBuffer(query);
}
PQfinish(conn);
destroyPQExpBuffer(query);
}
end_progress_output();
check_ok();
return;
}

View File

@ -150,6 +150,24 @@ typedef struct
int nrels;
} RelInfoArr;
/*
* Structure to store logical replication slot information.
*/
typedef struct
{
char *slotname; /* slot name */
char *plugin; /* plugin */
bool two_phase; /* can the slot decode 2PC? */
bool caught_up; /* has the slot caught up to latest changes? */
bool invalid; /* if true, the slot is unusable */
} LogicalSlotInfo;
typedef struct
{
int nslots; /* number of logical slot infos */
LogicalSlotInfo *slots; /* array of logical slot infos */
} LogicalSlotInfoArr;
/*
* The following structure represents a relation mapping.
*/
@ -176,6 +194,7 @@ typedef struct
char db_tablespace[MAXPGPATH]; /* database default tablespace
* path */
RelInfoArr rel_arr; /* array of all user relinfos */
LogicalSlotInfoArr slot_arr; /* array of all LogicalSlotInfo */
} DbInfo;
/*
@ -400,7 +419,8 @@ void check_loadable_libraries(void);
FileNameMap *gen_db_file_maps(DbInfo *old_db,
DbInfo *new_db, int *nmaps, const char *old_pgdata,
const char *new_pgdata);
void get_db_and_rel_infos(ClusterInfo *cluster);
void get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
int count_old_cluster_logical_slots(void);
/* option.c */

View File

@ -201,6 +201,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
PGconn *conn;
bool pg_ctl_return = false;
char socket_string[MAXPGPATH + 200];
PQExpBufferData pgoptions;
static bool exit_hook_registered = false;
@ -227,23 +228,41 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
cluster->sockdir);
#endif
initPQExpBuffer(&pgoptions);
/*
* Use -b to disable autovacuum.
* Construct a parameter string which is passed to the server process.
*
* Turn off durability requirements to improve object creation speed, and
* we only modify the new cluster, so only use it there. If there is a
* crash, the new cluster has to be recreated anyway. fsync=off is a big
* win on ext4.
*/
if (cluster == &new_cluster)
appendPQExpBufferStr(&pgoptions, " -c synchronous_commit=off -c fsync=off -c full_page_writes=off");
/*
* Set max_slot_wal_keep_size to -1 to prevent WAL removal by the
* checkpointer process. If the WAL required by logical replication slots
* is removed, the slots become unusable. This setting prevents the
* invalidation of slots during the upgrade. We set this option when the
* cluster is PG17 or later because logical replication slots can only be
* migrated since then. (max_slot_wal_keep_size itself was added in PG13.)
*/
if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
/* Use -b to disable autovacuum. */
snprintf(cmd, sizeof(cmd),
"\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
cluster->bindir,
log_opts.logdir,
SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
(cluster == &new_cluster) ?
" -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
pgoptions.data,
cluster->pgopts ? cluster->pgopts : "", socket_string);
termPQExpBuffer(&pgoptions);
/*
* Don't throw an error right away, let connecting throw the error because
* it might supply a reason for the failure.

View File

@ -0,0 +1,192 @@
# Copyright (c) 2023, PostgreSQL Global Development Group
# Tests for upgrading logical replication slots
use strict;
use warnings;
use File::Find qw(find);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
# Can be changed to test the other modes
my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
# Initialize old cluster
my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
$old_publisher->init(allows_streaming => 'logical');
# Initialize new cluster
my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
$new_publisher->init(allows_streaming => 'logical');
# Set up a pg_upgrade command. This will be used throughout the tests.
my @pg_upgrade_cmd = (
'pg_upgrade', '--no-sync',
'-d', $old_publisher->data_dir,
'-D', $new_publisher->data_dir,
'-b', $old_publisher->config_data('--bindir'),
'-B', $new_publisher->config_data('--bindir'),
'-s', $new_publisher->host,
'-p', $old_publisher->port,
'-P', $new_publisher->port,
$mode);
# ------------------------------
# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values
# Preparations for the subsequent test:
# 1. Create two slots on the old cluster
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
]);
$old_publisher->stop();
# 2. Set 'max_replication_slots' to be less than the number of slots (2)
# present on the old cluster.
$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
# pg_upgrade will fail because the new cluster has insufficient
# max_replication_slots
command_checks_all(
[@pg_upgrade_cmd],
1,
[
qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/
],
[qr//],
'run of pg_upgrade where the new cluster has insufficient max_replication_slots'
);
ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
"pg_upgrade_output.d/ not removed after pg_upgrade failure");
# Set 'max_replication_slots' to match the number of slots (2) present on the
# old cluster. Both slots will be used for subsequent tests.
$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 2");
# ------------------------------
# TEST: Confirm pg_upgrade fails when the slot still has unconsumed WAL records
# Preparations for the subsequent test:
# 1. Generate extra WAL records. At this point neither test_slot1 nor
# test_slot2 has consumed them.
#
# 2. Advance the slot test_slot2 up to the current WAL location, but test_slot1
# still has unconsumed WAL records.
#
# 3. Emit a non-transactional message. This will cause test_slot2 to detect the
# unconsumed WAL record.
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
]);
$old_publisher->stop;
# pg_upgrade will fail because there are slots still having unconsumed WAL
# records
command_checks_all(
[@pg_upgrade_cmd],
1,
[
qr/Your installation contains logical replication slots that can't be upgraded./
],
[qr//],
'run of pg_upgrade of old cluster with slots having unconsumed WAL records'
);
# Verify the reason why the logical replication slot cannot be upgraded
my $slots_filename;
# Find a txt file that contains a list of logical replication slots that cannot
# be upgraded. We cannot predict the file's path because the output directory
# contains a milliseconds timestamp. File::Find::find must be used.
find(
sub {
if ($File::Find::name =~ m/invalid_logical_replication_slots\.txt/)
{
$slots_filename = $File::Find::name;
}
},
$new_publisher->data_dir . "/pg_upgrade_output.d");
# Check the file content. Both slots should be reporting that they have
# unconsumed WAL records.
like(
slurp_file($slots_filename),
qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
'the previous test failed due to unconsumed WALs');
like(
slurp_file($slots_filename),
qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
'the previous test failed due to unconsumed WALs');
# ------------------------------
# TEST: Successful upgrade
# Preparations for the subsequent test:
# 1. Setup logical replication (first, cleanup slots from the previous tests)
my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT * FROM pg_drop_replication_slot('test_slot1');
SELECT * FROM pg_drop_replication_slot('test_slot2');
CREATE PUBLICATION regress_pub FOR ALL TABLES;
]);
# Initialize subscriber cluster
my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
$subscriber->init();
$subscriber->start;
$subscriber->safe_psql(
'postgres', qq[
CREATE TABLE tbl (a int);
CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
]);
$subscriber->wait_for_subscription_sync($old_publisher, 'regress_sub');
# 2. Temporarily disable the subscription
$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
$old_publisher->stop;
# pg_upgrade should be successful
command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
# Check that the slot 'regress_sub' has migrated to the new cluster
$new_publisher->start;
my $result = $new_publisher->safe_psql('postgres',
"SELECT slot_name, two_phase FROM pg_replication_slots");
is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
# Update the connection
my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
$subscriber->safe_psql(
'postgres', qq[
ALTER SUBSCRIPTION regress_sub CONNECTION '$new_connstr';
ALTER SUBSCRIPTION regress_sub ENABLE;
]);
# Check whether changes on the new publisher get replicated to the subscriber
$new_publisher->safe_psql('postgres',
"INSERT INTO tbl VALUES (generate_series(11, 20))");
$new_publisher->wait_for_catchup('regress_sub');
$result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
is($result, qq(20), 'check changes are replicated to the subscriber');
# Clean up
$subscriber->stop();
$new_publisher->stop();
done_testing();


@ -70,10 +70,16 @@ initStringInfo(StringInfo str)
*
* Reset the StringInfo: the data buffer remains valid, but its
* previous content, if any, is cleared.
*
* Read-only StringInfos as initialized by initReadOnlyStringInfo cannot be
* reset.
*/
void
resetStringInfo(StringInfo str)
{
/* don't allow resets of read-only StringInfos */
Assert(str->maxlen != 0);
str->data[0] = '\0';
str->len = 0;
str->cursor = 0;
@ -284,6 +290,9 @@ enlargeStringInfo(StringInfo str, int needed)
{
int newlen;
/* validate this is not a read-only StringInfo */
Assert(str->maxlen != 0);
/*
* Guard against out-of-range "needed" values. Without this, we can get
* an overflow or infinite loop in the following.


@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202310181
+#define CATALOG_VERSION_NO 202310261
#endif


@ -11379,6 +11379,11 @@
proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
{ oid => '8046', descr => 'for use by pg_upgrade',
proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
provolatile => 'v', proparallel => 'u', prorettype => 'bool',
proargtypes => 'name',
prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
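To make the new catalog entry concrete, here is a minimal, hypothetical libpq sketch (report_slot_catchup is an invented name, and this is not pg_upgrade's actual query) that asks each persistent logical slot whether it has caught up. It assumes a connection to a cluster started in binary-upgrade mode, since the binary_upgrade_* support functions refuse to run otherwise.

#include <stdio.h>
#include "libpq-fe.h"

/* Hypothetical caller: print, for each persistent logical slot, whether it
 * has consumed all WAL according to the new support function. */
static void
report_slot_catchup(PGconn *conn)
{
    PGresult   *res = PQexec(conn,
                             "SELECT slot_name, "
                             "binary_upgrade_logical_slot_has_caught_up(slot_name) "
                             "FROM pg_catalog.pg_replication_slots "
                             "WHERE slot_type = 'logical' AND NOT temporary");

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        for (int i = 0; i < PQntuples(res); i++)
            printf("%s caught up: %s\n",
                   PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
    }
    PQclear(res);
}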
# conversion functions
{ oid => '4302',


@ -20,17 +20,27 @@
/*-------------------------
* StringInfoData holds information about an extensible string.
- * data is the current buffer for the string (allocated with palloc).
- * len is the current string length. There is guaranteed to be
- * a terminating '\0' at data[len], although this is not very
- * useful when the string holds binary data rather than text.
+ * data is the current buffer for the string.
+ * len is the current string length. Except in the case of read-only
+ * strings described below, there is guaranteed to be a
+ * terminating '\0' at data[len].
  * maxlen is the allocated size in bytes of 'data', i.e. the maximum
  * string size (including the terminating '\0' char) that we can
  * currently store in 'data' without having to reallocate
- * more space. We must always have maxlen > len.
- * cursor is initialized to zero by makeStringInfo or initStringInfo,
- * but is not otherwise touched by the stringinfo.c routines.
- * Some routines use it to scan through a StringInfo.
+ * more space. We must always have maxlen > len, except
+ * in the read-only case described below.
+ * cursor is initialized to zero by makeStringInfo, initStringInfo,
+ * initReadOnlyStringInfo and initStringInfoFromString but is not
+ * otherwise touched by the stringinfo.c routines. Some routines
+ * use it to scan through a StringInfo.
+ *
+ * As a special case, a StringInfoData can be initialized with a read-only
+ * string buffer. In this case "data" does not necessarily point at a
+ * palloc'd chunk, and management of the buffer storage is the caller's
+ * responsibility. maxlen is set to zero to indicate that this is the case.
+ * Read-only StringInfoDatas cannot be appended to or reset.
+ * Also, it is the caller's option whether a read-only string buffer has a
+ * terminating '\0' or not. This depends on the intended usage.
*-------------------------
*/
typedef struct StringInfoData
@ -45,7 +55,7 @@ typedef StringInfoData *StringInfo;
/*------------------------
- * There are two ways to create a StringInfo object initially:
+ * There are four ways to create a StringInfo object initially:
*
* StringInfo stringptr = makeStringInfo();
* Both the StringInfoData and the data buffer are palloc'd.
@ -56,8 +66,31 @@ typedef StringInfoData *StringInfo;
* This is the easiest approach for a StringInfo object that will
* only live as long as the current routine.
*
* StringInfoData string;
* initReadOnlyStringInfo(&string, existingbuf, len);
* The StringInfoData's data field is set to point directly to the
* existing buffer and the StringInfoData's len is set to the given len.
* The given buffer can point to memory that's not managed by palloc or
* may point partway through a palloc'd chunk. The maxlen field is set
* to 0. A read-only StringInfo cannot be appended to using any of the
* appendStringInfo functions or reset with resetStringInfo(). The given
* buffer can optionally omit the trailing NUL.
*
* StringInfoData string;
* initStringInfoFromString(&string, palloced_buf, len);
* The StringInfoData's data field is set to point directly to the given
* buffer and the StringInfoData's len is set to the given len. This
* method of initialization is useful when the buffer already exists.
* StringInfos initialized this way can be appended to using the
* appendStringInfo functions and reset with resetStringInfo(). The
* given buffer must be NUL-terminated. The palloc'd buffer is assumed
* to be len + 1 in size.
*
* To destroy a StringInfo, pfree() the data buffer, and then pfree() the
* StringInfoData if it was palloc'd. There's no special support for this.
* However, if the StringInfo was initialized using initReadOnlyStringInfo()
* then the caller will need to consider if it is safe to pfree the data
* buffer.
*
* NOTE: some routines build up a string using StringInfo, and then
* release the StringInfoData but return the data string itself to their
@ -79,6 +112,48 @@ extern StringInfo makeStringInfo(void);
*/
extern void initStringInfo(StringInfo str);
/*------------------------
* initReadOnlyStringInfo
* Initialize a StringInfoData struct from an existing string without copying
* the string. The caller is responsible for ensuring the given string
* remains valid as long as the StringInfoData does. Calls to this are used
* in performance-critical locations where allocating a new buffer and copying
* would be too costly. Read-only StringInfoDatas may not be appended to
* using any of the appendStringInfo functions or reset with
* resetStringInfo().
*
* 'data' does not need to point directly to a palloc'd chunk of memory and may
* omit the NUL termination character at data[len].
*/
static inline void
initReadOnlyStringInfo(StringInfo str, char *data, int len)
{
str->data = data;
str->len = len;
str->maxlen = 0; /* read-only */
str->cursor = 0;
}
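As a quick illustration of the comment above, a minimal sketch (count_zero_bytes and its arguments are invented for this example) that scans a slice of an existing buffer through a read-only StringInfo without copying it:

#include "postgres.h"
#include "lib/stringinfo.h"

/* Hypothetical helper: scan bytes [off, off + n) of an existing buffer.
 * The slice is not copied, need not be palloc'd, and need not be NUL
 * terminated at data[len]. */
static int
count_zero_bytes(char *buf, int off, int n)
{
    StringInfoData ro;
    int         zeroes = 0;

    initReadOnlyStringInfo(&ro, buf + off, n);

    /* Readers may still consume the data via the cursor... */
    while (ro.cursor < ro.len)
    {
        if (ro.data[ro.cursor++] == '\0')
            zeroes++;
    }

    /*
     * ...but calling any appendStringInfo function or resetStringInfo() on
     * 'ro' would now fail the new maxlen != 0 Asserts.
     */
    return zeroes;
}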
/*------------------------
* initStringInfoFromString
* Initialize a StringInfoData struct from an existing string without copying
* the string. 'data' must be a valid palloc'd chunk of memory that can have
* repalloc() called should more space be required during a call to any of the
* appendStringInfo functions.
*
* 'data' must be NUL terminated at 'len' bytes.
*/
static inline void
initStringInfoFromString(StringInfo str, char *data, int len)
{
Assert(data[len] == '\0');
str->data = data;
str->len = len;
str->maxlen = len + 1;
str->cursor = 0;
}
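And a matching sketch (again with invented names) that adopts an existing palloc'd, NUL-terminated string and keeps appending to it; unlike the read-only case, all the usual StringInfo operations remain available:

#include "postgres.h"
#include "lib/stringinfo.h"

/* Hypothetical helper: take ownership of a palloc'd string and extend it
 * in place, without first copying it into a fresh StringInfo. */
static char *
greeting_with_suffix(void)
{
    char       *s = pstrdup("Hello");   /* palloc'd, strlen(s) + 1 bytes */
    StringInfoData buf;

    initStringInfoFromString(&buf, s, (int) strlen(s));

    /* Appending works normally and may repalloc() buf.data. */
    appendStringInfoString(&buf, ", world");

    return buf.data;            /* one palloc'd string: "Hello, world" */
}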
/*------------------------
* resetStringInfo
* Clears the current content of the StringInfo, if any. The


@ -71,6 +71,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist, List *exprlist,
List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
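A hypothetical caller of the new _ext variant, sketching how the extra_clauses out-parameter might be consumed (inner_is_provably_unique and the surrounding logic are invented, not taken from the planner):

#include "postgres.h"
#include "nodes/pathnodes.h"
#include "optimizer/paths.h"

/* Hypothetical sketch: prove 'innerrel' unique for this join's clauses and
 * learn which baserestrictinfo-derived clauses the proof relied on. */
static bool
inner_is_provably_unique(PlannerInfo *root, RelOptInfo *innerrel,
                         List *restrictlist)
{
    List       *extra_clauses = NIL;

    if (!relation_has_unique_index_ext(root, innerrel, restrictlist,
                                       NIL, NIL, &extra_clauses))
        return false;

    /*
     * extra_clauses now lists any quals (e.g. "a = 2") that were needed to
     * pin down the remaining columns of the matched unique index.
     */
    return true;
}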


@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@ -104,6 +105,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist,
bool force_cache, List **uclauses);
extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c


@ -109,6 +109,9 @@ typedef struct LogicalDecodingContext
TransactionId write_xid;
/* Are we processing the end LSN of a transaction? */
bool end_xact;
/* Do we need to process any change in fast_forward mode? */
bool processing_required;
} LogicalDecodingContext;
@ -145,4 +148,6 @@ extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId
extern void ResetLogicalStreamingState(void);
extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
extern bool LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal);
#endif
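Putting the two additions together, a rough sketch of how a backend function could test a named slot for unconsumed WAL. This is a simplification with invented naming; it omits the null-argument, permission, slot-validity and binary-upgrade checks that the real upgrade-support function performs:

#include "postgres.h"
#include "access/xlog.h"            /* GetFlushRecPtr() */
#include "replication/logical.h"    /* LogicalReplicationSlotHasPendingWal() */
#include "replication/slot.h"       /* ReplicationSlotAcquire()/Release() */

/* Hypothetical sketch: is there decodable WAL, up to the current flush
 * position, that the named logical slot has not yet consumed? */
static bool
slot_has_pending_wal(const char *slot_name)
{
    XLogRecPtr  end_of_wal;
    bool        pending;

    ReplicationSlotAcquire(slot_name, true);    /* nowait */
    end_of_wal = GetFlushRecPtr(NULL);
    pending = LogicalReplicationSlotHasPendingWal(end_of_wal);
    ReplicationSlotRelease();

    return pending;
}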


@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
-- Test that broken ECs are processed correctly during self join removal.
-- Disable merge joins so that we don't get an error about missing commutator.
-- Test both orientations of the join clause, because only one of them breaks
-- the EC.
set enable_mergejoin to off;
explain (costs off)
select * from ec0 m join ec0 n on m.ff = n.ff
join ec1 p on m.ff + n.ff = p.f1;
QUERY PLAN
----------------------------------------
Nested Loop
Join Filter: ((n.ff + n.ff) = p.f1)
-> Seq Scan on ec1 p
-> Materialize
-> Seq Scan on ec0 n
Filter: (ff IS NOT NULL)
(6 rows)
explain (costs off)
select * from ec0 m join ec0 n on m.ff = n.ff
join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
QUERY PLAN
---------------------------------------------------------------
Nested Loop
Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
-> Seq Scan on ec1 p
-> Materialize
-> Seq Scan on ec0 n
Filter: (ff IS NOT NULL)
(6 rows)
reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;


@ -6132,6 +6132,814 @@ select * from
----+----+----+----
(0 rows)
--
-- test that semi- or inner self-joins on a unique column are removed
--
-- enable only nestloop to get more predictable plans
set enable_hashjoin to off;
set enable_mergejoin to off;
create table sj (a int unique, b int, c int unique);
insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
analyze sj;
-- Trivial self-join case.
explain (costs off)
select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
QUERY PLAN
-----------------------------------------------
Seq Scan on sj q
Filter: ((a IS NOT NULL) AND (b = (a - 1)))
(2 rows)
select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
a | b | c
---+---+---
2 | 1 | 1
(1 row)
-- Self-join removal is performed after subquery pull-up and can remove
-- this kind of self-join too. Check this case.
explain (costs off)
select * from sj p
where exists (select * from sj q
where q.a = p.a and q.b < 10);
QUERY PLAN
------------------------------------------
Seq Scan on sj q
Filter: ((a IS NOT NULL) AND (b < 10))
(2 rows)
select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
a | b | c
---+---+---
2 | 1 | 1
(1 row)
-- Don't remove the self-join when the join condition equates two different unique columns.
explain (costs off)
select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
QUERY PLAN
---------------------------------------
Nested Loop
Join Filter: (t1.a = t2.c)
-> Seq Scan on sj t2
-> Materialize
-> Seq Scan on sj t1
Filter: (b IS NOT NULL)
(6 rows)
-- Degenerate case.
explain (costs off)
select * from
(select a as x from sj where false) as q1,
(select a as y from sj where false) as q2
where q1.x = q2.y;
QUERY PLAN
--------------------------
Result
One-Time Filter: false
(2 rows)
-- We can't use a cross-EC-generated self-join qual because of the current logic
-- of the generate_join_implied_equalities routine.
explain (costs off)
select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
QUERY PLAN
------------------------------
Nested Loop
Join Filter: (t1.a = t2.b)
-> Seq Scan on sj t1
Filter: (a = b)
-> Seq Scan on sj t2
Filter: (b = a)
(6 rows)
explain (costs off)
select * from sj t1, sj t2, sj t3
where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
and t1.b = t3.b and t3.b = t3.a;
QUERY PLAN
------------------------------------
Nested Loop
Join Filter: (t1.a = t3.b)
-> Nested Loop
Join Filter: (t1.a = t2.b)
-> Seq Scan on sj t1
Filter: (a = b)
-> Seq Scan on sj t2
Filter: (b = a)
-> Seq Scan on sj t3
Filter: (b = a)
(10 rows)
-- Double self-join removal.
-- Use a condition on "b + 1", not on "b", for the second join, so that
-- the equivalence class is different from the first one, and we can
-- test the non-ec code path.
explain (costs off)
select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on sj t3
Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
(2 rows)
-- subselect that references the removed relation
explain (costs off)
select t1.a, (select a from sj where a = t2.a and a = t1.a)
from sj t1, sj t2
where t1.a = t2.a;
QUERY PLAN
------------------------------------------
Seq Scan on sj t2
Filter: (a IS NOT NULL)
SubPlan 1
-> Result
One-Time Filter: (t2.a = t2.a)
-> Seq Scan on sj
Filter: (a = t2.a)
(7 rows)
-- self-join under outer join
explain (costs off)
select * from sj x join sj y on x.a = y.a
left join int8_tbl z on x.a = z.q1;
QUERY PLAN
------------------------------------
Nested Loop Left Join
Join Filter: (y.a = z.q1)
-> Seq Scan on sj y
Filter: (a IS NOT NULL)
-> Materialize
-> Seq Scan on int8_tbl z
(6 rows)
explain (costs off)
select * from sj x join sj y on x.a = y.a
left join int8_tbl z on y.a = z.q1;
QUERY PLAN
------------------------------------
Nested Loop Left Join
Join Filter: (y.a = z.q1)
-> Seq Scan on sj y
Filter: (a IS NOT NULL)
-> Materialize
-> Seq Scan on int8_tbl z
(6 rows)
explain (costs off)
SELECT * FROM (
SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
) AS q1
LEFT JOIN
(SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
ON q1.ax = q2.a;
QUERY PLAN
---------------------------------------------------------------------------
Nested Loop Left Join
Join Filter: (t2.a = t4.a)
-> Seq Scan on sj t2
Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
-> Seq Scan on sj t4
Filter: (c IS NOT NULL)
(6 rows)
-- Test that placeholders are updated correctly after join removal
explain (costs off)
select * from (values (1)) x
left join (select coalesce(y.q1, 1) from int8_tbl y
right join sj j1 inner join sj j2 on j1.a = j2.a
on true) z
on true;
QUERY PLAN
------------------------------------------
Nested Loop Left Join
-> Result
-> Nested Loop Left Join
-> Seq Scan on sj j2
Filter: (a IS NOT NULL)
-> Materialize
-> Seq Scan on int8_tbl y
(7 rows)
-- Check updating of Lateral links from the top-level query to the relation being removed
explain (COSTS OFF)
SELECT * FROM pg_am am WHERE am.amname IN (
SELECT c1.relname AS relname
FROM pg_class c1
JOIN pg_class c2
ON c1.oid=c2.oid AND c1.oid < 10
);
QUERY PLAN
---------------------------------------------------------------------
Nested Loop Semi Join
Join Filter: (am.amname = c2.relname)
-> Seq Scan on pg_am am
-> Materialize
-> Index Scan using pg_class_oid_index on pg_class c2
Index Cond: ((oid < '10'::oid) AND (oid IS NOT NULL))
(6 rows)
--
-- SJR corner case: uniqueness of an inner is [partially] derived from
-- baserestrictinfo clauses.
-- XXX: Should we really allow SJR for these corner cases?
--
INSERT INTO sj VALUES (3, 1, 3);
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
QUERY PLAN
------------------------------
Nested Loop
Join Filter: (j1.b = j2.b)
-> Seq Scan on sj j1
Filter: (a = 2)
-> Seq Scan on sj j2
Filter: (a = 3)
(6 rows)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
a | b | c | a | b | c
---+---+---+---+---+---
2 | 1 | 1 | 3 | 1 | 3
(1 row)
explain (costs off) -- Remove SJ, define uniqueness by a constant
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
QUERY PLAN
-----------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND (a = 2))
(2 rows)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
a | b | c | a | b | c
---+---+---+---+---+---
2 | 1 | 1 | 2 | 1 | 1
(1 row)
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
; -- Remove SJ, define uniqueness by a constant expression
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
(2 rows)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
; -- Return one row
a | b | c | a | b | c
---+---+---+---+---+---
3 | 1 | 3 | 3 | 1 | 3
(1 row)
explain (costs off) -- Remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
QUERY PLAN
-----------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND (a = 1))
(2 rows)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
a | b | c | a | b | c
---+---+---+---+---+---
(0 rows)
explain (costs off) -- Shuffle a clause. Remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
QUERY PLAN
-----------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND (a = 1))
(2 rows)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
a | b | c | a | b | c
---+---+---+---+---+---
(0 rows)
-- SJE corner case: an 'a.x = a.x' clause, having been replaced with
-- 'a.x IS NOT NULL' after SJ elimination, shouldn't be a mergejoinable clause.
SELECT t4.*
FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
a | b | c
---+---+---
(0 rows)
EXPLAIN (COSTS OFF)
SELECT t4.*
FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
; -- SJs must be removed.
QUERY PLAN
---------------------------------
Nested Loop
Join Filter: (t1.b = t2.b)
-> Seq Scan on sj t2
Filter: (a = 42)
-> Seq Scan on sj t1
Filter: (a IS NOT NULL)
(6 rows)
-- Functional index
CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
explain (costs off) -- Remove SJ
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
QUERY PLAN
-----------------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND ((a * a) = 1))
(2 rows)
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
QUERY PLAN
-------------------------------
Nested Loop
Join Filter: (j1.b = j2.b)
-> Seq Scan on sj j1
Filter: ((a * a) = 1)
-> Seq Scan on sj j2
Filter: ((a * a) = 2)
(6 rows)
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
; -- Restriction contains expressions on both sides. Remove SJ.
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
(2 rows)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
; -- Empty set of rows should be returned
a | b | c | a | b | c
---+---+---+---+---+---
(0 rows)
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (random()/3 + 3)::int
AND (random()/3 + 3)::int = (j2.a*j2.a)
; -- Restriction contains volatile function - disable SJR feature.
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Nested Loop
Join Filter: (j1.b = j2.b)
-> Seq Scan on sj j1
Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
-> Seq Scan on sj j2
Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
(6 rows)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.c/3) = (random()/3 + 3)::int
AND (random()/3 + 3)::int = (j2.a*j2.c/3)
; -- Return one row
a | b | c | a | b | c
---+---+---+---+---+---
3 | 1 | 3 | 3 | 1 | 3
(1 row)
-- Multiple filters
CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
explain (costs off) -- Remove SJ
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
QUERY PLAN
-----------------------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
(2 rows)
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
QUERY PLAN
---------------------------------------
Nested Loop
Join Filter: (j1.b = j2.b)
-> Seq Scan on sj j1
Filter: ((2 = a) AND (c = 3))
-> Seq Scan on sj j2
Filter: ((c = 3) AND (a = 1))
(6 rows)
CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
QUERY PLAN
------------------------------
Nested Loop
Join Filter: (j1.b = j2.b)
-> Seq Scan on sj j1
Filter: (a = 2)
-> Seq Scan on sj j2
(5 rows)
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
QUERY PLAN
------------------------------
Nested Loop
Join Filter: (j1.b = j2.b)
-> Seq Scan on sj j2
Filter: (2 = a)
-> Seq Scan on sj j1
(5 rows)
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
QUERY PLAN
---------------------------------------------------------------
Nested Loop
Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
-> Seq Scan on sj j1
-> Materialize
-> Seq Scan on sj j2
(5 rows)
DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
-- Test that OR predicates are updated correctly after join removal
CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
explain (costs off)
SELECT COUNT(*) FROM tab_with_flag
WHERE
(is_flag IS NULL OR is_flag = 0)
AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate
-> Bitmap Heap Scan on tab_with_flag
Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
Filter: ((is_flag IS NULL) OR (is_flag = 0))
-> Bitmap Index Scan on tab_with_flag_pkey
Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
(6 rows)
DROP TABLE tab_with_flag;
-- HAVING clause
explain (costs off)
select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
QUERY PLAN
---------------------------------
HashAggregate
Group Key: q.b
Filter: (sum(q.a) = 1)
-> Seq Scan on sj q
Filter: (a IS NOT NULL)
(5 rows)
-- update lateral references and range table entry reference
explain (verbose, costs off)
select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
lateral generate_series(1, q.a) gs(i);
QUERY PLAN
------------------------------------------------------
Nested Loop
Output: 1
-> Seq Scan on public.sj y
Output: y.a, y.b, y.c
Filter: (y.a IS NOT NULL)
-> Function Scan on pg_catalog.generate_series gs
Output: gs.i
Function Call: generate_series(1, y.a)
(8 rows)
explain (verbose, costs off)
select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
lateral generate_series(1, q.a) gs(i);
QUERY PLAN
------------------------------------------------------
Nested Loop
Output: 1
-> Seq Scan on public.sj y
Output: y.a, y.b, y.c
Filter: (y.a IS NOT NULL)
-> Function Scan on pg_catalog.generate_series gs
Output: gs.i
Function Call: generate_series(1, y.a)
(8 rows)
-- Test that a non-EC-derived join clause is processed correctly. Use an
-- outer join so that we can't form an EC.
explain (costs off) select * from sj p join sj q on p.a = q.a
left join sj r on p.a + q.a = r.a;
QUERY PLAN
------------------------------------
Nested Loop Left Join
Join Filter: ((q.a + q.a) = r.a)
-> Seq Scan on sj q
Filter: (a IS NOT NULL)
-> Materialize
-> Seq Scan on sj r
(6 rows)
-- FIXME this constant false filter doesn't look good. Should we merge
-- equivalence classes?
explain (costs off)
select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
QUERY PLAN
-----------------------------------------------------
Seq Scan on sj q
Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
(2 rows)
-- Check that attr_needed is updated correctly after self-join removal. In this
-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
-- seqscan. Also disable reordering of joins because this test depends on a
-- particular join tree.
create table sk (a int, b int);
create index on sk(a);
set join_collapse_limit to 1;
set enable_seqscan to off;
explain (costs off) select 1 from
(sk k1 join sk k2 on k1.a = k2.a)
join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
QUERY PLAN
-----------------------------------------------------
Nested Loop
Join Filter: (k1.b = j2.b)
-> Nested Loop
-> Index Scan using sk_a_idx on sk k1
-> Index Only Scan using sk_a_idx on sk k2
Index Cond: (a = k1.a)
-> Materialize
-> Index Scan using sj_a_key on sj j2
Index Cond: (a IS NOT NULL)
(9 rows)
explain (costs off) select 1 from
(sk k1 join sk k2 on k1.a = k2.a)
join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
QUERY PLAN
-----------------------------------------------------
Nested Loop
Join Filter: (k1.b = j2.b)
-> Nested Loop
-> Index Scan using sk_a_idx on sk k1
-> Index Only Scan using sk_a_idx on sk k2
Index Cond: (a = k1.a)
-> Materialize
-> Index Scan using sj_a_key on sj j2
Index Cond: (a IS NOT NULL)
(9 rows)
reset join_collapse_limit;
reset enable_seqscan;
-- Check that clauses from the join filter list are not lost on self-join removal
CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
explain (verbose, costs off)
SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
QUERY PLAN
----------------------------------------------------------
Seq Scan on public.emp1 e2
Output: e2.id, e2.code, e2.id, e2.code
Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
(3 rows)
-- Shuffle self-joined relations. Only with iterative removal attempts will
-- the EXPLAIN output of these queries be identical.
CREATE UNIQUE INDEX ON emp1((id*id));
explain (costs off)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
QUERY PLAN
----------------------------------------------------------------
Aggregate
-> Seq Scan on emp1 c3
Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
(3 rows)
explain (costs off)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
QUERY PLAN
----------------------------------------------------------------
Aggregate
-> Seq Scan on emp1 c3
Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
(3 rows)
explain (costs off)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
QUERY PLAN
----------------------------------------------------------------
Aggregate
-> Seq Scan on emp1 c3
Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
(3 rows)
-- We can remove the join even if we find the join can't duplicate rows and
-- the base quals of each side are different. In the following case we end up
-- moving quals over to s1 so that it can't match any rows.
create table sl(a int, b int, c int);
create unique index on sl(a, b);
vacuum analyze sl;
-- Both sides are unique, but base quals are different
explain (costs off)
select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
QUERY PLAN
------------------------------
Nested Loop
Join Filter: (t1.a = t2.a)
-> Seq Scan on sl t1
Filter: (b = 1)
-> Seq Scan on sl t2
Filter: (b = 2)
(6 rows)
-- Check NullTest in baserestrictinfo list
explain (costs off)
select * from sl t1, sl t2
where t1.a = t2.a and t1.b = 1 and t2.b = 2
and t1.c IS NOT NULL and t2.c IS NOT NULL
and t2.b IS NOT NULL and t1.b IS NOT NULL
and t1.a IS NOT NULL and t2.a IS NOT NULL;
QUERY PLAN
---------------------------------------------------------------------------------------
Nested Loop
Join Filter: (t1.a = t2.a)
-> Seq Scan on sl t1
Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
-> Seq Scan on sl t2
Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
(6 rows)
explain (verbose, costs off)
select * from sl t1, sl t2
where t1.b = t2.b and t2.a = 3 and t1.a = 3
and t1.c IS NOT NULL and t2.c IS NOT NULL
and t2.b IS NOT NULL and t1.b IS NOT NULL
and t1.a IS NOT NULL and t2.a IS NOT NULL;
QUERY PLAN
---------------------------------------------------------------------------------------------
Seq Scan on public.sl t2
Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
(3 rows)
-- Join qual isn't mergejoinable, but inner is unique.
explain (COSTS OFF)
SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
QUERY PLAN
-------------------------------
Nested Loop
Join Filter: (n1.a <> n2.a)
-> Seq Scan on sj n2
Filter: (a = 1)
-> Seq Scan on sj n1
(5 rows)
explain (COSTS OFF)
SELECT * FROM
(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
WHERE q0.a = 1;
QUERY PLAN
-------------------------------
Nested Loop
Join Filter: (n1.a <> n2.a)
-> Nested Loop
-> Seq Scan on sl
-> Seq Scan on sj n2
Filter: (a = 1)
-> Seq Scan on sj n1
(7 rows)
--
---- Only one side is unique
--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
--
---- Several unique indexes match, and we select a different one
---- for each side, so the join is not removed
--create table sm(a int unique, b int unique, c int unique);
--explain (costs off)
--select * from sm m, sm n where m.a = n.b and m.c = n.c;
--explain (costs off)
--select * from sm m, sm n where m.a = n.c and m.b = n.b;
--explain (costs off)
--select * from sm m, sm n where m.c = n.b and m.a = n.a;
-- Check that the optimization is disabled if it would violate special join conditions.
-- Two identical joined relations satisfy the self-join removal conditions but
-- stay in different special join infos.
CREATE TABLE sj_t1 (id serial, a int);
CREATE TABLE sj_t2 (id serial, a int);
CREATE TABLE sj_t3 (id serial, a int);
CREATE TABLE sj_t4 (id serial, a int);
CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
EXPLAIN (COSTS OFF)
SELECT * FROM sj_t1
JOIN (
SELECT sj_t2.id AS id FROM sj_t2
WHERE EXISTS
(
SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
)
) t2t3t4
ON sj_t1.id = t2t3t4.id
JOIN (
SELECT sj_t2.id AS id FROM sj_t2
WHERE EXISTS
(
SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
)
) _t2t3t4
ON sj_t1.id = _t2t3t4.id;
QUERY PLAN
-------------------------------------------------------------------------------------
Nested Loop
Join Filter: (sj_t3.id = sj_t1.id)
-> Nested Loop
Join Filter: (sj_t2.id = sj_t3.id)
-> Nested Loop Semi Join
-> Nested Loop
-> HashAggregate
Group Key: sj_t3.id
-> Nested Loop
-> Seq Scan on sj_t4
-> Materialize
-> Bitmap Heap Scan on sj_t3
Recheck Cond: (a = 1)
-> Bitmap Index Scan on sj_t3_a_id_idx
Index Cond: (a = 1)
-> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
Index Cond: (id = sj_t3.id)
-> Nested Loop
-> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
Index Cond: ((a = 1) AND (id = sj_t3.id))
-> Seq Scan on sj_t4 sj_t4_1
-> Index Only Scan using sj_t2_id_idx on sj_t2
Index Cond: (id = sj_t2_1.id)
-> Seq Scan on sj_t1
(24 rows)
--
-- Test RowMarks-related code
--
-- TODO: Why does this select return two copies of the ctid field? Should we fix it?
EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
QUERY PLAN
---------------------------------
LockRows
-> Seq Scan on sj a2
Filter: (a IS NOT NULL)
(3 rows)
EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
QUERY PLAN
---------------------------------
Update on sj sq
-> Seq Scan on sj sz
Filter: (a IS NOT NULL)
(3 rows)
CREATE RULE sj_del_rule AS ON DELETE TO sj
DO INSTEAD
UPDATE sj SET a = 1 WHERE a = old.a;
EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
QUERY PLAN
---------------------------------
Update on sj
-> Seq Scan on sj
Filter: (a IS NOT NULL)
(3 rows)
DROP RULE sj_del_rule ON sj CASCADE;
reset enable_hashjoin;
reset enable_mergejoin;
--
-- Test that hints given on incorrect column references are useful
--


@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events


@ -2499,16 +2499,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
-                            QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
-   ->  Nested Loop
-         ->  Index Scan using base_tbl_pkey on base_tbl base_tbl_1
-               Index Cond: (id = 1)
-         ->  Index Scan using base_tbl_pkey on base_tbl
-               Index Cond: (id = 1)
-               Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+                     QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+   ->  Index Scan using base_tbl_pkey on base_tbl
+         Index Cond: (id = 1)
+         Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1


@ -845,7 +845,7 @@ initialize_environment(void)
{
char s[16];
-sprintf(s, "%d", port);
+snprintf(s, sizeof(s), "%d", port);
setenv("PGPORT", s, 1);
}
}
@ -867,7 +867,7 @@ initialize_environment(void)
{
char s[16];
-sprintf(s, "%d", port);
+snprintf(s, sizeof(s), "%d", port);
setenv("PGPORT", s, 1);
}
if (user != NULL)


@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
-- Test that broken ECs are processed correctly during self join removal.
-- Disable merge joins so that we don't get an error about missing commutator.
-- Test both orientations of the join clause, because only one of them breaks
-- the EC.
set enable_mergejoin to off;
explain (costs off)
select * from ec0 m join ec0 n on m.ff = n.ff
join ec1 p on m.ff + n.ff = p.f1;
explain (costs off)
select * from ec0 m join ec0 n on m.ff = n.ff
join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;


@ -2309,6 +2309,368 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
-- test that semi- or inner self-joins on a unique column are removed
--
-- enable only nestloop to get more predictable plans
set enable_hashjoin to off;
set enable_mergejoin to off;
create table sj (a int unique, b int, c int unique);
insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
analyze sj;
-- Trivial self-join case.
explain (costs off)
select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
-- Self-join removal is performed after subquery pull-up and can remove
-- this kind of self-join too. Check this case.
explain (costs off)
select * from sj p
where exists (select * from sj q
where q.a = p.a and q.b < 10);
select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
-- Don't remove the self-join when the join condition equates two different unique columns.
explain (costs off)
select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
-- Degenerate case.
explain (costs off)
select * from
(select a as x from sj where false) as q1,
(select a as y from sj where false) as q2
where q1.x = q2.y;
-- We can't use a cross-EC-generated self-join qual because of the current logic
-- of the generate_join_implied_equalities routine.
explain (costs off)
select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
explain (costs off)
select * from sj t1, sj t2, sj t3
where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
and t1.b = t3.b and t3.b = t3.a;
-- Double self-join removal.
-- Use a condition on "b + 1", not on "b", for the second join, so that
-- the equivalence class is different from the first one, and we can
-- test the non-ec code path.
explain (costs off)
select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
-- subselect that references the removed relation
explain (costs off)
select t1.a, (select a from sj where a = t2.a and a = t1.a)
from sj t1, sj t2
where t1.a = t2.a;
-- self-join under outer join
explain (costs off)
select * from sj x join sj y on x.a = y.a
left join int8_tbl z on x.a = z.q1;
explain (costs off)
select * from sj x join sj y on x.a = y.a
left join int8_tbl z on y.a = z.q1;
explain (costs off)
SELECT * FROM (
SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
) AS q1
LEFT JOIN
(SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
ON q1.ax = q2.a;
-- Test that placeholders are updated correctly after join removal
explain (costs off)
select * from (values (1)) x
left join (select coalesce(y.q1, 1) from int8_tbl y
right join sj j1 inner join sj j2 on j1.a = j2.a
on true) z
on true;
-- Check updating of Lateral links from the top-level query to the relation being removed
explain (COSTS OFF)
SELECT * FROM pg_am am WHERE am.amname IN (
SELECT c1.relname AS relname
FROM pg_class c1
JOIN pg_class c2
ON c1.oid=c2.oid AND c1.oid < 10
);
--
-- SJR corner case: uniqueness of an inner is [partially] derived from
-- baserestrictinfo clauses.
-- XXX: Should we really allow SJR for these corner cases?
--
INSERT INTO sj VALUES (3, 1, 3);
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
explain (costs off) -- Remove SJ, define uniqueness by a constant
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
; -- Remove SJ, define uniqueness by a constant expression
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
; -- Return one row
explain (costs off) -- Remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
explain (costs off) -- Shuffle a clause. Remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
-- SJE corner case: an 'a.x = a.x' clause, having been replaced with
-- 'a.x IS NOT NULL' after SJ elimination, shouldn't be a mergejoinable clause.
SELECT t4.*
FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
EXPLAIN (COSTS OFF)
SELECT t4.*
FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
; -- SJs must be removed.
-- Functional index
CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
explain (costs off) -- Remove SJ
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
; -- Restriction contains expressions on both sides. Remove SJ.
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
; -- Empty set of rows should be returned
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (random()/3 + 3)::int
AND (random()/3 + 3)::int = (j2.a*j2.a)
; -- Restriction contains volatile function - disable SJR feature.
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.c/3) = (random()/3 + 3)::int
AND (random()/3 + 3)::int = (j2.a*j2.c/3)
; -- Return one row
-- Multiple filters
CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
explain (costs off) -- Remove SJ
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
explain (costs off) -- Don't remove SJ
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
-- Test that OR predicates are updated correctly after join removal
CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
explain (costs off)
SELECT COUNT(*) FROM tab_with_flag
WHERE
(is_flag IS NULL OR is_flag = 0)
AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
DROP TABLE tab_with_flag;
-- HAVING clause
explain (costs off)
select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
-- update lateral references and range table entry reference
explain (verbose, costs off)
select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
lateral generate_series(1, q.a) gs(i);
explain (verbose, costs off)
select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
lateral generate_series(1, q.a) gs(i);
-- Test that a non-EC-derived join clause is processed correctly. Use an
-- outer join so that we can't form an EC.
explain (costs off) select * from sj p join sj q on p.a = q.a
left join sj r on p.a + q.a = r.a;
-- FIXME this constant false filter doesn't look good. Should we merge
-- equivalence classes?
explain (costs off)
select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
-- Check that attr_needed is updated correctly after self-join removal. In this
-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
-- seqscan. Also disable reordering of joins because this test depends on a
-- particular join tree.
create table sk (a int, b int);
create index on sk(a);
set join_collapse_limit to 1;
set enable_seqscan to off;
explain (costs off) select 1 from
(sk k1 join sk k2 on k1.a = k2.a)
join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
explain (costs off) select 1 from
(sk k1 join sk k2 on k1.a = k2.a)
join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
reset join_collapse_limit;
reset enable_seqscan;
-- Check that clauses from the join filter list are not lost on self-join removal
CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
explain (verbose, costs off)
SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
-- Shuffle self-joined relations. Only with iterative removal attempts will
-- the EXPLAIN output of these queries be identical.
CREATE UNIQUE INDEX ON emp1((id*id));
explain (costs off)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
explain (costs off)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
explain (costs off)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
-- We can remove the join even if we find the join can't duplicate rows and
-- the base quals of each side are different. In the following case we end up
-- moving quals over to s1 so that it can't match any rows.
create table sl(a int, b int, c int);
create unique index on sl(a, b);
vacuum analyze sl;
-- Both sides are unique, but base quals are different
explain (costs off)
select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
-- Check NullTest in baserestrictinfo list
explain (costs off)
select * from sl t1, sl t2
where t1.a = t2.a and t1.b = 1 and t2.b = 2
and t1.c IS NOT NULL and t2.c IS NOT NULL
and t2.b IS NOT NULL and t1.b IS NOT NULL
and t1.a IS NOT NULL and t2.a IS NOT NULL;
explain (verbose, costs off)
select * from sl t1, sl t2
where t1.b = t2.b and t2.a = 3 and t1.a = 3
and t1.c IS NOT NULL and t2.c IS NOT NULL
and t2.b IS NOT NULL and t1.b IS NOT NULL
and t1.a IS NOT NULL and t2.a IS NOT NULL;
-- Join qual isn't mergejoinable, but inner is unique.
explain (COSTS OFF)
SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
explain (COSTS OFF)
SELECT * FROM
(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
WHERE q0.a = 1;
--
---- Only one side is unique
--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
--
---- Several unique indexes match, and we select a different one
---- for each side, so the join is not removed
--create table sm(a int unique, b int unique, c int unique);
--explain (costs off)
--select * from sm m, sm n where m.a = n.b and m.c = n.c;
--explain (costs off)
--select * from sm m, sm n where m.a = n.c and m.b = n.b;
--explain (costs off)
--select * from sm m, sm n where m.c = n.b and m.a = n.a;
-- Check that the optimization is disabled if it would violate special join conditions.
-- Two identical joined relations satisfy the self-join removal conditions but
-- stay in different special join infos.
CREATE TABLE sj_t1 (id serial, a int);
CREATE TABLE sj_t2 (id serial, a int);
CREATE TABLE sj_t3 (id serial, a int);
CREATE TABLE sj_t4 (id serial, a int);
CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
EXPLAIN (COSTS OFF)
SELECT * FROM sj_t1
JOIN (
SELECT sj_t2.id AS id FROM sj_t2
WHERE EXISTS
(
SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
)
) t2t3t4
ON sj_t1.id = t2t3t4.id
JOIN (
SELECT sj_t2.id AS id FROM sj_t2
WHERE EXISTS
(
SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
)
) _t2t3t4
ON sj_t1.id = _t2t3t4.id;
--
-- Test RowMarks-related code
--
-- TODO: Why does this select return two copies of the ctid field? Should we fix it?
EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
CREATE RULE sj_del_rule AS ON DELETE TO sj
DO INSTEAD
UPDATE sj SET a = 1 WHERE a = old.a;
EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
DROP RULE sj_del_rule ON sj CASCADE;
reset enable_hashjoin;
reset enable_mergejoin;
--
-- Test that hints given on incorrect column references are useful
--


@ -367,6 +367,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@ -1503,6 +1504,8 @@ LogicalRepTyp
LogicalRepWorker
LogicalRepWorkerType
LogicalRewriteMappingData
LogicalSlotInfo
LogicalSlotInfoArr
LogicalTape
LogicalTapeSet
LsnReadQueue
@ -2473,6 +2476,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@ -3835,6 +3839,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func