Stabilize postgres_fdw tests on 32-bit machines

cac169d68 adjusted DEFAULT_FDW_TUPLE_COST and that seems to have caused a test to become unstable on 32-bit machines. 4b14e1871 tried to fix this as originally the plan was flipping between a Nested Loop and Hash Join. That commit forced the Nested Loop, but there's still flexibility to push or not push the sort to the remote server and 32-bit seems to prefer to push and on 64-bit, the costs prefer not to. Here let's just turn off enable_sort to significantly encourage the sort to take place on the remote server. Reported-by: Michael Paquier, Richard Guo Discussion: https://postgr.es/m/ZUM2IhA8X2lrG50K@paquier.xyz
Make GetConfigOption/GetConfigOptionResetString return "" for NULL.
2025-07-22 00:01:40 -04:00 · 2023-11-03 12:35:37 +13:00 · 2023-11-02 11:53:36 -04:00 · 2023-11-02 11:47:33 -04:00 · 2023-11-02 23:16:34 +13:00 · 2023-11-01 22:47:06 -07:00
23 changed files with 4981 additions and 45 deletions
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@ -4018,27 +4018,26 @@ EXECUTE st1(101, 101);
 00101 | 00101
 (1 row)

+SET enable_hashjoin TO off;
+SET enable_sort TO off;
 -- subquery using stable function (can't be sent to remote)
 PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
 EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st2(10, 20);
-                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
- Sort
+                                                            QUERY PLAN                                                            
+----------------------------------------------------------------------------------------------------------------------------------
+ Nested Loop Semi Join
   Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-   Sort Key: t1.c1
-   ->  Nested Loop Semi Join
+   Join Filter: (t2.c3 = t1.c3)
+   ->  Foreign Scan on public.ft1 t1
         Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         Join Filter: (t2.c3 = t1.c3)
-         ->  Foreign Scan on public.ft1 t1
-               Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-               Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
-         ->  Materialize
+         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) ORDER BY "C 1" ASC NULLS LAST
+   ->  Materialize
+         Output: t2.c3
+         ->  Foreign Scan on public.ft2 t2
               Output: t2.c3
-               ->  Foreign Scan on public.ft2 t2
-                     Output: t2.c3
-                     Filter: (date(t2.c4) = '01-17-1970'::date)
-                     Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
-(15 rows)
+               Filter: (date(t2.c4) = '01-17-1970'::date)
+               Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
+(12 rows)

 EXECUTE st2(10, 20);
 c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@ -4052,6 +4051,8 @@ EXECUTE st2(101, 121);
 116 |  6 | 00116 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6  | 6          | foo
 (1 row)

+RESET enable_hashjoin;
+RESET enable_sort;
 -- subquery using immutable function (can be sent to remote)
 PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
 EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st3(10, 20);
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@ -1122,11 +1122,15 @@ PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $
 EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st1(1, 2);
 EXECUTE st1(1, 1);
 EXECUTE st1(101, 101);
+SET enable_hashjoin TO off;
+SET enable_sort TO off;
 -- subquery using stable function (can't be sent to remote)
 PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
 EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st2(10, 20);
 EXECUTE st2(10, 20);
 EXECUTE st2(101, 121);
+RESET enable_hashjoin;
+RESET enable_sort;
 -- subquery using immutable function (can be sent to remote)
 PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
 EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st3(10, 20);
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@ -2859,6 +2859,22 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
       </para></entry>
      </row>

+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>unicode_assigned</primary>
+        </indexterm>
+        <function>unicode_assigned</function> ( <type>text</type> )
+        <returnvalue>text</returnvalue>
+       </para>
+       <para>
+        Returns <literal>true</literal> if all characters in the string are
+        assigned Unicode codepoints; <literal>false</literal> otherwise. This
+        function can only be used when the server encoding is
+        <literal>UTF8</literal>.
+       </para></entry>
+      </row>
+
      <row>
       <entry role="func_table_entry"><para role="func_signature">
        <indexterm>
@ -23427,25 +23443,6 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
        This is equivalent to <function>current_user</function>.
       </para></entry>
      </row>
-
-      <row>
-       <entry role="func_table_entry"><para role="func_signature">
-        <indexterm>
-         <primary>version</primary>
-        </indexterm>
-        <function>version</function> ()
-        <returnvalue>text</returnvalue>
-       </para>
-       <para>
-        Returns a string describing the <productname>PostgreSQL</productname>
-        server's version.  You can also get this information from
-        <xref linkend="guc-server-version"/>, or for a machine-readable
-        version use <xref linkend="guc-server-version-num"/>.  Software
-        developers should use <varname>server_version_num</varname> (available
-        since 8.2) or <xref linkend="libpq-PQserverVersion"/> instead of
-        parsing the text version.
-       </para></entry>
-      </row>
     </tbody>
    </tgroup>
   </table>
@ -26332,6 +26329,80 @@ SELECT collation for ('foo' COLLATE "de_DE");

  </sect2>

+  <sect2 id="functions-info-version">
+   <title>Version Information Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-version"/>
+    print version information.
+   </para>
+
+   <table id="functions-version">
+    <title>Version Information Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>version</primary>
+        </indexterm>
+        <function>version</function> ()
+        <returnvalue>text</returnvalue>
+       </para>
+       <para>
+        Returns a string describing the <productname>PostgreSQL</productname>
+        server's version.  You can also get this information from
+        <xref linkend="guc-server-version"/>, or for a machine-readable
+        version use <xref linkend="guc-server-version-num"/>.  Software
+        developers should use <varname>server_version_num</varname> (available
+        since 8.2) or <xref linkend="libpq-PQserverVersion"/> instead of
+        parsing the text version.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>unicode_version</primary>
+        </indexterm>
+        <function>unicode_version</function> ()
+        <returnvalue>text</returnvalue>
+       </para>
+       <para>
+        Returns a string representing the version of Unicode used by
+        <productname>PostgreSQL</productname>.
+       </para></entry>
+      </row>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>icu_unicode_version</primary>
+        </indexterm>
+        <function>icu_unicode_version</function> ()
+        <returnvalue>text</returnvalue>
+       </para>
+       <para>
+        Returns a string representing the version of Unicode used by ICU, if
+        the server was built with ICU support; otherwise returns
+        <literal>NULL</literal> </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
  </sect1>

  <sect1 id="functions-admin">
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@ -23,7 +23,9 @@
 #include "catalog/pg_type.h"
 #include "common/hashfn.h"
 #include "common/int.h"
+#include "common/unicode_category.h"
 #include "common/unicode_norm.h"
+#include "common/unicode_version.h"
 #include "funcapi.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@ -6237,6 +6239,65 @@ unicode_norm_form_from_string(const char *formstr)
 	return form;
 }

+/*
+ * Returns version of Unicode used by Postgres in "major.minor" format (the
+ * same format as the Unicode version reported by ICU). The third component
+ * ("update version") never involves additions to the character repertiore and
+ * is unimportant for most purposes.
+ *
+ * See: https://unicode.org/versions/
+ */
+Datum
+unicode_version(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_TEXT_P(cstring_to_text(PG_UNICODE_VERSION));
+}
+
+/*
+ * Returns version of Unicode used by ICU, if enabled; otherwise NULL.
+ */
+Datum
+icu_unicode_version(PG_FUNCTION_ARGS)
+{
+#ifdef USE_ICU
+	PG_RETURN_TEXT_P(cstring_to_text(U_UNICODE_VERSION));
+#else
+	PG_RETURN_NULL();
+#endif
+}
+
+/*
+ * Check whether the string contains only assigned Unicode code
+ * points. Requires that the database encoding is UTF-8.
+ */
+Datum
+unicode_assigned(PG_FUNCTION_ARGS)
+{
+	text	   *input = PG_GETARG_TEXT_PP(0);
+	unsigned char *p;
+	int			size;
+
+	if (GetDatabaseEncoding() != PG_UTF8)
+		ereport(ERROR,
+				(errmsg("Unicode categorization can only be performed if server encoding is UTF8")));
+
+	/* convert to pg_wchar */
+	size = pg_mbstrlen_with_len(VARDATA_ANY(input), VARSIZE_ANY_EXHDR(input));
+	p = (unsigned char *) VARDATA_ANY(input);
+	for (int i = 0; i < size; i++)
+	{
+		pg_wchar	uchar = utf8_to_unicode(p);
+		int			category = unicode_category(uchar);
+
+		if (category == PG_U_UNASSIGNED)
+			PG_RETURN_BOOL(false);
+
+		p += pg_utf_mblen(p);
+	}
+
+	PG_RETURN_BOOL(true);
+}
+
 Datum
 unicode_normalize_func(PG_FUNCTION_ARGS)
 {
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@ -1473,7 +1473,9 @@ check_GUC_init(struct config_generic *gconf)
 			{
 				struct config_string *conf = (struct config_string *) gconf;

-				if (*conf->variable != NULL && strcmp(*conf->variable, conf->boot_val) != 0)
+				if (*conf->variable != NULL &&
+					(conf->boot_val == NULL ||
+					 strcmp(*conf->variable, conf->boot_val) != 0))
 				{
 					elog(LOG, "GUC (PGC_STRING) %s, boot_val=%s, C-var=%s",
 						 conf->gen.name, conf->boot_val ? conf->boot_val : "<null>", *conf->variable);
@ -4213,8 +4215,7 @@ SetConfigOption(const char *name, const char *value,
 /*
 * Fetch the current value of the option `name', as a string.
 *
- * If the option doesn't exist, return NULL if missing_ok is true (NOTE that
- * this cannot be distinguished from a string variable with a NULL value!),
+ * If the option doesn't exist, return NULL if missing_ok is true,
 * otherwise throw an ereport and don't return.
 *
 * If restrict_privileged is true, we also enforce that only superusers and
@ -4257,7 +4258,8 @@ GetConfigOption(const char *name, bool missing_ok, bool restrict_privileged)
 			return buffer;

 		case PGC_STRING:
-			return *((struct config_string *) record)->variable;
+			return *((struct config_string *) record)->variable ?
+				*((struct config_string *) record)->variable : "";

 		case PGC_ENUM:
 			return config_enum_lookup_by_value((struct config_enum *) record,
@ -4304,7 +4306,8 @@ GetConfigOptionResetString(const char *name)
 			return buffer;

 		case PGC_STRING:
-			return ((struct config_string *) record)->reset_val;
+			return ((struct config_string *) record)->reset_val ?
+				((struct config_string *) record)->reset_val : "";

 		case PGC_ENUM:
 			return config_enum_lookup_by_value((struct config_enum *) record,
@ -5255,7 +5258,14 @@ get_explain_guc_options(int *num)
 				{
 					struct config_string *lconf = (struct config_string *) conf;

-					modified = (strcmp(lconf->boot_val, *(lconf->variable)) != 0);
+					if (lconf->boot_val == NULL &&
+						*lconf->variable == NULL)
+						modified = false;
+					else if (lconf->boot_val == NULL ||
+							 *lconf->variable == NULL)
+						modified = true;
+					else
+						modified = (strcmp(lconf->boot_val, *(lconf->variable)) != 0);
 				}
 				break;

@ -5482,7 +5492,8 @@ write_one_nondefault_variable(FILE *fp, struct config_generic *gconf)
 			{
 				struct config_string *conf = (struct config_string *) gconf;

-				fprintf(fp, "%s", *conf->variable);
+				if (*conf->variable)
+					fprintf(fp, "%s", *conf->variable);
 			}
 			break;

--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@ -248,9 +248,17 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error)
 	 * invalidation of slots during the upgrade. We set this option when
 	 * cluster is PG17 or later because logical replication slots can only be
 	 * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+	 *
+	 * Use max_logical_replication_workers as 0 to prevent a startup of the
+	 * logical replication launcher while upgrading because it may start apply
+	 * workers that could start receiving changes from the publisher before
+	 * the physical files are put in place, causing corruption on the new
+	 * cluster upgrading to.  Like the previous parameter, this is set only
+	 * when a cluster is PG17 or later as logical slots can only be migrated
+	 * since this version.
 	 */
 	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
-		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+		appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1 -c max_logical_replication_workers=0");

 	/* Use -b to disable autovacuum. */
 	snprintf(cmd, sizeof(cmd),
--- a/src/common/Makefile
+++ b/src/common/Makefile
@ -78,6 +78,7 @@ OBJS_COMMON = \
 	scram-common.o \
 	string.o \
 	stringinfo.o \
+	unicode_category.o \
 	unicode_norm.o \
 	username.o \
 	wait_error.o \
--- a/src/common/meson.build
+++ b/src/common/meson.build
@ -30,6 +30,7 @@ common_sources = files(
  'scram-common.c',
  'string.c',
  'stringinfo.c',
+  'unicode_category.c',
  'unicode_norm.c',
  'username.c',
  'wait_error.c',
--- a/src/common/unicode/Makefile
+++ b/src/common/unicode/Makefile
@ -15,11 +15,15 @@ include $(top_builddir)/src/Makefile.global
 override CPPFLAGS := -DFRONTEND -I. $(CPPFLAGS)
 LIBS += $(PTHREAD_LIBS)

+LDFLAGS_INTERNAL += $(ICU_LIBS)
+CPPFLAGS += $(ICU_CFLAGS)
+
 # By default, do nothing.
 all:

-update-unicode: unicode_norm_table.h unicode_nonspacing_table.h unicode_east_asian_fw_table.h unicode_normprops_table.h unicode_norm_hashfunc.h
+update-unicode: unicode_category_table.h unicode_norm_table.h unicode_nonspacing_table.h unicode_east_asian_fw_table.h unicode_normprops_table.h unicode_norm_hashfunc.h unicode_version.h
 	mv $^ $(top_srcdir)/src/include/common/
+	$(MAKE) category-check
 	$(MAKE) normalization-check

 # These files are part of the Unicode Character Database. Download
@ -28,6 +32,12 @@ update-unicode: unicode_norm_table.h unicode_nonspacing_table.h unicode_east_asi
 UnicodeData.txt EastAsianWidth.txt DerivedNormalizationProps.txt CompositionExclusions.txt NormalizationTest.txt: $(top_builddir)/src/Makefile.global
 	$(DOWNLOAD) https://www.unicode.org/Public/$(UNICODE_VERSION)/ucd/$(@F)

+unicode_version.h: generate-unicode_version.pl
+	$(PERL) $< --version $(UNICODE_VERSION)
+
+unicode_category_table.h: generate-unicode_category_table.pl UnicodeData.txt
+	$(PERL) $<
+
 # Generation of conversion tables used for string normalization with
 # UTF-8 strings.
 unicode_norm_hashfunc.h: unicode_norm_table.h
@ -45,9 +55,14 @@ unicode_normprops_table.h: generate-unicode_normprops_table.pl DerivedNormalizat
 	$(PERL) $^ >$@

 # Test suite
+category-check: category_test
+	./category_test
+
 normalization-check: norm_test
 	./norm_test

+category_test: category_test.o ../unicode_category.o | submake-common
+
 norm_test: norm_test.o ../unicode_norm.o | submake-common

 norm_test.o: norm_test_table.h
@ -64,7 +79,7 @@ norm_test_table.h: generate-norm_test_table.pl NormalizationTest.txt


 clean:
-	rm -f $(OBJS) norm_test norm_test.o
+	rm -f $(OBJS) category_test category_test.o norm_test norm_test.o

 distclean: clean
 	rm -f UnicodeData.txt EastAsianWidth.txt CompositionExclusions.txt NormalizationTest.txt norm_test_table.h unicode_norm_table.h
--- a/src/common/unicode/category_test.c
+++ b/src/common/unicode/category_test.c
@ -0,0 +1,108 @@
+/*-------------------------------------------------------------------------
+ * category_test.c
+ *		Program to test Unicode general category functions.
+ *
+ * Portions Copyright (c) 2017-2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/common/unicode/category_test.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+#include "common/unicode_category.h"
+#include "common/unicode_version.h"
+
+/*
+ * Parse version into integer for easy comparison.
+ */
+#ifdef USE_ICU
+static int
+parse_unicode_version(const char *version)
+{
+	int			n,
+				major,
+				minor;
+
+	n = sscanf(version, "%d.%d", &major, &minor);
+
+	Assert(n == 2);
+	Assert(minor < 100);
+
+	return major * 100 + minor;
+}
+#endif
+
+/*
+ * Exhaustively test that the Unicode category for each codepoint matches that
+ * returned by ICU.
+ */
+int
+main(int argc, char **argv)
+{
+#ifdef USE_ICU
+	int			pg_unicode_version = parse_unicode_version(PG_UNICODE_VERSION);
+	int			icu_unicode_version = parse_unicode_version(U_UNICODE_VERSION);
+	int			pg_skipped_codepoints = 0;
+	int			icu_skipped_codepoints = 0;
+
+	printf("Postgres Unicode Version:\t%s\n", PG_UNICODE_VERSION);
+	printf("ICU Unicode Version:\t\t%s\n", U_UNICODE_VERSION);
+
+	for (UChar32 code = 0; code <= 0x10ffff; code++)
+	{
+		uint8_t		pg_category = unicode_category(code);
+		uint8_t		icu_category = u_charType(code);
+
+		if (pg_category != icu_category)
+		{
+			/*
+			 * A version mismatch means that some assigned codepoints in the
+			 * newer version may be unassigned in the older version. That's
+			 * OK, though the test will not cover those codepoints marked
+			 * unassigned in the older version (that is, it will no longer be
+			 * an exhaustive test).
+			 */
+			if (pg_category == PG_U_UNASSIGNED &&
+				pg_unicode_version < icu_unicode_version)
+				pg_skipped_codepoints++;
+			else if (icu_category == PG_U_UNASSIGNED &&
+					 icu_unicode_version < pg_unicode_version)
+				icu_skipped_codepoints++;
+			else
+			{
+				printf("FAILURE for codepoint %06x\n", code);
+				printf("Postgres category:	%02d %s %s\n", pg_category,
+					   unicode_category_abbrev(pg_category),
+					   unicode_category_string(pg_category));
+				printf("ICU category:		%02d %s %s\n", icu_category,
+					   unicode_category_abbrev(icu_category),
+					   unicode_category_string(icu_category));
+				printf("\n");
+				exit(1);
+			}
+		}
+	}
+
+	if (pg_skipped_codepoints > 0)
+		printf("Skipped %d codepoints unassigned in Postgres due to Unicode version mismatch.\n",
+			   pg_skipped_codepoints);
+	if (icu_skipped_codepoints > 0)
+		printf("Skipped %d codepoints unassigned in ICU due to Unicode version mismatch.\n",
+			   icu_skipped_codepoints);
+
+	printf("category_test: All tests successful!\n");
+	exit(0);
+#else
+	printf("ICU support required for test; skipping.\n");
+	exit(0);
+#endif
+}
--- a/src/common/unicode/generate-unicode_category_table.pl
+++ b/src/common/unicode/generate-unicode_category_table.pl
@ -0,0 +1,204 @@
+#!/usr/bin/perl
+#
+# Generate a code point category table and its lookup utilities, using
+# Unicode data files as input.
+#
+# Input: UnicodeData.txt
+# Output: unicode_category_table.h
+#
+# Copyright (c) 2000-2023, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use Getopt::Long;
+
+use FindBin;
+use lib "$FindBin::RealBin/../../tools/";
+
+my $CATEGORY_UNASSIGNED = 'Cn';
+
+my $output_path = '.';
+
+GetOptions('outdir:s' => \$output_path);
+
+my $output_table_file = "$output_path/unicode_category_table.h";
+
+my $FH;
+
+# Read entries from UnicodeData.txt into a list of codepoint ranges
+# and their general category.
+my @category_ranges = ();
+my $range_start = undef;
+my $range_end = undef;
+my $range_category = undef;
+
+# If between a "<..., First>" entry and a "<..., Last>" entry, the gap in
+# codepoints represents a range, and $gap_category is equal to the
+# category for both (which must match). Otherwise, the gap represents
+# unassigned code points.
+my $gap_category = undef;
+
+open($FH, '<', "$output_path/UnicodeData.txt")
+  or die "Could not open $output_path/UnicodeData.txt: $!.";
+while (my $line = <$FH>)
+{
+	my @elts = split(';', $line);
+	my $code = hex($elts[0]);
+	my $name = $elts[1];
+	my $category = $elts[2];
+
+	die "codepoint out of range" if $code > 0x10FFFF;
+	die "unassigned codepoint in UnicodeData.txt" if $category eq $CATEGORY_UNASSIGNED;
+
+	if (!defined($range_start)) {
+		my $code_str = sprintf "0x%06x", $code;
+		die if defined($range_end) || defined($range_category) || defined($gap_category);
+		die "unexpected first entry <..., Last>" if ($name =~ /Last>/);
+		die "expected 0x000000 for first entry, got $code_str" if $code != 0x000000;
+
+		# initialize
+		$range_start = $code;
+		$range_end = $code;
+		$range_category = $category;
+		if ($name =~ /<.*, First>$/) {
+			$gap_category = $category;
+		} else {
+			$gap_category = $CATEGORY_UNASSIGNED;
+		}
+		next;
+	}
+
+	# Gap in codepoints detected. If it's a different category than
+	# the current range, emit the current range and initialize a new
+	# range representing the gap.
+	if ($range_end + 1 != $code && $range_category ne $gap_category) {
+		push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
+		$range_start = $range_end + 1;
+		$range_end = $code - 1;
+		$range_category = $gap_category;
+	}
+
+	# different category; new range
+	if ($range_category ne $category) {
+		push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
+		$range_start = $code;
+		$range_end = $code;
+		$range_category = $category;
+	}
+
+	if ($name =~ /<.*, First>$/) {
+		die "<..., First> entry unexpectedly follows another <..., First> entry"
+		  if $gap_category ne $CATEGORY_UNASSIGNED;
+		$gap_category = $category;
+	}
+	elsif ($name =~ /<.*, Last>$/) {
+		die "<..., First> and <..., Last> entries have mismatching general category"
+		  if $gap_category ne $category;
+		$gap_category = $CATEGORY_UNASSIGNED;
+	}
+	else {
+		die "unexpected entry found between <..., First> and <..., Last>"
+		  if $gap_category ne $CATEGORY_UNASSIGNED;
+	}
+
+	$range_end = $code;
+}
+close $FH;
+
+die "<..., First> entry with no corresponding <..., Last> entry"
+  if $gap_category ne $CATEGORY_UNASSIGNED;
+
+# emit final range
+push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
+
+# emit range for any unassigned code points after last entry
+if ($range_end < 0x10FFFF) {
+	$range_start = $range_end + 1;
+	$range_end = 0x10FFFF;
+	$range_category = $CATEGORY_UNASSIGNED;
+	push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
+}
+
+my $num_ranges = scalar @category_ranges;
+
+# See: https://www.unicode.org/reports/tr44/#General_Category_Values
+my $categories = {
+	Cn => 'PG_U_UNASSIGNED',
+	Lu => 'PG_U_UPPERCASE_LETTER',
+	Ll => 'PG_U_LOWERCASE_LETTER',
+	Lt => 'PG_U_TITLECASE_LETTER',
+	Lm => 'PG_U_MODIFIER_LETTER',
+	Lo => 'PG_U_OTHER_LETTER',
+	Mn => 'PG_U_NONSPACING_MARK',
+	Me => 'PG_U_ENCLOSING_MARK',
+	Mc => 'PG_U_SPACING_MARK',
+	Nd => 'PG_U_DECIMAL_NUMBER',
+	Nl => 'PG_U_LETTER_NUMBER',
+	No => 'PG_U_OTHER_NUMBER',
+	Zs => 'PG_U_SPACE_SEPARATOR',
+	Zl => 'PG_U_LINE_SEPARATOR',
+	Zp => 'PG_U_PARAGRAPH_SEPARATOR',
+	Cc => 'PG_U_CONTROL',
+	Cf => 'PG_U_FORMAT',
+	Co => 'PG_U_PRIVATE_USE',
+	Cs => 'PG_U_SURROGATE',
+	Pd => 'PG_U_DASH_PUNCTUATION',
+	Ps => 'PG_U_OPEN_PUNCTUATION',
+	Pe => 'PG_U_CLOSE_PUNCTUATION',
+	Pc => 'PG_U_CONNECTOR_PUNCTUATION',
+	Po => 'PG_U_OTHER_PUNCTUATION',
+	Sm => 'PG_U_MATH_SYMBOL',
+	Sc => 'PG_U_CURRENCY_SYMBOL',
+	Sk => 'PG_U_MODIFIER_SYMBOL',
+	So => 'PG_U_OTHER_SYMBOL',
+	Pi => 'PG_U_INITIAL_PUNCTUATION',
+	Pf => 'PG_U_FINAL_PUNCTUATION'
+};
+
+# Start writing out the output files
+open my $OT, '>', $output_table_file
+  or die "Could not open output file $output_table_file: $!\n";
+
+print $OT <<HEADER;
+/*-------------------------------------------------------------------------
+ *
+ * unicode_category_table.h
+ *	  Category table for Unicode character classification.
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/common/unicode_category_table.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common/unicode_category.h"
+
+/*
+ * File auto-generated by src/common/unicode/generate-unicode_category_table.pl,
+ * do not edit. There is deliberately not an #ifndef PG_UNICODE_CATEGORY_TABLE_H
+ * here.
+ */
+typedef struct
+{
+	uint32		first;			/* Unicode codepoint */
+	uint32		last;			/* Unicode codepoint */
+	uint8		category;		/* General Category */
+}			pg_category_range;
+
+/* table of Unicode codepoint ranges and their categories */
+static const pg_category_range unicode_categories[$num_ranges] =
+{
+HEADER
+
+my $firsttime = 1;
+foreach my $range (@category_ranges) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	my $category = $categories->{$range->{category}};
+	die "category missing: $range->{category}" unless $category;
+	printf $OT "\t{0x%06x, 0x%06x, %s}", $range->{start}, $range->{end}, $category;
+}
+print $OT "\n};\n";
--- a/src/common/unicode/generate-unicode_version.pl
+++ b/src/common/unicode/generate-unicode_version.pl
@ -0,0 +1,46 @@
+#!/usr/bin/perl
+#
+# Generate header file with Unicode version used by Postgres.
+#
+# Output: unicode_version.h
+#
+# Copyright (c) 2000-2023, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use Getopt::Long;
+
+use FindBin;
+use lib "$FindBin::RealBin/../../tools/";
+
+my $output_path = '.';
+my $version_str = undef;
+
+GetOptions('outdir:s' => \$output_path, 'version:s' => \$version_str);
+
+my @version_parts = split /\./, $version_str;
+
+my $unicode_version_str = sprintf "%d.%d", $version_parts[0], $version_parts[1];
+
+my $output_file = "$output_path/unicode_version.h";
+
+# Start writing out the output files
+open my $OT, '>', $output_file
+  or die "Could not open output file $output_file: $!\n";
+
+print $OT <<HEADER;
+/*-------------------------------------------------------------------------
+ *
+ * unicode_version.h
+ *	  Unicode version used by Postgres.
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/common/unicode_version.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#define PG_UNICODE_VERSION		"$unicode_version_str"
+HEADER
--- a/src/common/unicode/meson.build
+++ b/src/common/unicode/meson.build
@ -24,6 +24,25 @@ endforeach

 update_unicode_targets = []

+update_unicode_targets += \
+  custom_target('unicode_version.h',
+    output: ['unicode_version.h'],
+    command: [
+      perl, files('generate-unicode_version.pl'),
+      '--outdir', '@OUTDIR@', '--version', UNICODE_VERSION],
+    build_by_default: false,
+  )
+
+update_unicode_targets += \
+  custom_target('unicode_category_table.h',
+    input: [unicode_data['UnicodeData.txt']],
+    output: ['unicode_category_table.h'],
+    command: [
+      perl, files('generate-unicode_category_table.pl'),
+      '--outdir', '@OUTDIR@', '@INPUT@'],
+    build_by_default: false,
+  )
+
 update_unicode_targets += \
  custom_target('unicode_norm_table.h',
    input: [unicode_data['UnicodeData.txt'], unicode_data['CompositionExclusions.txt']],
@ -73,6 +92,17 @@ norm_test_table = custom_target('norm_test_table.h',

 inc = include_directories('.')

+category_test = executable('category_test',
+  ['category_test.c'],
+  dependencies: [frontend_port_code, icu],
+  include_directories: inc,
+  link_with: [common_static, pgport_static],
+  build_by_default: false,
+  kwargs: default_bin_args + {
+    'install': false,
+  }
+)
+
 norm_test = executable('norm_test',
  ['norm_test.c', norm_test_table],
  dependencies: [frontend_port_code],
@ -86,6 +116,16 @@ norm_test = executable('norm_test',

 update_unicode_dep = []

+if not meson.is_cross_build()
+  update_unicode_dep += custom_target('category_test.run',
+    output: 'category_test.run',
+    input: update_unicode_targets,
+    command: [category_test, UNICODE_VERSION],
+    build_by_default: false,
+    build_always_stale: true,
+  )
+endif
+
 if not meson.is_cross_build()
  update_unicode_dep += custom_target('norm_test.run',
    output: 'norm_test.run',
--- a/src/common/unicode/norm_test.c
+++ b/src/common/unicode/norm_test.c
@ -81,6 +81,6 @@ main(int argc, char **argv)
 		}
 	}

-	printf("All tests successful!\n");
+	printf("norm_test: All tests successful!\n");
 	exit(0);
 }
--- a/src/common/unicode_category.c
+++ b/src/common/unicode_category.c
@ -0,0 +1,195 @@
+/*-------------------------------------------------------------------------
+ * unicode_category.c
+ *		Determine general category of Unicode characters.
+ *
+ * Portions Copyright (c) 2017-2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/common/unicode_category.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/unicode_category.h"
+#include "common/unicode_category_table.h"
+
+/*
+ * Unicode general category for the given codepoint.
+ */
+pg_unicode_category
+unicode_category(pg_wchar ucs)
+{
+	int			min = 0;
+	int			mid;
+	int			max = lengthof(unicode_categories) - 1;
+
+	Assert(ucs >= unicode_categories[0].first &&
+		   ucs <= unicode_categories[max].last);
+
+	while (max >= min)
+	{
+		mid = (min + max) / 2;
+		if (ucs > unicode_categories[mid].last)
+			min = mid + 1;
+		else if (ucs < unicode_categories[mid].first)
+			max = mid - 1;
+		else
+			return unicode_categories[mid].category;
+	}
+
+	Assert(false);
+	return (pg_unicode_category) - 1;
+}
+
+/*
+ * Description of Unicode general category.
+ */
+const char *
+unicode_category_string(pg_unicode_category category)
+{
+	switch (category)
+	{
+		case PG_U_UNASSIGNED:
+			return "Unassigned";
+		case PG_U_UPPERCASE_LETTER:
+			return "Uppercase_Letter";
+		case PG_U_LOWERCASE_LETTER:
+			return "Lowercase_Letter";
+		case PG_U_TITLECASE_LETTER:
+			return "Titlecase_Letter";
+		case PG_U_MODIFIER_LETTER:
+			return "Modifier_Letter";
+		case PG_U_OTHER_LETTER:
+			return "Other_Letter";
+		case PG_U_NONSPACING_MARK:
+			return "Nonspacing_Mark";
+		case PG_U_ENCLOSING_MARK:
+			return "Enclosing_Mark";
+		case PG_U_SPACING_MARK:
+			return "Spacing_Mark";
+		case PG_U_DECIMAL_NUMBER:
+			return "Decimal_Number";
+		case PG_U_LETTER_NUMBER:
+			return "Letter_Number";
+		case PG_U_OTHER_NUMBER:
+			return "Other_Number";
+		case PG_U_SPACE_SEPARATOR:
+			return "Space_Separator";
+		case PG_U_LINE_SEPARATOR:
+			return "Line_Separator";
+		case PG_U_PARAGRAPH_SEPARATOR:
+			return "Paragraph_Separator";
+		case PG_U_CONTROL:
+			return "Control";
+		case PG_U_FORMAT:
+			return "Format";
+		case PG_U_PRIVATE_USE:
+			return "Private_Use";
+		case PG_U_SURROGATE:
+			return "Surrogate";
+		case PG_U_DASH_PUNCTUATION:
+			return "Dash_Punctuation";
+		case PG_U_OPEN_PUNCTUATION:
+			return "Open_Punctuation";
+		case PG_U_CLOSE_PUNCTUATION:
+			return "Close_Punctuation";
+		case PG_U_CONNECTOR_PUNCTUATION:
+			return "Connector_Punctuation";
+		case PG_U_OTHER_PUNCTUATION:
+			return "Other_Punctuation";
+		case PG_U_MATH_SYMBOL:
+			return "Math_Symbol";
+		case PG_U_CURRENCY_SYMBOL:
+			return "Currency_Symbol";
+		case PG_U_MODIFIER_SYMBOL:
+			return "Modifier_Symbol";
+		case PG_U_OTHER_SYMBOL:
+			return "Other_Symbol";
+		case PG_U_INITIAL_PUNCTUATION:
+			return "Initial_Punctuation";
+		case PG_U_FINAL_PUNCTUATION:
+			return "Final_Punctuation";
+	}
+
+	Assert(false);
+	return "Unrecognized";		/* keep compiler quiet */
+}
+
+/*
+ * Short code for Unicode general category.
+ */
+const char *
+unicode_category_abbrev(pg_unicode_category category)
+{
+	switch (category)
+	{
+		case PG_U_UNASSIGNED:
+			return "Cn";
+		case PG_U_UPPERCASE_LETTER:
+			return "Lu";
+		case PG_U_LOWERCASE_LETTER:
+			return "Ll";
+		case PG_U_TITLECASE_LETTER:
+			return "Lt";
+		case PG_U_MODIFIER_LETTER:
+			return "Lm";
+		case PG_U_OTHER_LETTER:
+			return "Lo";
+		case PG_U_NONSPACING_MARK:
+			return "Mn";
+		case PG_U_ENCLOSING_MARK:
+			return "Me";
+		case PG_U_SPACING_MARK:
+			return "Mc";
+		case PG_U_DECIMAL_NUMBER:
+			return "Nd";
+		case PG_U_LETTER_NUMBER:
+			return "Nl";
+		case PG_U_OTHER_NUMBER:
+			return "No";
+		case PG_U_SPACE_SEPARATOR:
+			return "Zs";
+		case PG_U_LINE_SEPARATOR:
+			return "Zl";
+		case PG_U_PARAGRAPH_SEPARATOR:
+			return "Zp";
+		case PG_U_CONTROL:
+			return "Cc";
+		case PG_U_FORMAT:
+			return "Cf";
+		case PG_U_PRIVATE_USE:
+			return "Co";
+		case PG_U_SURROGATE:
+			return "Cs";
+		case PG_U_DASH_PUNCTUATION:
+			return "Pd";
+		case PG_U_OPEN_PUNCTUATION:
+			return "Ps";
+		case PG_U_CLOSE_PUNCTUATION:
+			return "Pe";
+		case PG_U_CONNECTOR_PUNCTUATION:
+			return "Pc";
+		case PG_U_OTHER_PUNCTUATION:
+			return "Po";
+		case PG_U_MATH_SYMBOL:
+			return "Sm";
+		case PG_U_CURRENCY_SYMBOL:
+			return "Sc";
+		case PG_U_MODIFIER_SYMBOL:
+			return "Sk";
+		case PG_U_OTHER_SYMBOL:
+			return "So";
+		case PG_U_INITIAL_PUNCTUATION:
+			return "Pi";
+		case PG_U_FINAL_PUNCTUATION:
+			return "Pf";
+	}
+
+	Assert(false);
+	return "??";				/* keep compiler quiet */
+}
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@ -12019,6 +12019,18 @@
  proname => 'pg_partition_root', prorettype => 'regclass',
  proargtypes => 'regclass', prosrc => 'pg_partition_root' },

+{ oid => '4549', descr => 'Unicode version used by Postgres',
+  proname => 'unicode_version', prorettype => 'text', proargtypes => '',
+  prosrc => 'unicode_version' },
+
+{ oid => '6099', descr => 'Unicode version used by ICU, if enabled',
+  proname => 'icu_unicode_version', prorettype => 'text', proargtypes => '',
+  prosrc => 'icu_unicode_version' },
+
+{ oid => '6105', descr => 'check valid Unicode',
+  proname => 'unicode_assigned', prorettype => 'bool', proargtypes => 'text',
+  prosrc => 'unicode_assigned' },
+
 { oid => '4350', descr => 'Unicode normalization',
  proname => 'normalize', prorettype => 'text', proargtypes => 'text text',
  prosrc => 'unicode_normalize_func' },
--- a/src/include/common/unicode_category.h
+++ b/src/include/common/unicode_category.h
@ -0,0 +1,68 @@
+/*-------------------------------------------------------------------------
+ *
+ * unicode_category.h
+ *	  Routines for determining the category of Unicode characters.
+ *
+ * These definitions can be used by both frontend and backend code.
+ *
+ * Copyright (c) 2017-2023, PostgreSQL Global Development Group
+ *
+ * src/include/common/unicode_category.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef UNICODE_CATEGORY_H
+#define UNICODE_CATEGORY_H
+
+#include "mb/pg_wchar.h"
+
+/*
+ * Unicode General Category Values
+ *
+ * See: https://www.unicode.org/reports/tr44/#General_Category_Values
+ *
+ * The Unicode stability policy guarantees: "The enumeration of
+ * General_Category property values is fixed. No new values will be
+ * added". See: https://www.unicode.org/policies/stability_policy.html
+ *
+ * Numeric values chosen to match corresponding ICU UCharCategory.
+ */
+typedef enum pg_unicode_category
+{
+	PG_U_UNASSIGNED = 0,		/* Cn */
+	PG_U_UPPERCASE_LETTER = 1,	/* Lu */
+	PG_U_LOWERCASE_LETTER = 2,	/* Ll */
+	PG_U_TITLECASE_LETTER = 3,	/* Lt */
+	PG_U_MODIFIER_LETTER = 4,	/* Lm */
+	PG_U_OTHER_LETTER = 5,		/* Lo */
+	PG_U_NONSPACING_MARK = 6,	/* Mn */
+	PG_U_ENCLOSING_MARK = 7,	/* Me */
+	PG_U_SPACING_MARK = 8,		/* Mc */
+	PG_U_DECIMAL_NUMBER = 9,	/* Nd */
+	PG_U_LETTER_NUMBER = 10,	/* Nl */
+	PG_U_OTHER_NUMBER = 11,		/* No */
+	PG_U_SPACE_SEPARATOR = 12,	/* Zs */
+	PG_U_LINE_SEPARATOR = 13,	/* Zl */
+	PG_U_PARAGRAPH_SEPARATOR = 14,	/* Zp */
+	PG_U_CONTROL = 15,			/* Cc */
+	PG_U_FORMAT = 16,			/* Cf */
+	PG_U_PRIVATE_USE = 17,		/* Co */
+	PG_U_SURROGATE = 18,		/* Cs */
+	PG_U_DASH_PUNCTUATION = 19, /* Pd */
+	PG_U_OPEN_PUNCTUATION = 20, /* Ps */
+	PG_U_CLOSE_PUNCTUATION = 21,	/* Pe */
+	PG_U_CONNECTOR_PUNCTUATION = 22,	/* Pc */
+	PG_U_OTHER_PUNCTUATION = 23,	/* Po */
+	PG_U_MATH_SYMBOL = 24,		/* Sm */
+	PG_U_CURRENCY_SYMBOL = 25,	/* Sc */
+	PG_U_MODIFIER_SYMBOL = 26,	/* Sk */
+	PG_U_OTHER_SYMBOL = 27,		/* So */
+	PG_U_INITIAL_PUNCTUATION = 28,	/* Pi */
+	PG_U_FINAL_PUNCTUATION = 29 /* Pf */
+} pg_unicode_category;
+
+extern pg_unicode_category unicode_category(pg_wchar ucs);
+const char *unicode_category_string(pg_unicode_category category);
+const char *unicode_category_abbrev(pg_unicode_category category);
+
+#endif							/* UNICODE_CATEGORY_H */
--- a/src/include/common/unicode_category_table.h
+++ b/src/include/common/unicode_category_table.h
--- a/src/include/common/unicode_version.h
+++ b/src/include/common/unicode_version.h
@ -0,0 +1,14 @@
+/*-------------------------------------------------------------------------
+ *
+ * unicode_version.h
+ *	  Unicode version used by Postgres.
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/common/unicode_version.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#define PG_UNICODE_VERSION		"15.1"
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@ -240,6 +240,16 @@ struct config_real
 	void	   *reset_extra;
 };

+/*
+ * A note about string GUCs: the boot_val is allowed to be NULL, which leads
+ * to the reset_val and the actual variable value (*variable) also being NULL.
+ * However, there is no way to set a NULL value subsequently using
+ * set_config_option or any other GUC API.  Also, GUC APIs such as SHOW will
+ * display a NULL value as an empty string.  Callers that choose to use a NULL
+ * boot_val should overwrite the setting later in startup, or else be careful
+ * that NULL doesn't have semantics that are visibly different from an empty
+ * string.
+ */
 struct config_string
 {
 	struct config_generic gen;
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@ -27,6 +27,10 @@ CREATE TABLE icu (def text, en text COLLATE "en-x-icu", upfirst text COLLATE upp
 INSERT INTO icu VALUES ('a', 'a', 'a'), ('b', 'b', 'b'), ('A', 'A', 'A'), ('B', 'B', 'B');
 });

+is( $node1->safe_psql('dbicu', q{SELECT icu_unicode_version() IS NOT NULL}),
+	qq(t),
+	'ICU unicode version defined');
+
 is( $node1->safe_psql('dbicu', q{SELECT def FROM icu ORDER BY def}),
 	qq(A
 a
--- a/src/test/regress/expected/unicode.out
+++ b/src/test/regress/expected/unicode.out
@ -8,6 +8,24 @@ SELECT U&'\0061\0308bc' <> U&'\00E4bc' COLLATE "C" AS sanity_check;
 t
 (1 row)

+SELECT unicode_version() IS NOT NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT unicode_assigned(U&'abc');
+ unicode_assigned 
+------------------
+ t
+(1 row)
+
+SELECT unicode_assigned(U&'abc\+10FFFF');
+ unicode_assigned 
+------------------
+ f
+(1 row)
+
 SELECT normalize('');
 normalize 
 -----------
--- a/src/test/regress/sql/unicode.sql
+++ b/src/test/regress/sql/unicode.sql
@ -5,6 +5,10 @@ SELECT getdatabaseencoding() <> 'UTF8' AS skip_test \gset

 SELECT U&'\0061\0308bc' <> U&'\00E4bc' COLLATE "C" AS sanity_check;

+SELECT unicode_version() IS NOT NULL;
+SELECT unicode_assigned(U&'abc');
+SELECT unicode_assigned(U&'abc\+10FFFF');
+
 SELECT normalize('');
 SELECT normalize(U&'\0061\0308\24D1c') = U&'\00E4\24D1c' COLLATE "C" AS test_default;
 SELECT normalize(U&'\0061\0308\24D1c', NFC) = U&'\00E4\24D1c' COLLATE "C" AS test_nfc;
Author	SHA1	Message	Date
David Rowley	b690e5facb	Stabilize postgres_fdw tests on 32-bit machines cac169d68 adjusted DEFAULT_FDW_TUPLE_COST and that seems to have caused a test to become unstable on 32-bit machines. 4b14e1871 tried to fix this as originally the plan was flipping between a Nested Loop and Hash Join. That commit forced the Nested Loop, but there's still flexibility to push or not push the sort to the remote server and 32-bit seems to prefer to push and on 64-bit, the costs prefer not to. Here let's just turn off enable_sort to significantly encourage the sort to take place on the remote server. Reported-by: Michael Paquier, Richard Guo Discussion: https://postgr.es/m/ZUM2IhA8X2lrG50K@paquier.xyz	2023-11-03 12:35:37 +13:00
Tom Lane	0bc726d95a	Make GetConfigOption/GetConfigOptionResetString return "" for NULL. As per the preceding commit, GUC APIs generally expose NULL-valued string variables as empty strings. Extend that policy to GetConfigOption() and GetConfigOptionResetString(), eliminating a crash hazard for unwary callers, as well as a fundamental ambiguity in GetConfigOption()'s API. No back-patch, since this is an API change and conceivably somebody somewhere is depending on this corner case. Xing Guo, Aleksander Alekseev, Tom Lane Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com	2023-11-02 11:53:36 -04:00
Tom Lane	7704a1a72e	Be more wary about NULL values for GUC string variables. get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN has a NULL boot_val. Nosing around found a couple of other places that seemed insufficiently cautious about NULL string values, although those are likely unreachable in practice. Add some commentary defining the expectations for NULL values of string variables, in hopes of forestalling future additions of more such bugs. Xing Guo, Aleksander Alekseev, Tom Lane Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com	2023-11-02 11:47:33 -04:00
David Rowley	4b14e18714	Attempt to stabilize postgres_fdw tests cac169d68 adjusted DEFAULT_FDW_TUPLE_COST and that seems to have caused a test to become unstable on 32-bit machines. Try to make it stable again. Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZUM2IhA8X2lrG50K@paquier.xyz	2023-11-02 23:16:34 +13:00
Jeff Davis	a02b37fc08	Additional unicode primitive functions. Introduce unicode_version(), icu_unicode_version(), and unicode_assigned(). The latter requires introducing a new lookup table for the Unicode General Category, which is generated along with the other Unicode lookup tables. Discussion: https://postgr.es/m/CA+TgmoYzYR-yhU6k1XFCADeyj=Oyz2PkVsa3iKv+keM8wp-F_A@mail.gmail.com Reviewed-by: Peter Eisentraut	2023-11-01 22:47:06 -07:00
Michael Paquier	7021d3b176	Prevent startup of logical replication launcher during pg_upgrade The logical replication launcher may start apply workers during an upgrade. This could be the cause of corruptions on a new cluster if these are able to apply changes before the physical files are copied over to the new cluster. The chance of being able to do so is small as pg_upgrade uses its own port and unix domain directory (the latter is customizable with --socketdir), but just preventing the launcher to start is safer at the end, because we are then sure that no changes will be applied. Like 29d0a77fa660 for max_slot_wal_keep_size, this is only set when a cluster uses v17 or newer. Author: Vignesh C Discussion: https://postgr.es/m/CALDaNm2g9ZKf=y8X6z6MsLCuh8WwU-=Q6pLj35NFi2M5BZNS_A@mail.gmail.com	2023-11-02 14:34:51 +09:00