doc: Update parallel join documentation for Parallel Shared Hash.

Thomas Munro

Discussion: http://postgr.es/m/CAEepm=3XdL=+bn3=WQVCCT5wwfAEv-4onKpk+XQZdwDXv6etzA@mail.gmail.com
2025-06-17 00:02:17 -04:00 · 2018-03-22 13:25:59 -04:00 · 2018-03-22 13:25:59 -04:00 · f644c3b386
commit f644c3b386
parent 649f179250
1 changed files with 32 additions and 15 deletions
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -323,23 +323,40 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
    more other tables using a nested loop, hash join, or merge join.  The
    inner side of the join may be any kind of non-parallel plan that is
    otherwise supported by the planner provided that it is safe to run within
-    a parallel worker.  For example, if a nested loop join is chosen, the
+    a parallel worker.  Depending on the join type, the inner side may also be
-    inner plan may be an index scan which looks up a value taken from the outer
+    a parallel plan.
-    side of the join.
  </para>
  <itemizedlist>
    <listitem>
      <para>
-    Each worker will execute the inner side of the join in full.  This is
+        In a <emphasis>nested loop join</emphasis>, the inner side is always
-    typically not a problem for nested loops, but may be inefficient for
+        non-parallel.  Although it is executed in full, this is efficient if
-    cases involving hash or merge joins.  For example, for a hash join, this
+        the inner side is an index scan, because the outer tuples and thus
-    restriction means that an identical hash table is built in each worker
+        the loops that look up values in the index are divided over the
-    process, which works fine for joins against small tables but may not be
+        cooperating processes.
-    efficient when the inner table is large.  For a merge join, it might mean
-    that each worker performs a separate sort of the inner relation, which
-    could be slow.  Of course, in cases where a parallel plan of this type
-    would be inefficient, the query planner will normally choose some other
-    plan (possibly one which does not use parallelism) instead.
      </para>
    </listitem>
    <listitem>
      <para>
        In a <emphasis>merge join</emphasis>, the inner side is always
        a non-parallel plan and therefore executed in full.  This may be
        inefficient, especially if a sort must be performed, because the work
        and resulting data are duplicated in every cooperating process.
      </para>
    </listitem>
    <listitem>
      <para>
        In a <emphasis>hash join</emphasis> (without the "parallel" prefix),
        the inner side is executed in full by every cooperating process
        to build identical copies of the hash table.  This may be inefficient
        if the hash table is large or the plan is expensive.  In a
        <emphasis>parallel hash join</emphasis>, the inner side is a
        <emphasis>parallel hash</emphasis> that divides the work of building
        a shared hash table over the cooperating processes.
      </para>
    </listitem>
  </itemizedlist>
 </sect2>
 <sect2 id="parallel-aggregation">
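
Note on the change: the difference between the plain hash join and the parallel hash join described in the new text can be illustrated with a toy model. The sketch below is not PostgreSQL code; the function names (`split`, `hash_join`, `parallel_hash_join`) and the round-robin partitioning are assumptions made for illustration, and the "workers" are simulated sequentially rather than run concurrently. It only counts how many hash table insertions the build phase performs in each strategy.

```python
from collections import defaultdict


def split(rows, n_workers):
    """Round-robin partition, a stand-in for how a parallel scan divides rows."""
    return [rows[i::n_workers] for i in range(n_workers)]


def build(rows):
    """Build a hash table mapping join key -> list of inner payloads."""
    table = defaultdict(list)
    for key, payload in rows:
        table[key].append(payload)
    return table


def probe(table, outer_rows):
    """Emit (key, outer payload, inner payload) for every match."""
    return [(k, o, i) for k, o in outer_rows for i in table.get(k, [])]


def hash_join(inner, outer, n_workers):
    """Non-parallel inner side: every simulated worker builds a full copy."""
    results, build_inserts = [], 0
    for outer_chunk in split(outer, n_workers):
        table = build(inner)          # inner side executed in full per worker
        build_inserts += len(inner)
        results.extend(probe(table, outer_chunk))
    return sorted(results), build_inserts


def parallel_hash_join(inner, outer, n_workers):
    """Parallel inner side: workers share the work of building one table."""
    shared = defaultdict(list)
    build_inserts = 0
    for inner_chunk in split(inner, n_workers):
        for key, payload in inner_chunk:
            shared[key].append(payload)
            build_inserts += 1
    results = []
    for outer_chunk in split(outer, n_workers):
        results.extend(probe(shared, outer_chunk))
    return sorted(results), build_inserts


if __name__ == "__main__":
    inner = [(i, f"in{i}") for i in range(6)]
    outer = [(i % 6, f"out{i}") for i in range(12)]
    rows_a, inserts_a = hash_join(inner, outer, 3)
    rows_b, inserts_b = parallel_hash_join(inner, outer, 3)
    print(rows_a == rows_b, inserts_a, inserts_b)
```

With 6 inner rows and 3 simulated workers, both strategies return the same join result, but the plain hash join performs 18 build insertions (6 per worker) while the shared build performs 6 in total. This mirrors the documentation's point that duplicated builds become costly when the hash table is large.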