doc: Update parallel join documentation for Parallel Shared Hash.

Thomas Munro

Discussion: http://postgr.es/m/CAEepm=3XdL=+bn3=WQVCCT5wwfAEv-4onKpk+XQZdwDXv6etzA@mail.gmail.com
Robert Haas 2018-03-22 13:25:59 -04:00
parent 649f179250
commit f644c3b386

@@ -323,23 +323,40 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
     more other tables using a nested loop, hash join, or merge join. The
     inner side of the join may be any kind of non-parallel plan that is
     otherwise supported by the planner provided that it is safe to run within
-    a parallel worker. For example, if a nested loop join is chosen, the
-    inner plan may be an index scan which looks up a value taken from the outer
-    side of the join.
+    a parallel worker. Depending on the join type, the inner side may also be
+    a parallel plan.
   </para>
 
-  <para>
-    Each worker will execute the inner side of the join in full. This is
-    typically not a problem for nested loops, but may be inefficient for
-    cases involving hash or merge joins. For example, for a hash join, this
-    restriction means that an identical hash table is built in each worker
-    process, which works fine for joins against small tables but may not be
-    efficient when the inner table is large. For a merge join, it might mean
-    that each worker performs a separate sort of the inner relation, which
-    could be slow. Of course, in cases where a parallel plan of this type
-    would be inefficient, the query planner will normally choose some other
-    plan (possibly one which does not use parallelism) instead.
-  </para>
+  <itemizedlist>
+    <listitem>
+      <para>
+        In a <emphasis>nested loop join</emphasis>, the inner side is always
+        non-parallel. Although it is executed in full, this is efficient if
+        the inner side is an index scan, because the outer tuples and thus
+        the loops that look up values in the index are divided over the
+        cooperating processes.
+      </para>
+    </listitem>
+    <listitem>
+      <para>
+        In a <emphasis>merge join</emphasis>, the inner side is always
+        a non-parallel plan and therefore executed in full. This may be
+        inefficient, especially if a sort must be performed, because the work
+        and resulting data are duplicated in every cooperating process.
+      </para>
+    </listitem>
+    <listitem>
+      <para>
+        In a <emphasis>hash join</emphasis> (without the "parallel" prefix),
+        the inner side is executed in full by every cooperating process
+        to build identical copies of the hash table. This may be inefficient
+        if the hash table is large or the plan is expensive. In a
+        <emphasis>parallel hash join</emphasis>, the inner side is a
+        <emphasis>parallel hash</emphasis> that divides the work of building
+        a shared hash table over the cooperating processes.
+      </para>
+    </listitem>
+  </itemizedlist>
  </sect2>
 
  <sect2 id="parallel-aggregation">
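
The cost difference between the join types described in the new list items can be made concrete with a small counting model. The Python sketch below is a toy illustration, not PostgreSQL code: the function names, the even split of rows across processes, and the row counts are all assumptions chosen for the example. It only counts how many rows each cooperating process would handle on the inner side.

```python
# Toy model of how inner-side work is spread over cooperating processes.
# Every name here and the even split of rows are illustrative assumptions,
# not PostgreSQL internals.

def split_evenly(n_rows, n_procs):
    """Divide n_rows over n_procs as evenly as possible."""
    base, extra = divmod(n_rows, n_procs)
    return [base + (1 if i < extra else 0) for i in range(n_procs)]

def nested_loop_index_lookups(outer_rows, n_procs):
    """Nested loop: outer tuples are divided, one index lookup per outer tuple."""
    return split_evenly(outer_rows, n_procs)

def merge_join_inner_rows(inner_rows, n_procs):
    """Merge join: every process runs (and possibly sorts) the full inner side."""
    return [inner_rows] * n_procs

def hash_join_build_rows(inner_rows, n_procs):
    """Hash join: every process builds its own identical copy of the hash table."""
    return [inner_rows] * n_procs

def parallel_hash_build_rows(inner_rows, n_procs):
    """Parallel hash join: processes divide the work of building one shared table."""
    return split_evenly(inner_rows, n_procs)

if __name__ == "__main__":
    procs = 3  # assumed: a leader plus two workers
    print("nested loop lookups per process:  ", nested_loop_index_lookups(10, procs))
    print("merge join inner rows per process:", merge_join_inner_rows(1000, procs))
    print("hash join build rows per process: ", hash_join_build_rows(1000, procs))
    print("parallel hash rows per process:   ", parallel_hash_build_rows(1000, procs))
```

In this model the total inner-side work for the merge join and the plain hash join grows with the number of processes, while for the parallel hash join it stays equal to the size of the inner relation. Real planner costs also depend on memory, batching, and synchronization, none of which the sketch attempts to capture.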