Update documentation to reflect availability of aggregate(DISTINCT).

Try to provide a more lucid discussion in 'Using Aggregate Functions' tutorial section.
2025-11-19 00:02:46 -05:00 · 1999-12-13 17:39:38 +00:00 · 1999-12-13 17:39:38 +00:00 · ff6fe1502d
commit ff6fe1502d
parent 662371cc5d
3 changed files with 111 additions and 36 deletions
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -361,39 +361,90 @@ DELETE FROM classname;
    Like  most  other  query  languages, 
    <ProductName>PostgreSQL</ProductName> supports
    aggregate functions.
-    The current  implementation  of
+    An aggregate function computes a single result from multiple input rows.
-    <ProductName>Postgres</ProductName> aggregate functions have some limitations.
+    For example, there are aggregates to compute the
-    Specifically, while there  are  aggregates  to  compute
+    <Function>count</Function>, <Function>sum</Function>,
-    such  functions as the <Function>count</Function>, <Function>sum</Function>,
    <Function>avg</Function> (average), <Function>max</Function> (maximum) and
-    <Function>min</Function> (minimum) over a set of instances,  aggregates  can  only
+    <Function>min</Function> (minimum) over a set of instances.
-    appear  in  the  target  list of a query and not directly in the
+   </para>
-    qualification (the where clause). As an example,
+
   <Para>
    It is important to understand the interaction between aggregates and
    SQL's <Command>where</Command> and <Command>having</Command> clauses.
    The fundamental difference between <Command>where</Command> and
    <Command>having</Command> is this: <Command>where</Command> selects
    input rows before groups and aggregates are computed (thus, it controls
    which rows go into the aggregate computation), whereas
    <Command>having</Command> selects group rows after groups and
    aggregates are computed.  Thus, the
    <Command>where</Command> clause may not contain aggregate functions;
    it makes no sense to try to use an aggregate to determine which rows
    will be inputs to the aggregates.  On the other hand,
    <Command>having</Command> clauses always contain aggregate functions.
    (Strictly speaking, you are allowed to write a <Command>having</Command>
    clause that doesn't use aggregates, but it's wasteful; the same condition
    could be used more efficiently at the <Command>where</Command> stage.)
   </para>
   <Para>
    As an example, we can find the highest low-temperature reading anywhere
    with
    <ProgramListing>
 SELECT max(temp_lo) FROM weather;
    </ProgramListing>
-    is allowed, while
+    If we want to know which city (or cities) that reading occurred in,
    we might try
    <ProgramListing>
 SELECT city FROM weather WHERE temp_lo = max(temp_lo);
    </ProgramListing>
-    is not. However, as is often the case the query can be restated to accomplish 
+    but this will not work since the aggregate max() can't be used in
-    the intended result; here by using a <FirstTerm>subselect</FirstTerm>:
+    <Command>where</Command>. However, as is often the case, the query can be
    restated to accomplish the intended result; here by using a
    <FirstTerm>subselect</FirstTerm>:
    <ProgramListing>
 SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
    </ProgramListing>
    This is OK because the sub-select is an independent computation that
    computes its own aggregate separately from what's happening in the outer
    select.
   </Para>
   <Para>
-    Aggregates may also have <FirstTerm>group by</FirstTerm> clauses:
+    Aggregates are also very useful in combination with
    <FirstTerm>group by</FirstTerm> clauses.  For example, we can get the
    maximum low temperature observed in each city with
    <ProgramListing>
 SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city;
    </ProgramListing>
    which gives us one output row per city.  We can filter these grouped
    rows using <Command>having</Command>:
    <ProgramListing>
 SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city
    HAVING min(temp_lo) < 0;
    </ProgramListing>
    which gives us the same results for only the cities that have some
    below-zero readings.  Finally, if we only care about cities whose
    names begin with 'P', we might do
    <ProgramListing>
 SELECT city, max(temp_lo)
    FROM weather
    WHERE city like 'P%'
    GROUP BY city
    HAVING min(temp_lo) < 0;
    </ProgramListing>
    Note that we can apply the city-name restriction in
    <Command>where</Command>, since it needs no aggregate.  This is
    more efficient than adding the restriction to <Command>having</Command>,
    because we avoid doing the grouping and aggregate calculations
    for all rows that fail the <Command>where</Command> check.
   </Para>
  </sect1>
 </Chapter>
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/ref/select.sgml,v 1.22 1999/08/06 13:50:31 thomas Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/ref/select.sgml,v 1.23 1999/12/13 17:39:38 tgl Exp $
 Postgres documentation
 -->
@@ -202,10 +202,10 @@ SELECT [ ALL | DISTINCT [ ON <replaceable class="PARAMETER">column</replaceable>
  <para>
   <command>DISTINCT</command> will eliminate all duplicate rows from the
-   selection.
+   result.
   <command>DISTINCT ON <replaceable class="PARAMETER">column</replaceable></command>
   will eliminate all duplicates in the specified column; this is
-   equivalent to using
+   similar to using
   <command>GROUP BY <replaceable class="PARAMETER">column</replaceable></command>.
   <command>ALL</command> will return all candidate rows,
   including duplicates.
@@ -320,11 +320,13 @@ GROUP BY <replaceable class="PARAMETER">column</replaceable> [, ...]
   <para>
    GROUP BY will condense into a single row all rows that share the
-    same values for the
+    same values for the grouped columns.  Aggregate functions, if any,
-    grouped columns; aggregates return values derived from all rows
+    are computed across all rows making up each group, producing a
-    that make up the group.  The value returned for an ungrouped
+    separate value for each group (whereas without GROUP BY, an
-    and unaggregated column is dependent on the order in which rows
+    aggregate produces a single value computed across all the selected
-    happen to be read from the database.
+    rows).  When GROUP BY is present, it is not valid to refer to
+    ungrouped columns except within aggregate functions, since there
+    would be more than one possible value to return for an ungrouped column.
   </para>
  </refsect2>
@@ -354,7 +356,8 @@ HAVING <replaceable class="PARAMETER">cond_expr</replaceable>
   <para>
    Each column referenced in 
    <replaceable class="PARAMETER">cond_expr</replaceable> shall unambiguously
-    reference a grouping column.
+    reference a grouping column, unless the reference appears within an
+    aggregate function.
   </para>
  </refsect2>
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -642,15 +642,16 @@ CAST '<replaceable>string</replaceable>' AS <replaceable>type</replaceable>
     <member><replaceable>a_expr</replaceable> <replaceable>right_unary_operator</replaceable></member>
     <member><replaceable>left_unary_operator</replaceable> <replaceable>a_expr</replaceable></member>
     <member>parameter</member>
-     <member>functional expressions</member>
+     <member>functional expression</member>
-     <member>aggregate expressions</member>
+     <member>aggregate expression</member>
    </simplelist>
   </para>
   <para>
-    We have already discussed constants and attributes.  The two kinds of
+    We have already discussed constants and attributes.  The three kinds of
-    operator expressions indicate respectively binary and left_unary
+    operator expressions indicate respectively binary (infix), right-unary
-    expressions.  The following sections discuss the remaining options.
+    (suffix) and left-unary (prefix) operators.  The following sections
+    discuss the remaining options.
   </para>
   <sect2>
@@ -690,7 +691,7 @@ CREATE FUNCTION dept (name)
     enclosed in parentheses:
     <synopsis>
-<replaceable>function</replaceable> (<replaceable>a_expr</replaceable> [, <replaceable>a_expr</replaceable> )
+<replaceable>function</replaceable> (<replaceable>a_expr</replaceable> [, <replaceable>a_expr</replaceable> ... ] )
     </synopsis>
    </para>
@@ -705,20 +706,40 @@ sqrt(emp.salary)
   </sect2>
   <sect2>
-    <title>Aggregate Expression</title>
+    <title>Aggregate Expressions</title>
    <para>
-     An <firstterm>aggregate expression</firstterm>
+     An <firstterm>aggregate expression</firstterm> represents the application
-     represents a simple aggregate (i.e., one that computes a single value)
+     of an aggregate function across the rows selected by a query.
-     or an aggregate function (i.e., one that computes a set of values).
+     An aggregate function reduces multiple inputs to a single output value,
-     The syntax is the following:
+     such as the sum or average of the inputs.
     The syntax of an aggregate expression is one of the following:
-     <synopsis>
+    <simplelist>
-<replaceable>aggregate_name</replaceable> (<replaceable>attribute</replaceable>)
+     <member><replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable>)</member>
-     </synopsis>
+     <member><replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)</member>
     <member><replaceable>aggregate_name</replaceable> ( * )</member>
    </simplelist>
-     where <replaceable>aggregate_name</replaceable>
+     where <replaceable>aggregate_name</replaceable> is a previously defined
-     must be a previously defined aggregate.
+     aggregate, and <replaceable>expression</replaceable> is any expression
     that doesn't itself contain an aggregate expression.
    </para>
    <para>
     The first form of aggregate expression invokes the aggregate across all
     input rows for which the given expression yields a non-null value.
     The second form invokes the aggregate for all distinct non-null values
     of the expression found in the input rows.  The last form invokes the
     aggregate once for each input row regardless of null or non-null values;
     since no particular input value is specified, it is generally only useful
     for the count() aggregate.
    </para>
    <para>
      For example, count(*) yields the total number of input rows;
      count(f1) yields the number of input rows in which f1 is non-null;
      count(distinct f1) yields the number of distinct non-null values of f1.
    </para>
   </sect2>
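
The three aggregate-expression forms documented in the syntax.sgml hunk can be tried with a small sketch like the one below. The table agg_demo and its contents are hypothetical, made up for illustration and not part of this commit. The counts in the comments follow from the null-handling rules stated in the patch.

```sql
-- Hypothetical demo table; not part of the commit.
CREATE TABLE agg_demo (f1 integer);

-- Four rows: a duplicated value, one other value, and one null.
INSERT INTO agg_demo VALUES (1);
INSERT INTO agg_demo VALUES (1);
INSERT INTO agg_demo VALUES (2);
INSERT INTO agg_demo VALUES (NULL);

-- Form 3, aggregate_name ( * ): counts every input row, nulls included.
SELECT count(*) FROM agg_demo;            -- 4

-- Form 1, aggregate_name (expression): counts rows where f1 is non-null.
SELECT count(f1) FROM agg_demo;           -- 3

-- Form 2, aggregate_name (DISTINCT expression): counts distinct non-null values.
SELECT count(DISTINCT f1) FROM agg_demo;  -- 2
```

The difference between the last two results comes only from the duplicated value 1; the null row is ignored by both.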