diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index a1830eb0297..961bc0d9dc7 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -361,39 +361,90 @@ DELETE FROM classname;
Like most other query languages,
PostgreSQL supports
aggregate functions.
- The current implementation of
- Postgres aggregate functions have some limitations.
- Specifically, while there are aggregates to compute
- such functions as the count, sum,
+ An aggregate function computes a single result from multiple input rows.
+ For example, there are aggregates to compute the
+ count, sum,
avg (average), max (maximum) and
- min (minimum) over a set of instances, aggregates can only
- appear in the target list of a query and not directly in the
- qualification (the where clause). As an example,
+ min (minimum) over a set of instances.
+
+
+
+ It is important to understand the interaction between aggregates and
+ SQL's where and having clauses.
+ The fundamental difference between where and
+ having is this: where selects
+ input rows before groups and aggregates are computed (thus, it controls
+ which rows go into the aggregate computation), whereas
+ having selects group rows after groups and
+ aggregates are computed. Thus, the
+ where clause may not contain aggregate functions;
+ it makes no sense to try to use an aggregate to determine which rows
+ will be inputs to the aggregates. On the other hand,
+ having clauses always contain aggregate functions.
+ (Strictly speaking, you are allowed to write a having
+ clause that doesn't use aggregates, but it's wasteful; the same condition
+ could be used more efficiently at the where stage.)
+
+
+
+ As an example, we can find the highest low-temperature reading anywhere
+ with
SELECT max(temp_lo) FROM weather;
- is allowed, while
+ If we want to know which city (or cities) that reading occurred in,
+ we might try
SELECT city FROM weather WHERE temp_lo = max(temp_lo);
- is not. However, as is often the case the query can be restated to accomplish
- the intended result; here by using a subselect:
+ but this will not work since the aggregate max() can't be used in
+ where. However, as is often the case the query can be
+ restated to accomplish the intended result; here by using a
+ subselect:
SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
+ This is OK because the sub-select is an independent computation that
+ computes its own aggregate separately from what's happening in the outer
+ select.
- Aggregates may also have group by clauses:
+ Aggregates are also very useful in combination with
+ group by clauses. For example, we can get the
+ maximum low temperature observed in each city with
SELECT city, max(temp_lo)
FROM weather
GROUP BY city;
+ which gives us one output row per city. We can filter these grouped
+ rows using having:
+
+SELECT city, max(temp_lo)
+ FROM weather
+ GROUP BY city
+ HAVING min(temp_lo) < 0;
+
+ which gives us the same results for only the cities that have some
+ below-zero readings. Finally, if we only care about cities whose
+ names begin with 'P', we might do
+
+SELECT city, max(temp_lo)
+ FROM weather
+ WHERE city like 'P%'
+ GROUP BY city
+ HAVING min(temp_lo) < 0;
+
+ Note that we can apply the city-name restriction in
+ where, since it needs no aggregate. This is
+ more efficient than adding the restriction to having,
+ because we avoid doing the grouping and aggregate calculations
+ for all rows that fail the where check.
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index f2afa38b7c9..070f8b43d0f 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1,5 +1,5 @@
@@ -202,10 +202,10 @@ SELECT [ ALL | DISTINCT [ ON column
DISTINCT will eliminate all duplicate rows from the
- selection.
+ result.
DISTINCT ON column
will eliminate all duplicates in the specified column; this is
- equivalent to using
+ similar to using
GROUP BY column.
ALL will return all candidate rows,
including duplicates.
@@ -320,11 +320,13 @@ GROUP BY column [, ...]
GROUP BY will condense into a single row all rows that share the
- same values for the
- grouped columns; aggregates return values derived from all rows
- that make up the group. The value returned for an ungrouped
- and unaggregated column is dependent on the order in which rows
- happen to be read from the database.
+ same values for the grouped columns. Aggregate functions, if any,
+ are computed across all rows making up each group, producing a
+ separate value for each group (whereas without GROUP BY, an
+ aggregate produces a single value computed across all the selected
+ rows). When GROUP BY is present, it is not valid to refer to
+ ungrouped columns except within aggregate functions, since there
+ would be more than one possible value to return for an ungrouped column.
@@ -354,7 +356,8 @@ HAVING cond_expr
Each column referenced in
cond_expr shall unambiguously
- reference a grouping column.
+ reference a grouping column, unless the reference appears within an
+ aggregate function.
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index a74bd08be11..beacf8fbdf0 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -642,15 +642,16 @@ CAST 'string' AS type
a_expr right_unary_operator
left_unary_operator a_expr
parameter
- functional expressions
- aggregate expressions
+ functional expression
+ aggregate expression
- We have already discussed constants and attributes. The two kinds of
- operator expressions indicate respectively binary and left_unary
- expressions. The following sections discuss the remaining options.
+ We have already discussed constants and attributes. The three kinds of
+ operator expressions indicate respectively binary (infix), right-unary
+ (suffix) and left-unary (prefix) operators. The following sections
+ discuss the remaining options.
@@ -690,7 +691,7 @@ CREATE FUNCTION dept (name)
enclosed in parentheses:
-function (a_expr [, a_expr )
+function (a_expr [, a_expr ... ] )
@@ -705,20 +706,40 @@ sqrt(emp.salary)
- Aggregate Expression
+ Aggregate Expressions
- An aggregate expression
- represents a simple aggregate (i.e., one that computes a single value)
- or an aggregate function (i.e., one that computes a set of values).
- The syntax is the following:
+ An aggregate expression represents the application
+ of an aggregate function across the rows selected by a query.
+ An aggregate function reduces multiple inputs to a single output value,
+ such as the sum or average of the inputs.
+ The syntax of an aggregate expression is one of the following:
-
-aggregate_name (attribute)
-
+
+ aggregate_name (expression)
+ aggregate_name (DISTINCT expression)
+ aggregate_name ( * )
+
- where aggregate_name
- must be a previously defined aggregate.
+ where aggregate_name is a previously defined
+ aggregate, and expression is any expression
+ that doesn't itself contain an aggregate expression.
+
+
+
+ The first form of aggregate expression invokes the aggregate across all
+ input rows for which the given expression yields a non-null value.
+ The second form invokes the aggregate for all distinct non-null values
+ of the expression found in the input rows. The last form invokes the
+ aggregate once for each input row regardless of null or non-null values;
+ since no particular input value is specified, it is generally only useful
+ for the count() aggregate.
+
+
+
+ For example, count(*) yields the total number of input rows;
+ count(f1) yields the number of input rows in which f1 is non-null;
+ count(distinct f1) yields the number of distinct non-null values of f1.