The Query Language

The Query Language The Postgres query language is a variant of the SQL standard. It has many extensions to SQL such as an extensible type system, inheritance, functions and production rules. These are features carried over from the original Postgres query language, PostQuel. This section provides an overview of how to use Postgres SQL to perform simple operations. This manual is only intended to give you an idea of our flavor of SQL and is in no way a complete tutorial on SQL. Numerous books have been written on SQL92, including and . You should be aware that some language features are extensions to the standard. Interactive Monitor In the examples that follow, we assume that you have created the mydb database as described in the previous subsection and have started psql. Examples in this manual can also be found in source distribution in the directory src/tutorial/. Refer to the README file in that directory for how to use them. To start the tutorial, do the following: $ cd .../src/tutorial $ psql -s mydb Welcome to the POSTGRESQL interactive sql monitor: Please read the file COPYRIGHT for copyright terms of POSTGRESQL type \? for help on slash commands type \q to quit type \g or terminate with semicolon to execute query You are currently connected to the database: postgres mydb=> \i basics.sql The \i command read in queries from the specified files. The -s option puts you in single step mode which pauses before sending a query to the backend. Queries in this section are in the file basics.sql. psql has a variety of \d commands for showing system information. Consult these commands for more details; for a listing, type \? at the psql prompt. Concepts The fundamental notion in Postgres is that of a table, which is a named collection of rows. Each row has the same set of named columns, and each column is of a specific type. Furthermore, each row has a permanent object identifier (OID) that is unique throughout the database cluster. Historially, tables have been called classes in Postgres, rows are object instances, and columns are attributes. This makes sense if you consider the object-relational aspects of the database system, but in this manual we will use the customary SQL terminology. As previously discussed, tables are grouped into databases, and a collection of databases managed by a single postmaster process constitutes a database cluster. Creating a New Table You can create a new table by specifying the table name, along with all column names and their types: CREATE TABLE weather ( city varchar(80), temp_lo int, -- low temperature temp_hi int, -- high temperature prcp real, -- precipitation date date ); Note that both keywords and identifiers are case-insensitive; identifiers can preserve case by surrounding them with double-quotes as allowed by SQL92. Postgres SQL supports the usual SQL types int, float, real, smallint, char(N), varchar(N), date, time, and timestamp, as well as other types of general utility and a rich set of geometric types. As we will see later, Postgres can be customized with an arbitrary number of user-defined data types. Consequently, type names are not syntactical keywords, except where required to support special cases in the SQL92 standard. So far, the Postgres CREATE command looks exactly like the command used to create a table in a traditional relational system. However, we will presently see that tables have properties that are extensions of the relational model. Populating a Table with Rows The INSERT statement is used to populate a table with rows: INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27'); You can also use COPY to load large amounts of data from flat (ASCII) files. This is usually faster because the data is read (or written) as a single atomic transaction directly to or from the target table. An example would be: COPY weather FROM '/home/user/weather.txt' USING DELIMITERS '|'; where the path name for the source file must be available to the backend server machine, not the client, since the backend server reads the file directly. Querying a Table The weather table can be queried with normal relational selection and projection queries. A SQL SELECT statement is used to do this. The statement is divided into a target list (the part that lists the columns to be returned) and a qualification (the part that specifies any restrictions). For example, to retrieve all the rows of weather, type: SELECT * FROM weather; and the output should be: +--------------+---------+---------+------+------------+ |city | temp_lo | temp_hi | prcp | date | +--------------+---------+---------+------+------------+ |San Francisco | 46 | 50 | 0.25 | 1994-11-27 | +--------------+---------+---------+------+------------+ |San Francisco | 43 | 57 | 0 | 1994-11-29 | +--------------+---------+---------+------+------------+ |Hayward | 37 | 54 | | 1994-11-29 | +--------------+---------+---------+------+------------+ You may specify any arbitrary expressions in the target list. For example, you can do: SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather; Arbitrary Boolean operators (AND, OR and NOT) are allowed in the qualification of any query. For example, SELECT * FROM weather WHERE city = 'San Francisco' AND prcp > 0.0; results in: +--------------+---------+---------+------+------------+ |city | temp_lo | temp_hi | prcp | date | +--------------+---------+---------+------+------------+ |San Francisco | 46 | 50 | 0.25 | 1994-11-27 | +--------------+---------+---------+------+------------+ As a final note, you can specify that the results of a select can be returned in a sorted order or with duplicate rows removed. SELECT DISTINCT city FROM weather ORDER BY city; Redirecting SELECT Queries Any SELECT query can be redirected to a new table SELECT * INTO TABLE temp FROM weather; This forms an implicit CREATE command, creating a new table temp with the column names and types specified in the target list of the SELECT INTO command. We can then, of course, perform any operations on the resulting table that we can perform on other tables. Joins Between Tables Thus far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or access the same table in such a way that multiple rows of the table are being processed at the same time. A query that accesses multiple rows of the same or different tables at one time is called a join query. As an example, say we wish to find all the records that are in the temperature range of other records. In effect, we need to compare the temp_lo and temp_hi columns of each WEATHER row to the temp_lo and temp_hi columns of all other WEATHER columns. This is only a conceptual model. The actual join may be performed in a more efficient manner, but this is invisible to the user. We can do this with the following query: SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high, W2.city, W2.temp_lo AS low, W2.temp_hi AS high FROM weather W1, weather W2 WHERE W1.temp_lo < W2.temp_lo AND W1.temp_hi > W2.temp_hi; +--------------+-----+------+---------------+-----+------+ |city | low | high | city | low | high | +--------------+-----+------+---------------+-----+------+ |San Francisco | 43 | 57 | San Francisco | 46 | 50 | +--------------+-----+------+---------------+-----+------+ |San Francisco | 37 | 54 | San Francisco | 46 | 50 | +--------------+-----+------+---------------+-----+------+ The semantics of such a join are that the qualification is a truth expression defined for the Cartesian product of the tables indicated in the query. For those rows in the Cartesian product for which the qualification is true, Postgres computes and returns the values specified in the target list. Postgres SQL does not assign any meaning to duplicate values in such expressions. This means that Postgres sometimes recomputes the same target list several times; this frequently happens when Boolean expressions are connected with an "or". To remove such duplicates, you must use the SELECT DISTINCT statement. In this case, both W1 and W2 are surrogates for a row of the table weather, and both range over all rows of the table. (In the terminology of most database systems, W1 and W2 are known as range variables.) A query can contain an arbitrary number of table names and surrogates. Updates You can update existing rows using the UPDATE command. Suppose you discover the temperature readings are all off by 2 degrees as of Nov 28, you may update the data as follow: UPDATE weather SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2 WHERE date > '1994-11-28'; Deletions Deletions are performed using the DELETE command: DELETE FROM weather WHERE city = 'Hayward'; All weather recording belonging to Hayward are removed. One should be wary of queries of the form DELETE FROM tablename; Without a qualification, DELETE will simply remove all rows from the given table, leaving it empty. The system will not request confirmation before doing this. Using Aggregate Functions Like most other relational database products, PostgreSQL supports aggregate functions. An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the count, sum, avg (average), max (maximum) and min (minimum) over a set of rows. It is important to understand the interaction between aggregates and SQL's WHERE and HAVING clauses. The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed. Thus, the WHERE clause may not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, HAVING clauses always contain aggregate functions. (Strictly speaking, you are allowed to write a HAVING clause that doesn't use aggregates, but it's wasteful; the same condition could be used more efficiently at the WHERE stage.) As an example, we can find the highest low-temperature reading anywhere with SELECT max(temp_lo) FROM weather; If we want to know what city (or cities) that reading occurred in, we might try SELECT city FROM weather WHERE temp_lo = max(temp_lo); but this will not work since the aggregate max can't be used in WHERE. However, as is often the case the query can be restated to accomplish the intended result; here by using a subselect: SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather); This is OK because the sub-select is an independent computation that computes its own aggregate separately from what's happening in the outer select. Aggregates are also very useful in combination with GROUP BY clauses. For example, we can get the maximum low temperature observed in each city with SELECT city, max(temp_lo) FROM weather GROUP BY city; which gives us one output row per city. We can filter these grouped rows using HAVING: SELECT city, max(temp_lo) FROM weather GROUP BY city HAVING min(temp_lo) < 0; which gives us the same results for only the cities that have some below-zero readings. Finally, if we only care about cities whose names begin with "P", we might do SELECT city, max(temp_lo) FROM weather WHERE city like 'P%' GROUP BY city HAVING min(temp_lo) < 0; Note that we can apply the city-name restriction in WHERE, since it needs no aggregate. This is more efficient than adding the restriction to HAVING, because we avoid doing the grouping and aggregate calculations for all rows that fail the WHERE check.