Delimited Text File Layer

Loads and displays delimited text files

Overview
Creating a delimited text layer
How the delimiter, quote, and escape characters work
How regular expression delimiters work
How WKT text is interpreted
Attributes in delimited text files
Example of a text file with X,Y point coordinates
Example of a text file with WKT geometries
Using delimited text layers in Python

Overview

A "delimited text file" contains data in which each record starts on a new line, and is split into fields by a delimiter such as a comma. This type of file is commonly exported from spreadsheets (for example CSV files) or databases. Typically the first line of a delimited text file contains the names of the fields.

Delimited text files can be loaded into QGIS as a layer. The records can be displayed spatially either as a point defined by X and Y coordinates, or using a Well Known Text (WKT) definition of a geometry which may describe points, lines, and polygons of arbitrary complexity. The file can also be loaded as an attribute only table, which can then be joined to other tables in QGIS.

In addition to the geometry definition the file can contain text, integer, and real number fields. By default QGIS will choose the type of each field based on its non blank values. If all can be interpreted as integers then the type will be integer, if all can be interpreted as real numbers then the type will be double, otherwise the type will be text.
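
The type selection rule described above can be sketched in Python. This is only an illustration, not the QGIS implementation: the function name infer_type is hypothetical, Python's int() and float() stand in for whatever number parsing QGIS uses, and treating a column with no non blank values as text is an assumption.

```python
def infer_type(values):
    """Guess a field type from the non blank values of one column."""
    non_blank = [v.strip() for v in values if v.strip()]

    def all_parse(parser):
        # True if every non blank value is accepted by the parser.
        try:
            for value in non_blank:
                parser(value)
            return True
        except ValueError:
            return False

    # Assumption: a column with no usable values falls through to text.
    if non_blank and all_parse(int):
        return "integer"
    if non_blank and all_parse(float):
        return "double"
    return "text"


print(infer_type(["13", "52", "3"]))
print(infer_type(["1.5", "2", ""]))
print(infer_type(["1", "abc"]))
```

Blank values are skipped before testing, so an empty cell does not force a numeric column to become text.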

QGIS can also read the types from an OGR CSV driver compatible "csvt" file. This is a file alongside the data file, but with a "t" appended to the file name. The file should contain just one line, which lists the type of each field. Valid types are "integer", "real", "string", "date", "time", and "datetime". The date, time, and datetime types are treated as strings in QGIS. Each type may be followed by a width and precision, for example "real(10.4)". The types in the list are separated by commas, regardless of the delimiter used in the data file. An example of a valid format file would be:

"integer","string","string(20)","real(20.4)"

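As a rough illustration of the csvt format, the sketch below parses a type line into (type, width, precision) tuples using only the Python standard library. The function name parse_csvt and the tuple layout are hypothetical, and this is not how QGIS or OGR read the file.

```python
import csv
import io
import re

# A type name, optionally followed by (width) or (width.precision).
TYPE_PATTERN = re.compile(r"^(\w+)(?:\((\d+)(?:\.(\d+))?\))?$")


def parse_csvt(line):
    """Split a csvt line into (type, width, precision) tuples."""
    entries = next(csv.reader(io.StringIO(line)))
    result = []
    for entry in entries:
        match = TYPE_PATTERN.match(entry.strip())
        if match is None:
            raise ValueError("Unrecognized type definition: %r" % entry)
        type_name, width, precision = match.groups()
        result.append((
            type_name.lower(),
            int(width) if width else None,
            int(precision) if precision else None,
        ))
    return result


print(parse_csvt('"integer","string","string(20)","real(20.4)"'))
```
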
Creating a delimited text layer

Creating a delimited text layer involves choosing the data file, defining the format (how each record is to be split into fields), and defining how the geometry is represented. This is managed with the delimited text dialog as detailed below. The dialog box displays a sample from the beginning of the file which shows how the format options have been applied.

Choosing the data file

Use the "Browse..." button to select the data file. Once the file is selected the layer name will automatically be populated based on the file name. The layer name is used to represent the data in the QGIS legend.

By default files are assumed to be encoded as UTF-8. However other file encodings can be selected. For example "System" uses the default encoding for the operating system. It is safer to use an explicit encoding if the QGIS project needs to be portable.

Specifying the file format

The file format can be one of

Record and field options

The following options affect the selection of records and fields from the data file

Geometry definition

The geometry can be defined as one of

For point coordinates the following options apply:

For well known text geometry the following options apply:

Layer settings

Layer settings control the way the layer is managed in QGIS. The options available are:

How the delimiter, quote, and escape characters work

Records are split into fields using three character sets: delimiter characters, quote characters, and escape characters. Other characters in the record are considered as data, split into fields by delimiter characters. Quote characters occur in pairs and cause the text between them to be treated as data. Escape characters cause the character following them to be treated as data.

Quote and escape characters cannot be the same as delimiter characters - they will be ignored if they are. Escape characters can be the same as quote characters, but behave differently if they are.

The delimiter characters are used to mark the end of each field. If more than one delimiter character is defined then any one of the characters can mark the end of a field. The quote and escape characters can override the delimiter character, so that it is treated as a normal data character.

Quote characters may be used to mark the beginning and end of quoted fields. Quoted fields can contain delimiters and may span multiple lines in the text file. If a field is quoted then it must start and end with the same quote character. Quote characters cannot occur within a field unless they are escaped.

Escape characters which are not quote characters force the following character to be treated as data (that is, they stop it being treated as a new line, delimiter, or quote character).

Escape characters that are also quote characters have much more limited effect. They only apply within quotes and only escape themselves. For example, if ' is a quote and escape character, then the string 'Smith''s Creek' will represent the value Smith's Creek.
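
The same conventions can be tried out with Python's csv module, which supports a doubled quote character and a separate escape character. This is an analogy only: the csv module is not the QGIS parser and may differ in edge cases, and the helper name split_record is hypothetical.

```python
import csv
import io


def split_record(line, delimiter=",", quote="'", escape=None):
    """Split one record into fields using Python's csv reader."""
    reader = csv.reader(
        io.StringIO(line),
        delimiter=delimiter,
        quotechar=quote,
        doublequote=True,   # a doubled quote inside quotes is one quote
        escapechar=escape,  # optional separate escape character
    )
    return next(reader)


# Quote character doubling as its own escape, and a quoted delimiter.
print(split_record("1,'Smith''s Creek','a,b'"))

# A backslash escape that stops the comma acting as a delimiter.
print(split_record("a\\,b,c", quote='"', escape="\\"))
```

The first call yields the value Smith's Creek for the second field, and keeps a,b together as one field because the delimiter is inside quotes.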

How regular expression delimiters work

Regular expressions are a mini-language used to represent character patterns. There are many variations of regular expression syntax - QGIS uses the syntax provided by the QRegExp class of the Qt framework.

In a regular expression delimited file each line is treated as a record. Each match of the regular expression in the line is treated as the end of a field. If the regular expression contains capture groups (eg (cat|dog)) then these are extracted as fields. If this is not desired then use non-capturing groups (eg (?:cat|dog)).
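
The effect of capture groups can be illustrated with Python's re module, whose re.split function happens to treat them in a similar way. Note that QGIS uses QRegExp, whose syntax differs from Python's in places, so this is a sketch of the idea rather than of the QGIS behaviour. The helper name split_fields is hypothetical.

```python
import re


def split_fields(pattern, line):
    """Split a line wherever the pattern matches."""
    # re.split also returns the text matched by any capture groups.
    return re.split(pattern, line)


# A plain delimiter pattern: semicolons with optional spaces around them.
print(split_fields(r"\s*;\s*", "a ; b;c"))

# A capture group: the matched delimiter text is kept as extra fields.
print(split_fields(r"(cat|dog)", "1cat2dog3"))

# A non-capturing group: the delimiter text is dropped.
print(split_fields(r"(?:cat|dog)", "1cat2dog3"))
```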

The regular expression is treated differently if it is anchored to the start of the line (that is, the pattern starts with ^). In this case the regular expression is matched against each line. If the line does not match it is discarded as an invalid record. Each capture group in the expression is treated as a field. The regular expression is invalid if it does not have capture groups. As an example this can be used as a (somewhat unintuitive) means of loading data with fixed width fields. For example the expression

^(.{5})(.{10})(.{20})(.{20})

will extract four fields of widths 5, 10, 20, and 20 characters from each line. Lines less than 55 characters long will be discarded.
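
The anchored expression above can be tested outside QGIS with Python's re module, which accepts this particular pattern unchanged. The sample line and the function name parse_fixed_width are made up for illustration. Returning None for a non-matching line stands in for QGIS discarding the record.

```python
import re

FIXED_WIDTH = re.compile(r"^(.{5})(.{10})(.{20})(.{20})")


def parse_fixed_width(line):
    """Return the four fixed width fields, or None if the line is too short."""
    match = FIXED_WIDTH.match(line)
    if match is None:
        return None
    return list(match.groups())


# Hypothetical record: widths 5, 10, 20, and 20, padded with spaces.
sample = "00001" + "Wellington" + "North Island".ljust(20) + "New Zealand".ljust(20)
print(parse_fixed_width(sample))
print(parse_fixed_width("too short"))
```

The fields keep their padding, so trailing spaces are still present in the extracted values.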

How WKT text is interpreted

The delimited text layer recognizes the following well known text types - POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, and MULTIPOLYGON. It will accept geometries with a Z coordinate (eg POINT Z), a measure (POINT M), or both (POINT ZM).

It can also handle the PostGIS EWKT variation, in which the geometry is preceded by a spatial reference system id (eg SRID=4326;POINT(175.3 41.2)), and a variant used by Informix in which the WKT is preceded by an integer spatial reference id (eg 1 POINT(175.3 41.2)). In both cases the SRID is ignored.
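
To show what ignoring the SRID amounts to, here is a minimal sketch that strips either prefix from a WKT string. The pattern and the function name strip_srid are assumptions made for this example and are not taken from the QGIS source.

```python
import re

# Either "SRID=<digits>;" (EWKT) or "<digits> " (Informix style) at the start.
SRID_PREFIX = re.compile(r"^\s*(?:SRID=\d+;|\d+\s+)", re.IGNORECASE)


def strip_srid(wkt):
    """Remove a leading spatial reference id, leaving plain WKT."""
    return SRID_PREFIX.sub("", wkt, count=1).strip()


print(strip_srid("SRID=4326;POINT(175.3 41.2)"))
print(strip_srid("1 POINT(175.3 41.2)"))
print(strip_srid("POINT Z (1 2 3)"))
```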

Attributes in delimited text files

Each record in the delimited text file is split into fields representing attributes of the record. Usually the attribute names are taken from the first data record in the file. However if this does not contain attribute names, then they will be named field_1, field_2, and so on. Also if records have more fields than are defined in the header record then these will be named field_#, where # is the field number (note that empty fields at the end of a record are ignored). QGIS may override the names in the text file if they are numbers, or have names like field_#, or are duplicated.
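
The default naming of unnamed fields can be sketched as follows. The sketch covers only the field_# naming for missing or extra fields. It does not attempt the renaming of numeric, field_# style, or duplicated names, as those rules are not spelled out here. The function name field_names is hypothetical.

```python
def field_names(header, field_count):
    """Extend the header names with field_# names up to field_count."""
    names = list(header)
    # Field numbers are 1-based, so the first unnamed field is len(names) + 1.
    for number in range(len(names) + 1, field_count + 1):
        names.append("field_%d" % number)
    return names


# A header with two names, but records with four fields.
print(field_names(["id", "wkt"], 4))

# No header record at all.
print(field_names([], 2))
```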

In addition to the attributes explicitly in the data file QGIS assigns a unique feature id to each record which is the line number in the source file on which the record starts.

Each attribute also has a data type, one of string (text), integer, or real number. The data type is inferred from the content of the fields - if every non blank value is a valid integer then the type is integer, otherwise if every non blank value is a valid real number then the type is real, otherwise the type is string. Note that this is based on the content of the fields - quoting fields does not change the way they are interpreted.

Example of a text file with X,Y point coordinates

X;Y;ELEV
-300120;7689960;13
-654360;7562040;52
1640;7512840;3

This file:

Example of a text file with WKT geometries

id|wkt
1|POINT(172.0702250 -43.6031036)
2|POINT(172.0702250 -43.6031036)
3|POINT(172.1543206 -43.5731302)
4|POINT(171.9282585 -43.5493308)
5|POINT(171.8827359 -43.5875983)

This file:

Using delimited text layers in Python

Delimited text data sources can be created from Python in a similar way to other vector layers. The pattern is:

from PyQt4.QtCore import QUrl, QString
from qgis.core import QgsVectorLayer, QgsMapLayerRegistry

# Define the data source
filename="test.csv"
uri=QUrl.fromLocalFile(filename)
uri.addQueryItem("type","csv")
uri.addQueryItem("delimiter","|")
uri.addQueryItem("wktField","wkt")
# ... other delimited text parameters
layer=QgsVectorLayer(QString(uri.toEncoded()),"Test CSV layer","delimitedtext")
# Add the layer to the map
if layer.isValid():
    QgsMapLayerRegistry.instance().addMapLayer( layer )

This could be used to load the second example file above.
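
To see what the resulting data source string looks like without running QGIS, the uri can be approximated with the Python 3 standard library. This is a sketch only: the helper name build_delimited_text_uri is hypothetical, and the exact encoding produced by QUrl may differ in detail from that of urllib.

```python
from pathlib import Path
from urllib.parse import urlencode


def build_delimited_text_uri(filename, **options):
    """Build a file URL with the layer options as query items."""
    base = Path(filename).resolve().as_uri()
    # urlencode percent-encodes special characters such as "|".
    return base + "?" + urlencode(options)


uri = build_delimited_text_uri(
    "test.csv", type="csv", delimiter="|", wktField="wkt"
)
print(uri)
```

The query items are the same ones added with addQueryItem in the pattern above. The delimiter appears percent-encoded in the resulting string.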

The configuration of the delimited text layer is defined by adding query items to the uri. The following options can be added