Commit 865f14a2d31af23a05bbf2df04c274629c5d5c4d was quite a few bricks
shy of a load: psql, ecpg, and plpgsql were all left out-of-step with
the core lexer. Of these only the last was likely to be a fatal
problem; but still, a minimal amount of grepping, or even just reading
the comments adjacent to the places that were changed, would have found
the other places that needed to be changed.
contrib/pg_stat_statements will sometimes run the core lexer a second time
on submitted statements. Formerly, if you had standard_conforming_strings
turned off, this led to sometimes getting two copies of any warnings
enabled by escape_string_warning. While this is probably no longer a big
deal in the field, it's a pain for regression testing.
To fix, change the lexer so it doesn't consult the escape_string_warning
GUC variable directly, but looks at a copy in the core_yy_extra_type state
struct. Then, pg_stat_statements can change that copy to disable warnings
while it's redoing the lexing.
It seemed like a good idea to make this happen for all three of the GUCs
consulted by the lexer, not just escape_string_warning. There's not an
immediate use-case for callers to adjust the other two AFAIK, but making
it possible is easy enough and seems like good future-proofing.
Arguably this is a bug fix, but there doesn't seem to be enough interest to
justify a back-patch. We'd not be able to back-patch exactly as-is anyway,
for fear of breaking ABI compatibility of the struct. (We could perhaps
back-patch the addition of only escape_string_warning by adding it at the
end of the struct, where there's currently alignment padding space.)
of different parsers having different YYSTYPE unions that they want to use
with it. I defined a new union core_YYSTYPE that is just the (very short)
list of semantic values returned by the core scanner. I had originally
worried that this would require an extra interface layer, but actually we can
have parser.c's base_yylex (formerly filtered_base_yylex) take care of that at
no extra cost. Names associated with the core scanner are now "core_yy_foo",
with "base_yy_foo" being used in the core Bison parser and the parser.c
interface layer.
This solves the last serious stumbling block to eliminating plpgsql's separate
lexer. One restriction that will still be present is that plpgsql and the
core will have to agree on the token numbers assigned to tokens that can be
returned by the core lexer. Since Bison doesn't seem willing to accept
external assignments of those numbers, we'll have to live with decreeing that
core and plpgsql grammars declare these tokens first and in the same order.