mirror of
				https://github.com/postgres/postgres.git
				synced 2025-11-04 00:02:52 -05:00 
			
		
		
		
	
		
			
				
	
	
		
			917 lines
		
	
	
		
			43 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			917 lines
		
	
	
		
			43 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
From pgsql-hackers-owner+M1833@hub.org Sat May 13 22:49:26 2000
 | 
						||
Received: from news.tht.net (news.hub.org [216.126.91.242])
 | 
						||
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07394
 | 
						||
	for <pgman@candle.pha.pa.us>; Sat, 13 May 2000 22:49:24 -0400 (EDT)
 | 
						||
Received: from hub.org (majordom@hub.org [216.126.84.1])
 | 
						||
	by news.tht.net (8.9.3/8.9.3) with ESMTP id WAB99859;
 | 
						||
	Sat, 13 May 2000 22:44:15 -0400 (EDT)
 | 
						||
	(envelope-from pgsql-hackers-owner+M1833@hub.org)
 | 
						||
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
 | 
						||
	by hub.org (8.9.3/8.9.3) with ESMTP id WAA51058
 | 
						||
	for <pgsql-hackers@postgreSQL.org>; Sat, 13 May 2000 22:41:16 -0400 (EDT)
 | 
						||
	(envelope-from tgl@sss.pgh.pa.us)
 | 
						||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						||
	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id WAA18343
 | 
						||
	for <pgsql-hackers@postgreSQL.org>; Sat, 13 May 2000 22:40:38 -0400 (EDT)
 | 
						||
To: pgsql-hackers@postgresql.org
 | 
						||
Subject: [HACKERS] Proposal for fixing numeric type-resolution issues
 | 
						||
Date: Sat, 13 May 2000 22:40:38 -0400
 | 
						||
Message-ID: <18340.958272038@sss.pgh.pa.us>
 | 
						||
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						||
X-Mailing-List: pgsql-hackers@postgresql.org
 | 
						||
Precedence: bulk
 | 
						||
Sender: pgsql-hackers-owner@hub.org
 | 
						||
Status: ORr
 | 
						||
 | 
						||
We've got a collection of problems that are related to the parser's
 | 
						||
inability to make good type-resolution choices for numeric constants.
 | 
						||
In some cases you get a hard error; for example "NumericVar + 4.4"
 | 
						||
yields
 | 
						||
ERROR:  Unable to identify an operator '+' for types 'numeric' and 'float8'
 | 
						||
        You will have to retype this query using an explicit cast
 | 
						||
because "4.4" is initially typed as float8 and the system can't figure
 | 
						||
out whether to use numeric or float8 addition.  A more subtle problem
 | 
						||
is that a query like "... WHERE Int2Var < 42" is unable to make use of
 | 
						||
an index on the int2 column: 42 is resolved as int4, so the operator
 | 
						||
is int24lt, which works but is not in the opclass of an int2 index.
 | 
						||
 | 
						||
Here is a proposal for fixing these problems.  I think we could get this
 | 
						||
done for 7.1 if people like it.
 | 
						||
 | 
						||
The basic problem is that there's not enough smarts in the type resolver
 | 
						||
about the interrelationships of the numeric datatypes.  All it has is
 | 
						||
a concept of a most-preferred type within the category of numeric types.
 | 
						||
(We are abusing the most-preferred-type mechanism, BTW, because both
 | 
						||
FLOAT8 and NUMERIC claim to be the most-preferred type in the numeric
 | 
						||
category!  This is in fact why the resolver can't make a choice for
 | 
						||
"numeric+float8".)  We need more intelligence than that.
 | 
						||
 | 
						||
I propose that we set up a strictly-ordered hierarchy of numeric
 | 
						||
datatypes, running from least preferred to most preferred:
 | 
						||
	int2, int4, int8, numeric, float4, float8.
 | 
						||
Rather than simply considering coercions to the most-preferred type,
 | 
						||
the type resolver should use the following rules:
 | 
						||
 | 
						||
1. No value will be down-converted (eg int4 to int2) except by an
 | 
						||
explicit conversion.
 | 
						||
 | 
						||
2. If there is not an exact matching operator, numeric values will be
 | 
						||
up-converted to the highest numeric datatype present among the operator
 | 
						||
or function's arguments.  For example, given "int2 + int8" we'd up-
 | 
						||
convert the int2 to int8 and apply int8 addition.
 | 
						||
 | 
						||
The final piece of the puzzle is that the type initially assigned to
 | 
						||
an undecorated numeric constant should be NUMERIC if it contains a
 | 
						||
decimal point or exponent, and otherwise the smallest of int2, int4,
 | 
						||
int8, NUMERIC that will represent it.  This is a considerable change
 | 
						||
from the current lexer behavior, where you get either int4 or float8.
 | 
						||
 | 
						||
For example, given "NumericVar + 4.4", the constant 4.4 will initially
 | 
						||
be assigned type NUMERIC, we will resolve the operator as numeric plus,
 | 
						||
and everything's fine.  Given "Float8Var + 4.4", the constant is still
 | 
						||
initially numeric, but will be up-converted to float8 so that float8
 | 
						||
addition can be used.  The end result is the same as in traditional
 | 
						||
Postgres: you get float8 addition.  Given "Int2Var < 42", the constant
 | 
						||
is initially typed as int2, since it fits, and we end up selecting
 | 
						||
int2lt, thereby allowing use of an int2 index.  (On the other hand,
 | 
						||
given "Int2Var < 100000", we'd end up using int4lt, which is correct
 | 
						||
to avoid overflow.)
 | 
						||
 | 
						||
A couple of crucial subtleties here:
 | 
						||
 | 
						||
1. We are assuming that the parser or optimizer will constant-fold
 | 
						||
any conversion functions that are introduced.  Thus, in the
 | 
						||
"Float8Var + 4.4" case, the 4.4 is represented as a float8 4.4 by the
 | 
						||
time execution begins, so there's no performance loss.
 | 
						||
 | 
						||
2. We cannot lose precision by initially representing a constant as
 | 
						||
numeric and later converting it to float.  Nor can we exceed NUMERIC's
 | 
						||
range (the default 1000-digit limit is more than the range of IEEE
 | 
						||
float8 data).  It would not work as well to start out by representing
 | 
						||
a constant as float and then converting it to numeric.
 | 
						||
 | 
						||
Presently, the pg_proc and pg_operator tables contain a pretty fair
 | 
						||
collection of cross-datatype numeric operators, such as int24lt,
 | 
						||
float48pl, etc.  We could perhaps leave these in, but I believe that
 | 
						||
it is better to remove them.  For example, if int42lt is left in place,
 | 
						||
then it would capture cases like "Int4Var < 42", whereas we need that
 | 
						||
to be translated to int4lt so that an int4 index can be used.  Removing
 | 
						||
these operators will eliminate some code bloat and system-catalog bloat
 | 
						||
to boot.
 | 
						||
 | 
						||
As far as I can tell, this proposal is almost compatible with the rules
 | 
						||
given in SQL92: in particular, SQL92 specifies that an operator having
 | 
						||
both "approximate numeric" (float) and "exact numeric" (int or numeric)
 | 
						||
inputs should deliver an approximate-numeric result.  I propose
 | 
						||
deviating from SQL92 in a single respect: SQL92 specifies that a
 | 
						||
constant containing an exponent (eg 1.2E34) is approximate numeric,
 | 
						||
which implies that the result of an operator using it is approximate
 | 
						||
even if the other operand is exact.  I believe it's better to treat
 | 
						||
such a constant as exact (ie, type NUMERIC) and only convert it to
 | 
						||
float if the other operand is float.  Without doing that, an assignment
 | 
						||
like
 | 
						||
	UPDATE tab SET NumericVar = 1.234567890123456789012345E34;
 | 
						||
will not work as desired because the constant will be prematurely
 | 
						||
coerced to float, causing precision loss.
 | 
						||
 | 
						||
Comments?
 | 
						||
 | 
						||
			regards, tom lane
 | 
						||
 | 
						||
From tgl@sss.pgh.pa.us Sun May 14 17:30:56 2000
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
 | 
						||
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA05808
 | 
						||
	for <pgman@candle.pha.pa.us>; Sun, 14 May 2000 17:30:52 -0400 (EDT)
 | 
						||
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.4 $) with ESMTP id RAA16657 for <pgman@candle.pha.pa.us>; Sun, 14 May 2000 17:29:52 -0400 (EDT)
 | 
						||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						||
	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id RAA20914;
 | 
						||
	Sun, 14 May 2000 17:29:30 -0400 (EDT)
 | 
						||
To: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						||
cc: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] type conversion discussion 
 | 
						||
In-reply-to: <200005141950.PAA04636@candle.pha.pa.us> 
 | 
						||
References: <200005141950.PAA04636@candle.pha.pa.us>
 | 
						||
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						||
	message dated "Sun, 14 May 2000 15:50:20 -0400"
 | 
						||
Date: Sun, 14 May 2000 17:29:30 -0400
 | 
						||
Message-ID: <20911.958339770@sss.pgh.pa.us>
 | 
						||
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian <pgman@candle.pha.pa.us> writes:
 | 
						||
> As some point, it seems we need to get all the PostgreSQL minds together
 | 
						||
> to discuss type conversion issues.  These problems continue to come up
 | 
						||
> from release to release.  We are getting better, but it seems a full
 | 
						||
> discussion could help solidify our strategy.
 | 
						||
 | 
						||
OK, here are a few things that bug me about the current type-resolution
 | 
						||
code:
 | 
						||
 | 
						||
1. Poor choice of type to attribute to numeric literals.  (A possible
 | 
						||
   solution is sketched in my earlier message, but do we need similar
 | 
						||
   mechanisms for other type categories?)
 | 
						||
 | 
						||
2. Tensions between treating string literals as "unknown" type and
 | 
						||
   as "text" type, per this thread so far.
 | 
						||
 | 
						||
3. IS_BINARY_COMPATIBLE seems like a bogus concept.  Do we really want a
 | 
						||
   fully symmetrical ring of types in each group?  I'd prefer to see a
 | 
						||
   one-way equivalence, which allows eg. OID to be silently converted
 | 
						||
   to INT4, but *not* vice versa (except perhaps by specific user cast).
 | 
						||
   This'd be more like a traditional "is-a" or inheritance relationship
 | 
						||
   between datatypes, which has well-understood semantics.
 | 
						||
 | 
						||
4. I'm also concerned that the behavior of IS_BINARY_COMPATIBLE isn't
 | 
						||
   very predictable because it will happily go either way.  For example,
 | 
						||
   if I do 
 | 
						||
	select * from pg_class where oid = 1234;
 | 
						||
   it's unclear whether I will get an oideq or an int4eq operator ---
 | 
						||
   and that's a rather critical point since only one of them can exploit
 | 
						||
   an index on the oid column.  Currently, there is some klugery in the
 | 
						||
   planner that works around this by overriding the parser's choice of
 | 
						||
   operator to substitute one that is compatible with an available index.
 | 
						||
   That's a pretty ugly solution ... I'm not sure I know a better one,
 | 
						||
   but as long as we're discussing type resolution issues ...
 | 
						||
 | 
						||
5. Lack of extensibility.  There's way too much knowledge hard-wired
 | 
						||
   into the parser about type categories, preferred types, binary
 | 
						||
   compatibility, etc.  All of it falls down when faced with
 | 
						||
   user-defined datatypes.  If we do something like I suggested with
 | 
						||
   a hardwired hierarchy of numeric datatypes, it'll get even worse.
 | 
						||
   All this stuff ought to be driven off fields in pg_type rather than
 | 
						||
   be hardwired into the code, so that the same concepts can be extended
 | 
						||
   to user-defined types.
 | 
						||
 | 
						||
I don't have worked-out proposals for any of these but the first,
 | 
						||
but they've all been bothering me for a while.
 | 
						||
 | 
						||
			regards, tom lane
 | 
						||
 | 
						||
From tgl@sss.pgh.pa.us Sun May 14 21:02:31 2000
 | 
						||
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
 | 
						||
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA07700
 | 
						||
	for <pgman@candle.pha.pa.us>; Sun, 14 May 2000 21:02:28 -0400 (EDT)
 | 
						||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						||
	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id VAA21261;
 | 
						||
	Sun, 14 May 2000 21:03:17 -0400 (EDT)
 | 
						||
To: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						||
cc: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] type conversion discussion 
 | 
						||
In-reply-to: <20911.958339770@sss.pgh.pa.us> 
 | 
						||
References: <200005141950.PAA04636@candle.pha.pa.us> <20911.958339770@sss.pgh.pa.us>
 | 
						||
Comments: In-reply-to Tom Lane <tgl@sss.pgh.pa.us>
 | 
						||
	message dated "Sun, 14 May 2000 17:29:30 -0400"
 | 
						||
Date: Sun, 14 May 2000 21:03:17 -0400
 | 
						||
Message-ID: <21258.958352597@sss.pgh.pa.us>
 | 
						||
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						||
Status: OR
 | 
						||
 | 
						||
Here are the results of some further thoughts about type-conversion
 | 
						||
issues.  This is not a complete proposal yet, but a sketch of an
 | 
						||
approach that might solve several of the gripes in my previous proposal.
 | 
						||
 | 
						||
While thinking about this, I realized that my numeric-types proposal
 | 
						||
of yesterday would break at least a few cases that work nicely now.
 | 
						||
For example, I frequently do things like
 | 
						||
	select * from pg_class where oid = 1234;
 | 
						||
whilst poking around in system tables and querytree dumps.  If that
 | 
						||
constant is initially resolved as int2, as I suggested yesterday,
 | 
						||
then we have "oid = int2" for which there is no operator.  To succeed
 | 
						||
we must decide to promote the constant to int4 --- but with no int4
 | 
						||
visible among the operands of the "=", it will not work to just "promote
 | 
						||
numerics to the highest type seen in the operands" as I suggested
 | 
						||
yesterday.  So there has to be some more interaction in there.
 | 
						||
 | 
						||
Anyway, I was complaining about the looseness of the concept of
 | 
						||
binary-compatible types and the fact that the parser's type conversion
 | 
						||
knowledge is mostly hardwired.  These might be resolved by generalizing
 | 
						||
the numeric type hierarchy idea into a "type promotion lattice", which
 | 
						||
would work like this:
 | 
						||
 | 
						||
* Add a "typpromote" column to pg_type, which contains either zero or
 | 
						||
  the OID of another type that the parser is allowed to promote this
 | 
						||
  type to when searching for usable functions/operators.  For example,
 | 
						||
  my numeric-types hierarchy of yesterday would be expressed by making
 | 
						||
  int2 promote to int4, int4 to int8, int8 to numeric, numeric to
 | 
						||
  float4, and float4 to float8.  The promotion idea also replaces the
 | 
						||
  current concept of binary-compatible types: for example, OID would
 | 
						||
  link to int4 and varchar would link to text (but not vice versa!).
 | 
						||
 | 
						||
* Also add a "typpromotebin" boolean column to pg_type, which contains
 | 
						||
  't' if the type conversion indicated by typpromote is "free", ie,
 | 
						||
  no conversion function need be executed before regarding a value as
 | 
						||
  belonging to the promoted type.  This distinguishes binary-compatible
 | 
						||
  from non-binary-compatible cases.  If "typpromotebin" is 'f' and the
 | 
						||
  parser decides it needs to apply the conversion, then it has to look
 | 
						||
  up the appropriate conversion function in pg_proc.  (More about this
 | 
						||
  below.)
 | 
						||
 | 
						||
Now, if the parser fails to find an exact match for a given function
 | 
						||
or operator name and the exact set of input data types, it proceeds by
 | 
						||
chasing up the promotion chains for the input data types and trying to
 | 
						||
locate a set of types for which there is a matching function/operator.
 | 
						||
If there are multiple possibilities, we choose the one which is the
 | 
						||
"least promoted" by some yet-to-be-determined metric.  (This metric
 | 
						||
would probably favor "free" conversions over non-free ones, but other
 | 
						||
than that I'm not quite sure how it should work.  The metric would
 | 
						||
replace a whole bunch of ad-hoc heuristics that are currently applied
 | 
						||
in the type resolver, so even if it seems rather ad-hoc it'd still be
 | 
						||
cleaner than what we have ;-).)
 | 
						||
 | 
						||
In a situation like the "oid = int2" example above, this mechanism would
 | 
						||
presumably settle on "int4 = int4" as being the least-promoted
 | 
						||
equivalent operator.  (It could not find "oid = oid" since there is
 | 
						||
no promotion path from int2 to oid.)  That looks bad since it isn't
 | 
						||
compatible with an oidops index --- but I have a solution for that!
 | 
						||
I don't think we need the oid opclass at all; why shouldn't indexes
 | 
						||
on oid be expressed as int4 indexes to begin with?  In general, if
 | 
						||
two types are considered binary-equivalent under the old scheme, then
 | 
						||
the one that is considered the subtype probably shouldn't have separate
 | 
						||
index operators under this new scheme.  Instead it should just rely on
 | 
						||
the index operators of the promoted type.
 | 
						||
 | 
						||
The point of the proposed typpromotebin field is to save a pg_proc
 | 
						||
lookup when trying to determine whether a particular promotion is "free"
 | 
						||
or not.  We could save even more lookups if we didn't store the boolean
 | 
						||
but instead the actual OID of the conversion function, or zero if the
 | 
						||
promotion is "free".  The trouble with that is that it creates a
 | 
						||
circularity problem when trying to define a new user type --- you can't
 | 
						||
define the conversion function if its input type doesn't exist yet.
 | 
						||
In any case, we want the parser to do a function lookup if we've
 | 
						||
advanced more than one step in the promotion hierarchy: if we've decided
 | 
						||
to promote int4 to float8 (which will be a four-step chain through int8,
 | 
						||
numeric, float4) we sure want the thing to use a direct int4tofloat8
 | 
						||
conversion function if available, not a chain of four conversion
 | 
						||
functions.  So on balance I think we want to look in pg_proc once we've
 | 
						||
decided which conversion to perform.  The only reason for having
 | 
						||
typpromotebin is that the promotion metric will want to know which
 | 
						||
conversions are free, and we don't want to have to do a lookup in
 | 
						||
pg_proc for each alternative we consider, only the ones that are finally
 | 
						||
selected to be used.
 | 
						||
 | 
						||
I can think of at least one special case that still isn't cleanly
 | 
						||
handled under this scheme, and that is bpchar vs. varchar comparison.
 | 
						||
Currently, we have
 | 
						||
 | 
						||
regression=# select 'a'::bpchar = 'a '::bpchar;
 | 
						||
 ?column?
 | 
						||
----------
 | 
						||
 t
 | 
						||
(1 row)
 | 
						||
 | 
						||
This is correct since trailing blanks are insignificant in bpchar land,
 | 
						||
so the two values should be considered equal.  If we try
 | 
						||
 | 
						||
regression=# select 'a'::bpchar = 'a '::varchar;
 | 
						||
ERROR:  Unable to identify an operator '=' for types 'bpchar' and 'varchar'
 | 
						||
        You will have to retype this query using an explicit cast
 | 
						||
 | 
						||
which is pretty bogus but at least it saves the system from making some
 | 
						||
random choice about whether bpchar or varchar comparison rules apply.
 | 
						||
On the other hand,
 | 
						||
 | 
						||
regression=# select 'a'::bpchar = 'a '::text;
 | 
						||
 ?column?
 | 
						||
----------
 | 
						||
 f
 | 
						||
(1 row)
 | 
						||
 | 
						||
Here the bpchar value has been promoted to text and then text comparison
 | 
						||
(where trailing blanks *are* significant) is applied.  I'm not sure that
 | 
						||
we can really justify doing this in this case when we reject the bpchar
 | 
						||
vs varchar case, but maybe someone wants to argue that that's correct.
 | 
						||
 | 
						||
The natural setup in my type-promotion scheme would be that both bpchar
 | 
						||
and varchar link to 'text' as their promoted type.  If we do nothing
 | 
						||
special then text-style comparison would be used in a bpchar vs varchar
 | 
						||
comparison, which is arguably wrong.
 | 
						||
 | 
						||
One way to deal with this without introducing kluges into the type
 | 
						||
resolver is to provide a full set of bpchar vs text and text vs bpchar
 | 
						||
operators, and make sure that the promotion metric is such that these
 | 
						||
will be used in place of text vs text operators if they apply (which
 | 
						||
should hold, I think, for any reasonable metric).  This is probably
 | 
						||
the only way to get the "right" behavior in any case --- I think that
 | 
						||
the "right" behavior for such comparisons is to strip trailing blanks
 | 
						||
from the bpchar side but not the text/varchar side.  (I haven't checked
 | 
						||
to see if SQL92 agrees, though.)
 | 
						||
 | 
						||
Another issue is how to fit resolution of "unknown" literals into this
 | 
						||
scheme.  We could probably continue to handle them more or less as we
 | 
						||
do now, but they might complicate the promotion metric.
 | 
						||
 | 
						||
I am not clear yet on whether we'd still need the concept of "type
 | 
						||
categories" as they presently exist in the resolver.  It's possible
 | 
						||
that we wouldn't, which would be a nice simplification.  (If we do
 | 
						||
still need them, we should have a column in pg_type that defines the
 | 
						||
category of a type, instead of hard-wiring category assignments.)
 | 
						||
 | 
						||
			regards, tom lane
 | 
						||
 | 
						||
From e99re41@DoCS.UU.SE Mon May 15 07:39:03 2000
 | 
						||
Received: from meryl.it.uu.se (root@meryl.it.uu.se [130.238.12.42])
 | 
						||
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id HAA10251
 | 
						||
	for <pgman@candle.pha.pa.us>; Mon, 15 May 2000 07:39:01 -0400 (EDT)
 | 
						||
Received: from Zebra.DoCS.UU.SE (e99re41@Zebra.DoCS.UU.SE [130.238.9.158])
 | 
						||
	by meryl.it.uu.se (8.8.5/8.8.5) with ESMTP id NAA10849;
 | 
						||
	Mon, 15 May 2000 13:39:45 +0200 (MET DST)
 | 
						||
Received: from localhost (e99re41@localhost) by Zebra.DoCS.UU.SE (8.6.12/8.6.12) with ESMTP id NAA26523; Mon, 15 May 2000 13:39:44 +0200
 | 
						||
X-Authentication-Warning: Zebra.DoCS.UU.SE: e99re41 owned process doing -bs
 | 
						||
Date: Mon, 15 May 2000 13:39:44 +0200 (MET DST)
 | 
						||
From: Peter Eisentraut <e99re41@DoCS.UU.SE>
 | 
						||
Reply-To: Peter Eisentraut <peter_e@gmx.net>
 | 
						||
To: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						||
cc: Bruce Momjian <pgman@candle.pha.pa.us>,
 | 
						||
        PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						||
Subject: Re: [HACKERS] type conversion discussion 
 | 
						||
In-Reply-To: <20911.958339770@sss.pgh.pa.us>
 | 
						||
Message-ID: <Pine.GSO.4.02A.10005151309020.26399-100000@Zebra.DoCS.UU.SE>
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: TEXT/PLAIN; charset=iso-8859-1
 | 
						||
Content-Transfer-Encoding: 8bit
 | 
						||
X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by candle.pha.pa.us id HAA10251
 | 
						||
Status: OR
 | 
						||
 | 
						||
On Sun, 14 May 2000, Tom Lane wrote:
 | 
						||
 | 
						||
> 1. Poor choice of type to attribute to numeric literals.  (A possible
 | 
						||
>    solution is sketched in my earlier message, but do we need similar
 | 
						||
>    mechanisms for other type categories?)
 | 
						||
 | 
						||
I think your plan looks good for the numerical land. (I'll ponder the oid
 | 
						||
issues in a second.) For other type categories, perhaps not. Should a line
 | 
						||
be promoted to a polygon so you can check if it contains a point? Or a
 | 
						||
polygon to a box? Higher dimensions? :-)
 | 
						||
 | 
						||
 | 
						||
> 2. Tensions between treating string literals as "unknown" type and
 | 
						||
>    as "text" type, per this thread so far.
 | 
						||
 | 
						||
Yes, while we're at it, let's look at this in detail. I claim that
 | 
						||
something of the form 'xxx' should always be text (or char or whatever),
 | 
						||
period. Let's consider the cases were this could potentially clash with
 | 
						||
the current behaviour:
 | 
						||
 | 
						||
a) The target type is unambiguously clear, e.g., UPDATE ... SET. Then you
 | 
						||
cast text to the target type. The effect is identical.
 | 
						||
 | 
						||
b) The target type is completely unspecified, e.g. CREATE TABLE AS SELECT
 | 
						||
'xxx'; This will currently create an "unknown" column. It should arguably
 | 
						||
create a "text" column.
 | 
						||
 | 
						||
Function argument resolution:
 | 
						||
 | 
						||
c) There is only one function and it has a "text" argument. No-brainer.
 | 
						||
 | 
						||
d) There is only one function and it has an argument other than text. Try
 | 
						||
to cast text to that type. (This is what's done in general, isn't it?)
 | 
						||
 | 
						||
e) The function is overloaded for many types, amongst which is text. Then
 | 
						||
call the text version. I believe this would currently fail, which I'd
 | 
						||
consider a deficiency.
 | 
						||
 | 
						||
f) The function is overloaded for many types, none of which is text. In
 | 
						||
that case you have to cast anyway, so you don't lose anything.
 | 
						||
 | 
						||
On thing to also keep in mind regarding required casting for (b) and (f)
 | 
						||
is that SQL never allowed literals of "fancy" types (e.g., DATE) to have
 | 
						||
undecorated 'yyyy-mm-dd' constants, you always have to say DATE
 | 
						||
'yyyy-mm-dd'. What Postgres allows is a convencience where DATE would be
 | 
						||
obvious or implied. In the end it's a win-win situation: you tell the
 | 
						||
system what you want, and your code is clearer.
 | 
						||
 | 
						||
 
 | 
						||
> 3. IS_BINARY_COMPATIBLE seems like a bogus concept.
 | 
						||
 | 
						||
At least it's bogus when used for types which are not actually binary
 | 
						||
compatible, e.g. int4 and oid. The result of the current implementation is
 | 
						||
that you can perfectly happily insert and retrieve negative numbers from
 | 
						||
oid fields.
 | 
						||
 | 
						||
I'm not so sure about the value of this particular equivalency anyway.
 | 
						||
AFAICS the only functions that make sense for oids are comparisons (incl.
 | 
						||
min, max), adding integers to them, subtracting one oid from another.
 | 
						||
Silent mangling with int4 means that you can multiply them, square them,
 | 
						||
add floating point numbers to them (doesn't really work in practice
 | 
						||
though), all things that have no business with oids.
 | 
						||
 | 
						||
I'd say define the operators that are useful for oids explicitly for oids
 | 
						||
and require casts for all others, so the users know what they're doing.
 | 
						||
The fact that an oid is also a number should be an implementation detail.
 | 
						||
 | 
						||
In my mind oids are like pointers in C. Indiscriminate mangling of
 | 
						||
pointers and integers in C has long been dismissed as questionable coding.
 | 
						||
 | 
						||
 | 
						||
Of course I'd be very willing to consider counterexamples to these
 | 
						||
theories ...
 | 
						||
 | 
						||
-- 
 | 
						||
Peter Eisentraut                  Sernanders v<>g 10:115
 | 
						||
peter_e@gmx.net                   75262 Uppsala
 | 
						||
http://yi.org/peter-e/            Sweden
 | 
						||
 | 
						||
 | 
						||
From tgl@sss.pgh.pa.us Tue Jun 13 04:58:20 2000
 | 
						||
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
 | 
						||
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA24281
 | 
						||
	for <pgman@candle.pha.pa.us>; Tue, 13 Jun 2000 03:58:18 -0400 (EDT)
 | 
						||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						||
	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA02571;
 | 
						||
	Tue, 13 Jun 2000 03:58:43 -0400 (EDT)
 | 
						||
To: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						||
cc: pgsql-hackers@postgresql.org
 | 
						||
Subject: Re: [HACKERS] Proposal for fixing numeric type-resolution issues 
 | 
						||
In-reply-to: <200006130741.DAA23502@candle.pha.pa.us> 
 | 
						||
References: <200006130741.DAA23502@candle.pha.pa.us>
 | 
						||
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						||
	message dated "Tue, 13 Jun 2000 03:41:56 -0400"
 | 
						||
Date: Tue, 13 Jun 2000 03:58:43 -0400
 | 
						||
Message-ID: <2568.960883123@sss.pgh.pa.us>
 | 
						||
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian <pgman@candle.pha.pa.us> writes:
 | 
						||
> Again, anything to add to the TODO here?
 | 
						||
 | 
						||
IIRC, there was some unhappiness with the proposal you quote, so I'm
 | 
						||
not sure we've quite agreed what to do... but clearly something must
 | 
						||
be done.
 | 
						||
 | 
						||
			regards, tom lane
 | 
						||
 | 
						||
 | 
						||
>> We've got a collection of problems that are related to the parser's
 | 
						||
>> inability to make good type-resolution choices for numeric constants.
 | 
						||
>> In some cases you get a hard error; for example "NumericVar + 4.4"
 | 
						||
>> yields
 | 
						||
>> ERROR:  Unable to identify an operator '+' for types 'numeric' and 'float8'
 | 
						||
>> You will have to retype this query using an explicit cast
 | 
						||
>> because "4.4" is initially typed as float8 and the system can't figure
 | 
						||
>> out whether to use numeric or float8 addition.  A more subtle problem
 | 
						||
>> is that a query like "... WHERE Int2Var < 42" is unable to make use of
 | 
						||
>> an index on the int2 column: 42 is resolved as int4, so the operator
 | 
						||
>> is int24lt, which works but is not in the opclass of an int2 index.
 | 
						||
>> 
 | 
						||
>> Here is a proposal for fixing these problems.  I think we could get this
 | 
						||
>> done for 7.1 if people like it.
 | 
						||
>> 
 | 
						||
>> The basic problem is that there's not enough smarts in the type resolver
 | 
						||
>> about the interrelationships of the numeric datatypes.  All it has is
 | 
						||
>> a concept of a most-preferred type within the category of numeric types.
 | 
						||
>> (We are abusing the most-preferred-type mechanism, BTW, because both
 | 
						||
>> FLOAT8 and NUMERIC claim to be the most-preferred type in the numeric
 | 
						||
>> category!  This is in fact why the resolver can't make a choice for
 | 
						||
>> "numeric+float8".)  We need more intelligence than that.
 | 
						||
>> 
 | 
						||
>> I propose that we set up a strictly-ordered hierarchy of numeric
 | 
						||
>> datatypes, running from least preferred to most preferred:
 | 
						||
>> int2, int4, int8, numeric, float4, float8.
 | 
						||
>> Rather than simply considering coercions to the most-preferred type,
 | 
						||
>> the type resolver should use the following rules:
 | 
						||
>> 
 | 
						||
>> 1. No value will be down-converted (eg int4 to int2) except by an
 | 
						||
>> explicit conversion.
 | 
						||
>> 
 | 
						||
>> 2. If there is not an exact matching operator, numeric values will be
 | 
						||
>> up-converted to the highest numeric datatype present among the operator
 | 
						||
>> or function's arguments.  For example, given "int2 + int8" we'd up-
 | 
						||
>> convert the int2 to int8 and apply int8 addition.
 | 
						||
>> 
 | 
						||
>> The final piece of the puzzle is that the type initially assigned to
 | 
						||
>> an undecorated numeric constant should be NUMERIC if it contains a
 | 
						||
>> decimal point or exponent, and otherwise the smallest of int2, int4,
 | 
						||
>> int8, NUMERIC that will represent it.  This is a considerable change
 | 
						||
>> from the current lexer behavior, where you get either int4 or float8.
 | 
						||
>> 
 | 
						||
>> For example, given "NumericVar + 4.4", the constant 4.4 will initially
 | 
						||
>> be assigned type NUMERIC, we will resolve the operator as numeric plus,
 | 
						||
>> and everything's fine.  Given "Float8Var + 4.4", the constant is still
 | 
						||
>> initially numeric, but will be up-converted to float8 so that float8
 | 
						||
>> addition can be used.  The end result is the same as in traditional
 | 
						||
>> Postgres: you get float8 addition.  Given "Int2Var < 42", the constant
 | 
						||
>> is initially typed as int2, since it fits, and we end up selecting
 | 
						||
>> int2lt, thereby allowing use of an int2 index.  (On the other hand,
 | 
						||
>> given "Int2Var < 100000", we'd end up using int4lt, which is correct
 | 
						||
>> to avoid overflow.)
 | 
						||
>> 
 | 
						||
>> A couple of crucial subtleties here:
 | 
						||
>> 
 | 
						||
>> 1. We are assuming that the parser or optimizer will constant-fold
 | 
						||
>> any conversion functions that are introduced.  Thus, in the
 | 
						||
>> "Float8Var + 4.4" case, the 4.4 is represented as a float8 4.4 by the
 | 
						||
>> time execution begins, so there's no performance loss.
 | 
						||
>> 
 | 
						||
>> 2. We cannot lose precision by initially representing a constant as
 | 
						||
>> numeric and later converting it to float.  Nor can we exceed NUMERIC's
 | 
						||
>> range (the default 1000-digit limit is more than the range of IEEE
 | 
						||
>> float8 data).  It would not work as well to start out by representing
 | 
						||
>> a constant as float and then converting it to numeric.
 | 
						||
>> 
 | 
						||
>> Presently, the pg_proc and pg_operator tables contain a pretty fair
 | 
						||
>> collection of cross-datatype numeric operators, such as int24lt,
 | 
						||
>> float48pl, etc.  We could perhaps leave these in, but I believe that
 | 
						||
>> it is better to remove them.  For example, if int42lt is left in place,
 | 
						||
>> then it would capture cases like "Int4Var < 42", whereas we need that
 | 
						||
>> to be translated to int4lt so that an int4 index can be used.  Removing
 | 
						||
>> these operators will eliminate some code bloat and system-catalog bloat
 | 
						||
>> to boot.
 | 
						||
>> 
 | 
						||
>> As far as I can tell, this proposal is almost compatible with the rules
 | 
						||
>> given in SQL92: in particular, SQL92 specifies that an operator having
 | 
						||
>> both "approximate numeric" (float) and "exact numeric" (int or numeric)
 | 
						||
>> inputs should deliver an approximate-numeric result.  I propose
 | 
						||
>> deviating from SQL92 in a single respect: SQL92 specifies that a
 | 
						||
>> constant containing an exponent (eg 1.2E34) is approximate numeric,
 | 
						||
>> which implies that the result of an operator using it is approximate
 | 
						||
>> even if the other operand is exact.  I believe it's better to treat
 | 
						||
>> such a constant as exact (ie, type NUMERIC) and only convert it to
 | 
						||
>> float if the other operand is float.  Without doing that, an assignment
 | 
						||
>> like
 | 
						||
>> UPDATE tab SET NumericVar = 1.234567890123456789012345E34;
 | 
						||
>> will not work as desired because the constant will be prematurely
 | 
						||
>> coerced to float, causing precision loss.
 | 
						||
>> 
 | 
						||
>> Comments?
 | 
						||
>> 
 | 
						||
>> regards, tom lane
 | 
						||
>> 
 | 
						||
 | 
						||
 | 
						||
> -- 
 | 
						||
>   Bruce Momjian                        |  http://www.op.net/~candle
 | 
						||
>   pgman@candle.pha.pa.us               |  (610) 853-3000
 | 
						||
>   +  If your life is a hard drive,     |  830 Blythe Avenue
 | 
						||
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 | 
						||
 | 
						||
From tgl@sss.pgh.pa.us Mon Jun 12 14:09:45 2000
 | 
						||
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
 | 
						||
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA01993
 | 
						||
	for <pgman@candle.pha.pa.us>; Mon, 12 Jun 2000 13:09:43 -0400 (EDT)
 | 
						||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						||
	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id NAA01515;
 | 
						||
	Mon, 12 Jun 2000 13:10:01 -0400 (EDT)
 | 
						||
To: Peter Eisentraut <peter_e@gmx.net>
 | 
						||
cc: Bruce Momjian <pgman@candle.pha.pa.us>,
 | 
						||
        "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>,
 | 
						||
        PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						||
Subject: Re: [HACKERS] Adding time to DATE type 
 | 
						||
In-reply-to: <Pine.LNX.4.21.0006110322150.9195-100000@localhost.localdomain> 
 | 
						||
References: <Pine.LNX.4.21.0006110322150.9195-100000@localhost.localdomain>
 | 
						||
Comments: In-reply-to Peter Eisentraut <peter_e@gmx.net>
 | 
						||
	message dated "Sun, 11 Jun 2000 13:41:24 +0200"
 | 
						||
Date: Mon, 12 Jun 2000 13:10:00 -0400
 | 
						||
Message-ID: <1512.960829800@sss.pgh.pa.us>
 | 
						||
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						||
Status: ORr
 | 
						||
 | 
						||
Peter Eisentraut <peter_e@gmx.net> writes:
 | 
						||
> Bruce Momjian writes:
 | 
						||
>> Can someone give me a TODO summary for this issue?
 | 
						||
 | 
						||
> * make 'text' constants default to text type (not unknown)
 | 
						||
 | 
						||
> (I think not everyone's completely convinced on this issue, but I don't
 | 
						||
> recall anyone being firmly opposed to it.)
 | 
						||
 | 
						||
It would be a mistake to eliminate the distinction between unknown and
 | 
						||
text.  See for example my just-posted response to John Cochran on
 | 
						||
pgsql-general about why 'BOULEVARD'::text behaves differently from
 | 
						||
'BOULEVARD'::char.  If string literals are immediately assigned type
 | 
						||
text then we will have serious problems with char(n) fields.
 | 
						||
 | 
						||
I think it's fine to assign string literals a type of 'unknown'
 | 
						||
initially.  What we need to do is add a phase of type resolution that
 | 
						||
considers treating them as text, but only after the existing logic fails
 | 
						||
to deduce a type.
 | 
						||
 | 
						||
(BTW it might be better to treat string literals as defaulting to char(n)
 | 
						||
instead of text, allowing the normal promotion rules to replace char(n)
 | 
						||
with text if necessary.  Not sure if that would make things more or less
 | 
						||
confusing for operations that intermix fixed- and variable-width char
 | 
						||
types.)
 | 
						||
 | 
						||
			regards, tom lane
 | 
						||
 | 
						||
From pgsql-hackers-owner+M1936@postgresql.org Sun Dec 10 13:17:54 2000
 | 
						||
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						||
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA20676
 | 
						||
	for <pgman@candle.pha.pa.us>; Sun, 10 Dec 2000 13:17:54 -0500 (EST)
 | 
						||
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						||
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eBAIGvZ40566;
 | 
						||
	Sun, 10 Dec 2000 13:16:57 -0500 (EST)
 | 
						||
	(envelope-from pgsql-hackers-owner+M1936@postgresql.org)
 | 
						||
Received: from sss.pgh.pa.us (sss.pgh.pa.us [209.114.132.154])
 | 
						||
	by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eBAI8HZ39820
 | 
						||
	for <pgsql-hackers@postgreSQL.org>; Sun, 10 Dec 2000 13:08:17 -0500 (EST)
 | 
						||
	(envelope-from tgl@sss.pgh.pa.us)
 | 
						||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						||
	by sss.pgh.pa.us (8.11.1/8.11.1) with ESMTP id eBAI82o28682;
 | 
						||
	Sun, 10 Dec 2000 13:08:02 -0500 (EST)
 | 
						||
To: Thomas Lockhart <lockhart@alumni.caltech.edu>
 | 
						||
cc: pgsql-hackers@postgresql.org
 | 
						||
Subject: [HACKERS] Unknown-type resolution rules, redux
 | 
						||
Date: Sun, 10 Dec 2000 13:08:02 -0500
 | 
						||
Message-ID: <28679.976471682@sss.pgh.pa.us>
 | 
						||
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						||
Precedence: bulk
 | 
						||
Sender: pgsql-hackers-owner@postgresql.org
 | 
						||
Status: OR
 | 
						||
 | 
						||
parse_coerce.c contains the following conversation --- I believe the
 | 
						||
first XXX comment is from me and the second from you:
 | 
						||
 | 
						||
    /*
 | 
						||
     * Still too many candidates? Try assigning types for the unknown
 | 
						||
     * columns.
 | 
						||
     *
 | 
						||
     * We do this by examining each unknown argument position to see if all
 | 
						||
     * the candidates agree on the type category of that slot.  If so, and
 | 
						||
     * if some candidates accept the preferred type in that category,
 | 
						||
     * eliminate the candidates with other input types.  If we are down to
 | 
						||
     * one candidate at the end, we win.
 | 
						||
     *
 | 
						||
     * XXX It's kinda bogus to do this left-to-right, isn't it?  If we
 | 
						||
     * eliminate some candidates because they are non-preferred at the
 | 
						||
     * first slot, we won't notice that they didn't have the same type
 | 
						||
     * category for a later slot.
 | 
						||
     * XXX Hmm. How else would you do this? These candidates are here because
 | 
						||
     * they all have the same number of matches on arguments with explicit
 | 
						||
     * types, so from here on left-to-right resolution is as good as any.
 | 
						||
     * Need a counterexample to see otherwise...
 | 
						||
     */
 | 
						||
 | 
						||
The comment is out of date anyway because it fails to mention the new
 | 
						||
rule about preferring STRING category.  But to answer your request for
 | 
						||
a counterexample: consider
 | 
						||
 | 
						||
	SELECT foo('bar', 'baz')
 | 
						||
 | 
						||
First, suppose the available candidates are
 | 
						||
 | 
						||
	foo(float8, int4)
 | 
						||
	foo(float8, point)
 | 
						||
 | 
						||
In this case, we examine the first argument position, see that all the
 | 
						||
candidates agree on NUMERIC category, so we consider resolving the first
 | 
						||
unknown input to float8.  That eliminates neither candidate so we move
 | 
						||
on to the second argument position.  Here there is a conflict of
 | 
						||
categories so we can't eliminate anything, and we decide the call is
 | 
						||
ambiguous.  That's correct (or at least Operating As Designed ;-)).
 | 
						||
 | 
						||
But now suppose we have
 | 
						||
 | 
						||
	foo(float8, int4)
 | 
						||
	foo(float4, point)
 | 
						||
 | 
						||
Here, at the first position we will still see that all candidates agree
 | 
						||
on NUMERIC category, and then we will eliminate candidate 2 because it
 | 
						||
isn't the preferred type in that category.  Now when we come to the
 | 
						||
second argument position, there's only one candidate left so there's
 | 
						||
no category conflict.  Result: this call is considered non-ambiguous.
 | 
						||
 | 
						||
This means there is a left-to-right bias in the algorithm.  For example,
 | 
						||
the exact same call *would* be considered ambiguous if the candidates'
 | 
						||
argument orders were reversed:
 | 
						||
 | 
						||
	foo(int4, float8)
 | 
						||
	foo(point, float4)
 | 
						||
 | 
						||
I do not like that.  You could maybe argue that earlier arguments are
 | 
						||
more important than later ones for functions, but it's harder to make
 | 
						||
that case for binary operators --- and in any case this behavior is
 | 
						||
extremely difficult to explain in prose.
 | 
						||
 | 
						||
To fix this, I think we need to split the loop into two passes.
 | 
						||
The first pass does *not* remove any candidates.  What it does is to
 | 
						||
look separately at each UNKNOWN-argument position and attempt to deduce
 | 
						||
a probable category for it, using the following rules:
 | 
						||
 | 
						||
* If any candidate has an input type of STRING category, use STRING
 | 
						||
category; else if all candidates agree on the category, use that
 | 
						||
category; else fail because no resolution can be made.
 | 
						||
 | 
						||
* The first pass must also remember whether any candidates are of a
 | 
						||
preferred type within the selected category.
 | 
						||
 | 
						||
The probable categories and exists-preferred-type booleans are saved in
 | 
						||
local arrays.  (Note this has to be done this way because
 | 
						||
IsPreferredType currently allows more than one type to be considered
 | 
						||
preferred in a category ... so the first pass cannot try to determine a
 | 
						||
unique type, only a category.)
 | 
						||
 | 
						||
If we find a category for every UNKNOWN arg, then we enter a second loop
 | 
						||
in which we discard candidates.  In this pass we discard a candidate if
 | 
						||
(a) it is of the wrong category, or (b) it is of the right category but
 | 
						||
is not of preferred type in that category, *and* we found candidate(s)
 | 
						||
of preferred type at this slot.
 | 
						||
 | 
						||
If we end with exactly one candidate then we win.
 | 
						||
 | 
						||
It is clear in this algorithm that there is no order dependency: the
 | 
						||
conditions for keeping or discarding a candidate are fixed before we
 | 
						||
start the second pass, and do not vary depending on which other
 | 
						||
candidates were discarded before it.
 | 
						||
 | 
						||
Comments?
 | 
						||
 | 
						||
			regards, tom lane
 | 
						||
 | 
						||
From pgsql-general-owner+M18949=candle.pha.pa.us=pgman@postgresql.org Sat Dec 29 15:47:47 2001
 | 
						||
Return-path: <pgsql-general-owner+M18949=candle.pha.pa.us=pgman@postgresql.org>
 | 
						||
Received: from rs.postgresql.org (server1.pgsql.org [64.39.15.238] (may be forged))
 | 
						||
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id fBTKlkT05111
 | 
						||
	for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 15:47:46 -0500 (EST)
 | 
						||
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						||
	by rs.postgresql.org (8.11.6/8.11.6) with ESMTP id fBTKhZN74322
 | 
						||
	for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 14:43:35 -0600 (CST)
 | 
						||
	(envelope-from pgsql-general-owner+M18949=candle.pha.pa.us=pgman@postgresql.org)
 | 
						||
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
 | 
						||
	by postgresql.org (8.11.3/8.11.4) with ESMTP id fBTKaem38452
 | 
						||
	for <pgsql-general@postgresql.org>; Sat, 29 Dec 2001 15:36:40 -0500 (EST)
 | 
						||
	(envelope-from pgman@candle.pha.pa.us)
 | 
						||
Received: (from pgman@localhost)
 | 
						||
	by candle.pha.pa.us (8.11.6/8.10.1) id fBTKaTg04256;
 | 
						||
	Sat, 29 Dec 2001 15:36:29 -0500 (EST)
 | 
						||
From: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						||
Message-ID: <200112292036.fBTKaTg04256@candle.pha.pa.us>
 | 
						||
Subject: Re: [GENERAL] Casting Varchar to Numeric
 | 
						||
In-Reply-To: <20011206150158.O28880-100000@megazone23.bigpanda.com>
 | 
						||
To: Stephan Szabo <sszabo@megazone23.bigpanda.com>
 | 
						||
Date: Sat, 29 Dec 2001 15:36:29 -0500 (EST)
 | 
						||
cc: Andy Marden <amarden@usa.net>, pgsql-general@postgresql.org
 | 
						||
X-Mailer: ELM [version 2.4ME+ PL96 (25)]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Precedence: bulk
 | 
						||
Sender: pgsql-general-owner@postgresql.org
 | 
						||
Status: OR
 | 
						||
 | 
						||
> On Mon, 3 Dec 2001, Andy Marden wrote:
 | 
						||
> 
 | 
						||
> > Martijn,
 | 
						||
> >
 | 
						||
> > It does work (believe it or not). I've now tried the method you mention
 | 
						||
> > below - that also works and is much nicer. I can't believe that PostgreSQL
 | 
						||
> > can't work this out. Surely implementing an algorithm that understands that
 | 
						||
> > if you can go from a ->b and b->c then you can certainly go from a->c. If
 | 
						||
> 
 | 
						||
> It's more complicated than that (and postgres does some of this but not
 | 
						||
> all), for example the cast text->float8->numeric potentially loses
 | 
						||
> precision and should probably not be an automatic cast for that reason.
 | 
						||
> 
 | 
						||
> > this is viewed as too complex a task for the internals - at least a diagram
 | 
						||
> > or some way of understanding how you should go from a->c would be immensely
 | 
						||
> > helpful wouldn't it! Daunting for anyone picking up the database and trying
 | 
						||
> > to do something simple(!)
 | 
						||
> 
 | 
						||
> There may be a need for documentation on this.  Would you like to write
 | 
						||
> some ;)
 | 
						||
 | 
						||
OK, I ran some tests:
 | 
						||
	
 | 
						||
	test=> create table test (x text);
 | 
						||
	CREATE
 | 
						||
	test=> insert into test values ('323');
 | 
						||
	INSERT 5122745 1
 | 
						||
	test=> select cast (x as numeric) from test;
 | 
						||
	ERROR:  Cannot cast type 'text' to 'numeric'
 | 
						||
 | 
						||
I can see problems with automatically casting numeric to text because
 | 
						||
you have to guess the desired format, but going from text to numeric
 | 
						||
seems quite easy to do.  Is there a reason we don't do it?
 | 
						||
 | 
						||
I can cast to integer and float8 fine:
 | 
						||
	
 | 
						||
	test=> select cast ( x as integer) from test;
 | 
						||
	 ?column? 
 | 
						||
	----------
 | 
						||
	      323
 | 
						||
	(1 row)
 | 
						||
 | 
						||
	test=> select cast ( x as float8) from test;
 | 
						||
	 ?column? 
 | 
						||
	----------
 | 
						||
	      323
 | 
						||
	(1 row)
 | 
						||
 | 
						||
-- 
 | 
						||
  Bruce Momjian                        |  http://candle.pha.pa.us
 | 
						||
  pgman@candle.pha.pa.us               |  (610) 853-3000
 | 
						||
  +  If your life is a hard drive,     |  830 Blythe Avenue
 | 
						||
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 | 
						||
 | 
						||
---------------------------(end of broadcast)---------------------------
 | 
						||
TIP 2: you can get off all lists at once with the unregister command
 | 
						||
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
 | 
						||
 | 
						||
From pgsql-general-owner+M18951=candle.pha.pa.us=pgman@postgresql.org Sat Dec 29 19:10:38 2001
 | 
						||
Return-path: <pgsql-general-owner+M18951=candle.pha.pa.us=pgman@postgresql.org>
 | 
						||
Received: from west.navpoint.com (west.navpoint.com [207.106.42.13])
 | 
						||
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id fBU0AbT23972
 | 
						||
	for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 19:10:37 -0500 (EST)
 | 
						||
Received: from rs.postgresql.org (server1.pgsql.org [64.39.15.238] (may be forged))
 | 
						||
	by west.navpoint.com (8.11.6/8.10.1) with ESMTP id fBTNVj008959
 | 
						||
	for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 18:31:45 -0500 (EST)
 | 
						||
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						||
	by rs.postgresql.org (8.11.6/8.11.6) with ESMTP id fBTNQrN78655
 | 
						||
	for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 17:26:53 -0600 (CST)
 | 
						||
	(envelope-from pgsql-general-owner+M18951=candle.pha.pa.us=pgman@postgresql.org)
 | 
						||
Received: from sss.pgh.pa.us ([192.204.191.242])
 | 
						||
	by postgresql.org (8.11.3/8.11.4) with ESMTP id fBTN8Fm47978
 | 
						||
	for <pgsql-general@postgresql.org>; Sat, 29 Dec 2001 18:08:15 -0500 (EST)
 | 
						||
	(envelope-from tgl@sss.pgh.pa.us)
 | 
						||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						||
	by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id fBTN7vg20245;
 | 
						||
	Sat, 29 Dec 2001 18:07:57 -0500 (EST)
 | 
						||
To: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						||
cc: Stephan Szabo <sszabo@megazone23.bigpanda.com>,
 | 
						||
   Andy Marden <amarden@usa.net>, pgsql-general@postgresql.org
 | 
						||
Subject: Re: [GENERAL] Casting Varchar to Numeric 
 | 
						||
In-Reply-To: <200112292036.fBTKaTg04256@candle.pha.pa.us> 
 | 
						||
References: <200112292036.fBTKaTg04256@candle.pha.pa.us>
 | 
						||
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						||
	message dated "Sat, 29 Dec 2001 15:36:29 -0500"
 | 
						||
Date: Sat, 29 Dec 2001 18:07:57 -0500
 | 
						||
Message-ID: <20242.1009667277@sss.pgh.pa.us>
 | 
						||
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						||
Precedence: bulk
 | 
						||
Sender: pgsql-general-owner@postgresql.org
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian <pgman@candle.pha.pa.us> writes:
 | 
						||
> I can see problems with automatically casting numeric to text because
 | 
						||
> you have to guess the desired format, but going from text to numeric
 | 
						||
> seems quite easy to do.  Is there a reason we don't do it?
 | 
						||
 | 
						||
I do not think it's a good idea to have implicit casts between text and
 | 
						||
everything under the sun, because that essentially destroys the type
 | 
						||
checking system.  What we need (see previous discussion) is a flag in
 | 
						||
pg_proc that says whether a type conversion function may be invoked
 | 
						||
implicitly or not.  I've got no problem with offering text(numeric) and
 | 
						||
numeric(text) functions that are invoked by explicit function calls or
 | 
						||
casts --- I just don't want the system trying to use them to make
 | 
						||
sense of a bogus query.
 | 
						||
 | 
						||
> I can cast to integer and float8 fine:
 | 
						||
 | 
						||
I don't believe that those should be available as implicit casts either.
 | 
						||
They are, at the moment:
 | 
						||
 | 
						||
regression=# select 33 || 44.0;
 | 
						||
 ?column?
 | 
						||
----------
 | 
						||
 3344
 | 
						||
(1 row)
 | 
						||
 | 
						||
Ugh.
 | 
						||
 | 
						||
			regards, tom lane
 | 
						||
 | 
						||
---------------------------(end of broadcast)---------------------------
 | 
						||
TIP 6: Have you searched our list archives?
 | 
						||
 | 
						||
http://archives.postgresql.org
 | 
						||
 |