Grouping digits in SQL
105 points
1 year ago
| 4 comments
| peter.eisentraut.org
| HN
JoelJacobson
1 year ago
[-]
I'm truly impressed by the tactfulness of first making the previously unintentionally legal but undocumented syntax an error in PostgreSQL 15. This is a textbook case of "Virtuous Intolerance" [1].

By taking this step, the PostgreSQL community set up an invaluable real-world litmus test that lasted an entire year. This strategic intolerance allowed them to gather critical insights into whether any existing code would break and to understand whether such cases were intentional or not.

The decision significantly de-risked the introduction of the new, much-welcomed syntax in SQL and PostgreSQL 16. I think this decision framework serves as a model for other projects on how to introduce changes responsibly without breaking the ecosystem.

Kudos to the PostgreSQL team for such thoughtful engineering and to Peter Eisentraut for leading this exemplary four-year journey!

[1] The Harmful Consequences of the Robustness Principle: https://www.ietf.org/archive/id/draft-iab-protocol-maintenan...

reply
noduerme
1 year ago
[-]
"Move slow and try not to break stuff" would be a nice antidote to the last 20 years. I hope we all go that way.

Speaking of DB changes, I wish someone had even warned me what kind of crazy nearly impossible to debug chaos could erupt from pushdown optimization when I had to migrate a huge codebase to mysql 8.

reply
derekperkins
1 year ago
[-]
Very interested in the details
reply
imrehg
1 year ago
[-]
Here's a more up-to-date link to [1] as it is now published as RFC-9413: https://www.rfc-editor.org/rfc/rfc9413.html#name-virtuous-in...
reply
JoelJacobson
1 year ago
[-]
Thanks for the link!
reply
hans_castorp
1 year ago
[-]
I typically use 1e8 instead 100000000 when writing large (integer) numbers
reply
fbdab103
1 year ago
[-]
In Python that returns a float.

    In [1]: 1e8
    Out[1]: 100000000.0
    In [2]: type(1e8)
    Out[2]: float
reply
marginalia_nu
1 year ago
[-]
Right, but what if you want to express something that does not end in a bunch of zeroes?
reply
NeoTar
1 year ago
[-]
I imagine there are cases where you want a large number not expressible in base-10 exponential notation. The obvious examples are the upper limits of integers - 2_147_483_647 and 4_294_967_295.
reply
thaumasiotes
1 year ago
[-]
Usually you'd want to express those as 0x7fff_ffff or 0xffff_ffff.
reply
oever
1 year ago
[-]
Adding syntax to improve readability of numbers is counterproductive. Improve the software the you use to read and edit the files. The extra syntax makes assumptions on what is easier to read.
reply
fbdab103
1 year ago
[-]

  For example, this was accepted:
   SELECT 123abc;
  This would parse the same as
   SELECT 123 abc;
SQL is just not fair.
reply
zoky
1 year ago
[-]
To be fair, that’s not part of the SQL standard, it was just an unintended quirk in PostgreSQL resulting from a series of historical choices. It actually kind of makes sense how that happened; once the requirement to use AS was dropped, it became necessary to find other ways to delimit parameters in a statement, and anything beginning with a number would be delimited by the first non-numeric character. So if the parser sees a token that starts with a digit, it reads until the first non-numeric digit, strips out any following whitespace, and starts the next token at the next character. Probably nobody ever thought to test for this behavior because it’s not in the SQL standard to specifically reject such input.
reply
rob74
1 year ago
[-]
What's a bit weird is that you can leave out the whitespace between the tokens. I'm not familiar with PostgreSQL, but I'm pretty sure "selectcount(*)fromtable" is not valid (although the parser could conceivably say "Ok, I have seen the token select, whatever comes after it must be the next token" - this would slow down parsing, but is theoretically possible)?
reply
taway1237
1 year ago
[-]
Parsing and tokenisation are usually (at least conceptually) separate steps. Your example almost certainly tokenizes as selectcount, (, *, ), fromtable. It's plausible "select 123abc" would tokenize as select, 123, abc because of a code quirk.
reply
paul-vernon
1 year ago
[-]
select"this statement is valid""Postgres"from"table"
reply
chasil
1 year ago
[-]
This appears to come from Oracle. It is present in the final release of 8i.

  SQL> select version from v$instance;

  VERSION
  -----------------
  8.1.7.4.0

  SQL> select 123abc from dual;

         ABC
  ----------
         123
reply