Grouping digits in SQL
105 points
2 months ago
| 4 comments
| peter.eisentraut.org
| HN
JoelJacobson
2 months ago
[-]
I'm truly impressed by the tactfulness of first making the previously unintentionally legal but undocumented syntax an error in PostgreSQL 15. This is a textbook case of "Virtuous Intolerance" [1].

By taking this step, the PostgreSQL community set up an invaluable real-world litmus test that lasted an entire year. This strategic intolerance allowed them to gather critical insights into whether any existing code would break and to understand whether such cases were intentional or not.

The decision significantly de-risked the introduction of the new, much-welcomed syntax in SQL and PostgreSQL 16. I think this decision framework serves as a model for other projects on how to introduce changes responsibly without breaking the ecosystem.

Kudos to the PostgreSQL team for such thoughtful engineering and to Peter Eisentraut for leading this exemplary four-year journey!

[1] The Harmful Consequences of the Robustness Principle: https://www.ietf.org/archive/id/draft-iab-protocol-maintenan...

reply
noduerme
2 months ago
[-]
"Move slow and try not to break stuff" would be a nice antidote to the last 20 years. I hope we all go that way.

Speaking of DB changes, I wish someone had even warned me what kind of crazy nearly impossible to debug chaos could erupt from pushdown optimization when I had to migrate a huge codebase to mysql 8.

reply
derekperkins
2 months ago
[-]
Very interested in the details
reply
imrehg
2 months ago
[-]
Here's a more up-to-date link to [1] as it is now published as RFC-9413: https://www.rfc-editor.org/rfc/rfc9413.html#name-virtuous-in...
reply
JoelJacobson
2 months ago
[-]
Thanks for the link!
reply
hans_castorp
2 months ago
[-]
I typically use 1e8 instead 100000000 when writing large (integer) numbers
reply
fbdab103
2 months ago
[-]
In Python that returns a float.

    In [1]: 1e8
    Out[1]: 100000000.0
    In [2]: type(1e8)
    Out[2]: float
reply
marginalia_nu
2 months ago
[-]
Right, but what if you want to express something that does not end in a bunch of zeroes?
reply
NeoTar
2 months ago
[-]
I imagine there are cases where you want a large number not expressible in base-10 exponential notation. The obvious examples are the upper limits of integers - 2_147_483_647 and 4_294_967_295.
reply
thaumasiotes
2 months ago
[-]
Usually you'd want to express those as 0x7fff_ffff or 0xffff_ffff.
reply
oever
2 months ago
[-]
Adding syntax to improve readability of numbers is counterproductive. Improve the software the you use to read and edit the files. The extra syntax makes assumptions on what is easier to read.
reply
fbdab103
2 months ago
[-]

  For example, this was accepted:
   SELECT 123abc;
  This would parse the same as
   SELECT 123 abc;
SQL is just not fair.
reply
zoky
2 months ago
[-]
To be fair, that’s not part of the SQL standard, it was just an unintended quirk in PostgreSQL resulting from a series of historical choices. It actually kind of makes sense how that happened; once the requirement to use AS was dropped, it became necessary to find other ways to delimit parameters in a statement, and anything beginning with a number would be delimited by the first non-numeric character. So if the parser sees a token that starts with a digit, it reads until the first non-numeric digit, strips out any following whitespace, and starts the next token at the next character. Probably nobody ever thought to test for this behavior because it’s not in the SQL standard to specifically reject such input.
reply
rob74
2 months ago
[-]
What's a bit weird is that you can leave out the whitespace between the tokens. I'm not familiar with PostgreSQL, but I'm pretty sure "selectcount(*)fromtable" is not valid (although the parser could conceivably say "Ok, I have seen the token select, whatever comes after it must be the next token" - this would slow down parsing, but is theoretically possible)?
reply
taway1237
2 months ago
[-]
Parsing and tokenisation are usually (at least conceptually) separate steps. Your example almost certainly tokenizes as selectcount, (, *, ), fromtable. It's plausible "select 123abc" would tokenize as select, 123, abc because of a code quirk.
reply
paul-vernon
2 months ago
[-]
select"this statement is valid""Postgres"from"table"
reply
chasil
2 months ago
[-]
This appears to come from Oracle. It is present in the final release of 8i.

  SQL> select version from v$instance;

  VERSION
  -----------------
  8.1.7.4.0

  SQL> select 123abc from dual;

         ABC
  ----------
         123
reply