Show HN: 22 GB of Hacker News in SQLite
378 points
11 hours ago
| 30 comments
| hackerbook.dosaygo.com
Community, All the HN belong to you. This is an archive of hacker news that fits in your browser. When I made HN Made of Primes I realized I could probably do this offline sqlite/wasm thing with the whole GBs of archive. The whole dataset. So I tried it, and this is it. Have Hacker News on your device.

Go to this repo (https://github.com/DOSAYGO-STUDIO/HackerBook): you can download it. Big Query -> ETL -> npx serve docs - that's it. 20 years of HN arguments and beauty, can be yours forever. So they'll never die. Ever. It's the unkillable static archive of HN and it's your hands. That's my Year End gift to you all. Thank you for a wonderful year, have happy and wonderful 2026. make something of it.

simonw
8 hours ago
[-]
Don't miss how this works. It's not a server-side application - this code runs entirely in your browser using SQLite compiled to WASM, but rather than fetching a full 22GB database it instead uses a clever hack that retrieves just "shards" of the SQLite database needed for the page you are viewing.

I watched it in the browser network panel and saw it fetch:

  https://hackerbook.dosaygo.com/static-shards/shard_1636.sqlite.gz
  https://hackerbook.dosaygo.com/static-shards/shard_1635.sqlite.gz
  https://hackerbook.dosaygo.com/static-shards/shard_1634.sqlite.gz
As I paginated to previous days.

It's reminiscent of that brilliant SQLite.js VFS trick from a few years ago: https://github.com/phiresky/sql.js-httpvfs - only that one used HTTP range headers, this one uses sharded files instead.

The interactive SQL query interface at https://hackerbook.dosaygo.com/?view=query asks you to select which shards to run the query against, there are 1636 total.
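For anyone who wants to see roughly what that flow looks like, here's a minimal sketch of fetching one gzipped shard and querying it with sql.js in the browser. This is not the site's actual code; the shard URL pattern is just what shows up in the network panel, and the table/query are illustrative.

  // Minimal sketch, not HackerBook's implementation.
  import initSqlJs from "sql.js";

  async function queryShard(shardId: number, sql: string) {
    const res = await fetch(
      `https://hackerbook.dosaygo.com/static-shards/shard_${shardId}.sqlite.gz`
    );
    // Browsers can gunzip the shard natively with DecompressionStream.
    const gunzipped = res.body!.pipeThrough(new DecompressionStream("gzip"));
    const bytes = new Uint8Array(await new Response(gunzipped).arrayBuffer());

    const SQL = await initSqlJs();       // loads the sql.js WASM build (locateFile omitted here)
    const db = new SQL.Database(bytes);  // the whole shard lives in memory
    const results = db.exec(sql);        // e.g. "SELECT title FROM items LIMIT 10"
    db.close();
    return results;
  }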

reply
ncruces
4 hours ago
[-]
A read-only VFS doing this can be really simple, with the right API…

This is my VFS: https://github.com/ncruces/go-sqlite3/blob/main/vfs/readervf...

And using it with range requests: https://pkg.go.dev/github.com/ncruces/go-sqlite3/vfs/readerv...

And having it work with a Zstandard compressed SQLite database, is one library away: https://pkg.go.dev/github.com/SaveTheRbtz/zstd-seekable-form...

reply
nextaccountic
6 hours ago
[-]
Is there anything more production grade built around the same idea of HTTP range requests like that sqlite thing? This has so much potential
reply
Humphrey
5 hours ago
[-]
Yes — PMTiles is exactly that: a production-ready, single-file, static container for vector tiles built around HTTP range requests.

I’ve used it in production to self-host Australia-only maps on S3. We generated a single ~900 MB PMTiles file from OpenStreetMap (Australia only, up to Z14) and uploaded it to S3. Clients then fetch just the required byte ranges for each vector tile via HTTP range requests.

It’s fast, scales well, and bandwidth costs are negligible because clients only download the exact data they need.

https://docs.protomaps.com/pmtiles/
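For reference, reading tiles out of a PMTiles archive with the official JS client is only a few lines; here's a sketch (the archive URL and tile coordinates are placeholders):

  // Each call turns into small HTTP range requests against one static file.
  import { PMTiles } from "pmtiles";

  const archive = new PMTiles("https://example-bucket.s3.amazonaws.com/australia.pmtiles");

  const header = await archive.getHeader();            // root directory, via a small range read
  const tile = await archive.getZxy(14, 15000, 9800);  // one vector tile's bytes, or undefined
  if (tile) {
    console.log(`fetched ${tile.data.byteLength} bytes for a z14 tile`);
  }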

reply
simonw
5 hours ago
[-]
PMTiles is absurdly great software.
reply
Humphrey
5 hours ago
[-]
I know right! I'd never heard of HTTP Range requests until PMTiles - but gee it's an elegant solution.
reply
nextaccountic
2 hours ago
[-]
That's neat, but.. is it just for cartographic data?

I want something like a db with indexes

reply
simonw
6 hours ago
[-]
There was a UK government GitHub repo that did something interesting with this kind of trick against S3 but I checked just now and the repo is a 404. Here are my notes about what it did: https://simonwillison.net/2025/Feb/7/sqlite-s3vfs/

Looks like it's still on PyPI though: https://pypi.org/project/sqlite-s3vfs/

You can see inside it with my PyPI package explorer: https://tools.simonwillison.net/zip-wheel-explorer?package=s...

reply
simonw
5 hours ago
[-]
I recovered it from https://archive.softwareheritage.org/browse/origin/directory... and pushed a fresh copy to GitHub here:

https://github.com/simonw/sqlite-s3vfs

This comment was helpful in figuring out how to get a full Git clone out of the heritage archive: https://news.ycombinator.com/item?id=37516523#37517378

Here's a TIL I wrote up of the process: https://til.simonwillison.net/github/software-archive-recove...

reply
QuantumNomad_
5 hours ago
[-]
I also have a locally cloned copy of that repo from when it was on GitHub. Same latest commit as your copy of it.

From what I see in GitHub in your copy of the repo, it looks like you don’t have the tags.

Do you have the tags locally?

If you don’t have the tags, I can push a copy of the repo to GitHub too and you can get the tags from my copy.

reply
simonw
4 hours ago
[-]
I don't have the tags! It would be awesome if you could push that.
reply
QuantumNomad_
4 hours ago
[-]
reply
simonw
4 hours ago
[-]
Thanks for that, though actually it turns out I had them after all - I needed to run:

  git push --tags origin
reply
QuantumNomad_
4 hours ago
[-]
All the better :)
reply
bspammer
3 hours ago
[-]
Doing all this in an hour is such a good example of how absurdly efficient you can be with LLMs.
reply
AceJohnny2
5 hours ago
[-]
didn't you do something similar for Datasette, Simon?
reply
simonw
5 hours ago
[-]
Nothing smart with HTTP range requests yet - I have https://lite.datasette.io which runs the full Python server app in the browser via WebAssembly and Pyodide but it still works by fetching the entire SQLite file at once.
reply
AceJohnny2
5 hours ago
[-]
reply
billywhizz
58 minutes ago
[-]
i played around with this a while back. you can see a demo here. it also lets you pull new WAL segments in and apply them to the current database. never got much time to go any further with it than this.

https://just.billywhizz.io/sqlite/demo/#https://raw.githubus...

reply
mootothemax
8 minutes ago
[-]
This is pretty much well what is so remarkable about parquet files; not only do you get seekable data, you can fetch only the columns you want too.

I believe that there are also indexing opportunities (not necessarily via e.g. hive partitioning) but frankly - am kinda out of my depth on it.

reply
__turbobrew__
1 hour ago
[-]
gdal vsis3 dynamically fetches chunks of rasters from s3 using range requests. It is the underlying technology for several mapping systems.

There is also a file format to optimize this https://cogeo.org/

reply
omneity
2 hours ago
[-]
I tried to implement something similar to optimize sampling semi-random documents from (very) large datasets on Huggingface, unfortunately their API doesn't support range requests well.
reply
ericd
6 hours ago
[-]
This is somewhat related to a large dataset browsing service a friend and I worked on a while back - we made index files, and the browser ran a lightweight query planner to fetch static chunks which could be served from S3/torrents/whatever. It worked pretty well, and I think there’s a lot of potential for this style of data serving infra.
reply
6510
4 hours ago
[-]
I want to see a bittorrent version :P
reply
nextaccountic
2 hours ago
[-]
Maybe webtorrent-based?
reply
maxloh
39 minutes ago
[-]
I am curious why they don't use a single file and HTTP Range Requests instead. PMTiles (a distribution of OpenStreetMap) uses that.
reply
meander_water
1 hour ago
[-]
I love this so much, on my phone this is much faster than actual HN (I know it's only a read-only version).

Where did you get the 22GB figure from? On the site it says:

> 46,399,072 items, 1,637 shards, 8.5GB, spanning Oct 9, 2006 to Dec 28, 2025

reply
amitmahbubani
1 hour ago
[-]
> Where did you get the 22GB figure from?

The HN post title (:

reply
meander_water
1 hour ago
[-]
Hah, well that's embarrassing
reply
sodafountan
58 minutes ago
[-]
The GitHub page is no longer available, which is a shame because I'm really interested in how this works.

How was the entirety of HN stored in a single SQLite database? In other words, how was the data acquired? And how does the page load instantly if there's 22GB of data having to be downloaded to the browser?

reply
tehlike
8 hours ago
[-]
Vfs support is amazing.
reply
kamranjon
3 hours ago
[-]
It'd be great if you could add it to Kiwix[1] somehow (not sure what the process is for that but 100rabbits figured it out for their site) - I use it all the time now that I have a dumb phone - I have the entirety of wikipedia, wiktionary and 100rabbits all offline.

https://kiwix.org/en/

reply
codazoda
28 minutes ago
[-]
I love that you have 100r.ca on that short list.
reply
yread
6 hours ago
[-]
I wonder how much smaller it could get with some compression. You could probably encode "This website hijacks the scrollbar and I don't like it" comments into just a few bits.
reply
Rendello
5 hours ago
[-]
The hard-coded dictionary wouldn't be much stranger than Brotli's:

https://news.ycombinator.com/item?id=27160590

reply
hamburglar
1 hour ago
[-]
It might be a neat experiment to use ai to produce canonicalized paraphrasings of HN arguments so they could be compared directly and compress well.
reply
jacquesm
5 hours ago
[-]
That's at least 45%, then you can leave out all of my comments and you're left with only 5!
reply
kristianp
4 hours ago
[-]
I tried "select * from items limit 10" and it is slowly iterating through the shards without returning. I got up to 60 shards before I stopped. Selecting just one shard makes that query return instantly. As mentioned elsewhere I think duckdb can work faster by only reading the part of a parquet file it needs over http.

I was getting an error that the users and user_domains tables aren't available, but you just need to change the shard filter to the user stats shard.

reply
piperswe
1 hour ago
[-]
Doesn't `LIMIT` just limit the number of rows returned, rather than the number read & processed?
reply
lucb1e
1 hour ago
[-]
That's what it does, but if I'm not mistaken (at least in my experience with MariaDB) it'll also return immediately once it has hit the limit and not try to process further rows. If you have an expensive subquery in the SELECT (...) AS `column_name`, it won't run that for every row before returning the first 10 (when using LIMIT 10) unless you ORDERed BY that column_name. Other clauses like WHERE might also require that it reads every row before finding the ten matches. So mostly yes, but not necessarily.
reply
ncruces
4 hours ago
[-]
That's odd. If it was a VFS, that's not what I'd expect would happen. Maybe it's not a VFS?
reply
zkmon
8 hours ago
[-]
Similar to single-page applications (SPA), single-table applications (STA) might become a thing. Just shard a table on multiple keys and serve the shards as static files, provided that the data is OK to share, similar to sharing static HTML content.
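As a sketch of what that build step could look like (Node + better-sqlite3, with a hypothetical items table sharded by id range):

  // Split one big SQLite table into per-range shard databases that can be
  // served as plain static files. Table and column names are illustrative.
  import Database from "better-sqlite3";
  import { mkdirSync } from "node:fs";

  const source = new Database("all.sqlite", { readonly: true });
  const ROWS_PER_SHARD = 50_000;
  mkdirSync("shards", { recursive: true });

  const { maxId } = source.prepare("SELECT MAX(id) AS maxId FROM items").get() as { maxId: number };

  for (let start = 0, shard = 0; start <= maxId; start += ROWS_PER_SHARD, shard++) {
    const out = new Database(`shards/shard_${shard}.sqlite`);
    out.exec(`CREATE TABLE items (id INTEGER PRIMARY KEY, type TEXT, time INTEGER, "by" TEXT, title TEXT, text TEXT, url TEXT)`);
    const insert = out.prepare("INSERT INTO items VALUES (?, ?, ?, ?, ?, ?, ?)");
    const rows = source.prepare("SELECT * FROM items WHERE id >= ? AND id < ?").all(start, start + ROWS_PER_SHARD) as any[];
    out.transaction(() => {
      for (const r of rows) insert.run(r.id, r.type, r.time, r.by, r.title, r.text, r.url);
    })();
    out.close();
  }
  source.close();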
reply
jhd3
7 hours ago
[-]
[The Baked Data architectural pattern](https://simonwillison.net/2021/Jul/28/baked-data/)
reply
jesprenj
7 hours ago
[-]
do you mean single database? it'd be quite hard if not impossible to make applications using a single table (no relations). reddit did it though, they have a huge table of "things" iirc.
reply
mburns
7 hours ago
[-]
That is a common misconception.

> Next, we've got more than just two tables. The quote/paraphrase doesn't make it clear, but we've got two tables per thing. That means Accounts have an "account_thing" and an "account_data" table, Subreddits have a "subreddit_thing" and "subreddit_data" table, etc.

https://www.reddit.com/r/programming/comments/z9sm8/comment/...

reply
rplnt
5 hours ago
[-]
And the important lesson from that is the k/v-like aspect of it. That the "schema" is horizontal (is that a thing?) and not column-based. But I actually only read it on their blog IIRC and never even got the full details - that there's still a third ID column. Thanks for the link.
reply
carbocation
9 hours ago
[-]
That repo is throwing up a 404 for me.

Question - did you consider tradeoffs between duckdb (or other columnar stores) and SQLite?

reply
keepamovin
9 hours ago
[-]
No, I just went straight to sqlite. What is duckdb?
reply
fsiefken
9 hours ago
[-]
DuckDB is an open-source column-oriented Relational Database Management System (RDBMS). It's designed to provide high performance on complex queries against large databases in embedded configuration.

It has transparent compression built-in and has support for natural language queries. https://buckenhofer.com/2025/11/agentic-ai-with-duckdb-and-s...

"DICT FSST (Dictionary FSST) represents a hybrid compression technique that combines the benefits of Dictionary Encoding with the string-level compression capabilities of FSST. This approach was implemented and integrated into DuckDB as part of ongoing efforts to optimize string storage and processing performance." https://homepages.cwi.nl/~boncz/msc/2025-YanLannaAlexandre.p...

reply
simonw
8 hours ago
[-]
One interesting feature of DuckDB is that it can run queries against HTTP ranges of a static file hosted via HTTPS, and there's an official WebAssembly build of it that can do that same trick.

So you can dump e.g. all of Hacker News in a single multi-GB Parquet file somewhere and build a client-side JavaScript application that can run queries against that without having to fetch the whole thing.

You can run searches on https://lil.law.harvard.edu/data-gov-archive/ and watch the network panel to see DuckDB in action.
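The SQL side of that is surprisingly plain. Here's a sketch of querying a hypothetical Parquet dump from the browser with the official @duckdb/duckdb-wasm package (the Parquet URL and column names are placeholders; DuckDB only fetches the byte ranges it needs):

  // Sketch: DuckDB-WASM reading a remote Parquet file via HTTP range requests.
  import * as duckdb from "@duckdb/duckdb-wasm";

  const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
  const workerUrl = URL.createObjectURL(
    new Blob([`importScripts("${bundle.mainWorker}");`], { type: "text/javascript" })
  );
  const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), new Worker(workerUrl));
  await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

  const conn = await db.connect();
  const result = await conn.query(`
    SELECT "by" AS author, count(*) AS comments
    FROM read_parquet('https://example.com/hn-items.parquet')
    WHERE type = 'comment'
    GROUP BY author
    ORDER BY comments DESC
    LIMIT 10
  `);
  console.table(result.toArray());
  await conn.close();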

reply
cess11
9 hours ago
[-]
It is very similar to SQLite in that it can run in-process and store its data as a file.

It's different in that it is tailored to analytics, among other things storage is columnar, and it can run off some common data analytics file formats.

reply
1vuio0pswjnm7
5 hours ago
[-]
"What is duckdb?"

duckdb is a 45M dynamically-linked binary (amd64)

sqlite3 1.7M static binary (amd64)

DuckDB is a 6yr-old project

SQLite is a 25yr-old project

reply
1vuio0pswjnm7
2 hours ago
[-]
I like SQLite
reply
jacquesm
5 hours ago
[-]
Maybe it got nuked by MS? The rest of their repo's are up.
reply
3eb7988a1663
9 hours ago
[-]
While I suspect DuckDB would compress better, given the ubiquity of SQLite, it seems a fine standard choice.
reply
linhns
9 hours ago
[-]
Not the author here. I’m not sure about DuckDB, but SQLite allows you to simply use a file as a database and for archiving, it’s really helpful. One file, that’s it.
reply
cobolcomesback
9 hours ago
[-]
DuckDB does as well. A super simplified explanation of duckdb is that it’s sqlite but columnar, and so is better for analytics of large datasets.
reply
formerly_proven
9 hours ago
[-]
The schema is this: items(id INTEGER PRIMARY KEY, type TEXT, time INTEGER, by TEXT, title TEXT, text TEXT, url TEXT

Doesn't scream columnar database to me.

reply
embedding-shape
9 hours ago
[-]
At a glance, that is missing (at least) a `parent` or `parent_id` attribute which items in HN can have (and you kind of need if you want to render comments), see http://hn.algolia.com/api/v1/items/46436741
reply
agolliver
8 hours ago
[-]
Edges are a separate table
reply
m-p-3
5 hours ago
[-]
Looks like the repo was taken down (404).

That's too bad, I'd like to see the inner-working with a subset of data, even with placeholders for the posts and comments.

reply
octoberfranklin
19 minutes ago
[-]
But why would they take it down?
reply
3abiton
5 hours ago
[-]
That was fast. I was looking into recent HN datasets, and they are impossible to find.
reply
xnx
5 hours ago
[-]
reply
gettingoverit
1 hour ago
[-]
If the last story on HN was at December 26, that is.
reply
scsh
54 minutes ago
[-]
It's available on BigQuery and is updated frequently enough (daily, I think).
reply
Paul-E
8 hours ago
[-]
That's pretty neat!

I did something similar. I built a tool[1] to import the Project Arctic Shift dumps[2] of reddit into sqlite. It was mostly an exercise to experiment with Rust and SQLite (HN's two favorite topics). If you don't build a FTS5 index and import without WAL (--unsafe-mode), importing every reddit comment and submission takes a bit over 24 hours and produces a ~10TB DB.

SQLite offers a lot of cool json features that would let you store the raw json and operate on that, but I eschewed them in favor of parsing only once at load time. That also lets me normalize the data a bit.

I find that building the DB is pretty "fast", but queries run much faster if I immediately vacuum the DB after building it. The vacuum operation is actually slower than the original import, taking a few days to finish.

[1] https://github.com/Paul-E/Pushshift-Importer

[2] https://github.com/ArthurHeitmann/arctic_shift/blob/master/d...

reply
s_ting765
8 hours ago
[-]
You could check out SQLite's auto_vacuum which reclaims space without rebuilding the entire db https://sqlite.org/pragma.html#pragma_auto_vacuum
reply
Paul-E
4 hours ago
[-]
I haven't tested that, so I'm not sure if it would work. The import only inserts rows, it doesn't delete, so I don't think that is the cause of fragmentation. I suspect this line in the vacuum docs:

> The VACUUM command may change the ROWIDs of entries in any tables that do not have an explicit INTEGER PRIMARY KEY.

means SQLite does something to organize by rowid and that this is doing most of the work.

Reddit post/comment IDs are 1:1 with integers, though expressed in a different base that is more friendly to URLs. I map decoded post/comment IDs to INTEGER PRIMARY KEYs on their respective tables. I suspect the vacuum operation sorts the tables by their reddit post ID and something about this sorting improves tables scans, which in turn helps building indices quickly after standing up the DB.

reply
fouc
46 minutes ago
[-]
Suddenly occurs to me that it would be neat to pair a small LLM (3-7B) with an HN dataset
reply
codazoda
24 minutes ago
[-]
Does the SQLite version of this already exist somewhere? The github link on the footer of the page fails for me.
reply
diyseguy
4 hours ago
[-]
reply
Sn0wCoder
6 hours ago
[-]
Site does not load on Firefox; the console error says 'Uncaught (in promise) TypeError: can't access property "wasm", sqlite3 is null'

Guess it's common knowledge that SharedArrayBuffer (SQLite wasm) does not work with FF due to Cross-Origin Attacks (i just found out ;).

Once the initial chunk of data loads, the rest load almost instantly on Chrome. Can you please fix the GitHub link (currently a 404)? I'd like to peek at the code. Thank you!
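Side note for anyone hitting the same wall: browsers only expose SharedArrayBuffer on cross-origin isolated pages, which a host opts into with two response headers. A minimal sketch of serving with them (Node's built-in http module; the static file handling is omitted):

  // The two headers that make a page cross-origin isolated, which is what
  // browsers require before exposing SharedArrayBuffer to scripts.
  import { createServer } from "node:http";

  createServer((req, res) => {
    res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
    res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
    // ...serve the static site files here (omitted in this sketch)...
    res.end("ok");
  }).listen(8080);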

reply
modeless
2 hours ago
[-]
It's really a shame that comment scores are hidden forever. Would the admins consider publishing them after stories are old enough that voting is closed? It would be great to have them for archives and search indices and projects like this.
reply
sieep
8 hours ago
[-]
What a reminder of how much more efficient text is than video, it's crazy! Could you imagine the same amount of knowledge (or drivel) but in video form? I wonder how large that would be.
reply
jacquesm
5 hours ago
[-]
That's what's so sad about youtube. 20 minute videos to encode a hundred words of usable content to get you to click on a link. The inefficiency is just staggering.
reply
Rendello
5 hours ago
[-]
Youtube can be excellent for explanations. A picture's worth a thousand words, and you can fit a lot of decent pictures in a 20 minute video. The signal-to-noise can be high, of course.
reply
ivanjermakov
8 hours ago
[-]
Average high quality 1080p60 video has a bitrate of 5 Mbps, which is equivalent to 120k English words per second. With average English speech being 150 wpm, we end up with text being 50 thousand times more space efficient.

Converting 22GB of uncompressed text into a video essay lands us at ~1PB, or 1000TB.

reply
fsiefken
7 hours ago
[-]
one could use a video llm to generate the video, diagrams or stills automatically based on the text. except when it's board game playthroughs or programming, i just transcribe youtube videos to text, summarise and read them.
reply
deskamess
7 hours ago
[-]
How do you read youtube videos? Very curious, as I have been wanting to watch PDFs scroll by slowly on a large TV. I am interested in the workflow of getting a pdf/document into a scrolling video format. These days NotebookLM may be an option but I am curious if there is something custom. If I can get it into video form (mp4) then I can even deliver it via plex.
reply
fsiefken
2 hours ago
[-]
I use yt-dlp to download the transcript, and if it's not available i can get the audio file and run it through parakeet locally. Then I have the plain text, which could be read out loud (kind of defeating the purpose), but perhaps at triple speed with a computer voice that's still understandable at that speed. I could also summarize it with an llm. With pandoc or typst I can convert to single column or multi-column pdf to print or watch on tv or my smart glasses. If I strip the vowels and make the font smaller I can fit more!

One could convert the Markdown/PDF to a very long image first with pandoc+wkhtml, then use ffmpeg to crop and move the viewport slowly over the image, this scrolls at 20 pixels per second for 30s - with the mpv player one could change speed dynamically through keys.

ffmpeg -loop 1 -i long_image.png -vf "crop=iw:ih/10:0:t*20" -t 30 -pix_fmt yuv420p output.mp4

Alternatively one could use a Rapid Serial Visual Presentation / Speedreading / Spritz technique to output to mp4 or use dedicated rsvp program where one can change speed.

One could also output to a braille 'screen'.

Scrolling mp4 text on the TV or Laptop to read is a good idea for my mother and her macular degeneration, or perhaps I should make use of an easier-to-see/read magnification browser plugin tool.

reply
Barbing
7 hours ago
[-]
Can be nice to pull a raw transcript and have it formatted as HTML (formatting/punctuation fixes applied).

Best locally of course to avoid “I burned a lake for this?” guilt.

reply
fsiefken
2 hours ago
[-]
yes, yt-dlp can download the transcript, and if it's not available i can get the audio file and run it through parakeet locally.
reply
3eb7988a1663
2 hours ago
[-]
Did anyone get a copy of this before it was pulled? If GitHub is not keen, could it be uploaded to HuggingFace or some other service which hosts large assets?

I have always known I could scrape HN, but I would much rather take a neat little package.

reply
abixb
9 hours ago
[-]
Wonder if you could turn this into a .zim file for offline browsing with an offline browser like Kiwix, etc. [0]

I've been taking frequent "offline-only-day" breaks to consolidate whatever I've been learning, and Kiwix has been a great tool for reference (offline Wikipedia, StackOverflow and whatnot).

[0] https://kiwix.org/en/the-new-kiwix-library-is-available/

reply
Barbing
7 hours ago
[-]
Oh this should TOTALLY be available to those who are scrolling through sources on the Kiwix app!
reply
tevon
8 hours ago
[-]
The link seems to be down, was it taken down?
reply
scsh
8 hours ago
[-]
Probably just forgot to make it public.
reply
dspillett
5 hours ago
[-]
Is there a public dump of the data anywhere that this is based upon, or have they scraped it themselves?

Such a DB might be entertaining to play with, and the threadedness of comments would be useful for beginners to practise efficient recursive queries (more so than the StackExchange dumps, for instance).

reply
thomasmarton
5 hours ago
[-]
While not a dump per se, there is an API where you can get HN data programmatically, no scraping needed.

https://github.com/HackerNews/API

reply
zX41ZdbW
9 hours ago
[-]
The query tab looks quite complex with all these content shards: https://hackerbook.dosaygo.com/?view=query

I have a much simpler database: https://play.clickhouse.com/play?user=play#U0VMRUNUIHRpbWUsI...

reply
embedding-shape
9 hours ago
[-]
Does your database also run offline/locally in the browser? Seems to be the reason for the large number of shards.
reply
spit2wind
6 hours ago
[-]
This is pretty neat! The calendar didn't work well for me. I could only seem to navigate by month. And when I selected the earliest day (after much tapping), nothing seemed to be updated.

Nonetheless, random access history is cool.

reply
yupyupyups
9 hours ago
[-]
1 hour passed and it's already nuked?

Thank you btw

reply
dmarwicke
6 hours ago
[-]
22gb for mostly text? tried loading the site, it's pretty slow. curious how the query performance is with this much data in sqlite
reply
layer8
5 hours ago
[-]
Apparently the comment counts are only the top-level comments?

It would be nice for the thread pages to show a comment count.

reply
joshcsimmons
4 hours ago
[-]
Link appears broken
reply
ra
4 hours ago
[-]
confirmed - I wonder what happened?
reply
KomoD
5 hours ago
[-]
How do I download it? That repo is a 404.
reply
wslh
9 hours ago
[-]
Is this updated regularly? 404 on GitHub as the other comment.

With all due respect, it would be great if there were an official HN public dump available (and not requiring stuff such as BigQuery, which is expensive).

reply
scsh
46 minutes ago
[-]
The BQ dataset is only ~17GB and the free tier of BQ lets you query 1TB per month. If you're not doing select * on every query you should be able to do a lot with that.
reply
sirjaz
8 hours ago
[-]
This would be awesome as a cross platform app.
reply
solarized
4 hours ago
[-]
Beautiful !

2026 prayer: for all you AI junkies—please don’t pollute H/N with your dirty AI gaming.

Don’t bot posts, comments, or upvote/downvote just to maximize karma. Please.

We can’t identify anymore who’s a bot and who’s human. I just want to hang out with real humans here.

reply
asdefghyk
10 hours ago
[-]
How much space is needed? ...for the data .... I'm wondering if it would work on a tablet? ....
reply
asdefghyk
44 minutes ago
[-]
FYI I did NOT see the size info in the title. Impossible to edit / delete my comment now ........
reply
keepamovin
10 hours ago
[-]
~9GB gzipped.
reply
fao_
8 hours ago
[-]
> Community, All the HN belong to you. This is an archive of hacker news that fits in your browser.

> 20 years of HN arguments and beauty, can be yours forever. So they'll never die. Ever. It's the unkillable static archive of HN and it's your hands

I'm really sorry to have to ask this, but this really feels like you had an LLM write it?

reply
jesprenj
7 hours ago
[-]
I doubt it. "hacker news" spelled lowercase? comma after "beauty"? missing "in" after "it's"? i doubt an LLM would make such syntax mistakes. it's just good writing, that's also possible these days.
reply
walthamstow
8 hours ago
[-]
There's a thing in soccer at the moment where a tackle looks fine in real time, but when the video referee shows it to the on-pitch referee, they show the impact in slo-mo over and over again and it always looks way worse.

I wonder if there's something like this going on here. I never thought it was LLM on first read, and I still don't, but when you take snippets and point at them, it makes me think maybe they are.

reply
Insanity
6 hours ago
[-]
Even if so, would it have mattered? The point is showing off the SQLite DB.

But it didn’t read LLM generated IMO.

reply
rantingdemon
8 hours ago
[-]
Why do you say that?
reply
sundarurfriend
8 hours ago
[-]
Because anything that even slightly differs from the standard American phrasing of something must be "LLM generated" these days.
reply
JavGull
8 hours ago
[-]
With the em dashes I see you. But at this point idrc so long as it reads well. Everyone uses spell check…
reply
naikrovek
7 hours ago
[-]
I add em dashes to everything I write now, solely to throw people who look for them off. Lots of editors add them automatically when you have two sequential dashes between words — a common occurrence, like that one. And this is Chrome on iOS doing it automatically.

Ooh, I used “sequential”, ooh, I used an em dash. ZOMG AI IS COMING FOR US ALL

reply
3eb7988a1663
2 hours ago
[-]
Anyone demonstrating above a high-school vocabulary/reading level is obviously a machine.
reply
Barbing
7 hours ago
[-]
Ya—in fact, globally replaced on iOS (sent from Safari)

Also for reference: “this shortcut can be toggled using the switch labeled 'Smart Punctuation' in General > Keyboard settings.”

reply
deadbabe
8 hours ago
[-]
Sometimes I want to write more creatively, but then worry I’ll be accused of being an LLM. So I dumb it down. Remove the colorful language. Conform.
reply
ssl-3
4 hours ago
[-]
Fuck 'em.

Always write what you want, however you want to write it. If some reader somewhere decides to be judgemental because of — you know — an em dash or an X/Y comparison or a compliment or some other thing that they think pins you down as being a bot, then that's entirely their own problem. Not yours.

They observe the reality that they deserve.

reply
deadbabe
2 hours ago
[-]
You’re absolutely right. It’s not my problem, it’s their problem.
reply
naikrovek
7 hours ago
[-]
> I'm really sorry to have to ask this, but this really feels like you had an LLM write it?

Ending a sentence with a question mark doesn’t automatically make your sentence a question. You didn’t ask anything. You stated an opinion and followed it with a question mark.

If you intended to ask if the text was written by AI, no, you don’t have to ask that.

I am so damn tired of the “that didn’t happen” and the “AI did that” people when there is zero evidence of either being true.

These people are the most exhausting people I have ever encountered in my entire life.

reply
jacquesm
5 hours ago
[-]
You're right. Unfortunately they are also more and more often right.
reply
abetusk
5 hours ago
[-]
Alas, HN does not belong to us, and the existence of projects like this is subject to the whims of the legal owners of HN.

From the terms of use [0]:

"""

Commercial Use: Unless otherwise expressly authorized herein or in the Site, you agree not to display, distribute, license, perform, publish, reproduce, duplicate, copy, create derivative works from, modify, sell, resell, exploit, transfer or upload for any commercial purposes, any portion of the Site, use of the Site, or access to the Site. The buying, exchanging, selling and/or promotion (commercial or otherwise) of upvotes, comments, submissions, accounts (or any aspect of your account or any other account), karma, and/or content is strictly prohibited, constitutes a material breach of these Terms of Use, and could result in legal liability.

"""

[0] https://www.ycombinator.com/legal/#tou

reply
tom1337
3 hours ago
[-]
But is this really a commercial use? There doesn't seem to be any intention of monetising this, so I guess it doesn't count as commercial?
reply