FilterHN

How to record and retrieve anything you've ever had to look up twice

181 points

by Curiositry

1 month ago

| 26 comments

| ellanew.com

| HN

▲

evanjrowley

1 month ago

[-]

In my lists of Pros and Cons for sticking with the Google Pixel ecosystem, one of the Cons is the fact that Google definitely does not want you to have this valuable capability. If you stop looking things up, then you won't be looking at their search engine, their ads, and their recommendation algorithms. Every platform wants you to do that. It's why bookmarks in Google Chrome lack useful features like tagging. It's one of the reasons why so many vendors try to lock your data inside their walled gardens. Apple is well known for walled gardens, but for the most part, you can be sure they will let you change your default search engine without much hassle. They won't care so much if you want to use something like SearXNG to prioritize your knowledgebase first, however, the App Store, Apple Music, and Apple TV are the same story as Google - any attempts to influence the search results in your favor will be actively fought against.

▲

Barbing

1 month ago

[-]

>Apple is well known for walled gardens, but for the most part, you can be sure they will let you change your default search engine without much hassle.

Respectfully, please, this below is an absolute joke--has it changed in a decade?

https://imgz.org/i6oyQ3QG.jpg

Image: Apple provides easy changing for Google, Yahoo, Bing, DuckDuckGo, and Ecosia. (That poor paid search engine that starts with a K has to mess around with extensions; I'm unaffiliated.)

Plus, even though I have it set to DuckDuckGo, when I ask Siri to "search {query}", it searches Google. So even the default I have set is not actually truly the all-around default. Embarrassing to have locked this down as if I didn't drop a grand on the phone.

Fun fact: for years now, asking Siri to "search Google Images" results in whitelabeled Bing Images (thankfully, exceptionally easy to remedy with the excellent Shortcuts: "Picture-Search" for Google, quality difference night & day unfortunately... anyway go SearXNG!, it lets you keep your soul).

▲

evanjrowley

1 month ago

[-]

This enlightening fact has burst my hubris bubble. Shows how much I know about iOS (e.g., the marketing). Well, I hope the EU comes up with a good alternative. I'm sure they'll have some law that requires all searches be age-verified and reported to the government, but at least we can bet the implementation will be "sovereign" and distanced from FAANG.

▲

Barbing

1 month ago

[-]

—-

Oops btw I said “exceptionally easy to remedy” when that isn’t true because I can’t say “{shortcut name} {query}”; must be two separate commands. Not first class unless I want Bing Images hehe (“search images {query}”, single dictation to Siri).

▲

retsibsi

1 month ago

[-]

This seems obvious in retrospect! I've often wondered why Chrome's bookmarks (and, to a lesser extent, history) system is so bad, even to the point of thinking it was a bit suspicious, but I didn't put 2 and 2 together and realise that a better version could directly hurt Google's search business.

▲

autoexec

1 month ago

[-]

> Google definitely does not want you to have this valuable capability. If you stop looking things up, then you won't be looking at their search engine, their ads, and their recommendation algorithms.

Google is very interested in knowing about whatever you're interested in, and in knowing when, how often, and for how long you're interested in those things. In addition to looking at their search engine, their ads, and their recommendations, you're also feeding them more and more data about you.

▲

direwolf20

1 month ago

[-]

This is why I switched to Kagi. Even if it's just proxying Google, it doesn't tell Google who I am.

▲

autoexec

1 month ago

[-]

My problem with Kagi is that it requires a login to be useful meaning that it always knows exactly who you are and can associate your searches to your identity. If you don't log into google they can only guess (although honestly they'll probably guess accurately given enough search data).

I think I remember Kagi was working on a way to allow users to create an anonymous account and if so I'll have to take another look at them.

▲

freediver

1 month ago

[-]

Kagi Privacy Pass

https://blog.kagi.com/kagi-privacy-pass

▲

evanjrowley

1 month ago

[-]

FWIW, I'm also a paid Kagi user and would like very much if I could use it with SearXNG or potentially have it include my own self-hosted services as part of my personal search results.

▲

theturtletalks

1 month ago

[-]

Extend that to marketplaces too, all their search UIs have dark patterns forcing you to see their “recommendations” instead of being able to manipulate the results like you want.

▲

rootsudo

1 month ago

[-]

Thank you for your insight and comment. I’ve witnessed this behavior but never thought to question it until now. It’s amazing how simple and devious it can be.

▲

sureglymop

1 month ago

[-]

I've done this for years and it has saved me many times. I just make a new markdown file everyday and often search through them with ripgrep.
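A minimal sketch of that daily-file habit, assuming one `YYYY-MM-DD.md` file per day in a single folder. The function names and layout are made up for illustration; in practice the search step would just be `rg -i needle notes/`.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional, Tuple


def daily_note_path(notes_dir: Path, day: date) -> Path:
    """Return the path of the note for `day`, e.g. notes/2024-05-17.md."""
    return notes_dir / f"{day.isoformat()}.md"


def append_entry(notes_dir: Path, text: str, day: Optional[date] = None) -> Path:
    """Append `text` to the note for `day` (today by default), creating it if needed."""
    day = day or date.today()
    notes_dir.mkdir(parents=True, exist_ok=True)
    path = daily_note_path(notes_dir, day)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(text.rstrip("\n") + "\n")
    return path


def search_notes(notes_dir: Path, needle: str) -> List[Tuple[str, int, str]]:
    """Case-insensitive substring search; returns (file name, line number, line)."""
    hits = []
    for path in sorted(notes_dir.glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, 1):
            if needle.lower() in line.lower():
                hits.append((path.name, number, line))
    return hits
```

Because the file names are ISO dates, sorting them alphabetically also sorts them chronologically, so hits come back oldest first.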

One secret here is to have a good UX for adding metadata. For example, in obsidian a search window pops up when you write `#[[`. Or when you type `#` to create a tag, a window with all preexisting tags shows up.

However, lately I've been working on a new side project in order to additionally automatically record/collect what I am doing on digital devices. Basically I am building a "personal" spyware/data collection software suite. Kind of in the same realm as ms recall but more focused on security/privacy with sensible cryptographic defaults where needed.

▲

cachius

1 month ago

[-]

Check out Timelinize and Perkeep!

https://timelinize.com/ https://github.com/timelinize/timelinize https://news.ycombinator.com/item?id=45504973 https://perkeep.org/ https://github.com/perkeep/perkeep https://news.ycombinator.com/item?id=45896130

Links are Website, Repo, HN discussion

▲

sureglymop

1 month ago

[-]

Thank you for the links/resources! I've come across these in my research and they both look interesting. I've contemplated contributing to Timelinize but ultimately decided I want to create my own thing, also for educational purposes.

▲

solarkraft

1 month ago

[-]

I’m highly interested in this! Would love to see what you come up with!

▲

pwndByDeath

1 month ago

[-]

I've used Zim for this over the last 20 years. It's a mess, but with search I can find what I'm looking for if I was good enough. For especially tricky troubleshooting I'll put in each theory and strike it out, so at least if I stop documenting I'll see what didn't work before I gave up or succeeded and quit documenting.

▲

bryanhogan

1 month ago

[-]

Also love using Obsidian for this! Small suggestion, use the `aliases` property for alternative titles, I usually use them for a title that means the same thing but uses different keywords. Makes it easier to search for a note.

Although usually a bottom-up approach using automatically updating `Map of Content` notes (Bases) works well for me for finding content.

▲

user205738

1 month ago

[-]

For myself, I have developed a so-called default template for notes.

```
---
aliases:
  - <%tp.file.title%>
tags:
---
[[<%tp.file.creation_date("YYYY-MM-DD")%>]]
```

`<%tp.file.title%>` goes into the aliases because I always refer to notes in the text through an alias, `[[note|note alias]]`. If I link without an alias, then accidentally or intentionally renaming the note can ruin the text everywhere the link occurs.

▲

user205738

1 month ago

[-]

There are also templates for different types of notes, selected when a note is created in a specific folder or through QuickAdd.

For example, when I add a link to the author to a book note and use keyboard shortcuts to create a note page for the author, the following template is used:

````md
---
aliases:
  - <%tp.file.title%>
tags:
  - t3/books
  - people
  - t3/author
---
[[<%tp.file.creation_date("YYYY-MM-DD")%>]]

### Works

```base
views:
  - type: table
    name: Table
    filters:
      and:
        - file.hasLink("<% tp.file.find_tfile(tp.file.folder(true) + "/" + tp.file.title + ".md").path %>")
        - file.hasTag("t3/books")
    sort:
      - property: file.name
        direction: ASC
```
````

Tags in metadata do not need the # symbol, although you can use it if you enclose the entire tag in quotation marks.

▲

FearNotDaniel

1 month ago

[-]

I find one possible answer to the question “How to make yourself actually do it” is to start by getting into the routine of keeping an engineering notebook. If you are already in the habit of jotting down stream-of-consciousness notes on whatever you are working on at a given time, then Obsidian’s “extract highlighted text into a new note” feature makes it blisteringly easy to file away things you are likely to want to repeat in the future.

▲

alexpotato

1 month ago

[-]

Shout out to Pinboard for making bookmarking pages and adding notes incredibly easy.

They have a bookmarklet that sits on my bookmarks toolbar and if I like a page/tweet/video etc I just hit the "Add pin", enter some tags and hit enter.

This works so well that I went through and bookmarked and tagged all of my LinkedIn connections as well (inspired by a post from Derek Sivers [1]).

People are generally amazed at how quickly I can go from talking about a subject to "oh, I have this article you would love" to "here it is!"

0 - https://pinboard.in

1 - https://sive.rs/dbt

▲

sucrose

27 days ago

[-]

I love Pinboard! I was just thinking today about how much I value that service.

▲

stavros

1 month ago

[-]

I made something to help me with this exact process:

https://www.stavros.io/posts/i-made-a-voice-note-taker/

I usually forget what steps I've taken, but using the recorder above, I can dictate short clips of the steps. An LLM assistant I've built takes the clips and adds them to my Joplin, which then gets published:

https://notes.stavros.io/

It's been extremely helpful for keeping logs.

▲

huijzer

1 month ago

[-]

In most cases, I just add a blog post for such things.

For example, Syncthing on Debian notes [1] or using Spleeter AI to remove background sound from a long audio track [2]. This is why I switched back from a static site to a WordPress-like site [3], so that I can quickly publish notes from my phone.

[1]: https://huijzer.xyz/posts/149/setup-a-syncthing-service-on-d...

[2]: https://huijzer.xyz/posts/146/installing-and-running-spleete...

[3]: https://github.com/rikhuijzer/fx

▲

WillAdams

1 month ago

[-]

Several times in my life, when I've needed to study/learn something, I just found or made a wiki on it, and categorized everything which I learned on it --- then when I had trouble recalling a fact, it was there in that structured site --- on the flip side, there almost certainly are folks who will aver that I ruined the Shapeoko CNC project wiki by using it as a personal notebook. Fortunately, @julien, a native French speaker from the Carbide 3D Community forums wanted to improve his English, so he made a gitbook:

https://shapeokoenthusiasts.gitbook.io/shapeoko-cnc-a-to-z/

which re-worked the essentials from that wiki, discarded the chaff, and has become a reference which a number of projects have re-purposed. I did resurrect the notes aspect on the /r/shapeoko wiki though.

Similarly, when I wanted to set up the ultimate commuter/long-haul mountain bike, I put down all the gear I learned about at:

http://old.reddit.com/r/bicyclegear/wiki

(probably out-of-date now, but I found the notes useful)

Unfortunately, I've lost access to the two e-mail archives from when I worked as a graphic designer/typographer --- really should have forwarded any notable e-mails (which I would have wanted to refer to later) to myself --- at least one of them wound up being printed out by a startup composition house and distributed to new employees.... maybe one of these days I'll finish the type composition book I was asked to write by an editor at a major publishing house.

For now, I've been working on:

https://willadams.gitbook.io/design-into-3d and https://github.com/WillAdams/gcodepreview

▲

isr

1 month ago

[-]

You may be aware of this already, but if not: it sounds like tiddlywiki was made specifically with you in mind.

▲

WillAdams

1 month ago

[-]

It would be, if I could discipline myself to carry it around on a thumb drive --- the one time I tried that, I left it in a computer at work and when I got back the next day, it was gone (really hated that job/workplace).

▲

Obscurity4340

1 month ago

[-]

If you're on iOS there's a neat little Tiddlywiki "client" called Quine

(I know Tiddlywiki is just an html file) but it syncs it and makes using it quite smooth. I think it can be synced with iCloud or whatever

▲

WillAdams

1 month ago

[-]

Since I have to have a Wacom EMR stylus on my devices, no iOS here (which I'm kind of bummed about) --- it kills me that I mislike the Apple Pencil (and that Apple won't make a stylus-enabled Mac, or a device w/ an e-ink screen).

▲

SanjayMehta

1 month ago

[-]

I use a couple of slim A7 notebooks, one is like a diary, it gets stored when it's full. The other is a hard copy of my memory of how to do things like squishing PDFs. I rarely have to look at things twice as the act of writing it down gives me enough context to remember it. But it's invaluable when I need it.

Tried Evernote and tagging and so on and it turns out cataloging stuff is hard, and the lazy recourse is to over-tag, and then I end up doing a brute force search.

▲

klondike_klive

1 month ago

[-]

I do this too. A7 is the sweet spot for me because it fits in my back pocket and so it's always there when I need it. I write the date started on the front cover and the same when I finish them and file them away. The hard part about using physical notebooks is resurfacing old info, I really need to work on processing and triage.

▲

vessenes

1 month ago

[-]

I’m on the other side of this - I’d like it to happen passively and in an automated fashion. Right now I’m playing around with porting concepts from zettelkasten (card based handwritten knowledge systems) to openclaw memory.

Rather than just coalescing to markdown files, the memory-zet plugin looks for actionable durable information and files it inside the existing zettelkasten system with embeddings - a quick no-LLM step (well 300m parameter query embed, it’s fast) is run against incoming chats or as a tool - this returns cards (zettels).

Zettels are somewhat unique in that the original methodology included a post-writing categorization and linking step - I have the system doing this as well. Result - cards can give you a (possibly cyclic) directed graph of connectivity. I built it for ‘centaur’ mode, so I can edit, link, unlink, move, etc through a nice little web interface.

The auto links are not the same quality I would make. But they are genuinely useful; upshot is for anything incoming, the LLM can see information directly about the query (if we have it), stuff that relates whether or not it embeds similarly, and can follow up links if they look promising with a fast tool call.
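To make the "follow up links" step concrete, here is a rough sketch of expanding a handful of retrieved cards along their links. It is not the memory-zet plugin's code; the `links` mapping (card id to linked card ids) and the function name are assumptions for illustration, and the seed cards are assumed to come from an earlier embedding lookup.

```python
from collections import deque
from typing import Dict, List


def expand_links(links: Dict[str, List[str]], seeds: List[str], max_hops: int = 1) -> List[str]:
    """Breadth-first walk from the seed cards, at most `max_hops` links away.

    The visited set keeps the walk finite even when cards link in a cycle.
    """
    order = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep their order
    visited = set(order)
    queue = deque((card, 0) for card in order)
    while queue:
        card, hops = queue.popleft()
        if hops == max_hops:
            continue  # far enough from the seeds; do not follow further links
        for target in links.get(card, []):
            if target not in visited:
                visited.add(target)
                order.append(target)
                queue.append((target, hops + 1))
    return order
```

With `links = {"a": ["b"], "b": ["c", "a"], "c": ["a"]}`, one hop from `"a"` yields `["a", "b"]`, and any larger hop limit stops at `["a", "b", "c"]` despite the cycle.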

I made this memory system my daily driver yesterday; so far it is a significant improvement over the core memory extension (write to markdown files, don’t worry about compaction bro, it will be fine)!

It’s already building out people and organizational card bases for things that come in via email and whatsapp - this is a dream, basically. I think it will scale over time - but it’s at least scaling nicely over a few days of work right now.

▲

fransje26

1 month ago

[-]

Man, I only understood half of what you were describing, but it sounds fascinating. If you happen to find the time to do a write-up or share your workflow, I would love to read more about it.

▲

basch

1 month ago

[-]

You’re the first comment as I scrolled challenging the premise. But your solution amplifies what I see as a problem.

I’d like to add, that by forcing myself to look up the answer every time I have happy accidents where I learn new ways to do things.

It’s a skill to be willing to unlearn and always presume yourself ignorant, even if you do know how to do it. It’s like confirming “is my way still best practice.”

▲

vessenes

1 month ago

[-]

I agree with this. Like I'm the most paper forward person I know, by a lot. But, there's still a lot of information that comes digitally. FWIW, I did build out a paper integration system here that lets you print out cards to do analog management as well. In practice, printing cards isn't easy on the printers I've tested. But, I do like the analog research as well. It's just much more useful when it can incorporate digital information.

▲

megamorf

1 month ago

[-]

I'd love to hear more about this; have you considered creating a blog post, gist or something similar about it?

▲

zavec

1 month ago

[-]

This sounds fascinating, if you ever write more about it I'd love to read it!

▲

vessenes

1 month ago

[-]

Oh man, it’s on the extensive backlog. :) I’m just about to put it up for people to play with, though, so you can clone it and use it or just ask claude to tell you about it.

I think the essay will be something like: adding structure post-hoc lets you build intelligence into the datastore as an architectural matter, rather than relying on connections being made during use-time inference; using an embedding with links like this is much different from bulk embedding search; and we need some sort of tests to understand whether this helps in practice, although a) it feels pretty good and b) it’s VERY nice to be able to refer to and modify the agent’s “mid term” memory directly in any event.

Anyway you’ve triggered me enough to say I’ll try and get the repo published today so people can look at it.

▲

BOOSTERHIDROGEN

1 month ago

[-]

I’d love to see that too. thanks.

▲

aa-jv

1 month ago

[-]

I simply print to PDF, anything interesting I've read online. So now I've got 30+ years of my own private offline Internet experience.

Some 80,000+ files in a directory represents an awesome database of knowledge. "$ ls *inux*" to find anything Linux-related, etc.

One of these days I'll get around to setting up some ML tool that will tell me all the things I didn't already osmose from the archive .. and maybe long after I'm gone, in some hole in a wall of some grimy back alley somewhere, there'll be a ML version of me embedded in a brick, ready to have the conversation well into the future ..

▲

dewey

1 month ago

[-]

https://github.com/paperless-ngx/paperless-ngx might be a nice rabbit hole for you, drop the files in there and it'll be OCR'ed and searchable. There's also some AI projects you can give access to paperless to achieve your use case.

▲

aa-jv

1 month ago

[-]

Thanks for that - I've tried it off and on over the years and it looks like it's turning into quite an effective tool these days - so I'll give it another try. I've considered building my own tools to manage these files, but it's always nice to see the approach others are taking too ..

▲

EllaneW

1 month ago

[-]

I’m interested in your naming convention for these files (so many files!). Do they all have the date you saved them on, up front and centre?

▲

aa-jv

1 month ago

[-]

They have the datetime they were saved, and the filename is usually derived from the page title - this often needs a bit of manual tweaking on my part if the original author hasn't named it something sensible, but it's a minor action to take. Every year or so I 'sanitize' the filenames of the archives, to make sure I can still search them with normal command-line (bash) tools, removing special % ^ # $ & chars, and so on ..

A lot of other info is stored in the PDF metadata, too ..
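A minimal sketch of that yearly clean-up pass, under my own assumptions: the set of allowed characters, the underscore replacement, and both function names are illustrative choices, not the commenter's actual script. It only builds a rename plan, so nothing changes on disk until the plan has been reviewed.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Anything outside this conservative set is treated as unsafe (an assumption).
UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def sanitize_name(name: str) -> str:
    """Replace runs of shell-unfriendly characters in a file name with one underscore."""
    path = Path(name)
    stem = UNSAFE.sub("_", path.stem).strip("_")
    suffix = UNSAFE.sub("", path.suffix.lower())
    return (stem or "untitled") + suffix


def plan_renames(directory: Path) -> List[Tuple[Path, Path]]:
    """List (old, new) pairs without touching the disk, skipping names already taken."""
    plan = []
    for old in sorted(directory.iterdir()):
        new = old.with_name(sanitize_name(old.name))
        if old.is_file() and new != old and not new.exists():
            plan.append((old, new))
    return plan
```

Applying the plan is then a loop of `old.rename(new)`. Two different originals can still map to the same clean name, so a real script would want to re-check for collisions while renaming.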

▲

an_am

1 month ago

[-]

I have a ChatGPT project categorized by work, cooking, and repairing.

Whenever I do something and realize I might need it in the future, I just store it in the corresponding project.

It has been serving me well for some time.

▲

owenversteeg

1 month ago

[-]

That idea - using an audio recording of how you did the process/complex thing - is a great one. I have done it a few times in the past and it is perfect because it's very low effort and easy to get out a "stream of consciousness".

▲

mpalmer

1 month ago

[-]

Hah, my prediction that this was going to be about Bloom filters was a bad one. Nice article!

▲

eneveu

1 month ago

[-]

The cool thing when using AI (e.g. Claude Code) is that the conversation with the agent is saved, and you can retrieve from that convo the way you did things in the past. Not just the how, but also the why.

▲

emptysands

1 month ago

[-]

A long time ago (10+ years) I'd occasionally google something and find an older solution post on my own blog. It is hard to maintain both the relevance and the practice of keeping these up over the course of life.

▲

brador

1 month ago

[-]

The gold solution would be a searchable folder of synced text files with year subfolders, backed up automatically in a 3-2-1 configuration.

Anything else is a bandaid.

▲

gab007

1 month ago

[-]

I am using a combination of Tomboy (desktop), Tomdroid and Markor (mobile) to record info I need later. Simple and effective.

▲

eudamoniac

1 month ago

[-]

Why can't we put the full text of every page we've ever visited into a searchable local database?

▲

qznc

1 month ago

[-]

I recently (mostly vibe-) coded myself a Firefox addon which indexes every page I visit locally: https://codeberg.org/copacetic/where_did_i_read
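For anyone wondering what the storage side of such an index could look like, here is a small sketch using Python's built-in `sqlite3` module. It is not how the addon above works; the table layout and function names are assumptions, and it uses plain `LIKE` matching so it runs on any SQLite build. SQLite's FTS5 full-text extension, where available, would be the natural upgrade.

```python
import sqlite3
from typing import List, Tuple


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the page store; ':memory:' keeps it in RAM for experiments."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, title TEXT, body TEXT, "
        "visited TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def remember(conn: sqlite3.Connection, url: str, title: str, body: str) -> None:
    """Store the text of a visited page, replacing any earlier copy of the same URL."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )
    conn.commit()


def search(conn: sqlite3.Connection, words: str) -> List[Tuple[str, str]]:
    """Return (url, title) for pages whose title or body contains every word."""
    terms = words.split()
    if not terms:
        return []
    clause = " AND ".join("(title LIKE ? OR body LIKE ?)" for _ in terms)
    # Two placeholders per term: one for the title, one for the body.
    params = [f"%{term}%" for term in terms for _ in range(2)]
    rows = conn.execute(
        f"SELECT url, title FROM pages WHERE {clause} ORDER BY url", params
    )
    return rows.fetchall()
```

`LIKE` scans every row and treats `%` and `_` in a search term as wildcards, which is fine for a few thousand pages but is the first thing to replace as the archive grows.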

▲

basch

1 month ago

[-]

Microsoft Recall wasn’t popular because it was Microsoft, but it is actually what people want.

▲

jll29

1 month ago

[-]

The mmap() call is actually not ISO C, but part of the IEEE/IEC POSIX.1 (2024) Standard (since about 1996?).

Check for yourself: mmap does not occur in the C standard document: https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definit...

▲

NiallBunting

1 month ago

[-]

Another thing I do for software projects is use the exact same make template [1].

I also try to add any other commands to it as they come up. So much easier to run 'make install' whenever I pull a project than have to remember the commands.

Even if I can't always add the process, I will use a bunch of echos to bring me through the steps.

1. https://niallbunting.com/commake/

▲

arjie

1 month ago

[-]

I just post these things on my blog and then use the built-in search engine. One thing I do with my email and Reddit comments is that I use the GDPR/CCPA data request forms and then run them through GPT Embedding and stick it in a sqlite/duckdb (there's no real difference for me between the two) and then put that on a recurring job where my claw can read from it via a skill. This has proven strangely useful.
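A rough sketch of the embed-and-store step, assuming nothing about the actual setup beyond "embeddings in SQLite". `toy_embed` is a deliberately crude stand-in for a real embedding API (a hashed bag of words), and the `docs` table and function names are hypothetical. Swapping in a real model only means passing a different `embed` callable.

```python
import json
import math
import sqlite3
import zlib
from typing import Callable, List, Tuple

DIMENSIONS = 64  # toy size; real embedding models return far longer vectors


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding API: a normalised hashed bag of words."""
    vector = [0.0] * DIMENSIONS
    for word in text.lower().split():
        vector[zlib.crc32(word.encode("utf-8")) % DIMENSIONS] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def add_document(conn: sqlite3.Connection, text: str,
                 embed: Callable[[str], List[float]] = toy_embed) -> None:
    """Store the text next to its embedding, serialised as JSON."""
    conn.execute("CREATE TABLE IF NOT EXISTS docs (text TEXT, embedding TEXT)")
    conn.execute("INSERT INTO docs VALUES (?, ?)", (text, json.dumps(embed(text))))
    conn.commit()


def nearest(conn: sqlite3.Connection, query: str, k: int = 3,
            embed: Callable[[str], List[float]] = toy_embed) -> List[Tuple[float, str]]:
    """Brute-force cosine similarity over every stored row, best match first."""
    q = embed(query)
    scored = []
    for text, blob in conn.execute("SELECT text, embedding FROM docs"):
        vector = json.loads(blob)
        # Both vectors are unit length, so the dot product is the cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, vector)), text))
    return sorted(scored, reverse=True)[:k]
```

Scanning every row on each query is perfectly workable for a personal archive of emails and comments; an approximate-nearest-neighbour index only starts to matter at much larger sizes.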

I haven't found a way to automate this import of my data, but most of the magic is in the history not in the present. It really is incredible. I'll ask the claw to find what I said about the SFPD cruiser I once saw in the TL and boom! It's there! A mild annoyance with using my Mediawiki-based blog (which I chose because it has good support for allowing users to edit it) is that authoring is still a lot of work and I keep forgetting Draft namespace articles.

▲

indigodaddy

1 month ago

[-]

But how do we just make an AI do all of that from the article?

▲

rpigab

1 month ago

[-]

The letter "y" is off, it's bugging me.

▲

nkydr0i0