- CT log monitoring (https://github.com/CaliDog/CertStream-Server); see the sketch after this list
- Mass-scanning across IPv4 on 80/443, at the least?
- Brute-forcing subdomains on wildcards with a large DNS wordlist (like something from Assetnote: https://wordlists-cdn.assetnote.io/data/manual/best-dns-word...)
- Scraping/extracting subdomains/domains from JS
But I've never attempted to enumerate subdomains at this scale before, so I could be missing something obvious.
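For the CT log piece, here's a minimal listener sketch in Python. It assumes the CaliDog certstream client and their public endpoint; the message fields follow the documented certificate_update shape, but treat the details as a sketch, not a drop-in:

    import certstream

    def on_message(message, context):
        # Each certificate_update carries the leaf cert's CN + SANs
        # in data.leaf_cert.all_domains.
        if message["message_type"] == "certificate_update":
            for domain in message["data"]["leaf_cert"]["all_domains"]:
                print(domain)

    # Blocks and reconnects on failure; point this at your own
    # server URL if you self-host CertStream.
    certstream.listen_for_events(on_message, url="wss://certstream.calidog.io/")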
This would be useful for early detection of potential impersonation/typosquatting domains, the kind typically used for phishing/scams.
Something as simple as a configurable Levenshtein distance/Jaro-Winkler similarity check across the CN and SANs of all new certs, maybe? (Users could set the threshold to control how "noisy" they want their feed.)
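To illustrate the threshold idea, a rough pure-Python sketch. WATCHLIST and MAX_DISTANCE are made-up placeholders, and real typosquat scoring would want more normalization (e.g. comparing registrable domains only):

    WATCHLIST = ["marginalia.nu", "merklemap.com"]  # hypothetical brands to protect
    MAX_DISTANCE = 2                                # user-tunable noise threshold

    def levenshtein(a: str, b: str) -> int:
        """Classic dynamic-programming edit distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def suspicious(cert_names: list[str]) -> list[tuple[str, str, int]]:
        """Return (cert name, watched domain, distance) hits under the threshold."""
        hits = []
        for name in cert_names:
            base = name.lstrip("*.")  # drop wildcard labels
            for watched in WATCHLIST:
                d = levenshtein(base, watched)
                if 0 < d <= MAX_DISTANCE:
                    hits.append((name, watched, d))
        return hits

    # suspicious(["marginalla.nu", "mail.example.com"])
    # -> [("marginalla.nu", "marginalia.nu", 1)]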
Curious if you're running your own CertStream server, or just continuously polling known CT logs with your own implementation.
I'm not sure, but I believe that's used by Google internally for testing purposes.
For example, if you search Google it returns 120k+ results, and these useless results are at the front.
The goal is to have something exhaustive, so I'll keep them. But you're right that I probably shouldn't put them at the front. I'm not sure how important it is, though, as these results shouldn't match many queries.
Minimizing storage was a priority for me since it's just a small side-project/automation.
I've looked for information on what the hell the `flowers-to-the-world` entries are that keep popping up and have found nothing; curious what's going on there.
I found this back when I wondered the same thing: https://medium.com/@hadfieldp/hey-ryan-c0fee84b5c39
Btw, you can get our feed like this:
curl -N 'https://api.merklemap.com/live-domains?no_throttle=true'
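And a minimal Python consumer sketch for that feed. I'm assuming the endpoint streams one domain per line, which is how the curl -N output reads, but treat the line format as an assumption:

    import requests

    URL = "https://api.merklemap.com/live-domains?no_throttle=true"

    # stream=True keeps the connection open and yields lines as they arrive.
    with requests.get(URL, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            if line:  # skip keep-alive blank lines
                print(line)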
https://www.merklemap.com/search?query=marginalia.nu&page=1
doesn't catch the fact that I have like 20 viable subdomains for marginalia.nu.
Thanks for the tool.