I've been working on building a pipeline to create a DNS records database lately. The goal is to enable research as well as competitive landscape analysis on the internet.
The dataset for now spans around 4 billion records and covers all the common DNS record types:
A
AAAA
ANAME
CAA
CNAME
HINFO
HTTPS
MX
NAPTR
NS
PTR
SOA
SRV
SSHFP
SVCB
TLSA
TXT
Each line in the CSV file represents a single DNS record in the following format:
www.example.com,A,93.184.215.14
Let me know if you have any questions or feedback!
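For illustration, a line in that format can be split into three fields like this (a rough sketch; the field names owner/rtype/rdata are my own labels, not from the dataset):

```python
# Hypothetical helper: parse one record line into (owner, rtype, rdata).
# Everything after the second comma is kept as-is, since some record
# types (e.g. TXT) can contain commas in their value.
def parse_record(line: str) -> tuple[str, str, str]:
    owner, rtype, rdata = line.rstrip("\n").split(",", 2)
    return owner, rtype, rdata

print(parse_record("www.example.com,A,93.184.215.14"))
# ('www.example.com', 'A', '93.184.215.14')
```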
Passive DNS [2] has been in my toolbox for 15+ years, and is invaluable for security research / threat intelligence. Knowing the historical resolutions of something is so helpful in investigations.
Anyone interested should check out the talk by one of the DomainTools people [3] on how it can be used in investigations.
Are you passively collecting this data, or actively querying for these records?
[1] - https://www.domaintools.com/products/threat-intelligence-fee...
I wrote a documentation piece here:
I don't want to decompress 29 GB into 211 GB each time I want to make a search.
Except grep / zgrep, is there a good tool/viewer (or hex editor that can decompress parts of big files for display) for this general task?
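One option, sketched here with Python's gzip module (filenames are made up, and the sketch builds its own tiny sample so it runs standalone): stream-decompress and filter on the fly, so the full 211 GB never hits disk.

```python
import gzip

# Build a tiny compressed sample so the sketch is self-contained.
rows = ["www.example.com,A,93.184.215.14\n",
        "mail.example.org,MX,10 mx.example.org.\n"]
with gzip.open("sample.csv.gz", "wt") as f:
    f.writelines(rows)

# Stream-search the compressed file: only one decompressed line is held
# in memory at a time, and nothing decompressed is written to disk.
def zgrep(path: str, needle: str) -> list[str]:
    with gzip.open(path, "rt") as f:
        return [line for line in f if needle in line]

print(zgrep("sample.csv.gz", "example.com"))
```

ripgrep's `-z`/`--search-zip` flag and `zstdgrep` do the same streaming trick from the command line, if you'd rather not script it.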
staging.pannekoeken-poffertjes-restaurant-amstelland.nl,CNAME,www.pannekoeken-poffertjes-restaurant-amstelland.nl.
staging.pannekoeken-poffertjes-restaurant-amstelland.nl,CNAME,www.pannekoeken-poffertjes-restaurant-amstelland.nl.
www.domiciliatuempresa.com,CNAME,domiciliatuempresa.com.
www.domiciliatuempresa.com,CNAME,domiciliatuempresa.com.
*.autokozmetikakaposvar.hu,CNAME,autokozmetikakaposvar.hu.
*.autokozmetikakaposvar.hu,CNAME,autokozmetikakaposvar.hu.
c7ac691a.oob-nuq1907.indubitably.xyz,CNAME,oob-nuq1907.hosts.secretcdn.net.
c7ac691a.oob-nuq1907.indubitably.xyz,CNAME,oob-nuq1907.hosts.secretcdn.net.
etc.
I may improve that in future releases.
You can also avoid unnecessary data by analyzing CNAME records. -- domain.tld CNAME www.domain.tld -- In that case you only need to keep either the domain.tld or the www.domain.tld record.
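As a sketch of that idea (the helper name and the exact matching rule are mine, not the author's): flag CNAME rows whose owner and target are the same name modulo a leading "www." label, so one of the pair can be dropped.

```python
# Hypothetical filter: a CNAME row is redundant if the owner and target
# differ only by a leading "www." label.
def is_www_apex_twin(owner: str, rtype: str, target: str) -> bool:
    if rtype != "CNAME":
        return False
    t = target.rstrip(".")  # strip the trailing dot of an absolute name
    return t == "www." + owner or owner == "www." + t

print(is_www_apex_twin("domain.tld", "CNAME", "www.domain.tld."))  # True
print(is_www_apex_twin("a.tld", "A", "b.tld"))                     # False
```

Note that dropping one side does lose information: the two names are separate records and could diverge (or one could stop resolving) in a later snapshot.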
You need timestamps, or first / last seen.
Records don't exist in a vacuum. They come in RRsets. They are served (sometimes inconsistently) by different nameservers. Some use cases care about this.
Records which don't resolve are also useful, especially for use cases which amount to front-running. On any given day if the wind was blowing the right direction .belkin could be one of the top 10 non-resolving TLDs. If your data is any good, check under .cisco for stuff which resolves to 127.0.53.53. ;-)
Information about provenance (where the data comes from) is required for some use cases.
We shipped Farsight's DNSDB on one or more 1TB drives, depending on what the customer was purchasing.
This means that from country A I can get record X, while from country B the record can be Y.
It would be great if you could add a new column to the CSV flagging such variations (Y/N).
Does it include expired domains?
I plan to do two releases a month for now; the goal is one a day.
> Does it include expired domains?
Yes.
That's quite a fun project!
https://www.merklemap.com/documentation/how-it-works
Basically the same process here but using that data to perform DNS queries.