It also seems like a huge effort to come up with a data model for the normalized dataset. They mentioned it in the article as an aside, but it seems like a pretty tough task.
And my last thought is that perhaps there is a sadness in our loss of population-level insight into health with the advent of modern privacy concerns. A big source for genomics data is the UK Biobank, which ties all sorts of information to a genome. I’m sure that someone could come up with dangers that this presents, but it’s been a gift to so many researchers over time, and to so many people who suffer from genetic disease. I hope that in time people will be willing to volunteer sufficient information to be able to do population-level science, even knowing the dangers inherent.
If you’re in the US, All of Us accepts participation, and I found that signing up was quite easy. They will give you all the scary warnings, which are good to consider, but I hope many will find it worthwhile even knowing the risks.
The other day I came across this pricing dataset: https://oria-data.trillianthealth.com/ (it only covers pricing, though).
There must be some gem datasets like these - I wish I had the time (and expertise) to explore them.