How to sequence your DNA for <$2k
125 points
8 hours ago
| 16 comments
| maxlangenkamp.substack.com
| HN
teekert
6 hours ago
[-]
We are not “in the nanopore era of sequencing”. We are (still) firmly in the sequencing by synthesis era.

Yes, it requires chopping the genome into small(er) pieces (than with Nanopore sequencing) and then reconstructing the genome based on a reference (and this has its issues). But Nanopore sequencing is still far from perfect due to its high error rate. Any clinical sequencing is still done using sequencing by synthesis (at which Illumina has gotten very good over the past decade).

Nanopore devices are truly cool, small and comparatively cheap though, and you can compensate for the error rate by just sequencing everything multiple times. I’m not too familiar with the economics of this approach though.

With sbs technology you could probably sequence your whole genome 30 times (a normal “coverage”) for below 1000€/$ with a reputable company. I’ve seen 180$, but not sure if I’d trust that.

reply
bonsai_spool
5 hours ago
[-]
> But Nanopore sequencing is still far from perfect due to its high error rate. Any clinical sequencing is still done using sequencing by synthesis (at which Illumina has gotten very good over the past decade).

There is no reason for Nanopore to supplant sequencing-by-synthesis for short reads - that's largely solved and getting cheaper all the while.

The future clinical utility will be in medium- and large-scale variation. We don't understand this in the clinical setting nearly as well as we understand SNPs. So Nanopore is being used in the research setting and to diagnose individuals with very rare genetic disorders.

(edit)

> We are not “in the nanopore era of sequencing”. We are (still) firmly in the sequencing by synthesis era.

I also strongly disagree.

SBS is very reliable and it's common, but prevalence alone doesn't define an era (if Toyota is the most popular car, does that mean we're in the Toyota internal combustion era? Or can Waymo still matter despite its small footprint?).

Novelty in sequencing is coming from ML approaches, RNA-DNA analysis, and combining long- and short-read technologies.

reply
teekert
5 hours ago
[-]
I agree with you. Long reads lead to new insights and over time to better diagnoses by providing better understanding of large(r) scale aberrations, and as the tech gets better it will be able to do so more easily. But it is really not there yet. It’s mostly research and somehow it’s not really improving as much as hoped, I get the feeling.
reply
Metacelsus
5 hours ago
[-]
> you can compensate for the error rate by just sequencing everything multiple times.

Usually, but sometimes the errors are correlated.

Overall I agree, short read sequencing is a lot more cost effective. Doing an Illumina whole genome sequence for cell line quality control (at my startup) costs $260 in total.

reply
Danjoe4
1 hour ago
[-]
Nanopore is good for hybrid sequencing. You can align the higher-quality Illumina reads against its longer contiguous reads.
reply
BobbyTables2
2 hours ago
[-]
I’ve always wondered how the reconstruction works.

It would be difficult to break a modest program into basic blocks and then reconstruct it. Same with paragraphs in a book.

How does this work with DNA?

reply
bonsai_spool
2 hours ago
[-]
This is very easily googled. There are new algorithmic advances for new kinds of sequencing data, but this is the key idea (published in 1994):

https://en.wikipedia.org/wiki/Burrows–Wheeler_transform
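To make the link a bit more concrete, here is a minimal sketch of the idea, not what any production aligner actually ships: build the Burrows–Wheeler transform of a reference naively (by sorting rotations) and then count exact matches of a short read with backward search. The function names `bwt` and `count_occurrences` are made up for illustration, and the "reference" is a toy string. Real tools use compressed indexes and tolerate mismatches.

```python
def bwt(text):
    """Naive Burrows-Wheeler transform: sort all rotations of text + '$'
    and take the last character of each. '$' sorts before any letter."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count exact occurrences of pattern in the original text using
    backward search over the BWT (the core of an FM-index lookup)."""
    # first_occurrence[c] = number of characters in the text smaller than c,
    # i.e. the row where the block of suffixes starting with c begins.
    first_occurrence = {}
    for row, char in enumerate(sorted(bwt_str)):
        first_occurrence.setdefault(char, row)

    def occ(char, row):
        # Occurrences of char in bwt_str[:row]. A real index precomputes this.
        return bwt_str[:row].count(char)

    lo, hi = 0, len(bwt_str)  # current range of matching suffixes
    for char in reversed(pattern):
        if char not in first_occurrence:
            return 0
        lo = first_occurrence[char] + occ(char, lo)
        hi = first_occurrence[char] + occ(char, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reference = "banana"  # toy stand-in for a reference genome
    index = bwt(reference)
    print(index)
    print(count_occurrences(index, "ana"))
```

The point of the trick is that the lookup cost depends on the read length rather than on scanning the whole reference, which is why it suits mapping millions of short reads.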

reply
Onavo
4 hours ago
[-]
You can get it pretty damn cheap if you are willing to send your biological data overseas. Nebula Genomics and a lot of other biotechs do this by essentially outsourcing to China. There's no particular technology secret, just cheaper labor and materials.
reply
Aurornis
7 hours ago
[-]
Interesting concept, but between the broken hardware and the way they gave up before getting anything useful this article was rather disappointing:

> Another problem was our flow cell was malfunctioning from the start — only 623 out of 2048 pores were working.

Is this normal for the machine? Is there a better write up somewhere where they didn’t give up immediately after one attempt?

reply
homeless_engi
6 hours ago
[-]
Hi, believe it or not, I have actually done what the authors were attempting. I used saliva rather than blood as a source of DNA and extracted it using a Qiagen kit.

My Nanopore flow cell had nearly every pore working from the start. So I would say that is not normal. Maybe it was stored incorrectly.

reply
LolWolf
49 minutes ago
[-]
Do you have a write up somewhere? If not, it would be amazing if you wrote one!

I was planning on doing a similar thing (also with saliva) once I finished moving in and had a bit more time after conferences. (But, of course, I’d have to go through and actually figure out all of the mechanics and so on.)

reply
MillironX
2 hours ago
[-]
> Is this normal for the machine?

No, it's not "normal," but it is fairly common. When I worked in NGS, nearly 1/4 of flow cells were duds. ONT used to have a policy where you could return the cell and get a new one if it failed its self-test.

reply
sbassi
7 hours ago
[-]
It depends on the sample. Usually you have at least 1200, with a guarantee of at least 800, so maybe he could ask for a refund.
reply
refurb
2 hours ago
[-]
Like most analytical methods, the preparation of the sample is key. High quality output comes with careful sample prep so that the analytical process can run optimally.
reply
dunk010
7 hours ago
[-]
Nebula and Dante will do this for like $300, and you can get 30x coverage at every base or even 100x coverage if you pay a little more. The $1000 genome was here more than a decade ago.
reply
zaptheimpaler
7 hours ago
[-]
I wanted to try this, but I looked into Nebula a bit more.

Nebula is facing a class action for apparently disclosing detailed genomic data to Meta, Microsoft & Google. The subreddit is also full of reports of people who never received their results years after sending their kits back. There are also concerns about the quality of sequencing and false positives in all DTC genomics testing. Given what happened with 23andme as well and all of this stuff, I'm wary of sending my genetic data to any private company.

reply
mquander
6 hours ago
[-]
I was interested to read this because some time ago I had my genome sequenced by Nebula. If you look at the lawsuit you can see that what Nebula did was use off-the-shelf third-party analytics products on their website, including recording analytics pings when users buy a kit, and pings when users use the Nebula website to browse Nebula's high-level analysis of their traits (leaking that the user has those traits to the analytics provider.)

This behavior represents a contemptible lack of respect for users' privacy, but it's important to distinguish it from Nebula selling access to users' genomes.

https://www.classaction.org/media/portillov-nebula-genomics-...

reply
zaptheimpaler
5 hours ago
[-]
That's a good clarification. I read through some of that link, and it does look relatively benign - Meta & Google pixels might see when you buy a kit but nothing more, but on page 21 they directly leaked genetic information to Microsoft via their Clarity tracker. Not intentionally maybe, questionable if it can be linked to a person specifically instead of just an advertising ID but they did leak that. I think the lawsuit says that even disclosing whether a person has undergone genetic testing is in violation of GIPA, so the information they sent to all 3 is enough to violate that.

I don't have any evidence they're selling anything but that lawsuit shows pretty sloppy behaviour for a company that should be thinking very deeply about privacy. I guess that's about what you said though :)

reply
busterarm
5 hours ago
[-]
The point isn't what they are doing with your data now, but that they retain your data and what might happen in the future. Someone with malicious designs on your DNA might buy Nebula tomorrow and there's nothing you can do about it.
reply
mquander
3 hours ago
[-]
Actually, the main reason I used Nebula was that they advertised a credible-to-me promise that you could download and permanently delete your data upon request. That was some years ago, so I don't know if I would trust them today. But that was their claim, and I have no reason to believe they didn't delete my data.
reply
Aurornis
7 hours ago
[-]
> There are also concerns about the quality of sequencing and false positives in all DTC genomics testing.

Even when the raw results are accurate there is a cottage industry of consultants and snake-oil sellers pushing bad science based on genetic testing results.

Outside of a few rare mutations, most people find their genetic testing results underwhelming or hard to interpret. Many of the SNPs come with mild correlations like “1.3X more likely to get this rare condition” which is extremely alarming to people who don’t understand that 1.3 times a very small number is still a very small number.
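A quick sketch of that arithmetic, with made-up numbers: the 1-in-10,000 baseline below is purely hypothetical, and only the 1.3X figure comes from the example above.

```python
def absolute_risk(baseline_risk, relative_risk):
    """Absolute risk for a carrier, given the population baseline risk
    and the relative risk attached to the variant."""
    return baseline_risk * relative_risk


if __name__ == "__main__":
    baseline = 1 / 10_000      # hypothetical baseline for a rare condition
    relative = 1.3             # "1.3X more likely"
    carrier = absolute_risk(baseline, relative)
    # The increase in absolute terms is what actually matters to the person.
    print(f"baseline risk: {baseline:.4%}")
    print(f"carrier risk:  {carrier:.4%}")
    print(f"absolute increase: {carrier - baseline:.4%}")
```

Under those assumed numbers the absolute increase is a few thousandths of a percentage point, which is the "still a very small number" part.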

The worst are the consultants and websites that take your files and claim to interpret everything about your life or illness based on a couple SNPs. Usually it’s the famous MTHFR variants, most of which have no actual impact on your life because they’re so common. Yet there are numerous Facebook groups and subreddits telling you to spend $100 on some automated website or consultant who will tell you that your MTHFR and COMT SNPs explain everything about you and your ills, along with which supplements you need to take (through their personal branded supplement web shop or affiliate links, of course).

reply
phyzome
6 hours ago
[-]
Yeah, the only way I would ever do DNA sequencing is anonymously...
reply
jjallen
5 hours ago
[-]
Because of public family trees potentially linking a genome to a family, no DNA is fully anonymous these days.
reply
freehorse
7 hours ago
[-]
Yeah, but then basically somebody else gets ownership of your genetic data and gets the right to do anything with it in the context of their "legitimate interests". Not to mention the probability of that company getting hacked or sold, as has already happened with some.
reply
sbassi
7 hours ago
[-]
Yes, the difference here is that the $1000 tag is an "at-scale" price. You reach that price point by doing multiple sequencing runs with one set of reagents.
reply
subroutine
6 hours ago
[-]
Does Nebula or Dante provide BAM or just VCF?
reply
Metacelsus
6 hours ago
[-]
Both do. I got mine through Dante, my wife through Nebula.
reply
conradev
6 hours ago
[-]
Dante includes a BAM
reply
kyriakos
26 minutes ago
[-]
What is the practical use of having your DNA sequenced?
reply
jasongill
7 hours ago
[-]
Unfortunately, the "MinION Starter Kit" for $1000 appears to no longer be available; the link in the article to the kit goes to a 404 page, and the cheapest MinION device with flow cells is now $4950 USD
reply
jolmg
7 hours ago
[-]
Article was posted 2 days ago...
reply
greazy
6 hours ago
[-]
The article author probably bought the starter kit a while ago. It might explain why the pore count was low. It's a biological product so it degrades over time.
reply
numpad0
5 hours ago
[-]
These are by no means a new product. I think the early prototypes for these possibly predate the microUSB plug.

The brochures always showed it next to a completely non-sterile laptop, but it never made sense. It's fundamentally a piece of bio lab equipment, just small. You probably should be wiping the package with disinfectant, use DNA-cides as needed, or follow whatever bioscience people consider the basic common sense hygiene standards.

reply
bonsai_spool
5 hours ago
[-]
> The brochures always showed it next to a completely non-sterile laptop

This can be done in the field (read: near a lot of dirt). This does not require sterility at all. The main problems with this are keeping your prep clean (which is different from sterile; primarily involves not getting bubbles where they shouldn't be, etc.) and temperature/salt handling.

> These are by no means a new product. I think the early prototypes for these possibly predate the microUSB plug.

> You probably should be wiping the package with disinfectant, use DNA-cides as needed, or follow whatever bioscience people consider the basic common sense hygiene standards.

The consumable product is what needs to be stored carefully. It's delivered DNA-free; no disinfectant is needed. It's actually hard for accidental DNA to be introduced at the sequencing step; that would usually reflect poor practices earlier on.

reply
arjie
2 hours ago
[-]
I used Nebula (seems to be rebranded and more expensive now) for my wife and me, and for my parents and brother, and it was pretty straightforward. I paid for the 'lifetime' plan but they removed it before we did it for anyone else and it was pretty reasonable. I downloaded the FASTQ files and stuck it in an R2 bucket for myself. Nebula cost about $250 and there's a monthly $50 or something plan that's compulsory but you can cancel it right away.

If you're curious about my genome, here are my VCF files https://my.pgp-hms.org/profile/hu81A8CC

If you want to indulge your curiosity some more:

     $ rg "20189511" /Users/george/tmp/genome/nebula_roshan_NG1AW8W7PU.mm2.sortdup.bqsr.hc.vcf
     3499829:chr13 20189511 rs104894396 C T 252.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.54;ClippingRankSum=0.00;DB;DP=25;ExcessHet=3.0103;FS=4.008;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=10.11;ReadPosRankSum=0.666;SOR=0.160 GT:AD:DP:GQ:PL 0/1:15,10:25:99:281,0,436
Put that into an LLM or look it up here https://www.snpedia.com/index.php/Rs104894396 to find out which pathogenic mutation I am heterozygous for.
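If you'd rather poke at the record yourself, here is a rough sketch that splits that line into the standard VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, sample) using only the standard library. The helper names are made up for illustration, the `3499829:` prefix that rg adds is stripped by hand, and it assumes a single-sample record with whitespace-separated columns as pasted above (real VCF files use tabs).

```python
import re

# The record from the rg output above, minus the "3499829:" line-number prefix.
RECORD = (
    "chr13 20189511 rs104894396 C T 252.77 . "
    "AC=1;AF=0.500;AN=2;BaseQRankSum=1.54;ClippingRankSum=0.00;DB;DP=25;"
    "ExcessHet=3.0103;FS=4.008;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;"
    "QD=10.11;ReadPosRankSum=0.666;SOR=0.160 "
    "GT:AD:DP:GQ:PL 0/1:15,10:25:99:281,0,436"
)


def parse_vcf_record(line):
    """Parse one single-sample VCF data line into a plain dict."""
    chrom, pos, var_id, ref, alt, qual, flt, info, fmt, sample = line.split()
    info_fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_fields[key] = value if value else True  # flags like DB have no value
    sample_fields = dict(zip(fmt.split(":"), sample.split(":")))
    return {
        "chrom": chrom, "pos": int(pos), "id": var_id, "ref": ref, "alt": alt,
        "qual": float(qual), "filter": flt,
        "info": info_fields, "sample": sample_fields,
    }


def is_heterozygous(genotype):
    """True if the genotype (e.g. '0/1' or '0|1') has two different alleles."""
    return len(set(re.split(r"[/|]", genotype))) > 1


def alt_allele_fraction(allelic_depths):
    """Fraction of reads supporting the alt allele, from an AD value like '15,10'."""
    ref_reads, alt_reads = (int(x) for x in allelic_depths.split(",")[:2])
    return alt_reads / (ref_reads + alt_reads)


if __name__ == "__main__":
    record = parse_vcf_record(RECORD)
    print(record["id"], record["ref"], ">", record["alt"])
    print("heterozygous:", is_heterozygous(record["sample"]["GT"]))
    print("alt fraction:", alt_allele_fraction(record["sample"]["AD"]))
```

The GT field is the genotype call and AD is the read count per allele, so together they tell you how well supported the heterozygous call is.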

In practice, when my wife and I did carrier screening we didn't do it with Nebula, but carrier screening also confirmed that we had GJB2-related hearing loss genes in common. The embryos of our prospective children were also sequenced so that we could have a child without the condition.

Anyway, if you'd like a test file of a real human to play with, there's mine (from Nebula) for you to take a look at. If you use an LLM you can have some fun looking at this stuff (you can see I'm a man because there are chrY variants in there).

I also used Dante because I wanted to compare the results of their sequencing and variant calling. Unfortunately, they have a different way to tie the sequence back to the user (you take the code they have and keep it safe; Nebula has you put the stuff in a labeled container so it's already mapped by them) and I was in a hurry with other stuff. They never responded to me with any assistance on the subject - not even to refuse the request to get the code for that address - so I have no idea how they work.

The nanopore stuff is very cool, but I heard (on Twitter) there were quality control issues with the devices. I'd love to try it some time later just to line it up with my daughter's genome.

reply
greazy
6 hours ago
[-]
The thermocycler replacement using an electric kettle is hilarious. That's how old-school DNA amplification would happen before the invention of thermocyclers.

OP, you'd get better results if you centrifuge your blood, extract the white blood cells and sequence those instead of whole blood. That's a bit tricky with a lance and a tiny device though...

reply
optionalsquid
6 hours ago
[-]
It's cool that nanopore technologies are getting this affordable, but keep in mind that these technologies (to my knowledge) still have very high error rates compared to older sequencing techniques, both in terms of individual nucleotides (A, C, G, and T) being misread and in terms of stretches of nucleotides being mistakenly added to or removed from the resulting sequences (indels).

So, yes, you can sequence your genome relatively cheaply using these technologies at home, but you won't be able to draw any conclusions from the results

reply
greazy
6 hours ago
[-]
With the recent R10 flow cells the error rate has improved. The basecalling models have also been steadily improving and therefore reducing the error rate.

For assembling a bacterial genome the consensus error rate is as low or in some cases better than Illumina.

The Nanopore platform has its use cases that Illumina falls short on.

> So, yes, you can sequence your genome relatively cheaply using these technologies at home, but you won't be able to draw any conclusions from the results

Agreed, at-home sequencing should not be used to draw any conclusions.

reply
Ovah
6 hours ago
[-]
That's a prevalent misconception even in the scientific community. Sure, each read has 1% incorrect bases (0.01). But each segment of DNA is read many times over. More or less 0.01^(many times) ≈ 0 incorrect bases.
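A slightly more careful version of that back-of-envelope, as a sketch under loud assumptions: errors are treated as independent between reads, and every wrong read is assumed to vote for the same wrong base (which is pessimistic). The consensus is then wrong only when more than half the reads at a site are wrong, which is a binomial tail rather than 0.01^n. The function name is made up for illustration.

```python
from math import comb


def majority_vote_error(per_read_error, depth):
    """Probability that a simple majority vote over `depth` independent reads
    calls the wrong base, assuming all erroneous reads agree with each other."""
    p = per_read_error
    min_wrong = depth // 2 + 1  # strictly more than half the reads are wrong
    return sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(min_wrong, depth + 1)
    )


if __name__ == "__main__":
    for depth in (1, 3, 5, 10, 30):
        print(depth, majority_vote_error(0.01, depth))
```

This only holds if the errors really are independent. As pointed out elsewhere in the thread, errors are sometimes correlated (the same sequence context gets misread every time), and in that case extra depth does not wash them out.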
reply
optionalsquid
6 hours ago
[-]
The author got less than 1x coverage for their efforts. To get reliable base-calls, you need significantly higher coverage, and therefore a significantly higher spend.
reply
bonsai_spool
5 hours ago
[-]
> That's a prevalent misconception even in the scientific community. Sure, each read has 1% incorrect bases (0.01). But each segment of DNA is read many times over. More or less 0.01^(many times) ≈ 0 incorrect bases.

That's true in targeted sequencing, but when you try to sequence a whole genome, this is unlikely.

reply
optionalsquid
5 hours ago
[-]
> That's true in targeted sequencing, but when you try to sequence a whole genome, this is unlikely.

Whole-genome shotgun sequencing is pretty cheap these days.

The person you are replying to doesn't give any specific numbers, but in my experience, you aim for 5-20x average coverage for population level studies, depending on the number of samples and what you are looking for, and 30x or higher for studies where individuals are important.

For context, coverage refers to the (average) number of resulting DNA sequences that cover a given position in the target genome. Though there is of course variation in local coverage, regardless of your average coverage, and that can result in individual base-calls being more or less reliable.
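As a rough sketch of how those numbers relate: mean coverage is just total sequenced bases divided by genome size, and if reads land uniformly at random (a Poisson assumption that real data violates), the expected fraction of positions hit at least once at mean coverage c is about 1 - e^(-c). The 3.1 Gb human genome size and the function names below are assumptions for illustration.

```python
import math

HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size


def mean_coverage(total_bases, genome_size=HUMAN_GENOME_BP):
    """Average number of reads covering each position."""
    return total_bases / genome_size


def bases_needed(target_coverage, genome_size=HUMAN_GENOME_BP):
    """Total sequenced bases required to reach a target mean coverage."""
    return target_coverage * genome_size


def expected_fraction_covered(coverage):
    """Expected fraction of positions covered at least once, assuming reads
    are placed uniformly at random (Poisson model)."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    for target in (1, 5, 20, 30):
        gigabases = bases_needed(target) / 1e9
        covered = expected_fraction_covered(target)
        print(f"{target}x: {gigabases:.1f} Gb of reads, "
              f"{covered:.2%} of positions expected to be hit at least once")
```

Even under this idealized model, 1x mean coverage leaves a large share of positions with no reads at all, which is why averages in the tens are the usual target.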

reply
bonsai_spool
5 hours ago
[-]
I’m referring to the experiment done in the OP - the most I’ve read about from a MinION flow cell is 8 Gb (and this is from cell line preps with tons of DNA, so the coverage isn’t great).

You need multiple flow cells or a higher capacity flow cell to get anything close to 1X on an unselected genome prep.

Shotgun sequencing probably isn’t what you meant to say - this is all enzymatic or, if it’s sonicated, gets size selected.

reply
optionalsquid
4 hours ago
[-]
What the person you replied to described read like short read sequencing with PCR amplification to me ("each segment of DNA is read many times over"), rather than nanopore sequencing. My reply to you was written based on that (possibly false) assumption.

But if we are talking nanopore sequencing, then yes, you need multiple flowcells. Which is not a problem if you are not a private person attempting to sequence your own genome on the cheap

reply
bonsai_spool
3 hours ago
[-]
There wasn’t enough information to tell (on my 1 minute scan) which nanopore kit was used, but the presence of PCR does not imply short reads.

You can do nanopore PCR/cDNA workflows right up to the largest known mRNAs (13kb).

Edit:

I’m not sure if you’re saying that you can’t do a 5/20/30X genome on nanopore - that’s also not true. It only makes sense in particular research settings, of course.

reply
IceHegel
6 hours ago
[-]
Who can do this with good data controls? I don't want to have to dig through the fine print of some Terms of Service page to figure out if a sequencing company is going to save a copy of my genetic code for possible future use.
reply
greazy
6 hours ago
[-]
I sequenced my genome about 10 years ago using the Illumina platform for ~1200 AUD. We used a university sequencing facility. They were happy to extract and sequence the DNA using a shotgun approach. Depth was 5x and I think we achieved about 90% coverage. It was just for fun.

The issue with this approach is that you'll receive raw data that needs to be processed. Even after processing you'll need to do further analysis to answer your questions. After all this, I'd be suspicious of the results and seek a medical counsellor to discuss and perform further tests.

I'd advise thinking about what questions you want answered. 'Sequencing your genome' sounds amazing but imo you're better off seeking accredited tests with actionable results.

reply
FL33TW00D
5 hours ago
[-]
Dante and Nebula have a bad reputation. ySeq has an 8-month wait list. This guy's Nanopore sequencer doesn’t work.

It is quite hard to get yourself sequenced in the EU in 2025.

reply
coppa
7 hours ago
[-]
Speaking of which, I would recommend Svante Pääbo's Neanderthal Man: In Search of Lost Genomes and then, even better imho, The Naked Neanderthal by Ludovic Slimak. After these books I spent many hours listening to the full courses of Jean-Jacques Hublin, chair of Paleoanthropology at the Collège de France (in French, but probably translatable now with automatic tools?). This was an unexpected and wonderful path.
reply
nashashmi
6 hours ago
[-]
If I have my genome data, where can I get it analyzed? For ancestry? For health info? Etc. With privacy, of course!
reply
Real_S
4 hours ago
[-]
Take a look at Monadic DNA:

https://monadicdna.com/

They are building Fully Homomorphic Encryption (FHE) and Multiparty Computation (MPC) tools for genetic data. Your data format may need to be modified. They currently focus on the SNP results from places like Ancestry.

Some HN posts from their CEO:

https://news.ycombinator.com/submitted?id=vishakh82

reply
isbvhodnvemrwvn
5 hours ago
[-]
Forget any use for ancestry with privacy guarantees. All you'll get is magic "ethnicity" percentages, a kind of astrology of genealogy. For it to be useful in a genealogy context you need to rely on matching and analyzing common ancestors. This will inherently lead to your data being shared in one way or another and possibly your identity being revealed.
reply
cariaso
3 hours ago
[-]
reply
shevy-java
4 hours ago
[-]
Wasn't the cost a few years ago below 1000 already?
reply
jaberjaber23
5 hours ago
[-]
Nanopore’s getting closer
reply
7e
7 hours ago
[-]
Just wait for the Nebula Black Friday sale.
reply
pixelpoet
7 hours ago
[-]
> ‘Sequencing by synthesis’. instead of chopping up and separating each base pair through a gel lattice, we [cuts off]

k

> 200 µL of blood (about ⅕ of a ml)

"About"? Anyway, thanks for the clarification.

reply
NuclearPM
6 hours ago
[-]
Maybe the “about” was supposed to cover the 200 µL as well.
reply