Assembly: Solving Really Big Puzzles

More Than One Way to Skin a Genome
[Image: colony picker robot.]
The Human Genome Project was really a race between two projects, both attempting to assemble the entire human genetic sequence. The older, classical method used by the government's team worked like this:
- Get DNA from a person
- Break it into pieces
- Clone pieces into plasmids
- Put plasmids in bacteria
- Isolate individual bacterial colonies (each colony has one piece of human DNA on a plasmid)
- Sequence each plasmid from each isolated colony
- Put pieces together
Shotgun Sequencing
"He puzzled and puzzled 'til his puzzler was sore ..." - Dr. Seuss

The newer method, used by Craig Venter's team, was more direct:
- Get DNA from a person
- Break it into pieces
- Sequence all the pieces at once
- Put them back together using a complicated computer program (the first assembler)
[Figure reproduced from reference 1.]
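To get a feel for the "break it into pieces" step, here is a minimal Python sketch that cuts random reads out of a made-up toy sequence. The function name `shotgun_reads`, the read length, and the number of reads are all assumptions chosen for illustration; real shotgun sequencing shears physical DNA, it does not slice strings.

```python
import random


def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Return n_reads random substrings of length read_len taken from genome."""
    rng = random.Random(seed)  # fixed seed so the toy run is repeatable
    last_start = len(genome) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


# A made-up 27-letter toy "genome" (hypothetical, for illustration only).
genome = "CGAATCGTCGCAATTCGTCGTCGCTCG"
reads = shotgun_reads(genome, read_len=16, n_reads=6)
for read in reads:
    print(read)
```

Every read is a slice of the original, but the program that has to put them back together does not get to see where each one came from. That is the puzzle.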
Example Assembly
Check out this example:
Sequence 1: AATTCGTCGTCGCTCG
Sequence 2: CGAATCGTCGCAATTC
These sequences overlap, like so:
CGAATCGTCGCAATTC
           AATTCGTCGTCGCTCG
and can be combined into a single sequence:
CGAATCGTCGCAATTCGTCGTCGCTCG
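The same overlap-and-merge step can be sketched in a few lines of Python. This is only an illustration, not the code any real assembler uses; the helper names `overlap_len` and `merge` are made up for this example, and it only handles exact matches (no sequencing errors).

```python
def overlap_len(left, right, min_len=1):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge(left, right, min_len=1):
    """Join two sequences on their overlap, or return None if there is none."""
    k = overlap_len(left, right, min_len)
    if k == 0:
        return None
    return left + right[k:]


seq1 = "AATTCGTCGTCGCTCG"
seq2 = "CGAATCGTCGCAATTC"

print(overlap_len(seq2, seq1))  # 5  (the shared AATTC)
print(merge(seq2, seq1))        # CGAATCGTCGCAATTCGTCGTCGCTCG
print(overlap_len(seq1, seq2))  # 2  (the end of seq1 and start of seq2 are both CG)
```

Note the last line: tried the other way round, the two sequences also "overlap" by 2 bp purely by coincidence.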
This was done over and over until many of the small pieces were swallowed up into bigger pieces (bigger sequences are called "contigs" in bioinformatese). Because two pieces can match by random chance, overlaps had to be long (longer than the 5 bp in our example) to reduce the risk of joining pieces that don't belong together. Eugene Myers' team nicknamed contigs by size: small = rock, smaller = stone, smallest = pebble. By joining contigs one by one, the whole genome could be reconstructed.
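Here is a rough Python sketch of that "join the best-overlapping pair, then repeat" loop. It is a simple greedy strategy written for illustration, not a description of how the actual Human Genome assemblers worked. The names `overlap_len` and `greedy_assemble`, the `min_overlap` parameter, and the third read are all invented for this example; the first two reads are the sequences from the example above.

```python
def overlap_len(left, right, min_len):
    """Longest suffix of `left` matching a prefix of `right` (0 if < min_len)."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap):
    """Repeatedly merge the pair with the longest overlap until none is left."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that sit entirely inside another read.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap_len(a, b, min_overlap)
                if k > best_k:
                    best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing overlaps by min_overlap or more
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


reads = [
    "AATTCGTCGTCGCTCG",  # Sequence 1 from the example
    "CGAATCGTCGCAATTC",  # Sequence 2 from the example
    "CGCTCGTTAGGCA",     # a third, made-up read
]
print(greedy_assemble(reads, min_overlap=4))
# ['CGAATCGTCGCAATTCGTCGTCGCTCGTTAGGCA']
```

With `min_overlap=4` the three reads collapse into one 34-bp contig. Raise it to 7 and nothing merges at all, because the true overlaps here are only 5 and 6 bp long. Choosing that threshold is a trade-off between missing real joins and making false ones.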
The idea is simple enough, right? What makes assembly hard in real life is not the concepts; it's the sheer, overwhelming amount of data. The human genome is 3 BILLION bp long! Craig Venter's team needed a supercomputer to run the software that assembled the human genome. Assembly is now a commonplace part of biology, and there are many genomes much larger than 3 billion bp. No wonder CLC Bio has the saying: "Rocket Science is for kids, Bioinformatics is for scientists".
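A quick back-of-the-envelope calculation in Python shows how fast the numbers blow up. Only the 3 billion bp genome size comes from the text; the read length (500 bp) and coverage (each base sequenced about 5 times) are assumed round numbers picked for illustration, not the project's actual figures. The chance-overlap estimate also assumes every base is random and independent, which real genomes, with all their repeats, are not.

```python
GENOME_LEN = 3_000_000_000  # human genome size, from the text
READ_LEN = 500              # assumed read length in bp (illustrative)
COVERAGE = 5                # assumed: each base sequenced ~5 times (illustrative)

n_reads = GENOME_LEN * COVERAGE // READ_LEN
ordered_pairs = n_reads * (n_reads - 1)  # every read against every other read


def expected_chance_overlaps(k):
    """Expected number of read pairs whose ends match over k bases by luck.

    Assumes 4 equally likely, independent bases, so one specific k-base
    match has probability (1/4) ** k.
    """
    return ordered_pairs * 0.25 ** k


print(f"reads to assemble: {n_reads:,}")  # reads to assemble: 30,000,000
print(f"read pairs to consider: {ordered_pairs:.1e}")
for k in (5, 20, 40):
    print(f"chance {k}-bp overlaps expected: {expected_chance_overlaps(k):.1e}")
```

Under these assumptions, a 5 bp overlap like the one in the toy example would turn up by accident hundreds of billions of times, while a 40 bp overlap would essentially never happen by chance. Real genomes are full of repeated sequence, so the true picture is considerably worse than this random model suggests.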
For more juicy details about the people and science involved in the Human Genome Project, check out this book:
The Genome War: How Craig Venter Tried to Capture the Code of Life and Save the World by James Shreeve