Rocket Science is for Kids (No offense to all the rocket scientists out there)
As we've discussed, informatics by itself is only
about as cool sounding as cutting grass. But we're not talking about just any
old informatics. We're talking about bio-informatics.
In this case, the coolness factor increases by the number of bases stored in
GenBank. You'll find out how many that is in a minute. For now, just know that
bioinformatics is really cool and really, really important for modern biotechnology. So important, in fact, that without it biotech wouldn't exist. My job
is to convince you that such is the case. We'll start by talking about DNA
sequence. We'll talk about where it comes from and what it's used for.
DNA = Bioinformation
As we discussed in previous posts, DNA is the base
information layer for all life forms. It indirectly dictates the structure of
proteins, RNA, and all chemical reactions that go on inside the cell membrane.
It was probably not abundantly clear what DNA looks like when it's inside a
cell. Is it floating around in little pieces or one big piece? Is it free
floating or compact? Is it organized or balled up in knots? Can it break apart
and change shape? Should you envision a ball of yarn, a bowl of noodles, or a
coughed up hairball?
It depends on the organism. In prokaryotes (such as bacteria) the genome is generally a large loop of DNA,
usually millions of bases long. A prokaryote can have several loops of DNA. The
main loop of DNA is called the "chromosome". Smaller loops are generally called
"plasmids". Large plasmids are "megaplasmids". No matter
what they're called, they're all just loops of DNA. The DNA can contain
stretches of protein-coding sequence (genes), important RNA sequences (tRNAs
are an example), or the sequences may do nothing at all.


When we talk about genome size, we generally refer
to the number of bases (As, Gs, Ts and Cs) that are strung together to make the
chromosomes in an organism. In yeast, there are over 12 million base pairs
(bp), or 12 megabases. In Yersinia pestis
(the bacterium responsible for the Black Death) there are about 4.6 million base
pairs (4.6 Mbp). In humans, there are over 3 billion base pairs, about 3.08
billion (3.08 Gbp). Genome size does not
necessarily correlate with intelligence or complexity. The largest genome size
is reportedly found in Amoeba dubia.
Its genome contains 670 billion base pairs! That's 670 Gbp, or roughly 218 times larger than
your genome (that is of course assuming that you, the reader, are human).
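If you'd like to check the comparison yourself, here is a minimal Python sketch. The sizes are simply the figures quoted above, and the names `GENOME_SIZES_BP`, `format_bp`, and `fold_difference` are made up for this illustration. Dividing 670 Gbp by the 3.08 Gbp human figure gives roughly 218; rounding the human genome down to an even 3 Gbp gives about 223.

```python
# Approximate genome sizes in base pairs, copied from the figures in this post.
GENOME_SIZES_BP = {
    "Yersinia pestis": 4.6e6,
    "yeast": 12e6,
    "human": 3.08e9,
    "Amoeba dubia": 670e9,
}


def format_bp(n_bp):
    """Return a base-pair count using the largest fitting unit (kbp, Mbp, Gbp)."""
    for factor, unit in ((1e9, "Gbp"), (1e6, "Mbp"), (1e3, "kbp")):
        if n_bp >= factor:
            return f"{n_bp / factor:g} {unit}"
    return f"{n_bp:g} bp"


def fold_difference(bigger, smaller):
    """How many times larger one listed genome is than another."""
    return GENOME_SIZES_BP[bigger] / GENOME_SIZES_BP[smaller]


for name, size in sorted(GENOME_SIZES_BP.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} {format_bp(size)}")

ratio = fold_difference("Amoeba dubia", "human")
print(f"Amoeba dubia vs. human: about {ratio:.0f}x")
```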
Mountains of Bioinformation
If you, the reader, are not human, let me be clear
that in order to study a genome 670 billion bp long, a human researcher needs
a computer's help. To put it in perspective, 670 billion characters would
require 67 million sheets of paper with 5,000 characters printed on each side.
The stack of paper would be over 4 miles high. That's a single (though,
granted, huge) genome. Every organism on earth has its own genome. Most
individuals on earth have slightly different genomes from each other. That's a
lot of data.
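The paper-stack arithmetic is easy to reproduce. The sketch below assumes double-sided printing and a sheet thickness of 0.1 mm (typical office paper, but my assumption, not a measured value); the function names are invented for the example.

```python
CHARS_PER_SIDE = 5_000
SIDES_PER_SHEET = 2
SHEET_THICKNESS_MM = 0.1      # assumption: ordinary office paper
MM_PER_MILE = 1_609_344       # 1 mile = 1,609.344 m


def sheets_needed(n_chars):
    """Sheets required to print n_chars characters, rounding up."""
    chars_per_sheet = CHARS_PER_SIDE * SIDES_PER_SHEET
    return -(-n_chars // chars_per_sheet)  # ceiling division on integers


def stack_height_miles(n_sheets):
    """Height of a stack of n_sheets, in miles."""
    return n_sheets * SHEET_THICKNESS_MM / MM_PER_MILE


sheets = sheets_needed(670_000_000_000)   # one letter per base
print(f"{sheets:,} sheets")
print(f"{stack_height_miles(sheets):.1f} miles high")
```

With thicker paper the stack only gets taller, so "over 4 miles" is on the conservative side.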
All that data can be useful if we can get our hands on it. It is useful to know which sequence
of DNA codes for which proteins. If we know which DNA codes for Red Fluorescent
Protein (RFP) then we can put the RFP gene into yeast to make our bread glow red (great
Halloween gag). On a more practical note, if we know which genes code for the
proteins in the artemisinin pathway, then we can put those in yeast and make
cheaper malaria drugs. If we know which base pair mutated in a cancerous cell or a cystic fibrosis patient, we can customize their treatment.
Bioinformatics is about organizing and studying bioinformation: predicting genes and proteins from DNA sequence, predicting protein function, predicting protein shape, and designing new proteins and DNA sequences for bioengineering.
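To give a taste of what "predicting genes and proteins from DNA sequence" can look like, here is a toy Python sketch. It scans only the forward strand for a start codon (ATG) followed in-frame by a stop codon, then translates the stretch in between using the standard genetic code. Real gene finders are far more sophisticated than this; `find_orfs` and the example sequence are invented purely for illustration.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_orfs(dna, min_codons=2):
    """Return (start, end, protein) for simple open reading frames.

    Toy version: forward strand only, first ATG to the next in-frame stop.
    `end` is the index just past the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                # Translate from the start codon up to (not including) the stop.
                protein = "".join(
                    CODON_TABLE.get(dna[j:j + 3], "X")  # X = unknown codon
                    for j in range(start, i, 3)
                )
                if len(protein) >= min_codons:
                    orfs.append((start, i + 3, protein))
                start = None
    return orfs


# Made-up sequence: ATG GCT TGG TAA sits in the middle.
print(find_orfs("CCATGGCTTGGTAACC"))
```

Each one-letter code in the output stands for an amino acid (M is methionine, A is alanine, W is tryptophan), so the toy sequence above codes for the tiny "protein" MAW.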