Rocket Science is for Kids (No offense to all the rocket scientists out there)
As we've discussed, informatics by itself is only
about as cool sounding as cutting grass. But we're not talking about just any
old informatics. We're talking about bio-informatics.
In this case, the coolness factor increases by the number of bases stored in
GenBank. You'll find out how many that is in a minute. For now, just know that
bioinformatics is really cool and really, really important for modern biotechnology. So important, in fact, that without it, biotech wouldn't exist. My job
is to convince you that such is the case. We'll start by talking about DNA
sequence. We'll talk about where it comes from and what it's used for.
DNA = Bioinformation
As we discussed in previous posts, DNA is the base
information layer for all life forms. It dictates, directly or indirectly, the structure of
proteins and RNA and, through them, the chemical reactions that go on inside the cell membrane.
It was probably not abundantly clear what DNA looks like when it's inside a
cell. Is it floating around in little pieces or one big piece? Is it free
floating or compact? Is it organized or balled up in knots? Can it break apart
and change shape? Should you envision a ball of yarn, a bowl of noodles, or a
coughed up hairball?
It depends on the organism. In prokaryotes (bacteria and archaea) the genome is generally a large loop of DNA,
usually millions of bases long. A prokaryote can have several loops of DNA. The
main loop of DNA is called the "chromosome". Smaller loops are generally called
"plasmids". Large plasmids are "megaplasmids". No matter
what they're called, they're all just loops of DNA. The DNA can contain
stretches of protein-coding sequence (genes), important RNA sequences (tRNAs
are an example), or the sequences may do nothing at all.
There is no nucleus in prokaryotes. The genome
floats around the cell along with all the proteins and other cellular
components. It is not a tangled mess however. There are special proteins that
help with genome organization. Stretches of DNA wrap around these proteins, the
way we wrap unruly electronics cords around plastic cores to organize them.
When viewing the genome through a microscope, it does not look like a loop of
DNA. In prokaryotes, it looks far more like a daisy chain, with some tightly
bunched portions and other open segments.
In eukaryotes the organization is different. Eukaryotes include single-celled organisms like
yeast, and all multi-cellular organisms you can think of, such as Komodo
dragons and penguins. In eukaryotes, the genome is contained in the nucleus,
which is a separate compartment within the cell that keeps the DNA and its
transcription apart from translation, which takes place outside the nucleus. The genome
is usually found on very long, linear stretches of DNA. These linear pieces of
DNA are also called "chromosomes". The eukaryotic genome is also
highly organized, with specific proteins whose function is solely to keep the
DNA organized and compact. Under the microscope, a eukaryotic genome can look
like a bowl of spaghetti if the genome is actively being transcribed. If the
genome is compact, the chromosomes will look like lots of little worms. The
classic "X" shape you may have seen in textbooks is a view of the
chromosomes after they have been duplicated in preparation for cell division. The two
chromosomal copies are in a compact form, and joined in the middle (the joint
is called the centromere).
When we talk about genome size, we generally refer
to the number of bases (As, Gs, Ts and Cs) that are strung together to make the
chromosomes in an organism. In yeast, there are over 12 million base pairs
(bp), or 12 megabase pairs (Mbp). In Yersinia pestis
(the bacterium responsible for the plague known as the Black Death) there are 4.6 Mbp.
In humans, there are over 3 billion base pairs, about 3.08 gigabase
pairs (Gbp). Genome size does not
necessarily correlate with intelligence or complexity. The largest genome size
is reportedly found in Amoeba dubia.
Its genome is reported to contain 670 billion base pairs! That is 670 Gbp, or roughly 220 times larger than
your genome (that is of course assuming that you, the reader, are human).
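To make the comparison concrete, here is a small Python sketch that works through the sizes quoted above. The dictionary keys and the helper names (`format_bp`, `fold_difference`) are made up for illustration, and the Amoeba dubia figure is only the reported estimate. Dividing 670 Gbp by 3.08 Gbp gives about 218; using a rounded 3 Gbp for the human genome gives about 223.

```python
# Genome sizes in base pairs, taken from the figures quoted in the text.
GENOME_SIZES_BP = {
    "Yersinia pestis": 4.6e6,
    "yeast": 12e6,
    "human": 3.08e9,
    "Amoeba dubia (reported)": 670e9,
}


def format_bp(n_bp):
    """Format a base-pair count with the largest unit that fits (bp, kbp, Mbp, Gbp)."""
    for factor, unit in ((1e9, "Gbp"), (1e6, "Mbp"), (1e3, "kbp")):
        if n_bp >= factor:
            return f"{n_bp / factor:g} {unit}"
    return f"{n_bp:g} bp"


def fold_difference(larger, smaller):
    """How many times bigger one listed genome is than another."""
    return GENOME_SIZES_BP[larger] / GENOME_SIZES_BP[smaller]


# Print the genomes from smallest to largest.
for name, size in sorted(GENOME_SIZES_BP.items(), key=lambda item: item[1]):
    print(f"{name}: {format_bp(size)}")

ratio = fold_difference("Amoeba dubia (reported)", "human")
print(f"Amoeba dubia vs. human: about {ratio:.0f} times larger")
```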
Mountains of Bioinformation
If you, the reader, are not human, let me be clear
that in order to study a genome 670 billion bp long, a human researcher needs
a computer's help. To put it in perspective, 670 billion characters would
require 67 million sheets of paper with 5,000 characters printed on each side.
At roughly 0.1 mm per sheet, the stack of paper would be over 4 miles high. That's a single (though
admittedly huge) genome. Every organism on earth has its own genome. Most
individuals on earth have slightly different genomes from each other. That's a
lot of data.
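The paper-stack arithmetic can be checked in a few lines of Python. This is only a back-of-the-envelope sketch: the 5,000 characters per side comes from the text, while the sheet thickness of 0.1 mm is an assumption (typical office paper) that happens to be consistent with a stack of "over 4 miles". The function names are invented for this example.

```python
GENOME_BP = 670_000_000_000   # one printed character per base
CHARS_PER_SIDE = 5_000        # figure used in the text
SIDES_PER_SHEET = 2           # printed on both sides
SHEET_THICKNESS_MM = 0.1      # ASSUMPTION: typical office paper
MM_PER_MILE = 1_609_344       # 1 mile = 1,609.344 m


def sheets_needed(n_chars, chars_per_side=CHARS_PER_SIDE, sides=SIDES_PER_SHEET):
    """Number of sheets needed to print n_chars characters (rounded up)."""
    chars_per_sheet = chars_per_side * sides
    return -(-n_chars // chars_per_sheet)  # ceiling division on integers


def stack_height_miles(n_sheets, thickness_mm=SHEET_THICKNESS_MM):
    """Height of a stack of n_sheets sheets, in miles."""
    return n_sheets * thickness_mm / MM_PER_MILE


sheets = sheets_needed(GENOME_BP)
print(f"{sheets:,} sheets")
print(f"{stack_height_miles(sheets):.2f} miles high")
```

Changing `SHEET_THICKNESS_MM` shows how sensitive the height is to the paper you pick; the sheet count does not depend on it.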
All that data
can be useful if we can get our hands on it. It is useful to know what sequence
of DNA codes for which proteins. If we know which DNA codes for Red Fluorescent
Protein (RFP) then we can put the RFP gene into yeast to make our bread glow red (great
Halloween gag). On a more practical note, if we know what genes code for the
proteins in the artemisinin pathway, then we can put those in yeast and make
cheaper malaria drugs. If we know which base pair mutated in a cancerous cell or a cystic fibrosis patient, we can customize their treatment.
Bioinformatics is about organizing and studying bioinformation: predicting genes and proteins from DNA sequence, predicting protein function, predicting protein shape, and designing new proteins and DNA sequences for bioengineering.
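To give a taste of what "predicting genes and proteins from DNA sequence" means, here is a toy Python sketch. It finds the first open reading frame (a start codon followed, in the same frame, by a stop codon) on one strand and translates it with the standard genetic code. The function names and the example sequence are made up for illustration. Real gene prediction is far more involved: it looks at both strands, handles introns in eukaryotes, and uses statistical models.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(dna):
    """Return the first ATG...stop reading frame on the given strand, or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon from this start until we hit a stop codon.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        start = dna.find("ATG", start + 1)
    return None


def translate(dna):
    """Translate a coding sequence into one-letter amino acids, up to the first stop."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks a codon with unknown bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


example = "CCATGGCCTGGTAAGG"  # made-up sequence, not a real gene
orf = first_orf(example)
print(orf)
print(translate(orf))
```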