Bioinformatics: DNA = Bioinformation

CLC Bio


Rocket Science is for Kids (No offense to all the rocket scientists out there)
As we've discussed, informatics by itself sounds about as cool as cutting grass. But we're not talking about just any old informatics. We're talking about bio-informatics. In this case, the coolness factor increases by the number of bases stored in GenBank. You'll find out how many that is in a minute. For now, just know that bioinformatics is really cool and really, really important for modern biotechnology. So important, in fact, that without it biotech wouldn't exist. My job is to convince you of that. We'll start by talking about DNA sequence: where it comes from and what it's used for.


DNA = Bioinformation

As we discussed in previous posts, DNA is the base information layer for all life forms. It dictates, directly or indirectly, the structure of RNA and proteins, and through them the chemical reactions that go on inside the cell membrane. What we probably didn't make clear is what DNA looks like when it's inside a cell. Is it floating around in little pieces or one big piece? Is it free floating or compact? Is it organized or balled up in knots? Can it break apart and change shape? Should you envision a ball of yarn, a bowl of noodles, or a coughed-up hairball?


It depends on the organism. In prokaryotes (bacteria and archaea) the genome is generally a large loop of DNA, usually millions of bases long. A prokaryote can have several loops of DNA. The main loop is called the "chromosome". Smaller loops are generally called "plasmids". Large plasmids are "megaplasmids". No matter what they're called, they're all just loops of DNA. The DNA can contain stretches of protein-coding sequence (genes), important RNA sequences (tRNAs are an example), or sequences that have no known function at all.


There is no nucleus in prokaryotes. The genome floats around the cell along with all the proteins and other cellular components. It is not a tangled mess, however. There are special proteins that help with genome organization. Stretches of DNA wrap around these proteins, the way we wrap unruly electronics cords around plastic cores to organize them. Viewed through a microscope, a prokaryotic genome does not look like a simple loop of DNA. It looks far more like a daisy chain, with some tightly bunched portions and other open segments.


In eukaryotes the organization is different. Eukaryotes include single-celled organisms like yeast, and all multi-cellular organisms you can think of, such as Komodo dragons and penguins. In eukaryotes, the genome is contained in the nucleus, a separate compartment within the cell. The nucleus keeps transcription (which happens inside it) apart from translation (which happens outside it, in the cytoplasm). The genome is usually found on very long, linear stretches of DNA. These linear pieces of DNA are also called "chromosomes". The eukaryotic genome is also highly organized, with specific proteins whose function is solely to keep the DNA organized and compact. Under the microscope, a eukaryotic genome can look like a bowl of spaghetti if the genome is actively being transcribed. If the genome is compact, the chromosomes will look like lots of little worms. The classic "X" shape you may have seen in textbooks is a view of a chromosome after it has been duplicated in preparation for cell division. The two chromosomal copies are in a compact form and joined at a point called the centromere.


When we talk about genome size, we generally refer to the number of bases (As, Gs, Ts and Cs) that are strung together to make the chromosomes in an organism. In yeast, there are over 12 million base pairs (bp), or 12 megabase pairs (Mbp). In Yersinia pestis (the bacterium responsible for the plague) there are about 4.6 Mbp. In humans, there are over 3 billion base pairs, about 3.08 gigabase pairs (Gbp). Genome size does not necessarily correlate with intelligence or complexity. The largest genome size is reportedly found in Amoeba dubia. Its genome is said to contain 670 billion base pairs! That's 670 Gbp, or roughly 220 times larger than your genome (that is of course assuming that you, the reader, are human).
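If you'd like to check the comparison yourself, here is a minimal Python sketch. It only uses the approximate genome sizes quoted above; the dictionary name and the helper function are made up for illustration.

```python
# Approximate genome sizes in base pairs, as quoted in the text above.
# These are rounded figures, not authoritative measurements.
GENOME_SIZES_BP = {
    "yeast": 12_000_000,
    "Yersinia pestis": 4_600_000,
    "human": 3_080_000_000,
    "Amoeba dubia (reported)": 670_000_000_000,
}


def times_larger_than_human(organism: str) -> float:
    """Return how many times larger (or smaller) a genome is than the human one."""
    return GENOME_SIZES_BP[organism] / GENOME_SIZES_BP["human"]


if __name__ == "__main__":
    # Print every genome from smallest to largest, relative to human.
    for name, size in sorted(GENOME_SIZES_BP.items(), key=lambda item: item[1]):
        ratio = times_larger_than_human(name)
        print(f"{name}: {size:,} bp ({ratio:.4g}x human)")
```

Using 3.08 Gbp for the human genome, the reported amoeba genome comes out a little under 220 times larger. Using a flat 3 billion instead gives about 223.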

Mountains of Bioinformation

If you, the reader, are not human, let me be clear that in order to study a genome 670 billion bp long, a human researcher needs a computer's help. To put it in perspective, 670 billion characters would require 67 million sheets of paper with 5,000 characters printed on each side. Assuming ordinary office paper about a tenth of a millimeter thick, the stack would be over 4 miles high. That's a single (though, granted, huge) genome. Every organism on Earth has its own genome. Most individuals on Earth have slightly different genomes from each other. That's a lot of data.
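The paper-stack arithmetic is easy to reproduce. The sketch below assumes double-sided printing at 5,000 characters per side, as in the text, plus a sheet thickness of 0.1 mm. That thickness is my own assumption for typical office paper, so treat the height as a ballpark figure.

```python
# Back-of-the-envelope check of the paper stack for a 670 Gbp genome.
GENOME_BP = 670_000_000_000   # one character per base pair
CHARS_PER_SIDE = 5_000        # from the text
SIDES_PER_SHEET = 2           # double-sided printing
SHEET_THICKNESS_MM = 0.1      # ASSUMPTION: typical office paper
METERS_PER_MILE = 1609.344

# Number of sheets needed to print every base once.
sheets = GENOME_BP // (CHARS_PER_SIDE * SIDES_PER_SHEET)

# Height of the stack, converted from millimeters to meters to miles.
stack_height_m = sheets * SHEET_THICKNESS_MM / 1000
stack_height_miles = stack_height_m / METERS_PER_MILE

if __name__ == "__main__":
    print(f"Sheets needed: {sheets:,}")
    print(f"Stack height: {stack_height_m:,.0f} m ({stack_height_miles:.2f} miles)")
```

Thicker or thinner paper changes the height proportionally, but the number of sheets stays the same.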


All that data can be useful if we can get our hands on it. It is useful to know which DNA sequences code for which proteins. If we know which DNA codes for Red Fluorescent Protein (RFP), then we can put the RFP gene into yeast to make our bread glow red (great Halloween gag). On a more practical note, if we know what genes code for the proteins in the artemisinin pathway, then we can put those in yeast and make cheaper malaria drugs. If we know which base pair mutated in a cancerous cell or a cystic fibrosis patient, we can customize their treatment.

Bioinformatics is about organizing and studying bioinformation: predicting genes and proteins from DNA sequence, predicting protein function, predicting protein shape, and designing new proteins and DNA sequences for bioengineering.
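To give a small taste of "predicting proteins from DNA sequence", here is a toy Python sketch that translates a DNA coding sequence into one-letter amino acid codes using the standard genetic code. The function name and the example sequence are made up for illustration. Real gene prediction is much more involved: it has to find where genes start, deal with both DNA strands, handle introns in eukaryotes, and so on.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, reading from its first base.

    Translation stops at the first stop codon. Trailing bases that do not
    form a complete codon are ignored. Raises KeyError on non-ACGT letters.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # A made-up sequence: start codon (ATG), one alanine codon (GCC), stop (TAA).
    print(translate("ATGGCCTAA"))
```

The made-up nine-base example reads as methionine, alanine, stop, so the result is the two-letter protein "MA".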


