
Genes and Proteins


Genes and Proteins: Introduction

There are four main types of biological molecules: carbohydrates, proteins, nucleic acids, and lipids. This CELL concentrates on two of these – proteins and nucleic acids. However, before focusing on them, let’s first take a quick look at the other two major classes of biomolecules: carbohydrates and lipids.


Carbohydrates consist mainly of sugars and starches. The name carbohydrates comes from the fact that these molecules are composed mainly of carbon, hydrogen, and oxygen atoms. Sometimes carbohydrates are referred to simply as “carbs”.

As the above illustration shows, there are several different classes of carbohydrates including monosaccharides, disaccharides, and polysaccharides. The monosaccharide glucose is a very important sugar for energy metabolism. Disaccharides are formed by two monosaccharide sugars chemically bonded together. Perhaps the most common disaccharide, sucrose (table sugar), consists of the monosaccharide sugars glucose and fructose. Finally, long chains of monosaccharides bind together to make up the very large starch molecules that are involved in carbohydrate storage by cells.

Carbohydrates are extremely important dietary components. However, as the above illustration suggests, there are “good” carbohydrates and “bad” carbohydrates available for human consumption. While occasional consumption of “bad” carbohydrates is not disastrous, we should try to limit most of our carbohydrate intake to the “good” carbohydrate category.


Lipids (Fats and Oils)

Lipids consist of fats and oils. While the word “fat” doesn’t sound healthy at all, it is essential that our bodies consume a certain amount of fat for proper nutrition.

The lipids whose consumption we need to control include cholesterol, fatty acids, and triglycerides. While each of these biomolecules is essential for proper metabolism, uncontrolled consumption of fats can lead to a very serious cardiovascular disease known as arteriosclerosis.

In arteriosclerosis, lipid deposits combine with other blood components, such as calcium, to form what are known as atherosclerotic plaques on the interior lining of blood vessels. These fatty plaques can restrict the flow of blood to various parts of the body. For example, plaques in heart vessels, such as the coronary arteries, can restrict blood and oxygen flow to the heart muscle, causing chest pain during exertion (angina pectoris).

Even worse, narrowed blood vessels may suddenly become completely blocked. If the blockage occurs in the brain, it can result in a stroke. If it occurs in heart vessels, it can cause an immediate cardiac event or heart attack. Consequently, while “good” lipids and fats are important for a well-balanced diet, unlimited consumption of “bad” fats can lead to very serious health risks.


Nucleic Acids and Proteins

Nucleic acids and proteins are both organic molecules, as are carbohydrates and lipids. Both nucleic acids and proteins are also polymers. A polymer is a long, chain-like molecule of similar or identical repeating units called monomers.

Proteins are polymers of amino acids, also called polypeptides. They are called polypeptides because peptide bonds hold the individual subunits of proteins (the amino acids) together. There are 20 standard amino acids. The composition and sequence of the amino acids in a protein give it its unique function.

Proteins are very large biological molecules that control almost all functions of a cell. Proteins provide structure, catalyze chemical reactions, transport materials, regulate functions, and act as signals within and between cells.

From a dietary perspective, we obtain protein and amino acids__________


Nucleic acids are polymers of nucleotides. A nucleotide consists of a phosphate group, a pentose (5-carbon) sugar, and a nitrogenous base. Nucleic acids differ from each other in the type of sugar and the types of nitrogenous bases that they contain. There are two types of nucleic acids, DNA and RNA.

Deoxyribonucleic acid (DNA) contains nucleotides that consist of the sugar deoxyribose. The four nitrogenous bases found in DNA are the pyrimidines, cytosine and thymine, and the purines, adenine and guanine. DNA is a double-stranded nucleic acid consisting of two nucleotide strands. These strands form a double helix with the sugar phosphate backbones on the outside and nitrogenous base pairs on the inside. The helix is held together by hydrogen bonds and van der Waals forces between base pairs.

The two strands of DNA are complementary. Because of the molecular structure of the bases, adenine always pairs with thymine and cytosine always pairs with guanine. Because the strands are complementary to each other, the strands can serve as templates for each other. This characteristic of DNA makes replication, the process by which copies of DNA are made, possible.
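The pairing rule can be tried out in a few lines of Python. This is only a minimal sketch: the function name `complement` and the example sequence are made up for illustration, and the code ignores the fact that the two strands of a real double helix run in opposite directions.

```python
# Base-pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand)

original = "ATGCCGTA"            # hypothetical sequence, for illustration only
partner = complement(original)   # the strand that would pair with it
print(partner)                   # TACGGCAT

# Each strand can act as a template for the other:
print(complement(partner) == original)   # True
```

Because taking the complement twice gives back the starting strand, either strand carries enough information to rebuild its partner, which is the idea behind replication.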

DNA is often called the genetic material of the cell. Genes are made of DNA. In the simplest case, each gene encodes one protein. However, genes do not make proteins directly. DNA is located in the nucleus of eukaryotic cells, but protein synthesis occurs in the cytoplasm on structures called ribosomes. The transfer of information from DNA in the nucleus to the ribosomes is mediated by RNA.

Ribonucleic acid (RNA) is made up of nucleotides that contain ribose as the sugar. RNA contains three of the same nitrogenous bases as DNA – cytosine, adenine, and guanine. The fourth base in RNA is the pyrimidine uracil. RNA does not contain thymine. RNA is single-stranded, consisting of only one polynucleotide chain.

There are many types of RNA in cells, and they serve many different functions. Some of the major functions of RNA are involved in protein synthesis. The three major types of RNA that are involved in protein production are mRNA (messenger RNA), tRNA (transfer RNA), and rRNA (ribosomal RNA).

The transfer of information from genes to RNA to protein occurs in two steps, as shown in Figure 1. The first step is transcription. In this step the information in DNA is transferred to RNA. The second step is translation. In this process, the information in RNA is translated into a polypeptide (protein).

During transcription, RNA is formed by an enzyme called RNA polymerase. The polymerase uses one strand of DNA as a template for the RNA. The resulting RNA strand is complementary to the template DNA strand. For example, the base guanine in the DNA template would correspond to a cytosine in the RNA transcript. This is illustrated in Figure 2. In eukaryotic cells, transcription occurs in the nucleus, and the initial RNA transcript is called pre-mRNA. Additional processing occurs in the nucleus to remove non-coding regions of the pre-mRNA and splice together the coding regions to form mRNA.
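As a rough illustration of how a template strand determines the RNA transcript, here is a short Python sketch. The function name `transcribe` and the template sequence are assumptions made for this example. The sketch skips strand direction and pre-mRNA processing entirely.

```python
# Template DNA base -> RNA base. RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template):
    """Build the RNA strand complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template)

template = "TACCGGAAAATT"   # hypothetical template strand, for illustration only
mrna = transcribe(template)
print(mrna)                 # AUGGCCUUUUAA
```

Notice that every G in the template becomes a C in the transcript, and every A in the template becomes a U rather than a T.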

The mRNA leaves the nucleus and is transported to a ribosome in the cytoplasm. Translation occurs on ribosomes. Translation is the RNA-directed synthesis of polypeptides. During translation, the mRNA is “read” as a series of three-base units called codons, as shown in Figure 2. Each codon encodes a specific amino acid. There is some built-in redundancy in the system, as some amino acids have more than one codon. There is also a start codon that initiates polypeptide synthesis and stop codons that signal its end.
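Reading an mRNA three bases at a time can be sketched in Python as follows. This is a simplified model, not a description of how a ribosome actually works: the function name `translate` and the example transcript are hypothetical, and the codon table is deliberately partial, holding only the codons this example needs.

```python
# Partial codon table (standard genetic code); only the codons used below.
CODONS = {
    "AUG": "Met",   # methionine, also the start codon
    "GCC": "Ala",
    "UUU": "Phe",
    "UCG": "Ser",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

def translate(mrna):
    """Read codons from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return []                      # no start codon, so nothing is made
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODONS[mrna[i:i + 3]]   # KeyError if the codon is not in the partial table
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide

# Hypothetical transcript: two leading bases, then AUG GCC UUU UCG UAA, then two trailing bases.
print(translate("GGAUGGCCUUUUCGUAACC"))   # ['Met', 'Ala', 'Phe', 'Ser']
```

The bases before AUG and after UAA are ignored, which mirrors the role of the start and stop codons in marking where the polypeptide begins and ends.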

After translation, the polypeptide chain will fold to form its final structure. There are four levels of protein structure. The first level, primary structure, is simply the chain of amino acids. Hydrogen bonding between amino acids in the protein causes the formation of secondary structures – helices and pleated sheets. Additional folding due to side-chain interactions forms a precise conformation called tertiary structure. All proteins have these first three levels of structure. Some proteins consist of two or more folded polypeptides called subunits that aggregate together. This is called quaternary structure. The proper folding of a protein into its unique structure is essential to its function.


Cold Spring Harbor Laboratory

Cold Spring Harbor Laboratory (CSHL) has a distinguished history in the field of molecular biology. The Laboratory and its DNA Learning Center have also provided an extraordinary number of resources for non-scientists and for K-12 and university-level students of science. A link to the DNA Learning Center at CSHL is included at the bottom of the CELL landing page. We highly recommend this site to all LabLearner students. In addition, below is a short but remarkable video animation from CSHL of the Central Dogma of Biology, describing DNA replication, transcription, and translation in a way that makes molecular biology come to life!

While watching this video animation, remember:

DNA Replication: In the cell nucleus, this is the process by which a double-stranded DNA molecule is copied. This results in two identical DNA molecules.

Transcription: In the nucleus, the DNA molecule is used as a template to make an RNA “transcript” of the DNA sequence. In the RNA molecule, uracil takes the place of thymine.

Translation: Once an RNA transcript (mRNA) leaves the nucleus and enters the cytoplasm, ribosomes bind to it and the transcript is used to code for a sequence of amino acids. The amino acids are joined by covalent “peptide” bonds – thus the term “polypeptide” is used to refer to proteins.





Discovery of a Century!

The twentieth century saw an explosion in our scientific understanding of the physical world around us. New knowledge and discoveries occurred in many fields. In physics, work early in the century introduced an entirely new way of looking at our Universe through quantum mechanics. The Big Bang Theory was introduced shortly afterward and has had a profound impact on our thoughts about the beginning of time, energy, and matter.

In the biological sciences, perhaps the single greatest advancement was the discovery of the structure and function of DNA (see a short fragment of DNA in the model above). The structure of the DNA molecule was published in the journal Nature in 1953. The American geneticist Dr. James Watson and British physicist Dr. Francis Crick were first to publish the DNA structure.

While Watson and Crick received the initial credit and fame, their work would not have been possible without the earlier research of Drs. Rosalind Franklin and Maurice Wilkins, who obtained excellent x-ray diffraction images of crystallized DNA that permitted the actual structure of the molecule to be inferred.

Since its structure was discovered, the double-stranded, twisted shape of this incredibly large molecule has become a symbol of modern molecular biology. How large is the DNA molecule? If you were able to untwist the DNA molecules in a single human nucleus and lay them end to end, they would stretch about 2 meters. If you did the same with all of the DNA molecules in a single human being, the DNA would stretch much farther than the diameter of the Solar System!


Each set of three bases on the mRNA strand is referred to as a triplet codon. Each triplet directs one of 20 specific amino acids to be covalently added to the growing polypeptide chain. Below is a chart that shows which amino acid is coded for by each triplet codon.


Start at the center of the circle and work outward. For example, if you have a codon with the sequence GCC, start with the inner G, then move outward for the C and then outward again for the final C. By following this protocol, you see that the codon GCC codes for the amino acid alanine. The “three-letter” abbreviation for alanine is Ala. Similarly, the codon UUU codes for phenylalanine (Phe).

Notice that more than one codon may code for the same amino acid. For example, UCG, UCA, UCC, and UCU all code for serine (Ser). One can also see that three codons code for STOP (UGA, UAG, and UAA). These special codons are referred to as “stop-codons”. When one of the three stop codons is reached in an mRNA transcript, the translation reaction will terminate and the polypeptide protein chain will be complete in its amino acid sequence.
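The redundancy in the chart can also be explored with a short Python sketch. It builds the full 64-codon table of the standard genetic code from a compact string of one-letter amino acid codes, then groups the codons by the amino acid they specify. The variable names are choices made for this example, and `*` is used here to mark a stop codon.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the standard genetic code, in the order
# UUU, UUC, UUA, UUG, UCU, ... GGG. "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Group codons by what they code for.
by_amino = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    by_amino[aa].append(codon)

print(CODON_TABLE["GCC"])        # A  (alanine)
print(sorted(by_amino["S"]))     # ['AGC', 'AGU', 'UCA', 'UCC', 'UCG', 'UCU']
print(sorted(by_amino["*"]))     # ['UAA', 'UAG', 'UGA']
print(len(by_amino) - 1)         # 20 amino acids (not counting stop)
```

The serine lookup returns the four UCx codons named above plus two more, AGU and AGC, so serine is specified by six codons in total.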

Finally, notice that one codon, AUG, serves as the “start codon” and also codes for the amino acid methionine. This important codon determines the start of the protein sequence encoded by the mRNA transcript. Thus, unless modified later, every protein begins with a methionine and so contains at least one.


The following list includes Key Terms that are introduced within the Backgrounds of the CELL. These terms should be used, as appropriate, by teachers and students during everyday classroom discourse.

Note: Additional words may be bolded within the Background(s). These words are not Key Terms and are strictly emphasized for exposure at this time.


Investigation 1:
  • DNA: nucleic acid that carries genetic information in cells and consists of two complementary chains of nucleotides wound in a double helix
  • Protein: large molecule consisting of one or more chains of amino acids (polypeptides)
  • Amino Acid: building block of protein molecules
  • RNA: group of single-stranded nucleic acids, including mRNA (messenger RNA) that is necessary for transcription and translation
  • Codon: three adjacent nucleotides in DNA or mRNA that code for a specific amino acid in a protein
  • Mutation: a change in the DNA sequence of a gene
Investigation 2:
  • There are no Key Terms introduced in Investigation 2.
Investigation 3:
  • Chromosome: structures of DNA and protein in the nucleus of cells, where genes are located
  • Mitosis: nuclear division characterized by chromosome replication and formation of two identical daughter nuclei


The Focus Questions in each Investigation are designed to help teachers and students focus on the important concepts. By the end of the CELL, students should be able to answer the following questions:


Investigation 1:
  • How does DNA control the functions of an organism? 
  • Can mutations in DNA cause changes in an organism? 


Investigation 2:
  • Can mutations in DNA cause changes in an organism?
Investigation 3:
  • Why can mutations in the DNA of a single cell affect the functions of an entire organism?