Tag Archives: analysis

Protein Purification (activity)

Size-exclusion of dye molecules

As a demonstration, the instructor may illustrate the concept of size exclusion on a set of mixed food coloring.

Size exclusion chromatography of food coloring
  1. Pack column with 3 ml sepharose slurry
  2. Let the column empty over a beaker
  3. Carefully load 0.2 ml of food coloring mixture onto the column
  4. Place 10 tubes on a rack under the column
  5. Place 1 ml buffer on column and collect 0.5 ml fractions
  6. Continue to add buffer 1 ml at a time until all fractions have been collected

Size-exclusion of Proteins

This exercise seeks to purify Green Fluorescent Protein (GFP) or Blue Fluorescent Protein (BFP) from bacterial lysate. These proteins have a specific size of 238 amino acids and are 40,000 daltons (40kD). Based on their specific size, they will have a specific rate of migration through the size exclusion resin. Remember that the bacterial lysate is full of additional proteins that are not your protein of interest that we are attempting to isolate.

3D model of GFP (Top Left), BFP (Top Right), structural alignment of GFP and BFP (Center) and the sequence alignment  (Bottom) illustrating the 3 amino acid changes to produce the alternative protein. Red asterisks indicate location of mutations.

Drops of fluid will be collected in fractions. The fractions containing the fluorescent proteins will be found only in specific fractions that will be visible under UV illumination.

  1. Vertically mount the column on a ring stand. Make sure it is straight.
  2. Slide the cap onto the spout at the bottom of the column.
  3. Mix the slurry (molecular sieve) thoroughly by swirling or gently stirring.
  4. Carefully pipet 2 ml of the mixed slurry into the column by letting it stream down the inside walls of the column.
  5. Place an empty beaker under the column to collect wash buffer.
  6. Remove the cap from the bottom of the column and allow the matrix to pack into the column.
  7. Label eight microcentrifuge tubes #1-8.
  8. Slowly load the column with 0.2ml of the GFP extract. Allow the extract to completely enter the column.
  9. Add 1ml of elution buffer on top of resin without disturbing the resin
    • Add buffer slowly (several drops at a time) to avoid diluting the protein sample.
    • Using the graduated marks on the sides of the tubes, collect 0.5ml fractions in the labeled microcentrifuge tubes.
    • Continue to add 1ml buffer and collect fractions until all tubes are full
  1. Check all fractions by using long wave U.V. light to identify tubes that contain the fluorescent GFP or BFP proteins.
  2. Further purification may be performed with a different resin with the few fractions containing the protein of interest
  3. Protein samples should be run on an acrylamide gel and stained against all proteins to check the purity of the sample or fluorescence measurements taken


Protein Production

Protein Expression

Recombinant DNA technology has many uses in basic scientific research to better understand the nature of of living things. As a tool, recombinant DNA technology can be used to express proteins towards medical applications. Prior to biotechnology, type I diabetes (insulin-dependent) was treated by injection of insulin isolated from the pancreas of pigs. With the ability to express human proteins inside bacteria, yeast and other cells, sacrificing pigs for porcine insulin is no longer necessary.

Bacterial expression vector pGEX-3X contains the AmpR gene, origin of replication, MCS downstream of the hybrid lac/trp promoter (tac) and the coding sequence for glutathione-S-transferase (GST). GST acts as a tag that is fused directly with the protein from the gene of interest and used to purify the protein with a glutathione resin.

Credit: Stewart EJ, Madden R, Paul G, Taddei F (CC-SA 3.0) .
Bacteria or other cells can be engineered to express proteins through the process of cloning and transformation. Bacteria are advantageous because of their rapid life cycle and ease of growth. A bacterial expression vector contains the basic plasmid features: origin of replication as well as antibiotic resistance gene. Often, an affinity tag will be used to aid in purification of the protein. An example in the vector above shows the GST (glutathione-s-transferase) tag that can be purified with glutathione resin. Expression is only the first problem since bacteria are also synthesizing proteins that are required for the bacteria to grow and divide. Injecting these proteins in addition to insulin would cause an immune reaction that could be deadly. Therefore, it is required that overexpressed proteins be purified and isolated from other undesirable proteins.

Criteria for Choosing an Expression System

Protein expression systems have inherent advantages and disadvantages. The table above summarizes the comparison of the various cellular systems of production (Fernandez & Hoeffler, 1999).

Labfors 4 with touchscreen
A bioreactor or fermenter that is used to grow large amounts of bacteria for the production of protein. Credit: Miropiro CC-BY-SA 3.0)

GFP Production and Purification



Different methods of isolation can be applied depending on the properties of the protein. Ion exchange chromatography is useful if the protein of interest has a specific charge that will interact with a resin packed with the opposite charge.


Immunoprecitation: Column is packed with Protein-A agarose which binds to antibodies. Cell lysates are then loaded onto the columns where they flow through and are allowed to interact with the antibody. Washes are performed to remove the non-specifically bound proteins. An elution buffer is used to disrupt the interaction of the antibody to the protein target.

Affinity Purification

Affinity purification employs the use of specific antibodies that bind to the protein of interest very tightly to retain it on a column. With these techniques, the protein retained on the resin is washed numerous times to remove other proteins that are non-specifically sticking. A change in pH or ionic conditions then is used to disrupt the interaction with the resin and elute the proteins from the column. Proteins that are engineered to contain tags can be purified by antibodies specific to those tags. Also, the addition of 6 or more consecutive Histidine residues to the end of a protein make them susceptible to purification with Nickel-NTA resin or Cobalt purification. In these cases, the 6XHis tag associates with these metal ions on the resin and are selectively adhered.

Nickel NTA Protein Prification
Nickel NTA resin coordinating the capture of a 6His tagged protein

Size Exclusion

Activated Charcoal

Credit: Mydriatic (CC-BY-SA 3.0)

Credit:  Takometer(CC-BY 2.5)

Most of you are familiar with water purification filters. Before using these filters, you soak them in water and dark residue leaks out. This dark residue is activated charcoal. The activated charcoal has tiny microscopic pores that trap small items like ions and other particles. The primary goal of these filters is to remove metals and chlorine that are found in tap water. The porous nature of activated charcoal renders it useful for trapping molecules in water purification systems .

The process used to trap these small particles is called size exclusion. Unlike agarose gel electrophoresis where the smaller particles navigate through the matrix faster, size exclusion resins trap the smaller molecules.
Pore size schematic
The smaller the molecule, the longer they spend within the pores as they traverse through the matrix.

Significance of Purification

Limulus polyphemus (aq.)
Credit: Hans Hillewaert (CC-BY)

All injectible drugs must be clean of endotoxins from bacteria. Purification of the protein of interest from bacterial lysates removes the dangerous pathogenic materials from that would otherwise activate host immune reactivity.

The horseshoe crab (Limulus polyphemus) performs a special function in the ecosystem by providing eggs for migratory birds to feed on. This organism also houses a special cell type in its hemolymph. The limulus amoebocyte lysate (LAL) test is the most sensitive assay of detecting endotoxins from bacteria. Amoebocytes are collected from these organisms for use on testing batches of injectible drugs to ensure proper purification and safety.


Tags: , , ,

Morphometric Analysis

Morphometrics and physical markers

Morphometrics (morpho– shape; metrics– measurements) is the use of physical measurements to determine the relatedness of organisms. With extinct organisms that have died out long ago, DNA extraction proves to be difficult. Likewise, prior to DNA technologies to analyze species, Linnean taxonomy was ascribed to organisms based on similarities in features.

Describing Species and Variation of Morphologies

Below are images of skull landmarks of the lizard family Varanidae. This family includes monitor lizards and Komodo Dragons.As can be seen below, the general morphology of the skulls are similar enough that they all retain the same landmarks. The figure below also illustrates the diversity in these lizards that illustrate a large variety between species.

Skulls of the species involved in this analysis.
Skulls of the species involved in this analysis. McCurry et al. (2015) (CC-BY)

Landmarks Standardize measurements

Having a set of shared landmarks provides the opportunity to make systematic measurements of morphometric features.

Landmarks and measurement metrics for the morphometric analysis of fossils.
Landmarks and measurement metrics for the morphometric analysis of skulls. McCurry et al. (2015) (CC-BY)


Euclidean distance to measure relatedness

Euclidean distance is a measurement derived from Pythagorean geometry that describes the shortest distance (d) between 2 points (A & B) as a straight line using triangulation. In a cartesian space, the points can be defined:

A=(x_A, y_A) and B=(x_B, y_B)

Standard pythagorean theorem can be expressed as:

x^2 + y^2 = d^2

To find the distance between the 2 points, we utilize algebra to calculate for d.

d = \sqrt{x^2 + y^2}

In this case, we expand to comparing the coordinates of the two points:

\Delta x = x_B - x_A and \Delta y = y_B - y_A

We can then expand this idea to include the differences of data points that describe the comparisons of multiple measurements.

d(\mathbf{X_i, X_j}) = \sqrt{\sum_{k=1}^{p}(X_{ik} - X_{jk})^2}

Calculating distance with R

  1. Download the dataset (McCurry et al. 2015) associated with this activity (a Comma Separated Value .csv file). This can be used in a spreadsheet or in a text editor. This data can be imported into R to determine the euclidean distances of landmarks.
  2. The following code in R will download the data set into a variable called “varanoid”, measure euclidean distance and save a plot into a PDF file in a directory called “/tmp”.
## install curl for fetching from internet if it isn't
## Load the curl library
## read the data of measurements and assign it to a variable 'varanoid'
varanoid = read.csv(curl('https://raw.githubusercontent.com/jeremyseto/bio-oer/master/data/varanoid.csv'))
## set the row names to the Species column
row.names(varanoid) = varanoid$Species
## remove the first column of the table to have purely numeric data
varanoid_truncated = (varanoid[,2:14])
## calculate distance using euclidean as the method
dist_measure = dist(varanoid_truncated, method='euclidean')
## display dist_measure to look at the comparisons
varanoid_cluster = hclust(dist_measure)
## open PDF as a graphics device  to save a file in the '/tmp' directory
## close the device to save the plot as pdf



DNA Analysis

Before starting this activity, review bioinformatics and sequence analysis.

  1. Search NCBI for mitochondrial sequences from the species involved in McCurry 2015. The data has been submitted by Ast (2001).
  2. Find the sequences and identify/extract elements that are common to all
  3. Assemble the shared sequences in a text editor as a single FASTA file where each species is separated by a header (“>Species A”)
    • Notepad on Windows (but it’s better to download notepad++)
    • Textedit on Mac (but probably better to download TextWrangler)
    • Gedit on Linux
  4. Save the file as “something.fasta”
  5. Perform a multiple sequence analysis using UGENE
  6. Generate a phylogenetic tree using UGENE. For this exercise, use Maximum Likelihood (PhyML) as the algorithm. File the tutorial below.
  7. Compare the DNA with the morphometric analyses. What problems could we imagine arise if we rely solely on morphometry.


Tags: , , , ,

Enzyme Kinetics (activity)

The Enzyme

Amylase reaction
Amylase is an enzyme that breaks down amylose (starch) into glucose molecules.

      1. What test can be used to indicate the presence of Starch?
      2. What test can be used to indicate the presence of glucose?
      3. What is the role of an enzyme in a chemical reaction and what is it made of?
      4. What parameters would influence the ability of the enzyme to facilitate the rate of the reaction?
Salivary amylase is produced in the mouth, where digestion begins.
Pancreatic amylase is produced in the pancreas and is supplied to the duodenum of the small intestines.
Overlay of salivary (green) and pancreatic (teal) amylase molecules.

Tags: , , ,

Quantitative Skills


Experimental science looks at cause and effect types of relationships. Controlled experiments vary one of the factors or traits  to observe the effect on another factor or trait. These factors are called variables. A dependent variable is something that is observed and expected to change as a result of modifying another factor in the experiment. That is to say, the outcome depends on another factor.  Another name for dependent variable is responding variable. The independent variable is the factor or condition that is changing or being changed by the experimenter. Sometimes waiting is the the condition that is changing, making the independent variable : time. Since we change the independent variable, it is also called the manipulated variable.

Graphing a line

Linear Function Graph
A graph displaying 2 lines and their equations

 A line can be described mathematically by the equation:

y= mx + b

This is referred to as the slope-intercept form. The 2 variables y and x refer to coordinates along each axis. The term m refers to the change in the y-values over the change in the x-values. This is referred to as the slope of the line. The term b, the y-intercept, is the y-value where the line crosses the y-axis.

m = \frac{y_2-y_1}{x_2-x_1} OR m = \frac{\Delta y}{\Delta x}
This is how the slope of a line (m) is determined.

The slope of the line indicates the relationship between the two variables, x and y. The equation of the line y= mx + b indicates to us that “y” occurs as a function of changes to “x”. Sometimes this is represented by the equation f(x)= mx + b . Since “y” depends on “x”, “y is the dependent variable and “x” is the independent variable.¬† As “x” changes, how does “y” change in response? This is what the slope reveals to us.

For more review, visit the following link.

Slope activity

Graphing Lines
Click on image to run the simulation

 Data Plotting Activity

  1. Use the following data from the USDA to plot a scatterplot and generate a trend-line.
  2. Use the info from the table below or download (a text file that can be opened in notepad/textedit)
  3. follow a tutorial on how to graph a scatterplot with line of best fit (trend-line)
  4. or use this tutorial in plot.ly
      • remember NOT to make a line graph
      • Ensure that “Year” is the x-axis
      • Ensure that “lbs. Mozzarella” is the y-axis
      • Show the equation of the trend-line on the graph
Year lbs. Mozzarella
2000 9.33
2001 9.70
2002 9.66
2003 9.65
2004 9.94
2005 10.19
2006 10.52
2007 11.02
2008 10.57
2009 10.63

What do we learn?

  1. What does this scatterplot tell us about the relationship between consumption of mozzarella in relationship to years?
  2. How would this graph influence the way you invest in a mozzarella cheese company? Can you predict anything about the future of cheese consumption?
  3. What does the slope of this line indicate to you?
    1. Use mathematics to illustrate this point.
    2. The slope has a unit related to “lbs.” and “year”, what is this unit?

Creating a Line of Best Fit

Not all points collected will fall on a straight line. A Line of Best Fit or a Trendline approximates the average of those points through a mathematical process called the least squares method. While one could “eyeball” this line, the least squares method uses the data to minimize the distance from all those points to the line to have an averaging effect.

  1. Create a column of data for “x” values, “y” values, x2 and xy
  2. At the bottom of these columns, sum the data. ő£x,¬†ő£y,¬†ő£x2,¬†ő£xy
  3. Calculate the slope from these values
    • m = \frac{N (\Sigma xy)- \Sigma x \Sigma y}{N (\Sigma x^2)-(\Sigma x)^2} (where N=number of entries in a column)
  4. Calculate the y-intercept from these values
    • b = \frac{N(\Sigma y) - m (\Sigma x)}{N}
  5. Which provides use the the function f(x)= mx + b
Least-Squares Regression
Click on image to run the simulation


Tags: , , ,

Why are cells small? (activity)

Why are cells so small?

  1. Take 3 blocks of agar of different size (1cm, 2cm, 3cm) ‚Üí these are our cell models
  2. Measure the length, width and height of each cube using a ruler
  3. Calculate the area of each face of the cubes and add all the areas together for a single cube
    • a cube has 6 faces ‚Üí the total surface area is the same as the area of one side multiplied by 6
  1. Calculate the volume of each cube
  2. Report the surface area-to-volume in the table below

Data Table: Calculating Surface Area-to-Volume Ratio

Cell Model (cube)




Total Surface Area

Volume of cell

Surface Area: Volume




Stop and think:

  • Which cube has the greatest surface area:volume ratio?
  • Which cube has the smallest surface area:volume ratio?
  • Hypothesize: In an osmosis or diffusion experiment, which cube size would have the greatest diffusion rate?



  1. Each group will acquire three agar cubes: A 3cm cube, a 2cm cube, and a 1cm cube. CUT AS ACCURATELY AS POSSIBLE . (This may be already completed for you.)
  2. Place cubes into a beaker and submerge with 200 ml NaOH
  3. Let the cubes soak for approximately 10 minutes.
  4. Periodically, gently stir the solution, or turn the cubes over.
  5. After 10 minutes, remove the NaOH solution
  6. Blot the cubes with a paper towel.
  7. Promptly cut each cube in half and measure the depth to which the pink color has penetrated. Sketch each block’s cross-section.
  8. Record the volume that has remained white in color.
  9. Do the following calculations for each cube and complete the following data table:


Data Table: Calculation of Diffusion Area-to-Volume







cross-section of each


of the diffused cube

(Vtotal ‚Äď Vwhite)




% Diffused

Area: Volume

previous table)





  • Which cube had the greatest percentage of diffusion?
  • Did this meet your expectations with your hypothesis?
  • If you designed a large cell, would it be a large sphere or something long and flat?

Tags: , , ,

Variable Number Tandem Repeats


The¬†difference in nucleotide sequences between humans lies between¬†0.1-0.4%. That means that people are greater than 99% similar. But¬†when you look around the room at your classmates, you can see that¬†that small difference amounts to quite a bit of variation within our¬†species. The bulk of these differences aren’t even within the coding¬†sequences of genes, but lie outside in regulatory regions that change¬†the expression of those genes. Imagine if there were mutations to the¬†coding sequences, this could be very deleterious to the well-being of¬†the organism. We say that the coding sequences of genes that¬†ultimately lead to proteins has a selective¬†pressure¬†to remain the same. The areas outside of the coding sequences have a¬†reduced and sometimes non-existent selection pressure. These areas¬†are allowed to mutate in sequence and even expand or contract. Areas¬†of changes or differences are called polymorphic¬†(many¬†forms). If you were to read a repetitive set of sequences and count¬†the repetition, you’d make mistakes and lose count. Likewise, DNA polymerase will make errors or stutter in areas of repetitiveness and¬†produce polymorphic regions.

Tandem Repeats

A type of polymorphism occurs due to these repeats expanding and contracting in non-coding regions. These regions are called variable number tandem repeats (VNTRs) or sometimes short tandem repeats (STRs). Any region or location on a chromosome is referred to as locus (loci for plural).  Scientists use polymorphic loci that are known to contain VNTRs/STRs in order to differentiate people based on their DNA. This is often used in forensic science or in maternity/paternity cases. Any variation of a locus is referred to as an allele. In standard genetics, we often think of an allele as a variation of gene that would result in a difference in a physical manifestation of that gene. In the case of STRs, these alleles are simply a difference in number of repeats. That means the length of DNA within this locus is either longer or shorter and gives rise to many different alleles. VNTRs are referred to as minisatellites while STRs are called microsatellites.


Dna fingerprinting
DNA fingerprinting. Credit: Helixitta,The Photographer and Jeremy Seto (CC-BY-SA 3.0)
The FBI and local law enforcement agencies have developed a database called the  Combined DNA Index System (CoDIS) that gathers data on a number of STRs. By establishing the number of repeats of a given locus, law enforcement officials can differentiate individuals based on the repeat length of these alleles. CoDIS uses a
set of 20 loci that are tested together. As you would imagine, people are bound to have the same alleles of certain loci, especially if they were related. The use of 20 different loci makes it statistically improbable that 2 different people could be confused for each other. Think about this in terms of physical traits. As you increase the number of physical traits used to describe someone, you are less likely to confuse that person with someone else based on those combinations of traits. Using the CoDIS loci increases the stringency since there are many alleles for each locus. The twentieth locus in CoDIS (called AMEL) discriminates between male and female.

CoDIS STRs: The FBI utilizes 20 different loci to discriminate between people. AMEL discriminates by gender and is located on the X & Y. Credit: Jeremy Seto (CC-BY-NC-SA)
STR electropherogram of a three person mixture
STR electropherogram of a three person mixture
STR genotype
A genotype of 5 STR markers. Peaks illustrate the signal detected and the intensity of the signal corresponding to repeat sizes. The peak height of D16S539 is twice the size of other peaks and illustrates that this locus contains two of the same repeat sizes.


Crime Scene Investigation

This lab uses a CoDIS locus called TH01. TH01 is a locus on chromosome 11 that has a repeating sequence of TCAT. There are reported to be between 3-14 repeats in this locus. With the exception of X and Y in a male, all chromosomes have a homologous partner. Therefore, each individual will have 2 alleles for each CoDIS locus.

TH01 STR: Outside of the STR, there is flanking areas of known sequence. The primers that amplify TH01 in PCR recognize these flanking sequences to amplify the TCAT repeats. Credit: Jeremy Seto (CC-BY-NC-SA)

At¬†a crime scene, criminals don’t often leave massive amounts of tissue¬†behind. Scant evidence in the form of a few cells found within bodily¬†fluids or stray hairs can be enough to use as DNA evidence. DNA is¬†extracted from these few cells and amplified by PCR using the¬†specific primers that flank the STRs used in CoDIS.

DNA evidence from a crime scene: DNA can be extracted from cells found from various sources at a crime scene. PCR can amplify this small amount of DNA. Credit: Jeremy Seto (CC-BY-NC-SA)

Amplified DNA will be separated by gel electrophoresis and analyzed. Size reference standards and samples from the crime scene and the putative suspects would be analyzed together. In a paternity test, samples from the mother, the child and the suspected father would be analyzed in the same manner. A simple cheek swab will supply enough cells for
this test.

TH01 locus used in a Paternity/Maternity test: Individual PCR reactions are run for each sample (mom, dad, child). The TH01 primer pair specifically amplifies the locus. Each amplified sample is run on the same gel to resolve the different alleles of TH01 from each individual. From this test the sample could be the offspring from these 2 parents but use of more STRs would make it more definitive. Count the TCATs. Credit: Jeremy Seto (CC-BY-NC-SA)

External Resources

Tags: , , , ,

Analyzing DNA

Agarose Gel Electrophoresis

Agarose polymere
Agarose is a linear carbohydrate polymer purified from the cell walls of certain species of algae. Agar is a combination of the crude extract that contains agarose and the smaller polysaccharide agaropectin. When dissolved and melted in liquid, agarose strands become tangled together to form a netting that holds the fluid in a gel. Reduction of the fluid creates a higher percentage gel that is firmer and contains smaller pores within the netting.

Two percent Agarose Gel in Borate Buffer cast in a Gel Tray (Front, angled)Placing a comb within the melted agarose creates spaces that allow for the insertion of samples when the gel is solidified. Molecules can traverse through the pores as they are drawn by electrical currents. Charged compounds will migrate towards the electrode of opposite charge but migration rate will be influenced by the size of the molecules. Smaller compounds can easily traverse through the webbing while larger items are retarded by the pore size. Follow this simulation to get a better idea of how we use Agarose Gel Electrophoresis in molecular biology to study DNA fragments.

DNA molecules are not readily visible when resolved (separated) on an agarose gel. In order to visualize the molecules, a DNA dye must be administered to the gel. In research labs, a DNA intercalating agent called Ethidium Bromide is added to the molten gel and will bind to the DNA of the samples when run. Ethidium Bromide can then be visualized on a UV box that will fluoresce the compound and reveal bands where DNA is accumulated. Since Ethidium Bromide is known as a carcinogen, teaching labs will use a safer DNA intercalating agent known as Sybr Green. This can be visualized in a similar fashion, but will fluoresce a green color instead.

Agarose Gels
Agarose gels visualized on a UV transilluminator. Left shows a gel with Ethidium bromide. Right shows a gel with Sybr Green.

Agarose gels are made of and bathed in a buffered solution, usually of Tris-Borate-EDTA (TBE) or Tris-Acetate-EDTA (TAE). Regardless of buffer solution, the buffer provides necessary electrolytes  for the current to pass through and maintain the pH of the solution.

DNA samples are prepared in a buffer similar to the solution that it will be run in to ensure that the phosphate backbone of the DNA remains deprotonated and moves to the positive electrode. Additionally, glycerol or another compound is added to this buffer in order for the solution to sink into the wells without spreading out. A dye is often included in this loading buffer in order to visualize the loading in the wells and to track the relative progression of gel.

Agarose Gel Set-up

OSC Microbio 12 02 AgaroseGE

External Resources

Tags: ,