Tag Archives: quantitative reasoning

Hardy-Weinberg & Population Genetics

Hardy-Weinberg Principle

The Hardy-Weinberg principle is a mathematical model used to describe the equilibrium of two alleles in a population in the absence of evolutionary forces. This model was derived independently by G.H. Hardy and Wilhelm Weinberg. It states that the allele and genotype frequencies across a population will remain constant across generations in the absence of evolutionary forces. This equilibrium makes several assumptions in order to be true:

  1. An infinitely large population size
  2. The organism involved is diploid
  3. The organism only reproduces sexually
  4. There are no overlapping generations
  5. Mating is random
  6. Allele frequencies are equal in both sexes
  7. Absence of migration, mutation or selection

As we can see, many of the conditions in the list above cannot be met in real populations. The model is still useful as a baseline: it gives us an expectation to compare against in situations where evolutionary forces (selection, etc.) come into play.

Hardy-Weinberg Equilibrium

The terms in the equation are defined as follows:

  • Genotype frequency is calculated by the following: \text{genotype frequency} = \frac{\text{\# individuals of given genotype}}{\text{total \# individuals in population}}
  • Allele frequency is calculated by the following: \text{allele frequency} = \frac{\text{\# of copies of an allele in a population}}{\text{total \# of alleles in population}}
  • In a two-allele system with dominant/recessive, we designate the frequency of one as p and the other as q and standardize to:
    • p = \text{dominant allele frequency}
    • q = \text{recessive allele frequency}
  • Therefore the total frequency of all alleles in this system equals 100% (or 1)
    • p + q = 1
  • Likewise, the total frequency of all genotypes is expressed by the following quadratic, which also equals 1:
    • p^2 + 2pq + q^2 = 1
    • This equation expresses the Hardy-Weinberg principle: when no evolutionary forces are altering the allele frequencies, the genotype frequencies follow these proportions (p^2 homozygous dominant, 2pq heterozygous, q^2 homozygous recessive).
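
The definitions above can be sketched in R. This is a minimal illustration, not part of the original exercise: the genotype counts are hypothetical, and the names `counts`, `observed` and `expected` are chosen here only for the example.

```r
# Hypothetical genotype counts for a two-allele system (illustration only)
counts <- c(AA = 36, Aa = 48, aa = 16)
n <- sum(counts)

# Each diploid individual carries two alleles, so there are 2n alleles in total
p <- unname((2 * counts["AA"] + counts["Aa"]) / (2 * n))  # dominant allele frequency
q <- 1 - p                                                  # recessive allele frequency

# Genotype frequencies expected under Hardy-Weinberg: p^2 + 2pq + q^2 = 1
expected <- c(AA = p^2, Aa = 2 * p * q, aa = q^2)
observed <- counts / n

print(rbind(observed, expected))
```

When the observed and expected rows agree, the sample is consistent with Hardy-Weinberg proportions; a large difference suggests that one of the assumptions does not hold.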

Calculating Hardy-Weinberg Equilibrium (activity)

This exercise refers to the PTC tasting exercise. One can test for selection on one allele within the population using this example. Though the class size is small, pooling results from multiple sections can enhance the exercise. Remember to infer the dominant and recessive traits from the class counts.

  1. What is the recessive phenotype and how can we represent the genotype?
  2. What is the dominant phenotype and how can we represent the genotypes?
  3. What is the frequency of the recessive genotype? (q^2)
  4. What is the frequency of the recessive allele? (q)
  5. What is the frequency of the dominant allele? (p = 1 - q)
  6. Use Hardy-Weinberg to calculate the frequency of heterozygotes in the class. (2pq)
  7. Use Hardy-Weinberg to calculate the frequency of dominant homozygotes in the class. (p^2)
  8. Using an aggregate of multiple sections, compare the local allelic and genotypic frequencies with what Hardy-Weinberg would predict.
  9. With this small number in mind, we can see that there are problems with the assumptions required for this principle. The instructor will perform the following simulation in class to illustrate the effects of selection and/or limited population size on multiple populations. A coefficient of fitness can be applied to illustrate a selective pressure against an allele.
  10. In the case of a selective pressure, a fitness coefficient (w) can be introduced. A research article (http://www.jci.org/articles/view/64240) has shown that the Tas2R38 receptor aids in the immune response against Pseudomonas. Imagine a situation where there is an epidemic of antibiotic-resistant Pseudomonas. In this scenario, the dominant allele would have a selective advantage.

    • Modify the fitness coefficient in the Population Genetics Simulator and describe the effects this would have over many successive generations.
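
The calculations in steps 3 to 7 and the selection scenario in step 10 can be sketched in R. This is only an illustration under stated assumptions: the class counts are hypothetical, and the selection model (relative fitness `w_aa` applied to the recessive homozygote, fitness 1 for the other genotypes) is a standard textbook recursion, not necessarily what the Population Genetics Simulator implements.

```r
# Hypothetical class data: 9 students with the recessive phenotype out of 36
n_recessive <- 9
n_total <- 36

q2 <- n_recessive / n_total  # frequency of the recessive genotype (q^2)
q <- sqrt(q2)                # frequency of the recessive allele
p <- 1 - q                   # frequency of the dominant allele
het <- 2 * p * q             # expected frequency of heterozygotes
hom_dom <- p^2               # expected frequency of dominant homozygotes

# One generation of selection against the recessive homozygote.
# w_aa is the relative fitness of the recessive homozygote (an assumption).
next_q <- function(q, w_aa) {
  p <- 1 - q
  mean_fitness <- p^2 + 2 * p * q + w_aa * q^2
  (p * q + w_aa * q^2) / mean_fitness
}

# Follow the recessive allele frequency over many generations
simulate_selection <- function(q0, w_aa, generations) {
  freq <- numeric(generations + 1)
  freq[1] <- q0
  for (g in seq_len(generations)) freq[g + 1] <- next_q(freq[g], w_aa)
  freq
}

trajectory <- simulate_selection(q, w_aa = 0.8, generations = 50)
print(round(trajectory[c(1, 11, 51)], 3))
```

With `w_aa = 1` the allele frequency does not change, which is the Hardy-Weinberg expectation; any value below 1 makes the recessive allele decline generation after generation.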

A case study of evolution: Population Genetics at Work

Additional Resources/Experiments




Probability and Chi-Square Analysis

Mendel’s Observations

Probability: Past Punnett Squares

Punnett Squares are convenient for predicting the outcome of monohybrid or dihybrid crosses. The expectation for two heterozygous parents is a 3:1 phenotypic ratio in a single-trait cross or 9:3:3:1 in a two-trait cross. Performing a three- or four-trait cross becomes very messy. In these instances, it is better to follow the rules of probability. Probability is the chance that an event will occur, expressed as a fraction or percentage. In the case of a monohybrid cross, a 3:1 ratio means that there is a 3/4 (0.75) chance of the dominant phenotype and a 1/4 (0.25) chance of the recessive phenotype.


A single die has a 1 in 6 chance of landing on a specific value. In this case, there is a 1/6 probability of rolling a 3. Rolling a second die simultaneously is not influenced by the first and is therefore independent. This second die also has a 1/6 chance of being a 3. Because the two events are independent, the probability that both dice show a 3 is the product 1/6 × 1/6 = 1/36.

We can understand these rules of probability by applying them to the dihybrid cross: multiplying the outcomes of 2 monohybrid Punnett Squares gives the same result as the single dihybrid Punnett Square.

This forked line method of calculating probability of offspring with various genotypes and phenotypes can be scaled and applied to more characteristics.
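
The forked line method can be sketched in R by multiplying the single-trait probabilities for every combination of traits. The function name `forked_line` and the labels `D` (dominant phenotype) and `r` (recessive phenotype) are hypothetical names used only for this illustration, and the sketch assumes every trait comes from a cross of two heterozygotes and assorts independently.

```r
# Phenotype probabilities for one trait in a heterozygous x heterozygous cross
one_trait <- c(D = 3 / 4, r = 1 / 4)

# Forked line method: list every combination of phenotypes across n traits
# and multiply the independent single-trait probabilities together
forked_line <- function(n_traits, trait = one_trait) {
  combos <- expand.grid(rep(list(names(trait)), n_traits),
                        stringsAsFactors = FALSE)
  probs <- apply(combos, 1, function(row) prod(trait[row]))
  names(probs) <- apply(combos, 1, paste, collapse = "")
  probs
}

# Dihybrid cross: multiplying by 16 recovers the 9:3:3:1 ratio
print(forked_line(2) * 16)

# Three traits: chance of showing all three dominant phenotypes
print(forked_line(3)["DDD"])
```

The same call scales to four or more traits, where drawing the full Punnett Square would be impractical.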


Quantitative Nucleic Acid Measurement

Quantitative PCR (qPCR)


Measurements can be made of individual genes of interest through PCR of those specific genes. A process known as Real-Time PCR or quantitative PCR (qPCR) measures individual genes using fluorescence. An intercalating agent called Sybr Green, which binds only to double-stranded DNA, is added to the reaction, and the qPCR machine measures fluorescence after each cycle of PCR as an indirect indication of the amount of amplified product. However, non-specific products of amplification are also measured and cannot be discriminated from the authentic amplicon.

An alternative to Sybr Green is exemplified by the TaqMan technology. With TaqMan, a third oligonucleotide (the TaqMan probe) is designed to anneal in the middle of the area to be amplified. At one end of the probe, a fluorescent reporter is attached, while the other terminus has a quencher that absorbs the fluorescence signal as long as the two remain close together on the intact probe. Under normal circumstances, measurements of fluorescence will be very low. When PCR extension occurs, the polymerase hydrolyzes this probe, thereby separating the quencher and reporter. The name TaqMan is a play on words since it is imagined that the polymerase is chewing up the probe like Pac-Man. With increased distance between quencher and reporter, the fluorescence signal from this probe can now be measured. This method is much more specific than Sybr Green; however, the use of specific probes increases the cost considerably.

Threshold Cycles (Ct)

Fluorescence measured early in the PCR process will be very low due to the small number of dsDNA molecules (Sybr Green) or because most TaqMan probes are still quenched. As DNA is produced exponentially, the fluorescence rises above this background. The cycle at which the fluorescence becomes clearly measurable, called the Threshold Cycle (Ct), is used as a reference point to compare expression values.

Looking at the example of Sybr Green qPCR above, it can be observed that samples that begin increasing exponentially at a lower cycle number (a lower Ct, towards the left) have a higher level of mRNA expression of that gene than samples with a higher cycle number (towards the right). Notice that the fluorescence eventually plateaus and stops increasing. This is due to the depletion of raw materials for DNA production, like dNTPs.

Since the PCR reactions theoretically represent a doubling of DNA after each cycle, the Ct values can be interpreted on a base 2 system. If there is a difference in Ct between two samples (ΔCt) of 5 cycles, this corresponds to a 2^5 or 32-fold difference. We can control for variations in the RNA preparation by comparing the fluorescence values of our gene of interest to a housekeeping gene like actin. The use of a housekeeping gene to normalize the initial input to the reactions and allow comparison between samples is referred to as Relative Quantification.
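
The base 2 arithmetic can be sketched in R. The Ct values below are hypothetical, and the sketch assumes perfect doubling at every cycle (100% amplification efficiency), which real reactions only approximate. Normalizing each sample to the housekeeping gene and then comparing samples is commonly called the ΔΔCt calculation; that name is not used in the text above.

```r
# A difference of delta_ct cycles corresponds to a 2^delta_ct fold difference
fold_from_delta_ct <- function(delta_ct) 2^delta_ct
print(fold_from_delta_ct(5))

# Hypothetical Ct values for two samples (illustration only)
ct <- data.frame(
  sample = c("control", "treated"),
  gene   = c(25.0, 22.0),  # gene of interest
  actin  = c(18.0, 18.5)   # housekeeping gene
)

# Normalize each sample to its housekeeping gene
ct$delta_ct <- ct$gene - ct$actin

# Compare the treated sample to the control
delta_delta_ct <- ct$delta_ct[ct$sample == "treated"] -
  ct$delta_ct[ct$sample == "control"]

# A lower Ct means more starting template, hence the negative exponent
fold_change <- 2^(-delta_delta_ct)
print(fold_change)
```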

Melt Curves for Sybr Green

Melting Curve Analysis Graphs
Top panel illustrates the decrease in fluorescence as the temperature increases due to the dissociation of double stranded DNA. Bottom panel illustrates the first derivative plot. Each peak in this example illustrates a different allele. The double peaks represent the presence of the 2 distinct alleles in the amplification products.
When using Sybr Green, we need to ensure that the PCR is specific so that the fluorescence measurement truly reflects amplification of our gene of interest. At the end of each qPCR run (~40 cycles), a melt curve is performed. A melting curve (or dissociation curve) comes from continuous fluorescence measurements as the temperature is increased. As temperature increases, the DNA strands start to denature and fluorescence begins to decrease. After complete separation of the DNA strands, the fluorescence again remains constant. The curve is usually viewed as a derivative plot, where the point of steepest change (the inflection in the fluorescence reading) appears as a peak and is reported as the melting temperature (Tm).

melt curve
This melt curve illustrates that each sample contains the same specific product with a melting temperature of 83.51°C.

Each peak in this plot corresponds to a distinct PCR product. If multiple peaks appear, the results will not be valid, as the fluorescence does not directly measure a single product.
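
The derivative plot can be sketched in R with a simulated melt curve. Everything here is an assumption for illustration: the sigmoid shape, the temperature steps, and the value `true_tm` are invented, and real instruments apply their own smoothing before reporting Tm.

```r
# Simulated melt curve: fluorescence falls as the DNA strands separate
temps <- seq(65, 95, by = 0.2)   # temperature in degrees C
true_tm <- 83.5                  # assumed melting temperature of the product
fluorescence <- 1 / (1 + exp((temps - true_tm) / 0.8))

# Negative first derivative by finite differences (-dF/dT)
neg_dF_dT <- -diff(fluorescence) / diff(temps)
mid_temps <- head(temps, -1) + diff(temps) / 2

# The peak of the derivative plot marks the melting temperature
tm_estimate <- mid_temps[which.max(neg_dF_dT)]
print(tm_estimate)

plot(mid_temps, neg_dF_dT, type = "l",
     xlab = "Temperature (C)", ylab = "-dF/dT")
```

A sample containing two products would show two separate peaks in the same plot.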

Expression measurements

Differential gene expression refers to transcriptional programs activated by the cell under various conditions. “Differential” refers to a comparison of two or more states or timepoints. Using mRNA as an indirect measurement of protein, one can ascertain which proteins are linked to these different states. In eukaryotes, this can be assessed by enriching total RNA for polyA-containing mature mRNA. Through the use of oligo-(dT) containing resin, mRNA can be separated from non-protein encoding RNA. Likewise, performing a reverse-transcription using an oligo-(dT) primer will create a stable complementary DNA (cDNA) molecule that can be used with PCR. Using qPCR in this way is called RT-PCR or reverse-transcription polymerase chain reaction, where specific primer pairs are used to amplify a small portion of a known gene.

Hybridization based methods and Microarrays

Prior to RT-PCR, expression of individual genes was assessed through a hybridization-based approach. This method called for running RNA on an agarose gel and transferring the size-fractionated RNAs onto a membrane through a method called “blotting”. This transferred RNA was then hybridized to a radioactively labelled probe for a specific gene (corresponding to the reverse complementary sequence) and visualized by exposure to X-ray film in a process called Northern Blotting. The intensity of the band would be proportional to the amount of mRNA corresponding to the gene of interest. Re-probing with a housekeeping gene like actin would be used as a loading control to illustrate that a similar amount of total RNA was loaded into each well. Differences in sizes of the mRNA on the Northern Blot also revealed differences in splice variants of mature mRNA in the different states.
Northern Blot

This technique was later adapted using non-radioactive methods. Using these non-radioactive methods, the reverse protocol was developed to measure multiple gene targets. By systematically immobilizing gene specific probes onto a membrane or a microscope slide, an array of targets can be produced. In the simplest paradigm of having 2 states (control or experimental), cDNA from each sample can be used to generate fluorescent RNA that can hybridize to immobilized probes. Using 2 different fluorescent markers allows for the competitive hybridization onto the array whereby the fluorescent signal in each channel can reveal the differential gene expression of the two states in a 2-color microarray.
OSC Microbio 12 02 Microarray


Morphometric Analysis

Morphometrics and physical markers

Morphometrics (morpho– shape; metrics– measurements) is the use of physical measurements to determine the relatedness of organisms. With organisms that went extinct long ago, DNA extraction proves to be difficult. Likewise, before DNA technologies were available to analyze species, Linnaean taxonomy was assigned to organisms based on similarities in features.

Describing Species and Variation of Morphologies

Below are images of skull landmarks of the lizard family Varanidae. This family includes monitor lizards and Komodo Dragons. As can be seen below, the general morphology of the skulls is similar enough that they all retain the same landmarks. The figure also illustrates the large variety between species of these lizards.

Skulls of the species involved in this analysis. McCurry et al. (2015) (CC-BY)

Landmarks Standardize measurements

Having a set of shared landmarks provides the opportunity to make systematic measurements of morphometric features.

Landmarks and measurement metrics for the morphometric analysis of skulls. McCurry et al. (2015) (CC-BY)


Euclidean distance to measure relatedness

Euclidean distance is a measurement derived from Pythagorean geometry that describes the shortest distance (d) between 2 points (A & B) as a straight line, the hypotenuse of a right triangle. In a Cartesian space, the points can be defined:

A=(x_A, y_A) and B=(x_B, y_B)

The standard Pythagorean theorem can be expressed as:

x^2 + y^2 = d^2

To find the distance between the 2 points, we solve for d:

d = \sqrt{x^2 + y^2}

In this case, x and y are the differences between the coordinates of the two points:

\Delta x = x_B - x_A and \Delta y = y_B - y_A

so that d = \sqrt{(x_B - x_A)^2 + (y_B - y_A)^2}

We can then extend this idea to compare two specimens i and j across p different measurements:

d(\mathbf{X_i}, \mathbf{X_j}) = \sqrt{\sum_{k=1}^{p}(X_{ik} - X_{jk})^2}
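
As a small check of the formula, the distance can be computed by hand in R and compared with the built-in `dist()` function. The three specimens and their measurements below are hypothetical values chosen so that the arithmetic is easy to follow.

```r
# Hypothetical measurements: three specimens (rows), three traits (columns)
m <- rbind(
  specimen_A = c(10, 4, 7),
  specimen_B = c(13, 8, 7),
  specimen_C = c(10, 4, 19)
)

# Euclidean distance written out from the formula
euclid <- function(xi, xj) sqrt(sum((xi - xj)^2))

d_AB <- euclid(m["specimen_A", ], m["specimen_B", ])  # sqrt(3^2 + 4^2 + 0^2)
d_AC <- euclid(m["specimen_A", ], m["specimen_C", ])  # sqrt(0^2 + 0^2 + 12^2)
print(c(d_AB, d_AC))

# The built-in function returns all pairwise distances at once
print(dist(m, method = "euclidean"))
```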

Calculating distance with R

  1. Download the dataset (McCurry et al. 2015) associated with this activity (a Comma Separated Value .csv file). This can be used in a spreadsheet or in a text editor. This data can be imported into R to determine the Euclidean distances of landmarks.
  2. The following code in R will download the data set into a variable called “varanoid”, measure Euclidean distance and save a plot into a PDF file in a directory called “/tmp”.
## install curl for fetching from the internet if it isn't already installed
if (!requireNamespace('curl', quietly = TRUE)) install.packages('curl')
## load the curl library
library(curl)
## read the data of measurements and assign it to a variable 'varanoid'
varanoid = read.csv(curl('https://raw.githubusercontent.com/jeremyseto/bio-oer/master/data/varanoid.csv'))
## set the row names to the Species column
row.names(varanoid) = varanoid$Species
## remove the first column of the table to have purely numeric data
varanoid_truncated = varanoid[, 2:14]
## calculate distance using euclidean as the method
dist_measure = dist(varanoid_truncated, method = 'euclidean')
## display dist_measure to look at the comparisons
print(dist_measure)
## cluster the species hierarchically from the distance measurements
varanoid_cluster = hclust(dist_measure)
## open PDF as a graphics device to save a file in the '/tmp' directory
## (the file name 'varanoid_cluster.pdf' is an assumption; any name will do)
pdf('/tmp/varanoid_cluster.pdf')
## draw the cluster dendrogram
plot(varanoid_cluster)
## close the device to save the plot as pdf
dev.off()



DNA Analysis

Before starting this activity, review bioinformatics and sequence analysis.

  1. Search NCBI for mitochondrial sequences from the species involved in McCurry et al. (2015). The data were submitted by Ast (2001).
  2. Find the sequences and identify/extract elements that are common to all.
  3. Assemble the shared sequences in a text editor as a single FASTA file where each species is separated by a header (“>Species A”)
    • Notepad on Windows (but it’s better to download notepad++)
    • Textedit on Mac (but probably better to download TextWrangler)
    • Gedit on Linux
  4. Save the file as “something.fasta”
  5. Perform a multiple sequence alignment using UGENE
  6. Generate a phylogenetic tree using UGENE. For this exercise, use Maximum Likelihood (PhyML) as the algorithm. Follow the tutorial below.
  7. Compare the DNA with the morphometric analyses. What problems could arise if we rely solely on morphometry?



Absorbance Spectra of Photosynthetic Pigments

Prelab Exercise

  1. Fill the Color field in the table below
  2. Use plot.ly to create a line graph with the 3 samples below (A, B, C)
    • plot % Reflectance on the Y-axis and Wavelength (nm) on the X-axis

[Table: % Reflectance by Wavelength (nm) for samples A, B and C, with a Color field to fill in]

Stop and Think: Reflectance

A sign of plant health can be viewed through the near infra-red. While we cannot see this spectrum of light with our eyes, we can use other sensors to detect this light. Compare the Black & White image with the Infra-red image. What differences can you see in the 2 images that will help you understand how this is a useful measure of plant health? How do you think this corresponds to the table above?

The English Garden

The English Garden (black & white)
The English Garden (near infra-red)

Visible light wavelengths (between 400 nm and 700 nm) are strongly absorbed by the pigments in leaves (chlorophylls, xanthophylls, carotenoids). These pigments use the energy of these wavelengths to take part in the light reactions. The cellular structure of leaves does not absorb longer wavelengths (>700 nm, in the near infra-red range) and instead reflects them strongly. By comparing the amount of visible light to the amount of near infra-red light that is reflected, one can gauge the relative health of leaves, forests or jungles. This is a rough description of the Normalized Difference Vegetation Index (NDVI) that scientists use in conjunction with satellite imagery to assess the health of vegetation.
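
The comparison can be written as a short R function. NDVI is conventionally computed from the red band of visible light as (NIR - Red) / (NIR + Red); the reflectance values below are hypothetical and serve only to show how the index behaves.

```r
# NDVI from near infra-red (nir) and red reflectance
ndvi <- function(nir, red) (nir - red) / (nir + red)

# Hypothetical reflectance values, as fractions of incoming light
healthy  <- ndvi(nir = 0.50, red = 0.08)  # strong red absorption, strong NIR reflection
stressed <- ndvi(nir = 0.40, red = 0.30)  # weaker contrast between the two bands

print(round(c(healthy = healthy, stressed = stressed), 2))
```

Values close to 1 indicate dense, healthy vegetation, while values near 0 indicate little difference between the two bands.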

The Role of Light in Carbohydrate Synthesis

  1. Pick a leaf from a geranium exposed to light and one kept in the dark for 48 hours.
    • keep the stem on the leaf grown in light
    • remove the stem from the leaf grown in the dark
  2. Hydrolyze the cell walls of the geranium leaves by boiling in a water bath for 5 minutes (or until they look like over-cooked vegetables).
  3. Bleach the leaves by removing the pigments. Place the leaves in hot alcohol for 7 minutes or until they turn white.
    1. Save this green solution for the Absorbance Spectrum exercise
  4. Remove the leaves and place them in a petri dish.
  5. Add iodine to the dish. If starch is present, the leaf will turn a deep bluish-black color.
  6. Photograph the leaf with your phone to document the effects of light on carbohydrate storage.

Measuring Absorbance

  1. Connect the SpectroVis to the LabQuest 2
  2. Turn on the LabQuest 2 units
  3. Choose the LabQuest app
  4. Select the icon that looks like X|Y
  5. Press the green Play button on the bottom left
  6. Press OK to calibrate
  7. Let the machine calibrate for 90 seconds
  8. Choose “Finish calibration”
  9. Insert the Geranium pigment from the bleaching reaction
    1. Do NOT use Acetone in these plastic cuvettes since it will frost over the plastic
  10. Press the Red Stop button
  11. Students should record the absorbance values at every 10 nm from 380 nm to 700 nm
  12. The professor will prepare Spirulina extract diluted in ethanol in a cuvette and obtain the continuous absorbance spectrum.
  13. Plot Relative Absorbance against wavelength using a line graph and compare the absorption spectrum of the extracts.
    1. Relative Absorbance sets the maximum value in each dataset as a denominator
    2. Every value is divided by this maximum value
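
The Relative Absorbance calculation in step 13 can be sketched in R. The absorbance readings below are hypothetical and use only a few wavelengths to keep the example short; in the exercise, the recorded values at every 10 nm would take their place.

```r
# Hypothetical absorbance readings (illustration only)
spectrum <- data.frame(
  wavelength = c(400, 450, 500, 550, 600, 650, 700),  # nm
  geranium   = c(0.60, 1.20, 0.30, 0.15, 0.25, 0.90, 0.10),
  spirulina  = c(0.40, 0.80, 0.35, 0.30, 0.70, 0.50, 0.05)
)

# Relative Absorbance: divide every value by the maximum of its dataset
relative <- function(a) a / max(a)
spectrum$geranium_rel  <- relative(spectrum$geranium)
spectrum$spirulina_rel <- relative(spectrum$spirulina)

# Line graph of both extracts on the same 0 to 1 scale
plot(spectrum$wavelength, spectrum$geranium_rel, type = "l", ylim = c(0, 1),
     xlab = "Wavelength (nm)", ylab = "Relative Absorbance")
lines(spectrum$wavelength, spectrum$spirulina_rel, lty = 2)
legend("topright", legend = c("geranium", "spirulina"), lty = c(1, 2))
```

Scaling each dataset to its own maximum lets the shapes of the two spectra be compared even when the extracts differ in concentration.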

Photosynthetic Pigments

Extract and separate the pigments

  1. Lay a strip of filter paper on the bench
  2. About 2 cm from the bottom of the strip, place a fresh spinach leaf and rub a coin across the leaf to transfer pigment to the strip
  3. The instructor will be provided with a spoonful of Spirulina powder that has been soaked in 10 ml acetone overnight.
    • On a separate strip, the instructor will apply the Spirulina extract approximately 2 cm from the bottom of the strip
  4. Suspend the strips by a dowel or paper clip in a tube with about 3ml chromatography solution (2 isooctane: 1 acetone: 1 diethyl ether).
  5. Develop the strips until the solvent reaches about 2 cm from the top

Chromatography Analysis

  1. How many different pigments separate from the spinach extract? From the spirulina?
  2. Are all pigments present in both extracts?
  3. The mobile phase is non-polar. What does this tell you about the properties of each pigment?
  4. Measure the Rf of each pigment.
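
The Rf (retention factor) of a pigment is the distance the pigment travelled divided by the distance the solvent front travelled, both measured from the line where the sample was applied. A minimal R sketch follows; the distances and the band labels are hypothetical placeholders for the measurements taken from the developed strips.

```r
# Rf = distance travelled by the pigment / distance travelled by the solvent front
rf <- function(pigment_distance, solvent_distance) {
  pigment_distance / solvent_distance
}

# Hypothetical distances in cm, measured from the application line
solvent_front <- 10
pigments <- c(band_1 = 9.5, band_2 = 7.0, band_3 = 4.5, band_4 = 3.0)

rf_values <- rf(pigments, solvent_front)
print(rf_values)
```

Because the pigment can never travel farther than the solvent front, every Rf value falls between 0 and 1.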

Enzyme Kinetics (activity)

The Enzyme

Amylase reaction
Amylase is an enzyme that breaks down starch (amylose and amylopectin) into smaller sugars such as maltose, which are further digested to glucose molecules.

  1. What test can be used to indicate the presence of starch?
  2. What test can be used to indicate the presence of glucose?
  3. What is the role of an enzyme in a chemical reaction and what is it made of?
  4. What parameters would influence the ability of the enzyme to speed up the reaction?
Salivary amylase is produced in the mouth, where digestion begins.
Pancreatic amylase is produced in the pancreas and is supplied to the duodenum of the small intestines.
Overlay of salivary (green) and pancreatic (teal) amylase molecules.


Quantitative Detection of Protein (SpectroVis Plus)

Experimental Background

Bovine Serum Albumin (BSA) is a protein that circulates in the blood of cows. Purified BSA can be used with Biuret solution in serial dilutions to generate a Standard Curve. The standard curve will illustrate the relationship between concentration (the independent variable, which we set) and absorbance at 540 nm (the dependent variable, which we measure). We can then use this curve to estimate the concentration of unknown samples.

  1. On a graph, do you remember which axis is the dependent and which is the independent variable?
  2. In the table below, can you identify which samples are the negative controls and which are the positive controls?
  3. What is the prediction of the absorbance or color intensity of the different tubes?

Dilute BSA Standards

  1. Label 9 tubes 1-9
  2. Combine the components of the table below to generate appropriate concentration of solutions

Dilution table

  1. Place tube 1 (1mg/ml) into a cuvette for measuring absorbance (A) in the SpectroVis Plus. This will find the peak absorbance value.
  2. The instructor will begin to set-up the units for distribution
  3. Enter the LabQuest 2 application and press on the green Start button to generate a full spectrum
    • tap on the file cabinet icon to store this data
  4. On the Meter Screen, tap on Mode
    1. Change the mode to “Events with Entry”
    2. Enter the Name: Concentration
    3. Enter Units: mg/ml
    4. Select OK
    5. If message appears about saving run, choose Discard
  5. Sequentially read each sample at the stored wavelength (between 540 nm and 600 nm) and record the values in the table below

Data recording

  1. Plot each BSA dilution in plot.ly as a scatterplot
  2. Generate a best-fit line for these standards and display the equation of the line
  3. Use the equation of the line to estimate the concentration of the unknown sample.
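
The same steps can be sketched in R as an alternative to plot.ly. The absorbance readings and the unknown sample below are hypothetical values for illustration; the sketch fits a straight line with `lm()` and then solves the equation of the line for concentration.

```r
# Hypothetical BSA standards (illustration only)
standards <- data.frame(
  concentration = c(0, 0.2, 0.4, 0.6, 0.8, 1.0),        # mg/ml
  absorbance    = c(0.01, 0.06, 0.10, 0.16, 0.20, 0.26)
)

# Best-fit line: absorbance = slope * concentration + intercept
fit <- lm(absorbance ~ concentration, data = standards)
slope <- coef(fit)[["concentration"]]
intercept <- coef(fit)[["(Intercept)"]]

# Rearrange the equation of the line to estimate an unknown concentration
unknown_absorbance <- 0.12
unknown_concentration <- (unknown_absorbance - intercept) / slope
print(unknown_concentration)

# Scatterplot of the standards with the fitted line
plot(standards$concentration, standards$absorbance,
     xlab = "Concentration (mg/ml)", ylab = "Absorbance")
abline(fit)
```

The estimate is only reliable when the unknown's absorbance falls within the range covered by the standards.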

Curve Fitting

Run the simulation below to understand how you can use the standard dilution series to estimate your sample concentrations.

Curve Fitting
Click on image above to begin simulation on curve fitting

LabQuest2 and SpectroVis Tutorial

Scatterplot Tutorial

You can watch this tutorial at 1.25X and pause when needed.



Energy and catalysts

In biological systems, energy is roughly defined as the capacity to do work. Atoms within molecules are held together by bonds made of shared electrons. Breaking and building these bonds requires an input of energy. The energy needed to initiate such reactions is referred to as activation energy (EA). Sometimes the energy needed to initiate a reaction is so great that the reaction is unlikely to ever occur. Catalysts are chemicals that facilitate reactions by reducing the activation energy. If the activation energy is reduced, the likelihood of a reaction occurring is greatly enhanced. In cells, the catalysts are often made of protein and are called enzymes.

reaction coordinates
Reaction coordinate of an exothermic reaction with and without an enzyme. The enzyme reduces the EA, increasing the likelihood that the reaction occurs. This catabolic reaction breaks complex molecules down, thus increasing entropy and releasing energy.


Reactants in enzymatic reactions are called substrates. They have an imperfect fit to a binding domain of the enzyme called the active site. Substrate binding to this active site induces a change in the shape of the protein that coordinates the substrate into a transition state that will reduce the amount of EA required for the reaction to go to completion. The induced fit of the protein also aids in coordinating other cofactors or coenzymes that will aid in the reaction.


Two substrates
Induced fit model of enzymes and substrates. The active site of the protein is an imperfect match for the substrate. Intermolecular interactions between the enzyme and substrate induce a new fit that facilitates the formation of a transition state and results in the catalysis of the reaction.

The reaction follows the standard flow where the Enzyme (E) and the Substrate (S) interact to form an Enzyme-Substrate Complex (ES). The ES then dissociates into Enzyme and the resultant Product (P).

E + S ⇒ ES ⇒ E + P

The induced fit of the enzyme-substrate complex coordinates the transition state to facilitate the reaction. This induced fit occurs through non-covalent means that result in a tugging on the molecules (an application of energy) while molecules are coaxed into the reactions.

Hexokinase induced fit
The hexokinase enzyme interacts with an ATP and a hexose. These interactions slightly alter the structure of the enzyme (induced fit). This pulling on the enzyme and the substrates aids in catalyzing the reaction through coordinating the molecules, sometimes with the aid of cofactors and coenzymes. The yellow sphere represents the cofactor Mg2+.

Coenzymes that are covalently linked to amino acid side chains of the enzyme are also referred to as prosthetic groups. While prosthetic groups are organic in nature, they may also involve the coordination of metal ions, like the heme group, which binds to iron. These prosthetic groups extend the repertoire of the amino acids to provide additional functions to the entire protein. Early coenzymes were described as being vital to normal functioning and were characterized as organic molecules with amine groups. Because of this coincidence, they were referred to as vitamins (for vital amines), though not all vitamins have amine groups. The trace metal ions that work with these groups are also required and are the minerals listed on food labels.


Quantitative Detection of Protein (activity)

Experimental Background

Bovine Serum Albumin (BSA) is a protein that circulates in the blood of cows. Purified BSA can be used with Biuret solution in serial dilutions to generate a Standard Curve. The standard curve will illustrate the relationship between concentration (the independent variable, which we set) and absorbance at 540 nm (the dependent variable, which we measure). We can then use this curve to estimate the concentration of unknown samples.

  1. On a graph, do you remember which axis is the dependent and which is the independent variable?
  2. In the table below, can you identify which samples are the negative controls and which are the positive controls?
  3. What is the prediction of the absorbance or color intensity of the different tubes?
