An Introduction to Bayes’ Theorem (including videos!)

We needed to use Bayes’ Theorem to solve HW7 #2. In this post, I try to briefly address the following questions:

  • What is Bayes’ Theorem?
  • Where does it come from?
  • What is the “Bayesian interpretation” of probability?
  • Why is it called “Bayes’ Theorem”?

Also, at the bottom of the post are a handful of videos on Bayes’ Theorem (including applications to medical testing!), and links to a couple of books entirely about Bayesian probability.

What is Bayes’ Theorem?

Bayes’ Theorem (or Bayes’ Rule) is the following formula for computing conditional probability (taken from the Wikipedia entry for Bayes’ Theorem):

P(A | B) = P(B | A) * P(A)/P(B)

Where does Bayes’ Theorem come from?

Where does the formula above for P(A | B) come from? We just have to do some algebra on the definition of conditional probability.

Start with the definition of conditional probability, applied to P(A | B) and P(B | A):

P(A | B) = P(A & B)/P(B)

P(B | A) = P(A & B)/P(A)

Now “clear the denominators” on the RHS of each equation by multiplying through by P(B) and P(A), respectively:

P(A | B) * P(B) = P(A & B)

P(B | A) * P(A) = P(A & B)

Since the two equations have the same RHS, P(A & B), their LHSs must be equal to each other!

P(A | B) * P(B) = P(B | A) * P(A)

Now just divide through by P(B) (which must be nonzero for P(A | B) to be defined) and we get Bayes’ Rule:

P(A | B) = P(B | A) * P(A)/P(B)
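As a quick sanity check on the algebra, here is a minimal Python sketch that computes P(A | B) once directly from the definition and once via Bayes’ Rule, using exact fractions. The single-die example and the helper names `prob` and `cond_prob` are my own illustration, not from the homework:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (all outcomes equally likely).
OUTCOMES = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) for equally likely outcomes: |event| / |sample space|."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def cond_prob(a, b):
    """Definition of conditional probability: P(A | B) = P(A & B) / P(B)."""
    return prob(a & b) / prob(b)

A = {2, 4, 6}   # A: the roll is even
B = {5, 6}      # B: the roll is at least 5

lhs = cond_prob(A, B)                       # P(A | B) straight from the definition
rhs = cond_prob(B, A) * prob(A) / prob(B)   # P(B | A) * P(A)/P(B), i.e. Bayes' Rule
print(lhs, rhs)  # 1/2 1/2
```

Both routes give 1/2, even though P(B | A) works out to 1/3, so the two conditional probabilities really are different numbers.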

What is the “Bayesian interpretation” of probability?

Also from the Wikipedia entry for Bayes’ Theorem:

Bayesian Interpretation

See the Wikipedia entry for Bayesian probability for more on this!

Why is it called Bayes’ Rule?

One more quote from the Wikipedia entry for Bayes’ Theorem:

Bayes’ theorem is named after Reverend Thomas Bayes (/beɪz/; 1701?–1761), who first used conditional probability to provide an algorithm (his Proposition 9) that uses evidence to calculate limits on an unknown parameter, published as An Essay towards solving a Problem in the Doctrine of Chances (1763). In what he called a scholium, Bayes extended his algorithm to any unknown prior cause. Independently of Bayes, Pierre-Simon Laplace in 1774, and later in his 1812 Théorie analytique des probabilités, used conditional probability to formulate the relation of an updated posterior probability from a prior probability, given evidence. Sir Harold Jeffreys put Bayes’s algorithm and Laplace’s formulation on an axiomatic basis. Jeffreys wrote that Bayes’ theorem “is to the theory of probability what the Pythagorean theorem is to geometry.”

Videos on Bayes’ Theorem:

Here are two introductions to Bayes’ Theorem:


An important application of Bayes’ Theorem is the accuracy of medical tests, which is especially timely given all the discussion about the accuracy of coronavirus testing! Here are two videos specifically on that topic:


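To see why test accuracy is subtle, here is a small Python sketch of Bayes’ Theorem applied to a positive test result. The numbers (1% prevalence, 95% sensitivity, 90% specificity) are made up for illustration and are not real coronavirus test figures; P(+) is computed with the law of total probability:

```python
from fractions import Fraction

# Hypothetical test characteristics (illustrative numbers only, not real data):
prevalence = Fraction(1, 100)    # P(D): fraction of people who have the disease
sensitivity = Fraction(95, 100)  # P(+ | D): test is positive when the disease is present
specificity = Fraction(90, 100)  # P(- | not D): test is negative when the disease is absent

false_positive_rate = 1 - specificity  # P(+ | not D)

# Law of total probability: P(+) = P(+ | D) P(D) + P(+ | not D) P(not D)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' Theorem: P(D | +) = P(+ | D) P(D) / P(+)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(p_disease_given_positive, float(p_disease_given_positive))
```

With these made-up numbers, P(D | +) = 19/217, roughly 8.8%: most people who test positive do not have the disease, because the disease is rare to begin with and the false positives from the large healthy group outnumber the true positives.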
Textbooks on Bayesian probability:

If you want to go even further, there are entire books devoted to Bayesian probability/statistics. Here are two introductory textbooks that look interesting (in fact, I hope to read them myself at some point!):

Notes for Mon April 20 / HW8 (Permutations & Combinations)

Here’s a brief recap of today’s Blackboard Collaborate session:

For approximately the first 40 minutes, we gave an overview of HW8, which consists of permutation and combination calculations:

  • HW8 is a written homework assignment; you can find the pdf with the homework exercises under Files
    • HW8 is due next Monday (April 27)
    • I will create an Assignment in Blackboard where you can submit your solutions (preferably as a pdf, as you did for Quiz #3 over the weekend)
    • we went through HW8 #1 together; in particular, I wanted to demonstrate how to show your work
    • we will go through at least one more exercise from HW8 during Wednesday’s Blackboard session

We spent the remaining hour reviewing random variables and introducing probability distributions for such random variables.

Please review the Class Outline on those topics–in particular, it’s essential you understand the example involving the probability experiment of flipping a coin 3 times, and constructing the probability distribution for the random variable “X = the number of heads observed.”  We will build on that example when we discuss binomial experiments and binomial random variables.
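If you want to check the coin-flip example by brute force, here is a short Python sketch that lists all 8 equally likely outcomes and tallies X = the number of heads (the variable names are my own):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of flipping a fair coin 3 times, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# X = the number of heads observed; count how many outcomes give each value of X.
counts = Counter(flips.count("H") for flips in outcomes)

# Probability distribution of X: P(X = x) = (# outcomes with x heads) / 8.
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
# P(X = 0) = 1/8
# P(X = 1) = 3/8
# P(X = 2) = 3/8
# P(X = 3) = 1/8
```

Note that the probabilities add up to 1, as they must for any probability distribution.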

You can review the Blackboard recording, and/or you can view this Khan Academy video, which constructs the probability distribution for that same random variable:


Videos/Notes for Tues April 7: Permutations and Combinations

See below for some videos and notes recapping our Tuesday April 7 class session:

  • we spent most of the session going through the class outline on “Permutations and Combinations”; please review the outline and try to write out solutions to the Example exercises (I will collect these exercises plus some additional exercises as a homework set; details TBA!)


  • Here are a few YouTube videos by a math teacher whose videos I like (Patrick JMT):
    • see the following video which discusses permutations:


    • we only introduced combinations at the end of the session; we will pick up with that topic next Monday, but in the meantime viewing this video may help:

    • this video is also relevant–please watch it:

  • Finally, I haven’t watched this entire video yet (it’s longer, at 38 minutes), but it looks pretty good and addresses one of the key questions: what is the difference between permutations and combinations?
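One way to see the difference is to count both directly. This Python sketch lists the ordered arrangements and the unordered selections of 3 objects out of 5 (the letters A through E are just a hypothetical example; `math.perm` and `math.comb` require Python 3.8 or later):

```python
import math
from itertools import combinations, permutations

letters = "ABCDE"   # n = 5 distinct objects (hypothetical example)
r = 3               # select 3 of them

# Permutations: ordered arrangements, so ABC and CBA count as different.
ordered = list(permutations(letters, r))
# Combinations: unordered selections, so ABC and CBA are the same group.
unordered = list(combinations(letters, r))

print(len(ordered), math.perm(len(letters), r))    # 60 60  (5 * 4 * 3)
print(len(unordered), math.comb(len(letters), r))  # 10 10  (60 / 3!)

# Each group of 3 letters can be arranged in 3! = 6 orders, so nPr = nCr * r!
assert len(ordered) == len(unordered) * math.factorial(r)
```

In other words, use permutations when the order of selection matters and combinations when it does not; the factor r! is exactly the number of ways to reorder each selected group.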

Videos/Notes for Monday April 6: Conditional Probability

See below for some videos and notes recapping our Monday April 6 class session:

  • We started with an example of conditional probability that introduces the concept of “independent events”; see the “Introduction to Conditional Probability” video we discussed last time, starting at 9:30:

  • We then reviewed the class outline on Conditional Probability, in particular the 2nd page which introduces the Multiplication Rule for probabilities and the concept of independent vs dependent events
  • We went over some of the exercises in HW6 and HW7 that use the Multiplication Rule
  • You can go on to view and discuss the subsequent jbstatistics video, “Independent Events”:

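To make the Multiplication Rule concrete, here is a Python sketch that draws two cards and counts every equally likely ordered pair of draws. Labeling the aces as cards 0 through 3 is an arbitrary assumption for illustration. Drawing without replacement makes the two events dependent; drawing with replacement makes them independent:

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical labeling of a standard 52-card deck: cards 0-3 are the four aces.
DECK = range(52)
ACES = {0, 1, 2, 3}

def both_aces(pairs):
    """Fraction of equally likely (first, second) draws in which both cards are aces."""
    pairs = list(pairs)
    hits = sum(1 for first, second in pairs if first in ACES and second in ACES)
    return Fraction(hits, len(pairs))

# Without replacement (dependent events): the second card must differ from the first.
without_replacement = both_aces(permutations(DECK, 2))
# With replacement (independent events): the first card goes back before the second draw.
with_replacement = both_aces(product(DECK, repeat=2))

# Multiplication Rule: P(A & B) = P(A) * P(B | A), where P(B | A) = 3/51
assert without_replacement == Fraction(4, 52) * Fraction(3, 51)
# Independent events: P(B | A) = P(B) = 4/52, so P(A & B) = P(A) * P(B)
assert with_replacement == Fraction(4, 52) * Fraction(4, 52)

print(without_replacement, with_replacement)  # 1/221 1/169
```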

Conditional Probability

This past Monday, we introduced conditional probability in our Blackboard Collaborate session. Here is a brief recap, followed by some online resources that go over the definition and work through some examples.

The conditional probability of A given B is defined as:

P(A|B) = P(A & B)/P(B)

Note that by switching A and B, we can also look at the conditional probability of B given A:

P(B|A) = P(A & B)/P(A)

(Note that the numerator is the same in both cases, since P(A&B) = P(B&A).  The denominator is the probability of the “condition” i.e., the event after the vertical line “|”.)
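Here is a minimal Python sketch of the definition, using two fair dice as a hypothetical example. It computes P(A|B) and P(B|A) from the same numerator P(A & B) and shows that the two can differ because the denominators differ:

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: roll two fair dice; all 36 ordered outcomes are equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = (# outcomes where event holds) / 36."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def sum_is_8(o):          # event A: the two dice add up to 8
    return o[0] + o[1] == 8

def first_is_3(o):        # event B: the first die shows 3
    return o[0] == 3

def both(o):              # event A & B
    return sum_is_8(o) and first_is_3(o)

p_A_given_B = prob(both) / prob(first_is_3)  # P(A|B) = P(A & B)/P(B)
p_B_given_A = prob(both) / prob(sum_is_8)    # P(B|A) = P(A & B)/P(A)

print(p_A_given_B, p_B_given_A)  # 1/6 1/5
```

Both use P(A & B) = 1/36 (only the outcome (3, 5)); the answers differ because P(B) = 6/36 while P(A) = 5/36.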

The following 2 videos may be helpful:

If you have a copy of the textbook (Ross), you should read the examples listed on the class outline pdf.  I will try to update this post with some additional online examples shortly.