Genetics Notes
Chapter 3 -- Probability and Statistics

These notes are provided to help direct your study from the textbook. They are not designed to explain all aspects of the material in great detail; that is what class time and the textbook are for. If you were to study only these notes, you would not learn enough genetics to do well in the course.

Probability

Probability deals with events that are stochastic or random. Probability theory tells how to predict the outcomes of crosses, as well as many other processes.

P = q/n = number of favorable cases/number of possible cases

Two types of probabilities
  1. empirical probabilities are determined by observing a large number of trials
     for example, the probability that the next car you see is red
     for example, the probability of dying of cancer
  2. a priori probabilities are determined by the nature of the event
     for example, rolling a die and getting a six (P = 1/6, as a die has six faces, all with the same probability of occurring)
     for example, drawing the ace of spades from a standard 52-card deck (P = 1/52)
     for example, the probability that an offspring of a selfed F1 dihybrid will show the doubly recessive phenotype: P = 1/16, because 1 square shows the doubly recessive phenotype out of the 16 squares in the Punnett square
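
The dihybrid example can be checked by counting squares directly. The following is a minimal sketch, assuming two genes written A/a and B/b (hypothetical labels, not from the textbook) and a selfed AaBb parent; it counts favorable cases over possible cases, as in P = q/n.

```python
from fractions import Fraction
from itertools import product

# Gametes from an AaBb parent: one allele per gene, all four equally likely.
gametes = [a + b for a, b in product("Aa", "Bb")]  # AB, Ab, aB, ab

# Each cell of the 4 x 4 Punnett square pairs one gamete from each parent.
squares = list(product(gametes, gametes))

def is_double_recessive(gamete_1, gamete_2):
    """True when the offspring carries no dominant allele (genotype aabb)."""
    return (gamete_1 + gamete_2).islower()

favorable = sum(is_double_recessive(g1, g2) for g1, g2 in squares)
p_double_recessive = Fraction(favorable, len(squares))
print(favorable, len(squares), p_double_recessive)
```

Only the ab x ab square has no dominant allele, so the count is 1 favorable case out of 16 possible cases.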
Here are some important rules for determining the probability of two or more events together
  1. Sum Rule (addition rule, page 55) - the probability of the occurrence of one of several mutually exclusive events is the sum of the probabilities of the individual events; for example, the probability of rolling a 2 or a 4 with a die
    P(2) = 1/6
    P(4) = 1/6
    P(2 or 4) = 1/6 + 1/6 = 1/3
  2. Product Rule (multiplication rule, page 54) - the probability of the occurrence of independent events is the product of their separate probabilities; for example, the probability of rolling a 2 and then a 4 with a die
    P(2 then 4) = 1/6 x 1/6 = 1/36
  3. Binomial expansion (pages 56-57) - the probability of the occurrence of some arrangement of two mutually exclusive trials, where the final order is not specified, is defined by the binomial theorem
    P = [n!/(s! t!)] p^s q^t
    definitions...
    0! = 1! = 1, and n^0 = 1
    n = number of trials (n = s + t)
    p = probability of an event occurring on any given trial
    q = probability of the event not occurring (q = 1 - p)
    s = number of times an event occurs
    t = number of times opposite event occurs

    probability of having 5 girls and 1 boy
    P = [6!/(5! 1!)] (1/2)^5 (1/2)^1
    P = 6 (1/2)^5 (1/2)^1
    P = 6/64
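
The three rules can be tried out with exact fractions. This is a small sketch rather than anything from the textbook; the function name binomial_probability is made up for illustration, and math.comb(n, s) supplies the n!/(s! t!) term.

```python
from fractions import Fraction
from math import comb

face = Fraction(1, 6)  # a priori probability of any one face of a die

# Sum rule: mutually exclusive outcomes (a 2 or a 4 on one roll).
p_2_or_4 = face + face

# Product rule: independent events (a 2 on the first roll, then a 4).
p_2_then_4 = face * face

def binomial_probability(s, t, p):
    """Probability of s occurrences and t non-occurrences, in any order."""
    q = 1 - p
    return comb(s + t, s) * p**s * q**t

p_5_girls_1_boy = binomial_probability(5, 1, Fraction(1, 2))

print(p_2_or_4, p_2_then_4, p_5_girls_1_boy)
```

Fractions reduce automatically, so 6/64 is reported as 3/32.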

Use of rules
  • What is the probability of tossing a coin twice and getting one head and one tail?
    2 possibilities
    1) product ----> first head, then tail 1/2 x 1/2 = 1/4
    2) product ----> first tail, then head 1/2 x 1/2 = 1/4
    since order is not specified we then use the sum rule which yields 1/4 + 1/4 = 1/2

  • What is the probability of having 5 girls and 1 boy in no particular order?
          G G G G G B         P = 1/2 x 1/2 x 1/2 x 1/2 x 1/2 x 1/2 = 1/64
          G G G G B G         P = 1/64
          G G G B G G         P = 1/64
          G G B G G G         P = 1/64
          G B G G G G         P = 1/64
          B G G G G G         P = 1/64
                              P(5 girls, 1 boy) = 6/64
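
The same listing can be produced by brute force. The sketch below, assuming girls and boys are equally likely and births are independent, enumerates every birth order, applies the product rule to each order, and then applies the sum rule to the orders with exactly one boy.

```python
from fractions import Fraction
from itertools import product

# All 2**6 = 64 equally likely birth orders for six children.
orders = list(product("GB", repeat=6))

# Product rule: each specific order has probability (1/2)**6 = 1/64.
p_each_order = Fraction(1, 2) ** 6

# Sum rule: add the probabilities of the orders with exactly one boy.
one_boy_orders = [order for order in orders if order.count("B") == 1]
p_five_girls_one_boy = p_each_order * len(one_boy_orders)

for order in one_boy_orders:
    print(" ".join(order), p_each_order)
print(p_five_girls_one_boy)
```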
  • Two persons heterozygous for albinism have children
    Aa x Aa ----> 3:1 normal:albino
    1. what is the probability that any given child will be normal? P(normal) = 3/4
    2. if these 2 people have 4 children, what is the probability that they will all be normal? P(4 normal) = 3/4 x 3/4 x 3/4 x 3/4 = 0.3164 (product rule)
    3. what is the probability that the second child is albino and the other three are normal? P = 3/4 x 1/4 x 3/4 x 3/4 = 0.1055 (product rule)
    4. what is the probability of 3 normal children and 1 albino child if no order is specified? P = [4!/(3! 1!)] (3/4)^3 (1/4)^1
         = 4 (3/4)^3 (1/4)^1
         = 0.4219 (binomial expansion)
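
The albinism answers can be reproduced in a few lines. This is only a sketch of the arithmetic, with variable names chosen for illustration.

```python
from fractions import Fraction
from math import comb

p_normal = Fraction(3, 4)   # Aa x Aa gives 3 normal : 1 albino
p_albino = 1 - p_normal

# Product rule: all four children normal.
p_all_normal = p_normal ** 4

# Product rule with a fixed order: normal, albino, normal, normal.
p_second_albino = p_normal * p_albino * p_normal * p_normal

# Binomial expansion: 3 normal and 1 albino in any order.
p_three_normal_any_order = comb(4, 3) * p_normal**3 * p_albino**1

for p in (p_all_normal, p_second_albino, p_three_normal_any_order):
    print(p, round(float(p), 4))
```

The last value is exactly 4 times the fixed-order value, because there are 4 positions the albino child could occupy.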

Hypothesis testing

    This is determining whether to reject or not reject a proposed hypothesis based on how probable the observed results would be if the hypothesis were correct. For example, Mendel got 787 tall plants and 277 short plants in an F2. Is this really indicative of a 3:1 ratio?

    The first step is the construction of a sampling distribution based upon what we think the probabilities of various events are. Statisticians have agreed to call the area under this curve 1. Thus we are guaranteed that whatever ratio we get will be on the graph.

    If our observed ratio falls within the middle 95% of the sampling distribution, then we conclude that our observed ratio fits our hypothesis of a 3:1 ratio. That is, 95% of the time our observed ratio, when sampling from a 3:1 population, will lie between the extremes. On the other hand, if the ratio calculated from the observed data falls in the tails (for example, all the plants are tall), then we reject the hypothesis that 3:1 is the true ratio for the phenotypes in the F2 (Figure 4.2), because it is not likely that our observed ratio came from a population whose phenotypic distribution is a 3:1 ratio. Because we use only the middle 95% of the distribution for not rejecting our null hypothesis, 5% of the time (one time in 20) a value will fall into one of the tails even if the null hypothesis is true. In this case we will incorrectly reject our null hypothesis, which is a mistake. We will make this mistake 5% of the time.
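
For a two-class case such as tall versus short, the sampling distribution can be computed directly from the binomial expansion. The sketch below is one way to do it, assuming 1064 plants and a true 3:1 ratio; the helper names binomial_pmf and middle_interval are hypothetical, and the 2.5% cut in each tail is simply the 5% split evenly between the two tails.

```python
from math import comb

def binomial_pmf(n, p_num, p_den):
    """Exact P(k successes) for k = 0..n when P(success) = p_num / p_den."""
    q_num = p_den - p_num
    total = p_den ** n
    # Whole-number arithmetic avoids floating-point underflow for large n.
    return [comb(n, k) * p_num**k * q_num**(n - k) / total
            for k in range(n + 1)]

def middle_interval(pmf, tail=0.025):
    """Lowest and highest counts left after trimming `tail` from each end."""
    cumulative, low, high = 0.0, None, None
    for k, prob in enumerate(pmf):
        cumulative += prob
        if low is None and cumulative > tail:
            low = k
        if high is None and cumulative >= 1 - tail:
            high = k
    return low, high

n_plants = 787 + 277                 # Mendel's 1064 F2 plants
pmf = binomial_pmf(n_plants, 3, 4)   # probability of each number of tall plants
low, high = middle_interval(pmf)

print("area under the curve:", round(sum(pmf), 6))
print("middle 95% of tall counts:", low, "to", high)
print("787 tall inside the middle 95%?", low <= 787 <= high)
print("all 1064 tall inside the middle 95%?", low <= n_plants <= high)
```

The probabilities add up to 1, matching the statement that the area under the curve is 1. A count of 787 tall lies inside the middle 95%, while a result of all tall plants lies far out in the tail.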

    Now if we wanted to test a 9:3:3:1 ratio we would have to calculate a new sampling distribution, which is tedious to say the least. Instead, we can use a standardized distribution such as the chi-square distribution, which is used to compare two distributions: in this case, an observed distribution from the data and an expected distribution calculated from our hypothesis about the nature of the cross. The chi-square is useful when your data fit into discrete classes such as short versus tall (pages 65-67).
    X^2 = Σ (O - E)^2 / E (figure 3.15)
    (X stands for the Greek letter chi, Σ means the sum over all classes, O is the observed number, and E is the expected number)

    For example, does 787 tall to 277 short indicate a 3:1 ratio?
                      tall                      short
    observed #        787                       277
    expected ratio    3/4                       1/4
    expected #        (787 + 277) x 3/4 = 798   (787 + 277) x 1/4 = 266
    (O - E)           -11                       +11
    (O - E)^2         121                       121
    (O - E)^2/E       0.15                      0.45
    X^2 = 0.15 + 0.45 = 0.60
    Suppose instead we assume that we are sampling from a population with a 1:1 ratio
                      tall                      short
    observed #        787                       277
    expected ratio    1/2                       1/2
    expected #        (787 + 277) x 1/2 = 532   (787 + 277) x 1/2 = 532
    (O - E)           +255                      -255
    (O - E)^2         65,025                    65,025
    (O - E)^2/E       122.23                    122.23
    X^2 = 122.23 + 122.23 = 244.45 (summed before rounding)
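
Both tables follow the same recipe, so they can be computed with one small function. This is a sketch, and the name chi_square is just a convenient label; it calculates the expected numbers from the total count and the hypothesized ratios, then sums (O - E) squared over E.

```python
def chi_square(observed, ratios):
    """Sum of (O - E)^2 / E, where E = total count x expected ratio."""
    total = sum(observed)
    expected = [total * r for r in ratios]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [787, 277]  # tall, short

chi_3_to_1 = chi_square(observed, [3 / 4, 1 / 4])
chi_1_to_1 = chi_square(observed, [1 / 2, 1 / 2])

print(round(chi_3_to_1, 2))  # 0.61
print(round(chi_1_to_1, 2))  # 244.45
```

The 3:1 value prints as 0.61 rather than 0.60 because the function adds the unrounded terms (0.1516 + 0.4549); the hand calculation rounds each term to two decimals first. The difference does not affect the conclusion.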
    Before determining the significance of the X^2 value we must determine the degrees of freedom. The degrees of freedom (d.f.) tell us how many independent categories we have. In this case, we had 1064 plants, with 787 in the tall category, so there must be 277 in the short category. The short category is not independent, because it can be calculated once the tall category is known.
    if n = # of classes, then d.f. = n - 1

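    From the table, the critical value for d.f. = 1 at the 0.05 probability level is 3.841 (Table 3.4).
    if X^2 < 3.841 we fail to reject
    if X^2 > 3.841 we reject
    For the 3:1 hypothesis, X^2 = 0.60 < 3.841, so we fail to reject; for the 1:1 hypothesis, X^2 = 244.45 > 3.841, so we reject.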
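
The decision rule can be written out as a short sketch. The function names are made up for illustration. The p-value formula is not from the notes: it is the standard identity that holds only for d.f. = 1, where the chi-square value is the square of a standard normal deviate, and it is included to show where the 0.05 level and the 3.841 critical value meet.

```python
from math import erfc, sqrt

CRITICAL_0_05_DF1 = 3.841  # from the notes: d.f. = 1 at the 0.05 level

def degrees_of_freedom(n_classes):
    """d.f. = n - 1, where n is the number of classes."""
    return n_classes - 1

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square value (valid for d.f. = 1 only)."""
    return erfc(sqrt(chi2 / 2))

def decide(chi2, critical=CRITICAL_0_05_DF1):
    return "reject" if chi2 > critical else "fail to reject"

print("d.f. for tall/short:", degrees_of_freedom(2))
for label, chi2 in [("3:1", 0.60), ("1:1", 244.45)]:
    print(label, decide(chi2), p_value_df1(chi2))
print("P at the critical value:", round(p_value_df1(CRITICAL_0_05_DF1), 3))
```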

    Here is a problem with a 9:3:3:1 ratio of a dihybrid cross.
    315 round yellow (RY)
    108 round green (RG)
    101 wrinkled yellow (WY)
    32 wrinkled green (WG)
    Does this represent a 9:3:3:1 ratio as expected in a dihybrid F2?
                        RY        RG        WY        WG
    observed #          315       108       101       32
    expected ratio      9/16      3/16      3/16      1/16
    expected #          313       104       104       35       (556 x ratio, rounded to whole numbers)
    (O - E)             +2        +4        -3        -3
    (O - E)^2/E         0.013     0.154     0.087     0.257
    X^2 = 0.013 + 0.154 + 0.087 + 0.257 = 0.511
    With X^2 = 0.511 and d.f. = 4 - 1 = 3, the critical value at the 0.05 probability level is 7.815; because 0.511 < 7.815, the hypothesis of the 9:3:3:1 ratio cannot be rejected.
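
The dihybrid table can be recomputed as a check. This sketch calculates the chi-square twice: once with the exact expected numbers (556 x ratio) and once with the expected numbers rounded to whole seeds, as in the table. The critical value 7.815 for d.f. = 3 at the 0.05 level is the standard chi-square table value and is an addition here, not a number quoted in the notes.

```python
from fractions import Fraction

observed = [315, 108, 101, 32]  # RY, RG, WY, WG
ratios = [Fraction(9, 16), Fraction(3, 16), Fraction(3, 16), Fraction(1, 16)]
total = sum(observed)           # 556 seeds

exact_expected = [total * r for r in ratios]            # 312.75, 104.25, ...
rounded_expected = [round(e) for e in exact_expected]   # 313, 104, 104, 35

def chi_square(obs, exp):
    """Sum of (O - E)^2 / E over all classes."""
    return float(sum((o - e) ** 2 / e for o, e in zip(obs, exp)))

chi_exact = chi_square(observed, exact_expected)
chi_rounded = chi_square(observed, rounded_expected)

CRITICAL_0_05_DF3 = 7.815  # standard table value; assumption, not from the notes
d_f = len(observed) - 1

print(rounded_expected, d_f)
print(round(chi_exact, 3), round(chi_rounded, 3))
print("reject" if chi_exact > CRITICAL_0_05_DF3 else "fail to reject")
```

Rounding the expected numbers shifts the chi-square slightly (about 0.47 with exact values versus about 0.51 with rounded ones), but either value is far below the critical value, so the conclusion is the same.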

    If the expected number for any category is less than 5, the conclusions are not reliable. Thus, in an F2 dihybrid cross, you need to count at least 80 individuals so that the doubly recessive category has an expected value of 5 (80 x 1/16 = 5).
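
The minimum count follows from the smallest expected ratio. A short sketch, with a hypothetical helper name, assuming the rule of thumb of an expected value of at least 5 in every class:

```python
from fractions import Fraction
from math import ceil

def minimum_sample_size(ratios, min_expected=5):
    """Smallest total giving every class an expected value >= min_expected."""
    return ceil(min_expected / min(ratios))

dihybrid = [Fraction(9, 16), Fraction(3, 16), Fraction(3, 16), Fraction(1, 16)]
monohybrid = [Fraction(3, 4), Fraction(1, 4)]

print(minimum_sample_size(dihybrid))    # 80
print(minimum_sample_size(monohybrid))  # 20
```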


    Last update on 7 September 2004
    Provide comments to Dwight Moore at mooredwi@emporia.edu