These notes are provided to help direct your study from the textbook. They are not designed to explain all aspects of the material in great detail; that is what class time and the book are for. If you were to study only these notes, you would not learn enough genetics to do well in the course.

Genetics Notes
Chapter 23 -- Population and Evolutionary Genetics

Theodosius Dobzhansky - "Nothing in biology makes sense except in the light of evolution."

Evolution cannot be understood except in the light of genetics. Evolutionary change can be defined as a change in gene frequencies within a population over time.

A population, or deme, is a community of individuals linked by bonds of mating and parenthood.

A Mendelian population is a group of interbreeding, sexually reproducing individuals.

A species is a group of actually or potentially interbreeding natural populations that are reproductively isolated from other such groups.

The evolutionary unit is the population or, in some cases, the species. It is the genome of the population that changes over time and not that of the individual. Much of the mathematical description of genetic changes in populations was developed during the 1920s and 1930s by Fisher, Haldane, and Wright.

The first step in characterizing the genome of a population is the calculation of allelic frequencies and genotypic frequencies (pages 678-680).
The phenotypic distribution of the MN blood type among 200 people . . .
   Type M   (MM) = 88                   f(MM) = 88/200 = .44
   Type MN  (MN) = 88                   f(MN) = 88/200 = .44
   Type N   (NN) = 24                   f(NN) = 24/200 = .12
           # of M alleles 
   f(M) = -----------------     
          total # of alleles

(It is easier to think in terms of alleles rather than genotypes.)

            2(88) +88       264  
p = f(M) = ----------  =   -----  = .66
             2(200)         400
            2(24) + 88       136 
q = f(N) = -----------   =   ---  = .34
              2(200)         400

As there are only two alleles in this example, q = 1 - p, because
p + q = 1.  The sum of the frequencies of all the alleles must add to equal one.
An alternative way of calculating gene frequencies is . . .

             (MM) + 0.5*(MN)       88 + 44
p = f(M) = ------------------- = ----------- = 0.66
               individuals            200

             (NN) + 0.5*(MN)       24 + 44
q = f(N) = ------------------- = ----------- = 0.34
               individuals            200

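The allele-counting calculation above can be sketched in a few lines of Python. This is only an illustration; the function name and argument names are made up for these notes and do not come from the textbook.

```python
def allele_frequencies(n_MM, n_MN, n_NN):
    """Return (p, q) = (f(M), f(N)) from counts of the three genotypes."""
    total_alleles = 2 * (n_MM + n_MN + n_NN)   # each individual carries 2 alleles
    p = (2 * n_MM + n_MN) / total_alleles      # MM has two M alleles, MN has one
    q = (2 * n_NN + n_MN) / total_alleles      # NN has two N alleles, MN has one
    return p, q

p, q = allele_frequencies(88, 88, 24)          # MN blood type counts from above
print(round(p, 2), round(q, 2))                # prints 0.66 0.34
```
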
In 1908, Hardy and Weinberg independently discovered that an equilibrium in allelic and genotypic frequencies will arise in a diploid population if certain conditions are met. This is called the Hardy-Weinberg equilibrium and it has three important facets.
  1. the allelic frequencies at an autosomal locus will not change from one generation to the next
  2. the genotypic frequencies of a population are determined on the basis of the allelic frequencies
  3. if the equilibrium is disturbed, it will be reestablished after just one generation of random mating
Assumptions of Hardy-Weinberg equilibrium
(These assumptions must hold if the Hardy-Weinberg equilibrium is to occur.)
  1. Random mating. This means that any individual has an equal chance of mating with any other individual. Because of this, you can predict the probability of any two genotypes mating. e.g. if the MM genotype makes up 44% of the population, then the probability of two individuals with the MM genotype mating is .44 * .44 = .1936. Deviations occur due to
    1. assortative mating - likes mate with likes
    2. disassortative mating - unlikes mate
    3. inbreeding - individuals are more likely to mate with related individuals than other members of the population
    4. outbreeding - individuals systematically exclude relatives as potential mates
     Deviations from random mating will change the genotypic frequencies (assortative mating and inbreeding increase homozygotes; disassortative mating and outbreeding decrease them) but will not alter allelic frequencies.
  2. No selection. No genotype has a better chance of survival and reproduction than any other genotype.
  3. Large population size. Each new generation is a sample of the previous generation's gametes. A small sample size (small population) is more likely to show random fluctuation in sampling and hence greater deviation from generation to generation. When populations are small and allelic frequencies are affected more by chance, the changes are referred to as drift.
  4. No mutation or migration. Mutation and migration result in gain or loss of alleles from a population. This gain or loss will disturb the equilibrium.
It would seem that almost no population would ever be in equilibrium given these assumptions. However, in practice . . .
  1. mutation rates are small to negligible
  2. even a fairly small population, say 100 individuals, will still give a good fit to Hardy-Weinberg
  3. also, since the equilibrium is established after one generation, the population will come back to equilibrium within one generation after being disturbed
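
The claim that equilibrium is reestablished after one generation of random mating can be checked numerically. The sketch below assumes the ideal conditions listed above (random mating, no selection, mutation, migration, or drift); the function name and the starting frequencies are hypothetical.

```python
def random_mating(f_AA, f_Aa, f_aa):
    """One generation of random mating: return the new genotype frequencies."""
    p = f_AA + 0.5 * f_Aa        # frequency of A among the gametes
    q = f_aa + 0.5 * f_Aa        # frequency of a among the gametes
    return p * p, 2 * p * q, q * q

# Hypothetical disturbed population with no heterozygotes at all
gen1 = random_mating(0.60, 0.00, 0.40)
gen2 = random_mating(*gen1)
print(gen1)   # genotype frequencies after one generation
print(gen2)   # the same frequencies again: equilibrium was reached in one step
```
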

Testing for Hardy-Weinberg
This is the first test that a population biologist will do after determining the allelic and genotypic frequencies within a population. To determine if a population is in Hardy-Weinberg equilibrium, you want to know if the genotypes (MM, MN, NN) occur with frequencies of p^2, 2pq, and q^2.

p = f(M) = .66       q = f(N) = .34       p + q = 1

p^2 = the expected probability of an individual having the genotype MM
q^2 = the expected probability of an individual having the genotype NN
2pq = the expected probability of an individual having the genotype MN

All of this follows from (p + q)^2 = p^2 + 2pq + q^2
                  MM            MN            NN          Total
observed          88            88            24           200
expected ratio    p^2           2pq           q^2           1
expected #   (.436)(200)   (.449)(200)   (.116)(200)       200
                 87.1          89.8          23.1
(O - E)^2/E      .009          .036          .035

The chi-square value is the sum across all classes.
   .009 + .036 + .035 = 0.08
To compare to the table value, we need the degrees of freedom. The d.f. = number of phenotypes - number of alleles = 3 - 2 = 1. In this case the calculated value is far below the critical value of 3.841 (1 d.f., 0.05 level), thus we fail to reject the null hypothesis that the population is in Hardy-Weinberg equilibrium.
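
The chi-square test above can be reproduced with a short script. This is a minimal sketch; the function name is illustrative, and 3.841 is the standard 0.05 critical value for 1 degree of freedom, hard-coded here rather than looked up from a statistics library. Because the script does not round the expected numbers, it gives a slightly smaller chi-square than the hand calculation.

```python
def hardy_weinberg_chi_square(observed):
    """observed = (n_MM, n_MN, n_NN); return (expected counts, chi-square)."""
    n_MM, n_MN, n_NN = observed
    n = n_MM + n_MN + n_NN
    p = (2 * n_MM + n_MN) / (2 * n)              # f(M) by allele counting
    q = 1 - p                                    # f(N)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2

expected, chi2 = hardy_weinberg_chi_square((88, 88, 24))
CRITICAL_1DF_05 = 3.841                          # table value, 1 d.f., 0.05 level
print(round(chi2, 3))                            # prints 0.077
print("reject H-W" if chi2 > CRITICAL_1DF_05 else "fail to reject H-W")
```
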

The reverse of Hardy-Weinberg can be used to estimate allelic frequencies in the case of dominant alleles in which the dominant homozygote cannot be distinguished from the heterozygote. This works even for something as severely debilitating as PKU. Because PKU babies are so rare (1/10,000), selection has a negligible effect on genotypic frequencies.

       PP     Pp     pp = PKU phenotype
f(pp) = q^2 = 0.0001 (1/10,000)
        q   = 0.01  so p = 1 - q = .99
homozygous normal = p^2 = (.99)(.99) = .98
heterozygous = 2pq = 2(.99)(.01) = .02
homozygous PKU = q^2 = .0001
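
The same reverse calculation works for any recessive trait, as long as the population can be assumed to be in Hardy-Weinberg equilibrium at that locus. A small sketch (the function name is made up for illustration):

```python
import math

def carrier_frequency(recessive_phenotype_freq):
    """Given f(aa) = q^2, return (q, 2pq), assuming Hardy-Weinberg equilibrium."""
    q = math.sqrt(recessive_phenotype_freq)
    p = 1 - q
    return q, 2 * p * q

q, carriers = carrier_frequency(1 / 10000)    # PKU: 1 affected birth in 10,000
print(round(q, 4), round(carriers, 4))        # prints 0.01 0.0198
```

Note that carriers (about 1 in 50) are roughly 200 times more common than affected individuals (1 in 10,000).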

Non-random mating
Two components to non-random mating
  1. assortative or disassortative mating (sometimes called positive assortative and negative assortative mating, respectively)- mates are chosen on the basis of some phenotypic characteristic such that certain matings occur more commonly than would be predicted by chance
  2. inbreeding or outbreeding - mates are chosen based upon the degree of relatedness. Mates are more closely related or less closely related than would be predicted by chance. For example, marriages between first cousins are examples of inbreeding.

    The effects are similar in that inbreeding and assortative mating increase homozygosity, while outbreeding and disassortative mating decrease homozygosity. However, assortative and disassortative mating will only affect the loci involved in expression of the phenotypic trait and loci that are linked to them. Inbreeding and outbreeding affect all loci equally, across the entire genome.

Inbreeding comes about in two ways.
  1. the systematic choice of relatives as mates
  2. subdivision of the population where individuals have a narrower choice of mates and thus are forced to mate with relatives

An inbred individual is one whose parents are related, meaning that there is a common ancestry in the family tree. The most obvious effect of inbreeding is the expression of hidden recessives. Estimates from a variety of studies indicate that each human carries about four (range 2 - 7) lethal recessive equivalents. Four lethal-equivalents means that four alleles, when homozygous, cause 100% lethality or eight alleles cause 50% lethality.

Only rarely does an outbred individual receive the same recessive lethal from each parent.

Inbreeding often results in spontaneous abortions, fetal death, and congenital deformities (inbreeding depression).

For unrelated parents, 4-6% of the offspring will carry some sort of genetic defect. In marriages between first cousins, 16-28% of the offspring will carry some sort of genetic defect (table 23-3).

There are two types of homozygosity
  1. allozygosity - two alleles are alike but unrelated. They represent two separate mutational events.
  2. autozygosity - two alleles have identity by descent, meaning they are identical copies of the same ancestral allele

The inbreeding coefficient (F) can be defined as the probability of autozygosity, or the probability that any two alleles at a locus are identical by descent. F can range from 0 to 1. An individual with F = 1 would represent the doubling of a gamete and would be autozygous at all loci. Listed below are the calculations of expected genotypic frequencies given F, the level of inbreeding for the population.
                 AA                  Aa                aa
F = 0            p^2                 2pq               q^2
any F        p^2(1 - F) + pF     2pq(1 - F)      q^2(1 - F) + qF
F = 1            p                   0                 q
You can see from the formulas that as F increases, the proportion of heterozygotes goes down. However, the allelic frequencies do not change. These formulas are algebraically the same as the formulas on page 684 (equation 23.10).
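
To see both effects numerically (heterozygotes decline while allelic frequencies stay put), the three formulas can be evaluated for several values of F. This is a sketch only; the function name is illustrative and p = 0.66 is borrowed from the MN example purely as a convenient number.

```python
def inbred_genotype_frequencies(p, F):
    """Expected (AA, Aa, aa) frequencies for allele frequency p and inbreeding F."""
    q = 1 - p
    f_AA = p * p * (1 - F) + p * F
    f_Aa = 2 * p * q * (1 - F)
    f_aa = q * q * (1 - F) + q * F
    return f_AA, f_Aa, f_aa

p = 0.66
for F in (0.0, 0.25, 0.5, 1.0):
    f_AA, f_Aa, f_aa = inbred_genotype_frequencies(p, F)
    allele_A = f_AA + 0.5 * f_Aa          # recompute f(A) from the genotypes
    print(F, round(f_AA, 4), round(f_Aa, 4), round(f_aa, 4), round(allele_A, 4))
```
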

Pedigree analysis
This is used to determine (F), the inbreeding coefficient for an individual and it implies the same effect for the individual as it does for the population.

To construct a path diagram, you must eliminate all individuals that cannot contribute to inbreeding. You must draw all the paths through which an allele can be passed to an individual.
Path diagram rules
1) all possible paths must be counted
2) in any path, an individual can be counted only once
3) every path must have one and only one common ancestor

The inbreeding coefficient for a population can be estimated from the observed and expected genotype frequencies, because heterozygosity (the proportion of heterozygotes) decreases as the inbreeding coefficient increases. The formula is
           Ho - Hf
     F = ------------
             Ho
where Ho is the expected frequency of heterozygotes, which is given by 2*p*q, and Hf is the observed frequency of heterozygotes. Note that as the observed frequency of heterozygotes (Hf) decreases toward zero, the ratio approaches 1.0.
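
A small sketch of this estimate follows. It assumes the denominator of the formula is the expected heterozygosity Ho = 2pq, which is what makes the ratio approach 1.0 as Hf falls to zero. The function name and the example numbers are hypothetical.

```python
def inbreeding_coefficient(p, observed_het):
    """Estimate F = (Ho - Hf) / Ho, where Ho = 2pq is the expected heterozygosity."""
    expected_het = 2 * p * (1 - p)
    return (expected_het - observed_het) / expected_het

# Hypothetical population: p = 0.5, so Ho = 0.5, but only 25% heterozygotes observed
print(inbreeding_coefficient(0.5, 0.25))   # prints 0.5
```
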

Picture a mutation as A ----> a. The forward mutation rate (A to a) is u, while the backward mutation rate (a to A) is v (figure 23.8).
                 u
             A  ----->  a
                <-----
                 v
pn is the frequency of A in generation n
qn is the frequency of a in generation n
qn+1 is the frequency of a in generation n+1 (the next generation)
qn+1 = qn + u*pn - v*qn
where u*pn = forward mutation rate times the frequency of A, and v*qn = backward mutation rate times the frequency of a
delta q = qn+1 - qn
        = u*pn - v*qn

Forward mutation rates are on the order of 1 x 10^-5 and backward mutation rates are usually 10 to 100 times lower.
delta q = (1 x 10^-5)(pn) - (1 x 10^-7)(qn)
The change in allelic frequency from one generation to the next is very small.

Setting delta q equal to zero and solving for qn yields qhat, which is the value of q when the population reaches equilibrium.
u*pn = v*qn             pn = 1 - qn
u - u*qn = v*qn
u = u*qn + v*qn
qn = u/(u+v)
Given no perturbation of the system, p and q will eventually reach an equilibrium that is determined by the two mutational rates.
qhat = u/(u+v)      phat = v/(u+v)
As u gets larger the equilibrium shifts to higher frequencies of a. If u = v, then forward and backward mutation rates are the same, and qhat = phat = 0.5. However, to significantly change allelic frequencies due to mutational pressure, or for a population to reach equilibrium, requires thousands of generations.
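
The slowness of mutation pressure is easy to see by iterating the recurrence. The sketch below uses the rates quoted above (u = 1 x 10^-5, v = 1 x 10^-7) and a hypothetical starting point of q = 0; the function names are illustrative.

```python
def next_q(q, u, v):
    """One generation of mutation: q(n+1) = q(n) + u*p(n) - v*q(n)."""
    return q + u * (1 - q) - v * q

def equilibrium_q(u, v):
    """qhat = u / (u + v), the value of q at which delta q = 0."""
    return u / (u + v)

u, v = 1e-5, 1e-7
q = 0.0                                  # hypothetical start: allele a absent
for generation in range(1000):
    q = next_q(q, u, v)

print(q)                                 # still only roughly 0.01 after 1000 generations
print(equilibrium_q(u, v))               # about 0.99
```
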

Assume two populations, both having alleles A and a at the A locus
p1 = f(A) in population 1      q1 = f(a) in population 1
p2 = f(A) in population 2      q2 = f(a) in population 2

Assume that members of population 2 migrate to population 1, such that migrants now make up a proportion m of the next generation and natives make up the remaining proportion 1-m (figure 23.10). The frequency of a in the combined population, qc, will be a weighted average
qc = m*q2 + (1-m)q1
with m*q2 being the contribution of migrants and (1-m)q1 that of natives.
qc = q1 + m(q2 - q1)
delta q = qc - q1
delta q = m(q2 - q1)
delta q becomes zero, and q stops changing, whenever
a) migration stops (m = 0)
b) or the allelic frequency of both populations becomes the same (q2 = q1).

These equations can be used to estimate gene flow from one population to another, such as from White U.S. populations to Black U.S. populations.
For example, at the Duffy blood group locus, Europeans (the source of the U.S. White population) have either allele Fya or Fyb. In West Africa (the source of U.S. Black populations) the frequency of the Fyo allele is essentially one hundred percent. By measuring the frequency of Fya or Fyb in U.S. Black populations, we can estimate the rate of gene flow from the White population to the Black population.

Allelic frequency of Fya in various populations
Source of Black population (native population)
     Liberia                  .005
     Ghana                    .01  q1
U.S. Black population  (conglomerate population)
     Charleston, S.C.         .02
     rural Georgia            .04
     Detroit                  .13  qc
Source of White population  (migrant population)
     Western Europe           .42  q2

Solve the equation for m (proportion of migrants), which will give the
total extent of migration from the White population to the Black population
          qc - q1
     m =  ---------   
          q2 - q1

            0.13 - 0.01    0.12
     m  =   -----------  = ---- = .29
            0.42 - 0.01    0.41
These data indicate that about 29% of the alleles in the U.S. Black population are the result of marriages and subsequent gene flow with the White population.
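
The same formula can be applied to each city in the table. The sketch below takes q1 = .01 (Ghana) and q2 = .42 (Western Europe) as in the worked example; the function names are illustrative, and using the Ghana value as the native frequency for every city is an assumption carried over from that example.

```python
def migration_proportion(q_native, q_migrant, q_conglomerate):
    """Solve qc = q1 + m(q2 - q1) for m."""
    return (q_conglomerate - q_native) / (q_migrant - q_native)

def after_migration(q_native, q_migrant, m):
    """qc = q1 + m(q2 - q1), the allele frequency after admixture."""
    return q_native + m * (q_migrant - q_native)

q1, q2 = 0.01, 0.42
for city, qc in (("Charleston, S.C.", 0.02), ("rural Georgia", 0.04), ("Detroit", 0.13)):
    print(city, round(migration_proportion(q1, q2, qc), 2))

m = migration_proportion(q1, q2, 0.13)
print(round(m, 2))                       # prints 0.29
```
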

Small population size
The zygotes of every generation are a sample of gametes from the parent generation. Errors in sampling gametes from a small parental population cause the allelic frequencies to fluctuate from generation to generation. This process is called random genetic drift.

As the allele frequency fluctuates from generation to generation, the possibility exists that it might fluctuate to either 0 or 1.0, in which case, all individuals are homozygous and no further changes in allelic frequency can occur unless mutation or migration reintroduces the lost allele.

By using computer simulations based upon sampling theory and random processes, we can track allelic frequency in 100 populations simultaneously for different population sizes. (figure 23.11)
  1. start with 100 populations, with 100 individuals per population, q = 0.5
  2. after a number of generations, record the q for each population
  3. and plot the number of populations at each q
Between N and 2N generations, the curve flattens out and populations are lost to fixation at a constant rate of about 1/(2N) per generation.
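
A simulation of this kind can be sketched as follows. It assumes that each generation is formed by drawing 2N gametes at random from the previous generation (binomial sampling, often called the Wright-Fisher model); that is an assumption about how such figures are produced, not a description of the textbook's program. The function name, the seed, and the 50-generation run length are arbitrary choices.

```python
import random

def drift(pop_size, q0, generations, rng):
    """Track q in one population of pop_size diploid individuals."""
    n_gametes = 2 * pop_size
    q = q0
    for _ in range(generations):
        # Each of the 2N gametes carries allele a with probability q
        count_a = sum(1 for _ in range(n_gametes) if rng.random() < q)
        q = count_a / n_gametes
        if q in (0.0, 1.0):              # allele lost or fixed: no further change
            break
    return q

rng = random.Random(1)                   # fixed seed so the run is repeatable
finals = [drift(100, 0.5, 50, rng) for _ in range(100)]
fixed = sum(1 for q in finals if q in (0.0, 1.0))
print("populations fixed or lost:", fixed)
print("lowest and highest q:", min(finals), max(finals))
```

Plotting a histogram of `finals` for several run lengths would give curves of the kind shown in the figure.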

Founder effects
When a population is initiated by a small, and therefore genetically unrepresentative, sample of the main population, the genetic drift that is observed is called a founder effect. Examples are seen in many groups of organisms, even in humans. Remember the movie "Mutiny on the Bounty"? - the population on Pitcairn Island was formed from a small number of mutineers and Polynesians. This population today has a unique mixture of Caucasian and Polynesian features, some of which are rare in either parent population.

A bottleneck occurs when a population declines to a small number of individuals and then builds back up. This usually causes a loss of genetic diversity as seen for example in American bison.

Natural Selection
Up to now, we have been discussing several factors that affect allelic frequencies, but these factors (migration, mutation, inbreeding, etc.) do not produce individuals that are better adapted to their environment. Natural selection is a relentless process that eliminates the less suitable organisms in an environment. Natural selection, or simply selection, is a process whereby one genotype leaves more offspring than another genotype. Selection is determined by reproductive success, which has two components - fertility and survival. The genotype that leaves the most offspring is given the highest value for reproductive success. This value is called the fitness. The letter w is usually used to signify fitness and can vary from 0 to 1. Fitness is always relative to the other genotypes in the population and can vary from time to time. A variety of factors can decrease the fitness value w to below 1. The sum of these forces is expressed as a selection coefficient, which is usually denoted by the letter S.

w = 1 - S

Components of fitness
Selection can act at any stage of an organism's life cycle.
  1. zygotic selection is the survival component. This can be either prenatal, juvenile, or adult.
  2. gametic selection is the differential success of an individual's gametes. In male mice, an individual that is heterozygous at the t-locus, Tt, will have 95% of its gametes containing the t allele.
  3. sexual selection means that some genotypes may mate more often than others, e.g. large male deer mate more often than small male deer
  4. fecundity selection means that some genotypes may be more fertile than other genotypes

Types of selection
Directional selection works by continuously removing individuals from one end of the phenotypic distribution. e.g. during the Eocene, the oldest member of the horse family appeared in the fossil record - Hyracotherium, about one foot high at the shoulder. Today's horses are much taller, and represent a continuous directional selection for taller horses.

Stabilizing selection works by constantly removing individuals from both ends of a phenotypic distribution, so that the mean is not shifted. This is the more common situation, and occurs as a population becomes optimally adapted to an unchanging environment. For example, directional selection favored an increase in the length of the giraffe's neck. However, the length today appears to be unchanging, thus stabilizing selection is acting to maintain the length of the neck.

Disruptive selection works by removing individuals from the center of the phenotypic distribution while favoring individuals on either end. Disruptive selection is seen in the appearance of different discrete forms or morphs in the same species. An example is the polymorphic butterfly Papilio dardanus. This butterfly mimics several distasteful species of butterflies by its color pattern. The dominance relationships among the genotypes are such that an individual will mimic any number of models, but intermediates do not occur. Intermediates would not resemble any of the models and would be rapidly eaten.

Selection against the homozygous recessive
The derivation follows five steps.
  1. define the initial condition
  2. allow selection to act
  3. calculate the allelic frequency after selection, qn+1
  4. calculate delta q
  5. set delta q equal to 0 and find qhat
                          AA         Aa             aa
initial frequencies       p^2        2pq            q^2
fitness (W)               1          1              1-S
ratio after selection    p^2(W)     2pq(W)         q^2(W)
                         p^2(1)     2pq(1)         q^2(1-S)

    The sum of the ratios after selection is p^2 + 2pq + q^2(1-S)
    p^2 + 2pq + q^2 - Sq^2 = 1 - Sq^2 = mean W
    mean W = the mean fitness of the population
    It represents the sum of the fitnesses of the genotypes multiplied by their frequencies.
Finally, we can calculate the genotypic frequencies after selection. For example, for AA
                       (original frequency)(fitness)    p^2(W)      p^2(1)
 genotypic frequency = ----------------------------- = -------- = ----------
                        mean fitness of population      mean W     1 - Sq^2

                              AA            Aa              aa
genotypic frequencies      p^2/mean W    2pq/mean W     q^2(1-S)/mean W
 after selection

qn+1 = freq(aa) + 1/2 freq(Aa)
     = q^2(1-S)/mean W + 1/2(2pq/mean W)
     = q^2(1-S)/(1-Sq^2) + pq/(1-Sq^2)
     = (q^2 - Sq^2 + pq)/(1 - Sq^2)
     = q(q - Sq + p)/(1 - Sq^2)                    p = 1 - q
     = q(q - Sq + 1 - q)/(1 - Sq^2)
     = q(1 - Sq)/(1 - Sq^2)
We can simplify things a little by assuming that S = 1, that is, that the aa genotype is lethal.
qn+1 = q(1 - q)/(1 - q^2)          1 - q^2 = (1 - q)(1 + q)
     = q/(1 + q)

delta q = qn+1 - q
        = q/(1 + q) - q
        = -q^2/(1 + q)
        = 0 only when q = 0
There is no change, that is, an equilibrium exists, only when q = 0. Thus selection will act to drive a to 0, or A to fixation. q changes in proportion to q^2, the frequency of the recessive homozygote. However, as q becomes small, most individuals carrying a will be heterozygotes and are not selected against. Thus, selection against a recessive homozygote may take many, many generations to completely remove a recessive lethal mutation from the population.
for example,   phenylketonuria      q^2 = .0001 but 2pq = .02
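
How slowly a rare recessive lethal is removed can be illustrated by iterating the recurrence. This sketch uses the general formula qn+1 = q(1 - Sq)/(1 - Sq^2) and starts from the PKU value q = .01 with S = 1; the function names are illustrative, and treating PKU as fully lethal is a simplifying assumption for the example.

```python
def next_q(q, S):
    """One generation of selection against aa: q' = q(1 - Sq) / (1 - Sq^2)."""
    return q * (1 - S * q) / (1 - S * q * q)

def q_after(q0, generations, S=1.0):
    """Frequency of a after the given number of generations of selection."""
    q = q0
    for _ in range(generations):
        q = next_q(q, S)
    return q

print(round(q_after(0.01, 100), 4))      # prints 0.005
```

Even with complete selection against the homozygote, it takes 100 generations just to halve q from .01 to .005.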

Types of selection models
There are only 4 basic selection models based upon how fitness values are assigned. (table 23.7)
                                        AA      Aa      aa
    1) against homozygous recessive     1       1       1-S
    2) against heterozygotes            1       1-S     1
    3) against one allele               1       1-S1    1-S2
    4) against homozygotes              1-S1    1        1-S2
In models 1 and 3, selection is against the a allele. In model 1, we saw that selection may take an infinite number of generations to remove a because a can 'hide' in the heterozygote. In model 3, however, the heterozygote is also selected against and a will be removed much faster because the a allele can no longer hide, for example thalassemia.

Under model 3, a completely dominant lethal would be removed in one generation. Severely deleterious dominants are also removed very quickly, for example retinoblastoma.

Model 2, where selection is against the heterozygote, is interesting because it drives the rarer allele to extinction. An equilibrium does exist, but only if q = p = 0.5. If the equilibrium is disturbed, it moves toward extinction of one allele or the other. This is called an unstable equilibrium. An example is the condition erythroblastosis, or maternal-fetal incompatibility at the Rh locus. Rh+Rh- babies born to Rh-Rh- mothers can develop a condition where the mother's antibodies attack the child's blood.

Model 4, where selection is against homozygotes, demonstrates the heterozygote advantage. This is an important model because it is often invoked to explain the maintenance of allelic polymorphism in a population. We will see that at equilibrium, both alleles are maintained in the population. An example is sickle cell anemia. In West Africa, where malaria is common, an individual is better off as a heterozygote. You can derive these equations exactly as we did in the last series for selection against the homozygous recessive.
delta q = pq(S1p - S2q)/mean W
        = 0 when p = 0, q = 0, or S1p - S2q = 0
p = 0 and q = 0 are trivial and the case of interest is when
S1p - S2q = 0
S1p = S2q               p = 1 - q
S1(1 - q) = S2q
S1 - S1q = S2q
S1 = (S2 + S1)q
qhat = S1/(S1 + S2)       phat = S2/(S1 + S2)
This equilibrium is stable. After any perturbation the population returns quickly to equilibrium, and regardless of the starting condition (as long as both alleles are present), the allelic frequencies converge to qhat and phat.
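
The difference between a stable and an unstable equilibrium can be seen by iterating one general selection step with different fitness values. In the sketch below the fitnesses are hypothetical (S1 = 0.2 and S2 = 0.8 for model 4, S = 0.2 for model 2), and the function names are made up for illustration.

```python
def next_q(q, w_AA, w_Aa, w_aa):
    """One generation of selection with the given genotype fitnesses."""
    p = 1.0 - q
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (q * q * w_aa + p * q * w_Aa) / mean_w

def iterate(q, fitnesses, generations=200):
    for _ in range(generations):
        q = next_q(q, *fitnesses)
    return q

# Model 4 (heterozygote advantage): qhat = S1/(S1 + S2) = 0.2/1.0 = 0.2
advantage = (0.8, 1.0, 0.2)
print(iterate(0.90, advantage), iterate(0.01, advantage))   # both approach 0.2

# Model 2 (selection against heterozygotes): equilibrium at 0.5 is unstable
disadvantage = (1.0, 0.8, 1.0)
print(iterate(0.49, disadvantage), iterate(0.51, disadvantage))  # toward 0 and 1
```

Starting on either side of qhat, model 4 returns to 0.2, while in model 2 a small displacement from 0.5 carries the rarer allele to extinction.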

The application of population genetics to natural populations in an attempt to study evolution has been called neo-Darwinism or the new synthesis. This dates back to the 1920s and 1930s. As you can see from the models we have been working with, a natural population would be exceedingly complex, especially when you consider the effects of having a number of processes working simultaneously, having more than two alleles at a locus, hundreds to thousands of loci affecting the fitness of the individual, and the unpredictable environmental variations.

The process of evolution as outlined by Darwin has three major steps.
  1. Variation is characteristic of virtually every group of animals and plants. This arises from mutations. This variation is important because it is the raw material that selection is going to work on.
  2. Every group of organisms overproduces offspring. In stable populations, every adult replaces itself, but most adults produce more than one offspring. Thus, most offspring die before they reproduce. There is an overabundance of offspring.
  3. The most fit will survive. Among all the organisms competing for a limited number of resources, only the organisms best suited to obtain and utilize these resources will survive. To whatever degree the characteristics of the most fit are inherited, the "favored" traits will be passed onto the next generation.

Last update on 13 November 2005
Provide comments to Dwight Moore at
Return to the General Genetics Home Page at Emporia State University.