Note to the Reader
This is a revised edition of a review published in Mouse Brain Development
(Springer, ISBN 3540666648, $169 from Amazon.com). Text additions and
modifications are in brackets. [...]. Williams RW (2000) Mapping genes that
modulate brain development: a quantitative genetic approach. In: Mouse brain
development (Goffinet AF, Rakic P, eds). Springer Verlag, New York, pp
21–49.
Mapping Genes that Modulate Mouse Brain Development: A Quantitative
Genetic Approach
Robert W. Williams
Center for Neuroscience and Department of Anatomy and Neurobiology,
University of Tennessee, 855 Monroe Avenue, Memphis, Tennessee 38163 USA
Email questions and comments to
rwilliam@nb.utmem.edu
Contents
1. Why brain weight and neuron number matter
1.1 Metabolic constraints
1.2 Functional correlates
1.3 Insights into CNS development
2. The biometric analysis of the size and
structure of the mouse CNS
2.1 A new opportunity
2.2 Brain weight is highly variable
2.3 Sex and age effects on brain weight
2.4 Large differences between substrains
2.5 Consistency and inconsistency across
studies
3. Mapping brain weight QTLs
3.1 QTLs versus Mendelian loci
3.2 Assessing trait variation
3.3 Estimating heritability
3.4 Phenotyping and genotyping members of
an experimental cross
3.4.1 Phenotyping and regression
analysis
3.4.2 Genotyping
3.5 The statistics of mapping QTLs
3.5.1 Permutation analysis
3.6 Cloning QTLs
3.7 The probability of success
4. Neuron and glial cell numbers in adult mice
4.1 The mouse brain library
4.2 Numbers of neurons and glial cells in the
brain of a mouse
5. Mapping QTLs that modulate neuron number
5.1 Mapping cell-specific QTLs
5.2 The Nnc1 locus
5.3 Mechanisms of QTL action
5.4 Candidate gene analysis
6. Conclusion
In my opinion there are only quantitative differences, not
qualitative differences, between the brain of a man and that of a mouse.
Ramón y Cajal (1890)
The difference in behavioral capacity between man and chimpanzee may
be no more than the addition of one cell generation in the segmentation of
the neuroblasts which form the cerebral network. Lashley (1949)
Introduction
The complexity of CNS development is staggering. In mice a total of
approximately 75 million neurons and 25 million glial cells are generated,
moved, connected, and integrated into hundreds of different circuits over
a period of one month. The process is coordinated by the expression of a
large fraction of the genome—as many as 40,000 genes may be involved
(Sutcliffe 1988; Adams et al. 1993). These same genes coordinate the
development of the human brain, but a thousand times more neurons are
generated (Williams and Herrup 1988) and their integration and training
take more than a decade. While 5,000 of these genes have common roles in
cellular metabolism, this still leaves a huge complement that have
selective, transient, and partially redundant roles in the development of
different parts of the brain (Usui et al. 1994; Gautvik et al. 1996).
Reductionist approaches that focus on isolated processes and molecules may
seem hopelessly inadequate, but they are an absolute necessity at this
early stage of analysis and understanding.
This chapter introduces a comparatively new reductionist approach
called complex trait analysis that my research group is using to explore
the genetic basis of CNS development. Complex trait analysis is a field
that developed rapidly in the 1990s as a result of the hybridization of
quantitative and molecular genetics. The suite of techniques associated
with complex trait analysis greatly extends the variety of CNS phenotypes
that can be subjected to systematic molecular analysis. It is in essence a
forward genetic approach that proceeds from phenotypic variation to single
genes. This approach has been embraced by behavioral geneticists and
neuropharmacologists (Plomin et al. 1991; Johnson et al. 1992; Takahashi
et al. 1994; Crabbe et al. 1994; Kanes et al. 1996), and these techniques
can now be applied with equal vigor to explore genetic sources of
variation in brain structure and development. This chapter begins with a
genetic analysis of sources of variation in brain weight and illustrates
how we have mapped quantitative trait loci (QTLs) that control brain
weight and neuron number in mice.
Why brain weight and neuron number matter
Metabolic constraints. There are several
reasons why differences in brain size and neuron number are interesting
and biologically significant. First, relative to its size, the brain with
its large population of neurons consumes a disproportionate amount of
energy (Clark 1994). The high cost of making, training, and maintaining
this metabolically demanding organ has wide-ranging effects on an animal's
development and behavior (Sacher and Staffeldt 1974; Eisenberg and Wilson
1978; Martin, 1981; Armstrong 1983; Hofman, 1983; Pagel and Harvey 1990;
Allman et al. 1993). Humans are an extreme example, with a brain that is
10 times heavier than expected on the basis of body weight. We afford this
luxury by developing slowly and by having an efficient diet (Aiello and
Wheeler 1995). Given the fact that we are such a large-brained species, it
may be a surprise to learn that mice have brains that are proportionally
just as large as those of humans. A 22-g adult mouse typically has a
450-mg brain, whereas a 66-kg human typically has a 1350-g brain, about 2%
of body weight in both cases.
Functional correlates. A second and almost
self-evident reason to be interested in brain weight and neuron number is
that variation in these simple parameters is associated with variation in
behavior (Lashley 1949; Rensch 1956; Wimer and Prater 1966; Fuller and
Herman 1972; Roderick et al. 1979; Fuller, 1979; Crusio et al. 1989;
Jacobs et al. 1990; Belknap et al. 1992; Aboitiz 1996; Keverne et al.
1996). This is most clear-cut when specific regions of the brains of
different species or individuals are compared. For example, in song birds
the volume of song system nuclei and numbers of neurons tend to be
positively correlated with different features of song production (e.g.,
DeVoogd et al. 1993; Ward et al. 1998). Another fine example—although
strongly negative in this case—is the correlation in mice between
avoidance learning and the size of the infrapyramidal projection from
dentate gyrus to CA3 (Schwegler and Lipp 1983; Lipp et al. 1989).
Insights into CNS development. My colleagues
and I are interested in brain weight and neuron number for a third reason:
as a means to map, clone, and characterize genes that control the
proliferation, differentiation, and death of cells in the CNS (Williams
and Herrup 1988; Williams et al. 1998a). These genes are entry points into
molecular networks that
control brain development. Differences in brain weight are proportional to
total brain DNA content and consequently to total CNS cell numbers (Zamenhof
and von Marthens 1976). This is true even in neonatal mice, before
appreciable glial cell production (Zamenhof et al. 1971; Zamenhof and von
Marthens 1976). For this reason, brain weight is a surprisingly good
surrogate measure for total cell number in mice, as in humans (Pakkenberg
and Gundersen 1996).
The initial tactical and technical problem is how to go about
identifying genes that modulate cell proliferation and death either in
specific nuclei or in the brain as a whole. Mutants may be useful in some
instances, but we need more generic methods that can target any and all
CNS regions and cell types. And instead of depending on rare mutations and
knockouts, we need methods that provide information about common gene
variants—the normal gene polymorphisms that are responsible for the far
more pervasive and important natural variation found within typical
populations of animals.
Natural variation can be impressive. Numbers of neurons in the human
neocortex vary from 15 to 32 billion (Pakkenberg and Gundersen
1997). The volume of human primary visual cortex varies threefold (Stensaas
et al. 1974; Gilissen and Zilles 1996). Numbers of ocular dominance
columns within the primary visual cortex of rhesus monkeys vary more than
50% (Horton and Hocking 1996). These robust differences are not caused by
mutations but are caused by the cumulative action of many normally
variable genes and by the action of numerous developmental and
environmental factors. In the long run, normal genetic polymorphisms are
the most critical source of variance: they are the substrate for
evolutionary and developmental modification of brain size and cellular
architecture (Williams and Herrup 1988; Lipp 1989; Williams et al. 1993).
The biometric analysis of the size and
structure of the mouse CNS
Precedents. In the late 1960s, Thomas Roderick, John Fuller,
Douglas Wahlsten, and Richard and Cynthia Wimer began an ambitious program
to manipulate neuroanatomical traits in mice by selective breeding
(Roderick 1976). Their aim was to explore correlated changes in behavior.
They gave the rapidly expanding field of behavioral neurogenetics a
rigorous foundation in quantitative and statistical neuroanatomy (Wimer et
al. 1969; Fuller and Geils 1972; Wahlsten 1975; Roderick et al. 1976;
Fuller 1979; Wimer 1979; Wimer and Wimer 1985). Rather than relying on
mutants, they exploited the substantial variation among standard inbred
strains of mice. This work led to some important breakthroughs and some
brick walls. One of the breakthroughs was successfully selecting for
substantial differences in brain weight over less than 20 generations
(Fuller 1979). An obvious limitation, highlighted by Roderick (1976), was
that it was not possible to map gene loci responsible for the remarkable
quantitative variation in CNS size, regional architecture, or behavior.
A new opportunity. The situation has
changed radically in the past decade (Lander and Botstein 1989; Plomin et
al. 1991; Johnson et al. 1992; Belknap et al. 1992; Tanksley 1993; Frankel
1995; Crawley et al., 1997). Computational methods and molecular
reagents—particularly the polymerase chain reaction method—have become so
powerful and economical that it is now practical to systematically dissect
complex polygenic traits such as brain weight into sets of single
well-defined QTLs. Virtually any heritable trait in mice, whether
structural, physiological, pharmacological, or behavioral, can be targeted
for analysis. Recent examples in mice include epilepsy (Rise et al.,
1991), effects of ethanol and haloperidol (Plomin et al. 1993; Belknap et
al. 1993; Hitzemann et al. 1994; Kanes et al. 1996; Buck et al. 1997),
patterns of sleep and activity (Toth and Williams, 1998), and the mouse
equivalent of anxiety (Flint et al. 1995). As illustrated in the work of
Belknap and colleagues (1992), it is now feasible to continue the
systematic genetic dissection of the mouse CNS begun in the late 1960s and
to start identifying genes that underlie heritable variation in CNS size
and structure.
Brain weight is a classic polygenic trait, one that is
influenced during development by the activity of hundreds, if not
thousands, of genes. Brain weight is also affected by maternal factors and
myriad environmental factors (e.g., Collins 1970; Eleftheriou et
al. 1975; Wahlsten 1983; Katz and Davies 1983). Finally, many factors that
target body size have important pleiotropic or correlated effects on brain
size, making the selectivity of action a critical problem (Lande 1979).
From the point of view of genetic complexity, it is hard to imagine a
morphometric trait that would be more difficult to resolve into individual
QTLs.
We began this biometric analysis by weighing brains of numerous
different types of mice. Table 1 is taken from a database that has been
assembled over a five-year period with contributions from Drs. Dan
Goldowitz, Richelle Strom, and Guomin Zhou. For the great majority of
animals, we have information on sex, body weight, age, and type and
quality of fixation. For animals born at the University of Tennessee, we
also generally know the size of the litter and the mother's parity. Most
cases that we have studied were fixed by perfusion with mixed aldehydes
(Williams et al. 1996a). This leads to a reduction in brain weight of
3–4%, for which these data have been corrected. Weights include the
olfactory bulbs, the paraflocculi, and the entire brainstem, but exclude
the dura, the pineal, and the pituitary.
Brain weight is highly variable. Brain weight
is highly variable among strains reared in a common environment. For
example, both A/J and DBA/2J have average brain weights close to 410 mg,
whereas C57BL/6J and BALB/cJ have weights close to 510 mg. The variation
within each strain is considerable even after compensating for differences
in age, body weight, and sex by multiple regression (Williams et al.
1997). Two animals of the same sex and body weight taken from the same
litter often have brain weights that differ by 10–20 mg. The coefficient
of variation within isogenic groups shown in Table 1 averages about 5.5%,
but when technical errors associated with fixation and dissection are
taken into account, true non-genetic variation is close to 4%. In
comparison, the retinal ganglion cell population of isogenic mice has a
coefficient of variation that averages 3.6% (Williams et al.
1996a).
We have explored the possibility that some of these differences in brain
weight are due to variation in water content and the volume of the
ventricles, and the short answer is that neither factor is important in
mice older than 30 days. Wet and dry brain weights are very tightly
correlated.
Sex and age effects on brain weight. Both sexes
and a wide range of ages were studied. Surprisingly, in mice sex has no
detectable effect on adult brain weight (Williams et al. 1997), and this
otherwise important variable can be neglected for most purposes. In some
strains, there is a significant age-related increase in brain weight even
after sexual maturity is reached. There is also a significant correlation
between body weight and brain weight. The correlation across strains
listed in Table 1 is merely 0.2, but in some crosses, such as that between
CAST/Ei and BALB/cJ, the correlation can rise to 0.8. Information on over
5,000 mice and over 200 genotypes is available online at
<http://www.nervenet.org>.
[There are statistically significant mean sex differences in the size
of several CNS regions, including the hippocampus (Lu et al., 2000), and
the olfactory bulbs (Williams et al., 2000). These differences are
relatively modest and certainly should not be thought of as sexual
dimorphisms. The overlap in size between the sexes is very substantial,
and sex accounts for only a few percentage points of the total variance in
either hippocampus or olfactory bulb size. (RW, June 2000)]
Large differences between substrains. Perhaps
the most remarkable aspect of the data summarized in Table 1 is the large
differences in brain weight between several substrains of mice. For
instance, brain weights of BALB/cByJ and BALB/cJ differ by 76 mg; C57L/J
and C57BL/6J differ by 88 mg; C3H/HeJ and C3H/HeSnJ also differ by 88 mg.
The closely matched and highly significant differences in these three
pairs are intriguing. These differences were presumably generated by the
recent fixation of variant alleles in a very small number of
genes—probably one or two.
Table 1. Brain weights of 28 common
inbred strains of laboratory mice with a comparison to two previous
studies.
Additional data on brain and body weights are available for over 230
genotypes of mice.
Inbred Strains | Brain SA a | SD | CV% | Litters | RWWS 1973 b | FW 1966 c
129/J          | 423        | 15 | 3.1 | 4       | 454         | 444
129/SvJ        | 430        | 17 | 3.9 | 4       |             |
A/J            | 408        | 21 | 5.0 | 11      | 455         | 437
AKR/J          | 464        | 29 | 4.9 | 5       | 530         |
BALB/cByJ      | 448        | 26 | 5.1 | 6       |             |
BALB/cJ        | 524        | 28 | 5.2 | 12      | 540         | 502
C3H/HeJ        | 416        | 21 | 4.8 | 2       |             |
C3H/HeSnJ      | 504        | 20 | 4.2 | 6       |             |
C57BL/6J       | 499        | 21 | 4.4 | 23      | 489         | 449
C57BL/10J      | 459        | 20 | 4.2 | 3       |             |
C57BLKS/J      | 463        | 19 | 4.0 | 8       |             |
C57L/J         | 411        | 18 | 4.1 | 2       | 448         |
C58/J          | 429        | 19 | 4.2 | 2       | 451         |
CBA/J          | 462        | 7  | 3.0 | 1       | 508         |
CBA/CaJ        | 437        | 21 | 4.8 | 3       |             |
CE/J           | 472        | 23 | 5.0 | 7       | 476         |
DBA/1J         | 403        | 23 | 5.9 | 4       |             | 409
DBA/2J         | 417        | 27 | 6.4 | 10      | 432         | 413
FVB/NJ         | 481        | 11 | 2.2 | 5       |             |
LG/J           | 488        | 25 | 5.2 | 4       | 552         |
LP/J           | 397        | 29 | 7.1 | 5       | 466         |
NOD/LtJ        | 524        | 47 | 8.4 | 3       |             |
NZB/BinJ       | 515        | 40 | 7.7 | 5       |             |
NZW/LacJ       | 479        | 38 | 7.9 | 3       |             |
PL/J           | 452        | 27 | 5.9 | 3       | 516         |
SJL/J          | 419        | 26 | 6.3 | 7       | 450         | 413
SM/J           | 469        | 24 | 4.6 | 11      | 496         | 436
SWR/J          | 396        | 15 | 3.7 | 2       | 469         |
Averages d     | 453        | 23 | 5.0 | 4.6     | 483/446 e   | 438/446 e
a Brain weights are corrected for differences in sex and age. All values
normalized to those of 75-day-old females without fixation. SD is the
standard deviation computed using individual values; CV is the
corresponding coefficient of variation expressed as a percentage
(SD × 100/mean).
b Roderick et al. (1973)
c Fuller and Wimer (1966)
d Litter average is geometric mean.
e Paired averages (483/446): first value from original study; second value
is average for the same set from our current database. Note the fair
agreement with Fuller and Wimer (1966, r = 0.83). Values from Roderick et
al. (1973) are consistently higher and the correlation is somewhat lower
(r = 0.78). This difference may be due to their use of retired breeders
killed by CO2 asphyxiation.
Mapping brain weight QTLs
QTLs versus Mendelian loci. QTLs are conventional genes that
have two or more alleles that contribute to quantitative variation of
specific traits (Roff 1997; Lynch and Walsh 1998). A trait may be a
concentration or number, a size, weight or density, an activity or
behavior, a severity index or an age-of-onset. QTLs are often contrasted
with Mendelian loci that have discontinuous effects on phenotypes and
predictable segregation patterns. In contrast, individual QTLs usually
have more modest effects on a particular phenotype and are associated with
phenotypes in a probabilistic way. A QTL might account for as little as 2%
or as much as 50% of the total phenotypic variance. QTLs come in sets that
collectively define a polygene. For example, at least three QTLs are
currently known to control part of the twofold variation in numbers of
retinal ganglion cells (Williams et al. 1998a; Strom 1999), and at least
30 QTLs appear to modulate body size (Cheverud et al. 1996; Brockmann et
al. 1998). In the next several pages I explain the process of mapping a
QTL—in this case, one of the first QTLs demonstrated to modulate brain
weight in the mouse. There are four key steps in mapping QTLs.
Figure 1. Variation in brain weight between two inbred strains
and their test cross progeny. The two parental strains, BALB/cJ and
CAST/Ei, are shown to the far left. Each dot represents the brain weight
of an individual mouse; the short horizontal lines through each box
indicate group averages; the vertical bars within each box indicate
standard deviations; and the horizontal line at 454 mg is the
mid-parental value (average of BALB/cJ and CAST/Ei). Box heights are
generally ±2 SD. F1 animals were crossed back to both parental strains,
giving rise to the two sets of B1 progeny shown to the right. The
equation at the bottom of the figure is the Wright-Castle equation
(Wright 1978) for estimating the minimum number of effective factors
(single or linked QTLs) that contribute to the genetic variance of a
trait. Delta P is the difference between parental strain means.
V_F2 and V_iso are the variances of the F2 and isogenic strains,
respectively. For these data, we estimate that at least eight
polymorphic genes account for the increased variance of the F2 relative
to that of the isogenic groups. Data are not corrected for variation in
age, sex, or body weight.
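The Wright-Castle estimate described in the caption of Figure 1 can be sketched in a few lines of Python. The equation itself appears only in the figure, so the function below assumes the usual form of the estimator, n = (Delta P)^2 / [8 (V_F2 - V_iso)]. The function name and the input values are hypothetical and chosen for illustration; they are not the data plotted in Figure 1.

```python
def castle_wright(delta_p, var_f2, var_iso):
    """Minimum number of effective factors, assuming the standard
    Wright-Castle form: delta_p**2 / (8 * (var_f2 - var_iso))."""
    segregation_variance = var_f2 - var_iso
    if segregation_variance <= 0:
        raise ValueError("F2 variance must exceed the isogenic variance")
    return delta_p ** 2 / (8.0 * segregation_variance)


# Hypothetical inputs (mg and mg^2), not values taken from Figure 1.
delta_p = 140.0   # difference between parental strain means
var_f2 = 900.0    # variance among F2 progeny
var_iso = 600.0   # average variance within isogenic groups

n_factors = castle_wright(delta_p, var_f2, var_iso)
print(f"estimated minimum number of effective factors: {n_factors:.1f}")
```

The estimate is a lower bound: linked QTLs count as a single factor, and unequal effect sizes reduce the estimate further.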
Step 1: Assessing trait variation. The
first step is to identify significant variation in phenotypes among
individuals, or, in the case of laboratory mice, among inbred strains.
Variation is an absolute necessity. It is the signal we are trying to
pinpoint on a map of the genome. The greater the heritable variation, the
better the prospects of success.
Figure 1 illustrates the wide variation in brain weight between two
inbred strains (BALB/cJ and CAST/Ei) and among their intercross and
backcross progeny. This is a cross that I will use throughout this section
as a specific example of mapping a brain weight QTL. Note that brain
weight in the F1 generation overlaps that of the BALB/cJ parental strain.
Brain weight may be inherited as a dominant trait, but since all of these
F1 progeny were born to BALB/cJ mothers, maternal non-genetic factors are
also likely to be an important factor. The spread of points among F2
individuals is somewhat greater than that of either parental strain. This
increase in variance is due to the segregation and assortment of QTLs that
affect brain weight. No fewer than seven QTLs are needed to account for
the differences seen among members of this cross (Wright 1978), but using
our small sample of F2 animals (n = 98), we have only succeeded in
mapping one of these QTLs.
Step 2: Estimating heritability. The
second step in QTL mapping is to verify that a substantial fraction of the
variability of the trait is heritable (Curcio 1992; Wahlsten 1992;
Williams et al. 1996a). In a standard mouse colony, variation in brain
weight has a heritability that ranges from 0.35 to 0.7 (Roderick et al.,
1973; Roderick et al., 1976; Seyfried and Daniel 1977; Fuller 1979;
Henderson 1979; Atchley et al. 1984; Williams et al. 1996b; Strom and
Williams 1997; Strom 1999). Heritability estimates can admittedly be
problematic (Lewontin 1957; Eleftheriou et al. 1975), and in the context
of the heritability of human intelligence, Wahlsten (1994) comments that
"I would feel more secure riding a three legged moose over thin ice than
relying on a heritability coefficient to help me understand the origins of
individual differences or predict future levels of intelligence." But it
can still be useful to go through the process of computing heritability.
The reason is that we need to have some idea of the approximate fraction
of variance in our sample population that is due to heritable genetic
factors before we attempt to map QTLs. The heritable variance is what we
are trying to assign to a set of QTLs. While heritability estimates may be
labile, the QTLs that we map are anchored in the genome itself.
Figure 2. The correlation between brain weights of parents and
their offspring estimates heritability. Animals are from a
multigenerational cross between C57BL/6J and DBA/2J inbred strains (G.
Zhou and R. W. Williams, in progress). Parental values are the average
unfixed weights of mothers and fathers without correction for variation in
age or body weight. Offspring data are average brain weight per litter.
Brain weights are also presented without correction for variation in body
weight, sex, or age. Offspring weights tend to be slightly less than those
of the parents because offspring are on average about 50 days younger.
The correlation between pairs of values is 0.38 and is a direct estimate
of the narrow-sense heritability of brain weight in this cross and
environment. Correlations between mothers and offspring and fathers and
offspring do not differ significantly. Thus, this estimate of heritability
is not inflated by maternal effects.
Heritability is the fraction of the total variance in a trait that is
generated by the segregation and assortment of allelic variants at the
many gene loci that influence a trait. (New mutations contribute very
little to heritability under all but extreme environmental conditions.) A
simple way to measure heritability is to compare traits between parents
and their offspring. Figure 2 compares the average brain weight of parents
to that of their first litters. The correlation between values is a direct
estimate of heritability—in this case what is called the narrow-sense, or
additive, heritability (Lynch and Walsh 1998). The correlation for this
particular dataset is 0.38. Broad-sense heritability, which includes
variance due to dominance effects and non-linear interactions between
different genes, is likely to be as high as 0.5. In comparison to these
estimates, variation in neuron number has a broad-sense heritability of
approximately 0.8 for granule cells in the dentate gyrus (Wimer and Wimer
1989) and between 0.7 and 0.9 for retinal ganglion cells (Williams et al.
1996a, 1998a; Strom 1999). These values are certainly sufficiently high to
motivate a QTL analysis.
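The parent-offspring comparison described above can be sketched as follows. The brain weights are hypothetical numbers invented for illustration and are not the data of Figure 2. The sketch computes the correlation used in the text and, for comparison, the least-squares slope of offspring on mid-parent values, which is the estimator usually given in quantitative genetics texts.

```python
import numpy as np

# Hypothetical brain weights in mg, invented for illustration only.
midparent = np.array([455.0, 470.0, 440.0, 485.0, 462.0, 448.0, 476.0, 458.0])
litter_mean = np.array([452.0, 459.0, 449.0, 466.0, 450.0, 455.0, 461.0, 447.0])

# Pearson correlation between parental averages and litter averages.
r = np.corrcoef(midparent, litter_mean)[0, 1]

# Least-squares slope of offspring on mid-parent values.
slope, intercept = np.polyfit(midparent, litter_mean, 1)

print(f"correlation = {r:.2f}, regression slope = {slope:.2f}")
```

The two numbers coincide only when parents and offspring have equal variances, so it is worth reporting which one is being used.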
Step 3: Phenotyping and genotyping members of
an experimental cross. The third step is to gather phenotype and
genotype data from a set of animals appropriate for QTL mapping. Several
different types of crosses can be used to map QTLs (Taylor 1978; Groot et
al. 1992; Frankel 1995; Darvasi 1998; Vadasz et al. 1998; Williams
1998b). Figure 1 already introduced one of the most common: the F2
intercross. The central idea behind the intercross is to allow high and
low alleles of QTLs inherited from the two inbred strains to segregate and
assort independently from unlinked marker loci. The only marker loci that
will consistently be associated with high, intermediate, and low trait
values in the set of F2 progeny are those marker loci that are closely
linked to QTLs (Tanksley 1993; Williams
1998b).
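The logic of the intercross can be illustrated with a small simulation. This is a minimal sketch rather than actual mapping software: it assumes a single additive QTL, and the sample size, recombination fraction, effect size, and noise level are arbitrary values chosen for illustration. All function and variable names are hypothetical.

```python
import numpy as np


def simulate_f2(n, theta, effect, noise_sd, rng):
    """Simulate n F2 animals derived from two inbred strains.

    Returns additive genotype codes (-1, 0, +1) at a marker linked to the
    QTL (recombination fraction theta), at an unlinked marker, and the
    trait values.
    """
    def gametes():
        # Each F1 parent carries one high-strain (1) and one low-strain (0)
        # haplotype; a gamete receives one of them at random.
        marker = rng.integers(0, 2, size=n)
        recombinant = rng.random(n) < theta
        qtl = np.where(recombinant, 1 - marker, marker)
        unlinked = rng.integers(0, 2, size=n)  # segregates independently
        return marker, qtl, unlinked

    m1, q1, u1 = gametes()
    m2, q2, u2 = gametes()
    linked = m1 + m2 - 1
    unlinked = u1 + u2 - 1
    qtl = q1 + q2 - 1
    trait = effect * qtl + rng.normal(0.0, noise_sd, size=n)
    return linked, unlinked, trait


rng = np.random.default_rng(0)
linked, unlinked, trait = simulate_f2(
    n=2000, theta=0.05, effect=10.0, noise_sd=10.0, rng=rng
)
r_linked = np.corrcoef(linked, trait)[0, 1]
r_unlinked = np.corrcoef(unlinked, trait)[0, 1]
print(f"linked marker r = {r_linked:.2f}, unlinked marker r = {r_unlinked:.2f}")
```

With these settings the linked marker should track the trait while the unlinked marker should show a correlation near zero, which is the contrast that QTL mapping exploits.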
Phenotyping and regression analysis.
Weighing brains is quick and easy, but before we can use these
weights to map QTLs we need to deal with the issue of specificity of gene
action. The brain weight data we have considered so far have not been
corrected for significant differences in the mean body weight among mice.
The heritability that we blithely assigned to brain weight may actually be
a consequence of heritable variation in body size. Unless we adjust our
brain weight phenotype appropriately, we risk mapping body weight QTLs
(Hahn and Haber 1978; Lande 1979). To ensure that we are mapping what we
want to map, we need to factor out variation in brain weight that is
predictable from variation in body weight, sex, age, and other variables
for which we have data.
A crude way of factoring out body size is to use the ratio of brain
to body weight as a phenotype, but a computationally and conceptually far
more powerful approach is to use multiple regression analysis to remove
predictable variance associated with body size and any other important
variables (Williams et al. 1997). The same logic applies when the aim is
to map QTLs that modulate the size of particular CNS cell populations
(Williams et al. 1998a,b); we do not want to map generic brain weight QTLs
inadvertently. In this case, we therefore need to use multiple regression
to remove variance in cell number that is actually associated with total
brain weight. Whatever types of QTLs we are trying to map, we need to
carefully consider the higher-order structures and make sure that we have
taken variation in these structures into account.
Figure 3. Regression analysis of body and brain weight. Regression
analysis is used to minimize the effects of variance in brain weight due
to differences in body weight. Crosses mark males, open circles
mark females. Rather than using each animal's actual brain weight as a
phenotype, we compute a residual brain weight based upon body weight and
sex. Examples of positive and negative residuals are marked on the graph.
In this dataset the correlation is 0.81 and r² (the coefficient of
determination) is 0.66. b is the coefficient (slope) of the
regression equation.
Figure 3 provides a graphic explanation of a simple regression analysis
run on the set of F2 intercross animals previously illustrated in Figure
1. For every 1-g increase in body weight there is approximately a 7.9 mg
increase in brain weight. Sixty-six percent of the variance in brain
weight can be predicted by body weight alone. Sex in this case is also a
significant predictor, and at a given body weight, females have brain
weights that are on average 9.4 mg heavier than those of males. However,
in this particular sample, neither age nor the logarithm of age was a
useful predictor (P ~ 0.6). Table 2 is a statistical synopsis of
a multiple regression analysis that takes both body weight and sex into
account. When we use the regression equation and coefficients in Table 2
to compensate for differences in body weight and sex we absorb 67.4% of
the variance in brain weight. The residual 32.6% of the variance is
generated by technical error, other non-controlled environmental effects,
and by the QTLs that we are trying to locate on the map of the mouse
genome. Rather than using the original brain weight data to map, we use
the residual deviations illustrated in Figure 3. For each animal we
compute a derived phenotype that is the difference in milligrams between
the animal's actual brain weight and the brain weight predicted from its
body weight and sex. By mapping residuals we improve our ability to detect
QTLs that are likely to have selective effects on CNS development.
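The residual phenotype described above can be computed with an ordinary least-squares fit. The sketch below uses NumPy and a handful of hypothetical animals; the body weights, sexes, and brain weights are invented for illustration and will not reproduce the coefficients in Table 2.

```python
import numpy as np

# Hypothetical F2 animals: body weight (g), sex (1 = female, 0 = male),
# and brain weight (mg). Values are invented for illustration.
body = np.array([18.0, 21.5, 16.2, 24.0, 19.8, 22.3, 17.1, 20.4])
sex = np.array([1, 0, 1, 0, 1, 0, 1, 0])
brain = np.array([452.0, 471.0, 430.0, 489.0, 468.0, 470.0, 441.0, 455.0])

# Design matrix: intercept, body weight, and sex.
X = np.column_stack([np.ones_like(body), body, sex])
coef, *_ = np.linalg.lstsq(X, brain, rcond=None)

predicted = X @ coef
residual = brain - predicted  # positive = heavier than predicted

# Fraction of brain weight variance absorbed by body weight and sex.
r_squared = 1.0 - residual.var() / brain.var()

print("coefficients (intercept, body, sex):", np.round(coef, 2))
print("residuals (mg):", np.round(residual, 1))
print(f"r squared = {r_squared:.3f}")
```

The `residual` array, not the raw brain weight, is the phenotype that would be carried forward into the mapping step.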
Table 2. Regression analysis of
brain weight in an F2 intercross
Variable    | Coef | SE   | P
Body (g)    | 8.5  | 0.64 | < 0.0001
Sex (1 = F) | 9.4  | 4.7  | 0.047
r² = 67.4%
Genotyping. In a typical analysis of F2
progeny, three to five marker loci spaced about 15 to 25 centimorgans (cM)
apart are genotyped on each of the 20 chromosome pairs. These marker loci
are usually repetitive microsatellite DNA sequences that consist of
variable numbers of cytosine-adenine (CA) dinucleotide repeats. One strain
of mouse may have a microsatellite with 30 CA repeats, whereas another
strain may have a microsatellite with 40 CA repeats. The 5' and 3'
flanking sequences of each microsatellite are unique to that part of the
genome, but they are also highly conserved among strains of mice. This
makes it possible to design PCR primers that selectively amplify a
polymorphic microsatellite located at a precisely defined chromosomal
position (Dietrich et al. 1994).
To map QTLs responsible for a part of the variation illustrated among
the F2 progeny, genomic DNA from each animal is extracted and genotyped
using the polymerase chain reaction. There are three possible genotypes at
each polymorphic microsatellite locus: BB, BC, and CC.
Approximately 110 microsatellite loci that effectively sample the entire
genome of each animal were genotyped. Table 3 illustrates the
organization of phenotype and genotype data for 96 animals as entered into
a spreadsheet. The first two columns are case identifiers. The third and
fourth columns list phenotypes in milligrams, while the fifth column
lists genotypes for each animal at a particular microsatellite locus on
chromosome (Chr) 6 called D6Mit327. The three genotypes are listed
as C (corresponding to CC), H (the heterozygote BC),
and B (corresponding to BB). As shown in the sixth column,
these three genotypes can be converted into values of -1, 0, and +1. The
seventh and eighth columns are values assigned to each genotype assuming
either that the B allele or the C allele is dominant. For
example, if the C allele is dominant then all of the heterozygous
animals are assigned the low trait value of the CAST/Ei parent; –1 in this
case.
Table 3. Quantitative comparison
between phenotypes and genotypes
                        |                 | Phenotypes |           | Genotype | Models |       |       | Sorted |       |
Case                    | PCR Plate Order | Brain      | Brain Res | D6Mit327 | Add    | B Dom | C Dom | Add 1  | Add 0 | Add -1
090894F                 | 1               | 469        | 38        | B        | 1      | 1     | 1     | 38     |       |
041195K                 | 2               | 496        | 26        | H        | 0      | 1     | -1    |        | 26    |
071095A                 | 3               | 502        | 8         | H        | 0      | 1     | -1    |        | 8     |
040695M                 | 4               | 489        | 21        | H        | 0      | 1     | -1    |        | 21    |
051295G                 | 5               | 475        | -1        | H        | 0      | 1     | -1    |        | -1    |
041195I                 | 6               | 489        | 18        | C        | -1     | -1    | -1    |        |       | 18
090894I                 | 7               | 436        | -23       | H        | 0      | 1     | -1    |        | -23   |
081595V                 | 8               | 550        | 52        | B        | 1      | 1     | 1     | 52     |       |
041195M                 | 9               | 463        | -27       | H        | 0      | 1     | -1    |        | -27   |
cases                   | 10-89           | …          | …         |          |        |       |       |        |       |
081595M                 | 90              | 496        | -7        | B        | 1      | 1     | 1     | -7     |       |
101295I                 | 91              | 501        | -1        | H        | 0      | 1     | -1    |        | -1    |
040695S                 | 92              | 477        | 8         | H        | 0      | 1     | -1    |        | 8     |
091895L                 | 93              | 468        | -11       | H        | 0      | 1     | -1    |        | -11   |
071095D                 | 94              | 443        | -31       | C        | -1     | -1    | -1    |        |       | -31
080394Z                 | 95              | 481        | -3        | C        | -1     | -1    | -1    |        |       | -3
072195G                 | 96              | 496        | 3         | C        | -1     | -1    | -1    |        |       | 3
r with brain residuals: |                 |            |           |          | 0.39   | 0.23  | 0.40  |        |       |
mean:                   |                 |            |           |          |        |       |       | 17.3   | -1.7  | -7.8
Step 4: The statistics of mapping QTLs.
We now have all the necessary data and we are poised to assess whether
QTLs have been discovered, and if so, with what precision and confidence
(Lander and Schork 1994; Churchill and Doerge 1994). Mapping QTLs involves
finding marker loci for which the three genotypes match up well with
variation in the phenotype. BALB/cJ has a much larger brain than does
CAST/Ei. If a QTL modulating brain weight is located near one of the
microsatellites then F2 animals that are homozygous for B alleles
at that marker should have heavier brains than those homozygous for C
alleles. Referring to Table 3, we test whether or not there is a
significant correlation (or regression coefficient) between the numerical
values (–1, 0, and +1) in the sixth through eighth columns and brain
weight residuals in the fourth column. These correlations are listed at
the bottom of Table 3.
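As a rough illustration of this test, the sketch below computes the Pearson correlation between the additive codes and the brain weight residuals for the 16 animals that are printed in Table 3. Because cases 10 through 89 are not shown, the result will not reproduce the value of 0.39 obtained from all 96 animals. The function name `pearson_r` and the variable names are assumptions made for this example.

```python
import math

# (brain weight residual in mg, genotype call at D6Mit327) for the 16
# animals printed in Table 3; cases 10-89 are elided in the table.
ROWS = [
    (38, "B"), (26, "H"), (8, "H"), (21, "H"), (-1, "H"), (18, "C"),
    (-23, "H"), (52, "B"), (-27, "H"), (-7, "B"), (-1, "H"), (8, "H"),
    (-11, "H"), (-31, "C"), (-3, "C"), (3, "C"),
]
ADDITIVE = {"B": 1, "H": 0, "C": -1}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


residuals = [res for res, _ in ROWS]
codes = [ADDITIVE[call] for _, call in ROWS]
r = pearson_r(codes, residuals)
print(f"r = {r:.2f} for the {len(ROWS)} animals shown")
```

Replacing `ADDITIVE` with the B-dominant or C-dominant codes gives the corresponding correlations for the other two models.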
A complementary way to explore these data is to determine whether
brain weight residuals of animals with the BB genotype are greater
than those of groups of animals with the other two genotypes. This type of
categorization is shown on the right side of Table 3. The average residual
of individuals with the BB genotype is 17.3 ± 4.8 mg (bottom
right), whereas that of CC individuals is –7.8 ± 3.4 mg. Half of
the difference between these means is an estimate of the additive effect
of substituting a low C allele with a high B allele—a value
of 12.6 mg in this case. The heterozygotes in this sample have an average
phenotype that is 6.4 mg lower than that predicted given the difference
between BB and CC genotypes. This deviation estimates the
dominance of the C allele.
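The arithmetic in this paragraph can be checked directly from the three genotype means at the bottom of Table 3. This is a small worked sketch; the function name `additive_and_dominance` is hypothetical.

```python
def additive_and_dominance(mean_bb, mean_het, mean_cc):
    """Estimate the additive effect and dominance deviation (same units as input)."""
    additive = (mean_bb - mean_cc) / 2.0      # half the homozygote difference
    midpoint = (mean_bb + mean_cc) / 2.0      # expected heterozygote, no dominance
    dominance = mean_het - midpoint           # negative: heterozygotes resemble CC
    return additive, dominance


# Mean residuals (mg) for the BB, H, and CC groups in Table 3.
additive, dominance = additive_and_dominance(17.3, -1.7, -7.8)
print(f"additive effect = {additive:.2f} mg")
print(f"dominance deviation = {dominance:.2f} mg")
```

With the rounded means this gives about 12.55 mg and -6.45 mg, which agree with the 12.59 and -6.42 listed for D6Mit327 in Table 4 to within the rounding of the means.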
In this analysis we have tested whether a single microsatellite
marker, D6Mit327, is located close to a QTL that influences brain
weight. But we would like to scan the entire genome in the same way. Is
the correlation between the three genotypes and variation in brain weight
of 0.39 the highest that we can find? If we do this analysis at each of
110 marker loci we discover that genotypes at D6Mit327 match
variation in phenotypes much better than any other marker (Table 4). In
fact, the probability of getting such a good match by chance alone if one
only performed a single test is about 1 in 10,000. This is referred to as
the point-wise, or nominal probability of linkage. In addition to listing
the nominal probabilities, Table 4 lists several other interesting
statistics and coefficients. One of these is the likelihood ratio
statistic (LRS), a value that, like the logarithm of the odds ratio (the
LOD score), is used to assess whether or not a QTL is present close to the
marker locus (Haley and Knott 1992). The next two columns list the
additive effects of allele substitutions and the predicted dominance
deviation. The last column lists the fraction of the variance that can be
accounted for by genotypes at the marker locus. This latter value is just
the square of the correlation coefficient that we already computed in
Table 3. For example, at D6Mit327 the estimate is 16%.
Table 4. Statistical summary of a genome-wide search for a brain
weight QTL
| Locus | Chr | P (a) | LRS (b) | Add (c) | Dom (c) | % (d) |
|---|---|---|---|---|---|---|
| D2Mit295 | 2 | 0.02326 | 7.5 | 7.02 | -3.29 | 5 |
| D3Mit23 | 3 | 0.02901 | 7.1 | 6.56 | -5.71 | 5 |
| D4Mit172 | 4 | 0.01964 | 7.9 | 5.73 | 7.60 | 6 |
| D4Mit151 | 4 | 0.02791 | 7.2 | 6.95 | 5.57 | 5 |
| D6Mit273 | 6 | 0.04654 | 6.1 | -2.84 | -9.19 | 4 |
| D6Mit327 | 6 | 0.00011 | 18.3 | 12.59 | -6.42 | 16 |
| D7Mit193 | 7 | 0.03257 | 6.8 | 8.68 | 3.24 | 5 |
| D7Mit120 | 7 | 0.00747 | 9.8 | 12.13 | 0.78 | 9 |
| D7Mit31 | 7 | 0.02386 | 7.5 | 7.85 | 1.92 | 5 |
| D12Mit158 | 12 | 0.01966 | 7.9 | 6.68 | 6.99 | 6 |
| D16Mit65 | 16 | 0.00891 | 9.4 | 7.05 | -5.61 | 7 |
| DXMit54 | X | 0.03059 | 7.0 | 5.73 | 4.05 | 5 |
a. P is the point-wise or nominal probability of achieving
an LRS value by chance.
b. The LRS is the likelihood ratio statistic (4.61 times the LOD score).
c. Add and Dom are the additive effects and dominance
deviations in milligrams.
d. % is the percentage of variance that can be explained by a QTL tightly
linked to the marker locus.
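The relation between the LRS, the LOD score, and the nominal probability can be sketched as follows. The conversion factor of 4.61 is 2 ln(10). The use of a chi-square distribution with two degrees of freedom (one for the additive effect and one for dominance) is an assumption made here; it is not stated in the text, although it appears to reproduce the P values in Table 4 closely. The function names are hypothetical.

```python
import math


def lrs_to_lod(lrs):
    """Convert a likelihood ratio statistic to a LOD score (LRS = 2 ln(10) * LOD)."""
    return lrs / (2.0 * math.log(10.0))


def nominal_p_two_df(lrs):
    """Point-wise P, ASSUMING the LRS follows a chi-square with 2 df.

    For 2 df the upper-tail probability has the closed form exp(-x / 2).
    """
    return math.exp(-lrs / 2.0)


# LRS values taken from Table 4.
for locus, lrs in [("D6Mit327", 18.3), ("D7Mit120", 9.8), ("D2Mit295", 7.5)]:
    print(f"{locus}: LOD = {lrs_to_lod(lrs):.2f}, P = {nominal_p_two_df(lrs):.5f}")
```

Small differences from the tabulated probabilities are expected because the LRS values in Table 4 are rounded to one decimal place.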
To refine the analysis of this QTL near D6Mit327 we could
genotype neighboring markers to determine whether any have even stronger
association with variation in brain weight. This additional genotyping is
usually not necessary because we can infer the genotypes that are likely
to be present between neighboring marker loci. For example, if a mouse has
a BB genotype at one marker and a CC genotype at a flanking
marker then half way between these markers the genotype will most probably
split the difference and be BC. Comparing predicted genotypes with
actual phenotypes in the interval between marker loci is referred to as
interval mapping (Lander and Botstein 1989). This refinement can
significantly improve the statistical power of a QTL search and makes it
possible to distinguish between a weak QTL that is near to a marker and a
strong QTL that is located farther away. In other words, interval mapping
improves the ability to locate a QTL and to estimate the effects that it
is likely to have on the phenotype.
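The idea of inferring a genotype between two markers can be made concrete with a short sketch. It computes the expected additive code (on the -1 to +1 scale of Table 3) at a test position, given the genotypes at the two flanking markers. This is an illustration in the spirit of regression-based interval mapping, not the algorithm implemented in any particular mapping program. It assumes the Haldane map function, no crossover interference, and positive distances to both markers; all function names are hypothetical.

```python
import math


def haldane_recombination(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def prob_b_on_chromosome(left_allele, right_allele, r_left, r_right):
    """P(B allele at the test position | flanking alleles) for one chromosome."""
    p_b = (1 - r_left if left_allele == "B" else r_left) * \
          (1 - r_right if right_allele == "B" else r_right)
    p_c = (r_left if left_allele == "B" else 1 - r_left) * \
          (r_right if right_allele == "B" else 1 - r_right)
    return p_b / (p_b + p_c)


# Alleles carried on the two chromosomes for each genotype call. The phase
# of a double heterozygote is unknown, but its expected additive code is
# zero for either phase, so this arbitrary pairing does not affect the result.
ALLELES = {"B": ("B", "B"), "H": ("B", "C"), "C": ("C", "C")}


def expected_additive_code(left_call, right_call, cm_to_left, cm_to_right):
    """Expected additive code (-1..+1) at a position between two markers."""
    r_left = haldane_recombination(cm_to_left)
    r_right = haldane_recombination(cm_to_right)
    total = 0.0
    for left_allele, right_allele in zip(ALLELES[left_call], ALLELES[right_call]):
        p_b = prob_b_on_chromosome(left_allele, right_allele, r_left, r_right)
        total += p_b - 0.5  # each chromosome contributes between -0.5 and +0.5
    return total


# BB at one marker, CC at the other, test position halfway between them.
print(expected_additive_code("B", "C", 10.0, 10.0))
# BB at both markers, 5 cM away on either side.
print(expected_additive_code("B", "B", 5.0, 5.0))
```

For a BB and CC pair the expected code at the midpoint is zero, which corresponds to the statement that the genotype will most probably split the difference. Regressing the phenotype on these expected codes at successive positions yields a likelihood profile of the kind shown in Figure 4.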
Figure 4. Linkage of the QTL Bsc5
to chromosome 6 in a cross between BALB/cJ and CAST/Ei. The x-axis
represents position along Chr 6. The most proximal marker that we typed
(D6Mit273) maps at 19 centiMorgans (cM), whereas the most distal
maps at 70 cM. The Bsc5 locus is most likely to map about 1 cM
proximal to the microsatellite marker D6Mit327. The confidence
interval (CI) of this estimate (bold black lines) is wide: from 37
to 61 cM for a two-LOD CI (95% probability), and from 41 to 56 cM for a
one-LOD CI. Genome-wide probability thresholds (Fig. 5) are marked by
fine horizontal lines. The right scale and the two lower curves
indicate the approximate additive effect and dominance deviations
generated by Bsc5. The substitution of a single BALB/cJ allele
for a CAST/Ei allele at Bsc5 may be responsible for a 15-mg gain
in brain weight.
The results of the more fine-grained interval mapping analysis of Chr
6 are illustrated in Figure 4. The horizontal line at the top represents
most of Chr 6 (from 19 cM to 70 cM). Only four marker loci on Chr 6 were
genotyped (D6Mit273, D6Mit71, D6Mit327, and D6Mit113).
Using the genotype data and the program Map Manager QT (Manly and Olson,
1999;
http://mapmgr.roswellpark.org/mmQT.html), the LRS was computed at
1-cM intervals. These values were then used to generate the shaded
likelihood profile. As we suspected on the basis of our initial analysis,
there appears to be a QTL influencing brain weight near D6Mit327.
Permutation analysis. The process
of mapping QTLs involves computing hundreds of linkage statistics across
the entire set of chromosomes. Given the large number of statistical tests
there is a strong probability of getting a "significant" association by
chance alone. The nominal probabilities listed in Table 4 tell us little
about the genome-wide probability that we have discovered a QTL (Lander
and Kruglyak 1997). We need to compensate for these multiple tests. The
appropriate correction factor depends on the particular distribution of
trait values and the quality and quantity of genotype data.
Figure 5. Permutation analysis of the Bsc5
locus. Genome-wide thresholds for estimating the strength of linkage are
estimated by randomly permuting data such as those listed in Table 3. This
histogram tallies single best LRS scores for each of 10,000 permutations.
The two-tailed probability of a random dataset having a peak LRS score
better than 18.4 is 0.0215 ± 0.0015.
A conceptually simple but computationally tedious permutation test can be
used to estimate the distribution of best LRS scores that one might expect
to get by chance with a given dataset (Churchill and Doerge 1994). This
procedure randomly reassigns the phenotype values listed in Table 3 among
the animals, and then remaps the jumbled dataset to get a new version of
Table 4. For each permutation the program keeps track of the single
highest LRS score. The process is
carried out another 9,999 times. Figure 5 shows a histogram of the peak
LRS scores that resulted from these permutations of the data in Table 3. The
peak LRS score was typically near 10. This non-parametric distribution of
peak LRS scores can now be used to gauge the probability of obtaining an
LRS of 18.3 by chance alone. Only 2% of permutations do this well or
better. We can therefore be reasonably confident that we have mapped a QTL
modulating brain weight to Chr 6. This is the fifth QTL that Richelle
Strom and I have mapped (Strom 1999, R. C. Strom and R. W. Williams, in
progress), and we have named it brain size control 5 (Bsc5).
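The permutation procedure can be sketched in a few dozen lines. The example below uses synthetic data rather than the data of Table 3, a much smaller number of markers and permutations than the real analysis, and a simple one-degree-of-freedom approximation to the LRS from an additive regression, -n ln(1 - r²). It is meant only to show the logic of the test and is not the implementation used by Map Manager QT. All names, the simulated effect size, and the random seed are assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_lrs(genotypes, phenotypes):
    """Approximate LRS for an additive regression at one marker."""
    r = pearson_r(genotypes, phenotypes)
    return -len(phenotypes) * math.log(1.0 - r * r)


def peak_lrs(markers, phenotypes):
    """Highest LRS across all markers (the genome-wide peak)."""
    return max(approx_lrs(marker, phenotypes) for marker in markers)


def permutation_p_value(markers, phenotypes, n_permutations, rng):
    """Genome-wide P: fraction of shuffled datasets whose peak matches the real one."""
    observed = peak_lrs(markers, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between genotype and phenotype
        if peak_lrs(markers, shuffled) >= observed:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Synthetic F2-like data: 96 animals, 10 unlinked markers coded -1, 0, +1
# in roughly 1:2:1 proportions, with a simulated QTL at marker index 3.
rng = random.Random(0)
n_animals, n_markers = 96, 10
markers = [[rng.choice([-1, 0, 0, 1]) for _ in range(n_animals)]
           for _ in range(n_markers)]
phenotypes = [20.0 * g + rng.gauss(0.0, 15.0) for g in markers[3]]

observed, p_genome = permutation_p_value(markers, phenotypes, 200, rng)
print(f"peak LRS = {observed:.1f}, genome-wide P = {p_genome:.3f}")
```

Because only the single best score from each shuffled dataset is recorded, the resulting threshold automatically accounts for the number of markers tested and for the actual distribution of the trait.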
Bsc5 maps on Chr 6, approximately 1 cM proximal to D6Mit327.
Bsc5 has not been mapped with much precision: the 95% confidence
interval is defined by the width of the map profile 2 LOD units (or 9.2
LRS units) to either side of the peak—in this case between 37 and 61 cM.
This 24-cM interval contains approximately 1,200 genes, and perusing a
list of candidates at this point is little more than an entertaining
exercise in optimism. A quick scan of this region using the
Mouse Genome Database reveals one interesting candidate—the
thyrotropin releasing hormone gene that maps at 43 cM.
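The rule for defining a confidence interval from the drop in the likelihood profile can be illustrated as follows. The profile in this sketch is hypothetical and is not the Chr 6 profile of Figure 4; the function name and the sample values are assumptions, and the code assumes a profile with a single peak.

```python
import math


def lod_drop_interval(positions_cm, lrs_values, lod_drop=2.0):
    """Return (start, end) of the region within `lod_drop` LOD units of the peak."""
    lrs_drop = lod_drop * 2.0 * math.log(10.0)   # 2 LOD is about 9.2 LRS units
    threshold = max(lrs_values) - lrs_drop
    inside = [pos for pos, lrs in zip(positions_cm, lrs_values) if lrs >= threshold]
    return min(inside), max(inside)


# Hypothetical LRS profile sampled every 5 cM (illustration only).
positions = [30, 35, 40, 45, 50, 55, 60, 65, 70]
profile = [4.0, 7.0, 11.0, 15.0, 18.3, 16.0, 12.0, 8.0, 5.0]

print(lod_drop_interval(positions, profile, lod_drop=2.0))
print(lod_drop_interval(positions, profile, lod_drop=1.0))
```

A profile computed at 1-cM intervals, as in Figure 4, would give interval boundaries with correspondingly finer resolution.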
Cloning QTLs. Mapping QTLs is the initial
reconnaissance stage in a systematic effort to explore mechanisms that
modulate the development of the CNS. The next step is to match each QTL
with a single gene and its alternative alleles. QTLs will generally need
to be mapped with a precision of 1 to 2 cM—a chromosomal interval that
will typically harbor 50–100 genes. Achieving this level of accuracy is
not impractical, although it will often require an analysis of 1000 or
more animals (Darvasi 1997, 1998). A small subset of positional candidate
genes can then be chosen for further analysis on the basis of expression
patterns, known function, and differences in DNA sequence among strains.
The efficiency of the candidate gene approach will improve greatly in the
next decade. The genome of C57BL/6J will have been sequenced within five
years, and it is also likely that the utility of this code will be
enhanced with sequence data from other major inbred strains such as 129,
A, BALB/c, C3H, DBA/2, CAST/Ei, and SPRET/Ei. Once sequence data have been
combined with expression maps for different parts of the mouse brain, it
should be possible to winnow a set of candidate genes to a very short
list. If the thyrotropin releasing hormone gene survives this filtration,
we may then be justified in comparing its sequence among strains with
different phenotypes. The conversion of quantitative phenotypes (e.g., low
to high) by substituting alleles of one strain with those of another strain
will provide the final and most compelling support that the identity
between a QTL and a particular sequence variant has been made correctly
(Frankel 1995).
[A new method called recombinant inbred intercross (RIX) mapping
promises to make it significantly more practical to fine-map QTLs within
intervals of less than 1 cM (Williams et al. 2000). RIX
mapping relies on the generation of a large number of F1 hybrids by
crossing fully genotyped recombinant inbred strains. For example, the set
of 35 BXD RI strains can be used to generate as many as 595 unique RIX F1
hybrids. Like an F1 between any two inbred parental strains, the genotype
of each of these RIX F1 hybrids is defined precisely, and no genotyping is
required to make use of the RIX set for mapping QTLs. For purposes of QTL mapping
the set of RIX lines resembles an F2 intercross (all three genotypes are
represented at each locus) more than it does an RI set. However, unlike an
F2, each genotype is represented by a potentially unlimited number of
individuals. Thus, the mean phenotype associated with each genotype can be
determined with as much precision as the study demands. The availability
of very large numbers of novel genotypes greatly improves the power of
detecting QTLs that have modest effects on CNS structure and behavior. Not
all of the huge number of available RIX lines need to be generated and
tested to fine-map or confirm a QTL: one can simply analyze subsets of RIX
lines that have defined genotypes on intervals that harbor putative QTLs.
Contrasting genotypes can be synthesized on adjacent intervals to
determine the true position of a QTL. In this way RIX lines can be used to
map QTLs with nearly as much precision as one could map a Mendelian locus
on the same set of lines. Drs. Lu, Airey, Kulkarni, and I have recently
completed generating the set of CXB RIX lines. From 13 CXB lines we have
generated a set of 78 (13 x 12/2) RIX lines. These lines are now being
used to map numerous CNS and eye morphometric QTLs. (RW, July 2000)]
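The number of RIX hybrids that can be made from a set of RI strains is the number of unordered pairs of strains, n(n - 1)/2, if reciprocal crosses are counted as the same hybrid. The sketch below checks the figure of 595 for 35 strains; the strain labels are placeholders and not real strain names.

```python
from itertools import combinations


def count_rix_hybrids(n_strains):
    """Number of distinct F1 hybrids among n RI strains, ignoring cross direction."""
    return n_strains * (n_strains - 1) // 2


# Placeholder labels standing in for a set of 35 RI strains.
strains = [f"RI{i}" for i in range(1, 36)]
pairs = list(combinations(strains, 2))

print(len(pairs), count_rix_hybrids(35))
```

Counting each direction of a cross separately would double this number.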
It is important to realize that QTLs are not invariant across
different populations of mice. A QTL can be identified because it is
polymorphic in a particular population or cross. The same gene may not
necessarily be polymorphic in another cross. If a gene is not polymorphic
it cannot generate phenotypic va
|