The Neurogenetics Tool Box


		EXPERIMENTAL PLAN
		Principal Investigator/Program Director Williams, Robert W.
Data entry, editing, importing, and transformation

The NTB will provide at least two methods for submitting trait information. The more general method will be a form with an entry field into which a list of strain names (or progeny numbers) and trait values can be entered. This method will require the user to follow some simple formatting rules when entering the data. The second method, for shared data sets such as RI lines, will provide a separate form for each shared data set, in which each strain or individual is represented by its own entry field for the trait value. This method will eliminate all data formatting problems. Figure 2 shows a prototype of such a submission form. The user need enter only a trait name and a trait value for each inbred line and then click the Submit button. If we can do so without sacrificing responsiveness, we will use a Java applet to perform sanity-checking on the trait values (detection of extreme outliers) to alert the user to possible entry errors before the data are submitted. The same form will be used to request descriptive statistics, single-locus association tests, and simple interval mapping. A form for composite interval mapping would be somewhat more complicated, because the user would be given the option of choosing among loci previously shown to be associated with the trait.

It is often useful to rescale trait data or to transform them with simple functions, either to make the distribution of the data more nearly normal or to equalize the variance of the different trait phenotypes. Such a transformation does not affect the problem described above as the mixture problem, but it may reduce non-normality derived from the distribution of the environmental effect and thus improve the small-sample accuracy of the χ² approximation. The NTB will offer a number of transformation functions and will offer a probit plot of the trait distribution, a graphical indication of the distribution's normality.
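As a rough illustration of the transformation and probit-plot steps described above, the following Python sketch applies a log transform and computes the points of a normal probability plot. It is not the NTB implementation: the function names, the choice of the natural log as the example transformation, the plotting positions (i - 0.5)/n, and the trait values are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def log_transform(values):
    """Natural-log transform; one of several transformations a user might pick.

    Assumes strictly positive trait values.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive trait values")
    return [math.log(v) for v in values]


def probit_plot_points(values):
    """Return (theoretical normal quantile, observed value) pairs.

    The i-th smallest observation is paired with the standard normal
    quantile at (i - 0.5) / n, an assumed plotting-position convention.
    Points falling near a straight line suggest approximate normality.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


# Hypothetical right-skewed trait values, for illustration only.
trait = [1.2, 1.5, 1.9, 2.4, 3.1, 4.8, 9.6]
points = probit_plot_points(log_transform(trait))
for z, x in points:
    print(f"{z:+.3f}\t{x:.3f}")
```

Plotting the second column against the first, before and after the transformation, gives the kind of visual check on normality that a probit plot is meant to provide.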
The early implementations of the NTB will focus on quantitative traits expressed in metric data, the most common type of trait data. Traits expressed in ordinal data (relative rank) will be addressed in later years, when nonparametric methods are implemented.

Marker redundancy in data sets

QTL mapping in recombinant inbred mice uses marker data sets that were not developed solely for this purpose. Such data sets contain many markers that are, for QTL mapping purposes, redundant. Markers that are not separated by recombinations, and those that are spaced more closely than 10 cM, add little or no information for conventional backcross and intercross populations (Lander and Botstein 1989; Rebai, Goffinet et al. 1995). The markers chosen for the recombinant inbred data sets supported by the NTB will be optimized for QTL mapping, with appropriate marker density. Advanced intercross populations, on the other hand, benefit from having more closely spaced markers (Darvasi and Soller 1995). The choice of markers will be especially important for the large G₁₀ advanced intercross described elsewhere in this application.

We have created a complete database of all MIT CA-repeat microsatellite loci, which can be downloaded at <www.nervenet.org/main/dictionary.html>. Of a total of 6310 microsatellites in the Whitehead Institute database and our own FileMaker Pro version, 2957 are polymorphic between C57BL/6J and DBA/2J. Of these, 1269 have polymorphisms in the range of 8 to 30 bp. Given the very large number of polymorphic loci, it has been easy for us to select a subset of 350 evenly spaced markers that will cover >95% of the mouse genome, the Y chromosome excepted.
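One simple way to remove redundant markers of the kind described above is a greedy pass along each chromosome that keeps a marker only when it lies at least a minimum distance from the last marker kept. The Python sketch below illustrates that idea; it is an assumed procedure, not the method used to choose the 350 markers, and the function name, the 10 cM default, and the marker names and positions are invented for the example.

```python
from collections import defaultdict


def thin_markers(markers, min_spacing_cm=10.0):
    """Greedy thinning of a marker list.

    markers: iterable of (name, chromosome, position_cM) tuples.
    A marker is kept only if it is at least min_spacing_cm from the
    previously kept marker on the same chromosome, so markers at the
    same position (no recombination between them) collapse to one.
    """
    by_chrom = defaultdict(list)
    for name, chrom, pos in markers:
        by_chrom[chrom].append((pos, name))

    kept = []
    for chrom in sorted(by_chrom):
        last_pos = None
        for pos, name in sorted(by_chrom[chrom]):
            if last_pos is None or pos - last_pos >= min_spacing_cm:
                kept.append((name, chrom, pos))
                last_pos = pos
    return kept


# Made-up markers and map positions, for illustration only.
example = [
    ("m1", "1", 0.0), ("m2", "1", 0.0), ("m3", "1", 4.0),
    ("m4", "1", 12.0), ("m5", "1", 21.0),
    ("m6", "2", 3.0), ("m7", "2", 8.0), ("m8", "2", 15.0),
]
selected = thin_markers(example)
print([name for name, _, _ in selected])
```

A greedy pass is easy to reason about but does not guarantee the most even spacing; for an advanced intercross, where denser maps are useful, the same function could be called with a smaller `min_spacing_cm`.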



