Statistical Pattern Recognition in Genetic Epidemiology


The identification of susceptibility genes in complex genetic diseases poses many challenging methodological questions. Complex traits are phenotypes that are not attributable to a single gene locus with classical Mendelian inheritance, but instead are generally believed to be influenced by multiple interacting disease loci.

Neural networks have previously been suggested for the analysis of complex genetic traits (Lucek et al., 1998), but with mixed results. While the method produced interesting findings and even pointed to two previously unidentified genes, a later analysis raised doubts about the stability of its results (Marinov and Weeks, 2001).

We give a brief overview of neural networks and their application to gene finding in affected sibling pair studies. We then show that the method is indeed unstable, identifying different genes and ranking them differently in multiple runs. Worse yet, we show that the method suffers from a high prediction error when the trained network is used to predict affection status from previously unseen marker data, giving an error rate that is higher than would be obtained from mere guessing. An analysis of the causes is given, identifying dataset sparsity and marker dimensionality as the two main concerns. We discuss pruning as a means to control these problems, and then show how the method can be combined with a marker subset selection step carried out by a genetic algorithm. Results are given on a simulated dataset.