Over the past ten years there has been a great deal of work in the statistics community devoted to the problem of testing and estimating associations, in particular correlations, between variables in high-dimensional data sets. By definition, correlations capture pairwise relationships between variables, and there is a close formal relationship between the statistical analysis of correlations and the statistical analysis of networks. Statistical work on inference for correlations has been motivated in large part by the increasing use and importance of networks in a variety of fields, including economics, brain mapping, genomics, and biomedicine. Networks of proteins associated with a disease can point the way towards potential drug interventions; known networks may serve as inputs for predictive models of survival or response to therapy in breast cancer and other diseases. Concurrent with this growth in statistical methodology, recent developments in the fields of probabilistic combinatorics and machine learning have significantly advanced our understanding of discrete random structures that capture the association of high-dimensional objects. Although these powerful theoretical techniques can be brought directly to bear on a number of the correlation-based problems considered in the statistical community, to date no such cross-fertilization has taken place.
The proposed research has several complementary components. The first component is the development of an iterative testing procedure that identifies self-associated sets of vertices in a graph and self-associated sets of variables in a high-dimensional data set. Within the framework of the iterative testing procedure we will develop computationally efficient methods for several applied problems: mining of block correlation differences in two-sample studies, and identifying groups of mutually correlated variables in studies where each sample is assessed with two or more measurement platforms. As a special case of the latter problem, we will develop tools to enhance the power of genomic studies that link local genetic variation to global changes in gene expression. Development and application of the methods will be carried out in cooperation with researchers in genomics, biomedicine, and sociology at UNC, with whom the PI and co-PI have long-standing collaborations. The second component of the proposed research is to adapt and extend existing techniques in probabilistic combinatorics to provide supporting theory for the iterative testing procedure, and to address broader statistical questions concerning the testing and estimation of correlations.
