In modern statistical analysis, datasets often contain a large number of variables with complicated dependence structures. This situation is especially common in important problems in economics, engineering, finance, genetics, genomics, and neuroscience, among other fields. One of the most important measures of dependence between variables is the correlation coefficient, which describes their linear dependence. In this setting, understanding correlation and the behavior of correlated variables is a crucial problem that prompts statisticians to develop new theories and methods. Motivated by this challenge, the PI proposes to study correlation through novel geometric perspectives. The overall objectives are (1) to develop useful theories and methods for correlation and (2) to build a stronger connection between geometry and statistics. The PI expects to achieve these goals through an integration of research and education plans.
The research agenda is to systematically investigate three fundamental aspects of correlation: (1) the magnitude and distribution of the maximal spurious sample correlation; (2) the detection of a low-rank correlation structure; and (3) the probability measure over the space of correlation matrices. In these studies, a novel integration of statistical and geometric insights characterizes the proposed solutions and facilitates precise probability statements. Completion of the proposed research will provide a comprehensive understanding of correlation and a stronger connection between geometry and statistics. The PI also has comprehensive plans for educating graduate and undergraduate students and for disseminating the research results to the broader scientific community.