Generalizing Correspondence Analysis for Applications in Machine Learning

Authors: Hsiang Hsu, Flavio P. Calmon, Salman Salamatian
Year of publication: 2022
Subject:
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence, 44:9347-9362
ISSN: 1939-3539, 0162-8828
DOI: 10.1109/tpami.2021.3127870
Abstract: Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies by finding maximally correlated embeddings of pairs of random variables. CA has found applications in fields ranging from epidemiology to the social sciences; however, current methods do not scale to large, high-dimensional datasets. In this paper, we provide a novel interpretation of CA in terms of an information-theoretic quantity called the principal inertia components. We show that estimating the principal inertia components, which consists of solving a functional optimization problem over the space of finite-variance functions of two random variables, is equivalent to performing CA. We then leverage this insight to design novel algorithms that perform CA at an unprecedented scale. In particular, we demonstrate how the principal inertia components can be reliably approximated from data using deep neural networks. Finally, we show how these maximally correlated embeddings of pairs of random variables play a central role in several learning problems, including visualizing classification boundaries and the training process, and how they underlie recent multi-view and multi-modal learning methods.
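For the finite-alphabet case, the link between CA and the principal inertia components can be illustrated with classical, SVD-based CA on a contingency table. The sketch below is not the authors' scalable or neural-network estimator; the function name `principal_inertia_components` is hypothetical, and it assumes every row and column of the table has a positive total. It forms Q = D_X^{-1/2} P D_Y^{-1/2} from the empirical joint distribution P, discards the trivial singular value 1 (attained by constant functions), and returns the squared remaining singular values as the principal inertia components, together with the rescaled singular vectors as zero-mean, unit-variance embeddings f(X) and g(Y) whose correlations are the square roots of those components.

```python
import numpy as np


def principal_inertia_components(counts):
    """Classical CA of a two-way contingency table (hypothetical helper).

    Assumes no row or column of `counts` sums to zero.
    Returns (pics, f, g):
      pics: principal inertia components, i.e. squared singular values of
            Q = D_X^{-1/2} P D_Y^{-1/2} with the trivial value 1 removed
      f:    (rows, k-1) array, column i is the embedding f_i(x)
      g:    (cols, k-1) array, column i is the embedding g_i(y)
    """
    P = np.asarray(counts, dtype=float)
    P = P / P.sum()                      # empirical joint distribution of (X, Y)
    px = P.sum(axis=1)                   # marginal distribution of X (rows)
    py = P.sum(axis=0)                   # marginal distribution of Y (columns)
    Q = P / np.sqrt(np.outer(px, py))    # D_X^{-1/2} P D_Y^{-1/2}
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    # s[0] == 1 always, with singular vectors sqrt(px), sqrt(py); drop it.
    pics = s[1:] ** 2
    # Rescale singular vectors so that E[f_i(X)] = 0 and E[f_i(X)^2] = 1
    # (likewise for g_i(Y)); then E[f_i(X) g_i(Y)] = s[i].
    f = U[:, 1:] / np.sqrt(px)[:, None]
    g = Vt[1:, :].T / np.sqrt(py)[:, None]
    return pics, f, g


# Usage: a symmetric 3x3 table with strong diagonal dependence.
counts = [[20, 5, 5],
          [5, 20, 5],
          [5, 5, 20]]
pics, f, g = principal_inertia_components(counts)
# pics is approximately [0.25, 0.25]
```

The dense SVD here costs time cubic in the alphabet size and requires the full joint distribution, which is the scaling limitation of classical CA that the abstract refers to. As a sanity check, the returned components sum to the chi-squared divergence between P and the product of its marginals.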
30 pages, 7 figures, 6 tables. arXiv admin note: text overlap with arXiv:1902.07828
Database: OpenAIRE