Generalizing Correspondence Analysis for Applications in Machine Learning

Authors:	Hsiang Hsu, Flavio P. Calmon, Salman Salamatian
Year of publication:	2022
Subject:	FOS: Computer and information sciences; Computer Science - Machine Learning; Current (mathematics); Scale (ratio); Computer science; Computer Science - Information Theory; Boundary (topology); Machine Learning (stat.ML); Correspondence analysis; Machine Learning (cs.LG); Machine Learning; Statistics - Machine Learning; Artificial Intelligence; Leverage (statistics); business.industry; Information Theory (cs.IT); Applied Mathematics; Principal (computer security); Visualization; Computational Theory and Mathematics; Neural Networks, Computer; Computer Vision and Pattern Recognition; Artificial intelligence; business; Random variable; Algorithm; Algorithms; Software
Source:	IEEE Transactions on Pattern Analysis and Machine Intelligence. 44:9347-9362
ISSN:	1939-3539 0162-8828
DOI:	10.1109/tpami.2021.3127870
Description:	Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies by finding maximally correlated embeddings of pairs of random variables. CA has found applications in fields ranging from epidemiology to the social sciences; however, current methods do not scale to large, high-dimensional datasets. In this paper, we provide a novel interpretation of CA in terms of an information-theoretic quantity called the principal inertia components. We show that estimating the principal inertia components, which consists of solving a functional optimization problem over the space of finite-variance functions of two random variables, is equivalent to performing CA. We then leverage this insight to design novel algorithms that perform CA at an unprecedented scale. In particular, we demonstrate how the principal inertia components can be reliably approximated from data using deep neural networks. Finally, we show how these maximally correlated embeddings of pairs of random variables in CA play a central role in several learning problems, including visualization of classification boundaries and of the training process, and how they underlie recent multi-view and multi-modal learning methods. Comment: 30 pages, 7 figures, 6 tables. arXiv admin note: text overlap with arXiv:1902.07828
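For orientation, the classical small-scale form of CA mentioned in the description can be sketched as a singular value decomposition of a contingency table. This is a minimal illustration of textbook CA, not the neural-network estimator proposed in the paper. The function name `principal_inertia_components` and the example `table` are hypothetical, and the sketch assumes every row and column of the table has a nonzero total.

```python
import numpy as np


def principal_inertia_components(counts):
    """Classical CA of a contingency table via SVD (illustrative sketch).

    Assumes every row and every column of `counts` has a positive sum.
    Returns the squared singular values (principal inertias) together with
    the row and column principal coordinates.
    """
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()      # empirical joint distribution
    px = P.sum(axis=1)             # row marginal
    py = P.sum(axis=0)             # column marginal

    # Standardized residuals: (P - px py^T) scaled by 1 / sqrt(px_i * py_j).
    expected = np.outer(px, py)
    Q = (P - expected) / np.sqrt(expected)

    U, s, Vt = np.linalg.svd(Q, full_matrices=False)

    # Principal coordinates of the row and column categories.
    row_coords = (U / np.sqrt(px)[:, None]) * s
    col_coords = (Vt.T / np.sqrt(py)[:, None]) * s

    # Because Q is centered, the smallest squared singular value is zero
    # up to rounding error.
    return s ** 2, row_coords, col_coords


# Hypothetical 2x2 table of co-occurrence counts.
table = [[20, 10],
         [5, 25]]
inertias, rows, cols = principal_inertia_components(table)
print(inertias.sum())  # total inertia, equal to chi-square statistic / n
```

The SVD here operates on the full table, so its cost grows with the number of categories of both variables; this is the scaling limitation the description refers to.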
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8deba069837fa02e7b39cb16c4034eda https://doi.org/10.1109/tpami.2021.3127870