ATJ-Net: Auto-Table-Join Network for Automatic Learning on Relational Databases
Authors: | Jun Gao, Zhao Li, Donghui Ding, Ji Zhang, Jinze Bai, Jialin Wang |
---|---|
Year of publication: | 2021 |
Subject: |
Hypergraph; Computer science; Relational database; Supervised learning; 02 engineering and technology; Column (database); 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Table (database); Graph (abstract data type); No free lunch in search and optimization; 020201 artificial intelligence & image processing; Data mining; Tuple |
Source: | WWW |
Description: | A relational database, consisting of multiple tables, provides heterogeneous information across various entities and is widely used in real-world services. This paper studies the supervised learning task on multiple tables, aiming to predict one label column with the help of multi-table data. However, classical ML techniques mainly focus on single-table data. Multi-table data involves many-to-many mappings among joinable attributes and n-ary relations, which classical ML techniques cannot use directly. Besides, current graph techniques, such as heterogeneous information networks (HIN) and graph neural networks (GNN), cannot be deployed directly and automatically in a multi-table environment, which limits learning on databases. For automatic learning on relational databases, we propose an auto-table-join network (ATJ-Net). Multiple tables with relationships are modeled as a hypergraph, where vertices are joinable attributes and hyperedges are tuples of tables. ATJ-Net then builds a graph neural network on the heterogeneous hypergraph, which samples and aggregates the vertices and hyperedges on n-hop sub-graphs as the receptive field. To enable ATJ-Net to be deployed automatically to different datasets and to avoid the "no free lunch" dilemma, we use random architecture search to select optimal aggregators and prune redundant paths in the network. To verify the effectiveness of our methods across various tasks and schemas, we conduct extensive experiments on 4 tasks, 8 different schemas, and 19 sub-datasets covering citing prediction, review classification, recommendation, and a task-blind challenge. ATJ-Net achieves the best performance over state-of-the-art approaches on three tasks and is competitive with the KddCup winner solution on the task-blind challenge. |
Database: | OpenAIRE |
External link: |
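
The description models joined tables as a hypergraph whose vertices are joinable attribute values and whose hyperedges are table tuples, with an n-hop sub-graph serving as the receptive field. The following is a minimal Python sketch of that data structure only, not the authors' implementation: the function names, the `join_columns` mapping, and the toy `users`/`items`/`reviews` tables are all hypothetical, and the learned aggregators, sampling, and architecture search described above are omitted.

```python
from collections import defaultdict

def build_hypergraph(tables, join_columns):
    """Turn tables into a hypergraph (hypothetical helper, illustration only).

    tables:       {table_name: [row_dict, ...]}
    join_columns: {table_name: {column_name: domain}}; columns that share a
                  domain are treated as joinable.
    Returns (hyperedges, incidence):
      hyperedges: (table_name, row_index) -> set of (domain, value) vertices
      incidence:  (domain, value) -> set of hyperedge ids touching that vertex
    """
    hyperedges = {}
    incidence = defaultdict(set)
    for name, rows in tables.items():
        for i, row in enumerate(rows):
            edge_id = (name, i)
            vertices = {(domain, row[col])
                        for col, domain in join_columns[name].items()}
            hyperedges[edge_id] = vertices
            for v in vertices:
                incidence[v].add(edge_id)
    return hyperedges, dict(incidence)

def n_hop_hyperedges(start_edge, hyperedges, incidence, n_hops):
    """Collect every tuple reachable from start_edge within n_hops joins."""
    seen = {start_edge}
    frontier = {start_edge}
    for _ in range(n_hops):
        reached = set()
        for edge in frontier:
            for vertex in hyperedges[edge]:
                reached |= incidence[vertex]
        frontier = reached - seen   # only expand newly discovered tuples
        seen |= frontier
    return seen

# Toy schema (assumed for illustration): reviews join users and items.
tables = {
    "users": [{"user_id": 1, "age": 30}, {"user_id": 2, "age": 41}],
    "items": [{"item_id": 10, "price": 5.0}, {"item_id": 11, "price": 7.5}],
    "reviews": [
        {"user_id": 1, "item_id": 10, "stars": 5},
        {"user_id": 2, "item_id": 10, "stars": 3},
        {"user_id": 2, "item_id": 11, "stars": 4},
    ],
}
join_columns = {
    "users": {"user_id": "user"},
    "items": {"item_id": "item"},
    "reviews": {"user_id": "user", "item_id": "item"},
}

hyperedges, incidence = build_hypergraph(tables, join_columns)
one_hop = n_hop_hyperedges(("reviews", 0), hyperedges, incidence, n_hops=1)
two_hop = n_hop_hyperedges(("reviews", 0), hyperedges, incidence, n_hops=2)
```

In this toy example, the first review reaches its user, its item, and the other review of the same item after one hop; the second user and that user's other review appear only at two hops. A full model would presumably sample from such neighborhoods rather than enumerate them, since the receptive field can grow quickly with the number of hops.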