Hey ML, what can you do for me?
Author: | Ashis Kumer Biswas, Javier Pastorino |
---|---|
Year of publication: | 2020 |
Subject: |
0209 industrial biotechnology; Source code; Computer science; Supervised learning; 02 engineering and technology; Machine learning; Pipeline (software); Spectral clustering; Task (project management); Exploratory data analysis; 020901 industrial engineering & automation; Ranking; 0202 electrical engineering, electronic engineering, information engineering; Graph (abstract data type); Unsupervised learning; 020201 artificial intelligence & image processing; Artificial intelligence; Set (psychology) |
Source: | AIKE |
DOI: | 10.1109/aike48582.2020.00023 |
Description: | Machine learning (ML) algorithms are data-driven: given a goal task and a dataset of prior experience relevant to that task, one can attempt to solve the task with ML, aiming for high accuracy. There is usually a large gap in understanding between ML experts and dataset providers, owing to limited cross-disciplinary expertise. Narrowing down a suitable set of problems to solve with ML is possibly the most ambiguous yet important item for data providers to consider before initiating collaborations with ML experts. We proposed an ML-fueled pipeline that identifies potential problems (i.e., tasks) so that data providers can easily explore problem areas to investigate with ML. The autonomous pipeline integrates information theory and graph-based unsupervised learning paradigms to generate a ranked retrieval of the top-k problems for a given dataset, supporting a successful ML-based collaboration. We conducted experiments on diverse, well-known real-world datasets; from a supervised learning standpoint, the proposed pipeline achieved 72% top-5 task retrieval accuracy on average, which surpasses the retrieval performance of popular exploratory data analysis tools under the same paradigm. Detailed experimental results and our source code are available at: https://github.com/jpastorino/heyml. |
Database: | OpenAIRE |
External link: |