Showing 1 - 10 of 260 for search: '"Rangwala, Huzefa"'
Author:
Liang, Jiaming, Lei, Chuan, Qin, Xiao, Zhang, Jiani, Katsifodimos, Asterios, Faloutsos, Christos, Rangwala, Huzefa
Data-centric AI focuses on understanding and utilizing high-quality, relevant data in training machine learning (ML) models, thereby increasing the likelihood of producing accurate and useful results. Automatic feature augmentation, aiming to augment…
External link:
http://arxiv.org/abs/2406.09534
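A minimal sketch of the feature-augmentation idea in the entry above: greedily join candidate columns from an auxiliary table onto a base table and keep each column only if it improves a held-out score. All table and column names are invented; this illustrates the general idea, not the paper's algorithm.

```python
# Toy greedy feature augmentation: keep an auxiliary column only if it
# improves validation accuracy. Data, tables, and names are synthetic.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

base = pd.DataFrame({
    "customer_id": range(200),
    "spend": [i % 50 for i in range(200)],
    "label": [(i % 50) > 25 for i in range(200)],
})
aux = pd.DataFrame({
    "customer_id": range(200),
    "visits": [i % 7 for i in range(200)],         # candidate feature
    "noise": [(i * 37) % 11 for i in range(200)],  # candidate feature
})

def val_score(df: pd.DataFrame, feats: list) -> float:
    X_tr, X_va, y_tr, y_va = train_test_split(
        df[feats], df["label"], random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return accuracy_score(y_va, model.predict(X_va))

merged = base.merge(aux, on="customer_id")
kept, best = ["spend"], val_score(merged, ["spend"])
for cand in ["visits", "noise"]:
    score = val_score(merged, kept + [cand])
    if score > best:                 # keep only columns that help
        kept, best = kept + [cand], score
print(kept, best)
```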
Author:
Zheng, Da, Song, Xiang, Zhu, Qi, Zhang, Jian, Vasiloudis, Theodore, Ma, Runjie, Zhang, Houyu, Wang, Zichen, Adeshina, Soji, Nisa, Israt, Mottini, Alejandro, Cui, Qingjun, Rangwala, Huzefa, Zeng, Belinda, Faloutsos, Christos, Karypis, George
Published in:
KDD 2024
Graph machine learning (GML) is effective in many business applications. However, making GML easy to use and applicable to industry applications with massive datasets remains challenging. We developed GraphStorm, which provides an end-to-end solution…
External link:
http://arxiv.org/abs/2406.06022
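GraphStorm's actual APIs are not reproduced here. As a hedged stand-in, the sketch below shows the kind of node-classification workload such end-to-end GML frameworks automate, written as a two-layer message-passing model in plain PyTorch on a toy graph.

```python
# Generic GNN node classification on a toy graph (not GraphStorm code):
# symmetric-normalized adjacency, two message-passing layers, cross-entropy.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_nodes, in_dim, num_classes = 6, 4, 2

# Undirected toy graph as a dense adjacency matrix with self-loops.
adj = torch.eye(num_nodes)
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[u, v] = adj[v, u] = 1.0
d_inv_sqrt = adj.sum(1).rsqrt()
norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

x = torch.randn(num_nodes, in_dim)      # node features
y = torch.tensor([0, 0, 0, 1, 1, 1])    # node labels
lin1 = torch.nn.Linear(in_dim, 8)
lin2 = torch.nn.Linear(8, num_classes)
opt = torch.optim.Adam(
    list(lin1.parameters()) + list(lin2.parameters()), lr=0.05)

for _ in range(100):
    h = torch.relu(norm_adj @ lin1(x))  # layer 1: aggregate + transform
    logits = norm_adj @ lin2(h)         # layer 2
    loss = F.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final training loss: {loss.item():.3f}")
```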
Machine learning (ML) algorithms impact virtually every aspect of human lives and have found use across diverse sectors including healthcare, finance, and education. Often, ML algorithms have been found to exacerbate societal biases present in datasets…
External link:
http://arxiv.org/abs/2405.12372
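One standard diagnostic behind bias audits like this is the demographic parity difference: the gap in positive-prediction rates across groups. A minimal sketch with synthetic predictions and group labels:

```python
# Demographic parity difference on synthetic predictions: the gap in
# P(yhat = 1) between two groups. A value near 0 suggests parity.
import numpy as np

preds = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])  # model decisions
group = np.array(list("aaaaabbbbb"))               # protected attribute

rate_a = preds[group == "a"].mean()
rate_b = preds[group == "b"].mean()
print(f"P(yhat=1 | a) = {rate_a:.2f}, P(yhat=1 | b) = {rate_b:.2f}")
print(f"demographic parity difference = {abs(rate_a - rate_b):.2f}")
```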
Author:
Kong, Kezhi, Zhang, Jiani, Shen, Zhengyuan, Srinivasan, Balasubramaniam, Lei, Chuan, Faloutsos, Christos, Rangwala, Huzefa, Karypis, George
Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge they were not trained on. One solution is to use a retriever that fetches relevant…
External link:
http://arxiv.org/abs/2402.14361
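The retrieval step this entry refers to can be sketched generically: embed the documents and the query, rank by cosine similarity, and prepend the top hits to the prompt. The embedding function below is a hash-based stand-in, not a real encoder.

```python
# Toy retrieval-augmented prompting: rank documents by cosine similarity
# to the query and prepend the best matches. embed() is a placeholder.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: a real system would call an encoder model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(16)
    return v / np.linalg.norm(v)

docs = [
    "GraphStorm scales GNN training to massive graphs.",
    "Diffusion models can synthesize tabular data.",
    "Peptides differ structurally from longer proteins.",
]
doc_vecs = np.stack([embed(d) for d in docs])

query = "How are GNNs trained at scale?"
scores = doc_vecs @ embed(query)        # cosine similarity (unit vectors)
top = np.argsort(scores)[::-1][:2]
prompt = ("Context:\n" + "\n".join(docs[i] for i in top)
          + f"\n\nQuestion: {query}")
print(prompt)                           # send this string to the LLM
```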
Author:
Mavromatis, Costas, Srinivasan, Balasubramaniam, Shen, Zhengyuan, Zhang, Jiani, Rangwala, Huzefa, Faloutsos, Christos, Karypis, George
Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is efficient because it requires no parameter updates to the trained LLM, only a few annotated examples as input. In this work, we investigate…
External link:
http://arxiv.org/abs/2310.20046
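In-context learning itself needs no parameter updates, only prompt assembly. A minimal sketch with an invented sentiment task:

```python
# In-context learning as prompt construction: a few annotated examples
# plus the new input, no gradient updates to the model. Task is invented.
examples = [
    ("The movie was wonderful.", "positive"),
    ("I want my money back.", "negative"),
]
new_input = "The plot dragged, but the acting was great."

prompt = "Label the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {new_input}\nSentiment:"
print(prompt)   # pass to any instruction-following LLM for completion
```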
Author:
Zhang, Jiani, Shen, Zhengyuan, Srinivasan, Balasubramaniam, Wang, Shen, Rangwala, Huzefa, Karypis, George
Recent advances in large language models have revolutionized many sectors, including the database industry. One common challenge when dealing with large volumes of tabular data is the pervasive use of abbreviated column names, which can negatively impact…
External link:
http://arxiv.org/abs/2310.13196
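A hedged sketch of the task this entry describes: given abbreviated headers plus a few sample rows for context, assemble a prompt asking an LLM for the full column names. The table and the expected answer format are invented, and the LLM call itself is left abstract.

```python
# Build a column-name expansion prompt from abbreviated headers and
# sample rows. The DataFrame and prompt wording are illustrative only.
import pandas as pd

df = pd.DataFrame({
    "cust_nm": ["Ann", "Bo"],
    "acct_bal": [120.5, 88.0],
    "txn_dt": ["2024-01-03", "2024-01-05"],
})

prompt = (
    "Expand each abbreviated column name into a descriptive full name.\n"
    f"Columns: {list(df.columns)}\n"
    f"Sample rows: {df.head(2).to_dict(orient='records')}\n"
    "Answer as 'abbreviation -> full name', one per line."
)
print(prompt)
# A plausible (model-dependent) answer shape:
#   cust_nm -> customer_name
#   acct_bal -> account_balance
#   txn_dt -> transaction_date
```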
Author:
Zhang, Hengrui, Zhang, Jiani, Srinivasan, Balasubramaniam, Shen, Zhengyuan, Qin, Xiao, Faloutsos, Christos, Rangwala, Huzefa, Karypis, George
Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to its intricately varied distributions and blend of data types. This paper…
External link:
http://arxiv.org/abs/2310.09656
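For the numerical columns alone, the diffusion machinery reduces to the standard Gaussian forward process q(x_t | x_0) = N(sqrt(ā_t) x_0, (1 − ā_t) I). A minimal sketch of that closed-form noising step; the mixed-type handling the entry highlights is omitted.

```python
# Closed-form Gaussian forward diffusion on numeric columns only.
# Mixed-type (categorical + numeric) handling is the hard part and is
# deliberately omitted from this sketch.
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 3))       # 4 rows, 3 numeric columns

T = 100
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)    # cumulative signal retention

def q_sample(x0: np.ndarray, t: int) -> np.ndarray:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * noise

print(q_sample(x0, 10).std(), q_sample(x0, T - 1).std())  # noise grows with t
```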
Author:
Wang, Zifeng, Wang, Zichen, Srinivasan, Balasubramaniam, Ioannidis, Vassilis N., Rangwala, Huzefa, Anubhai, Rishita
Foundation models (FMs) can leverage large volumes of unlabeled data to achieve superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained…
External link:
http://arxiv.org/abs/2310.03320
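One common way to couple two unimodal encoders into a multimodal model is a symmetric contrastive (CLIP-style) loss over matched pairs. A minimal sketch with random stand-in embeddings; this illustrates the general mechanism, not the paper's architecture.

```python
# Symmetric contrastive (InfoNCE-style) loss aligning two modalities.
# seq_emb / txt_emb stand in for, e.g., protein and text encoder outputs.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, dim = 8, 32
seq_emb = F.normalize(torch.randn(batch, dim), dim=-1)
txt_emb = F.normalize(torch.randn(batch, dim), dim=-1)

logits = seq_emb @ txt_emb.T / 0.07     # similarities / temperature
targets = torch.arange(batch)           # i-th sequence matches i-th text
loss = (F.cross_entropy(logits, targets)
        + F.cross_entropy(logits.T, targets)) / 2
print(loss.item())
```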
Individual-level data (microdata) that characterizes a population is essential for studying many real-world problems. However, acquiring such data is not straightforward due to cost and privacy constraints, and access is often limited to aggregated…
External link:
http://arxiv.org/abs/2212.05975
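A classic way to estimate a joint table consistent with published aggregates is iterative proportional fitting (IPF). A minimal sketch with invented marginals; this is one standard baseline for the problem, not necessarily the paper's method.

```python
# Iterative proportional fitting: rescale rows and columns in turn until
# the table matches both sets of published totals. Marginals are invented.
import numpy as np

row_totals = np.array([60.0, 40.0])        # e.g. counts by age group
col_totals = np.array([30.0, 50.0, 20.0])  # e.g. counts by income band

table = np.ones((2, 3))                    # uninformative seed
for _ in range(50):
    table *= (row_totals / table.sum(axis=1))[:, None]
    table *= (col_totals / table.sum(axis=0))[None, :]

print(table.round(2))   # row/column sums now match both marginals
```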
Representation learning for proteins has primarily focused on the global understanding of protein sequences regardless of their length. However, shorter proteins (known as peptides) take on distinct structures and functions compared to their longer counterparts…
External link:
http://arxiv.org/abs/2211.06428
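A simple classical featurization for short sequences like peptides is k-mer counting, which treats a sequence as a bag of overlapping length-k substrings. A small generic sketch, not the paper's representation method:

```python
# k-mer counts for an amino-acid sequence: overlapping substrings of
# length k as a bag-of-features. Sequence is a toy example.
from collections import Counter

def kmer_counts(seq: str, k: int = 2) -> Counter:
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

peptide = "GLFDIVKKV"   # toy peptide
print(kmer_counts(peptide))
```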