Showing 1 - 10 of 33 for search: '"Shiralkar, Prashant"'
Author:
Du, Wei, Advani, Laksh, Gambhir, Yashmeet, Perry, Daniel J, Shiralkar, Prashant, Xing, Zhengzheng, Colak, Aaron
Large language models (LLMs) have demonstrated significant capability to generalize across a large number of NLP tasks. For industry applications, it is imperative to assess the performance of the LLM on unlabeled production data from time to time to
External link:
http://arxiv.org/abs/2309.05619
Recommending a diversity of product types (PTs) is important for a good shopping experience when customers are looking for products around their high-level shopping interests (SIs) such as hiking. However, the SI-PT connection is typically absent in
External link:
http://arxiv.org/abs/2305.14549
Extracting structured information from HTML documents is a long-studied problem with a broad range of applications, including knowledge base construction, faceted search, and personalized recommendation. Prior works rely on a few human-labeled web pa
External link:
http://arxiv.org/abs/2208.13086
HTML documents are an important medium for disseminating information on the Web for human consumption. An HTML document presents information in multiple text formats including unstructured text, structured key-value pairs, and tables. Effective repre
External link:
http://arxiv.org/abs/2201.10608
Author:
Wang, Daheng, Shiralkar, Prashant, Lockard, Colin, Huang, Binxuan, Dong, Xin Luna, Jiang, Meng
Information extraction from semi-structured webpages provides valuable long-tailed facts for augmenting knowledge graph. Relational Web tables are a critical component containing additional entities and attributes of rich and diverse knowledge. Howev
External link:
http://arxiv.org/abs/2102.09460
In many documents, such as semi-structured webpages, textual semantics are augmented with additional information conveyed using visual elements including layout, font size, and color. Prior work on information extraction from semi-structured websites
External link:
http://arxiv.org/abs/2005.07105
Author:
Lin, Bill Yuchen, Lee, Dong-Ho, Shen, Ming, Moreno, Ryan, Huang, Xiao, Shiralkar, Prashant, Ren, Xiang
Published in:
Proc. of ACL 2020, pages 8503-8511
Training neural models for named entity recognition (NER) in a new domain often requires additional human annotations (e.g., tens of thousands of labeled instances) that are usually expensive and time-consuming to collect. Thus, a crucial research qu
External link:
http://arxiv.org/abs/2004.07493
The web contains countless semi-structured websites, which can be a rich source of information for populating knowledge bases. Existing methods for extracting relations from the DOM trees of semi-structured webpages can achieve high precision and rec
External link:
http://arxiv.org/abs/1804.04635
Author:
Shiralkar, Prashant, Avram, Mihai, Ciampaglia, Giovanni Luca, Menczer, Filippo, Flammini, Alessandro
We present RelSifter, a supervised learning approach to the problem of assigning relevance scores to triples expressing type-like relations such as 'profession' and 'nationality.' To provide additional contextual information about individuals and rel
External link:
http://arxiv.org/abs/1712.08674
The volume and velocity of information that gets generated online limits current journalistic practices to fact-check claims at the same rate. Computational approaches for fact checking may be the key to help mitigate the risks of massive misinformat
External link:
http://arxiv.org/abs/1708.07239