Designing Future Warehouse-Scale Computers for Sirius, an End-to-End Voice and Vision Personal Assistant

Authors:	Yunqi Zhang, Michael A. Laurenzano, Arjun Khurana, Johann Hauswald, Lingjia Tang, Ronald G. Dreslinski, Jason Mars, Austin Rovinski, Yiping Kang, Hailong Yang, Trevor Mudge, Cheng Li, Vinicius Petrucci
Publication year:	2016
Subject:	General Computer Science; Computer science; Suite; Workload; 02 engineering and technology; Total cost of ownership; computer.software_genre; 020202 computer hardware & architecture; End-to-end principle; Server; Scalability; 0202 electrical engineering, electronic engineering, information engineering; Operating system; 020201 artificial intelligence & image processing; Field-programmable gate array; computer; Throughput (business)
Source:	ACM Transactions on Computer Systems. 34:1-32
ISSN:	1557-7333 0734-2071
Description:	As user demand scales for intelligent personal assistants (IPAs) such as Apple’s Siri, Google’s Google Now, and Microsoft’s Cortana, we are approaching the computational limits of current datacenter (DC) architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this article, we present the design of Sirius, an open end-to-end IPA Web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of eight benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 8.5× and 15×, respectively. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of DCs by 2.3× and 1.3×, respectively.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::2d7892d0f86f0c73ac6e56f1f6175e56 https://doi.org/10.1145/2870631