Popis: |
Spatial-temporal locality has been observed in various scenarios for conversational services with either voice or text requests. Given the current cloud-based processing mechanism, integrating such a service with caching is a promising way to improve responsiveness, reduce in-network transmission, and avoid computational redundancy. Going beyond precise redundancy and fuzzy redundancy, semantic redundancy adapts to the diversity in command expression and is considered a practical solution for conversational services. In this paper, we introduce a hierarchical cache design inspired by semantic redundancy for conversational services. We propose a scalable edge system, ChatCache, that incorporates the hierarchical cache design and serves single or multiple users. We discuss cache efficiency under different similarity match policies, and evaluate the responsiveness and scalability of ChatCache on heterogeneous edge platforms. On most of the evaluated platforms, ChatCache reduces user-perceived latency by more than 91.7% for voice requests and by more than 81.6% for text requests. The throughput of ChatCache reaches 42.6 tps for voice requests and 64.4 tps for text requests, which is comparable to mainstream cloud cognitive services. These evaluation results show the capability of ChatCache to reduce user-perceived latency and computational redundancy with high response accuracy for conversational services. |