Abstrakt: |
This article presents a novel parallel comparison-free hardware sorting architecture that sorts $N$ , $n$ -bit elements consuming linear worst-case sorting latency of $\mathit {O(N)}$ clock-cycles utilizing $k$ -parallel clusters. A parallel cluster contains a number of identical blocks utilizing a few fundamental logic components. The architecture achieves a speed-up of ${}({n}/[{\lceil {({n}/{k})}\rceil +k}])$ over nonparallel architectures. The proposed sorting architecture identifies the largest element in every cycle with smaller clock periods compared to state-of-the-art comparison-free approaches and does not require any kind of preprocessing of the input data set, unlike most of the existing hardware sorters. The experimental result shows the degree of parallelism $(k)$ for 16, 32, and 64-bit elements are, respectively, {4}, {4, 5, 6, 7, 8}, and {8} to achieve maximum system throughput. This sorter achieves a throughput of ≈194 Million Elements per second (MEps) for $k=6$ and, it is ≈142 MEps for $k=8$ to sort 256 elements of 32 and 64-bit, respectively, whereas it is ≈159 MEps for $k = 6$ to sort 512 elements of 40 bit. |