Popis: |
Deep learning has become the algorithm of choice in many applications, such as face recognition, object detection and speech recognition, because of its superior accuracy. Large models with a very large number of parameters were developed to obtain higher accuracy, but the accuracy gains eventually showed diminishing returns while training and deployment costs grew very large. Consequently, greater attention is now being paid to the efficiency of neural networks. Low power consumption is particularly important for always-on applications. Examples include datacenters, cellular base stations, and battery-powered devices such as implantable devices, wearables, cell phones and UAVs. Reducing the power consumed by these devices lowers their energy cost, extends their battery life or shrinks their form factor, thereby improving the acceptance and adoption of the device. Neural networks make up a significant part of the total workload in datacenters and in IoT devices with smart functions. Base stations can also employ neural networks to speed up convergence in channel estimation. Executing neural networks efficiently on always-on devices therefore lowers overall power dissipation. Algorithm-only solutions target CPUs or GPUs as the platform and tend to focus on reducing the number of computing operations. Hardware-only solutions tend to focus on programmability, low-voltage operation, standby power reduction and on-chip data movement. Neither approach takes advantage of jointly optimizing the algorithm and the hardware for the target application. This thesis contributes to improving the efficiency of neural networks on always-on devices through both algorithmic and hardware interventions. It presents algorithm-hardware co-designs that achieve greater power reduction for a smart IoT device, a datacenter and a small-cell base station. 
These designs reduce power through a combination of suitable neural network algorithms and architectures, simpler operations, and fewer off-chip memory accesses. |