Popis: |
Konvolucijske nevronske mreže dosegajo izjemne rezultate na področju računalniškega vida. Osrednja operacija teh mrež je konvolucija z jedrom majhne in nespremenljive velikosti. V praksi je zato standardni prijem za povečavo dovzetnega polja združevanje sosednjih slikovnih točk, kar pa za mnoge probleme v računalniškem vidu ne zagotavlja zadovoljive izhodne ločljivosti. Problem naslavlja t. i. dilatacija, ki enote iz konvolucijskega jedra razširi na širše območje in s tem poveča dovzetno polje. Velikost razširitve je ročno nastavljena in se tekom učenja ne spreminja, kar lahko predstavlja težavo, saj v splošnem njene optimalne vrednosti ne poznamo. Učljivo velikost dovzetnega polja ima nedavno predlagana metoda, pri kateri je konvolucijsko jedro sestavljeno iz premičnih združevalnih enot (angl. displaced aggregation units, DAU). Vsako jedro ima svoj nabor parametrov in s tem svojo velikost dovzetnega polja. V tej diplomski nalogi naslavljamo vprašanje, ali je mogoče reducirati prostostne stopnje modela brez izgube natančnosti. Predlagamo tri načine reduciranja prostostnih stopenj z deljenjem odmikov na vhodih in izhodih. Implementiramo prehod naprej in vzvratni prehod za te tri različice, jih vgradimo v arhitekture konvolucijskih nevronskih mrež različnih velikosti in evalviramo na problemu klasifikacije slik v 10 razredov. Vse različice imajo za več kot 50 % manj parametrov kot originalni sloj DAU. Eksperimentalni rezultati kažejo, da ima model, ki ima odmike neodvisne od izhoda, znatno manjšo računsko zahtevnost kot originalni sloj DAU, pri čemer klasifikacijska točnost pade za manj kot 2 %.

Convolutional neural networks have demonstrated excellent performance at computer vision tasks. The central operation of these networks is convolution with a small, fixed-size kernel. In practice, the standard approach for increasing the receptive field is therefore pooling of adjacent pixels, which for many computer vision tasks does not provide sufficient output resolution.
The problem is addressed by so-called dilation, which spreads the units of the convolution kernel over a wider area, thereby increasing the receptive field. The dilation factor is set manually and does not change during learning, which can be a problem, since we generally do not know its optimal value. A recently proposed method has a learnable receptive field size: its convolution kernel consists of displaced aggregation units (DAUs). Each kernel has its own set of parameters and thus its own receptive field size. In this thesis we address the question of whether the model's degrees of freedom can be reduced without loss of accuracy. We propose three ways of reducing the degrees of freedom by sharing displacements at the inputs and outputs. We implement the forward and backward pass for these three variants, embed them in convolutional neural network architectures of different sizes, and evaluate them on the problem of classifying images into 10 classes. All variants have more than 50% fewer parameters than the original DAU layer. The experimental results show that the model with output-independent displacements has significantly lower computational complexity than the original DAU layer, while its classification accuracy drops by less than 2%. |
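The displacement-sharing idea can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the thesis implementation: integer offsets and `np.roll` stand in for the sub-pixel, interpolated displacements of real DAUs, and all array shapes are illustrative. The point is the parameter count: in the output-independent variant, one displacement table of shape `(C_in, K, 2)` is shared by every output channel, whereas per-output displacements would need shape `(C_out, C_in, K, 2)`.

```python
import numpy as np

def dau_conv_shared(x, weights, displacements):
    """Sketch of a DAU-style layer with output-independent displacements.

    x             : input feature map, shape (C_in, H, W)
    weights       : per-unit weights, shape (C_out, C_in, K)
    displacements : integer offsets (dy, dx) per input channel and unit,
                    shape (C_in, K, 2) -- shared by ALL output channels,
                    which is what removes the C_out factor from the count.
    (Integer shifts replace the sub-pixel displacements of real DAUs.)
    """
    c_in, h, w = x.shape
    c_out, _, k = weights.shape
    y = np.zeros((c_out, h, w))
    for i in range(c_in):
        for u in range(k):
            dy, dx = displacements[i, u]
            # displace the input channel once, then reuse it for every output
            shifted = np.roll(np.roll(x[i], dy, axis=0), dx, axis=1)
            for o in range(c_out):
                y[o] += weights[o, i, u] * shifted
    return y
```

With `C_out = 3`, `C_in = 2`, `K = 2`, the shared table has 2·2·2 = 8 displacement parameters instead of 3·2·2·2 = 24, i.e. the displacement count no longer grows with the number of output channels.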