Author: |
Tadonki, Claude; Haggui, Olfa; Lacassagne, Lionel |
Contributors: |
Centre de Recherche en Informatique (CRI), MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Université Paris sciences et lettres (PSL), National Engineering School of Sousse / Ecole Nationale d'Ingénieurs de Sousse (ENISo), Ecole Nationale d'Ingénieurs de Sousse (ENISo), Architecture et Logiciels pour Systèmes Embarqués sur Puce (ALSOC), Laboratoire d'Informatique de Paris 6 (LIP6), Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS), MINES ParisTech - PSL Research University, Centre de recherche en informatique - MINES ParisTech - PSL Research University, LIP6, Sorbonne Université, CNRS, UMR 7606, Mines Paris - PSL (École nationale supérieure des mines de Paris) |
Language: |
English |
Year of publication: |
2017 |
Subject: |
|
Source: |
[Research Report] E-424, MINES ParisTech-PSL Research University; Centre de recherche en informatique-MINES ParisTech-PSL Research University; LIP6, Sorbonne Université, CNRS, UMR 7606. 2017 |
Description: |
Corner detection is a key kernel in many image processing procedures, including pattern recognition and motion detection. The latter, for instance, relies mainly on corner points on which spatial analyses are performed, typically on (probably live) videos or temporal flows of images. Highly efficient corner detection is therefore essential to meet the real-time requirements of the associated applications. In this paper, we consider the corner detection algorithm proposed by Harris, whose main workflow is a composition of basic operators represented by their approximations using 3 × 3 matrices. The corresponding data access patterns follow a stencil model, which is known to require careful memory organization and management. Cache misses and other hindering factors on NUMA architectures must be addressed carefully in order to reach an efficient and scalable implementation. In addition, with increasingly wide vector registers, an efficient SIMD version should be designed and explicitly implemented. We study a direct and explicit implementation of common and novel optimization strategies, and provide a NUMA-aware parallelization. Experimental results on a dual-socket Intel Broadwell-E/EP show noticeably good scalability. |
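The operator chain described in the abstract (gradients, products, smoothing, corner response, each built from 3 × 3 stencils) can be illustrated with a minimal, unoptimized sketch. The kernel choices (Sobel gradients, binomial smoothing), the constant `k = 0.04`, and the function names `stencil3x3`, `multiply`, and `harris_response` are assumptions made for illustration. They are not taken from the report, and the sketch includes none of its SIMD, cache, or NUMA optimizations.

```python
# Minimal sketch of a Harris operator chain using 3x3 stencils (pure Python).
# Assumptions: Sobel gradient kernels, binomial smoothing, and k = 0.04.
# All names here are hypothetical and chosen for illustration only.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
BINOMIAL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # unnormalized; scaled by 1/16


def stencil3x3(img, kernel, scale=1.0):
    """Apply a 3x3 stencil to interior pixels; border pixels stay at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    acc += kernel[di + 1][dj + 1] * img[i + di][j + dj]
            out[i][j] = acc * scale
    return out


def multiply(a, b):
    """Element-wise product of two images of equal size."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def harris_response(img, k=0.04):
    """Return the per-pixel response det(M) - k * trace(M)^2."""
    ix = stencil3x3(img, SOBEL_X)           # horizontal gradient
    iy = stencil3x3(img, SOBEL_Y)           # vertical gradient
    sxx = stencil3x3(multiply(ix, ix), BINOMIAL, 1 / 16)
    syy = stencil3x3(multiply(iy, iy), BINOMIAL, 1 / 16)
    sxy = stencil3x3(multiply(ix, iy), BINOMIAL, 1 / 16)
    h, w = len(img), len(img[0])
    return [[sxx[i][j] * syy[i][j] - sxy[i][j] ** 2
             - k * (sxx[i][j] + syy[i][j]) ** 2
             for j in range(w)]
            for i in range(h)]
```

On a 16 × 16 test image whose lower-right quadrant is bright (pixels with row and column indices of at least 8 set to 1.0), this sketch gives a positive response at the corner pixel (8, 8), a negative response on the straight edge at (8, 12), and zero in the flat region at (3, 3). Each output pixel reads a 3 × 3 neighbourhood of its input, which is the stencil access pattern the abstract refers to. Chaining several such passes is what makes memory layout and cache behaviour matter in an optimized version.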
Database: |
OpenAIRE |
External link: |
|