Understanding 3D vision as a policy network (preprint)

Author: Andrew Glennerster
Year of publication: 2022
DOI: 10.31219/osf.io/u243p
Description: A 'policy network' is a term used in reinforcement learning to describe the set of actions that are generated by an agent depending on its current state. This is an unusual starting point for describing 3D vision, but a policy network can serve as a useful representation both for the 3D layout of a scene and for the location of the observer within it. It avoids 3D reconstruction of the type used in computer vision but is similar to recent representations for navigation generated through reinforcement learning. A policy network for saccades (pure rotations of the camera/eye) is a logical starting point for understanding (i) an ego-centric representation of space (e.g. Marr's 2.5D sketch) and (ii) a hierarchical, compositional representation for navigation. The neural implementation of policy networks in general is uncontroversial, whereas plausible implementations of 3D coordinate transformations in the brain have yet to be developed. Hence, if the representation underlying 3D vision can be described as a policy network (in which the actions are either saccades or head translations), this would be a significant step towards a neurally plausible model of 3D vision.
Database: OpenAIRE
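
The description frames a policy as nothing more than a mapping from the agent's current state to an action (a saccade or a head translation), with no 3D reconstruction or world coordinate frame involved. The sketch below is a minimal illustration of that framing only; it is not from the preprint, and all names, thresholds, and feature encodings are hypothetical placeholders.

```python
# Illustrative sketch (not the author's model): a "policy" in the
# reinforcement-learning sense maps the agent's current state to an action.
# Here the state is a stand-in egocentric visual feature vector, and the
# actions are either a saccade (pure rotation of the eye/camera) or a small
# head translation. All numbers and field names are assumptions for
# illustration only.

from dataclasses import dataclass
from typing import Tuple, Union
import random


@dataclass
class Saccade:
    azimuth_deg: float    # rotation about the vertical axis
    elevation_deg: float  # rotation about the horizontal axis


@dataclass
class HeadTranslation:
    dx_m: float  # sideways step in metres
    dz_m: float  # forward/backward step in metres


Action = Union[Saccade, HeadTranslation]


def policy(state: Tuple[float, ...]) -> Action:
    """Toy policy: saccade toward the most salient feature, or take a small
    exploratory head translation when nothing is salient enough.

    The point is only the state -> action mapping; no 3D reconstruction or
    allocentric coordinate transformation is computed."""
    salience = max(state)
    if salience > 0.5:
        # Orient the eye toward the salient feature (a pure rotation).
        target_index = state.index(salience)
        return Saccade(
            azimuth_deg=10.0 * (target_index - len(state) / 2),
            elevation_deg=0.0,
        )
    # Otherwise take a small forward step of the head.
    return HeadTranslation(dx_m=0.0, dz_m=0.1)


if __name__ == "__main__":
    random.seed(0)
    state = tuple(random.random() for _ in range(8))
    print(policy(state))
```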