Managing interlingual references - a type generic approach

Author: Menge, Sebastian
Contributors: Doberkat, Ernst-Erich; Jürjens, Jan
Language: English
Year of publication: 2011
Subject:
DOI: 10.17877/de290r-3026
Description: This thesis presents a framework that makes explicit the ubiquitous low-level references between arbitrary constructs in source code written in arbitrary programming languages. While the problems caused by these implicit interlingual references are well known to practitioners, no adequate tool-based solution exists to date. The reason is that such a tool must be able to analyze source code in many languages, and the choice of these languages depends on the specific requirements of a project: the tool has to be parametric in the languages themselves. The concept of datatype-generic programming, developed in the functional programming community in recent years, builds on ideas from category theory, and working implementations exist, especially in the Haskell community. This approach makes it possible to write type-safe software engineering tools that can be reused for (i.e. parametrized by) many languages. After presenting the underlying machinery and its application to real-life software engineering, we define these implicit interlingual references as links between specific subtrees in abstract syntax trees of possibly different languages. Consistency of such a pair is then defined by a function that maps two arbitrary subterms to a Boolean value. Based on this definition, we develop a framework for managing such references, i.e. we can define, check, and adapt them in a type-safe way. Finally, we perform a case study that demonstrates that our approach works for real-life languages and projects. We highlight the contributions of this work in the area of tension between theory and application: a recurring theme in scientific software engineering is abstraction, that is, we seek solutions that are independent of application-specific context. But software engineering is about engineering, so there are real-life problems in real-life applications that have to be solved.
That means we have to identify a practical problem, abstract from everything unnecessary, find a solution, and bring that solution back into practice. This is quite a long way, and the last step in particular is often overlooked. In our case, the practical problem is well known among practitioners. At the same time, the abstract theories of programming languages and their relations to the even more abstract realms of algebra and category theory have been well known to computer scientists and mathematicians for a long time (Lambek's fixed-point result dates back to 1968). In this work, we start with the problem of inconsistencies between artifacts of different kinds. Because the underlying references are interlingual, we need a consistent formal framework to formulate the problem. We express the underlying artifacts as terms typed by some algebraic datatype. This is implemented in Haskell, which offers both algebraic datatypes and a large number of parsers and general infrastructure. To reason about references between terms of arbitrary algebraic datatypes, we need an accessible specification of the signatures themselves. Formally, this specification of specifications can be expressed using category theory: the notion of a functor that specifies the structure of a datatype is central in this respect, and we find the corresponding implementation in Haskell under the term "datatype generic programming". We use this as the technical basis of the prototype. In summary, the contribution of this thesis is not only the development of a framework that solves a known problem in a quite complicated way (though we are not aware of other, more promising solutions) but also an example of the complete path from a practical problem to the deep theoretical formalization and back again to a practical solution.
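The ideas named in the description (a functor that specifies the structure of a datatype, terms as its fixed point, and consistency as a Boolean-valued function on two subterms) can be illustrated with a minimal Haskell sketch. It is not taken from the thesis: the two toy languages (SqlF, CodeF), the helper names, and the example consistency rule are all hypothetical and chosen only for illustration.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}

-- Fixed point of a pattern functor: a term is one layer of 'f'
-- whose children are again terms. 'In' and 'out' are mutually inverse.
newtype Fix f = In { out :: f (Fix f) }

-- Generic fold, written once and reusable for every pattern functor.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . out

-- A language-independent tool: counts the nodes of any syntax tree.
size :: (Functor f, Foldable f) => Fix f -> Int
size = cata (\layer -> 1 + sum layer)

-- Hypothetical toy language 1: a database schema.
data SqlF r = Table String [r] | Column String
  deriving (Functor, Foldable)

-- Hypothetical toy language 2: code that reads fields by name.
data CodeF r = Block [r] | ReadField String
  deriving (Functor, Foldable)

-- All column names declared in a schema term.
columnNames :: Fix SqlF -> [String]
columnNames = cata alg
  where
    alg (Table _ cols) = concat cols
    alg (Column name)  = [name]

-- All field names read in a code term.
fieldReads :: Fix CodeF -> [String]
fieldReads = cata alg
  where
    alg (Block stmts)    = concat stmts
    alg (ReadField name) = [name]

-- Consistency of a linked pair: a function from two subterms,
-- possibly of different languages, to a Boolean value.
type Consistency f g = Fix f -> Fix g -> Bool

-- Example rule (an assumption): every field read must name a column.
readsExistingColumns :: Consistency SqlF CodeF
readsExistingColumns schema code =
  all (`elem` columnNames schema) (fieldReads code)

schema :: Fix SqlF
schema = In (Table "users" [In (Column "id"), In (Column "email")])

goodCode, badCode :: Fix CodeF
goodCode = In (Block [In (ReadField "email")])
badCode  = In (Block [In (ReadField "e_mail")])
```

Loaded into GHCi, readsExistingColumns schema goodCode evaluates to True and readsExistingColumns schema badCode to False, while size works unchanged on terms of either language. The sketch only shows why parametrizing over the functor lets one tool serve several languages; the framework described in the thesis may be organized quite differently.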
Database: OpenAIRE