Popis: |
At XML conferences, most discussion of scriptural markup revolves around formats such as OSIS and TEI, which are not widely used in the Bible translation community. The Bible translation community cares deeply about its translation data and has developed a backslash-delimited markup language called USFM that is well-suited for marking up Scripture for Bible translation and publishing. It has also developed an XML-based equivalent called USX that is suitable for electronic publication. Neither of these languages is closely related to TEI or its conventions. Although USFM is well-suited for representing Scripture, it is not well-suited for representing lexicons, USFM handbooks, commentaries, translation handbooks, critical apparatus, and many other kinds of resources that translators use as they work. This has been a bottleneck for making some kinds of resources available to translators. Twenty years ago, a team that included well-known XML professionals designed OSIS to meet the needs of this community, but despite its technical merits, the translation community continued to use USFM and USX instead. This paper explores why this community chose USFM and USX, how XML can be leveraged to provide reference materials to working translators, and the reference systems needed to relate resources to each other.
In the course of this paper, we explore a wide variety of formats designed by different communities with different tastes for different purposes, including USFM, USX, XML, JSON, YAML, and CSV/TSV. All of these are text-based formats that support Unicode and allow data to be clearly labeled. Life would be simpler if all data were created in the same format, but as long as common reference systems make the relationships among data clear, this variety of formats is not particularly problematic: the structure and relationships in the data matter more than the physical format.
This paper discusses these issues in the context of Paratext, software actively used by over 10,000 working Bible translators in more than 2,900 languages. It explains the value of USFM, but also the problems caused by its lack of extensibility and the ways Paratext is using XML to overcome them. It also gives real-world examples of mediating among different formats to create resources that work well together, while respecting the right of data creators to use formats that work for them. The same issues arise outside Paratext, in systems that query or process the same kinds of data in environments that do not use USFM. The reference systems that enable XML inside Paratext can also be used to integrate XML formats outside Paratext and to create new resources that work in a wide variety of systems. |
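The claim that a shared reference system matters more than the physical format can be illustrated with a small sketch. It assumes a simplified reference key of (book code, chapter, verse) written as strings like `JHN 3:16`, and it uses hypothetical sample data: a tiny USFM fragment (only the `\id`, `\c`, `\p`, and `\v` markers), a TSV file of notes, and a JSON lexicon. The function names and data layouts are illustrative assumptions, not Paratext's implementation, and the sketch ignores versification differences, verse ranges, and the rest of USFM.

```python
import csv
import io
import json
import re
from collections import defaultdict

# Hypothetical sample data in three different formats (illustrative only).
USFM = r"""\id JHN
\c 3
\p
\v 16 For God so loved the world, that he gave his only begotten Son...
\v 17 For God sent not his Son into the world to condemn the world...
"""
NOTES_TSV = "Reference\tNote\nJHN 3:16\tHypothetical translation note\n"
LEXICON_JSON = '{"entries": [{"lemma": "κόσμος", "gloss": "world", "refs": ["JHN 3:16", "JHN 3:17"]}]}'

# Assumed reference key: (book code, chapter, verse).
Ref = tuple[str, int, int]
REF_RE = re.compile(r"^([0-9A-Z]{3}) (\d+):(\d+)$")


def parse_ref(text: str) -> Ref:
    """Parse a reference string such as 'JHN 3:16' into a comparable key."""
    m = REF_RE.match(text.strip())
    if not m:
        raise ValueError(f"unrecognized reference: {text!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def verses_from_usfm(usfm: str) -> dict[Ref, str]:
    """Extract verse text from a minimal USFM subset; other markers are skipped."""
    book, chapter, verses = None, None, {}
    for line in usfm.splitlines():
        if line.startswith("\\id "):
            book = line.split()[1]
        elif line.startswith("\\c "):
            chapter = int(line.split()[1])
        elif line.startswith("\\v "):
            parts = line.split(" ", 2)  # ["\\v", number, text]
            text = parts[2] if len(parts) > 2 else ""
            verses[(book, chapter, int(parts[1]))] = text
    return verses


def notes_from_tsv(tsv: str) -> dict[Ref, list[str]]:
    """Index TSV notes by the parsed value of their Reference column."""
    notes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv), delimiter="\t"):
        notes[parse_ref(row["Reference"])].append(row["Note"])
    return notes


def lexicon_index(js: str) -> dict[Ref, list[str]]:
    """Invert a JSON lexicon so each verse lists the entries that cite it."""
    index = defaultdict(list)
    for entry in json.loads(js)["entries"]:
        for ref in entry["refs"]:
            index[parse_ref(ref)].append(f'{entry["lemma"]} ({entry["gloss"]})')
    return index


def merge(verses, notes, lexicon):
    """Join the three resources on the shared reference key."""
    return {
        ref: {"text": text, "notes": notes.get(ref, []), "lexicon": lexicon.get(ref, [])}
        for ref, text in verses.items()
    }


if __name__ == "__main__":
    combined = merge(verses_from_usfm(USFM), notes_from_tsv(NOTES_TSV), lexicon_index(LEXICON_JSON))
    for ref, data in sorted(combined.items()):
        print(ref, data)
```

Each source keeps its own format; only the reference strings need to agree. Supporting another format in this design means writing one more small indexer that produces the same key.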