Popis: |
Segmenting a text without relying on any auxiliary information embedded in it is an important task. This paper presents a technique for dividing a text into field-coherent passages. The method extracts field-associated terms from the text and measures how topics grow, shrink, and shift from sentence to sentence. We propose measures of topic continuity and topic transition and suggest how they can be used to find the boundaries between passages. Using a collection of 12,500 documents, we evaluate the method by average precision and recall on a Korean training set. |