Popis: |
Here we present a Hungarian corpus of spontaneous speech texts produced by patients with schizophrenia, schizoaffective disorder, or bipolar disorder, and by healthy controls. The recordings were made in a clinical environment during three different directed spontaneous speech tasks. The study involved 90 subjects, who produced 526 texts. The collected recordings were manually transcribed by our research group, and the transcripts were processed with a set of Natural Language Processing methods and tools. The final corpus consists of 158,386 tokens in total, excluding punctuation. During data processing, we also applied specific lexicons that enable the examination of linguistic intensification in mental disorders. The dataset can be used in several related research tasks, such as semantic-pragmatic analyses and the automatic discrimination of patients from controls using our linguistic features. |