Popis: |
In this paper, the process of creating a Dependency Treebank for tweetsin Urdu,a morphologically rich and less-resourced languageis described. The 500 Urdu tweets treebank iscreated by manually annotating the treebank withlemma, POS tags, morphological and syntacticrelations using the Universal Dependencies annotation scheme, adopted to the peculiarities of Urdu social media text. annotation process is evaluated through Inter-annotator agreement for dependency relations and total agreement of 94.5% and resultant weighted Kappa = 0.876was observed. The treebank is evaluated through 10-fold cross validation using Maltparserwith various feature settings. Results show average UAS score of 74%, LAS score of 62.9% and LA score of 69.8%. |