A Maximum Entropy Approach to Identifying Sentence Boundaries
Author: | Jeffrey C. Reynar, Adwait Ratnaparkhi |
---|---|
Language: | English |
Year of publication: | 1997 |
Subject: |
FOS: Computer and information sciences
Sentence boundary disambiguation; Computer Science - Computation and Language; Computation and Language (cs.CL); Computer science; Principle of maximum entropy; Speech recognition; Retraining; Boundary (topology); Simplicity; Artificial intelligence; Sentence; Natural language processing; ComputingMethodologies_PATTERNRECOGNITION |
Source: | ANLP |
Description: | We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than the performance of similar systems, but we emphasize the simplicity of retraining for new domains. Comment: 4 pages, uses aclap.sty and covingtn.sty |
Database: | OpenAIRE |
External link: |
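
The classification task described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the feature set, the toy corpus, and every function name below are assumptions made for illustration, and the weights are fitted by plain stochastic gradient ascent on a two-class logistic model (which has the same form as a binary maximum entropy model) rather than by whatever estimation procedure the paper uses.

```python
import math
import re

CANDIDATE = re.compile(r"[.?!]")  # the three candidate marks named in the abstract


def candidates(text):
    """Return the character offsets of every '.', '?' and '!' in text."""
    return [m.start() for m in CANDIDATE.finditer(text)]


def features(text, i):
    """Illustrative (assumed) contextual features for the candidate at offset i."""
    left = text[:i].split()
    prefix = left[-1] if left else ""      # token fragment before the mark
    right = text[i + 1:].split()
    nxt = right[0] if right else ""        # next whitespace-delimited token
    return [
        "bias",
        "punct=" + text[i],
        "prefix=" + prefix.lower(),
        "prefix_short" if len(prefix) <= 3 else "prefix_long",
        "next_cap" if nxt[:1].isupper() else "next_not_cap",
        "no_space_after" if text[i + 1:i + 2].strip() else "space_or_end_after",
    ]


def prob_boundary(w, feats):
    """P(boundary | features) under the two-class log-linear model."""
    z = sum(w.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=500, lr=0.2):
    """Fit feature weights by stochastic gradient ascent on the log-likelihood."""
    w = {}
    for _ in range(epochs):
        for feats, label in examples:
            err = label - prob_boundary(w, feats)
            for f in feats:
                w[f] = w.get(f, 0.0) + lr * err
    return w


def split_sentences(text, w, threshold=0.5):
    """Cut text after every candidate the model scores above the threshold."""
    out, start = [], 0
    for i in candidates(text):
        if prob_boundary(w, features(text, i)) > threshold:
            out.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        out.append(tail)
    return out


# Toy annotated corpus (made up): one label per candidate, 1 = sentence boundary.
TRAIN_TEXT = "Dr. Smith arrived. He paid $3.50 for tea. Was it good? Yes! Mr. Jones left."
TRAIN_LABELS = [0, 1, 0, 1, 1, 1, 0, 1]

EXAMPLES = [
    (features(TRAIN_TEXT, i), y)
    for i, y in zip(candidates(TRAIN_TEXT), TRAIN_LABELS)
]
MODEL = train(EXAMPLES)

for sentence in split_sentences(TRAIN_TEXT, MODEL):
    print(sentence)
```

The sketch only shows the shape of the approach: every candidate mark becomes one training example, and retraining for a new domain amounts to swapping in a different annotated corpus. With eight training examples it can do little more than reproduce its own annotations, so it says nothing about the accuracy reported for the actual system.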