Lexical analysis of 1337 speek

1/20/2024

Integration of Automatic Sentence Segmentation and Lexical Analysis of Ancient Chinese based on BiLSTM-CRF Model. Ning Cheng, Bin Li, Liming Xiao, Changwei Xu, Sijia Ge, Xingyue Hao, and Minxuan Feng. Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages, pages 52–58, Marseille, France, May 2020. European Language Resources Association (ELRA). Anthology ID: 2020.lt4hala-1.8. Bibkey: cheng-etal-2020-integration. Language: English.

Based on the experimental results of each test set, the F1-score of sentence segmentation reached 78.95, with an average increase of 3.5%; the F1-score of word segmentation reached 85.73%, with an average increase of 0.18%; and the F1-score of part-of-speech tagging reached 72.65, with an average increase of 0.35%. The results show that the integrated method improves the F1-scores of sentence segmentation, word segmentation, and part-of-speech tagging for ancient Chinese.
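For readers unfamiliar with the metric, the F1-scores reported above are the harmonic mean of precision and recall. The following is a minimal sketch in Python, assuming span-level scoring for word segmentation, where a predicted word counts as correct only if both of its boundaries match the gold standard. The function names and the toy data are illustrative and are not taken from the paper, whose exact evaluation script is not described here.

```python
def spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    out, start = set(), 0
    for w in words:
        out.add((start, start + len(w)))
        start += len(w)
    return out


def f1_score(gold_words, pred_words):
    """Span-level F1: a predicted word is correct only if both boundaries match."""
    gold, pred = spans(gold_words), spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example with placeholder characters (not real corpus data).
gold = ["ab", "c", "de"]
pred = ["ab", "cd", "e"]
score = f1_score(gold, pred)
```

In this toy case only one of the three predicted words matches a gold span, so precision, recall, and F1 are all one third.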

This paper designs and implements an integrated annotation system for sentence segmentation and lexical analysis. A BiLSTM-CRF neural network model is used to evaluate the system's generalization ability and the effect of different label levels on sentence segmentation and lexical analysis, using four test sets drawn from different historical periods.
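One way to picture an integrated annotation is as a single tag per character that carries word-boundary, part-of-speech, and sentence-boundary information at once, so that one sequence labeler predicts all three. The sketch below is an assumption for illustration only: the tag format (`B/I/E/S` word position, a POS label, and an `L`/`O` flag for sentence-final characters) and the function name `integrated_tags` are hypothetical, and the label set actually used in the paper may differ.

```python
def integrated_tags(sentences):
    """Build one combined tag per character.

    sentences: list of sentences, each a list of (word, pos) pairs.
    Returns (chars, tags) over the unpunctuated character stream.
    Tag format (hypothetical): <word position>-<POS>-<L if sentence-final else O>
    """
    chars, tags = [], []
    for sent in sentences:
        sent_tags = []
        for word, pos in sent:
            n = len(word)
            for i, ch in enumerate(word):
                if n == 1:
                    position = "S"  # single-character word
                elif i == 0:
                    position = "B"  # beginning of word
                elif i == n - 1:
                    position = "E"  # end of word
                else:
                    position = "I"  # inside word
                chars.append(ch)
                sent_tags.append(f"{position}-{pos}-O")
        if sent_tags:
            # Mark the last character of the sentence as a sentence boundary.
            sent_tags[-1] = sent_tags[-1][:-1] + "L"
        tags.extend(sent_tags)
    return chars, tags


# Toy input with placeholder characters and POS labels (not real corpus data).
toy = [[("ab", "n"), ("c", "v")], [("d", "n")]]
chars, tags = integrated_tags(toy)
```

Varying how much of this information is packed into the tag (word position only, position plus POS, or all three) gives different label levels for the same model to learn.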

Because many ancient books are not punctuated, tasks such as lexical analysis have to build on sentence segmentation. However, step-by-step processing is prone to propagating errors from one level to the next.

Abstract: The basic tasks of ancient Chinese information processing include automatic sentence segmentation, word segmentation, part-of-speech tagging, and named entity recognition.
