Studying the possibility of improving the function of a POS tagger system

Document Type : Research Paper

Author

Iranian Research Institute for Information Science and Technology (IranDoc)

Abstract

This study examines whether the performance of a POS tagger can be improved by disambiguating the POS tags of certain Persian noun and adjective homographs ending in <-ی>. The case study is HAZM. The POS tag disambiguation program is based on a set of context-sensitive rules extracted from the Bijan Khan corpus, the same corpus on which HAZM was trained. An overall evaluation of the disambiguation program indicates that if only those context-sensitive rules that contribute to better POS tagging are added to HAZM, its overall accuracy rises to 95.691%, which is 1.34% higher than the accuracy obtained when all rules are applied.
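As an illustration of the general idea, the following minimal Python sketch applies context-sensitive rules as a post-correction step over tagger output. It is not the study's program: the `ContextRule` class, the tag names (`N`, `ADJ`, `V`), the sentence-boundary markers, and the single example rule are all hypothetical, and the tagger output is written by hand rather than produced by HAZM.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Tagged = Tuple[str, str]  # (word, POS tag)

SUFFIX = "ی"  # the <-ی> ending targeted by the rules


@dataclass(frozen=True)
class ContextRule:
    """Hypothetical rule: retag a <-ی> word when its neighbouring tags match."""
    from_tag: str
    to_tag: str
    prev_tag: Optional[str] = None  # None matches any previous tag
    next_tag: Optional[str] = None  # None matches any next tag


def apply_rules(sentence: List[Tagged], rules: List[ContextRule]) -> List[Tagged]:
    """Return a copy of the sentence with the first matching rule applied per word."""
    corrected = list(sentence)
    for i, (word, tag) in enumerate(sentence):
        if not word.endswith(SUFFIX):
            continue  # only homograph candidates ending in <-ی> are considered
        # Context is read from the original tagger output, not the corrected copy.
        prev_tag = sentence[i - 1][1] if i > 0 else "<S>"
        next_tag = sentence[i + 1][1] if i + 1 < len(sentence) else "</S>"
        for rule in rules:
            if (rule.from_tag == tag
                    and rule.prev_tag in (None, prev_tag)
                    and rule.next_tag in (None, next_tag)):
                corrected[i] = (word, rule.to_tag)
                break
    return corrected


def accuracy(predicted: List[Tagged], gold: List[Tagged]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = sum(p[1] == g[1] for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical rule: a sentence-initial <-ی> word tagged ADJ and followed
# directly by a verb is retagged as a noun.
rules = [ContextRule(from_tag="ADJ", to_tag="N", prev_tag="<S>", next_tag="V")]

word = "ایران" + SUFFIX                 # noun/adjective homograph
tagged = [(word, "ADJ"), ("آمد", "V")]  # hand-written stand-in for tagger output
gold = [(word, "N"), ("آمد", "V")]

corrected = apply_rules(tagged, rules)
print(accuracy(tagged, gold), accuracy(corrected, gold))
```

Evaluating each rule separately with a function like `accuracy` is one way to keep only the rules that help, which mirrors the abstract's comparison between adding selected rules and applying all of them.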

Keywords

