ICT English-Persian comparable textual corpus

Dashtbani, Shokoofe; Mansoorizadeh, Muharram; Nassiri, Mohammad


Document Type: Research Paper


Abstract

In computational linguistics, a corpus is a collection of written texts or spoken material in machine-readable form, assembled to study linguistic structure and language change over time and to support natural language processing projects. In this paper, we focus on designing a bilingual corpus. The corpus is built automatically and consists of resources and documents in the ICT domain. We developed a software framework for building textual corpora that reduces construction cost and time; the software also provides corpus management capabilities. We also propose an alignment method for the Persian-English ICT corpus, with the goal of extracting corresponding sentences. The method uses a bilingual dictionary and artificial intelligence techniques to calculate a score representing the similarity between two sentences, and then automatically maps each pair of sentences across the two languages.
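The dictionary-based part of the alignment step can be illustrated with a minimal sketch. It is not the authors' method: the artificial intelligence techniques mentioned above are not reproduced, and the toy lexicon, the Dice-style score, the threshold of 0.3 and the greedy one-to-one matching are all assumptions chosen for illustration. The sketch scores every English-Persian sentence pair by how many English words have a dictionary translation in the Persian sentence, then keeps the highest-scoring pairs without reusing a sentence.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(sentence: str) -> List[str]:
    # Lowercase and split on word characters; Python's str regexes treat
    # Persian letters as \w, so the same tokenizer works for both languages.
    return re.findall(r"\w+", sentence.lower())


def similarity(en: str, fa: str, lexicon: Dict[str, Set[str]]) -> float:
    # Assumed score: count English tokens whose dictionary translations occur
    # in the Persian sentence, normalised Dice-style by both sentence lengths.
    en_tokens = tokenize(en)
    fa_tokens = tokenize(fa)
    if not en_tokens or not fa_tokens:
        return 0.0
    fa_set = set(fa_tokens)
    matched = sum(1 for w in en_tokens if lexicon.get(w, set()) & fa_set)
    return 2 * matched / (len(en_tokens) + len(fa_tokens))


def align(en_sents: List[str], fa_sents: List[str],
          lexicon: Dict[str, Set[str]],
          threshold: float = 0.3) -> List[Tuple[int, int, float]]:
    # Score every candidate pair, then greedily accept the best-scoring pairs
    # so that each sentence is used at most once (hypothetical policy).
    scored = [(similarity(e, f, lexicon), i, j)
              for i, e in enumerate(en_sents)
              for j, f in enumerate(fa_sents)]
    scored.sort(key=lambda t: t[0], reverse=True)
    used_en: Set[int] = set()
    used_fa: Set[int] = set()
    pairs: List[Tuple[int, int, float]] = []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining pairs are all below the threshold
        if i in used_en or j in used_fa:
            continue
        used_en.add(i)
        used_fa.add(j)
        pairs.append((i, j, score))
    return sorted(pairs)  # order by English sentence index


# Hypothetical toy lexicon: English word -> set of Persian translations.
lexicon = {
    "network": {"شبکه"},
    "computer": {"کامپیوتر", "رایانه"},
    "security": {"امنیت"},
    "data": {"داده"},
}
en_sentences = ["The computer network is fast.", "Data security matters."]
fa_sentences = ["امنیت داده مهم است", "شبکه کامپیوتر سریع است"]

pairs = align(en_sentences, fa_sentences, lexicon)
# English sentence 0 is paired with Persian sentence 1, and 1 with 0.
print(pairs)
```

A greedy matcher is used here only because it is short; a real system would likely add sentence-position or length cues, or solve the assignment optimally.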

