In computational linguistics, a corpus is a collection of written texts or spoken material in machine-readable form, assembled for studying linguistic structures and language change over time, and for use in natural language processing projects. In this paper, we focus on designing a bilingual corpus. The corpus is built automatically and consists of resources and documents in the ICT domain. We developed a software framework for building textual corpora that reduces construction cost and time; the software also provides corpus management capabilities. We also propose an alignment method for the Persian-English ICT corpus, with the goal of extracting corresponding sentences. The method uses a bilingual dictionary and artificial intelligence techniques to calculate a score representing the similarity between two sentences. We then automatically map each pair of sentences across the two languages.
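To make the dictionary-based scoring concrete, the following is a minimal sketch and not the method of the paper. The abstract does not specify the score or the artificial intelligence techniques used, so this sketch assumes a simple lexical-overlap score and a greedy best-match mapping. The function names, the toy lexicon, and the `threshold` parameter are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Sentence = List[str]  # a sentence is a list of tokens


def dictionary_overlap_score(
    src: Sentence, tgt: Sentence, lexicon: Dict[str, Set[str]]
) -> float:
    """Fraction of source tokens with a dictionary translation in the target.

    Assumed scoring function for illustration only.
    """
    if not src:
        return 0.0
    tgt_tokens = set(tgt)
    hits = sum(1 for tok in src if lexicon.get(tok, set()) & tgt_tokens)
    return hits / len(src)


def align_sentences(
    src_sents: List[Sentence],
    tgt_sents: List[Sentence],
    lexicon: Dict[str, Set[str]],
    threshold: float = 0.5,  # hypothetical cut-off for accepting a pair
) -> List[Tuple[int, int, float]]:
    """Greedily map each source sentence to its best-scoring target sentence."""
    pairs = []
    for i, src in enumerate(src_sents):
        scores = [dictionary_overlap_score(src, tgt, lexicon) for tgt in tgt_sents]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Toy Persian-to-English lexicon (illustrative entries only).
toy_lexicon = {
    "شبکه": {"network"},
    "رایانه": {"computer"},
    "داده": {"data"},
}
persian = [["شبکه", "رایانه"], ["داده"]]
english = [["the", "data", "is", "stored"], ["a", "computer", "network"]]

alignment = align_sentences(persian, english, toy_lexicon)
```

Here each Persian sentence is paired with the English sentence that contains the most dictionary translations of its tokens, and pairs scoring below the threshold are dropped. A real system would also need tokenization, morphological normalization, and a way to prevent two source sentences from claiming the same target.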
Dashtbani, S., Mansoorizadeh, M., and Nassiri, M. (2015). ICT English-Persian comparable textual corpus. Comparative Linguistic Research, 4(8), 121-141.