Prosodic Focus Modeling in Persian: An Articulatory–Functional Approach

Document Type: Research Paper

Authors

1 Institute for Humanities and Cultural Studies

2 Sharif University of Technology

3 Tehran University

Abstract

This paper models Persian prosodic focus using an articulatory-functional approach, namely parallel encoding and target approximation (PENTA). The modeling was carried out on 150 utterances produced in different focus conditions, using PENTAtrainer2 in Praat. PENTAtrainer2 is a trainable prosody synthesizer that optimizes categorical pitch targets, each corresponding to multiple communicative functions. The model was evaluated numerically and subjectively by comparing the F0 trajectories generated from the extracted pitch targets with those of the natural utterances. The numerical results showed that the synthesized F0 contours were close to the natural ones in terms of root-mean-square error (RMSE = 1.94) and correlation coefficient (0.84). The subjective evaluation likewise showed that the rates of focus identification and the naturalness judgments were highly similar for synthetic and natural F0 trajectories.
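The numerical evaluation reported in the abstract compares each synthesized F0 contour with its natural counterpart using RMSE and a correlation coefficient. The following is a minimal sketch of how these two measures could be computed, not the procedure implemented in PENTAtrainer2. It assumes that both contours have already been sampled at the same time points, that the correlation is Pearson's r, and that F0 is expressed in semitones; the abstract states none of these. The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(natural, synthetic):
    """Root-mean-square error between two equally sampled F0 contours."""
    if len(natural) != len(synthetic) or not natural:
        raise ValueError("contours must be non-empty and of equal length")
    squared_error = sum((n - s) ** 2 for n, s in zip(natural, synthetic))
    return math.sqrt(squared_error / len(natural))


def pearson_r(natural, synthetic):
    """Pearson correlation coefficient between two equally sampled F0 contours."""
    if len(natural) != len(synthetic) or len(natural) < 2:
        raise ValueError("contours must have equal length of at least 2")
    count = len(natural)
    mean_n = sum(natural) / count
    mean_s = sum(synthetic) / count
    covariance = sum((n - mean_n) * (s - mean_s) for n, s in zip(natural, synthetic))
    var_n = sum((n - mean_n) ** 2 for n in natural)
    var_s = sum((s - mean_s) ** 2 for s in synthetic)
    if var_n == 0 or var_s == 0:
        raise ValueError("correlation is undefined for a flat contour")
    return covariance / math.sqrt(var_n * var_s)


# Hypothetical toy contours (assumed unit: semitones), not data from the study.
natural = [0.0, 2.0, 4.0, 2.0]
synthetic = [1.0, 2.0, 3.0, 2.0]

print(f"RMSE = {rmse(natural, synthetic):.2f}")
print(f"r    = {pearson_r(natural, synthetic):.2f}")
```

In this toy case the synthetic contour has the same shape as the natural one but a compressed range, so the correlation is perfect while the RMSE is non-zero. This illustrates why the two measures are reported together: correlation captures similarity of contour shape, and RMSE captures the absolute distance between the contours.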
