In this paper, we describe a maximum entropy-based automatic prosody labeling framework that exploits both language and speech information. We apply the proposed framework to both prominence and phrase structure detection within the Tones and Break Indices (ToBI) annotation scheme. Our framework utilizes novel syntactic features in the form of supertags and a quantized acoustic-prosodic feature representation that is similar to linear parameterizations of the prosodic contour. The proposed model is trained discriminatively and is robust in the selection of appropriate features for the task of prosody detection. The proposed maximum entropy acoustic-syntactic model achieves pitch accent and boundary tone detection accuracies of 86.0% and 93.1% on the Boston University Radio News corpus, and 79.8% and 90.3% on the Boston Directions corpus. The phrase structure detection through prosodic break index labeling provides accuracies of 84% and 87% on the two corpora, respectively.

In this paper, we implement a prosody model for Marathi TTS synthesis using a speech database with unit search and selection. So far, TTS systems have been built for many languages, such as English, Mandarin, and Telugu. TTS for Marathi has also been attempted, but without natural prosody. The synthesis technique is concatenative: words or combinations of characters are concatenated, and the corresponding audio files are played from the database.

A Text-To-Speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud, whether it was directly introduced in the computer by an operator or scanned and submitted to an Optical Character Recognition (OCR) system. Systems that simply concatenate isolated words or parts of sentences, denoted as Voice Response Systems, are only applicable when a limited vocabulary is required (typically a few hundred words), and when the sentences to be pronounced respect a very restricted structure, as is the case for the announcement of arrivals in train stations, for instance. In the context of TTS synthesis, it is impossible (and, fortunately, unnecessary) to record and store all the words of the language. It is thus more suitable to define Text-To-Speech as the automatic production of speech, through a grapheme-to-phoneme transcription of the sentences to utter.
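The limitation of word-concatenation systems can be illustrated with a minimal sketch. It is not the implementation from any of the works quoted here: the database `UNIT_DB`, the function `synthesize`, and the integer lists standing in for recorded audio are all hypothetical names and data chosen for illustration.

```python
from typing import Dict, List

# Hypothetical unit database: each known word maps to its recording.
# Short integer lists stand in for audio samples (an assumption of this sketch).
UNIT_DB: Dict[str, List[int]] = {
    "train": [1, 2, 3],
    "arrives": [4, 5],
    "at": [6],
    "platform": [7, 8, 9],
    "two": [10, 11],
}


def synthesize(sentence: str, db: Dict[str, List[int]]) -> List[int]:
    """Concatenate the stored recording of every word in the sentence."""
    samples: List[int] = []
    for word in sentence.lower().split():
        if word not in db:
            # A fixed-vocabulary system has no fallback for unseen words.
            raise KeyError(f"no recording for word: {word!r}")
        samples.extend(db[word])
    return samples
```

Calling `synthesize("Train arrives at platform two", UNIT_DB)` joins the five stored recordings in order, while any sentence containing a word outside the database raises `KeyError`. This is the gap that a grapheme-to-phoneme transcription is meant to close: units smaller than words can cover text that was never recorded as a whole.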