ABSTRACT
Prosodic phrasing is crucial to the naturalness and intelligibility of end-to-end Text-to-Speech (TTS). Natural speech carries both linguistic and emotional prosody. Because the study of prosodic phrasing has been linguistically motivated, prosodic phrasing for expressive emotion rendering has not been well studied. In this paper, we propose an emotion-aware prosodic phrasing model, termed EmoPP, to accurately mine the emotional cues of an utterance and predict appropriate phrase breaks. We first conduct objective observations on the ESD dataset to validate the strong correlation between emotion and prosodic phrasing. Objective and subjective evaluations then show that EmoPP outperforms all baselines and achieves remarkable performance in terms of emotion expressiveness. The audio samples and the code are available at https://github.com/AI-S2-Lab/EmoPP.
SPEECH DEMO
To further validate EmoPP in terms of human perception, we build two emotional TTS systems that each take the input text and phrase break information as input. The phrase breaks for the first system are predicted by the BiLSTM model, while those for the second are predicted by our EmoPP. The emotional TTS model is trained on DailyTalk, an emotional conversational TTS dataset, following this project: https://github.com/keonlee9420/DailyTalk.
Note: We attempted to train the emotional TTS model on the IEMOCAP dataset, but the synthesized speech contained significant noise. Since IEMOCAP was not originally designed for TTS purposes, it is not optimal for our subjective test.
Utterances | Emotion | BiLSTM (phrase breaks predicted by the BiLSTM model) | EmoPP (phrase breaks predicted by the EmoPP model) |
---|---|---|---|
oh my god, what are you going to do. | surprise | oh my god#, what# are you going to do. | oh my god# what are you going to do. |
a rapper party, ho yeah, okay. | surprise | a rapper party ho yeah# okay. | a rapper# party# ho# yeah# okay. |
i guess we don't need glasses. | happy | i guess# we don't need glasses. | i guess we don't need glasses. |
nonsense, they have a bag of venom behind their fangs and they snap, they snap. | angry | nonsense# they have a bag of venom# behind their fangs# and they snap they snap. | nonsense# they have a bag of venom behind their fangs and they snap# they snap. |
uh, huh, i didn't come here, get in yelling match either. | neutral | uh# huh# i didn't come here get in yelling match either. | uh# huh# i didn't come here# get in yelling match either. |
oh, yeah, absolutely absolutely. | neutral | oh yeah# absolutely absolutely. | oh# yeah# absolutely absolutely. |
my computer which has all of my data which i'm collecting right now. | angry | my computer which has all of my data which# i'm collecting right now. | my computer which has all of my data which i'm collecting right now. |
just kind of feel numb, you know. | sad | just kind of feel numb# you know. | just kind of feel numb# you know. |
you have a business here, i said, what the hell is this. | angry | you have a business here# i said what the hell is this. | you have a business here# i said# what the hell is this. |
they don't know why, we don't know why, no one like sent them an invitation or gave them a map or direction. | surprise | they don't know why# we don't know why no one like sent them an invitation or gave them a map or direction. | they don't know why# we don't know why# no one like sent them an invitation or gave them a map or direction. |
well, so what do you think. | surprise | well# so what do you think. | well# so what do you think. |
are you cold, huh, do you want to go home. | neutral | are you cold huh# do you want to go home. | are you cold# huh# do you want to go home. |
yeah, it's pretty good. | happy | yeah# it's pretty good. | yeah# it's pretty good. |
yeah, i mean, candles wouldn't stay--no, i didn't--i didn't know anything about it. | happy | yeah# i mean candles wouldn't stay--no# i didn't--i didn't know# anything about it. | yeah# i mean# candles wouldn't stay--no# i didn't--i didn't know anything about it. |
yea, i just want to get this done. | neutral | yea# i just want to get this done. | yea# i just want to get this done. |
yea, i guess so, oh, my gosh, was she surprised. | surprise | yea# i guess so# oh my gosh was she surprised. | yea# i guess so# oh# my gosh# was she surprised. |
oh, yes, they they, you know they love her, and so i mean. | happy | oh# yes# they they# you know# they love her and so i mean. | oh# yes# they they# you know they love her# and so i mean. |
yes, i mean, she cared about all of us, she was great. | sad | yes# i mean she cared about all of us# she was great. | yes# i mean# she cared about all of us# she was great. |
yeah, right a cult, i'm looking forward to being in the cult. | surprise | yeah# right a cult i'm looking forward to being in the cult. | yeah# right a cult# i'm looking forward to being in the cult. |
no, i'm just making myself fascinating for you. | neutral | no i'm# just making myself fascinating for you. | no# i'm just making myself fascinating for you. |
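
The annotated samples above can be turned into per-word break labels for comparing the two systems. The following is a minimal sketch, assuming that `#` marks a predicted phrase break after the word it is attached to; the helper names `parse_breaks` and `break_positions` are hypothetical and are not taken from the EmoPP code base.

```python
from typing import List, Tuple

# Assumption: "#" marks a predicted phrase break after the preceding word.
BREAK_MARK = "#"


def parse_breaks(marked: str) -> List[Tuple[str, int]]:
    """Split a '#'-annotated utterance into (word, label) pairs.

    label is 1 if a break follows the word, otherwise 0.
    """
    pairs = []
    for token in marked.split():
        # A token may carry the mark plus punctuation, e.g. "god#," or "do."
        has_break = BREAK_MARK in token
        word = token.replace(BREAK_MARK, "").strip(".,?!")
        if word:
            pairs.append((word, int(has_break)))
    return pairs


def break_positions(marked: str) -> List[int]:
    """Return the 0-based indices of words that are followed by a break."""
    return [i for i, (_, label) in enumerate(parse_breaks(marked)) if label]


if __name__ == "__main__":
    bilstm = "oh my god#, what# are you going to do."
    emopp = "oh my god# what are you going to do."
    print(break_positions(bilstm))  # [2, 3]
    print(break_positions(emopp))   # [2]
```

Comparing the two index lists for the same utterance shows where the systems disagree; in the first sample, only the BiLSTM output places a break after "what".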