ABSTRACT
Text-based speech editing (TSE) allows users to edit speech by modifying the corresponding text directly without altering the original recording.
Current TSE techniques typically train by minimizing the discrepancy between the generated speech and the reference speech within the edited region in order to
achieve fluent results. However, the generated speech in the edited region should also maintain acoustic and prosodic consistency with the
unedited region and the original speech at both the local and global levels. To maintain speech fluency, we propose a new fluent speech editing
scheme based on our previous FluentEditor model, termed FluentEditor2, which introduces a multi-scale acoustic and prosody
consistency training criterion into TSE training. Specifically, for local acoustic consistency, we propose the Hierarchical Local Acoustic
Smoothness Constraint (\( \mathcal{L}_{HLAC} \)) to align the acoustic properties of speech frames, phonemes, and words at the boundary between the generated speech in
the edited region and the speech in the unedited region. For global prosody consistency, we propose the Contrastive Global Prosody
Consistency Constraint (\( \mathcal{L}_{GCPC} \)) to keep the prosody of the speech in the edited region consistent with that of the original utterance. Extensive experiments
on the VCTK and LibriTTS datasets show that FluentEditor2 surpasses existing neural network-based TSE methods, including
EditSpeech, CampNet, \( A^3T \), FluentSpeech, and our FluentEditor, in both subjective and objective evaluations. Ablation studies further highlight
the contributions of each module to the overall effectiveness of the system. Speech demos are available
at: https://github.com/Ai-S2-Lab/FluentEditor2
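
To make the two constraints described above more concrete, the following is a minimal sketch of one possible reading of them. It is not the authors' implementation. It covers only the frame level of the local constraint, as the mean squared difference between the two frames that meet at an edit boundary, and it stands in an InfoNCE-style contrastive loss for the global prosody constraint. The function names, the use of cosine similarity, the `temperature` value, and the choice of other utterances as negatives are all assumptions made for illustration.

```python
import math


def boundary_smoothness(left_frame, right_frame):
    """Mean squared difference between the two frames that meet at an edit boundary.

    Hypothetical frame-level stand-in for a local acoustic smoothness term.
    """
    return sum((a - b) ** 2 for a, b in zip(left_frame, right_frame)) / len(left_frame)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_prosody_loss(edited, original, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's exact formulation).

    Low when the edited-region prosody embedding is closer to the embedding of
    the original utterance than to the embeddings of the negatives.
    """
    pos = math.exp(cosine(edited, original) / temperature)
    neg = sum(math.exp(cosine(edited, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Under this reading, a training objective would add both terms to the usual reconstruction loss, so that the model is penalized for abrupt acoustic jumps at edit boundaries and for prosody that drifts away from the original utterance.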
Speech Demo
Dataset: VCTK and LibriTTS
Operations: Insertion, Replacement and Deletion
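
The three operations can be recovered from a pair of transcripts by comparing them word by word. The sketch below is an illustration using Python's standard `difflib` module, not part of FluentEditor2; the helper name `classify_edit` is hypothetical. It reports each changed word span together with its operation type (`insert`, `replace`, or `delete`).

```python
import difflib


def classify_edit(original_text, edited_text):
    """Return a list of (operation, original_words, edited_words) for each changed span."""
    src = original_text.split()
    dst = edited_text.split()
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged words belong to the unedited region
        edits.append((tag, src[i1:i2], dst[j1:j2]))
    return edits


print(classify_edit(
    "There is a handful of rewarding paintings .",
    "There is a handful of rewarding but challenging paintings .",
))
# → [('insert', [], ['but', 'challenging'])]
```

The word spans returned here correspond to the edited region; everything else in the transcript maps to the unedited region of the recording.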
1. Operation of text-based speech editing based on FluentEditor2
Insertion
item_name | GT | FluentEditor2 |
---|---|---|
p274_339 | original_text: There is a handful of rewarding paintings . | edited_text: There is a handful of rewarding but challenging paintings . |
4640_19187_000026_000005 | original_text: The mouse , plus the cat , is the proof of creation revised and corrected . | edited_text: The mouse , plus the cat , is the ultimate proof of creation revised and corrected . |