Prosody conversion is an important part of voice conversion. Fundamental frequency (F0), which carries important speaker individuality information (e.g., tone and intonation), is regarded as one of the key prosodic features in the excitation model for speech synthesis. The conventional approach to modeling F0, based on the continuous wavelet transform, carries out analysis at the frame level and is prone to losing high-frequency information during decomposition and reconstruction. To address this problem, the paper presents a representation of long-term fundamental frequency based on the Wavelet Packet Transform (WPT). Specifically, the long-term F0 is decomposed using WPT, and a joint vector is formed by combining the resulting average power spectrum. The method is then applied in a voice conversion system. Voice conversion experiments are conducted on Chinese and English speech data to evaluate the performance of the proposed method. The results show that the proposed method clearly outperforms the wavelet-transform-based method in all conversion scenarios but performs slightly worse than the mean-and-variance-based method in the same-gender conversion scenario.
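The decomposition step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the wavelet, the decomposition depth, or how F0 is preprocessed, so the Haar wavelet, the depth of 3, the log-F0 input, and all function names below are assumptions chosen only to keep the example dependency-free. It splits a contour into a full wavelet packet tree (both the approximation and the detail branch are split at every level) and collects the average power of each leaf into one feature vector.

```python
import math


def haar_step(signal):
    """One orthonormal Haar analysis step: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_packet_leaves(signal, level):
    """Full wavelet packet tree: every node is split again at each level.

    Returns the 2**level leaf coefficient lists in natural (tree) order,
    which is not the same as frequency order.
    """
    if level < 0 or len(signal) == 0 or len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be a positive multiple of 2**level")
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def average_power_vector(f0_contour, level=3):
    """Average power (mean squared coefficient) of each wavelet packet leaf."""
    leaves = wavelet_packet_leaves(f0_contour, level)
    return [sum(c * c for c in leaf) / len(leaf) for leaf in leaves]


# Hypothetical input: a synthetic log-F0 contour of 64 frames around 120 Hz.
log_f0 = [math.log(120.0 + 20.0 * math.sin(2.0 * math.pi * i / 16.0)) for i in range(64)]
features = average_power_vector(log_f0, level=3)  # 8 values, one per sub-band
```

Unlike a plain wavelet transform, which keeps splitting only the approximation branch, the packet tree also splits the detail branches, so the high-frequency part of the contour is resolved into sub-bands of its own.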
Author(s): He, Weijun; Zhao, Yongyong; Lin, Pei; He, Yuxin; Feng, Qi
Affiliation:
School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China (all authors)
(See document for exact affiliation information.)
Publication Date:
2024-03-06
Permalink: https://aes2.org/publications/elibrary-page/?id=22387
He, Weijun; Zhao, Yongyong; Lin, Pei; He, Yuxin; Feng, Qi; 2024; Long-Term Fundamental Frequency Modeling Based on Wavelet Packet Transform for Voice Conversion [PDF]; School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China; School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China; School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China; School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China; School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China; Paper ; Available from: https://aes2.org/publications/elibrary-page/?id=22387
He, Weijun; Zhao, Yongyong; Lin, Pei; He, Yuxin; Feng, Qi; Long-Term Fundamental Frequency Modeling Based on Wavelet Packet Transform for Voice Conversion [PDF]; School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China; School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China; School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China; School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China; School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China; Paper ; 2024 Available: https://aes2.org/publications/elibrary-page/?id=22387
@article{he2024long-term,
author={He, Weijun and Zhao, Yongyong and Lin, Pei and He, Yuxin and Feng, Qi},
journal={Journal of the Audio Engineering Society},
title={Long-Term Fundamental Frequency Modeling Based on Wavelet Packet Transform for Voice Conversion},
year={2024},
volume={72},
number={3},
pages={161--169},
month={March},
}