
AES E-Library

Long-Term Fundamental Frequency Modeling Based on Wavelet Packet Transform for Voice Conversion

Prosody conversion is an important part of voice conversion. Fundamental frequency (F0), which carries important speaker individuality information (e.g., tone and intonation), is regarded as one of the key prosodic features in the excitation model for speech synthesis. The conventional approach to modeling F0, based on the continuous wavelet transform, carries out its analysis at the frame level and is prone to losing high-frequency information during decomposition and reconstruction. To address this problem, the paper presents a representation of long-term fundamental frequency based on the Wavelet Packet Transform (WPT). Specifically, the long-term F0 is decomposed using WPT, and a joint vector is formed by combining the resulting average power spectrum. The method is then applied in a voice conversion system, and voice conversion experiments are conducted on Chinese and English speech data to evaluate its performance. The results show that the proposed method clearly outperforms the method based on wavelet transform in all conversion scenarios but performs slightly worse than the method based on mean and variance in the same-gender conversion scenario.
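The decomposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the wavelet, the decomposition depth, the F0 representation, or how the joint vector is assembled, so the Haar filter, the depth of 3, the log-F0 toy contour, and the function names `haar_step`, `wavelet_packet_leaves`, and `subband_power_vector` are all assumptions chosen to keep the example self-contained.

```python
import math


def haar_step(x):
    """One Haar analysis step: return (approximation, detail), each half the length of x."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def wavelet_packet_leaves(x, level):
    """Full wavelet packet tree: every node is split, not only the approximation branch.

    Returns 2**level leaf coefficient lists in natural (tree) order,
    which is not the same as frequency order.
    """
    if len(x) % (2 ** level) != 0:
        raise ValueError("signal length must be a multiple of 2**level")
    nodes = [list(x)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def subband_power_vector(f0, level=3):
    """Average power of each wavelet packet subband (assumed feature layout)."""
    leaves = wavelet_packet_leaves(f0, level)
    return [sum(c * c for c in leaf) / len(leaf) for leaf in leaves]


# Toy log-F0 contour: 64 frames oscillating around 120 Hz (illustrative data only).
f0 = [math.log(120.0 + 20.0 * math.sin(2.0 * math.pi * i / 32.0)) for i in range(64)]
features = subband_power_vector(f0, level=3)
print(len(features))  # one average-power value per subband
```

Unlike a plain wavelet transform, which keeps splitting only the approximation branch, the packet tree also splits the detail branches, so the high-frequency part of the contour is resolved into its own subbands. Because the Haar filter pair is orthonormal, the subband powers together account for all of the energy in the input contour.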

Authors: He, Weijun; Zhao, Yongyong; Lin, Pei; He, Yuxin; Feng, Qi
Affiliation: School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China (all authors; see document for exact affiliation information.)
Publication Date: 2024-03-06
Permalink: https://aes2.org/publications/elibrary-page/?id=22387


This paper costs $33 for non-members and is free for AES members and E-Library subscribers.


Type: Journal Article
E-Library location: (CD JAES72) TMP/JAES72/3/
