
Tomcattwo/xwing_ai2 2

2nd Edition and HotAC AI additions to the web-based AI for the X-Wing Miniatures game by Wyzbang (Ralph Berrett)

Tomcattwo/Real-Time-Voice-Cloning 1

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Tomcattwo/xwing-miniatures-font 0

Web font for the X-Wing Miniatures game.

issue comment CorentinJ/Real-Time-Voice-Cloning

Pretrained Models Using Datasets Other Than LibriSpeech?

@blue-fish , I have not started on this project yet. I have a few other (semi-related) projects in work now. I read the TTS Corpus paper and it sounds interesting. Frankly, I have gotten very good results from the single-voice trained models I am using for my current project, but there's always room for improvement. I would love to be able to "help" the synthesizer using punctuation to tell it where to place the emphasis on a syllable or syllables in a multi-syllabic word... I would like to give a TTS-built-from-scratch synthesizer base a try once I get some of these other projects behind me. I will let you know when I start and I will keep you apprised of progress. No doubt I will hit some snags and will solicit your always-helpful advice. Regards, Tomcattwo

Tomcattwo

comment created time in a month

issue comment CorentinJ/Real-Time-Voice-Cloning

Pretrained Models Using Datasets Other Than LibriSpeech?

@blue-fish said: "I suggest using both train-clean-100 and 360 to more closely match the training of the pretrained models."

How can I use both? Do I run training for 100k steps on train-clean-100, then train another 200k steps on top of that using train-clean-360? Or can I simply combine them both in my datasets_root and train the combination once to 300k steps?
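For reference, a sketch of what the combined layout would look like. Directory names are taken from the LibriSpeech tarballs; whether the repo's preprocessing actually accepts both subsets in a single pass is exactly the open question in this comment, so treat this as an assumption, not a confirmed answer:

```python
# Sketch of the combined-dataset layout (subset directory names assumed from
# the standard LibriSpeech tarballs). Both subsets sit side by side under one
# LibriSpeech root so a single preprocessing pass could see every speaker.
from pathlib import Path

root = Path("datasets_root") / "LibriSpeech"
for subset in ("train-clean-100", "train-clean-360"):
    (root / subset).mkdir(parents=True, exist_ok=True)

# The repo's preprocessing scripts would then be pointed at datasets_root
# as usual; sorting just makes the listing deterministic for inspection.
subsets = sorted(p.name for p in root.iterdir())
```

The alternative (sequential training, 100 then 360) would instead keep the subsets separate and resume from a checkpoint between runs.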

@blue-fish said: "If you decide to pursue this, good luck and please consider sharing your models."

Absolutely, assuming the models come out sounding good. Happy to share plots, mid-training .wavs, etc. upon request. Regards, TC2

Tomcattwo

comment created time in a month

issue comment CorentinJ/Real-Time-Voice-Cloning

Pretrained Models Using Datasets Other Than LibriSpeech?

@blue-fish , thanks for the reply. If I decide to go forward on this effort, I would plan to use train-clean-360: easier to download, smaller size. After reading #449, I agree that limiting max_mel_frames to 500 is a good idea. Thanks also for the accelerated training hparams info. R/ TC2
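To put the max_mel_frames cap in concrete terms, a sketch assuming the repo's synthesizer/hparams.py naming and its usual 16 kHz / hop-size-200 audio settings (both assumptions here, not quoted from the thread):

```python
# Sketch: clips whose mel spectrograms exceed max_mel_frames are dropped
# during synthesizer preprocessing, bounding memory use per batch and
# avoiding very long utterances that are hard for attention to learn.
max_mel_frames = 500  # the cap discussed in issue #449

# Assumed audio settings (typical for this repo, but verify in hparams.py):
sample_rate = 16000   # Hz
hop_size = 200        # samples between successive mel frames (12.5 ms)

# Each frame covers hop_size samples, so the cap in seconds is:
max_seconds = max_mel_frames * hop_size / sample_rate  # 6.25 s
```

In other words, the 500-frame limit roughly excludes utterances longer than about six seconds under these settings.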

Tomcattwo

comment created time in a month

issue comment CorentinJ/Real-Time-Voice-Cloning

Pretrained Models Using Datasets Other Than LibriSpeech?

@blue-fish , thank you for the reply. If I were to try to train all three models (voice encoder, synthesizer and vocoder) from scratch using LibriTTS, would you recommend train-clean-100 or train-clean-500? My understanding from reading the doctoral papers and Corentin's remarks is that for the voice encoder you need lots of voices and quality is less important than quantity, while for the synthesizer and vocoder quality > quantity. If I were to do this, training the synthesizer alone would take a week, but I may give it a go.

Any hints, tips or settings for hparams you could share for such a project would be greatly appreciated. If I decide to try this, I would shoot for 300k steps to get to a 1e-5 learning rate. Also, I have not tried any voice encoder training yet using this repo. Any helpful information or hparams for that evolution you could share?
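The "300k steps down to 1e-5" target can be written as a training schedule. This is an illustrative sketch only: the tuple layout (reduction factor, learning rate, end step, batch size) follows the repo's tts_schedule convention, but the intermediate values below are assumptions, not the repo's defaults:

```python
# Illustrative learning-rate schedule (values are assumptions, not repo
# defaults). Each entry is (reduction_factor, learning_rate, end_step,
# batch_size): train at the given rate until end_step, then move on,
# finishing at 1e-5 by 300k steps as discussed in the comment above.
tts_schedule = [
    (2, 1e-3, 100_000, 12),
    (2, 1e-4, 200_000, 12),
    (2, 1e-5, 300_000, 12),
]

final_lr = tts_schedule[-1][1]      # 1e-5 at the 300k-step mark
total_steps = tts_schedule[-1][2]
```

The actual schedule shipped in synthesizer/hparams.py should be checked before training; the point here is only the shape of the decay.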

I need to do a bit of research first on LibriTTS to see what it can and cannot do wrt punctuation. If it will be no better than the current LibriSpeech trained model, it may not be worth the time or effort. Your thoughts would be appreciated. Regards, TC2

Tomcattwo

comment created time in a month

started CorentinJ/Real-Time-Voice-Cloning

started time in a month

issue opened CorentinJ/Real-Time-Voice-Cloning

Pretrained Models Using Datasets Other Than LibriSpeech?

Hello all, @blue-fish , I had very good success on my project to clone 14 voices from a computer simulation (samples available here) using single-voice training (5000 additional steps) on the LibriSpeech pretrained synthesizer (295k) and vocoder.

However, I would like to see if another model (in English) might provide better output reproducibility, and perhaps punctuation recognition and some better degree of emotion (perhaps with LibriTTS or some newer corpus that I am not aware of yet). Are you aware of any pretrained speech encoder/synthesizer/vocoder models built on another dataset that might be available for download? I tried synthesizer and vocoder single-voice training on the LibriTTS synthesizer model from your single-voice training instructions, but only got garbled output in the demo_toolbox, probably because the speech encoder was built on LibriSpeech and not on LibriTTS. Any info you or anyone else might have on a potential model-set download would be greatly appreciated. Thanks in advance, Tomcattwo

created time in a month

push event Tomcattwo/Real-Time-Voice-Cloning

Tomcattwo

commit sha 7432046efc23cabf176f9fdc8d2fd67020059478

Minor bug fixes and changes for improved Windows compatibility

view details

Tomcattwo

commit sha 7d8e690b0bc2fb10255b12eeed76469737424e1a

Merge branch 'CorentinJ:master' into master

view details

push time in a month

issue comment CorentinJ/Real-Time-Voice-Cloning

Tutorial: Windows installation

@CarvellScott You can install ffmpeg in Anaconda from the conda-forge channel using this command: conda install -c conda-forge ffmpeg-python. See https://anaconda.org/conda-forge/ffmpeg-python for details. Regards, Tomcattwo

car1ot

comment created time in 2 months
