From: Omari Stephens
Subject: [gnuspeech-contact] Understanding diphones.mxml and improving vocalization quality
Date: Fri, 26 Jan 2007 07:09:36 +0000
User-agent: Icedove 1.5.0.9 (X11/20061220)

Hi all,

I'm part of a 5-person team at MIT taking the class 6.189: Multicore
Programming Primer [1], a project-based class in which we implement a
computationally intensive application on a parallel processor, the PlayStation
3's Cell architecture [2]. In short, we are using gnuspeech as the reference
implementation for a speech synthesizer on the PS3.

I'm currently working on a stripped-down, non-interactive analog of Monet to
generate postures for the tube. It seems that everything I would need for this
is catalogued in diphones.mxml, but we're having trouble figuring out how to
calculate the transitions (that is, we're unsure how to use the rules,
transitions, and equations sections). Any specific help on this front, or
pointers to useful spots in the source, would be tremendously appreciated.
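
To make the question concrete, here is the kind of naive baseline we have in
mind, sketched in Python for brevity (the real version would be C or C++). It
is plain linear interpolation between two postures and nothing more. The
function name, parameter names, and values are all invented for illustration
and are not taken from diphones.mxml, and it ignores the rules, transitions,
and equations sections entirely, which is exactly the part we don't understand.

```python
# Hypothetical sketch: linear interpolation between two postures.
# Parameter names and values are made up for illustration; they are not
# taken from diphones.mxml, and this is not how Monet computes transitions.

def interpolate_postures(start, end, steps):
    """Return steps + 1 frames moving linearly from start to end."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if start.keys() != end.keys():
        raise ValueError("postures must define the same parameters")
    frames = []
    for i in range(steps + 1):
        t = i / steps  # 0.0 at the first frame, 1.0 at the last
        frames.append({name: start[name] + t * (end[name] - start[name])
                       for name in start})
    return frames


if __name__ == "__main__":
    # Invented parameter names and values, for illustration only.
    posture_a = {"r1": 0.8, "r2": 1.5, "velum": 0.1}
    posture_b = {"r1": 1.2, "r2": 0.5, "velum": 0.1}
    for frame in interpolate_postures(posture_a, posture_b, 4):
        print(frame)
```

Presumably the transitions and equations sections replace the straight line
and the fixed step count with something rule-dependent, but that is a guess on
our part.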

Additionally, other group members are working on finding, implementing, and
hooking up a more realistic vocal fold model. From my own poking around on the
Internet, it seems that most of the models are two-mass models, but I haven't
read through anything in enough detail to know the differences between them.
Is there a model someone would recommend that would likely improve the
vocalization quality but could also be coded in a reasonable amount of time
(hopefully a day or less)? We will probably implement this in C or C++, and may
put more hands on this part of the project if the benefits merit that sort of
attention. Our final product is due this coming Friday, 2 Feb.

Lastly, what other changes could we make to improve the vocalization quality?
I had thought of perhaps emulating smoother transitions between the different
vocal tract regions, but I don't know whether this is feasible in the time we
have, or whether it would make an appreciable improvement in output sound
quality.

[1] http://cag.csail.mit.edu/ps3/
[2] http://en.wikipedia.org/wiki/Cell_microprocessor
Thanks very much for your time and any help you all may be able to offer.
--xsdg, for the 6.189 Speech Synthesis team