Halfbakery: Themeable Speech Synthesis Accents

Themeable Speech Synthesis Accents (+3, -1)
Have your computer talk to you in the accent of your choice.

Speech synthesis has advanced somewhat over the past few decades; today's speech synthesizers sound less coarse than, say, SAM on the Commodore 64. However, one fundamental aspect of speech synthesis that hasn't changed is the accent. Most speech synthesizers speak in a generic deregionalised American accent, and while you can change the lowest-level voice parameters (pitch and formants; i.e., the easy stuff), you can't change the accent. So you can have a male or female or insectoid alien voice, but they always sound as if they've been raised in isolation on American TV shows.

In recent years, personalisation has been a trend in computing. Desktops can be themed, apps can be skinned, and so on. And as computers have grown more powerful, speech synthesis has become bundled with the OS. The Amiga started it in the 80s, but everybody ignored that; then Apple put MacinTalk into MacOS, and now Microsoft have put a speech synthesizer into Windows XP.

The next logical step would be to allow speech synthesizer accents to be customised. This could be done by having the process which transforms words into phonemes look at an external accent table as well as its internal data. The accent tables could consist of transformation rules; rules in the external table would take precedence, and where it has no rule for a given case, the internal default rule would be applied. Accents could be user-authored and swapped on the web, or purchased/licensed from professional designers. Thus, if you want your computer to sound BBC British or New York Jewish or Northern Irish, you'd just need to download the appropriate accent theme and install it.
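
A minimal sketch of that lookup order, in Python. Everything here is an assumption made for illustration: the symbol names (`T_MEDIAL`, `R_FINAL`), the rule tables, and the function `to_phonemes` are hypothetical and do not come from any real speech engine. It only shows the external table taking precedence over the built-in defaults.

```python
# Hypothetical built-in rules: abstract sound classes -> concrete phonemes.
DEFAULT_RULES = {"T_MEDIAL": "t", "R_FINAL": "r"}

# Hypothetical downloadable accent theme; "" means "drop this sound".
ESTUARY = {"T_MEDIAL": "?", "R_FINAL": ""}


def to_phonemes(symbols, accent_table=None, default_rules=DEFAULT_RULES):
    """Resolve each symbol: external accent rule first, then default rule."""
    accent_table = accent_table or {}
    result = []
    for symbol in symbols:
        if symbol in accent_table:
            sound = accent_table[symbol]             # external rule wins
        else:
            sound = default_rules.get(symbol, symbol)  # fall back to default
        if sound:                                    # empty string = dropped
            result.append(sound)
    return result
```

Calling `to_phonemes(["w", "O", "T_MEDIAL", "schwa", "R_FINAL"], ESTUARY)` would apply the theme's rules for the two abstract symbols and pass the rest through unchanged; calling it without a theme uses the defaults only.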

To avoid reinventing the wheel, accents could be implemented with inheritance and dependencies; for example, there could be an American ruleset and a British ruleset; a Boston accent could depend on the American ruleset, tweaking its rules where necessary, and a Mancunian or Estuary English accent could similarly depend on the British ruleset. (The implementation of other accents, such as the Australian one, is left as an exercise to the reader.)
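
One way the inheritance could work, sketched in Python. The `Accent` class, the rule names, and the example rulesets are all hypothetical; the point is only that a child accent stores its tweaks and defers everything else to the ruleset it depends on.

```python
class Accent:
    """A ruleset that may depend on a parent ruleset (hypothetical design)."""

    def __init__(self, name, rules, parent=None):
        self.name = name
        self.rules = rules      # this accent's own tweaks
        self.parent = parent    # ruleset it depends on, if any

    def lookup(self, symbol):
        """Walk up the dependency chain until some ruleset has a rule."""
        accent = self
        while accent is not None:
            if symbol in accent.rules:
                return accent.rules[symbol]
            accent = accent.parent
        return symbol           # no rule anywhere: leave the symbol as is


# Illustrative rules only, not a real description of either accent.
american = Accent("american", {"R_FINAL": "r", "A_BATH": "ae"})
boston = Accent("boston", {"R_FINAL": ""}, parent=american)
```

Here `boston.lookup("R_FINAL")` finds the Boston tweak, while `boston.lookup("A_BATH")` falls through to the American ruleset.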

If the rule format is standardised (perhaps encoded in XML with a published schema), the accents can be made interchangeable among platforms.
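
For illustration, here is what such a file and its loader might look like. The element and attribute names (`accent`, `rule`, `match`, `replace`, `extends`) are invented for this sketch; no published schema is implied. The loader uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical accent file; the format is an assumption, not a standard.
ACCENT_XML = """
<accent name="boston" extends="american">
  <rule match="R_FINAL" replace=""/>
  <rule match="A_BATH" replace="aa"/>
</accent>
"""


def load_accent(xml_text):
    """Return (name, parent name, rules dict) from an accent document."""
    root = ET.fromstring(xml_text.strip())
    rules = {
        rule.get("match"): rule.get("replace", "")
        for rule in root.findall("rule")
    }
    return root.get("name"), root.get("extends"), rules
```

Any platform with an XML parser could read the same file, which is the interchangeability being suggested.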
-- acb, Feb 21 2002

AT&T voice font fluff piece http://www.newsfact...rl/story/12416.html
"The library of voices, he said, will allow users to pick regional dialects -- Northeastern United States, Southern California -- as desired, and eventually to choose other languages." (Cringe; different languages and different dialects are unrelated problems, and doing the latter doesn't get you any closer to the former.) [jutta, Feb 21 2002]

The 'Estuary English' version would have no need for the letter T.
-- angel, Feb 21 2002

Agh! Misread this as "Thereminable Speech Synthesis Accents" - potentially a much more interesting idea.
My nice shiny iMac seems to have about 30 differently-accented voices it can talk in so this might be baked.
-- hippo, Feb 21 2002

I've used text-to-speech engines with different rulesets for British and US English. Voice fonts already exist. (AT&T has done some work and productization there.)

It's not as good as you want, and modeling accents structurally is an interesting task, but in practice, if all you want is a specific speaker with a specific accent, that's already there.
-- jutta, Feb 21 2002

I've recently been involved in evaluating commercial text-to-speech systems, and it's interesting that the most natural voices are not always the easiest to understand. The slightly mechanical quality of some systems, where every syllable is pronounced clearly and equally, is in many cases better than a system which tries to vary the speed and intonation and slurs or drops certain phonemes. So trying to make text-to-speech systems sound natural may not be the best way to go, at least not unless they can be significantly improved in quality; and there are limits on quality without actually understanding the text being spoken.
-- pottedstu, Feb 21 2002

Further to what jutta says, there are also engines which claim to model the prosody (intonation and speed variations) of individual speakers, but I've not heard them myself.
-- pottedstu, Feb 21 2002

Of course, the real fun in an operating system such as XP will be picking just which accent goes with what background image, popup sounds, screen transitions and the like.

I could see a big company like Microsoft getting into some real publicity dilemmas stereotyping whole markets – imagine, for example, a French accent accompanying garish red, white and blue cursors with a fat croissant emblazoned in the background. Which nationality would get heaped in with ugly purple-ish colour sets? Et cetera…
-- sdm, Feb 21 2002

The country whose Olympian would be most likely to win at the away games?

99% of the time, I have difficulty not responding to the reception robots that serve switchboards. I usually try pushing "zero" at least once. There is surely a need for a response-based implementation of one's accent so that the voice over your phone does not conflict with your own vocal style. Could this be done with a "state your full name?" type of opening remark that would categorize your speech speed and style and copy those attributes?
-- reensure, Feb 21 2002

Stephen Hawking has lamented that his voice synthesizer is only available in the American accent version. This was some time ago, and may have changed. Has anyone heard him lately?
-- waugsqueke, Feb 21 2002

Not since he did backing vocals on that Pink Floyd album.
-- angel, Feb 21 2002

There are various methods of speech synthesis. Some use actual samples of speech stored in tables of combinations of 2 or 3 phonemes. Others use a purely parametric method of voice generation (similar to the algorithms used in low-bitrate speech compression) which uses an electronic model of speech, based on pulses (= vocal cords) being modified by a variable filter (= shape of mouth).
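
A rough sketch of the parametric (pulse plus filter) approach, in plain Python. This is an illustration under stated assumptions, not how any particular product works: the formant frequencies and bandwidths are illustrative placeholder values, the filters are fixed rather than time-varying, and the function names are made up.

```python
import math


def pulse_train(f0, sample_rate, n):
    """Source: one unit pulse per pitch period (stands in for vocal cords)."""
    period = round(sample_rate / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(signal, freq, bandwidth, sample_rate):
    """Filter: a two-pole resonance at one formant (stands in for the mouth)."""
    r = math.exp(-math.pi * bandwidth / sample_rate)   # pole radius < 1
    a1 = 2 * r * math.cos(2 * math.pi * freq / sample_rate)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0=120, formants=((700, 130), (1220, 70)),
                     sample_rate=8000, n=800):
    """Run the pulse source through each formant resonator in turn."""
    signal = pulse_train(f0, sample_rate, n)
    for freq, bandwidth in formants:   # assumed values, illustrative only
        signal = resonator(signal, freq, bandwidth, sample_rate)
    return signal
```

Changing `f0` alters the pitch and changing the formant list alters the vowel quality, which are the low-level parameters that are easy to adjust; the accent lives further up, in which phonemes get chosen.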
-- pottedstu, Feb 21 2002

I have seen (heard) the AT&T Labs Natural Voice(tm) demonstrated and it is pretty remarkable. So good that they were able to have it mimic (with some faults) some celebrity voices. (I suspect, though, that the demo was a bit rigged and vaporous--there was some pretty handwavy stuff going on--but have no way of really knowing for sure.)
-- bristolz, Feb 21 2002

I would like to have Pippin's voice on my computer.
-- Amishman35, Jul 29 2002
