
Rev #8 – November, 2011

Texthelp SpeechStream Overview

Table of Contents

1. Introduction
2. SpeechStream Server
2.1 Caching
2.2 Speech Server Configuration Options
2.2.1 Explanation of Terms
2.2.2 Texthelp-Hosted Speech Server
2.2.3 Texthelp-hosted Speech Server With External Cache
2.2.4 Customer-Hosted Speech Server
2.3 SpeechStream Server Specification and Performance
2.3.1 System Requirements
2.3.2 Text To Speech Performance
2.4 Cache Server Specification and Performance
2.4.1 Scalability
3. End-user software
3.1 SpeechStream Toolbar (HTML)
3.1.1 Web Browser Compatibility
3.2 Flash
3.3 Custom access

© Copyright Texthelp Systems Ltd. 2011

TextHELP Systems, Inc.

Tel: (888) 248-0652 • Fax: (866) 248-0652 • u.s.info@texthelp.com • www.texthelp.com


1. Introduction

The Texthelp SpeechStream Server delivers high quality computer-generated speech for web-based applications, complete with synchronized, dual-color, word-by-word highlighting. It does not require the installation of any speech software on end-user computers.

Figure: Example of dual-colored highlighting

The solution is scalable, can be used in a variety of application platforms, and is simple for the customer to implement.

The SpeechStream Server solution consists of the following major components:

• The SpeechStream Server itself (which actually generates the audio)
• An optional speech cache device (to improve performance for repeat requests)
• End-user software (to communicate with the server and deliver the audio in the customer application)

Supported application environments include:

• HTML – a fully featured speech toolbar (the SpeechStream Toolbar) can be easily integrated into existing customer web pages.
• Flash – the Toolbar can be accessed by embedding it in a web page and making function calls from inside Flash. Additional direct server calls can be made.



2. SpeechStream Server

The SpeechStream Server is a dedicated computer that carries out the following functions:

• Accept speech requests from the client application
• Use a text-to-speech engine to generate audio for the supplied text
    o This can apply pronunciation rules to correct how words are spoken
• Supply an audio file (MP3) and timing information (XML) so the web application can stream the audio and highlight text as it is spoken.
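To illustrate how timing information can drive word highlighting, here is a minimal Python sketch. The XML layout (a `timings` root with `word` elements carrying `index` and `start` attributes, in milliseconds) is an assumption made purely for illustration; this overview does not describe the real SpeechStream timing format, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from bisect import bisect_right

# Hypothetical timing document. The real SpeechStream XML schema is not
# described in this overview.
TIMING_XML = """
<timings>
  <word index="0" start="0" text="The"/>
  <word index="1" start="180" text="quick"/>
  <word index="2" start="520" text="fox"/>
</timings>
"""

def load_word_starts(xml_text):
    """Return (start_ms, word_index) pairs sorted by start time."""
    root = ET.fromstring(xml_text.strip())
    pairs = [(int(w.get("start")), int(w.get("index"))) for w in root.iter("word")]
    return sorted(pairs)

def word_at(pairs, position_ms):
    """Return the index of the word being spoken at position_ms, or None."""
    starts = [start for start, _ in pairs]
    i = bisect_right(starts, position_ms) - 1
    return pairs[i][1] if i >= 0 else None

pairs = load_word_starts(TIMING_XML)
```

A player would call `word_at` with the current audio position on each tick and move the word highlight whenever the returned index changes.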

2.1 Caching

Speech generation and conversion of the output audio to MP3 files are computationally expensive. There are two potential “bottlenecks” in a speech server system:

• High load – when the number of users accessing the speech server is very high.
• Text-To-Speech performance – slower Text-To-Speech voices may not support large numbers of simultaneous users.

By using a cache, repeat requests for the same text can bypass the speech generation process entirely. In most speech-enabled applications the content is largely static, and a speech cache is highly beneficial.

If a particular speech engine has a lower level of performance, the audio content can be generated in advance by reading through all the content to ensure it is 100% cached. The speech engine on the server is then only used when new content is generated or existing content is updated.

• The speech server itself has a built-in cache that it uses to speed up repeat requests for a pre-generated text string.
• The speech server can also be configured to use an external cache, entirely separate from the speech server, for even faster performance in high load scenarios.
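The idea of generating audio in advance can be sketched as follows. This is an illustration only: the key scheme (a SHA-256 hash over voice, speed and text), the in-memory dictionary standing in for the cache, and the `fake_synthesize` stand-in are all assumptions, not a description of how SpeechStream stores or names cached audio.

```python
import hashlib

def cache_key(text, voice, speed):
    """Derive a stable key from everything that affects the audio (assumed scheme)."""
    raw = "\x1f".join([voice, str(speed), text]).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

def prewarm(cache, sentences, synthesize, voice="default", speed=100):
    """Generate audio for every sentence not yet cached; return how many were generated."""
    generated = 0
    for text in sentences:
        key = cache_key(text, voice, speed)
        if key not in cache:
            cache[key] = synthesize(text, voice, speed)
            generated += 1
    return generated

def fake_synthesize(text, voice, speed):
    # Stand-in for the expensive text-to-speech and MP3 conversion step.
    return f"MP3[{voice}@{speed}]:{text}".encode("utf-8")

cache = {}
content = ["Hello world.", "Welcome back.", "Hello world."]
first_pass = prewarm(cache, content, fake_synthesize)
second_pass = prewarm(cache, content, fake_synthesize)
```

Here the first pass generates audio only for the two distinct sentences, and the second pass generates nothing, which is the situation the text describes as 100% cached.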



2.2 Speech Server Configuration Options

Several different server configurations are possible, depending on customer requirements such as expected user load, dynamic versus static content and size of content.

Texthelp can advise customers which configuration best suits their needs. Texthelp can also offer consultation and advice on suitable customized solutions if the application does not fit exactly into the three configurations described here.

2.2.1 Explanation of Terms

Dynamic content is content that will change from one user session to the next. User-typed text, such as that typed in a form field on a webpage, is considered dynamic. Pages created from a content management system (such as a commercial website, or even blog-type material) are also dynamic.

Static content is content that remains the same, for all users, apart from occasional updates (such as corrections or new material).

An article is a notional quantity of content, equivalent to an A4 page.

Content size is a reference to the amount of textual content in a speech-enabled system. This is not a precise measure. Examples are as follows:

• A web application with 100s of individual articles would be considered small.
• A web application with 1000s or 10,000s of individual articles would be considered medium.
• A web application with 100,000s of individual articles or more would be considered large.

A cache server is a simple web server which acts as a file store for audio files. It does not require any special software or any royalty-bearing software – it could be a Linux server if required.



2.2.2 Texthelp-Hosted Speech Server

This configuration is ideal for low usage scenarios, where a customer wants to add speech to a relatively lightly-used system, with a small amount of content (either static or dynamic).

It is also useful for a prototype implementation of the speech server, before stepping up to a more scalable final implementation.

• All speech server resources are provided by Texthelp.
• End-user software (such as the SpeechStream Toolbar) is included in the customer web pages.
• The speech server has an integrated cache to improve performance.
• There is no additional cache for audio files.

Figure: Workflow for Texthelp-hosted speech server (Customer Web Server at the customer site; SpeechStream Server at Texthelp)

1. User accesses customer website.
2. Webpage is rendered by server and displayed to user in web browser.
3. User invokes speech via UI on website.
4. Texthelp software on webpage communicates with remote SpeechStream server.
5. Texthelp software on webpage highlights text and plays the audio to user.

Advantages:

• Simple integration for customer
• Ideal for lower volumes of usage
• No requirement for customer to host servers on-site
• No specialist technical resources are required to manage the servers


2.2.3 Texthelp-hosted Speech Server With External Cache

This configuration is intended for medium to high usage scenarios, with a medium volume of content that is mainly static.

• A speech server is provided by Texthelp.
• A cache server is provided by the customer.
• Texthelp end-user software (such as the SpeechStream Toolbar) is included in the customer web pages.
    o This will access the cache server for each audio request. If the required audio is not in the cache, the software will communicate with the remote speech server.
    o The Texthelp speech server will then stream the audio to the end user. It will also transfer the audio files to the customer cache server for subsequent speech requests.

Figure: Workflow for Texthelp-hosted speech server with external cache (Customer Web Server and Cache server at the customer site; SpeechStream Server at Texthelp)

1. User accesses customer website.
2. Webpage is rendered by server and displayed to user in web browser.
3. User invokes speech via UI on website.
4. Texthelp software on webpage looks for audio on cache server.
5. If audio is in cache, access it directly and play back to user with color highlighting.
6. If the audio is not cached, Texthelp software requests audio from remote SpeechStream server.
7. Texthelp software on webpage highlights text and plays the audio to user.
8. After the audio is generated for live playback, it will be transmitted to the cache server for repeat requests.
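The cache lookup and fallback in steps 4 to 8 of this workflow can be sketched in Python as below. The `CacheServer` and `SpeechServer` classes are hypothetical in-memory stand-ins, and using the raw text as the cache key is a simplification; none of these names come from the SpeechStream software.

```python
class CacheServer:
    """Stand-in for the customer-hosted file store."""
    def __init__(self):
        self.files = {}

    def get(self, key):
        return self.files.get(key)

    def put(self, key, audio):
        self.files[key] = audio

class SpeechServer:
    """Stand-in for the remote speech server."""
    def __init__(self, cache):
        self.cache = cache
        self.generated = 0

    def speak(self, key, text):
        self.generated += 1
        audio = b"MP3:" + text.encode("utf-8")
        self.cache.put(key, audio)  # step 8: store for repeat requests
        return audio

def request_speech(text, cache, server):
    key = text  # simplification: a real key would also cover voice and speed
    audio = cache.get(key)                            # step 4
    if audio is not None:
        return audio, "cache"                         # step 5
    return server.speak(key, text), "speech server"   # steps 6 and 7

cache = CacheServer()
server = SpeechServer(cache)
first = request_speech("Hello world.", cache, server)
second = request_speech("Hello world.", cache, server)
```

The first request falls through to the speech server and populates the cache; the second is answered from the cache without any speech generation.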



Advantages:

• Customer site only requires a simple web server to act as a cache.
• This gives the advantage of fast access to pre-cached content for the majority of speech requests, without the need to manage a more complex speech server and pay royalties for Windows-based software.



2.2.4 Customer-Hosted Speech Server

This configuration is intended for high usage scenarios, with a high volume of content. Overall implementation is similar to the Texthelp-hosted speech server with external cache described previously, except that all the software and hardware is managed by the customer (with assistance from Texthelp).

• SpeechStream Server software provided by Texthelp is installed on a customer server.
• Optionally, a cache server is provided by the customer.
• End-user software (such as the SpeechStream Toolbar) is included in the customer web pages.
    o This can access a cache server if required.
    o The speech server is located at the customer site.
    o A cache server can be updated across the network immediately rather than using FTP from a remote Texthelp server.

Figure: Workflow for Customer-hosted speech server with optional cache server (Customer Web Server, optional Cache server and customer-hosted SpeechStream Server, all at the customer site)

1. User accesses customer website.
2. Webpage is rendered by server and displayed to user in web browser.
3. User invokes speech via UI on website.
4. Texthelp software on webpage looks for audio on cache server (optional).
5. If audio is in cache, access it directly and play back to user with color highlighting.
6. If the audio is not cached, Texthelp software requests audio from customer’s SpeechStream server.
7. Texthelp software on webpage highlights text and plays the audio to user.
8. After the audio is generated for live playback, it will be transmitted to the cache server for repeat requests.



Advantages:

• Maximum performance for customer – dedicated speech server
• Optional cache server can be used to maximize performance in large deployments.



2.3 SpeechStream Server Specification and Performance

SpeechStream server performance depends on two main variables:

• Physical specification of the server that the Texthelp Speech Server is installed on
• Performance of the specific text-to-speech engine being used

2.3.1 System Requirements

The speech server must be installed on a 32-bit Windows server. Texthelp currently recommends Windows Server 2003. Both dedicated servers and cloud-based servers are supported.

2.3.2 Text To Speech Performance

Performance characteristics of Text To Speech can differ between vendors and even between different voices from a single vendor. Support for multi-threading and multi-core processors can vary. Texthelp can recommend the best voice for your implementation.

Using a standard Nuance voice, Scansoft Jill (American English Female), a server as detailed above can serve up to two million speech requests for average-length sentences in a 24-hour period.

Some speech engines may not equal this level of performance. Normally, this can be mitigated through use of one of the caching solutions outlined previously, where end users will only access the pre-cached audio rather than requiring live speech generation.
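To put the two-million-requests figure in per-second terms, the short Python calculation below converts it into an average sustained rate. It assumes the load is spread evenly across the 24 hours, which real traffic rarely is, so peak capacity planning would need more headroom than this average suggests.

```python
REQUESTS_PER_DAY = 2_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Average rate if requests were spread evenly over the day (an assumption).
requests_per_second = REQUESTS_PER_DAY / SECONDS_PER_DAY
ms_per_request = 1000 / requests_per_second

summary = f"{requests_per_second:.1f} requests/s, {ms_per_request:.1f} ms per request"
print(summary)  # prints "23.1 requests/s, 43.2 ms per request"
```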



2.4 Cache Server Specification and Performance

For scenarios where a cache system is being configured, a second server is necessary.

A live speech server is responsible for the generation of audio data and conversion to MP3 format for playback by the end-user software. Speech generation and MP3 conversion are both very expensive in terms of computer resources; in contrast, the cache is just a file store, and does not require the same level of heavyweight processing power as the live speech server.

System Requirements

• Server running a web server (Apache is recommended; any operating system can be used)
• FTP access
• Disk space, the amount depending on the website content

Typical figures for disk space requirements:

• A typical sentence of text returns 30KB of data (this is one speech request).
• A typical page of content contains around 100 sentences, requiring around 3MB.
• This can then be multiplied by the number of pages of content that are speech-enabled.
• The resulting value indicates the current minimum disk space required. Room for growth should be considered, as should any space required for the operating system and web server.

These figures do not include additional playback speeds or additional voices, each of which needs its own audio. If one sentence of text requires 30KB with a single voice and speed, then two will require 60KB, three will require 90KB, etc.

Actual values also depend on the specific voice being used and the complexity of the text content.
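The disk space arithmetic above can be written as a small Python estimator. The 30KB-per-sentence and 100-sentences-per-page figures come from the text; the function name, the `growth` allowance, and treating each extra voice or playback speed as a straight multiplier are assumptions made for this sketch.

```python
KB_PER_SENTENCE = 30       # typical figure quoted in the text
SENTENCES_PER_PAGE = 100   # typical figure quoted in the text

def cache_disk_kb(pages, voices=1, speeds=1, growth=0.0):
    """Estimate cache size in KB; growth=0.25 reserves 25% extra room (assumed)."""
    base = pages * SENTENCES_PER_PAGE * KB_PER_SENTENCE
    return base * voices * speeds * (1 + growth)

one_page_kb = cache_disk_kb(1)                            # 3000 KB, about 3MB
site_gb = cache_disk_kb(10_000, voices=2) / 1_000_000     # 60 GB in decimal units
```

For example, 10,000 speech-enabled pages with two voices come to roughly 60GB before any space for the operating system and web server is added.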

2.4.1 Scalability

When the cache server capacity is reached, further capacity should be obtained using load balancing. There are two ways to implement this:

• Via a hardware load balancer, with cache data synchronized between the cache servers.
• The end-user application can direct different groups of users to alternative cache servers.
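The second option, where the application itself directs groups of users to different cache servers, might look like the Python sketch below. The server URLs, the notion of a `group_id`, and the hash-based assignment are all hypothetical; the overview does not say how groups should be mapped to servers.

```python
import hashlib

# Placeholder addresses for two cache servers (assumed, not real hosts).
CACHE_SERVERS = ["https://cache1.example.com", "https://cache2.example.com"]

def cache_server_for(group_id, servers=CACHE_SERVERS):
    """Map a user group to one cache server, always the same one for that group."""
    digest = hashlib.sha256(group_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

chosen = cache_server_for("school-42")
```

Because the mapping is deterministic, a given group always reaches the same server, so each server only needs to hold the audio its own groups request.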



3. End-user software

In addition to the speech server itself, Texthelp also provides software to enable customer applications to offer speech easily.

3.1 SpeechStream Toolbar (HTML)

For HTML-based applications, the SpeechStream Toolbar offers a simple method to add speech support to your application. This toolbar is provided as JavaScript that is easily added to any webpage.

The implementation offers:

• Speech support toolbar, consisting of:
    o Speak text that the user clicks with the mouse
    o Speak text selections
    o English to Spanish single word translation (other languages available on request)
    o Fact Finder (look up selected text on a specific search engine)
    o Dictionary to provide definitions for English words from a 100,000 word dictionary (customizable on request)
    o Four color highlight options to annotate text
    o Clear highlights/collect highlights option
• Buttons can be hidden if required
• Color highlights can be persisted on a server
• Voice speed can be adjusted by the user

The toolbar is highly customizable. You can:

• Hide or show buttons using JavaScript
• Hide the toolbar completely and call the functionality from JavaScript (useful if you want to design a custom UI for speech, or create a UI that closely matches your own)
• Dock the toolbar at a static location on the page
• Customize the toolbar appearance (colors and graphics) if required
• Use a speech bubble mode for minimal user interface implementations



Please note: The SpeechStream Toolbar can only read HTML text content and alt text on images. It cannot read embedded Flash objects, PDF documents, ActiveX objects, Java objects or any non-text content.

Other features of the SpeechStream Toolbar and Server combination are:

• Your application can permit the user to change the voice if required. Otherwise, the application can use a pre-determined voice configuration:
    o Voice gender can be changed (a variety of male and female voices are available)
    o Voice speed can be changed (some readers prefer a slower speed to aid comprehension)
    o The language can be changed (Spanish, French and other non-English languages are available)
• Pronunciation can be fine-tuned in cases where uncommon words are incorrectly pronounced by the text-to-speech engine.
    o Examples of this include scientific terms, names or abbreviations.
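One simple way to picture pronunciation fine-tuning is a substitution table applied to the text before it reaches the speech engine. The Python sketch below illustrates that general technique only; the table entries and the function name are invented for this example, and the overview does not state how SpeechStream implements its pronunciation rules.

```python
import re

# Illustrative entries only; not taken from any Texthelp dictionary.
PRONUNCIATIONS = {"NaCl": "sodium chloride", "Dr.": "Doctor"}

def apply_pronunciations(text, rules=PRONUNCIATIONS):
    """Replace whole-word occurrences of each key with its spoken form."""
    if not rules:
        return text
    # Longest keys first so that overlapping entries resolve predictably.
    keys = sorted(rules, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in keys)
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: rules[m.group(1)], text)

spoken = apply_pronunciations("Dr. Lee added NaCl.")
```

The word-boundary checks keep the rule from firing inside longer tokens, so `NaCl2` in this sketch would be left unchanged.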

3.1.1 Web Browser Compatibility

The SpeechStream Toolbar will work on the following operating system and browser combinations. Adobe Flash 8, 9 or 10 is required in all cases.

• Windows:
    o Internet Explorer
    o Firefox
    o Google Chrome
• Apple Macintosh:
    o Firefox
    o Safari
    o Google Chrome

Support for newer versions of these major browsers will be added as soon as possible.

Please contact your Texthelp representative if you require further clarification of the browser support policy.



3.2 Flash

SpeechStream speech servers can also be accessed from Flash applications.

Due to the nature of Flash applications, it is not possible to provide a generic solution for speech with dual-colored word highlighting. Unlike HTML, Flash applications do not have a standard DOM (Document Object Model) that can be used in a generic speech solution.

Implementation of the user interface, text display and interaction with the user is therefore the responsibility of the Customer’s software developers.

Texthelp can provide support for speech-enabling text boxes in both AS2 and AS3. Direct access to the speech server is also possible, enabling the Customer to provide as much or as little speech as required.

Contact your Texthelp representative for further details.

3.3 Custom access

Some applications do not suit either the HTML-based SpeechStream Toolbar or the Flash approach. An example of this would be an application developed in Java.

For these applications, Texthelp can supply direct access to SpeechStream servers to obtain speech directly. Playback of the audio and user control is entirely the responsibility of the Customer’s application.
