Texthelp SpeechStream Overview
Rev #8 – November, 2011<br />
<strong>Texthelp</strong> <strong>SpeechStream</strong> <strong>Overview</strong><br />
Table of Contents<br />
1. Introduction<br />
2. <strong>SpeechStream</strong> Server<br />
2.1 Caching<br />
2.2 Speech Server Configuration Options<br />
2.2.1 Explanation of Terms<br />
2.2.2 <strong>Texthelp</strong>-Hosted Speech Server<br />
2.2.3 <strong>Texthelp</strong>-hosted Speech Server With External Cache<br />
2.2.4 Customer-Hosted Speech Server<br />
2.3 <strong>SpeechStream</strong> Server Specification and Performance<br />
2.3.1 Hardware and Operating System<br />
2.3.2 Text To Speech Performance<br />
2.4 Cache Server Specification and Performance<br />
2.4.1 Scalability<br />
3. End-user software<br />
3.1 <strong>SpeechStream</strong> Toolbar (HTML)<br />
3.1.1 Web Browser Compatibility<br />
3.2 Flash<br />
3.3 Custom access<br />
© Copyright <strong>Texthelp</strong> Systems Ltd. 2011<br />
TextHELP Systems, Inc.<br />
Tel: (888) 248-0652 • Fax: (866) 248-0652 • u.s.info@texthelp.com • www.texthelp.com
1. Introduction<br />
The <strong>Texthelp</strong> <strong>SpeechStream</strong> Server delivers high quality computer-generated speech for web-based<br />
applications, complete with synchronized, dual-color, word-by-word highlighting.<br />
It does not require the installation of any speech software on end-user computers.<br />
Example of dual-colored highlighting<br />
The solution is scalable, can be used in a variety of application platforms, and is simple for the customer<br />
to implement.<br />
The <strong>SpeechStream</strong> Server solution consists of the following major components:<br />
• The <strong>SpeechStream</strong> Server itself (which actually generates the audio)<br />
• An optional speech cache device (to improve performance for repeat requests)<br />
• End user software (to communicate with the server and deliver the audio in the customer<br />
application)<br />
Supported application environments include:<br />
• HTML – a fully featured speech toolbar (the <strong>SpeechStream</strong> Toolbar) can be easily integrated into<br />
existing customer web pages.<br />
• Flash – the toolbar can be embedded in the web page and accessed through function calls made from inside<br />
the Flash application. Direct server calls can also be made.<br />
2. <strong>SpeechStream</strong> Server<br />
The <strong>SpeechStream</strong> Server is a dedicated computer that carries out the following functions:<br />
• Accept speech requests from the client application<br />
• Use a text-to-speech engine to generate audio for the supplied text<br />
  o This can apply pronunciation rules to correct how words are spoken<br />
• Supply an audio file (MP3) and timing information (XML) so the web application can stream the<br />
audio and highlight text as it is spoken.<br />
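As a rough illustration of how timing information can drive word-by-word highlighting, the sketch below looks up the word being spoken at a given playback position. The timing data is a hypothetical list of (start time in milliseconds, word) pairs; the actual XML format returned by the server is not described in this overview, and `word_at` is an illustrative name rather than part of any Texthelp software.

```python
from bisect import bisect_right

# Hypothetical timing data: (start time in milliseconds, word).
# The real timing XML schema is not documented here; this is an assumption.
TIMINGS = [(0, "The"), (180, "quick"), (520, "brown"), (860, "fox")]


def word_at(timings, position_ms):
    """Return the index of the word being spoken at position_ms.

    Returns None if playback has not yet reached the first word.
    """
    starts = [start for start, _ in timings]
    # Find the last word whose start time is at or before the position.
    index = bisect_right(starts, position_ms) - 1
    return index if index >= 0 else None


if __name__ == "__main__":
    for position in (0, 600, 900):
        index = word_at(TIMINGS, position)
        print(position, TIMINGS[index][1])
```

A client would call a function like this on each playback progress event and move the highlight to the returned word.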
2.1 Caching<br />
Speech generation and conversion of output audio to MP3 files is computationally expensive. There are<br />
two potential “bottlenecks” with a speech server system:<br />
• High load – when the number of users accessing the speech server is very high.<br />
• Text-To-Speech performance – slower Text-To-Speech voices may not support large numbers of<br />
simultaneous users<br />
By using a cache, repeat requests for the same text can bypass the speech generation process entirely.<br />
In most speech-enabled applications the content is largely static, and a speech cache is highly<br />
beneficial.<br />
If a particular speech engine has a lower level of performance, the audio content can be generated in<br />
advance by reading through all the content to ensure it is 100% cached. The speech engine on the<br />
server is then only used when new content is generated or existing content is updated.<br />
• The speech server itself has a built-in cache that it uses to speed up repeat requests for a previously generated<br />
text string.<br />
• The speech server can also be configured to use an external cache, entirely separate from the<br />
speech server, for even faster performance in high-load scenarios.<br />
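The caching idea described above can be sketched in a few lines. This is a minimal in-memory stand-in, not the SpeechStream implementation: the class name `SpeechCache`, the cache key (a hash of voice, speed and text) and the `fake_tts` function are all assumptions made for illustration.

```python
import hashlib


class SpeechCache:
    """In-memory stand-in for a speech cache keyed by voice, speed and text."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # expensive text-to-speech call (hypothetical)
        self._store = {}
        self.misses = 0  # number of times live generation was needed

    @staticmethod
    def key(text, voice, speed):
        # Assumed key scheme: the same text with another voice or speed is a different entry.
        raw = f"{voice}|{speed}|{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get_audio(self, text, voice="default", speed=100):
        cache_key = self.key(text, voice, speed)
        if cache_key not in self._store:
            # Cache miss: generate the audio once and keep it for repeat requests.
            self.misses += 1
            self._store[cache_key] = self._synthesize(text, voice, speed)
        return self._store[cache_key]

    def prewarm(self, sentences, voice="default", speed=100):
        """Generate audio in advance by reading through all the content."""
        for sentence in sentences:
            self.get_audio(sentence, voice, speed)


def fake_tts(text, voice, speed):
    """Placeholder for a real text-to-speech engine."""
    return f"audio:{voice}:{speed}:{text}".encode("utf-8")


if __name__ == "__main__":
    cache = SpeechCache(fake_tts)
    cache.prewarm(["Hello.", "World."])
    cache.get_audio("Hello.")  # served from the cache, no new generation
    print(cache.misses)
```

Pre-warming corresponds to generating the audio in advance so that the speech engine is only used for new or updated content.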
2.2 Speech Server Configuration Options<br />
Several different server configurations are possible, depending on customer requirements<br />
such as expected user load, dynamic versus static content, and size of content.<br />
<strong>Texthelp</strong> can advise customers which configuration best suits their needs. <strong>Texthelp</strong> can also offer<br />
consultation and advice on suitable customized solutions if the application does not fit exactly into the<br />
three configurations described here.<br />
2.2.1 Explanation of Terms<br />
Dynamic content is content that will change from one user session to the next. User-typed text, such as<br />
that typed in a form field on a webpage, is considered dynamic. Pages created from a content<br />
management system (such as a commercial website, or even blog-type material) are also dynamic.<br />
Static content is content that remains the same, for all users, apart from occasional updates (such as<br />
corrections or new material).<br />
An article is a notional quantity of content, equivalent to an A4 page.<br />
Content size is a reference to the amount of textual content in a speech-enabled system. This is not a<br />
precise measure. Examples are as follows:<br />
• A web application with 100s of individual articles would be considered small.<br />
• A web application with 1000s or 10,000s of individual articles would be considered medium.<br />
• A web application with 100,000s of individual articles or more would be considered large.<br />
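The size bands above can be expressed as a small helper. Because content size is not a precise measure, the exact cut-off values used here (1,000 and 100,000 articles) are assumptions chosen to match the examples, and `content_size` is an illustrative name.

```python
def content_size(article_count):
    """Classify a speech-enabled system by its number of articles.

    The thresholds are assumed cut-offs based on the examples in the text,
    not precise limits.
    """
    if article_count < 0:
        raise ValueError("article_count must not be negative")
    if article_count < 1_000:
        return "small"   # 100s of articles
    if article_count < 100_000:
        return "medium"  # 1000s or 10,000s of articles
    return "large"       # 100,000s of articles or more


if __name__ == "__main__":
    for count in (300, 20_000, 500_000):
        print(count, content_size(count))
```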
A cache server is a simple web server which acts as a file store for audio files. It does not require any<br />
special software or any royalty-bearing software – it could be a Linux server if required.<br />
2.2.2 <strong>Texthelp</strong>-Hosted Speech Server<br />
This configuration is ideal for low usage scenarios, where a customer wants to add speech to a<br />
relatively lightly-used system, with a small amount of content (either static or dynamic).<br />
It is also useful for a prototype implementation of the speech server, before stepping up to a more<br />
scalable final implementation.<br />
• All speech server resources are provided by <strong>Texthelp</strong>.<br />
• End-user software (such as the <strong>SpeechStream</strong> Toolbar) is included in the customer web pages.<br />
• The speech server has an integrated cache to improve performance<br />
• There is no additional cache for audio files.<br />
Workflow for <strong>Texthelp</strong>-hosted speech server (diagram: Customer site with Customer Web Server; <strong>Texthelp</strong> site with <strong>SpeechStream</strong> Server)<br />
1. User accesses customer website.<br />
2. Webpage is rendered by server and displayed to user in web browser.<br />
3. User invokes speech via UI on website.<br />
4. <strong>Texthelp</strong> software on webpage communicates with remote <strong>SpeechStream</strong> server.<br />
5. <strong>Texthelp</strong> software on webpage highlights text and plays the audio to user.<br />
Advantages:<br />
• Simple integration for customer<br />
• Ideal for lower volumes of usage<br />
• No requirement for customer to host servers on-site<br />
• No specialist technical resources are required to manage the servers<br />
2.2.3 <strong>Texthelp</strong>-hosted Speech Server With External Cache<br />
This configuration is intended for medium to high usage scenarios, with a medium volume of content<br />
that is mainly static.<br />
• A speech server is provided by <strong>Texthelp</strong>.<br />
• A cache server is provided by the customer.<br />
• <strong>Texthelp</strong> end-user software (such as <strong>SpeechStream</strong> Toolbar) is included in the customer web<br />
pages.<br />
  o This will access the cache server for each audio request. If the required audio is not in<br />
  the cache, the software will communicate with the remote speech server.<br />
  o The <strong>Texthelp</strong> speech server will then stream the audio to the end user. It will also<br />
  transfer the audio files to the customer cache server for subsequent speech requests.<br />
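The lookup order described above (cache first, speech server on a miss, then populate the cache) can be sketched as follows. This is a simplified model under stated assumptions: the cache is a plain dictionary, the speech server is any callable, and `fetch_audio` is a hypothetical name. In the configuration described here it is the speech server, not the client, that transfers files to the cache server; the sketch folds that step into one function for brevity.

```python
def fetch_audio(key, cache, speech_server):
    """Return (audio, source) for a speech request.

    cache: dict-like store standing in for the customer cache server.
    speech_server: callable standing in for the remote speech server.
    """
    audio = cache.get(key)
    if audio is not None:
        # Cached audio is used directly; no speech generation takes place.
        return audio, "cache"
    # Cache miss: request live generation from the speech server.
    audio = speech_server(key)
    # Store the result so that subsequent requests are served from the cache.
    cache[key] = audio
    return audio, "speech-server"


if __name__ == "__main__":
    store = {}

    def demo_server(request_key):
        return b"mp3-" + request_key.encode("utf-8")

    print(fetch_audio("sentence-1", store, demo_server)[1])
    print(fetch_audio("sentence-1", store, demo_server)[1])
```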
Workflow for <strong>Texthelp</strong>-hosted speech server with external cache (diagram: Customer site with Customer Web Server and Cache server; <strong>Texthelp</strong> site with <strong>SpeechStream</strong> Server)<br />
1. User accesses customer website.<br />
2. Webpage is rendered by server and displayed to user in web browser.<br />
3. User invokes speech via UI on website.<br />
4. <strong>Texthelp</strong> software on webpage looks for audio on cache server.<br />
5. If audio is in cache, access it directly and play back to user with color highlighting.<br />
6. If the audio is not cached, <strong>Texthelp</strong> software requests audio from remote <strong>SpeechStream</strong> server.<br />
7. <strong>Texthelp</strong> software on webpage highlights text and plays the audio to user.<br />
8. After the audio is generated for live playback, it will be transmitted to the cache server for repeat requests.<br />
Advantages:<br />
• Customer site only requires a simple web server to act as a cache.<br />
• This gives the advantage of fast access to pre-cached content for the majority of speech requests,<br />
without the need to manage a more complex speech server and pay royalties for Windows-based<br />
software.<br />
2.2.4 Customer-Hosted Speech Server<br />
This configuration is intended for high usage scenarios, with a high volume of content. Overall<br />
implementation is similar to the <strong>Texthelp</strong>-hosted speech server with external cache described<br />
previously except all the software and hardware is managed by the customer (with assistance from<br />
<strong>Texthelp</strong>).<br />
• <strong>SpeechStream</strong> Server software provided by <strong>Texthelp</strong> is installed on a customer server.<br />
• Optionally, a cache server is provided by the customer.<br />
• End-user software (such as the <strong>SpeechStream</strong> Toolbar) is included in the customer web pages.<br />
  o This can access a cache server if required<br />
  o The speech server is located at the customer site<br />
  o A cache server can be updated across the network immediately rather than using FTP<br />
  from a remote <strong>Texthelp</strong> server.<br />
Workflow for Customer-hosted speech server with optional cache server (diagram: Customer site with Customer Web Server, optional Cache server, and <strong>SpeechStream</strong> Server hosted by Customer)<br />
1. User accesses customer website.<br />
2. Webpage is rendered by server and displayed to user in web browser.<br />
3. User invokes speech via UI on website.<br />
4. <strong>Texthelp</strong> software on webpage looks for audio on cache server (optional).<br />
5. If audio is in cache, access it directly and play back to user with color highlighting.<br />
6. If the audio is not cached, <strong>Texthelp</strong> software requests audio from customer’s <strong>SpeechStream</strong> server.<br />
7. <strong>Texthelp</strong> software on webpage highlights text and plays the audio to user.<br />
8. After the audio is generated for live playback, it will be transmitted to the cache server for repeat requests.<br />
Advantages:<br />
• Maximum performance for customer – dedicated speech server<br />
• Optional cache server can be used to maximize performance in large deployments.<br />
2.3 <strong>SpeechStream</strong> Server Specification and Performance<br />
<strong>SpeechStream</strong> server performance depends on two main variables:<br />
• Physical specification of the server that the <strong>Texthelp</strong> Speech Server is installed on<br />
• Performance of the specific text to speech engine being used<br />
2.3.1 System Requirements<br />
The speech server must be installed on a 32 bit Windows server. <strong>Texthelp</strong> currently recommends<br />
Windows Server 2003. Both dedicated servers and cloud based servers are supported.<br />
2.3.2 Text To Speech Performance<br />
Performance characteristics of Text To Speech can differ between vendors and even between different<br />
voices from a single vendor. Support for multi-threading and multi-core processors can vary. <strong>Texthelp</strong><br />
can recommend the best voice for your implementation.<br />
Using a standard Nuance voice, Scansoft Jill (American English Female), a server as detailed above can<br />
process up to two million speech requests for average-length sentences in a 24-hour period.<br />
Some speech engines may not equal this level of performance. Normally, this can be mitigated through<br />
use of one of the caching solutions outlined previously, where end users will only access the pre-cached<br />
audio rather than requiring live speech generation.<br />
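To put the quoted figure in perspective, two million requests in 24 hours is an average of roughly 23 requests per second. The sketch below does that arithmetic and also estimates how many requests still need live generation for a given cache hit ratio. The hit ratio is an assumed input for illustration, not a measured value, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_requests_per_second(requests_per_day):
    """Average request rate implied by a daily request total."""
    return requests_per_day / SECONDS_PER_DAY


def live_requests_per_day(total_requests_per_day, cache_hit_ratio):
    """Requests that still require live speech generation.

    cache_hit_ratio is the assumed fraction (0.0 to 1.0) served from a cache.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_requests_per_day * (1.0 - cache_hit_ratio)


if __name__ == "__main__":
    print(round(average_requests_per_second(2_000_000), 1))
    print(round(live_requests_per_day(10_000_000, 0.9)))
```

These are averages only; real traffic is usually uneven, so peak load should be considered separately.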
2.4 Cache Server Specification and Performance<br />
For scenarios where a cache system is being configured, a second server is necessary.<br />
A live speech server is responsible for the generation of audio data and conversion to MP3 format for<br />
playback by the end user software. Speech generation and MP3 conversion are both very expensive in<br />
terms of computer resources; in contrast, the cache is just a file store, and does not require the same<br />
level of heavyweight processing power as the live speech server.<br />
System Requirements<br />
• A server running a web server (Apache is recommended; any operating system can be used)<br />
• FTP access<br />
• Disk space sized according to the website content<br />
Typical figures for disk space requirements suggest:<br />
• A typical sentence of text returns 30KB of data (this is one speech request).<br />
• A typical page of content contains around 100 sentences, requiring around 3MB.<br />
• This can then be multiplied by the number of pages of content that are speech-enabled.<br />
• The resulting value indicates the current minimum disk space required. Room for growth should<br />
be considered, as should any space required for the operating system and web server.<br />
These figures do not include additional playback speeds or additional voices. If a<br />
sentence of text requires 30KB for one voice and speed, then two will require 60KB, three will require 90KB, etc.<br />
Actual values also depend on the specific voice being used and the complexity of the text content.<br />
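The disk space estimate above is simple multiplication, sketched here with the typical figures from the text (30KB per sentence, 100 sentences per page). The `variants` parameter is an assumption: it models each additional voice or playback speed as a further copy of similar size. Function names are illustrative.

```python
KB_PER_SENTENCE = 30      # typical size of one speech request, from the text
SENTENCES_PER_PAGE = 100  # typical page of content, from the text


def cache_disk_kb(pages, variants=1):
    """Estimate the minimum cache size in KB.

    variants: assumed number of voice and speed combinations to cache,
    each treated as a separate copy of similar size.
    """
    return pages * SENTENCES_PER_PAGE * KB_PER_SENTENCE * variants


def kb_to_gb(kilobytes):
    """Convert KB to GB using decimal units (1 GB = 1,000,000 KB)."""
    return kilobytes / 1_000_000


if __name__ == "__main__":
    # 10,000 speech-enabled pages with a single voice and speed.
    print(kb_to_gb(cache_disk_kb(10_000)))
```

The result is a minimum for cached audio only; room for growth, the operating system and the web server need to be added on top.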
2.4.1 Scalability<br />
When the cache server capacity is reached, further capacity should be obtained using load<br />
balancing. There are two ways to implement this:<br />
• Via a hardware load balancer, with cache data synchronized between the cache servers.<br />
• Via the end-user application, which can direct different groups of users to alternative cache servers.<br />
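The second option, directing groups of users to different cache servers, could be done with a stable hash of a group identifier. This is one possible approach and not something the overview prescribes; the group identifiers, the host names and the function `cache_server_for` are all hypothetical.

```python
import hashlib


def cache_server_for(group_id, servers):
    """Pick a cache server for a user group in a stable, repeatable way.

    group_id: any string identifying a group of users (hypothetical).
    servers: non-empty list of cache server host names (hypothetical).
    """
    if not servers:
        raise ValueError("at least one cache server is required")
    # A cryptographic hash gives the same result on every run and machine,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(group_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    hosts = ["cache1.example.com", "cache2.example.com"]
    for group in ("school-1", "school-2", "school-3"):
        print(group, cache_server_for(group, hosts))
```

A simple modulo scheme like this reassigns many groups when a server is added or removed, which matters if the caches are not synchronized with each other.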
3. End-user software<br />
In addition to the speech server itself, <strong>Texthelp</strong> also provides software to enable customer applications<br />
to offer speech easily.<br />
3.1 <strong>SpeechStream</strong> Toolbar (HTML)<br />
For HTML-based applications, the <strong>SpeechStream</strong> Toolbar offers a simple method to add speech support<br />
to your application. This toolbar is provided as JavaScript that is easily added to any webpage.<br />
The implementation offers:<br />
• Speech support toolbar, consisting of:<br />
  o Speak text that the user clicks with the mouse<br />
  o Speak text selections<br />
  o English to Spanish single word translation (other languages available on request)<br />
  o Fact Finder (look up selected text on a specific search engine)<br />
  o Dictionary to provide definitions for English words from a 100,000 word dictionary<br />
  (customizable on request)<br />
  o Four color highlight options to annotate text<br />
  o Clear highlights/collect highlights option<br />
• Buttons can be hidden if required<br />
• Color highlights can be persisted on a server<br />
• Voice speed can be adjusted by the user<br />
The toolbar is highly customizable. You can:<br />
• Hide or show buttons using JavaScript<br />
• Hide the toolbar completely and call the functionality from JavaScript (useful if you want to design<br />
a custom UI for speech, or create a UI that closely matches your own)<br />
• Dock the toolbar at a static location on the page<br />
• Customize the toolbar appearance (colors and graphics) if required<br />
• Use a speech bubble mode for minimal user interface implementations<br />
Please note: The <strong>SpeechStream</strong> Toolbar can only read HTML text content and the alt text of images. It<br />
cannot read embedded Flash objects, PDF documents, ActiveX objects, Java objects or any non-text<br />
content.<br />
Other features of the <strong>SpeechStream</strong> Toolbar and Server combination are:<br />
• Your application can permit the user to change the voice if required. Otherwise, the application<br />
can use a pre-determined voice configuration:<br />
  o Voice gender can be changed (a variety of male and female voices are available)<br />
  o Voice speed can be changed (some readers prefer a slower speed to aid<br />
  comprehension)<br />
  o The language can be changed (Spanish, French and other non-English languages are<br />
  available)<br />
• Pronunciation can be fine-tuned in cases where uncommon words are incorrectly pronounced by<br />
the text to speech engine.<br />
  o Examples of this include scientific terms, names or abbreviations.<br />
3.1.1 Web Browser Compatibility<br />
The <strong>SpeechStream</strong> Toolbar will work on the following operating system and browser combinations.<br />
Adobe Flash 8, 9 or 10 is required in all cases.<br />
• Windows:<br />
  o Internet Explorer<br />
  o Firefox<br />
  o Google Chrome<br />
• Apple Macintosh:<br />
  o Firefox<br />
  o Safari<br />
  o Google Chrome<br />
Support for newer versions of these major browsers will be added as soon as possible.<br />
Please contact your <strong>Texthelp</strong> representative if you require further clarification of the browser support<br />
policy.<br />
3.2 Flash<br />
<strong>SpeechStream</strong> speech servers can also be accessed from Flash applications.<br />
Due to the nature of Flash applications, it is not possible to provide a generic solution for speech with<br />
dual-colored word highlighting. Unlike HTML, Flash applications do not have a standard DOM<br />
(Document Object Model) that can be used in a generic speech solution.<br />
Implementation of the user interface, text display and interaction with the user is therefore the<br />
responsibility of the Customer’s software developers.<br />
<strong>Texthelp</strong> can provide support for speech-enabling text boxes in both AS2 and AS3. Direct access to the<br />
speech server is also possible, enabling the Customer to provide as much or as little speech as required.<br />
Contact your <strong>Texthelp</strong> Representative for further details.<br />
3.3 Custom access<br />
Some applications do not suit either the HTML-based <strong>SpeechStream</strong> Toolbar or the Flash approach. An<br />
example of this would be an application developed in Java.<br />
For these applications, <strong>Texthelp</strong> can supply direct access to <strong>SpeechStream</strong> servers to obtain speech<br />
directly. Playback of the audio and user control is entirely the responsibility of the Customer’s<br />
application.<br />