SpeechWorks Speechify 2.1.5 User's Guide - Avaya Support

User’s Guide

SPEECHIFY 2.1.5




Dynamic parameter configuration  2-30
Character set support  2-31
Operational aids  2-31
Implementation guidelines  2-31

3. Standard Text Normalization
Numbers  3-34
Numeric expressions  3-35
Mixed alphanumeric tokens  3-35
Abbreviations  3-36
  Ambiguous abbreviations  3-36
  Periods after abbreviations  3-36
  Measurement abbreviations  3-37
E-mail addresses  3-37
URLs  3-38
Pathnames and filenames  3-39
Punctuation  3-40
Parentheses  3-41
Hyphen  3-41
Slash  3-42

4. Embedded Tags
Speechify tag format  3-44
Creating pauses  3-44
Indicating the end of a sentence  3-45
Customizing pronunciations  3-46
  Language-specific customizations  3-46
  Character spellout modes  3-47
  Pronouncing numbers and years  3-49
  Customizing word pronunciations  3-50
Inserting bookmarks  3-50
Controlling the audio characteristics  3-51
  Volume control  3-52
  Speaking rate control  3-53

5. Symbolic Phonetic Representations
SPR format  4-56
Syllable boundaries  4-56
Syllable-level information  4-56
Speech sound symbols  4-57

6. User Dictionaries
Overview of dictionaries  5-60
Main dictionary  5-60
  Additional notes on main dictionary entries  5-61
Abbreviation dictionary  5-61
  Interpretation of trailing periods in the abbreviation dictionary  5-61
Root dictionary  5-62
Choosing a dictionary  5-63
  Main and abbreviation dictionaries vs. root dictionary  5-64
  Main dictionary vs. abbreviation dictionary  5-64
  Dictionary interactions  5-65
File-based dictionaries  5-66
  Dictionary loading  5-67
  File names and locations  5-68
  File format  5-68

7. API Reference
Calling convention  6-70
Server’s preferred character set  6-70
Result codes  6-71
SWIttsAddDictionaryEntry( )  6-73
SWIttsCallback( )  6-75
SWIttsClosePort( )  6-80
SWIttsDeleteDictionaryEntry( )  6-81
SWIttsGetDictionaryKeys( )  6-83
SWIttsGetParameter( )  6-86
SWIttsInit( )  6-88
SWIttsLookupDictionaryEntry( )  6-89
SWIttsOpenPort( )  6-91
SWIttsPing( )  6-92
SWIttsResetDictionary( )  6-93
SWIttsSetParameter( )  6-94
SWIttsSpeak( )  6-96
SWIttsStop( )  6-98
SWIttsTerm( )  6-99

8. Performance and Sizing
Test scenario and application  7-102
Performance statistics  7-103
  Latency (time-to-first-audio)  7-103
  Real-time factor  7-103
  Audio buffer underflow  7-104
Resource consumption  7-105
  Server CPU utilization  7-105
  Memory use  7-105
  Measuring server memory use on Windows NT/2000  7-106
Performance thresholds  7-107

Appendix A: Speechify Logging
Overview of logs  A-109
  Error log  A-109
  Diagnostic log  A-110
  Event log (server only)  A-110
Client logging  A-110
  Overview of client logs  A-110
  Enabling diagnostic logging  A-111
  Logging to stdout/stderr  A-111
Server logging  A-112
  Default logging behavior  A-112
  Controlling logging with command-line options  A-112
  Log file size and roll-over  A-113
  Error message format  A-114
  Event logging  A-114

Appendix B: SSML

Appendix C: SAPI 5
Compliance  C-123
SAPI voice properties  C-125

Appendix D: Frequently Asked Questions
Question types  D-128
Changing rate or volume  D-128
Rewinding and fast-forwarding  D-129
Java interface  D-129
E-mail pre-processor and SAPI 5  D-129
SSML and SAPI 5  D-129
Error codes 107 and 108  D-130
Speechify 2.1 and Speechify 2.0 voices  D-131
Connecting to the server  D-131
Finding voices  D-132
Port types  D-132

Index

Fifth Edition, Update 2  SpeechWorks Proprietary




Preface

Welcome to SpeechWorks

Speechify is SpeechWorks International, Inc.’s state-of-the-art Text-To-Speech (TTS) system. This guide is written for application developers who want to add Speechify’s high-quality text-to-speech functionality to their applications.

New and changed information

Changes in 2.1.5

Installation changes

❏ There are updates to various installation commands in Chapter 1.
❏ The supported Red Hat version has changed. See page 1-3 and page 1-6.

New conceptual overviews

❏ Chapter 3 is new. It explains the automatic normalization performed by Speechify on your input text.


❏ Appendix D contains elaborations on “Frequently Asked Questions”.

Embedded tags

❏ A new tag is added for end of sentence. See page 4-45.

Symbolic phonetic representations (SPRs)

❏ There are brief additions to the sections on syllable boundaries and marking syllable-level stress (page 5-56).

User dictionaries

❏ A new section compares each of the dictionaries and explains which dictionary to use for which situations. See page 6-63.
❏ The ability to create dictionary files, in addition to defining them at run time with API functions, is described on page 6-66.

API functions

❏ Details are added to the description of the character set parameters for various functions:
  “SWIttsAddDictionaryEntry( )” on page 7-73
  “SWIttsDeleteDictionaryEntry( )” on page 7-81
  “SWIttsLookupDictionaryEntry( )” on page 7-89
❏ Small clarifications are added to the SWIttsSpeak( ) function. See page 7-96.

SAPI

❏ Added text and new supported output formats. See step 3 on page C-124.

Changes in 2.1.3

Changes in this release are not indicated with change bars.

This release has the following documentation changes:


❏ Chapter 1 now lists the installation directories and the files installed in each for Red Hat Linux and Sun Solaris.
❏ A new command option (default_contenttype) is available when running Speechify from a console. See page 1-13.
❏ Speechify now provides tags to set output audio characteristics for volume and speaking rate. The section “Controlling the audio characteristics” on page 4-51 describes this new feature.
❏ File-based dictionaries are described in “User Dictionaries” on page 6-59.
❏ Dictionary keys in the root dictionary must contain at least one vowel. See “Beware of keys without vowels” on page 6-63.
❏ To support the volume/rate control feature, these parameters are now available via SWIttsSetParameter( ) and SWIttsGetParameter( ): tts.audio.rate and tts.audio.volume.
❏ You can now query the current Speechify version number via the SWIttsGetParameter( ) function with the tts.client.version parameter.
❏ The SSML element “prosody” is now partially supported. See “SSML” on page B-119.
❏ In the SPR chapter, information about “Syllable Stress” has been moved to each language version of the Speechify Language Supplement. This was done because SPRs can handle stress, tone, and accent differently in each language.
❏ Event logging tokens have been added to Appendix A.
❏ You can now log diagnostic messages to a file. See “Enabling diagnostic logging” on page A-111.

Roadmap

Recommended reading (available documentation)

The following documents are available for developers using Speechify TTS applications:


❏ The Speechify User’s Guide provides installation, programming, and reference information about the Speechify product.
❏ The E-mail Pre-processor Developer’s Guide covers the types of input that the SpeechWorks E-mail Pre-processor handles, how it processes the messages, the modes that the application can take advantage of at run time, the layout and use of the E-mail substitution dictionary, and the API functions.
❏ There is a Speechify Language Supplement for each supported language. These supplements contain language-specific reference information for application developers.
❏ Review the release notes distributed with this product for the latest information, restrictions, and known problems.

SpeechWorks Institute

See the SpeechWorks Institute page at http://www.speechworks.com or send e-mail to training@speechworks.com for details about available training courses.

Support services

To receive technical support from SpeechWorks International, Inc.:

❏ Visit the Knowledge Base or ask a question at: http://www.speechworks.com/training/tech_support.cfm
❏ Ask for “technical support” at +1 617 428-4444

See Appendix A “Speechify Logging” for information on how to collect diagnostics that SpeechWorks may use to diagnose your problem.

How this guide is organized

The chapters of this document cover these topics:


Chapter 1 “Installing and Configuring Speechify” describes the installation requirements and installation packages.

Chapter 2 “Programming Guide” contains an overview of features, and high-level information on groups of API functions.

Chapter 3 “Standard Text Normalization” explains the automatic normalization that Speechify performs on input text.

Chapter 4 “Embedded Tags” describes tags that users can insert into input text to customize the speech output in a variety of ways.

Chapter 5 “Symbolic Phonetic Representations” describes the phonetic spelling the user can enter to explicitly specify word pronunciations.

Chapter 6 “User Dictionaries” describes the user dictionaries available for customizing the pronunciations of words, abbreviations, acronyms, and other sequences.

Chapter 7 “API Reference” describes the API functions.

Chapter 8 “Performance and Sizing” summarizes the current performance numbers and how SpeechWorks arrives at those numbers.

Appendix A “Speechify Logging” describes how to enable and use the error log, diagnostic log, and event log.

Appendix B “SSML” describes Speechify’s compliance with the SSML specification.

Appendix C “SAPI 5” describes Speechify’s compliance with the SAPI 5 interface.




CHAPTER 1

Installing and Configuring Speechify

This chapter describes the installation requirements and the installation packages for the Speechify client and server on the following platforms:

❏ Windows NT or 2000
❏ Red Hat Linux
❏ Sun Solaris

In this chapter

• Hardware and software requirements on page 1-2
• Installation on Windows NT/2000 on page 1-4
• Installation on Red Hat Linux on page 1-6
• Installation on Solaris on page 1-9
• Running the Speechify server from the console on page 1-13
• Running Speechify as a Windows Service on page 1-17


Hardware and software requirements

Server run-time hardware and software requirements

Speechify requires the following minimum hardware and software to run:

1. 500 MHz Intel Pentium III-based computer.
2. Memory requirements vary from one Speechify voice to another because each uses a different amount to start. See the readme file for each voice for details. Readme files are named readme-language-voice.html, e.g., readme-en-US-Mara.html. The readme files are stored in the root directory on the CD-ROM. In general, for a box that supports 72 ports, we recommend a minimum of 512 MB of RAM for Unix systems and 768 MB of RAM for Windows systems.
3. The Speechify server run-time software requires 15 MB of free disk space. Each Speechify voice has disk space requirements beyond that, which vary from voice to voice. See the readme file for each voice for its disk space requirements.
4. Your system paging file size must be at least 300 MB.
5. CD-ROM drive to install the software.
6. Network Interface Card.
7. One of the following operating systems, configured for TCP/IP networking:
   a. Microsoft Windows NT Workstation or Server 4.0 U.S. version with Service Pack 6a
   b. Microsoft Windows 2000 Server or Professional U.S. version with Service Pack 1
   c. Red Hat Linux 7.2
   d. Sun Solaris 2.7 for SPARC or Intel i386
8. Microsoft Windows users must have Internet Explorer 5.0 or later installed on their machine. To download Internet Explorer, go to: http://www.microsoft.com/windows/ie/
9. Adobe Acrobat Reader 3.01 or later for reading the full Speechify documentation set.


Client SDK run-time hardware and software requirements

The minimum hardware and software requirements for the application development tools within the Speechify SDK are:

1. 300 MHz Intel Pentium III-based computer.
2. 64 MB of RAM.
3. 10 MB of free disk space for the Speechify Software Development Kit (SDK).
4. CD-ROM drive to install the software.
5. Network Interface Card.
6. One of the following operating systems, configured for TCP/IP networking:
   a. Microsoft Windows NT Workstation or Server 4.0 U.S. version with Service Pack 6
   b. Microsoft Windows 2000 Server or Professional U.S. version with Service Pack 1
   c. Red Hat Linux 7.2
   d. Sun Solaris 2.7 for SPARC or Intel i386
7. Microsoft Windows users must have Internet Explorer 5.0 or later installed on their machine. To download Internet Explorer, go to: http://www.microsoft.com/windows/ie/
8. A C/C++ compiler:
   a. On Windows systems, SpeechWorks builds and tests with Microsoft Visual C++ 6 with Service Pack 3. Sample applications are provided with Visual C++ project files.
   b. On Unix systems, SpeechWorks builds and tests with GNU GCC 2.95.2, which you can download from: http://www.gnu.org/software/gcc/gcc-2.95/gcc-2.95.2.html
9. Adobe Acrobat Reader 3.01 or later for reading the full Speechify documentation set.


Installation on Windows NT/2000

Internet Explorer 5.0 or later must be installed before installing Speechify.

1. Log into Windows NT/2000 as the Administrator.
2. Insert the Speechify CD into the CD-ROM drive.
3. The installer should run automatically; if not, open the CD-ROM drive in Windows Explorer and run setup.exe.
4. Follow the on-screen instructions, reading the Welcome screen and the Software License Agreement. If you choose a “Typical” installation, the Speechify installer asks no further questions and installs the following components to C:\Program Files\Speechify:
   • Speechify server run-time
   • Speechify client run-time
   • Speechify client SDK
   To override these defaults and choose a different install location or different components, choose “Custom.” A dialog box appears for customizing the install.
5. When prompted, you should read the Speechify Release Notes to familiarize yourself with any special programming considerations and known bugs.
6. You must install a Speechify voice before you can run the server. All voice installations are described in the Speechify Language Supplement that corresponds to the language being spoken. See the appropriate supplement for more information.

In addition, the Speechify installation does the following:

❏ Installing the client SDK creates an environment variable named SWITTSSDK which points to the path where the SDK is installed.
❏ The Speechify installation uses Windows Installer technology and updates the copy on your machine if necessary before proceeding with the install.
❏ Speechify uses the Microsoft Management Console (MMC) for server configuration. The MMC comes with Windows 2000 by default but not with Windows NT. If your NT system does not have the MMC, Speechify installs it.


❏ If you choose to install the SAPI interface component, the Speechify installer only installs the SAPI 5 run-time components. It does not include the SAPI 5 SDK, which you can download from http://www.microsoft.com/speech/speechsdk/sdk5.asp. You need to download the SDK to have access to SAPI 5 documentation and sample code.
❏ The Speechify server is automatically configured to start at each reboot.

Uninstalling on Windows NT/2000

To uninstall Speechify:

1. Go to Start >> Settings >> Control Panel >> Add/Remove Programs.
2. On the Install/Uninstall tab, locate Speechify 2.1 for Windows in the list of installed applications and click the Add/Remove button.
3. Follow the instructions to uninstall Speechify.


Installation on Red Hat Linux

Installing the Speechify server

1. Log into Linux as root.
2. Insert the Speechify CD into the CD-ROM drive.
3. Open a shell window and mount the CD-ROM, then change to the directory where the CD-ROM is mounted. (This directory is usually /mnt/cdrom/.)
4. Copy the server files onto the machine:
   rpm --install Speechify-Engine-2.1-5.i386.rpm
5. You must install a Speechify voice before you can run the server. All voice installations are described in a Speechify Language Supplement that corresponds to the language being spoken. See the appropriate supplement for more information.

Installing the Speechify client

1. Log into Linux as root.
2. Insert the Speechify CD into the CD-ROM drive.
3. Open a shell window and mount the CD-ROM, then change to the directory where the CD-ROM is mounted. (This directory is usually /mnt/cdrom/.)
4. Copy the client files onto the machine:
   rpm --install Speechify-Client-2.1-5.i386.rpm


<strong>Speechify</strong> User’s <strong>Guide</strong>NOTEBy default, the above packages install to /usr/local/bin and /usr/local/<strong>Speechify</strong>. To relocate the packages, specify the --prefix option to install toanother directory, e.g., rpm --install --prefix /home/<strong>Speechify</strong> <strong>Speechify</strong>-Engine-2.1-5.i386.rpm.Installed file locationsThis table shows where certain files are located after installing various packages:Directory/usr/local/bin//usr/local/<strong>Speechify</strong>/Voices//usr/local/<strong>Speechify</strong>/doc//usr/local/<strong>Speechify</strong>/lib//usr/local/<strong>Speechify</strong>/include//usr/local/<strong>Speechify</strong>/bin//usr/local/<strong>Speechify</strong>/samples//usr/lib/Descriptionengine/server binaryvoice datadocumentation filese-mail preprocessor dictionaryheader files for SWI librariespre-built sample applicationssource for sample applicationslibrary files for <strong>Speechify</strong> client, e-mailUninstalling the <strong>Speechify</strong> server1. Log into Linux as root.2. First, uninstall any voices you have installed. See the appropriate <strong>Speechify</strong>Language Supplement for details.<strong>SpeechWorks</strong> Proprietary Fifth Edition, Update 21–7


Installing and Configuring <strong>Speechify</strong><strong>Speechify</strong> User’s <strong>Guide</strong>3. Next, uninstall the <strong>Speechify</strong> server/engine files:rpm --erase <strong>Speechify</strong>-Engine-2.1-5Uninstalling the <strong>Speechify</strong> client1. Log into Linux as root.2. Next, uninstall the <strong>Speechify</strong> client files:rpm --erase <strong>Speechify</strong>-Client-2.1-5Fifth Edition, Update 21–8<strong>SpeechWorks</strong> Proprietary


Installation on Solaris

By default, the packages below install in /opt/Speechify/bin. To relocate the packages, specify the -R option to install to another directory, e.g.:

pkgadd -R /home/Speechify Speechify.Engine.2.1.5.i386.pkg

When installing these packages on a SPARC machine, replace the “i386” in the example filename with “sparc”. Also, if you are logged in as root and using gunzip, you may have to explicitly set the path to the gunzip executable. For example:

/usr/local/bin/gunzip Speechify.Engine.2.1.5.i386.pkg.tar.gz

Installing the Speechify server

1. Log into Solaris as root.
2. Insert the Speechify CD into the CD-ROM drive and mount the CD-ROM.
3. Copy the packages to an intermediate directory which is large enough to accommodate them.
4. Change to the directory you just created.
5. Unpack the server software:
   gunzip Speechify.Engine.2.1.5.i386.pkg.tar.gz
   tar xvf Speechify.Engine.2.1.5.i386.pkg.tar
   Then add the package:
   pkgadd -d . ttsEngine
6. You must install a Speechify voice before you can run the server. All voice installations are described in a Speechify Language Supplement that corresponds to the language being spoken. See the appropriate supplement for more information.
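The i386-to-SPARC filename substitution described in this section can be sketched in the shell. This is only an illustration; the filename is the one shown above, and sed is used here purely to derive the SPARC name:

```shell
# Derive the SPARC package filename from the i386 filename shown above.
pkg="Speechify.Engine.2.1.5.i386.pkg.tar.gz"
echo "$pkg" | sed 's/i386/sparc/'
# → Speechify.Engine.2.1.5.sparc.pkg.tar.gz
```

The same substitution applies to the client package and the unpacked .tar filename.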


Installing the Speechify client

1. Log into Solaris as root.

2. Insert the Speechify CD into the CD-ROM drive and mount the CD-ROM.

3. Copy the packages to an intermediate directory which is large enough to accommodate them.

4. Change to the directory you just created.

5. Unpack the client software:

   gunzip Speechify.Client.2.1.5.i386.pkg.tar.gz
   tar xvf Speechify.Client.2.1.5.i386.pkg.tar

   Add the package:

   pkgadd -d . ttsClient

Installed file locations

This table shows where certain files are located after installing various packages:

Directory                      Description
/opt/Speechify/bin/            engine/server binary
/opt/Speechify/doc/            documentation files
/opt/Speechify/Voices/         voice data
/opt/Speechify/SDK/lib/        e-mail preprocessor dictionary
/opt/Speechify/SDK/include/    header files for SWI libraries
/opt/Speechify/SDK/bin/        pre-built sample applications
/opt/Speechify/SDK/samples/    source for sample applications
/usr/lib/                      library files for Speechify client, e-mail


Configuring shared memory

If you run the Speechify server on Solaris without configuring a large enough shared memory segment, the server prints an error and exits. Typically, Solaris machines default to allowing only 1 MB of shared memory per segment, whereas Speechify requires a much larger segment.

To set the maximum shared memory segment size on Solaris 2.7 and 2.8, alter the value of the shmmax kernel parameter. This parameter is an upper limit which the system checks to see if it has enough resources to create the memory segment.

Increasing the size of the shared memory

Edit /etc/system (save a backup copy first). Read the comments near the bottom of the file to find the appropriate point for setting an integer variable or module, and add the following line:

   set shmsys:shminfo_shmmax=196608000

Save, then reboot the machine for the setting to take effect. To check the setting, issue the sysdef command and grep for SHMMAX or "max shared memory segment." The Speechify server was tested with a setting of 196608000, which yielded good results.

Uninstalling the Speechify server

1. Log into Solaris as root.

2. First, uninstall any voices you have installed. See the appropriate Speechify Language Supplement for details.

3. Next, uninstall the Speechify server/engine files:

   pkgrm ttsEngine


Uninstalling the Speechify client

1. Log into Solaris as root.

2. Next, uninstall the Speechify client files:

   pkgrm ttsClient


Running the Speechify server from the console

Starting Speechify

The Speechify server can be started from the command line on all platforms. (It can also run as a service, or daemon, on Windows. See "Running Speechify as a Windows Service" on page 1-17.)

The Speechify binary is located in the /bin directory of the Speechify install directory. On UNIX platforms the binary is named Speechify; on Windows, Speechify.exe.

The following table lists the command-line options and their descriptions. Some switches are required and are noted as such. Most switches require additional arguments.

--default_contenttype <type>
    Specifies a MIME content type to use for speak requests when the content_type parameter to SWIttsSpeak( ) is NULL. See "SWIttsSpeak( )" on page 7-96 for a list of supported content types.
    Because most shells treat a semicolon as a special character, use a backslash to escape it; for example:
       --default_contenttype text/plain\;charset=us-ascii

--name <name>
    (Required.) Specifies the name of the voice that Speechify loads. For example: "--name mara" or "--name rick".

--format <format>
    (Required.) Specifies the audio format of the voice database that Speechify loads.
    ❏ 8: 8 kHz, 8-bit µ-law
    ❏ 16: 16 kHz, 16-bit linear
    For example: "--format 8".

--language <language>
    (Required.) Specifies the language of the voice database that Speechify loads. The argument consists of a 2-letter language identifier, a hyphen, and a 2-letter country code. For example:
    ❏ US English: "--language en-US"
    ❏ Parisian French: "--language fr-FR"


--port <port>
    (Required.) Specifies the sockets port where the Speechify server should listen for incoming connections. If you are running multiple instances of Speechify on one server machine, this number should be different for each instance. Here the term "port" refers to a sockets (networking) port, not to a single connection created between the client and the server by the SWIttsOpenPort( ) function; you can have multiple client connections to the same sockets port. This affects the parameters you pass to SWIttsOpenPort( ): any change to the port number must be reflected in the connectionPort parameter you pass to SWIttsOpenPort( ). (See "SWIttsOpenPort( )" on page 7-91 for more details about this function.)

--voice_dir <dir>
    Specifies the directory where Speechify can find the voice database. This defaults to the value of the SWITTSSDK environment variable, which is set during installation.

--help
    Prints a short summary of these switches and then exits.

--version
    Prints a short banner and then exits. The banner displays the Speechify release, the client/server protocol version, and the build date and time.

Example:

To start the 8 kHz version of the US English female voice ("mara") on port 5555, use the following command from the command line:

   Speechify --language en-US --name mara --format 8 --port 5555

The following group of options controls the logging functionality built into the server. For more information, see "Speechify Logging" on page A-109.

--verbose <level>
    Specify a number greater than 0 to turn on detailed diagnostics. By default, diagnostics are disabled.

--diagnostic_file <file>
    Send diagnostic and error information to a text file as well as stdout. By default, diagnostics only go to stdout.

--event_file <file>
    Create an event log file which can be used for reporting system usage. By default, event reporting is disabled.

--system_log
    Send error messages to the system logging service as well as stdout or a text file. On UNIX this is syslog. On Windows, this is the event log.


The following group of options controls the process management and health monitoring functionality built into the server.

On UNIX, Speechify creates a new server process to handle each connection from a client and destroys that process when the connection closes. On Windows, before any connection occurs, Speechify starts a pool of server processes to service connection requests from clients and retains those processes when the connections close. Several aspects of the pool are configurable, as described below.

--max_clients <number>
    Specify the maximum size of the pool. Speechify never creates more child server processes than specified. Any attempts to connect when at maximum capacity are rejected by sending an error to the client. Default: 32.

--prestart <number>
    (Windows only.) Specify the minimum size of the pool. When you launch Speechify, this is the number of server processes started. If the number of simultaneous connection requests from clients exceeds this number, the connection succeeds but Speechify has to start a new server process for it, which may delay the connection time. When a client connection closes, Speechify checks to see if the number of servers exceeds the minimum and, if it does, closes that client's corresponding server process. Default: value of max_clients.

--health_interval <seconds>
    (Windows only.) Speechify performs a periodic status check of the pool of Speechify processes to make sure they are healthy. This option sets the amount of time in seconds between each status check. Default: 10 seconds. A value of 0 disables this check and the next four options.

--processing_timeout <seconds>
    (Windows only.) Speechify shuts down a server if it exceeds the specified amount of time for processing a request without sending any data over the network to the client. This option sets the amount of time in seconds. Default: 120 seconds.

--write_timeout <seconds>
    (Windows only.) Speechify shuts down a server if it exceeds the specified amount of time while trying to write to a socket without success. This usually indicates a hung client that isn't reading from its end of the socket. This option sets the amount of time in seconds. Default: 100 seconds.


--idle_timeout <seconds>
    (Windows only.) Speechify shuts down a server if the server has seen no network activity from the client connected to it within a certain amount of time. This option sets the amount of time in seconds. Default: 0 seconds (deactivated).

--excess_fallback_timeout <seconds>
    (Windows only.) When Speechify decides to shut down a server process to keep the number of processes down to the minimum, it can be instructed to wait in case another client request comes in quickly. This option sets the amount of time to wait in seconds. A value of zero means Speechify won't wait at all. Default: 600 seconds.

Stopping Speechify

Use Ctrl-C to stop the server on UNIX or Windows. If you started the server as a background process, use ps and kill on UNIX or the Task Manager on Windows to stop the server.


Running Speechify as a Windows Service

Configuring the server

On Windows NT or 2000 systems you can run Speechify as a Windows Service. You configure the command-line switches listed above via the Microsoft Management Console (MMC). (If the Speechify installer doesn't detect MMC on your NT system, it installs it for you. Windows 2000 machines already have MMC installed.)

Speechify provides an MMC applet that allows you to edit and save configurations for the Speechify server. These configurations are created when you install a Speechify voice and are removed when you uninstall one.

NOTE: You can disable each Speechify port configuration by accessing the General property page via the Speechify MMC console. Right-click on the desired configuration, then choose Disable. When the Speechify NT Service is started, the disabled configuration is not launched.

Starting the Speechify MMC applet

To access the Speechify Server Management MMC applet, use this path:

   Start >> Programs >> Speechify >> Speechify Server Management

A Speechify Server Management applet window opens with a list of Speechify port configurations for the machine on the left and the parameters for the currently selected configuration on the right.

Editing a port configuration

All port configurations appear in the left side of the window. Edit the configuration parameters using one of these methods:

❏ right-click the port name and choose Properties
❏ double-click any property listed on the right


A dialog box appears containing four tabs:

❏ General
❏ Voice
❏ Health
❏ Log

The editable parameters in this dialog box directly correlate to the command-line options listed in the section "Running the Speechify server from the console" on page 1-13.

General
    The General tab of the configuration dialog box contains several editable fields:
    ❏ Port: see "--port" on page 1-14.
    ❏ Minimum number of child processes: see "--prestart" on page 1-15.
    ❏ Maximum number of child processes: see "--max_clients" on page 1-15.
    ❏ Drop excess child processes after X seconds: see "--excess_fallback_timeout" on page 1-16.

Voice
    Four fields control how Speechify starts a voice for this configuration:
    ❏ Voice: see "--name" on page 1-13.
    ❏ Voice directory: see "--voice_dir" on page 1-14.
    ❏ Language: see "--language" on page 1-13.
    ❏ Format: see "--format" on page 1-13.

Health
    Speechify monitors the status of the server processes it starts. If a problem is detected, Speechify shuts down the server process, then restarts it. The settings are configurable on this tab:
    ❏ Health Monitoring Interval: see "--health_interval" on page 1-15.
    Speechify uses three metrics to detect problems. Each can be disabled by entering a zero for its value. The metrics are:
    ❏ Drop idle connection after X seconds: see "--idle_timeout" on page 1-16.
    ❏ Drop connection if processing any packet for X seconds: see "--processing_timeout" on page 1-15.
    ❏ Drop connection if trying to write a packet for X seconds: see "--write_timeout" on page 1-15.

Log
    There are two options to control diagnostic logging for Speechify. (Note that Speechify always logs errors and alarms to NT's event log when run as an NT service.)
    ❏ Detail Level: see "--verbose" on page 1-14.
    ❏ Diagnostic Message file: see "--diagnostic_file" on page 1-14.
    There is one option which controls event logging for Speechify:
    ❏ Event log: see "--event_file" on page 1-14.

Configuring for high channel density

If you need to start more than 185 Speechify processes as a service, you may run into a Windows limitation. To work around this limitation, you can edit the registry key:

   HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems\Windows


In this key, the substring SharedSection is probably set to 1024,3072,512. Change it to:

   SharedSection=1024,3072

Running and stopping the server

On Windows 2000 systems, use the Speechify MMC to start and stop the Speechify service. On Windows NT systems, use the Services applet in the Control Panel to start and stop the Speechify service.

Starting or stopping the Speechify server on Windows 2000

1. In the Speechify MMC window, right-click Speechify Server Management.

2. Select Start Speechify Service.
   The server does not start instantaneously; the initialization time depends on your system's speed and RAM. You may notice multiple new processes named Speechify.exe listed in the Task Manager. This is normal because Speechify is not multi-threaded within one process. Instead, it launches a process to handle each concurrent connection.

3. To stop all running configurations, right-click Speechify Server Management and select Stop Speechify Service.

Starting or stopping the Speechify server on Windows NT

1. Go to Start >> Settings >> Control Panel.

2. Open the Services applet.

3. Scroll down the list of services to the Speechify entry and select Start or Stop to start or stop the server.




CHAPTER 2

Programming Guide

This chapter contains an overview of features, and high-level information on groups of API functions. For a detailed explanation of the API functions, see "API Reference" on page 7-69.

In this chapter

• Features on page 2-22
• Order of API functions on page 2-23
• Callback function on page 2-24
• Sample time lines on page 2-26
• Overview of user dictionaries on page 2-30
• Dynamic parameter configuration on page 2-30
• Character set support on page 2-31
• Operational aids on page 2-31
• Implementation guidelines on page 2-31


Features

Speechify offers the following features:

❏ A client with a C language interface.
❏ A flexible threading model so that you can call the API functions from any thread in your process.
❏ Support for a variety of character sets.
❏ A variety of tags to be embedded in text to customize the speech output.
❏ Bookmark, wordmark, and phoneme-mark support, for synchronizing external events with the speech output.
❏ A set of dynamic user dictionaries for customizing pronunciations, which can be modified at run time.
❏ Support for file-based dictionaries, to be loaded at initialization.


Order of API functions

This section outlines the structure of a Speechify application:

1. First, call the SWIttsInit( ) function to initialize the client library. This is done once per application process, no matter how many TTS ports are opened, so that the library can allocate resources and initialize data structures. You must call this function before calling any other; the other functions return an error if it has not been called.

2. Call SWIttsOpenPort( ) to open a TTS port on the server. This establishes a connection to the server. SWIttsOpenPort( ) registers an application-defined callback function with the Speechify client library. This callback function is invoked to receive all audio, supporting information, and errors generated by the server.

3. Call SWIttsSpeak( ) to start synthesizing the specified text on the server. After receiving this call, the server begins sending audio packets to the callback function defined above. The SWIttsSpeak( ) call itself returns without waiting for synthesis to complete or even begin.

4. Call SWIttsClosePort( ) after you finish making speak requests, to close down the TTS port. The client tells the server to shut down this port and then closes the connection to the server.

5. Call SWIttsTerm( ) when your process is ready to shut down. This tells the client to shut itself down and clean up any resources.

Pseudo-code for a simple Speechify application looks like this:

SWIttsInit()
SWIttsOpenPort()
while (more TTS requests to make) {
    SWIttsSpeak()
    wait for SWItts_cbEnd to be received by the callback
}
SWIttsClosePort()
wait for SWItts_cbPortClosed to be received by the callback
SWIttsTerm()

Call SWIttsStop( ) to stop a TTS request before it has completed. Once this is called, the client sends no further information to your callback except a confirmation that the server has stopped acting on the request.


Callback function

Of the functions mentioned above, all operate synchronously except for SWIttsSpeak( ), SWIttsStop( ), and SWIttsClosePort( ). When one of these three functions is called, it returns immediately, before the operation is complete. A return code indicating success only indicates that the message was communicated to the server successfully. In order to capture the output produced and returned by the server, you must provide a function for the client library to call. This function is called a callback function, and you pass a pointer to it when you call SWIttsOpenPort( ). Your callback function's behavior must conform to the description of the SWIttsCallback( ) function specified in "SWIttsCallback( )" on page 7-75.

When the client library receives data from the server, it calls your callback and passes a message and any data relevant to that message. The set of possible messages is defined in the SWItts.h header file in the enumeration SWItts_cbStatus and listed below along with the type of data that is sent to the callback in each case:

SWItts_cbStatus           Data type
SWItts_cbAudio            SWIttsAudioPacket
SWItts_cbBookmark         SWIttsBookMark
SWItts_cbDiagnostic       SWIttsMessagePacket
SWItts_cbEnd              NULL
SWItts_cbError            SWIttsMessagePacket
SWItts_cbLogError         SWIttsMessagePacket
SWItts_cbPhonememark      SWIttsPhonemeMark
SWItts_cbPing             NULL
SWItts_cbPortClosed       NULL
SWItts_cbStart            NULL
SWItts_cbStopped          NULL
SWItts_cbWordmark         SWIttsWordMark

The SWIttsAudioPacket, SWIttsBookMark, SWIttsWordMark, SWIttsPhonemeMark, and SWIttsMessagePacket structures are defined and further explained in "SWIttsCallback( )" on page 7-75.

For a speak request that proceeds normally, message codes occur in the following manner:

❏ A SWItts_cbStart message arrives, indicating that SWItts_cbAudio messages are about to arrive.


❏ One or more SWItts_cbAudio messages arrive along with their SWIttsAudioPacket structures. The SWIttsAudioPacket contains a pointer to a buffer of audio samples.

❏ A SWItts_cbEnd message arrives, indicating that all speech has been sent.

For example, to write the synthesized speech to a file, the pseudo-code for your callback might look like this:

if message == SWItts_cbStart
    open file
else if message == SWItts_cbAudio
    write buffer to file
else if message == SWItts_cbEnd
    close file

You must copy audio from the SWIttsAudioPacket structure to your own buffers before the callback returns. Once the callback returns, the client library may free the SWIttsAudioPacket structure.

Sometimes a speak request does not proceed normally, such as when you call SWIttsStop( ) to stop a speak request before it completes. (You might want to do this to support barge-in.) In this case, no more audio messages are sent to your callback, and instead of receiving the SWItts_cbEnd message, your callback receives the SWItts_cbStopped message, indicating that the speak request was stopped before proper completion.

There are other ways that messages can proceed if you use the bookmark, wordmark, and phoneme-mark features. These are shown in the sample time lines below.


Sample time lines

Given a high-level knowledge of the order of API functions required and the order in which the client calls your callback, here are some sample time lines to illustrate the complete order of messages.

Simple interaction time line

Figure 2-1 shows a simple interaction where a speak request proceeds from start to finish with no other messages such as errors or stop requests.

Figure 2-1: Speak request from start to finish
[Diagram: the application calls SWIttsInit( ), SWIttsOpenPort( ), and SWIttsSpeak( ); the client delivers SWItts_cbStart, a series of SWItts_cbAudio messages, and SWItts_cbEnd; the application then calls SWIttsClosePort( ), receives SWItts_cbPortClosed, and calls SWIttsTerm( ).]


Time line with bookmarks

Figure 2-2 illustrates the application/client communication when bookmarks have been embedded in the input text, and bookmark messages are therefore returned along with audio messages.

Figure 2-2: Application/client communication with bookmarks
[Diagram: after SWIttsSpeak( ), the client delivers SWItts_cbStart, then SWItts_cbWordmark, SWItts_cbPhonememark, and SWItts_cbBookmark messages (x times), followed by SWItts_cbAudio messages (y times) and SWItts_cbEnd; the application then calls SWIttsClosePort( ), receives SWItts_cbPortClosed, and calls SWIttsTerm( ).]

For example, if your input text is "This is \!bmbookmark1 a test." you may expect the callback to receive audio messages containing samples for the words "This is," then a bookmark message, then audio messages for the words "a test." Instead, Speechify only guarantees that a bookmark message is sent before any audio messages that contain the bookmark's corresponding audio sample (indicated by the bookmark's timestamp). That is, a bookmark never arrives "late," but it may, and usually does, arrive "early."


The Speechify server calculates and sends bookmark locations every time it sees the beginning of a "phrase." A phrase can be loosely understood as some period of speech that is followed by a pause. Phrases are often delimited by punctuation marks such as periods, commas, colons, semicolons, exclamation points, question marks, and parentheses, but there is not an exact correspondence between punctuation and phrase boundaries.

For simple input texts you can figure out when bookmarks are sent to your callback. For example, the text "This is a \!bmbookmark1 test." is a single phrase, so the bookmark message arrives before any of the audio messages. The time line shown in Figure 2-2 illustrates that scenario. If your text is "Hello, this is a \!bmbookmark1 test." then your callback receives the audio messages for the phrase "Hello," a bookmark message, then the audio messages for the phrase "this is a test."

NOTE: Any mark messages you use (e.g., bookmark, wordmark, phoneme mark) are always sent to the callback function before the corresponding audio messages.

For more complex input text, it may be difficult to predict when your bookmark messages will be returned, since there is not an exact correlation between punctuation and phrase boundaries. Speechify's standard text processing inserts phrase boundaries at locations where there is no punctuation, in order to enhance understandability. Conversely, some punctuation marks do not correspond to a phrase boundary.


Time line with SWIttsStop( )

Figure 2-3 illustrates the communication between the application and the client when the SWIttsStop( ) function is called. Note that the SWItts_cbEnd message is not generated when this function is called; instead you receive the SWItts_cbStopped message. You always receive either SWItts_cbEnd or SWItts_cbStopped, but never both.

Figure 2-3: The SWIttsStop( ) function
[Diagram: after SWIttsSpeak( ), the client delivers SWItts_cbStart and SWItts_cbAudio messages (n times); when the application calls SWIttsStop( ), the client responds with SWItts_cbStopped. The application then calls SWIttsClosePort( ), receives SWItts_cbPortClosed, and calls SWIttsTerm( ).]


Overview of user dictionaries

Speechify provides a set of dynamic user dictionaries that allow you to customize the pronunciation of words, abbreviations, acronyms, and other text strings. These dictionaries are described in detail in "User Dictionaries" on page 6-59. (Some languages do not have all types of dictionaries; see the appropriate language supplement for details.)

❏ The main dictionary is an all-purpose exception dictionary used to replace a single token in input text with almost any other input sequence.

❏ The abbreviation dictionary is used to expand abbreviations, and provides for special treatment of period-delimited tokens.

❏ The root dictionary is used to specify pronunciations, orthographically or via phonetic representations, of words or morphological word roots. Details of the root dictionary vary by language.

Any changes made to the dictionaries are reflected in the synthesis of subsequent text. You can make changes at any time during a connection (i.e., any time between SWIttsOpenPort( ) and SWIttsClosePort( )) except while synthesis is taking place. Dictionary changes only apply for the duration of the connection. If a connection is terminated and reestablished, all dictionary changes must be reapplied.

For more information on the dictionary functions, see "API Reference" on page 7-69.

Dynamic parameter configuration

Use SWIttsSetParameter( ) to configure the Speechify server while it is running. Certain parameters are read-only; if you try to set them, the function returns an error. Other parameters' valid values depend on the command-line options passed to the server at startup.

Retrieve the value of individual parameters with SWIttsGetParameter( ).

Both SWIttsSetParameter( ) and SWIttsGetParameter( ) are synchronous calls that cannot be made while synthesis is taking place.

For the list of valid parameters, see "SWIttsSetParameter( )" on page 7-94.


Character set support

Speechify supports strings in multiple character sets as input to SWIttsSpeak( ), SWIttsAddDictionaryEntry( ), SWIttsDeleteDictionaryEntry( ), and SWIttsLookupDictionaryEntry( ). For a list of supported character sets, see "SWIttsSpeak( )" on page 7-96.

Functions which return strings to the user, such as SWIttsGetDictionaryKeys( ) and SWIttsLookupDictionaryEntry( ), as well as bookmarks, return strings in the server's preferred character set. See "Server's preferred character set" on page 7-70 for more information.

Operational aids

To verify your connection to the TTS port on the server, call SWIttsPing( ). Besides verifying the low-level network connection, this confirms that a particular TTS port is alive and processing requests. It is an asynchronous function; the client sends the SWItts_cbPing event to your callback after it receives a reply from the server.

Implementation guidelines

To support multiple simultaneous TTS ports, you can employ a multi-threaded process model that opens multiple ports in one process, or a multi-process model that opens one port per process.

The current implementation of the client spawns a thread for each open port. This thread is created in the SWIttsOpenPort( ) function and destroyed in the SWIttsClosePort( ) function, no matter which model you choose for your application code. The spawned threads receive the asynchronous network events for that port, and then call your callback function. You must be efficient in your callback code; blocking or waiting for an event may cause networking errors.


The threads created by SWIttsOpenPort( ) have their stack sizes set to reasonable default values, which typically do not need to be changed. However, you can override the default by setting the SWITTSTHREADSTACKSIZE environment variable to a desired value (in bytes).

Opening and closing a TTS port for every speak request is costly. The act of opening and closing a port requires some processing and resource management overhead on the server, as does creating or tearing down the networking sockets on both the client and server. Thus, we recommend that you open a port for the duration of a "session" (defined by you). This session could last for the length of a phone call, or persist indefinitely.


CHAPTER 3
Standard Text Normalization

Speechify's text normalization interprets the input text and converts it into a sequence of fully spelled-out words. This process is primarily responsible for an intelligent reading of text, and includes:

❏ converting digits
❏ expanding abbreviations and acronyms
❏ handling punctuation
❏ interpreting special characters such as currency symbols

Relevant languages

This chapter covers the major features of Speechify's default text normalization, which applies unless overridden by a main or abbreviation dictionary entry (see "User Dictionaries" on page 6-59) or an embedded tag (see "Embedded Tags" on page 4-43). The discussion in this chapter applies to de-DE, en-AU, en-GB, en-US, es-MX, fr-CA, fr-FR, and pt-BR, but not to ja-JP; it is illustrated with en-US examples where relevant.

Neutral expansions

In general, the default normalization expands expressions neutrally unless there is a disambiguating context. For example, the expression "1/2" can be interpreted in a variety of ways: "one half," "one over two," "January second," "February first," and so on. In special cases it is possible to disambiguate the expression; for example, in "3 1/2" it is a fraction and should be read "one half."

When a disambiguating context cannot be identified, Speechify supplies a neutral reading, such as "one slash two," on the theory that it is preferable to provide a reading that is intelligible in all circumstances instead of a specialized reading that is wrong for the context (e.g., reading "one half" when "February first" was intended).


Level of detail provided for normalization

In this chapter and in the language supplements, processing is described in general terms, to help application developers construct main and abbreviation dictionaries, custom pre-processors, and marked-up text. It is beyond the scope of this chapter to provide an exhaustive description of the entire set of sophisticated, context-sensitive text normalization rules used in Speechify, and you may therefore observe deviations from the generalizations provided here in specific instances.

In this chapter

• "Numbers" on page 3-34
• "Numeric expressions" on page 3-35
• "Mixed alphanumeric tokens" on page 3-35
• "Abbreviations" on page 3-36
• "E-mail addresses" on page 3-37
• "URLs" on page 3-38
• "Pathnames and filenames" on page 3-39
• "Punctuation" on page 3-40
• "Parentheses" on page 3-41
• "Hyphen" on page 3-41
• "Slash" on page 3-42

Numbers

Speechify handles cardinal and ordinal numbers in standard formats, floating point digits, negative numbers, and fractions where these are unambiguous. See the appropriate language supplement for language-specific details on formats and interpretation.


Numeric expressions

Speechify handles dates, times, and monetary expressions in a variety of currencies and formats. Individual languages also handle phone numbers and social security numbers. See the appropriate language supplement for language-specific details on formats and interpretation.

Mixed alphanumeric tokens

This table shows en-US examples of Speechify's expansion of strings that mix alphabetic and numeric characters.

Format: combination of uppercase letters and numbers
Handling: spell out letters; default number processing
  32X    →  thirty-two eks
  CJ2EE  →  cee jay two ee ee
  VOS34  →  vee oh ess thirty-four

Format: combination of lowercase or mixed-case letters and numbers
Handling: break into separate words at letter/digit boundaries
  Group12  →  group twelve
  24hour   →  twenty-four hour
  12min    →  twelve minutes

Format: special cases (language-specific exceptions)
Handling: single word
  55th   →  fifty-fifth
  1930s  →  nineteen thirties
  3RD    →  third

See the appropriate language supplement for language-specific exceptions.


Abbreviations

Ambiguous abbreviations

There are two types of ambiguity in abbreviations:

❏ More than one possible expansion; for example, "St." means either "street" or "saint." Speechify employs a set of context-sensitive rules to disambiguate this type of ambiguous abbreviation.
❏ The "abbreviation" can be either a whole word or an abbreviated form; e.g., "in." is either the word "in" or the abbreviation for "inches."

A token of the second type is interpreted as an abbreviation only in specific disambiguating contexts. For instance, when the token ends in a period and the following word begins with a lowercase letter, it is interpreted as an abbreviation. For example:

  There are 5 in. of snow on the ground.  →  there are five inches of snow on the ground.
  Put 5 in. Then take 3 out.              →  put five in. then take three out.

Note that abbreviations do not necessarily end in a period:

  We met Mr Smith yesterday.  →  we met mister smith yesterday.

Periods after abbreviations

Once the abbreviation is expanded, if it ended in a final period it must be determined whether or not the period also indicates a sentence end:

  There are 5 ft. of snow on the ground.       →  there are five feet of snow on the ground.
  It snowed 5 ft. The next day it all melted.  →  it snowed five feet. the next day it all melted.


Measurement abbreviations

Measurement abbreviations are expanded to singular or plural depending on the value of a preceding number or other disambiguating context. In the absence of a disambiguating context, they default to a plural expansion. For example:

  There are 5 in. of snow on the ground.    →  there are five inches of snow on the ground.
  There is 1 in. of snow on the ground.     →  there is one inch of snow on the ground.
  There are 1.1 in. of snow on the ground.  →  there are one point one inches of snow on the ground.
  How many cm. are in a km?                 →  how many centimeters are in a kilometer?
  cm.                                       →  centimeters

E-mail addresses

An e-mail address is divided into two portions, a username and a domain name, separated by an at sign (@). A phrase break is inserted following the username. Symbols in the e-mail address are read by name. The following table illustrates with en-US expansions:

  @  →  at
  .  →  dot
  -  →  dash
  _  →  underscore


Username

The username is spelled out character by character, unless word boundaries are indicated unambiguously. Sequences of two and three letters are always spelled out. Digit sequences are read digit by digit. Examples (commas are used to indicate phrasing):

  bruce_smith  →  bruce underscore smith,
  john.jones   →  john dot jones,
  star-chaser  →  star dash chaser,
  steve5050    →  steve five zero five zero,
  red          →  ar ee dee,
  rfrmac       →  ar eff ar emm ay cee,

Domain name

Two- and three-letter domain and country extensions are either read as words or spelled out, following standard convention. The host name is read as a single word, unless word boundaries are indicated unambiguously. English examples:

  access1.net    →  access one dot net
  hawaii.rr.com  →  hawaii dot ar ar dot com
  cornell.edu    →  cornell dot ee dee you
  amazon.co.uk   →  amazon dot cee oh dot you kay

URLs

A token beginning with "www." or "http://" or "ftp://" is interpreted as a URL. A phrase break is inserted following "http://" or "ftp://," and the "://" piece is not pronounced.
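The symbol table above can be mimicked when an application needs to log or display how an address will be read. A sketch, assuming only the four en-US symbols listed (the helper name and the sample address are ours, not part of the Speechify API):

```python
# Mirrors the en-US e-mail symbol table:
# @ → "at", . → "dot", - → "dash", _ → "underscore".
SYMBOL_NAMES = {"@": "at", ".": "dot", "-": "dash", "_": "underscore"}

def read_address(address: str) -> str:
    """Expand each table symbol by name; other text is left as written."""
    for symbol, name in SYMBOL_NAMES.items():
        address = address.replace(symbol, f" {name} ")
    # Collapse the padding spaces into single separators.
    return " ".join(address.split())

print(read_address("bruce_smith@example.com"))
# → bruce underscore smith at example dot com
```

Note this sketch does not reproduce Speechify's spellout rules for short or unpronounceable usernames (e.g. "red" → "ar ee dee"); it illustrates only the symbol-by-name expansion.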


Symbols in a URL are expanded as follows in US English:

  /  →  URL-final: not expanded; otherwise: slash
  .  →  dot
  -  →  dash
  _  →  underscore

Each slash-delimited segment of the URL is expanded as follows: two- and three-letter domain and country extensions are either read as words or spelled out, following standard conventions. Each remaining segment is read as a single word, unless word boundaries are indicated unambiguously. Examples:

  http://www.lobin.freeserve.co.uk/sarahs-price/page001.html
    →  aitch tee tee pee, double you double you double you dot lobin dot freeserve dot cee oh dot you kay sarahs dash price slash page zero zero one dot aitch tee em ell
  www.serbia-info.com/news/
    →  double you double you double you dot serbia dash info dot com slash news

Pathnames and filenames

Forward and backward slashes are not expanded at the end of a pathname. In other contexts, they are expanded by name. Symbols in pathnames are expanded by name. The following table illustrates the en-US expansions:

  /  →  pathname-final: not expanded; otherwise: slash
  \  →  pathname-final: not expanded; otherwise: backslash
  .  →  dot
  -  →  dash
  _  →  underscore


Each slash-delimited segment of a pathname is read as a single word, unless word boundaries are unambiguously indicated. Common filename extensions are read as a word or spelled out, following standard conventions.

  C:\docs\my_book\chapter12.doc
    →  cee backslash docs backslash my underscore book backslash chapter twelve dot doc
  /product/release/speechify-2-1-5/release-notes.txt
    →  slash product slash release slash speechify dash two dash one dash five slash release dash notes dot tee eks tee

Punctuation

Punctuation generally triggers a phrase break, except in a limited set of special cases that are determined on a language-specific basis. Some English examples of commas that don't produce phrase breaks:

  He lives in Boston, Massachusetts.  →  he lives in boston massachusetts.
  That is a very, very old building.  →  that is a very very old building.
  John won't come, either.            →  john won't come either.
  SpeechWorks International, Inc.     →  speechworks international incorporated.
  Drive D: is full.                   →  drive dee is full.


Parentheses

Parentheses generally trigger a phrase break, except in a very limited set of special cases which vary from language to language. See the appropriate language supplement for relevant examples. Some English examples (commas used to indicate phrasing):

  Tom (my son) and Susan (my daughter)  →  Tom, my son, and Susan, my daughter
  book(s)                               →  books
  getText()                             →  get text

Hyphen

The hyphen is read neutrally as "dash" (or the equivalent) unless it can be disambiguated with a high degree of confidence. See the appropriate language supplement for language-specific examples.

  mid-90s     →  mid nineties
  A-1         →  ay one
  32-bit      →  thirty-two bit
  -7          →  minus seven
  April 3-4   →  April third to fourth
  pp. 35-40   →  pages thirty-five to forty
  1974-1975   →  nineteen seventy-four to nineteen seventy-five
  2-3 inches  →  two to three inches
  3-2         →  three dash two (since the desired expansion could be "three to two," "three minus two," "three two," etc.)


Slash

A slash is read as "slash" (or the equivalent) unless one of the following is true:

a. The following word is a unit of measure, in which case the slash is read as "per" (or the equivalent).
b. The entire token is a familiar expression.

See the appropriate language supplement for language-specific examples.

  Seattle/Boston  →  seattle slash boston
  cm/sec          →  centimeters per second
  he/she          →  he or she


CHAPTER 4
Embedded Tags

Embedded tags are special codes that can be inserted into input text to customize Speechify's behavior in a variety of ways. You can use embedded tags to:

❏ Create pauses at specified points in the speech output.
❏ Customize word pronunciations.
❏ Spell out sequences of characters by name.
❏ Specify the interpretation of certain numeric expressions.
❏ Insert bookmarks in the text so your application receives notification when a specific point in the synthesis job has been reached.

In this chapter

• "Speechify tag format" on page 4-44
• "Creating pauses" on page 4-44
• "Indicating the end of a sentence" on page 4-45
• "Customizing pronunciations" on page 4-46
• "Inserting bookmarks" on page 4-50
• "Controlling the audio characteristics" on page 4-51


Speechify tag format

A Speechify tag begins with the sequence \! followed immediately by a string of alphanumeric characters. For example:

  \!p300  →  Synthesize a pause of 300 ms.
  \!tsc   →  Pronounce all characters individually by name.

Separate a tag from any preceding input by at least one unit of white space. A tag cannot be followed immediately by an alphanumeric character, though most tags may be followed immediately by punctuation, parentheses, and similar symbols. (The bookmark tag is an exception, since it must end in white space; see "Inserting bookmarks" on page 4-50.)

Any sequence of non-white-space characters beginning with the prefix \! that is not a recognizable Speechify tag is ignored in the speech output.

Creating pauses

Use a pause tag to create a pause of a particular duration at a specified point in the speech output.

  \!pN  →  Create a pause N milliseconds long.
  \!p0  →  Ignored, since a pause tag cannot be used to remove a rule-generated pause. The smallest possible value of the pause tag is 1.

The maximum value of the integer argument in a pause tag is 32767. To create a longer pause, use a series of consecutive pause tags.

The behavior produced by the pause tag varies depending on its location in the text. When a pause tag is placed immediately before a punctuation mark, the standard pause duration triggered by the punctuation is replaced by the pause duration specified in the tag. In other locations, the tag creates an additional pause.
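Because a pause tag's argument is capped at 32767 ms, longer pauses must be built from consecutive tags. A sketch of composing such input (the helper name is ours; Python is used only to illustrate building the tag string, since the actual client API is C):

```python
def pause_tag(ms: int) -> str:
    """Build Speechify pause-tag text for a pause of ms milliseconds.
    Values above the 32767 ms per-tag maximum become consecutive \\!p tags."""
    if ms < 1:
        raise ValueError("the smallest usable pause value is 1 ms")
    tags = []
    while ms > 32767:
        tags.append("\\!p32767")
        ms -= 32767
    tags.append(f"\\!p{ms}")
    return " ".join(tags)

print(pause_tag(300))    # → \!p300
print(pause_tag(40000))  # → \!p32767 \!p7233
```

Remember to keep white space before the tag when splicing it into text, as required by the tag format rules above.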


For example, sentence (a) has a default 150 ms pause at the comma. Sentences (b) and (c) replace the default pause with a longer and a shorter pause, respectively, while sentence (d) inserts an additional pause of 300 ms, resulting in a total pause duration of 450 ms. In sentence (e) a 25 ms pause is inserted in a location where no pause would otherwise occur.

  (a) Tom is a good swimmer, because he took lessons as a child.
      →  Tom is a good swimmer (150 ms pause), because he took lessons as a child.
  (b) Tom is a good swimmer \!p300, because he took lessons as a child.
      →  Tom is a good swimmer (300 ms pause), because he took lessons as a child.
  (c) Tom is a good swimmer \!p100, because he took lessons as a child.
      →  Tom is a good swimmer (100 ms pause), because he took lessons as a child.
  (d) Tom is a good swimmer, \!p300 because he took lessons as a child.
      →  Tom is a good swimmer (150 ms pause), (300 ms pause) because he took lessons as a child.
  (e) Tom is a good swimmer, because he took lessons \!p25 as a child.
      →  Tom is a good swimmer (150 ms pause), because he took lessons (25 ms pause) as a child.

Indicating the end of a sentence

Use an end-of-sentence tag to trigger an end of sentence at the preceding word, whether or not that word is followed by punctuation.

  \!eos  →  Treat the preceding word as the end of sentence.

This tag is useful for forcing a sentence end in unpunctuated text, such as lists, or after a period-final abbreviation when the context does not unambiguously determine a sentence end. Here are some examples. (Periods in the pronunciation indicate sentence ends.)

  Input:
    apples
    pears
    bananas
  Pronunciation: apples pears bananas

  Input:
    apples \!eos
    pears \!eos
    bananas \!eos
  Pronunciation: apples. pears. bananas.
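When the list items come from application data, the \!eos tags can be appended programmatically. A minimal sketch (the helper name is ours, not part of the Speechify API; Python is used only to show the text being composed):

```python
def speak_as_sentences(items: list[str]) -> str:
    """Join list items so each is read as its own sentence, by appending
    Speechify's \\!eos end-of-sentence tag to every item."""
    return " ".join(f"{item} \\!eos" for item in items)

print(speak_as_sentences(["apples", "pears", "bananas"]))
# → apples \!eos pears \!eos bananas \!eos
```

The resulting string would then be passed to SWIttsSpeak( ) like any other input text.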


  The Dow industrials were down 15.50 at 4,585.90 at 1:59 p.m. NYSE decliners led advancers…
    →  the dow industrials were down fifteen point five zero at four thousand five hundred eighty-five point nine zero at one fifty-nine pea emm new york stock exchange decliners led advancers…
  The Dow industrials were down 15.50 at 4,585.90 at 1:59 p.m. \!eos NYSE decliners led advancers…
    →  the dow industrials were down fifteen point five zero at four thousand five hundred eighty-five point nine zero at one fifty-nine pea emm. new york stock exchange decliners led advancers…

Customizing pronunciations

In certain cases, you may want to specify a pronunciation that differs from the one generated by Speechify's internal text analysis rules. The tags described in this section are used to modify Speechify's default text processing behavior in a variety of ways:

❏ "Language-specific customizations" on page 4-46
❏ "Character spellout modes" on page 4-47
❏ "Pronouncing numbers and years" on page 4-49
❏ "Customizing word pronunciations" on page 4-50

Language-specific customizations

Some languages have tags that solve problems specific to those languages. For example:

❏ Mexican Spanish (es-MX) has tags for pronouncing the dollar sign ($) as "pesos" or "dollars" depending on the needs of the application.
❏ United States English (en-US) has a tag for pronouncing postal addresses.

Such tags are described per language in the corresponding Speechify Language Supplement.


Character spellout modes

The following tags are used to trigger character-by-character spellout of subsequent text.

  \!ts0  →  Default mode.
  \!tsc  →  All-character spellout mode: pronounce all characters individually by name.
  \!tsa  →  Alphanumeric spellout mode: pronounce only alphanumeric characters by name.
  \!tsr  →  Radio spellout mode: like alphanumeric mode, but alphabetic characters are spelled out according to the International Radio Alphabet. Supported in English only.

For example:

  My account number is \!tsa 487-B12.
    →  My account number is four eight seven bee one two.
  My account number is \!tsc 487-B12.
    →  My account number is four eight seven dash bee one two.
  The last name is spelled \!tsr Dvorak.
    →  The last name is spelled delta victor oscar romeo alpha kilo.

Spellout modes remain in effect until they are turned off by the tag \!ts0, which restores the engine to its default processing mode. For example:

  The composer's name is spelled \!tsa Franz Liszt \!ts0 and pronounced "Franz Liszt."
    →  The composer's name is spelled eff ar ay en zee ell aye ess zee tee and pronounced Franz Liszt.


There are many words which are spelled out as a result of Speechify's default text processing behavior. In such cases, the use of a spellout mode tag may have no additional effect. For example:

  He works for either the CIA or the FBI.
    →  He works for either the cee aye ay or the eff bee aye.
  He works for either the \!tsa CIA \!ts0 or the \!tsa FBI \!ts0.
    →  He works for either the cee aye ay or the eff bee aye.

In alphanumeric and radio spellout modes, punctuation is interpreted exactly as it is in the default (non-spellout) mode; i.e., in most contexts it triggers a phrase break. In all-character spellout mode, punctuation is spelled out like any other character, rather than being interpreted as punctuation. Speech output continues without pause until the mode is turned off. For example:

  \!tsa 3, 2, 1 \!ts0 blastoff.
    →  three, two, one, blastoff
  \!tsc 3, 2, 1 \!ts0, blastoff.
    →  three comma two comma one, blastoff
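Since a spellout mode stays in effect until \!ts0, it is safest to wrap only the target span in an opening and closing tag pair. A sketch of such a wrapper (helper name and defaults are ours; Python merely composes the input string):

```python
def spell_out(text: str, mode: str = "a") -> str:
    """Wrap text in a Speechify spellout tag pair and restore default mode.
    mode: 'c' = all characters, 'a' = alphanumeric, 'r' = radio (English only)."""
    if mode not in ("c", "a", "r"):
        raise ValueError("mode must be 'c', 'a', or 'r'")
    return f"\\!ts{mode} {text} \\!ts0"

print("My account number is " + spell_out("487-B12", "a") + ".")
# → My account number is \!tsa 487-B12 \!ts0.
```

Always emitting the closing \!ts0 prevents the mode from leaking into the rest of the utterance.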


Pronouncing numbers and years

In some languages (for example, English and German), a four-digit numeric sequence with no internal commas or trailing decimal digits, like 1984, can be interpreted either as a year (nineteen eighty four) or as a quantity (one thousand nine hundred eighty four). Speechify applies the year interpretation by default, as in:

  He was born in May 1945.
    →  He was born in May nineteen forty five.

To override or restore the default year interpretation, use the following tags:

  \!ny0  →  Quantity interpretation.
  \!ny1  →  Year interpretation (default).

For example:

  In May \!ny0 1945 people emigrated.
    →  In May one thousand nine hundred forty five people emigrated.

Each tag remains in effect until the interpretation is toggled by the use of the other tag. For example:

  \!ny0 1945 \!ny1 people emigrated in 1945.
    →  One thousand nine hundred forty five people emigrated in nineteen forty five.

NOTE: These tags have no effect in French or Spanish, where there is no difference in pronunciation between the year and quantity interpretations.
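Because each tag stays in effect until toggled, one convenient pattern is to force the quantity reading for a single number and immediately restore the year default. A sketch (helper name is ours; Python only builds the input text):

```python
def as_quantity(number: int) -> str:
    """Read a four-digit number as a quantity rather than a year,
    then restore the default year interpretation with \\!ny1."""
    return f"\\!ny0 {number} \\!ny1"

print(f"In May {as_quantity(1945)} people emigrated.")
# → In May \!ny0 1945 \!ny1 people emigrated.
```

Restoring \!ny1 right away keeps later four-digit sequences in the same utterance reading as years.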


Customizing word pronunciations

You can use a Symbolic Phonetic Representation (SPR) to specify the exact pronunciation of a word using Speechify's special phonetic symbols. The SPR tag takes the following form:

  \![SPR]  →  Pronounce the word phonetically as specified inside the square brackets.

NOTE: Unlike other tags, the \![SPR] tag does not modify the pronunciation of following text. Instead, it is used in place of the word for which it specifies a pronunciation.

"Symbolic Phonetic Representations" on page 5-55 provides detailed information on formatting SPRs.

Inserting bookmarks

A bookmark tag inserted into the text triggers notification when that point in the synthesis job is reached. Bookmarks are useful for timing other events with the speech output. For example, you may want to insert additional audio between specific points in the speech data, or log that the user heard a certain portion of the text before you stop the speech. Bookmark tags have the following format:

  \!bmstr  →  Insert a user bookmark with identifier str, which is any string ending in white space and containing no internal white space.


When you insert a bookmark, the client sends a data structure to your callback function. This data structure contains a bookmark identifier and a timestamp indicating where in the audio data the bookmark occurs. (See "SWIttsCallback( )" on page 7-75.) The timestamps in the bookmark structures ensure that the bookmarks are resolved to the correct point in the audio data. For example:

  Input:   The clarinet \!bmCL came in just before the first violins. \!bmVL1
  Output:  The clarinet came in just before the first violins.

Bookmark \!bmCL triggers a bookmark data structure containing the identifier CL and the timestamp of the sample just after "clarinet." Bookmark \!bmVL1 triggers a data structure containing the identifier VL1 and the timestamp at the end of the audio data for this sentence.

See the sample "Time line with bookmarks" on page 2-27 and its description for more information about bookmark notifications.

NOTE: The bookmark identifier is a string of any non-white-space characters, and it must be delimited by final white space. It is returned as a 0-terminated wide character string. (See "SWIttsCallback( )" on page 7-75.)

Controlling the audio characteristics

Each voice has a default volume and speaking rate, known as the baseline (default) values. An application can adjust these values for each open port individually. You can also use tags to adjust these settings within the text. Thus, there are three possible settings for rate and volume, in order of precedence:

1. Embedded tags – set relative to the port-specific or baseline default, described below. Separate embedded tags control volume and speaking rate.
2. Port-specific settings – use SWIttsSetParameter( ) to set tts.audio.volume and tts.audio.rate as percentages of the baseline values. These port-specific values are used if there are no overriding embedded tags.
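Since a bookmark identifier may contain no internal white space and the tag must end in white space, a small helper can enforce both rules when composing input. A sketch (helper name is ours; Python only builds the text handed to the client):

```python
def bookmark(identifier: str) -> str:
    """Build a Speechify bookmark tag. The identifier may not contain
    white space, and the tag itself must be delimited by final white space,
    so a trailing space is always appended."""
    if not identifier or any(ch.isspace() for ch in identifier):
        raise ValueError("bookmark identifiers must be non-empty and contain no white space")
    return f"\\!bm{identifier} "

text = ("The clarinet " + bookmark("CL")
        + "came in just before the first violins. " + bookmark("VL1"))
print(text)
```

The printed string matches the chapter's example input, with the required trailing space after the final \!bmVL1 tag.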


3. Baseline – the default values for the voice when you install it; you cannot change these values, only override them.

For example, you can set a port-specific value for volume if the baseline value is too loud for your application. Set tts.audio.volume to a value less than 100 (percent) to make the volume quieter than the default. To adjust the volume within the text, use the embedded tags.

There are embedded tags to adjust volume and rate relative to the port-specific values and relative to the baseline values. For example, if you want a particular word or phrase to be spoken more quickly than the rest of the text, you can use an embedded tag to change the rate relative to the port-specific rate. If you want a particular word or phrase spoken at the baseline volume, no matter what volume the rest of the text is, use an embedded tag to set the volume to 100 percent of the baseline volume.

Setting a parameter relative to the port-specific value is useful if the port-specific value is unknown at the time the source text is composed. For example, setting the volume of a word or phrase to 50 percent of the port-specific value guarantees that the corresponding audio is half as loud as the surrounding text. Conversely, setting the volume to 50 percent of the baseline default has the same effect only if the port-specific value is set to 100; if the port-specific value is already 50 percent of baseline, there is no difference for the tagged piece of text.

Volume control

The server produces audio output at the default volume, which is the maximum amplitude that may be obtained for all utterances without causing distortion (clipping). This setting minimizes quantization noise in the digital waveform delivered to the application by using all of the available numerical range. The application can override this on any given port by using the SWIttsSetParameter( ) function to set the port-specific volume (tts.audio.volume) to a lower value.

The port-specific volume applies to all text spoken on that port, i.e., in that conversation. To adjust the volume for a particular word or phrase, use these tags to vary volume:

  \!vpN  →  Set volume to the given percentage of the port-specific value. (N is an integer greater than or equal to 0.)
  \!vpr  →  Reset volume to the port-specific value.


  \!vdN  →  Set volume to the given percentage of the baseline default value. (N is an integer greater than or equal to 0.)
  \!vdr  →  Reset volume to the server default value.

Note that the tags allow the effective volume to be set above 100, for example \!vd120. The server attempts to scale its audio output accordingly, and may produce clipped and distorted output.

These examples assume the port-specific volume has been set to 50 with SWIttsSetParameter( ) prior to each SWIttsSpeak( ) call:

  The flight departs at \!vp150 seven-fifty \!vpr in the evening.
    →  [volume = 50] The flight departs at [volume = 75] seven-fifty [volume = 50] in the evening.
  Severe \!vd60 storm warning \!vd50 has been issued for \!vd80 Suffolk and Nassau \!vpr counties.
    →  [volume = 50] Severe [volume = 60] storm warning [volume = 50] has been issued for [volume = 80] Suffolk and Nassau [volume = 50] counties.

Speaking rate control

The baseline speaking rate of the synthesized speech is chosen to result in the highest audio quality. Departures from this rate are generally detrimental to audio quality and should be employed with care. Extreme variations may result in unintelligible or unnatural-sounding speech. To improve the match between the desired and obtained rates, include at least a few words in the scope of a rate change tag.

Server performance generally decreases when a rate change is requested, because increased computation is necessary to implement rate variation. The exact performance impact depends on the given text and the magnitude of the requested rate change.

Note that changes in speaking rate do not translate precisely into duration changes, so an utterance spoken at half the default rate is not exactly twice as long as the same utterance spoken at the default rate, although it is close. Also, rate changes affect the duration of any pauses in the output – including pauses that are specified explicitly with a \!p tag.


The following tags control rate changes within an utterance:

  \!rpN  →  Set rate to the given percentage of the port-specific value. (N is an integer greater than 0.)
  \!rpr  →  Reset rate to the port-specific value.
  \!rdN  →  Set rate to the given percentage of the server default value. (N is an integer from 33 to 300.)
  \!rdr  →  Reset rate to the server default value.

The rate cannot be set to less than one-third or more than three times the server default. Attempting to do so results in a value at the corresponding minimum or maximum.

These examples assume the port-specific rate has been set to 200 with SWIttsSetParameter( ) prior to each SWIttsSpeak( ) call:

  The flight departs at \!rp50 seven-fifty \!rpr in the evening.
    →  [rate = twice default] The flight departs at [rate = default] seven-fifty [rate = twice default] in the evening.
  \!rd150 The share price of Acme Corporation stock is \!rd120 $7.00
    →  [rate = twice default] [rate = 1.5 times default] The share price of Acme Corporation stock is [rate = 1.2 times default] seven dollars.
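The volume and rate tags combine naturally in ordinary text. A sketch of emphasizing one phrase relative to the port-specific settings, with an explicit reset so the change does not leak into the rest of the utterance (helper names are ours; Python only composes the input string):

```python
def with_volume(text: str, percent: int) -> str:
    """Speak text at a percentage of the port-specific volume, then reset."""
    return f"\\!vp{percent} {text} \\!vpr"

def with_rate(text: str, percent: int) -> str:
    """Speak text at a percentage of the port-specific rate, then reset."""
    return f"\\!rp{percent} {text} \\!rpr"

sentence = "The flight departs at " + with_rate("seven-fifty", 50) + " in the evening."
print(sentence)
# → The flight departs at \!rp50 seven-fifty \!rpr in the evening.
```

Wrapping at least a few words, as recommended above, gives the engine a better chance of hitting the requested rate.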


CHAPTER 5
Symbolic Phonetic Representations

A Symbolic Phonetic Representation (SPR) is the phonetic spelling used by Speechify to represent the pronunciation of a single word. An SPR represents:

❏ The sounds of the word
❏ How these sounds are divided into syllables
❏ Which syllables receive stress, tone, or accent

You can use SPRs as input to Speechify in order to specify pronunciations that are different from those generated by the system automatically. SPRs are used in two different ways:

❏ Enter an SPR directly into text in place of the ordinary spelling of a word. In the following example, an SPR replaces the word "root" in order to trigger a pronunciation that rhymes with "foot" rather than "boot":

  We need to get to the \![rUt] of the problem.

❏ To make the pronunciation change more permanent, enter an SPR as the translation value of either a main or root dictionary entry. The specified pronunciation is then generated whenever that word is encountered in any text. (See "User Dictionaries" on page 6-59.)

In this chapter

• "SPR format" on page 5-56
• "Syllable boundaries" on page 5-56
• "Syllable-level information" on page 5-56
• "Speech sound symbols" on page 5-57


SPR format

An SPR consists of a sequence of allowable SPR symbols for a given language, enclosed in square brackets [ ] and prefixed with the embedded tag indicator \! (see "Embedded Tags" on page 4-43).

These examples are valid SPRs in US English:

  though    \![.1Do]
  shocking  \![.1Sa.0kIG]

The periods signal the beginning of a new syllable, the digits 1 and 0 indicate the stress levels of the syllables, and the letters D, o, S, a, k, I, and G represent specific US English speech sounds. Each of these elements of the SPR is discussed below.

An SPR entry that does not conform to the requirements detailed in this chapter is considered invalid and ignored.

Syllable boundaries

You can use periods to delimit syllables in an SPR, although they are not required. In de-DE, these periods determine the location of the syllable boundaries of the word. In other languages the periods are only used to enhance the readability of the SPR, and do not affect the way the word is syllabified in the speech output; Speechify's internal syllabification rules apply as usual to divide the word into syllables. See the appropriate language supplement for details.

Syllable-level information

Use digits to indicate the stress or (in Japanese) the pitch accent of a syllable. In languages other than Japanese, a polysyllabic word with no stress marked is assigned a default stress. To ensure the correct pronunciation of the word, you should mark stress explicitly in the SPR. Marking stress and pitch accent in an SPR varies by language and is described in the appropriate Speechify Language Supplement.


Speech sound symbols

Each language uses its own inventory of SPR symbols for representing its speech sounds. See the appropriate Speechify Language Supplement for a table of SPR symbols and examples of words in which each sound occurs. These tables show valid symbols for vowels, consonants, syllable stresses, and syllable boundaries.

NOTE: Letters are case-sensitive, so \![e] and \![E] represent two distinct sounds. Multi-character symbols must be contained in single quotes; for example, French peu is represented \![p'eu']. SPRs containing sound symbols that do not belong to the inventory of the current language are considered invalid and are ignored.

Some speech sounds have limited distributional patterns in specific languages. For example, in English, the sound [G] of sing \![.1sIG] does not occur at the beginning of a word. Other US English sounds that have a particularly narrow distribution are the flap [F] and the syllabic nasal [N]. Entering a sound symbol in a context where it does not normally occur may result in unnatural-sounding speech.

Speechify applies a sophisticated set of linguistic rules to its input to reflect the processes by which sounds change in specific contexts in natural language. For example, in US English, the sound [t] of write \![.1rYt] is pronounced as a flap [F] in writer \![.1rY.0FR]. SPR input undergoes these modifications just as ordinary text input does. In this example, whether you enter \![.1rY.0tR] or \![.1rY.0FR], the speech output is the same.




CHAPTER 6

User Dictionaries

Speechify provides a set of user dictionaries for customizing the pronunciations of words, abbreviations, acronyms, and other sequences. For example, if you want the sequence SWI to be pronounced SpeechWorks every time it occurs in any text, you can create a dictionary entry specifying this pronunciation.

NOTE: The user dictionaries should be used only for permanent pronunciation changes. They apply from the time the entries are added (see "SWIttsAddDictionaryEntry( )" on page 7-73) until the connection with the server is terminated or until the entry is explicitly removed from the dictionary. For customized pronunciations which apply only to specific instances of a text string, use Symbolic Phonetic Representations and the other embedded tags described in "Symbolic Phonetic Representations" on page 5-55 and "Embedded Tags" on page 4-43.

In this chapter

• "Overview of dictionaries" on page 6-60
• "Main dictionary" on page 6-60
• "Abbreviation dictionary" on page 6-61
• "Root dictionary" on page 6-62
• "Choosing a dictionary" on page 6-63
• "File-based dictionaries" on page 6-66


Overview of dictionaries

A dictionary entry consists of a key and a translation value. When the key matches a string in the input text, the translation value replaces it, and the translation is then pronounced according to Speechify's internal pronunciation rules.

Speechify provides three user dictionaries:

❏ Main dictionary
❏ Abbreviation dictionary
❏ Root dictionary

Each dictionary is characterized by the kinds of keys and translation values it accepts. An invalid dictionary key or translation causes dictionary lookup to fail, and the text string is handled as if the dictionary entry did not exist.

Dictionary lookups ignore surrounding parentheses, quotation marks, and the like. For example, if WHO is a key in a main dictionary entry, it matches input strings such as (WHO) and "WHO".

Dictionary entries can be added at run time via the API (see "SWIttsAddDictionaryEntry( )" on page 7-73), or they can be loaded from a file at startup. See "File-based dictionaries" on page 6-66 for details.

Main dictionary

The main dictionary is an all-purpose exception dictionary which can replace a word in input text with almost any type of input string. It is more permissive than the other dictionaries in the make-up of its keys and translations: a valid translation consists of any valid input string (with the exception of SAPI and SSML tags), and a valid key may contain any characters other than white space (except that the final character of the key may not be a punctuation symbol). You can use the main dictionary for:

❏ Strings that translate into more than one word
❏ Translations which include Symbolic Phonetic Representations or other embedded tags
❏ URLs and e-mail addresses with idiosyncratic pronunciations
❏ Keys containing digits or other non-alphabetic symbols
❏ Acronyms with special pronunciations


Additional notes on main dictionary entries

The main dictionary is case-sensitive. For example, lowercase who does not match a main dictionary key WHO.

Main dictionary translations may include Speechify embedded tags but not SAPI or SSML tags.

For formatting requirements and examples of valid main dictionary keys and translations, see the appropriate Speechify Language Supplement.

Abbreviation dictionary

The abbreviation dictionary is designed to handle word abbreviations which translate to one or more words in ordinary spelling.

Like the main dictionary, the abbreviation dictionary is case-sensitive. For example, if you enter the key Mar with the translation march, lowercase mar is still pronounced as mar.

For formatting requirements and examples of valid abbreviation dictionary keys and translations, see the appropriate Speechify Language Supplement.

Interpretation of trailing periods in the abbreviation dictionary

When you enter a key in the abbreviation dictionary, it is not necessary to include the "trailing" period that is often the final character of an abbreviation (as in the final period of etc.). If the key does not contain the trailing period, it matches input text both with and without the period. However, if you want an abbreviation to be pronounced as specified in the translation only when it is followed by a period in the text, then you must enter the trailing period in the key. The following table summarizes the use of trailing periods in entry keys:

  Key    Matches
  inv    inv, inv.
  sid.   sid.

An abbreviation dictionary entry invokes different assumptions about how to interpret the trailing period in the text than does a main dictionary entry. Since the final period cannot be part of a main dictionary entry key, it is always interpreted as end-of-sentence punctuation. A period following an abbreviation dictionary entry, on the other hand, is ambiguous. It is only interpreted as end-of-sentence punctuation if other appropriate conditions obtain (e.g., if it is followed by at least two spaces and an uppercase letter). For example, input (a) is interpreted as one sentence, while input (b) is interpreted as two sentences.

  (a) It rained 2 cm. on Monday.
  (b) On Sunday it rained 2 cm. On Monday, it was sunny.

Root dictionary

The root dictionary is used for ordinary words, like nouns (including proper names), verbs, or adjectives. Unlike the main and abbreviation dictionaries, lookups are not case-sensitive. While you must enter a root into the dictionary in lowercase letters, the entry matches an input string regardless of case. For example, the pronunciation specified in the translation still applies when the word begins with an uppercase letter (as the first word in a sentence, for example) or when it is spelled in all uppercase letters for emphasis.

The translation value of a root dictionary entry may be either a Symbolic Phonetic Representation (see "Symbolic Phonetic Representations" on page 5-55) or a word in ordinary spelling. For example, if you want the word route to be pronounced with the vowel of out rather than the vowel of boot, you could enter the translation as the SPR \![rWt], or in letters as rout, which also produces the desired vowel.


Beware of keys without vowels

Do not specify keys that consist entirely of consonants in the root dictionary; use the main dictionary instead. A root dictionary key cannot consist entirely of consonants, since Speechify spells out sequences of all consonants before root dictionary lookup occurs. An entry like the following can be placed in the main dictionary instead:

  ng    \![.1EG]

Check language supplements for variations

The use of the root dictionary varies among languages. For formatting requirements and examples of valid root dictionary keys and translations, see the appropriate Speechify Language Supplement.

Choosing a dictionary

This section covers guidelines for determining which dictionary to use for particular entries.

Speechify includes two kinds of text processing functions:

❏ determining how the input text should be read
❏ specifying how words are pronounced

The first kind of processing is discussed in greater detail in Chapter 3, and includes expanding abbreviations and acronyms, reading digit sequences, and interpreting non-alphanumeric characters such as punctuation and currency symbols. For example, specifying that the abbreviation "Fri." should be read as "Friday" is different from specifying whether the word "Friday" should be pronounced so that the last vowel is the same as in the word "day" or the same as in the word "bee." This distinction is reflected in the dictionaries, as discussed in the following sections.


Main and abbreviation dictionaries vs. root dictionary

Both the main and abbreviation dictionaries are used to specify how input text should be read, while the root dictionary is used to specify how individual words should be pronounced. Word pronunciations can be specified by either an SPR (phonetic representation) or another word (real or invented) that has the desired pronunciation.

Because words are, with a few exceptions, pronounced the same whether they are in uppercase, lowercase, or mixed case letters, root dictionary lookup is not case-sensitive. An entry that you add with a lowercase key applies to words in text when they are capitalized or uppercase. For example, if you specify that "friday" should be pronounced "frydi," then "Friday" and "FRIDAY" are also pronounced "frydi."

On the other hand, main and abbreviation dictionary lookups are case-sensitive, since the reading of particular character strings can be affected by letter case. For example, the abbreviation "ft." is read as "foot," while "Ft." is read as "Fort." "ERA" might be read as "Equal Rights Amendment," but "era" should still be read as "era."

Main dictionary vs. abbreviation dictionary

The main dictionary can be thought of as a string replacement dictionary: with the few exceptions outlined in the Main dictionary section above, it is used to replace any input string with any other input string. Use it when either the key or the translation contains non-alphabetic characters. You can use it to specify the reading of e-mail addresses, alphanumeric sequences such as "Win2K," digit sequences, and the like, or even to encode entire phrases or sentences in a single token.

The abbreviation dictionary is used to expand what we normally think of as abbreviations: shortened forms of a word that may end in a period to signal that they are abbreviations, and which expand into full words. Abbreviations are entered in a separate dictionary because the final period is handled differently in abbreviations than in main dictionary entries (see "Interpretation of trailing periods in the abbreviation dictionary" on page 6-61).


The following table shows some sample entries, along with the dictionary they should be entered in.

  Key        Translation                           Dictionary
  Win2K      Windows 2000 or Windows two thousand  main
  4x4        4 by 4 or four by four                main
  50%        half                                  main
  Phrase217  How may I help you?                   main
  FAA        Federal Aviation Administration       main
  f.a.a.     free of all average                   abbreviation
  sunday     sundi or \![1sHndi]                   root
  quay       key                                   root

Dictionary interactions

The order of lookups is main, abbreviation, root. Since the translation from one dictionary is looked up in subsequent dictionaries, dictionary entries can feed into one another. For example, given the dictionary entries in the following table:

  Key    Translation   Dictionary
  h/w    hrs. per wk.  main
  hrs.   hours         abbreviation
  wk.    week          abbreviation
  hours  \![1arz]      root

the input and output are as follows:

  Input  Pronunciation
  h/w    \![1arz] \![1pR] \![1wik]
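The feed-through behavior described above can be sketched as a toy model in C. This is a simplification for illustration only, using the sample entries from this section: real Speechify lookups also handle case rules and punctuation, and words with no dictionary entry (here, "per" and "wk." after expansion) fall through to the engine's normal pronunciation rules rather than staying as text.

```c
#include <string.h>

/* One key/translation pair, as in the tables above. */
struct entry { const char *key, *translation; };

static struct entry main_dict[] = { { "h/w", "hrs. per wk." } };
static struct entry abbr_dict[] = { { "hrs.", "hours" }, { "wk.", "week" } };
static struct entry root_dict[] = { { "hours", "\\![1arz]" } };

/* Linear search; returns NULL when the word has no entry. */
static const char *lookup(struct entry *d, size_t n, const char *word)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(d[i].key, word) == 0)
            return d[i].translation;
    return NULL;
}

/* Apply the documented lookup order: main first, then re-tokenize the
   main translation and run each token through abbreviation and root. */
static void translate(const char *word, char *out, size_t outsz)
{
    const char *m = lookup(main_dict, 1, word);
    char buf[256];
    if (m == NULL)
        m = word;
    strncpy(buf, m, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    out[0] = '\0';
    for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
        const char *a = lookup(abbr_dict, 2, tok);
        if (a == NULL) a = tok;
        const char *r = lookup(root_dict, 1, a);
        if (r == NULL) r = a;
        if (out[0]) strncat(out, " ", outsz - strlen(out) - 1);
        strncat(out, r, outsz - strlen(out) - 1);
    }
}
```

In this sketch translate("h/w", ...) yields "\![1arz] per week": "hours" picks up its root SPR, while "per" and "week" pass through unchanged (in the real engine they are then pronounced by the internal rules, shown in the table above as \![1pR] and \![1wik]).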


If the same key occurs in more than one dictionary, the order of lookups determines which entry is used. For example, given the following entries:

  Key  Translation        Dictionary
  win  Microsoft Windows  main
  win  windows            abbreviation
  win  \![wIn]            root

the input and output are as follows:

  Input  Pronunciation
  win    microsoft windows

Remember, however, that the main and abbreviation dictionary lookups are case-sensitive, while the root dictionary lookups are case-insensitive. Given the entries shown above, only the root dictionary entry will apply to "Win":

  Input  Pronunciation
  Win    \![wIn]

File-based dictionaries

There are two ways to add entries to a user dictionary:

❏ At run time: use an API function to add entries.
❏ When the server initializes: load a set of entries from a file. This option may be more convenient when the set of entries is large or changes infrequently. Using this facility, you can make changes to the entries in the file without having to change or rebuild your application.

Note that entries added during run time supersede entries from a file-based dictionary.

Details on adding dictionary entries at run time are found in "SWIttsAddDictionaryEntry( )" on page 7-73. This section provides details about file-based dictionaries.


File-based dictionaries are loaded when you start the Speechify server. For each dictionary type (main, root, or abbreviation in Western languages, mainext in Japanese) you can have one file containing a set of entries (a file-based dictionary) for the language and one for the voice. Entries in language-specific dictionaries affect all voices for a certain language. Entries in voice-specific dictionaries affect only one voice.

In general, language-specific dictionaries are used to overcome areas where Speechify's pronunciation rules do not produce correct pronunciations. This can be the case with proper names, names of corporations, etc. Voice-specific dictionaries are used less often and are mostly useful when a specific voice pronounces a word inconsistently compared to all other voices.

Once the server has initialized, you can modify the loaded file-based dictionaries by calls to SWIttsAddDictionaryEntry( ), SWIttsResetDictionary( ), and SWIttsDeleteDictionaryEntry( ). Thus, the API functions take precedence over the content of the file-based dictionaries. The language dictionary is loaded before the voice dictionary, so the entries in the voice dictionary take precedence over the entries in the language dictionary.

For example, if your root file dictionary contains an entry for "cat," that entry is loaded when you start the server. If during run time you use SWIttsDeleteDictionaryEntry( ) to delete "cat," it is deleted until you either re-add "cat" with SWIttsAddDictionaryEntry( ) or you restart the server (to load it from the file again). This behavior is consistent for all languages.

Dictionary loading

The Speechify engine attempts to load the dictionaries each time the application calls SWIttsOpenPort( ). Subsequent changes made via API functions such as SWIttsAddDictionaryEntry( ) can override the contents of the default dictionaries. If an error occurs while loading a dictionary, the server generates an error message and continues to initialize. The server logs a diagnostic message if a dictionary file is successfully loaded or if a dictionary file is not found.


File names and locations

To use language-specific dictionaries, create a text file for one or more of the dictionary types described above and place it in the language directory under the installation directory. The name of the text file is xx-XX-type.dic, e.g.:

  C:\Program Files\Speechify\en-US\en-US-main.dic
  C:\Program Files\Speechify\en-US\en-US-root.dic
  C:\Program Files\Speechify\en-US\en-US-abbreviation.dic

To use voice-specific dictionaries, place the files in the voice directory under the language directory. The name of the file is name-type.dic, e.g.:

  C:\Program Files\Speechify\en-US\rick\rick-main.dic
  C:\Program Files\Speechify\en-US\rick\rick-root.dic
  C:\Program Files\Speechify\en-US\rick\rick-abbreviation.dic

File format

A file-based dictionary is a text file that contains one or more dictionary entries, one entry per line. Each entry has the following format:

❏ For Western European languages, use the Latin-1/ISO-8859-1 encoding. For Japanese, use the Shift-JIS encoding.
❏ The key and translation values should be equivalent to what you would pass to the SWIttsAddDictionaryEntry( ) function; therefore the rules for what constitutes legal values for that function apply. See the chapter on user dictionaries in the appropriate Speechify Language Supplement to get a list of these rules.
❏ The new line character must be in the format of the server operating system (i.e., on Windows: CR+LF; on Unix: LF).

For example, you may have a file C:\Program Files\Speechify\en-US\en-US-main.dic that contains the following entry:

  test    \![hx1lo]

If you submit the text "this is a test" to Speechify, it says, "this is a hello."
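A minimal sketch of generating such a file from C follows. Opening the file in text mode ("w") makes the C runtime emit the platform's native line ending, which matches the requirement that the newline format follow the server operating system. The tab separator between key and translation is an assumption for illustration; check the appropriate Language Supplement for the exact separator your language requires.

```c
#include <stdio.h>

/* Write a one-entry main dictionary file (illustrative helper, not
   part of the Speechify API).  Returns 0 on success, -1 on failure. */
static int write_main_dict(const char *path)
{
    FILE *f = fopen(path, "w");   /* text mode: native newlines */
    if (f == NULL)
        return -1;
    /* Key "test", translation \![hx1lo]; \\! yields \! in the file. */
    fprintf(f, "test\t\\![hx1lo]\n");
    return fclose(f);
}
```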


CHAPTER 7

API Reference

This chapter describes the API functions in alphabetical order. All API function prototypes, types, error codes, and constants are located in the header file SWItts.h.

In this chapter

• Calling convention on page 7-70
• Server's preferred character set on page 7-70
• Result codes on page 7-71
• SWIttsAddDictionaryEntry( ) on page 7-73
• SWIttsCallback( ) on page 7-75
• SWIttsClosePort( ) on page 7-80
• SWIttsDeleteDictionaryEntry( ) on page 7-81
• SWIttsGetDictionaryKeys( ) on page 7-83
• SWIttsGetParameter( ) on page 7-86
• SWIttsInit( ) on page 7-88
• SWIttsLookupDictionaryEntry( ) on page 7-89
• SWIttsOpenPort( ) on page 7-91
• SWIttsPing( ) on page 7-92
• SWIttsResetDictionary( ) on page 7-93
• SWIttsSetParameter( ) on page 7-94
• SWIttsSpeak( ) on page 7-96
• SWIttsStop( ) on page 7-98
• SWIttsTerm( ) on page 7-99


Calling convention

All Speechify API functions use the stdcall (or Pascal) calling convention. The header files contain the appropriate compiler directives to ensure correct compilation. When writing callback functions, be sure to use the correct calling convention.

The calling convention is dependent on the operating system, and is defined in the SWItts.h header file.

Under NT:

  #define SWIAPI __stdcall

Under UNIX:

  #define SWIAPI

Server's preferred character set

The server's preferred character set is ISO-8859-1 (also known as Latin-1). All strings passed to the server by calls to SWIttsAddDictionaryEntry( ), SWIttsDeleteDictionaryEntry( ), SWIttsLookupDictionaryEntry( ), and SWIttsSpeak( ) are converted to ISO-8859-1 before they are processed internally. Consequently, in Speechify 2.1, text entered into these functions must be representable in ISO-8859-1 even if it is encoded in another character set supported by Speechify. (Japanese is an exception, because it does not support ISO-8859-1. The Japanese server's preferred character set is Shift-JIS.)

Calls which return strings, such as SWIttsLookupDictionaryEntry( ) and SWIttsGetDictionaryKeys( ), return them in ISO-8859-1. Bookmark IDs are converted to 0-terminated wide character strings before they are returned to the user.


Server’s preferred character set<strong>Speechify</strong> User’s <strong>Guide</strong>Result codesThe following result codes are defined in the enum SWIttsResult in SWItts.h.CodeSWItts_SUCCESSSWItts_ALREADY_EXECUTING_APISWItts_CONNECT_ERRORSWItts_ENGINE_ERRORDescriptionThe API function completed successfully.This API function cannot be executed because another APIfunction is in progress on this port on another thread.The client could not connect to the server.An internal error occurred in the TTS engine on the server.SWItts_ERROR_PORT_ALREADY_STOPPING SWIttsStop( ) was called when the port was already in the processof stopping.SWItts_ERROR_STOP_NOT_SPEAKING SWIttsStop( ) was called when the port was not speaking.SWItts_FATAL_EXCEPTION(NT only.) A crash occurred within the client library. A crash lognamed ALTmon.dmp might have been generated in the <strong>Speechify</strong>SDK directory. Please send it to <strong>Speechify</strong> technical support. Thisis an unrecoverable error and you should close the application.SWItts_HOST_NOT_FOUNDCould not resolve the host name or IP address.SWItts_INVALID_PARAMETEROne of the parameters passed to the function was invalid.SWItts_INVALID_PORTThe port handle passed is not a valid port handle.SWItts_MUST_BE_IDLEThis API function can only be called if the TTS port is idle.SWItts_NO_ENGINEThe engine on this port is no longer available.SWItts_NO_MEMORYAn attempt to allocate memory failed.SWItts_NO_MUTEXAn attempt to create a new mutex failed.SWItts_NO_THREADAn attempt to create a new thread failed.SWItts_NOT_EXECUTING_APIAn internal error. Notify <strong>SpeechWorks</strong> technical support if you seethis result code.SWItts_PORT_ALREADY_SHUT_DOWN The port is already closed. 
You cannot invoke SWIttsClosePort( )on a port that has been closed.SWItts_PORT_ALREADY_SHUTTING_DOWNSWItts_PORT_SHUTTING_DOWNSWItts_PROTOCOL_ERRORSWItts_READ_ONLYSWItts_SERVER_ERRORSWItts_SOCKET_ERRORSWItts_SSML_PARSE_ERRORSWItts_UNINITIALIZEDSWIttsClosePort( ) was called when the port was already beingclosed.A command could not be executed because the port is shuttingdown.An error in the client/server communication protocol occurred.SWIttsSetParameter( ) was called with a read-only parameter.An error occurred on the server.A sockets error occurred.Could not parse SSML text.The TTS client is not initialized.<strong>SpeechWorks</strong> Proprietary Fifth Edition, Update 27–71


  SWItts_UNKNOWN_CHARSET: Character set is invalid or unsupported.
  SWItts_UPDATE_DICT_PARTIAL_SUCCESS: SWIttsAddDictionaryEntry( ) or SWIttsDeleteDictionaryEntry( ) was called with one or more invalid entries.
  SWItts_WINSOCK_FAILED: (NT only.) WinSock initialization failed.

This is the full set of codes that the API functions return. No function returns all the codes. SWItts_SUCCESS and SWItts_FATAL_EXCEPTION are the only codes that are common to all functions. All functions except SWIttsInit( ) return SWItts_UNINITIALIZED if SWIttsInit( ) was not the first function called.
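A common pattern is to log the symbolic name of a result code when a call fails. A minimal sketch; the enum values below are placeholders for illustration only, since the real SWIttsResult values are defined in SWItts.h:

```c
/* Placeholder subset of SWIttsResult; real values come from SWItts.h. */
typedef enum {
    SWItts_SUCCESS = 0,
    SWItts_UNINITIALIZED,
    SWItts_INVALID_PORT
} SWIttsResult;

/* Map a result code to a printable name for diagnostics. */
static const char *result_name(SWIttsResult rc)
{
    switch (rc) {
    case SWItts_SUCCESS:       return "SWItts_SUCCESS";
    case SWItts_UNINITIALIZED: return "SWItts_UNINITIALIZED";
    case SWItts_INVALID_PORT:  return "SWItts_INVALID_PORT";
    default:                   return "unknown SWIttsResult";
    }
}
```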


Server’s preferred character set<strong>Speechify</strong> User’s <strong>Guide</strong>SWIttsAddDictionaryEntry( )Mode: SynchronousAdds a list of dictionary entries to the specified dictionary.SWIttsResult SWIAPI SWIttsAddDictionaryEntry (SWIttsPort ttsPortExt,const char *dictionaryType,const char *charset,unsigned int numEntries,SWIttsDictionaryEntry *entries);ParameterDescriptionttsPort Port handle returned by SWIttsOpenPort( ).dictionaryType The type of dictionary to add entries to. The following types aresupported as US-ASCII strings:❏ main❏ root❏ abbreviationSee “User Dictionaries” on page 6-59 for information aboutthese dictionary types.charsetThe charset used to encode the content of individual entries.For Western languages, the default charset (if NULL is passed in)is ISO-8859-1. The following charsets are supported as caseinsensitive,US-ASCII strings:❏ US-ASCII (acceptable synonym: ASCII)❏ ISO-8859-1❏ UTF-8❏ UTF-16For Japanese, the default charset is Shift-JIS. The followingcharsets are supported:❏ Shift-JIS❏ EUC❏ UTF-8❏ UTF-16numEntriesNumber of entries being passed to the call.entriesArray of dictionary entries to be added to the dictionary.SWIttsDictionaryEntry is the data structure encapsulating dictionary entry data. Inthe context of SWIttsAddDictionaryEntry( ) it is used as follows:typedef struct {const unsigned char *key;unsigned int keyLengthBytes;const unsigned char *translation;<strong>SpeechWorks</strong> Proprietary Fifth Edition, Update 27–73


API Reference<strong>Speechify</strong> User’s <strong>Guide</strong>unsigned int translationLengthBytes;} SWIttsDictionaryEntry;MemberkeykeyLengthBytestranslationtranslationLengthBytesDescriptionThe key to be added to the dictionary. It must be in thecharacter set specified by the charset parameter in the call toSWIttsAddDictionaryEntry( ). See “User Dictionaries” on page6-59 for information about the format of the key.The length of key in bytes.The translation to be added to the dictionary. It must be in thecharacter set specified by the charset parameter in the call toSWIttsAddDictionaryEntry( ). See “User Dictionaries” on page6-59 for information about the format of the translation.The length of translation in bytes.NotesSWIttsAddDictionaryEntry( ) is not an atomic operation. If one or more of theentries fail to be added, for whatever reason, the rest may still take place. SWItts_UPDATE_DICT_PARTIAL_SUCCESS is returned and the entries that failed to beadded are logged on the server. See “<strong>Speechify</strong> Logging” on page A-109 for moreinformation about logging.If SWIttsAddDictionaryEntry( ) is called with a key that already exists in thedictionary, the existing translation is replaced with the new translation. At present,there can only be one translation for each key in the dictionary.See also“SWIttsDeleteDictionaryEntry( )” on page 7-81“SWIttsGetDictionaryKeys( )” on page 7-83“SWIttsLookupDictionaryEntry( )” on page 7-89“SWIttsResetDictionary( )” on page 7-93Fifth Edition, Update 27–74<strong>SpeechWorks</strong> Proprietary
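Filling in the entry structure is mostly bookkeeping around the two byte-length fields. A sketch of a helper, assuming NUL-terminated ISO-8859-1 input strings (make_entry is a hypothetical convenience function, not part of the API; the struct definition is reproduced from this section for self-containment, and in a real application comes from SWItts.h):

```c
#include <string.h>

/* Reproduced from this chapter; real applications use SWItts.h. */
typedef struct {
    const unsigned char *key;
    unsigned int keyLengthBytes;
    const unsigned char *translation;
    unsigned int translationLengthBytes;
} SWIttsDictionaryEntry;

/* Fill one entry from NUL-terminated strings.  The length fields count
   bytes (not characters) and exclude the terminator. */
static void make_entry(SWIttsDictionaryEntry *e,
                       const char *key, const char *translation)
{
    e->key = (const unsigned char *)key;
    e->keyLengthBytes = (unsigned int)strlen(key);
    e->translation = (const unsigned char *)translation;
    e->translationLengthBytes = (unsigned int)strlen(translation);
}
```

An application would then build an array of such entries and pass it to SWIttsAddDictionaryEntry( ) with the matching numEntries count and a charset of NULL (for the default ISO-8859-1).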


Server’s preferred character set<strong>Speechify</strong> User’s <strong>Guide</strong>SWIttsCallback( )Mode: Synchronous. IMPORTANT: You must not block/wait in this function.User-supplied handler for data returned by the synthesis server.typedef SWIttsResult (SWIAPI SWIttsCallback) (SWIttsPort ttsPort,SWItts_cbStatus status,void *data,void *userData);ParameterttsPortstatusdatauserDataDescriptionThe port handle returned by SWIttsOpenPort( ) or -1 if the callback iscalled from within SWIttsInit( ), SWIttsOpenPort( ), or SWIttsTerm( ).These are enumerated types that are used to inform the callback function ofthe status of the void *data variable. See table below.Pointer to a structure containing data generated by the server. This pointeris declared as void * because the exact type varies. The status parameterindicates the exact type to which this pointer should be cast.This is a void * in which the application programmer may include anyinformation that he wishes to be passed back to the callback function. Atypical example is a thread ID that is meaningful to the application. TheuserData variable is a value you pass to SWIttsOpenPort( ).This table lists the values of SWItts_cbStatus:Status codeDescriptionSWitts_cbAudio Audio data packet. The data structure is a SWIttsAudioPacketshown below.SWItts_cbBookmark User-defined bookmark. The data structure is aSWIttsBookMark as shown below.SWItts_cbDiagnostic Diagnostic message. The data structure is aSWIttsMessagePacket as shown below. You only receive thismessage if the SWITTSLOGDIAG environment variable isdefined. See “<strong>Speechify</strong> Logging” on page A-109 for moreinformation about logging.SWItts_cbEnd End of audio packets from the current SWIttsSpeak( )command. The data is a NULL pointer.<strong>SpeechWorks</strong> Proprietary Fifth Edition, Update 27–75


  SWItts_cbError
      Asynchronous error message. The data structure is a SWIttsMessagePacket, which contains error information and is described below. This message is received if an asynchronous API function encounters an error when trying to perform an asynchronous operation such as reading from the network. If you receive this message, consider it fatal for that port. You are free to call SWItts functions from the callback, but you should consider the receipt of SWItts_cbError fatal and call SWIttsClosePort( ) to properly clean up the port.
  SWItts_cbLogError
      Error message. The data structure is a SWIttsMessagePacket, which contains error information and is described below. The callback may receive the cbLogError and cbDiagnostic events at any time, whether inside a synchronous or asynchronous function. The user is not allowed to call any SWItts function at this time. If you receive this message, log it to a file, console, etc., and continue execution.
  SWItts_cbPhonememark
      Represents information about a phoneme boundary in the input text. The data structure is a SWIttsPhonemeMark, shown below.
  SWItts_cbPing
      A reply was received from the TTS port on the server after a call to SWIttsPing( ). The data is a NULL pointer.
  SWItts_cbPortClosed
      The port was successfully closed after a call to SWIttsClosePort( ). The data is a NULL pointer.
  SWItts_cbStart
      Represents the commencement of audio packets from the current SWIttsSpeak( ) command. The data is a NULL pointer.
  SWItts_cbStopped
      SWIttsStop( ) has been called and recognized. There is no SWItts_cbEnd notification. The data is a NULL pointer.
  SWItts_cbWordmark
      Represents information about a word boundary in the input text. The data structure is a SWIttsWordMark, shown below.

Structures

The audio packet data structure is described here:

  typedef struct {
      void *samples;
      unsigned int numBytes;
      unsigned int firstSampleNumber;
  } SWIttsAudioPacket;


Server’s preferred character set<strong>Speechify</strong> User’s <strong>Guide</strong>MembersamplesnumBytesfirstSampleNumberDescriptionThe buffer of speech samples. You must copy the data out of this bufferbefore the callback returns as the client library may free it or overwrite thecontents with new samples.The number of bytes in the buffer. This number of bytes may be largerthan the number of samples, e.g., if you’ve chosen a sample format of 16-bit linear, the number of bytes would be twice the number of samples.The accumulated number of samples in the current SWIttsSpeak( ) call.The first packet has a sample number of zero.The message packet data structure is described here:typedef struct {time_t messageTime;unsigned short messageTimeMs;unsigned int msgID;unsigned int numKeys;const wchar_t **infoKeys;const wchar_t **infoValues;const wchar_t *defaultMessage} SWIttsMessagePacket;MembermessageTimemessageTimeMsmsgIDnumKeysDescriptionThe absolute time at which the message was generated.An adjustment to messageTime to allow millisecond accuracy.An unique identifier. A value of 0 is used for SWItts_cbDiagnostic messages.The number of key/value pairs.infoKeys/infoValues Additional information about message, in key/value pairs of 0-terminated wide character string text. These members are onlyvalid for SWItts_cbError and SWItts_cbLogError messages.defaultMessage A pre-formatted 0-terminated wide character message. Thismember is only valid for SWItts_cbDiagnostic messages.See “<strong>Speechify</strong> Logging” on page A-109 for information about how to map msgIDinto a meaningful message.<strong>SpeechWorks</strong> Proprietary Fifth Edition, Update 27–77


The bookmark data structure is described here:

    typedef struct {
        const wchar_t *ID;
        unsigned int sampleNumber;
    } SWIttsBookMark;

ID: A pointer to the bookmark 0-terminated wide character string in the server's preferred character set. It corresponds to the user-defined string specified in the bookmark tag. See "Server's preferred character set" on page 7-70 for more information.

sampleNumber: The accumulated number of samples in the current SWIttsSpeak( ) call. A bookmark placed at the beginning of a string has a timestamp of 0.

The wordmark data structure is described here:

    typedef struct {
        unsigned int sampleNumber;
        unsigned int offset;
        unsigned int length;
    } SWIttsWordMark;

sampleNumber: The sample number correlating to the beginning of this word.

offset: The index into the input text of the first character where this word begins. Starts at zero.

length: The length of the word in characters, not bytes.

The phoneme-mark data structure is described here:

    typedef struct {
        unsigned int sampleNumber;
        const char *name;
        unsigned int duration;
        unsigned int stress;
    } SWIttsPhonemeMark;

sampleNumber: The sample number correlating to the beginning of this phoneme.

name: The name of the phoneme as a NULL-terminated US-ASCII string. (The phoneme names are described in the <strong>Speechify</strong> supplements for each language.)


duration: The length of the phoneme in samples.

stress: Indicates whether the phoneme was stressed. A 0 indicates no stress, a 1 indicates primary stress, and a 2 indicates secondary stress.

Notes

The callback function is user-defined but is called by the SWItts library, i.e., the user writes the code for the callback function, and a pointer to it is passed into the SWIttsOpenPort( ) function. The client calls this function as needed when data arrives from the <strong>Speechify</strong> server. It is called from a thread created for the port during the SWIttsOpenPort( ) function.

The SWItts_cbStatus variable indicates the reason for invoking the callback and also what, if any, type of data is being passed. The SWIttsResult code returned by the callback is not currently interpreted by <strong>Speechify</strong>, but may be in the future; thus, the callback function should always return SWItts_SUCCESS.

NOTE: Because the callback function is user-defined, the efficiency of its code has a direct impact on system performance. If it is inefficient, it may hinder the client's ability to service the network connection, and data may be lost.

The <strong>Speechify</strong> server does not "throttle" its transmission of audio data to the client; it sends the audio as fast as it can. This means that sending a large amount of text to the SWIttsSpeak( ) function may cause the server to send back a large amount of audio before the application needs to send it to an audio device or telephony card. On average, expect about one second of audio for every ten characters of text input. For example, if you pass 10 KB of text to the SWIttsSpeak( ) function, your callback may receive about 1000 seconds of audio samples. That is 8 MB of data if you chose to receive 8-bit µ-law samples, and 16 MB of data if you chose to receive 16-bit linear samples. This amount of audio may require more buffering than you want to allow for, especially in a scenario with multiple TTS ports.

A common technique to avoid this buffering load is to divide your input text into pieces, sending a new speak request for one piece when the previous one has finished playing. For best results, divide the text at sentence boundaries such as periods.


SWIttsClosePort( )

Mode: Asynchronous

Closes a TTS port, which frees all resources and closes all communication with the server.

    SWIttsResult SWIAPI SWIttsClosePort(
        SWIttsPort ttsPort);

ttsPort: Port handle returned by SWIttsOpenPort( ).

SWIttsClosePort( ) sends a SWItts_cbPortClosed message to the callback upon successful closing of the port. Once a port is closed, you cannot pass that port handle to any SWItts function.

See also
"SWIttsOpenPort( )" on page 7-91


SWIttsDeleteDictionaryEntry( )

Mode: Synchronous

Deletes a list of dictionary entries from a dictionary.

    SWIttsResult SWIAPI SWIttsDeleteDictionaryEntry(
        SWIttsPort ttsPort,
        const char *dictionaryType,
        const char *charset,
        unsigned int numEntries,
        SWIttsDictionaryEntry *entries);

ttsPort: Port handle returned by SWIttsOpenPort( ).

dictionaryType: The type of dictionary to delete entries from. The following types are supported as US-ASCII strings:
❏ main
❏ root
❏ abbreviation
See "User Dictionaries" on page 6-59 for information about the different dictionary types.

charset: The charset used to encode the content of individual entries. For Western languages, the default charset (if NULL is passed in) is ISO-8859-1. The following charsets are supported as case-insensitive US-ASCII strings:
❏ US-ASCII (acceptable synonym: ASCII)
❏ ISO-8859-1
❏ UTF-8
❏ UTF-16
For Japanese, the default charset is Shift-JIS. The following charsets are supported:
❏ Shift-JIS
❏ EUC
❏ UTF-8
❏ UTF-16

numEntries: Number of entries being passed to the call.

entries: An array of dictionary entries to be deleted from the dictionary.

The dictionary entry data structure in the context of SWIttsDeleteDictionaryEntry( ) is explained here:

    typedef struct {
        const unsigned char *key;
        unsigned int keyLengthBytes;
        const unsigned char *translation;
        unsigned int translationLengthBytes;
    } SWIttsDictionaryEntry;

key: The key to be deleted from the dictionary. It must be in the character set specified by the charset parameter in the call to SWIttsDeleteDictionaryEntry( ). The associated translation is also deleted. See "User Dictionaries" on page 6-59 for information about the format of the key.

keyLengthBytes: The length of key in bytes.

translation: Not used in this call. It should be set to NULL.

translationLengthBytes: Not used in this call. It should be set to 0.

Notes

SWIttsDeleteDictionaryEntry( ) is not an atomic operation. If one or more of the entries fail to be deleted, for whatever reason, the rest may still take place. SWItts_UPDATE_DICT_PARTIAL_SUCCESS is returned and the entries that failed to be deleted are logged on the server. See "<strong>Speechify</strong> Logging" on page A-109 for more information about logging.

Requesting deletion of non-existent entries does not return an error.

See also
"SWIttsAddDictionaryEntry( )" on page 7-73
"SWIttsGetDictionaryKeys( )" on page 7-83
"SWIttsLookupDictionaryEntry( )" on page 7-89
"SWIttsResetDictionary( )" on page 7-93


SWIttsGetDictionaryKeys( )

Mode: Synchronous

Enumerates dictionary keys from the specified dictionary.

    SWIttsResult SWIAPI SWIttsGetDictionaryKeys(
        SWIttsPort ttsPort,
        const char *dictionaryType,
        SWIttsDictionaryPosition *startingPosition,
        unsigned int *numKeys,
        SWIttsDictionaryEntry **keys,
        const char *reserved);

ttsPort: The port handle returned by SWIttsOpenPort( ).

dictionaryType: The type of dictionary to enumerate entries from. The following types are supported as US-ASCII strings:
❏ main
❏ root
❏ abbreviation
See "User Dictionaries" on page 6-59 for information about the different dictionary types.

startingPosition: A token encoding the state of the enumeration. See below for a description.

numKeys: On call, the maximum number of keys to be retrieved (use a very large number, e.g., UINT_MAX, to get all of them); on return, the number of keys actually retrieved.

keys: Pointer to an array of keys retrieved. This array is valid until the next call to SWIttsGetDictionaryKeys( ).

reserved: Reserved for future use (should be NULL).

The dictionary entry data structure in the context of SWIttsGetDictionaryKeys( ) is explained here:

    typedef struct {
        const unsigned char *key;
        unsigned int keyLengthBytes;
        const unsigned char *translation;
        unsigned int translationLengthBytes;
    } SWIttsDictionaryEntry;


key: A key retrieved from the dictionary in the server's preferred character set. The key is not NULL-terminated. See "Server's preferred character set" on page 7-70 for more information.

keyLengthBytes: The length of the key in bytes.

translation: Not used in this call. It is set to NULL upon return.

translationLengthBytes: Not used in this call. It is set to 0 upon return.

Notes

Enumerating keys from a dictionary requires a pointer to a SWIttsDictionaryPosition token. To start enumeration from the beginning, create a SWIttsDictionaryPosition token, initialize it to 0, and pass a pointer to it to SWIttsGetDictionaryKeys( ). Upon return, check the token's value.

If the token is 0, there are no more keys left in the dictionary. If it is non-zero, call SWIttsGetDictionaryKeys( ) again with a pointer to the token obtained from the last SWIttsGetDictionaryKeys( ) call. Continue this until, upon return from SWIttsGetDictionaryKeys( ), the position token has a value of 0.

Currently, you are allowed only one outstanding enumeration at a time. To begin a new enumeration, set your SWIttsDictionaryPosition token to a value of 0.

Example

The following C++ code enumerates all dictionary entries, two at a time, from the "main" dictionary:

    SWIttsResult tts_rc = SWItts_SUCCESS;
    SWIttsPort ttsPort;
    …
    SWIttsDictionaryEntry *entries;
    SWIttsDictionaryPosition pos = 0;
    unsigned int numEntries = 2;

    // Retrieve two keys at a time.
    do
    {
        tts_rc = SWIttsGetDictionaryKeys(ttsPort, "main", &pos,
                                         &numEntries, &entries, NULL);
        if (tts_rc == SWItts_SUCCESS)
        {
            for (unsigned int i = 0; i < numEntries; i++)
            {
                // Keys are not NULL-terminated; copy before printing.
                char *key = new char[entries[i].keyLengthBytes + 1];
                memcpy(key, entries[i].key, entries[i].keyLengthBytes);
                key[entries[i].keyLengthBytes] = '\0';
                printf("%s\n", key);
                delete [] key;
            }
            numEntries = 2;
        }
    } while (tts_rc == SWItts_SUCCESS && pos != 0);

See also
"SWIttsAddDictionaryEntry( )" on page 7-73
"SWIttsDeleteDictionaryEntry( )" on page 7-81
"SWIttsLookupDictionaryEntry( )" on page 7-89
"SWIttsResetDictionary( )" on page 7-93


SWIttsGetParameter( )

Mode: Synchronous

Retrieves the value of a parameter from the server.

    SWIttsResult SWIAPI SWIttsGetParameter(
        SWIttsPort ttsPort,
        const char *name,
        char *value);

ttsPort: The port handle returned by SWIttsOpenPort( ).

name: The name of the parameter to retrieve.

value: Takes a preallocated buffer of size SWITTS_MAXVAL_SIZE.

The following table describes the parameters that can be retrieved. All parameters have a default value, and certain parameters are read-only.

tts.audio.rate
  Possible values: 33–300. Default: 100. Read-only: no.
  Port-specific speaking rate of the synthesized text as a percentage of the default rate.

tts.audio.volume
  Possible values: 0–100. Default: 100. Read-only: no.
  Port-specific volume of synthesized speech as a percentage of the default volume: 100 means the maximum possible without distortion and 0 means silence.

tts.audioformat.encoding
  Possible values: ulaw, alaw, linear. Default: server (a). Read-only: yes.
  Encoding method for audio generated during synthesis. This value can be set via the mimetype.

tts.audioformat.mimetype
  Possible values: audio/basic, audio/x-alaw-basic, audio/L16;rate=8000, audio/L16;rate=16000. Default: server (a). Read-only: no (b).
  The audio format of the server:
  ❏ audio/basic corresponds to 8 kHz, 8-bit µ-law;
  ❏ audio/x-alaw-basic corresponds to 8 kHz, 8-bit A-law;
  ❏ audio/L16;rate=8000 corresponds to 8 kHz, 16-bit linear;
  ❏ audio/L16;rate=16000 corresponds to 16 kHz, 16-bit linear.
  All other values generate a SWItts_INVALID_PARAM return code. In all cases, audio data is returned in network byte order.

tts.audioformat.samplerate
  Possible values: 8000, 16000. Default: server (a). Read-only: yes.
  Audio sampling rate in Hz. This value can be set via the mimetype.

tts.audioformat.width
  Possible values: 8, 16. Default: server (a). Read-only: yes.
  Size of an individual audio sample in bits. This value can be set via the mimetype.

tts.client.version
  Possible values: current <strong>Speechify</strong> version number. Read-only: yes.
  The returned value is a string of the form major.minor.maintenance, for example, 2.1.0 or 2.0.4. This parameter reflects the client version, and can be retrieved after SWIttsInit( ) is called but before SWIttsOpenPort( ) is called. Use SWITTS_INVALID_PORT for the first argument to SWIttsGetParameter( ).

tts.marks.phoneme
  Possible values: true, false. Default: false. Read-only: no.
  Controls whether phoneme marks are reported to the client.

tts.marks.word
  Possible values: true, false. Default: false. Read-only: no.
  Controls whether wordmarks are reported to the client.

tts.network.packetsize
  Possible values: 1K, 2K, 4K, 8K, 1MTU, 2MTU, 4MTU. Default: 4K. Read-only: no.
  Size of the data packet sent to the callback, in bytes or MTUs.

tts.reset
  Possible values: none. Default: none. Read-only: no.
  Command which causes all parameters controllable via SWIttsSetParameter( ) to revert to their default values; the value is ignored.

tts.voice.gender
  Possible values: male, female. Default: server (a). Read-only: yes.
  Synthesis voice gender.

tts.voice.language
  Possible values: server. Default: server (a). Read-only: yes.
  Synthesis language.

tts.voice.name
  Possible values: server. Default: server (a). Read-only: yes.
  Unique name identifying the voice.

(a) Possible and default values indicated as "server" are determined by the particular server instance to which the application connects.
(b) tts.audioformat.mimetype values may be switched between audio/basic, audio/x-alaw-basic, and audio/L16;rate=8000 if the server has been instantiated with the 8 kHz voice database. If the server is instantiated with the 16 kHz voice database, this parameter has the read-only value of audio/L16;rate=16000.

See also
"SWIttsSetParameter( )" on page 7-94


SWIttsInit( )

Mode: Synchronous

Initializes the client library so that it is ready to open ports.

    SWIttsResult SWIAPI SWIttsInit(
        SWIttsCallback *callback,
        void *userData);

callback: A pointer to a callback function that may receive SWItts_cbLogError and/or SWItts_cbDiagnostic messages during the SWIttsInit( ) call. If this callback is called, the ttsPort parameter is –1. This may be the same callback that is passed to SWIttsOpenPort( ) or SWIttsTerm( ).

userData: User information passed back to the callback. It is not interpreted or modified in any way by the client.

NOTE: This must be the first API function called, and it should only be called once per process, not once per call to SWIttsOpenPort( ).

See also
"SWIttsOpenPort( )" on page 7-91
"SWIttsTerm( )" on page 7-99


SWIttsLookupDictionaryEntry( )

Mode: Synchronous

Retrieves the translation for the given key from the specified dictionary.

    SWIttsResult SWIAPI SWIttsLookupDictionaryEntry(
        SWIttsPort ttsPort,
        const char *dictionaryType,
        const unsigned char *key,
        const char *charset,
        unsigned int keyLengthBytes,
        unsigned int *numEntries,
        SWIttsDictionaryEntry **entries);

ttsPort: The port handle returned by SWIttsOpenPort( ).

dictionaryType: The type of dictionary to retrieve the translation from. The following types are supported as US-ASCII strings:
❏ main
❏ root
❏ abbreviation
See "User Dictionaries" on page 6-59 for information about the different dictionary types.

key: The key whose translation is to be retrieved. See "User Dictionaries" on page 6-59 for information about the format of the key.

charset: The charset used to encode the content of individual entries. For Western languages, the default charset (if NULL is passed in) is ISO-8859-1. The following charsets are supported as case-insensitive US-ASCII strings:
❏ US-ASCII (acceptable synonym: ASCII)
❏ ISO-8859-1
❏ UTF-8
❏ UTF-16
For Japanese, the default charset is Shift-JIS. The following charsets are supported:
❏ Shift-JIS
❏ EUC
❏ UTF-8
❏ UTF-16

keyLengthBytes: The length of the key in bytes.

numEntries: The number of translations retrieved.

entries: Pointer to an array of the translations retrieved. This array is valid until the next call to SWIttsLookupDictionaryEntry( ). If the data therein needs to be preserved, the application must copy it into a separate memory location.

The dictionary entry data structure in the context of SWIttsLookupDictionaryEntry( ) is explained here:

    typedef struct {
        const unsigned char *key;
        unsigned int keyLengthBytes;
        const unsigned char *translation;
        unsigned int translationLengthBytes;
    } SWIttsDictionaryEntry;

key: Not used in this call. It is set to NULL upon return.

keyLengthBytes: Not used in this call. It is set to 0 upon return.

translation: A translation for the key passed to SWIttsLookupDictionaryEntry( ), in the server's preferred character set. The translation is not NULL-terminated. See "Server's preferred character set" on page 7-70 for more information.

translationLengthBytes: The length of the translation in bytes.

At present, there can be only one translation for each key in the dictionary. This means that numEntries is always 1 upon return from the call.

Calling SWIttsLookupDictionaryEntry( ) with a key that is not in the dictionary does not return an error. It does, however, set numEntries to 0.

See also
"SWIttsAddDictionaryEntry( )" on page 7-73
"SWIttsDeleteDictionaryEntry( )" on page 7-81
"SWIttsGetDictionaryKeys( )" on page 7-83
"SWIttsResetDictionary( )" on page 7-93


SWIttsOpenPort( )

Mode: Synchronous

Opens and connects to a <strong>Speechify</strong> server port. This should be called after SWIttsInit( ).

    SWIttsResult SWIAPI SWIttsOpenPort(
        SWIttsPort *ttsPort,
        char *hostAddr,
        unsigned short connectionPort,
        SWIttsCallback *callback,
        void *userData);

ttsPort: Address of a location to place the new port's handle.

hostAddr: A string containing the host server's name or IP address.

connectionPort: The port to connect to on the server. Note that this parameter refers to a sockets port and not a TTS port. This parameter is always the same port as the one specified when you start the server with the --ports switch.

callback: A pointer to a callback function that receives audio buffers and other notifications when the server sends data to the client. If an error occurs during the call to SWIttsOpenPort( ), the callback is called with a SWItts_cbLogError message and a ttsPort of –1.

userData: User information passed back to the callback.

See also
"SWIttsClosePort( )" on page 7-80
"SWIttsInit( )" on page 7-88


SWIttsPing( )

Mode: Asynchronous

Performs a basic test of server response.

    SWIttsResult SWIAPI SWIttsPing(
        SWIttsPort ttsPort);

ttsPort: The port handle returned by SWIttsOpenPort( ).

More than just an IP-level "ping," this verifies that the instance of the TTS server for this port is alive and accepting requests.

A return code of SWItts_SUCCESS means that the ping has been successfully sent to the TTS port. When the server replies, the client calls the callback for this port with a status of SWItts_cbPing. If this function returns an error code, shut down the port with the SWIttsClosePort( ) call. The amount of time you should wait for the SWItts_cbPing message in your callback varies depending on the load on your server; a good rule of thumb is to wait about five seconds for a ping reply before assuming the port is dead.

See also
"SWIttsClosePort( )" on page 7-80
"SWIttsOpenPort( )" on page 7-91


SWIttsResetDictionary( )

Mode: Synchronous

Removes all entries from the specified dictionary.

    SWIttsResult SWIAPI SWIttsResetUserDictionary(
        SWIttsPort ttsPort,
        const char *dictionaryType);

ttsPort: Port handle returned by SWIttsOpenPort( ).

dictionaryType: The type of dictionary to reset. The following types are supported as US-ASCII strings:
❏ main
❏ root
❏ abbreviation
See "User Dictionaries" on page 6-59 for information about the different dictionary types.

See also
"SWIttsAddDictionaryEntry( )" on page 7-73
"SWIttsDeleteDictionaryEntry( )" on page 7-81
"SWIttsGetDictionaryKeys( )" on page 7-83
"SWIttsLookupDictionaryEntry( )" on page 7-89


SWIttsSetParameter( )

Mode: Synchronous

Sends a parameter to the server.

    SWIttsResult SWIAPI SWIttsSetParameter(
        SWIttsPort ttsPort,
        const char *name,
        const char *value);

ttsPort: The port handle returned by SWIttsOpenPort( ).

name: A parameter name represented as a NULL-terminated US-ASCII string.

value: A parameter value represented as a NULL-terminated US-ASCII string.

Notes

If SWIttsSetParameter( ) returns an error, the parameter is not changed. Setting a parameter is not a global operation; it only affects the TTS port passed to the call.

The following table describes the parameters that can be set. All parameters have a default value. SWIttsGetParameter( ) lists the read-only parameters. If you try to set a read-only parameter, SWIttsSetParameter( ) returns SWItts_READ_ONLY.

tts.audio.rate
  Possible values: 33–300. Default: 100.
  Port-specific speaking rate of the synthesized text as a percentage of the default rate.

tts.audio.volume
  Possible values: 0–100. Default: 100.
  Port-specific volume of synthesized speech as a percentage of the default volume: 100 means the maximum possible without distortion and 0 means silence.

tts.audioformat.mimetype
  Possible values: audio/basic, audio/x-alaw-basic, audio/L16;rate=8000, audio/L16;rate=16000 (a). Default: server (b).
  The audio format of the server:
  ❏ audio/basic corresponds to 8 kHz, 8-bit µ-law;
  ❏ audio/x-alaw-basic corresponds to 8 kHz, 8-bit A-law;
  ❏ audio/L16;rate=8000 corresponds to 8 kHz, 16-bit linear;
  ❏ audio/L16;rate=16000 corresponds to 16 kHz, 16-bit linear.
  All other values generate a SWItts_INVALID_PARAM return code. In all cases, audio data is returned in network byte order.

tts.marks.phoneme
  Possible values: true, false. Default: false.
  Controls whether phoneme marks are reported to the client.

tts.marks.word
  Possible values: true, false. Default: false.
  Controls whether wordmarks are reported to the client.

tts.network.packetsize
  Possible values: 1K, 2K, 4K, 8K, 1MTU, 2MTU, 4MTU. Default: 4K.
  Size of the data packet sent to the callback, in bytes or MTUs.

tts.reset
  Possible values: none. Default: none.
  Command which causes all parameters controllable via SWIttsSetParameter( ) to revert to their default values; the value is ignored.

(a) tts.audioformat.mimetype values may be switched between audio/basic, audio/x-alaw-basic, and audio/L16;rate=8000 if the server has been instantiated with the 8 kHz voice database. If the server is instantiated with the 16 kHz voice database, this parameter has the read-only value of audio/L16;rate=16000.
(b) Possible and default values indicated as "server" are determined by the particular server instance to which the application connects.

See also
"SWIttsGetParameter( )" on page 7-86


SWIttsSpeak( )

Mode: Asynchronous

Sends a text string to be synthesized. Call this function for every text string to synthesize.

    SWIttsResult SWIAPI SWIttsSpeak(
        SWIttsPort ttsPort,
        const unsigned char *text,
        unsigned int lengthBytes,
        const char *content_type);

ttsPort: The port handle returned by SWIttsOpenPort( ).

text: The text to be synthesized: an array of bytes representing a string in a given character set.

lengthBytes: The length of the text array in bytes; note that this means any NULL in the text is treated as just another character.

content_type: Description of the input text according to the MIME standard (per RFC 2045 Sec. 5.1 and RFC 2046). Default (if set to NULL): text/plain;charset=iso-8859-1. (You can change the default using the --default_contenttype server parameter described on page 1-13.)

Notes

The content types that are supported are text/* and application/synthesis+ssml. Any subtype may be used with “text”. However, only the subtype “xml” is treated specially: the text is assumed to be in SSML, and if it is not, an error is returned. All other subtypes are treated as “plain”.

The application/synthesis+ssml content type is used to indicate SSML content, which is parsed accordingly. If SSML input is not signaled via the content_type parameter, it is pronounced as plain text.

The only meaningful content_type parameter is “charset,” which is case-insensitive. (See www.iana.org/assignments/character-sets for more details.) All other parameters are ignored. If “charset” is not specified, it is assumed to be ISO-8859-1.

An example of a valid content type:

    text/plain;charset=iso-8859-1


The following charsets are supported for Western languages:

❏ ISO-8859-1 (default)
❏ US-ASCII (synonym: ASCII)
❏ UTF-8
❏ UTF-16
❏ wchar_t

The “wchar_t” character set is not a MIME standard. It indicates that the input is in the form of the client platform's native wide character array (i.e., wchar_t*). Note that the input length must still be specified in bytes (i.e., the number of wide characters in the input times the number of bytes per wide character).

The supported charsets vary for Asian languages. For example, Japanese does not support ISO-8859-1 (and thus, content_type becomes required). Japanese supports:

❏ UTF-8
❏ UTF-16
❏ EUC
❏ Shift-JIS
❏ wchar_t

Also for Japanese, the text-to-speech engine ignores any white space in the input text. For example, this allows correct processing when text begins on one line and is continued on the next. A result of this behavior, which you might not anticipate, is that numeric values separated by spaces, tabs, or returns are pronounced as single units: “1 2 3” is pronounced as “123”. To speak the digits individually, separate them with commas: “1,2,3”.

Note that for <strong>Speechify</strong> Japanese, the default byte order is set to big endian when the byte order mark is missing.

If text contains byte sequences that do not represent valid characters when converted to the server's preferred character set, the server still performs the synthesis. It does this by removing the invalid characters and speaking the remaining text.

See also
"SWIttsStop( )" on page 7-98


SWIttsStop( )

Mode: Asynchronous

Interrupts a call to SWIttsSpeak( ).

    SWIttsResult SWIAPI SWIttsStop(
        SWIttsPort ttsPort);

ttsPort: The port handle returned by SWIttsOpenPort( ).

Notes

When the currently active speak request is completely stopped and the port is idle, the SWItts library calls the port's callback with a status of SWItts_cbStopped. The callback is called with SWItts_cbStopped only if the SWIttsStop( ) function returns with a SWItts_SUCCESS result.

If there is no SWIttsSpeak( ) function in progress, or if a currently active speak request is already stopping due to a previous call to SWIttsStop( ), this function returns an error.

See also
"SWIttsSpeak( )" on page 7-96


SWIttsTerm( )

Mode: Synchronous

Closes all ports, terminates their respective threads, shuts down the API library, and cleans up memory usage.

    SWIttsResult SWIAPI SWIttsTerm(
        SWIttsCallback *callback,
        void *userData);

callback: A pointer to a callback function that may receive SWItts_cbError, SWItts_cbLogError, and/or SWItts_cbDiagnostic messages during the SWIttsTerm( ) call.

userData: User information passed back to the callback.

Notes

If SWIttsTerm( ) closes one or more open TTS ports, you receive SWItts_cbPortClosed messages in their respective callbacks.

See also
"SWIttsInit( )" on page 7-88




CHAPTER 8
Performance and Sizing

This chapter describes the metrics used to characterize the speed, efficiency, and load-handling capabilities of <strong>Speechify</strong>. Included is a description of the testing application itself.

Actual test results for any given <strong>Speechify</strong> release, platform, and voice are provided in a readme file that is created on your system during installation. These tests assume no specific tuning of any operating system to improve application performance. For all measurements, the client and test application are run on a machine separate from that hosting the server.

In this chapter
• Test scenario and application on page 8-102
• Performance statistics on page 8-103
• Resource consumption on page 8-105
• Performance thresholds on page 8-107


Performance and Sizing<strong>Speechify</strong> User’s <strong>Guide</strong>Test scenario and application<strong>Speechify</strong>’s performance is dependent on the text being synthesized, including itscontent, complexity, individual utterance length, presence of unusual strings,abbreviations, etc. Similarly, the usage pattern, i.e., frequency and distribution ofsynthesis requests, affect the speed with which the server provides audio to theapplication.The test application simulates a scenario where each TTS request is sent only afterthe allotted time required to play the audio from the previous request has elapsed. Forexample, if the application requests 10 seconds of audio from the server and receivesthe request within one second, it waits another 9 seconds before sending the nextrequest. This request-wait-request pattern is executed on multiple threads with eachthread assigned one connection to the server. This test application is included withthe <strong>Speechify</strong> client SDK installation for easy measurement of performance figuresfor your specific server machines.Sample texts are taken from a selection of newspaper articles with each heading andparagraph forming a separate speak request which is sent to the server in a randomorder, observing the timing described above. All of the measurements, except serverCPU utilization and memory usage, are made on the application side of the <strong>Speechify</strong>API. Performance statistics are reported as a function of the number of channels tothe server that are simultaneously opened by the test application.The application is designed to submit text to the server over a variable andconfigurable number of channels (simultaneous connections). The command linespecifies the initial and final number of channels as well as the increment. Forexample, a test run may begin with 40 channels and progress to 70 in increments offive. 
At each step the same number of synthesis requests, for example 100, is submitted per channel. For each channel count, the application reports various statistics describing the responsiveness and speed of the server. These are described below.
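The request-wait-request pattern described above can be sketched in a few lines. This is an illustrative simulation, not the shipped test application; `synthesize` is a hypothetical stand-in for one TTS round trip that returns the playback duration of the audio it produced.

```python
import threading
import time

def run_channel(requests, synthesize):
    """Simulate the request-wait-request pattern on one channel.

    `synthesize` stands in for one TTS round trip and returns the
    duration, in seconds, of the audio produced for the request.
    """
    for _ in range(requests):
        start = time.monotonic()
        audio_duration = synthesize()
        elapsed = time.monotonic() - start
        # Wait out the remaining playback time before the next request.
        if audio_duration > elapsed:
            time.sleep(audio_duration - elapsed)

def run_test(channels, requests_per_channel, synthesize):
    """One thread per channel, as in the test application."""
    threads = [
        threading.Thread(target=run_channel,
                         args=(requests_per_channel, synthesize))
        for _ in range(channels)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Two channels, two requests each, 50 ms of "audio" per request.
run_test(2, 2, lambda: 0.05)
```

Each thread spends most of its time waiting out playback, which is why the pattern approximates a deployed system where audio is played to callers in real time.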


Performance statistics

SpeechWorks uses several metrics to characterize Speechify's performance. You must understand their meaning to estimate the tasks and loads that the server is capable of handling on a given platform.

Latency (time-to-first-audio)

For Speechify, latency is defined as the time delay between the application's call to SWIttsSpeak( ) and the arrival of the first corresponding audio packet in the callback function. Speechify uses streaming audio to minimize latency, but to obtain optimal rhythm and intonation, the TTS engine must process sentence-sized segments of text one at a time, which causes a processing delay. The absolute value of latency is highly dependent on the characteristics of the requested text, specifically the length and, to a lesser degree, the complexity of the first sentence. Latency also begins to increase with the number of active server ports once the server CPU load approaches 90%.

Real-time factor

The real-time factor (xRT) of a Speechify TTS request is defined as the ratio of the audio playback duration to the processing time required to generate it. Expected latency is estimated from xRT and the duration of the audio corresponding to the first sentence of text. The latter should be approximately equal to the product of latency and the xRT.

For example, if it takes 500 ms to generate 10 sec of audio, the xRT is 20. An xRT of less than 1.0 means it takes the server longer to generate audio than it takes an application to play it, which means the system can never catch up with itself.
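The xRT arithmetic above can be made concrete with a short sketch. The function names are ours, not part of the Speechify API, and the 4-second first sentence is a hypothetical input.

```python
def xrt(audio_seconds, processing_seconds):
    """Real-time factor: audio playback duration / processing time."""
    return audio_seconds / processing_seconds

def estimated_latency(first_sentence_audio_seconds, xrt_value):
    """Latency ~= duration of the first sentence's audio divided by xRT."""
    return first_sentence_audio_seconds / xrt_value

# The example from the text: 10 s of audio generated in 0.5 s.
print(xrt(10.0, 0.5))                # → 20.0
# A hypothetical 4 s first sentence at 20 xRT starts playing after ~0.2 s.
print(estimated_latency(4.0, 20.0))  # → 0.2
```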


Audio buffer underflow

Audio buffer underflow refers to the audio dropout which occurs when the application consumes (plays out) all of the audio it received from the server and is idle while waiting for the next audio data packet. This may cause audible gaps in the output, although many underflows are not perceptible.

An audio buffer underflow does not always lead to an audible gap. There is usually a further level of buffering performed by the audio driver software layers or even the application. This extra buffering "hides" a gap of a few milliseconds at the TTS API level at the cost of slightly greater latency.

Audio buffer underflow rate is the percentage of audio packets that arrive after their immediate predecessors already have been consumed. For the simulation application, each audio packet passed to the callback function contains 512 ms of audio data. An underflow rate of 1% therefore translates to a potential gap every 100 audio buffers, or 51.2 seconds of audio, and a rate of 0.01% equals a gap on average once every 90 minutes.

By default, the Speechify engine divides input text into sentences, and then processes each one in turn. Audio buffer underflows occur when the audio from a sentence has all been sent back through the API, but the audio for the next sentence is not ready. This can happen if the first sentence is very short, perhaps a single word, and the following one is long and complex. The client API adds some trailing silence to each sentence, which means a short additional gap occurring during this silence may not be noticed or found objectionable.
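A quick calculation shows how an underflow rate maps to the average interval between potential gaps, using the 512 ms packet size of the simulation application. The helper name is illustrative.

```python
def mean_seconds_between_gaps(underflow_rate, buffer_ms=512.0):
    """Average audio time between potential gaps.

    underflow_rate is a fraction (0.01 for 1%); each audio packet in the
    simulation application carries buffer_ms of audio.
    """
    buffers_per_gap = 1.0 / underflow_rate
    return buffers_per_gap * buffer_ms / 1000.0

# 1% underflow: a potential gap every 100 buffers, i.e. every 51.2 s of audio.
print(mean_seconds_between_gaps(0.01))    # → 51.2
# 0.01% underflow: every 5120 s of audio, roughly 85-90 minutes.
print(mean_seconds_between_gaps(0.0001))  # → 5120.0
```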


Resource consumption

The Speechify server's primary resources are CPU cycles and memory. Due to the demands placed on the hosting platform, you should choose a platform appropriate for the tasks that the server is expected to handle.

Server CPU utilization

Server CPU utilization is the percentage of the total CPU time spent processing user or system calls (the remainder is time the CPU is idle). As more TTS ports are opened and used for processing, CPU utilization increases approximately linearly. When CPU utilization approaches 100%, the Speechify server starts to saturate and performance degrades rapidly as further ports are opened.

NOTE
CPU utilization should be kept below 85%, which gives the server some leeway, allowing more consistent TTS performance.

Memory use

The Speechify server's base memory use varies per voice, and is usually on the order of 80–100 MB. See the appropriate Speechify Language Supplement for details on memory use for the supported voices. Each connection made from a client via the SWIttsOpenPort( ) function is handled by a child process of the Speechify server.

❏ On UNIX systems, these additional processes are forked from the Speechify server.
❏ On Windows systems, these additional processes are usually pre-started for efficiency.


Child processes share most of their memory space with the parent server process and use additional memory only to store dynamic information needed to create each segment of speech. The size of this additional memory varies with the length and complexity of the input text but is usually on the order of 3–4 MB. Consequently, a Speechify server servicing 50 ports requires an additional 150–200 MB of memory.

Measuring server memory use on Windows NT/2000

Using the Windows Task Manager to measure memory usage can be misleading. The Mem Usage column shows the physical memory being used by each process, but that includes memory shared between processes. Speechify processes share a base of about 100–110 MB, which means that while most or all Speechify processes may show a memory usage of 100 MB, almost all of it is shared and should not be counted more than once.

For a more accurate way of measuring Speechify's total memory usage:

1. Boot your server machine and let it come to a steady state with respect to memory use with all of its native services, etc.
2. Start the Speechify processes.
3. Use all the Speechify processes for a few minutes, then stop.
4. For each Speechify process, run "vadump -osp <pid>", where vadump is in the Windows NT/2000 Resource Kit and <pid> is the decimal process ID available from the Task Manager.
5. Calculate total memory usage for Speechify as the sum of the following:
   a. Grand Total Working Set Private Kbytes for each process
   b. Grand Total Working Set Shareable Kbytes for each process
   c. Grand Total Working Set Shared Kbytes, only the largest one of all processes.
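As a back-of-the-envelope check of the figures above, total server memory can be estimated as a shared base plus a private increment per port. The defaults below are illustrative midpoints of the ranges the guide cites (80–110 MB base, 3–4 MB per child process), not exact values for any particular voice.

```python
def estimate_server_memory_mb(ports, base_mb=100.0, per_port_mb=3.5):
    """Rough Speechify server memory estimate.

    base_mb is the shared base image; per_port_mb is the private working
    set of each child process. Both defaults are illustrative midpoints.
    """
    return base_mb + ports * per_port_mb

# 50 ports: 100 MB shared base + 50 x 3.5 MB private.
print(estimate_server_memory_mb(50))  # → 275.0
```

This lands inside the 150–200 MB additional-memory range quoted above once the shared base is subtracted.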


Performance thresholds

SpeechWorks furnishes specific performance figures for each Speechify release and voice. SpeechWorks uses Pentium III, 1 GHz machines with 512 MB of RAM as the reference servers.

We summarize performance by reporting the maximum number of channels supported by the server on the reference platform. This is the maximum number of simultaneous connections that can be made by our test application while keeping the following performance limits:

❏ Audio buffer underflow rate




APPENDIX A

Speechify Logging

Overview of logs

Speechify has comprehensive logging facilities. Both the client and server have error and diagnostic logs. In addition, the server has an event log.

Error log

The Speechify server uses the error log to report system faults (errors) and possible system faults (warnings) to the system operator for diagnosis and repair. The client and server report errors by error numbers that are mapped to text with an associated severity level. This mapping is maintained in a separate XML document, which allows integrators to localize and customize the text without code changes. It also allows integrators and developers to review and re-define severity levels without code changes. Optional key/value pairs are used to convey details about the fault, such as the filename associated with the fault. These can be formatted into text using the information in the XML file. Speechify installs a US English version of this XML document in /doc/SpeechifyErrors.en-US.xml.


Diagnostic log

Developers and SpeechWorks support staff use the diagnostic log to trace and diagnose system behavior. Diagnostic messages are hard-coded text since they are private messages that are not intended for system operator use. Diagnostic logging has little or no performance impact if disabled but can impact system performance if turned on.

Event log (server only)

The Speechify server can automatically log results from usage activity to a file with a well-defined format. This is called event logging, and you must use the "--event_file <filename>" command-line option (see "--event_file" on page 1-14) to turn this on.

The event log that is generated summarizes usage activity in order to support reporting for such things as capacity planning, performance, and application synthesis usage. Key/value pairs are used to convey details about each event, such as the number of characters in a synthesis request or the number of bytes of audio sent to the client. You may then process the file or files with tools such as Perl or grep to view statistics, identify problem areas, or generate reports.

The events that Speechify logs describe such things as synthesis requests, client connections opened and closed, and error conditions. It is important to note that the event log is not the same thing as a diagnostic log that SpeechWorks may request in the event you need technical support. Nor is it an error log, since not all error details are logged to the event log.

Client logging

Overview of client logs

The client has two classes of messages: error and diagnostic. The Speechify client library indicates these to an application by passing the SWItts_cbLogError and SWItts_cbDiagnostic status codes to the application's SWIttsCallback( ) function. These codes have a structure associated with them, SWIttsMessagePacket, that the client sends with the status code. This structure contains detailed information that lets the application writer determine how, where, and in what format logging should happen. For more information on these codes and the structure, see "SWIttsCallback( )" on page 7-75.

Enabling diagnostic logging

Because the client can output a very large amount of diagnostic messages, the client does not send diagnostic messages to the application's SWIttsCallback( ) function unless you explicitly tell it to do so by setting an environment variable named SWITTSLOGDIAG. The value does not matter as long as it exists. Unless it is set, the client library sends no diagnostic messages to SWIttsCallback( ). For example, from a Windows NT/2000 command shell:

set SWITTSLOGDIAG=1

To log the messages to a file, use these environment variables:

SWITTSLOGTOFILE
Filename to collect the diagnostic messages.

SWITTSMAXLOGSIZE
Maximum size of the log file in KB. If this variable is not set, there is no limit on the log size. When the log reaches the size limit, it rolls over into a file named the same as the log file with a .old suffix, then the logging resumes.

Logging to stdout/stderr

It is not necessary to implement code in the application's SWIttsCallback( ) function to enable logging from the Speechify client. To guarantee that SpeechWorks technical support staff can always obtain a log, default logging abilities are built into the client. To enable these, set an environment variable named SWITTSLOGTOSTD. If that variable exists (the value is unimportant), the client library logs diagnostic and error messages to stdout and stderr. Note that the callback function is invoked with the status codes SWItts_cbLogError and SWItts_cbDiagnostic whether or not the variable is set. Also, diagnostics are not printed unless they are turned on with SWITTSLOGDIAG as described above.


Members of error messages are printed to stderr/stdout in the following order:

❏ A timestamp
❏ The port number associated with the error (-1 if the port number was unknown at the time of the message generation)
❏ The thread instance in which the error occurred (currently the thread ID)
❏ The error ID number
❏ Zero or more key/value pairs with data specific to that error

Sep 22 11:45:33.19|0|7603|10002|filename=foo.bar

In the above example, the error occurred on September 22, at approximately 11:45 am. The port number was 0, the thread ID was 7603, and the error number was 10002. Looking at SpeechifyErrors.en-US.xml, this error code refers to a missing file error. The error message in the XML file refers to a key/value pair named filename. The message above supplies a value for filename of "foo.bar", which is the missing file.

Server logging

Default logging behavior

By default, the Speechify server logs error messages to stdout (not stderr) and does not log diagnostic messages or events. On Windows systems, when the server is run as an NT Service, the server logs error messages to the NT Event Log.

Controlling logging with command-line options

The default logging behavior can be modified by the use of various command-line options. When running the server as an NT service, you can control these options via the Speechify MMC snap-in. See "Installing and Configuring Speechify" on page 1-1 for more details on using this applet and which options there correspond to the command-line options discussed here.


--verbose

As in the client, the server does not print every diagnostic message to stdout by default. Otherwise, the output size would be very large. To enable detailed logging on the server, use the "--verbose <level>" option, where <level> is an integer indicating the level of detail desired. The numbering starts at zero.

--diagnostic_file

The server normally logs diagnostic output (controlled by the verbose flag) and errors to stdout. If you want to redirect these to a file, you could use standard command-line redirection. If you use the "--diagnostic_file <filename>" command-line option, the server sends these messages to the file specified (in addition to sending them to stdout). Using this option instead of standard redirection has the benefits of ease of use, limiting the size of the file, and rolling it over when it grows past that size.

--event_file

To enable event logging, use the "--event_file <filename>" command-line option. See "Event logging" on page A-114 for more details on event logging.

--system_log

As mentioned above, the server logs errors to stdout. If you use the "--system_log" command-line option, the server also logs error messages to the syslog facility on Unix systems and the NT Event Log facility on Windows systems. Note that this behavior is the default when running the server as an NT service on Windows.

Log file size and roll-over

When using the --diagnostic_file or the --event_file options, the Speechify server can log a large amount of data to disk. The rate of file growth differs from application to application, but you can expect Speechify to log 10 MB of data every 24 hours for a system supporting 72 channels. If the file size exceeds 10 MB, the event logging system renames the current file to <filename>.old and starts logging to an empty file named <filename>. The system only keeps one old file around.


Error message format

Errors contain the following information:

❏ A timestamp
❏ The server instance in which the error occurred (currently the process ID)
❏ The error ID number
❏ Zero or more key/value pairs with data specific to that error

Examples:

Sep 13 14:01:13.52|329|10011

In the above example, the error occurred on September 13, a little after 2 pm. The process ID was 329 and the error number was 10011. SpeechifyErrors.en-US.xml identifies this error code as an SSML parsing error.

Sep 22 11:45:33.19|7603|10002|filename=foo.bar

In the above example, the error occurred on September 22, at approximately 11:45 am. The process ID was 7603 and the error number was 10002. SpeechifyErrors.en-US.xml identifies this error code as a missing file error. The error message in the XML file refers to a key/value pair named filename. The message above supplies a value for filename of "foo.bar", which is the missing file.

Event logging

Events contain the following information:

❏ A timestamp
❏ Two tokens that describe the amount of CPU the current server instance has used so far:
  • The UCPU token records the User CPU used since the process started.
  • The KCPU token records the total Kernel CPU used since the process started.
❏ The server instance in which the event occurred (currently the process ID)
❏ The event name
❏ Zero or more tokens of data specific to that event.
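The error-line layout above is easy to post-process with a short script. This sketch parses the server format (timestamp, process ID, error ID, then optional key/value pairs); the function name and returned dictionary shape are ours, not part of any Speechify tool.

```python
def parse_server_error(line):
    """Parse one Speechify server error-log line.

    Assumed layout, from the examples above:
    timestamp|process_id|error_id[|key=value...]
    """
    fields = line.rstrip("\n").split("|")
    tokens = dict(kv.split("=", 1) for kv in fields[3:])
    return {"timestamp": fields[0], "pid": int(fields[1]),
            "error": int(fields[2]), "tokens": tokens}

err = parse_server_error("Sep 22 11:45:33.19|7603|10002|filename=foo.bar")
print(err["pid"], err["error"], err["tokens"].get("filename"))
# → 7603 10002 foo.bar
```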


File format

The Speechify event log is an ANSI text file that contains one event per line. An event never spans more than one line. Each line is terminated by a new-line control sequence; Unix systems use "\n", Windows systems use "\r\n". The maximum size of a line is 10 kilobytes.

Event format

Within each event, multiple fields are separated by the "|" character.

❏ The first field is a timestamp with month, date, and time to the hundredth of a second.
❏ The next two fields are the UCPU and KCPU tokens described above.
❏ The following field is the Speechify instance ID (currently the process ID).
❏ The next field is the event name.
❏ This is followed by zero or more fields containing keys and values separated by the "=" character. These key/value pairs are known as "tokens."

NOTE
Any tool written to process Speechify event log files should allow for and ignore any unknown events or unknown fields within an event.

Example event log

This sample shows a simple series of events: the server starts (STRT then INIT), a client connects (OPEN), the client sets two parameters (SETP) and makes one synthesis request (SPKB then FAUD), the request completes (SPKE), then the client closes the connection (CLOS).

Mar 18 10:46:56.47|UCPU=20|KCPU=10|259|STRT|VERS=Development build|PROT=20020117|NAME=mara|LANG=en-US|FRMT=8|PORT=5555
Mar 18 10:46:56.50|UCPU=2854|KCPU=640|334|INIT
Mar 18 10:47:25.27|UCPU=40|KCPU=20|259|OPEN|clientIP=127.0.0.1
Mar 18 10:47:25.32|UCPU=40|KCPU=20|259|SETP|NAME=tts.marks.word|VALU=false
Mar 18 10:47:25.37|UCPU=40|KCPU=20|259|SETP|NAME=tts.marks.phoneme|VALU=true
Mar 18 10:47:25.53|UCPU=40|KCPU=20|259|SPKB|NBYT=164|NCHR=81
Mar 18 10:47:28.40|UCPU=70|KCPU=40|259|FAUD
Mar 18 10:47:28.59|UCPU=70|KCPU=50|259|SPKE|XMIT=23651
Mar 18 10:47:28.62|UCPU=70|KCPU=50|259|CLOS
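Given the field order shown in the sample, an event line can be split into its parts with a few lines of script. This parser is an illustrative sketch, not a SpeechWorks tool, and assumes the layout of the sample log above.

```python
def parse_event(line):
    """Parse one Speechify event-log line into its parts.

    Assumed layout, matching the sample log:
    timestamp|UCPU=n|KCPU=n|instance|EVENT[|KEY=value...]
    """
    fields = line.rstrip("\n").split("|")
    return {
        "time": fields[0],
        "ucpu": int(fields[1].split("=", 1)[1]),
        "kcpu": int(fields[2].split("=", 1)[1]),
        "instance": int(fields[3]),
        "name": fields[4],
        # Trailing key/value pairs are the event's tokens.
        "tokens": dict(kv.split("=", 1) for kv in fields[5:]),
    }

event = parse_event("Mar 18 10:47:28.59|UCPU=70|KCPU=50|259|SPKE|XMIT=23651")
print(event["name"], event["tokens"]["XMIT"])  # → SPKE 23651
```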


Description of events and tokens

The following list shows the current set of Speechify event names and tokens.

NOTE
This list may expand in future releases. Any tool written to process Speechify event log files should allow for and ignore any unknown events or unknown tokens within an event.

STRT – Indicates that Speechify has launched and is beginning the initialization phase.
  VERS  (example: Relx-x X)   The version of Speechify, for example: "Rel2-0 d"
  PROT  (example: 20010801)   The version of the client/server protocol used.
  LANG  (example: en-US)      The language of the server.
  NAME  (example: rick)       The name of the voice used.
  FRMT  (example: 8)          The audio format of the speech database.
  PORT  (example: 5555)       The sockets port the server listens on.

INIT – Initialization done.
  (no tokens)

OPEN – Open connection from client.
  CLIP  (example: 10.2.1.244)  The IP address of the client making the connection.

CLOS – Close connection to client.
  (no tokens)

SPKB – Speak begin. Logged when a speak request is received.
  NBYT  (example: 4000)  The number of bytes in the synthesis text.
  NCHR  (example: 2000)  The number of characters in the synthesis text.

FAUD – First audio. Logged when the first audio packet is sent to the client.
  (no tokens)

STRQ – Stop request.
  (no tokens)


STOP – Stop.
  XMIT  (example: 5678)  The number of bytes of audio sent to the client before stopping.

SPKE – Speak end.
  XMIT  (example: 10450)  The number of bytes of audio sent for the synthesis request.

SETP – Set parameter.
  NAME  (example: tts.marks.word)  The name of the modified parameter.
  VALU  (example: false)           The new value of the parameter.

ERR – Error.
  ERRN  (example: 10002)  The server error number.
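As an example of the event-log reporting the guide suggests (with tools such as Perl or grep), this sketch pairs SPKB and FAUD events per server instance to report time-to-first-audio. It is illustrative only; it assumes the timestamp format shown in the sample log, which carries no year, so a year must be supplied for parsing.

```python
from datetime import datetime

def first_audio_delays(lines, year=2002):
    """Report SPKB -> FAUD delay (time-to-first-audio) per instance.

    Event-log timestamps carry no year, so one is assumed for parsing.
    """
    fmt = "%Y %b %d %H:%M:%S.%f"
    pending = {}   # instance ID -> SPKB timestamp
    delays = []
    for line in lines:
        fields = line.split("|")
        stamp = datetime.strptime(f"{year} {fields[0]}", fmt)
        instance, name = fields[3], fields[4]
        if name == "SPKB":
            pending[instance] = stamp
        elif name == "FAUD" and instance in pending:
            delays.append((stamp - pending.pop(instance)).total_seconds())
    return delays

log = [
    "Mar 18 10:47:25.53|UCPU=40|KCPU=20|259|SPKB|NBYT=164|NCHR=81",
    "Mar 18 10:47:28.40|UCPU=70|KCPU=40|259|FAUD",
]
print(first_audio_delays(log))  # → [2.87]
```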




APPENDIX B

SSML

Speechify can accept input in the form of Speech Synthesis Markup Language (SSML). Speechify complies with the January 2001 draft specification, which can be found at http://www.w3.org/TR/speech-synthesis. That document describes all the markup specifications that make up the SSML.

Speechify includes appropriate processing for most of the elements and attributes defined by the specification. For tags whose content fails to parse or is ambiguously formatted, the server attempts to synthesize the raw text surrounded by the tag. In practice, this often results in correct output. Such cases are noted in the server error log. As long as the input is identified as SSML, the server never pronounces the tag markup itself. Nevertheless, unsupported tags encountered in the input appear in the server error log.


Element support status

The table below outlines the extent to which SSML elements are currently supported by Speechify. It also notes the limitations on element attributes and content that Speechify currently requires of supported elements.

Audio – No

Break – Yes. If neither the size nor the time attributes are used, the default break duration is 0.7 seconds. If size is specified to be "small", the duration is 0.35 seconds; if "medium", 0.7 seconds; and if "large", 1 second.

Emphasis – No

Mark – No

Paragraph – Yes

Phoneme – Yes. alphabet: unsupported attribute. The phonetic representation (i.e., the "ph" attribute) must be specified in the Speechify SPR format. See "Symbolic Phonetic Representations" on page 5-55.

Prosody – Partial. The attributes "volume" and "rate" are supported; all others are ignored. Note that the values given for these attributes are interpreted relative to the corresponding port-specific values in effect at the time. (See "Controlling the audio characteristics" on page 4-51 for an explanation of port-specific values.)

Say-as – Yes. This element includes a large number of attributes and formats, virtually all of which are supported; see the special section below for details.

Sentence – Yes

Speak – Yes. xml:lang: only available for languages supported by Speechify. If an unsupported language is specified, the server logs an error and attempts to process subsequent tags using US English rules.

Voice – No
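For illustration, a small SSML document can be assembled from only the elements the table marks as supported. Element and attribute spellings follow the January 2001 draft as summarized above; the specific values used here ("currency", "medium", "-20%") are illustrative, not taken from this guide.

```python
# A minimal SSML document using only supported elements:
# speak, sentence, say-as, break, and prosody (rate).
ssml = (
    '<speak xml:lang="en-US">'
    "<sentence>Your balance is "
    '<say-as type="currency">$42.17</say-as>.</sentence>'
    '<break size="medium"/>'
    '<prosody rate="-20%">Please hold for an agent.</prosody>'
    "</speak>"
)
print(ssml)
```

Since Speechify attempts to synthesize the raw text of any tag it cannot parse, a document like this degrades gracefully even where an attribute value is rejected.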


Support of the "say-as" element

The "say-as" element identifies the type of text contained therein. This can facilitate the correct pronunciation of the element's content. Speechify supports both the "type" and the "sub" attributes. The table below lists the possible values of the former and provides some guidance to their usage:

acronym – Letters and numbers are pronounced individually.

address – (no notes)

currency – Supports currency as commonly specified in the country corresponding to the target language. For example, dollars and cents for US English, Euros for Parisian French, and pesos and centavos for Mexican Spanish.

date – All formats are supported; if the year is written using only two digits, numbers less than 50 are assumed to occur in the 2000s, greater than 50 in the 1900s.

duration – All formats are supported; example: "duration:hms" is read out as "<h> hour(s), <m> minute(s), and <s> seconds" (assuming xml:lang was specified as "en-US").

measure – A variety of units (e.g., km, hr, dB, lb, MHz) is supported; the units may appear immediately next to a number (e.g., 1cm) or be separated by a space (e.g., 15 ms); for some units the singular/plural distinction may not always be made correctly.

name – (no notes)

net – Both the "uri" and "email" formats are supported; note, however, that e-mail addresses are difficult to pronounce the way a user expects.

number – All three formats, "digits", "ordinal", and "cardinal", are supported (if relevant in the target language); Roman numerals are not supported.

telephone – Supports telephone numbers as commonly specified in the country corresponding to the target language. For "en-US" this includes: (800) 123-4567, 617-123-4567, and 212 123-4567.

time – All formats are supported; the hour should be less than 24, minutes and seconds less than 60; AM/PM is read out only if explicitly specified.




APPENDIX C

SAPI 5

The SAPI 5 interface for Speechify is compliant with the Microsoft SAPI 5 text-to-speech specification. Although certain SAPI 5 features are not currently supported, those most commonly used by SAPI 5 developers are.

Compliance

The SAPI 5 interface for Speechify:

1. Provides an in-process COM server implementing the following interfaces:
   • ISpTTSEngine
   • ISpObjectWithToken

   If you use the SAPI ISpVoice interface to manipulate a specific voice, SpeechWorks does not support the following methods:
   • Skip

   In doing so, Speechify fails the following SAPI Compliance Tool tests:
   • TTS Compliance Test\ISpTTSEngine\Skip
   • TTS Compliance Test\Real Time Rate/Vol Tests\Real Time Rate Change
   • TTS Compliance Test\Real Time Rate/Vol Tests\Real Time Volume Change


2. Supports the following SAPI-defined speak flags:

   SPF_DEFAULT           This is the default speak flag. The engine renders text to the output site under normal conditions.
   SPF_PURGEBEFORESPEAK  The engine can stop rendering when passed an abort request.
   SPF_IS_XML            The engine correctly interprets and acts upon a number of SAPI XML tags (defined below).
   SPF_ASYNC             The engine uses SAPI 5 thread management to support the asynchronous synthesis of audio output without blocking the calling application.

3. Supports the following output formats natively, depending on the voice selected:

   SPSF_CCITT_uLaw_8kHzMono  Single channel 8 kHz µ-law with an 8-bit sample size
   SPSF_CCITT_ALaw_8kHzMono  Single channel 8 kHz A-law with an 8-bit sample size
   SPSF_8kHz16BitMono        Single channel 8 kHz linear PCM with a 16-bit sample size
   SPSF_16kHz16BitMono       Single channel 16 kHz linear PCM with a 16-bit sample size

   By default, Speechify SAPI 5 voices return audio to SAPI in these formats:
   • Speechify 8kHz: SPSF_8kHz16BitMono
   • Speechify 16kHz: SPSF_16kHz16BitMono

   To set an audio format in the table to be Speechify's native engine format (before instantiating that voice), use the registry to set the Attributes\MimeType subkey of the appropriate voice key to one of Speechify's MIME types. (The MIME types are documented in "SWIttsSetParameter( )" on page 7-94.) If MimeType is not present or is set to an empty string, Speechify voices use their default native formats.

   Thus, you can set a default in the registry (e.g., if you're using an 8 kHz voice and want µ-law instead of 8 kHz linear PCM), which is equivalent to using SWIttsSetParameter( ) to set tts.audioformat.mimetype for each port.
SAPI has no equivalent of SWIttsSetParameter( ); setting parameters is combined with creating an instance.

Note that SAPI 5 can convert to other audio formats not listed in this table.

Fifth Edition, Update 2


4. Supports SAPI 5's User and Application Lexicons.

   SAPI 5 has a separate client/application-specific dictionary called the Lexicon. Speechify for SAPI 5 interacts with the SAPI lexicon, allowing clients to override the default word pronunciations of the Speechify server. Speechify ignores the user-supplied part-of-speech field in the Lexicon.

5. Supports the following SAPI XML tags. All other SAPI XML tags are ignored by Speechify:
   • • • • • •

   By ignoring other XML tags, Speechify fails the following tests in the SAPI Compliance Tool:
   • TTS Compliance Test\TTS XML Markup\Pitch (uses the pitch tag)
   • Features\Emph (uses the emph tag)
   • Features\PartOfSp (uses the partofsp tag)

6. Synthesizes the following SAPI events:
   • SPEI_TTS_BOOKMARK
   • SPEI_WORD_BOUNDARY
   • SPEI_PHONEME
   • SPEI_VISEME (for US English only)

   Bookmark names beginning with "!SWI" are now reserved for internal SAPI processing and cannot be used as names of user-provided SAPI bookmarks.

7. Supports the instantiation and simultaneous use of multiple TTS engine instances within a single client process.

SAPI voice properties

The SAPI 5 API was not explicitly designed to allow for client/server speech synthesis systems such as Speechify. By default, the Speechify installer configures Speechify's SAPI 5 interface to look for the server on the same machine as the client (i.e., localhost), but it is possible to configure the interface to look for a server elsewhere on


the network, even on a non-Windows machine. To do this configuration, the interface provides a properties dialog box accessible via the Speech control panel applet. This dialog box also lets you examine the voice's attributes and turn diagnostic logging on or off for technical support purposes.

To invoke the properties dialog box, use Start >> Settings >> Speech and click the Text To Speech tab. From the Voice Selection list, choose a Speechify voice. (You may see a message about not being able to initialize the voice. Ignore this message; it is innocuous.) Once the Speechify voice is selected, the button labeled Settings… should be enabled. Click this button to open the Speechify SAPI Voice Properties dialog box.

The properties dialog box displays the voice name at the top of the box. The next section displays attributes of the voice including the language, gender, age, and the native audio format of the server. None of these values are editable. Although SAPI can provide an application with almost any audio format it requests, the application should request audio in the voice's native audio format for the best audio quality.

The next section of the dialog box lets you configure the Speechify server's address and port number. You can select either Host name and enter a server's host name, or select Host IP and enter a server's IP address. Below those fields, enter the server's port number. This corresponds to the "--port <port>" option used to start the Speechify server.

The final section of the dialog box is for diagnostic logging. If you experience problems with Speechify's SAPI 5 interface and require technical support, SpeechWorks may ask you to turn on diagnostic logging within the interface. Unless required for technical support, you should leave logging turned off to avoid unnecessary CPU and disk activity.
To turn on diagnostic logging, click the arrows to set the logging level:

❏ 0: logging is turned off
❏ 1: log high-level information
❏ 2: log verbose information

When logging is turned on, information is logged to the Speechify /bin directory, in a file named Speechify-SAPI5.log.

When you are done modifying the settings in the Voice Properties dialog box, click OK to accept the changes. The changes go into effect the next time an engine instance is created.
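As noted in the feature list above, bookmark names beginning with "!SWI" are reserved for internal SAPI processing. A minimal sketch of a guard an application might apply before embedding a SAPI 5 bookmark tag in its input text — the helper names are illustrative, not part of the Speechify or SAPI APIs:

```cpp
#include <string>

// Reserved bookmark-name prefix, per the guide.
static const std::string kReservedPrefix = "!SWI";

// True if an application may use this name for its own SAPI bookmark.
bool isValidBookmarkName(const std::string& name) {
    return !name.empty() &&
           name.compare(0, kReservedPrefix.size(), kReservedPrefix) != 0;
}

// Wrap an application bookmark name in SAPI 5 XML.
std::string makeBookmarkTag(const std::string& name) {
    return "<bookmark mark=\"" + name + "\"/>";
}
```

Checking names up front avoids collisions with the interface's internal bookmarks at synthesis time.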


APPENDIX D

Frequently Asked Questions

This appendix answers frequently asked questions about using Speechify.

❏ Question types: Why is the output the same whether I end a sentence with a question mark or a period?
❏ Changing rate or volume: Can the application change the speaking rate or volume of the audio output while Speechify is synthesizing?
❏ Rewinding and fast-forwarding: Can the application move forward or backward in the audio output (e.g., go back 10 seconds)?
❏ Java interface: Does Speechify or Speechify Solo offer a Java interface?
❏ E-mail pre-processor and SAPI 5: Is it possible to use the e-mail pre-processor with SAPI 5?
❏ SSML and SAPI 5: Can I use input text containing SSML with SAPI 5?
❏ Error codes 107 and 108: What are error codes 107 and 108? (For example: ERROR 108 received in myTTSCallBack.)
❏ Speechify 2.1 and Speechify 2.0 voices: After installing Speechify 2.1 and trying to install a Speechify 2.0 voice, some customers have seen the following error message: "You must have Speechify 2.0 installed to run this voice."
❏ Connecting to the server: When trying to run the Speechify "dialogGUI" sample application, you may get the following error message box: "ERROR: The SpeechWorks TTS library could not connect to the server."
❏ Finding voices: When trying to start up a Speechify service, you may get the following message: "The voice that you entered cannot be found. Check the path of the installed voices, Name, Language, and Format."


❏ Port types: What is the relationship between the SWIttsOpenPort( ) function and the port number used to configure a voice on the Speechify server?

Question types

Why is the output the same whether I end a sentence with a question mark or a period?

There are many different types of questions, including:

❏ Yes/No questions, e.g., "Did you see the film yesterday?"
❏ Tag questions, e.g., "You saw the film yesterday, didn't you?"
❏ Wh-questions, e.g., "Where did you see the film?"

Different question types have fundamentally different prosodic patterns. It is a common misconception that all a TTS engine needs to do is generate rising intonation at the end of every question. For certain question types, the intonation produced by Speechify (and many other TTS systems) might be inappropriate. In order to get the most natural speech possible out of Speechify, we have concentrated on generating optimal output for statements and wh-questions (who, how, what, when, where, why, which). If you design your user interactions carefully, it's possible to have your text consist entirely of statements and wh-questions. For example: "You can choose one of options a, b, or c. Which would you like?"

Changing rate or volume

Can the application change the speaking rate or volume of the audio output while Speechify is synthesizing?

Once it has asked Speechify to synthesize some text, the application cannot change the rate or volume of that output. However, the application can tell Speechify to change to different rates or volumes at different points in the output by embedding rate and/or volume control tags in the input text.
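The embedded-tag approach above can be sketched as simple string construction. The tag names \!rpN (rate) and \!vpN (volume) are taken from this guide's index; the exact meaning of the numeric value N, and the helper itself, are illustrative assumptions — consult the embedded-tags chapter for the real semantics:

```cpp
#include <string>

// Prefix input text with Speechify embedded rate/volume tags so the
// change takes effect at this point in the output. Tag names per the
// guide's index (\!rpN, \!vpN); semantics of N assumed for illustration.
std::string withRateAndVolume(int ratePercent, int volumePercent,
                              const std::string& text) {
    return "\\!rp" + std::to_string(ratePercent) +
           "\\!vp" + std::to_string(volumePercent) + text;
}
```

Because the tags travel with the text, different sentences in one SWIttsSpeak( ) call can carry different settings.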


Rewinding and fast-forwarding

Can the application move forward or backward in the audio output (e.g., go back 10 seconds)?

Speechify and Speechify Solo do not support this directly, but an application could buffer the audio output and provide that functionality itself.

Java interface

Does Speechify or Speechify Solo offer a Java interface?

Speechify and Speechify Solo currently do not offer a Java interface. A Java developer needs to write a JNI (Java Native Interface) wrapper to access the C API. If you would like SpeechWorks to provide a Java interface, please contact your SpeechWorks sales representative or technical support.

E-mail pre-processor and SAPI 5

Is it possible to use the e-mail pre-processor with SAPI 5?

It is possible to use the e-mail pre-processor and SAPI together. The application must process the e-mail via the e-mail pre-processor first, independently of SAPI, and then pass the resulting text to SAPI.

SSML and SAPI 5

Can I use input text containing SSML with SAPI 5?


It is not possible to use SSML input with the SAPI 5 interface, because SAPI 5 uses its own XML markup, SAPI XML. It may be possible to use XSLT to translate between the two XML formats.

Error codes 107 and 108

What are error codes 107 and 108? (For example: ERROR 108 received in myTTSCallBack.)

Error codes 107 and 108 are both related to networking.

Error 107 indicates that the connection between the client and the server existed at one point but was disconnected unexpectedly. There are several likely causes:

❏ The Speechify server stopped, either because an operator shut it down or because it crashed.
❏ The machine that the server was running on was shut down.
❏ The network connection between the client and server was broken, for example by unplugging a cable or shutting down a switch.

Error 108 indicates that some network operation between the client and server timed out. There are several common causes:

❏ The Speechify server stopped, either because an operator shut it down or because it crashed.
❏ The server machine is under heavy load and the Speechify server could not respond fast enough to the client.
❏ The Speechify server was never started.
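An application callback will usually react to these two codes differently: reconnect immediately when a live connection dropped (107), but back off before retrying on a timeout (108), since the server may simply be overloaded. The numeric constants and helper below are a sketch based only on the description above; the real SWItts headers define symbolic result codes (see the API result codes section):

```cpp
#include <string>

// Networking error codes as described in this FAQ entry (illustrative;
// the SDK headers define the authoritative symbolic names).
enum TtsNetworkError { kDisconnected = 107, kTimedOut = 108 };

// Suggest a recovery strategy for the two networking errors.
std::string recoveryHint(int code) {
    switch (code) {
        case kDisconnected:
            return "reconnect: server, host, or network link went away";
        case kTimedOut:
            return "retry with backoff: server busy or not started";
        default:
            return "not a networking error";
    }
}
```

A dispatch like this keeps transient-network handling out of the main synthesis path.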


Speechify 2.1 and Speechify 2.0 voices

After installing Speechify 2.1 and trying to install a Speechify 2.0 voice, some customers have seen the following error message: "You must have Speechify 2.0 installed to run this voice."

This is a known problem. If it occurs, use the following workaround:

1. Open the Registry Editor and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\SpeechWorks International\Speechify\2.1.
2. Rename the key 2.1 to 2.0.
3. Install the 2.0 voice package.
4. Rename the key 2.0 back to 2.1.

Connecting to the server

When trying to run the Speechify "dialogGUI" sample application, you may get the following error message box: "ERROR: The SpeechWorks TTS library could not connect to the server."

Common causes:

❏ The Speechify service has not yet been started. Start the Speechify service and try running the sample again.
❏ You have started the Speechify server, but it has not completed initialization and so is not yet ready to accept connections.
❏ You entered an incorrect host-name/port-number pair in the Server Address box. Restart the sample after checking that you have the correct address.
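Because the second cause above is transient (the server is still initializing), client code often retries the connection a few times before reporting failure. A small sketch of one possible backoff schedule — the policy and values are illustrative assumptions, and the actual connect call would be whatever your integration uses (e.g. SWIttsOpenPort( )):

```cpp
#include <algorithm>

// Delay in milliseconds before the nth connection retry: 500 ms,
// doubling per attempt, capped at 8 seconds. Illustrative policy only.
int retryDelayMs(int attempt) {
    long long d = 500LL << std::min(attempt, 10);
    return static_cast<int>(std::min(d, 8000LL));
}
```

A capped schedule like this rides out server start-up without hammering a machine that is genuinely down.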


Finding voices

When trying to start up a Speechify service, you may get the following message: "The voice that you entered cannot be found. Check the path of the installed voices, Name, Language, and Format."

A common cause is that you do not have enough virtual memory configured on your system. Check that the size of your paging file meets the minimum requirements (specified on page 1-2), and also check the release notes for each voice, since the requirements may differ.

Port types

What is the relationship between the SWIttsOpenPort( ) function and the port number used to configure a voice on the Speechify server?

A common mistake is to think that each call the application makes to SWIttsOpenPort( ) must specify a different port number for the server address, on the assumption that a port can only be "opened" once. The cause of this confusion is the name chosen for the function. The TTS "port" handle returned by SWIttsOpenPort( ) has nothing to do with the "port" number passed to the function. The returned handle refers to an instance of a Speechify voice. The port number passed to the function, and used on the server to configure the voice, refers to a different kind of port: a networking/sockets port.

When you configure a voice on the server for a certain port number, that number is part of the network address on which the server listens for connections from a client application. The application creates an instance of a voice by passing the server's address, i.e., the host name and the port number, to SWIttsOpenPort( ). To create multiple TTS ports for that voice, always pass the same host name/port number combination to the function. Even though a single Speechify server listens on only one network address, it can support multiple voice instances.
One useful way to think about it is to imagine that SWIttsOpenPort( ) were renamed SWIttsCreateVoice( ) and returned a handle to a voice instance.
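The distinction can be made concrete with a toy model: one server address, many voice-instance handles. Everything below is a self-contained mock for illustration — the real SWIttsOpenPort( ) signature and behavior are defined in the API chapter:

```cpp
#include <string>
#include <vector>

// Toy stand-in for a client that creates voice instances against a
// single server address. Each "open" yields a new, distinct handle.
struct MockTtsClient {
    std::string host;          // server address: one host...
    int port;                  // ...and one networking/sockets port
    int nextHandle = 1;
    std::vector<int> handles;  // one entry per voice instance

    // Analogous to SWIttsOpenPort(): the port number identifies the
    // server's network address, not the handle you get back.
    int openVoiceInstance() {
        handles.push_back(nextHandle);
        return nextHandle++;
    }
};
```

Calling openVoiceInstance( ) repeatedly on the same host/port pair yields independent handles, mirroring how repeated SWIttsOpenPort( ) calls with one server address create multiple voice instances.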


Index

Symbols
!SWI C-125
\! 3-44
\![SPR] 3-50
\!bm 3-50
\!ny0 3-49
\!ny1 3-49
\!p 3-44
\!rdN 3-54
\!rdr 3-54
\!rpN 3-54
\!rpr 3-54
\!ts0 3-47
\!tsa 3-47
\!tsc 3-44, 3-47
\!tsr 3-47
\!vdN 3-53
\!vdr 3-53
\!vpN 3-52
\!vpr 3-52

A
abbreviation dictionary 2-30, 5-61, 5-64, 5-67
ambiguous expressions 3-33
annotations
  entering SPRs 3-50
  pause 3-44
  pronouncing numbers and years 3-49
API function overview 6-69
API result codes 6-71
application outline 2-23
audio buffer underflow 7-104

B
bookmarks
  inserting 3-50

C
callback function 2-24, 6-79
calling convention 6-70
cardinal numbers 3-34
character set
  for Japanese 6-97
  ISO-8859-1 6-70, 6-97
  preferred 6-70
  Shift-JIS 6-70
  supported 6-97
  US-ASCII 6-97
  UTF-16 6-97
  UTF-8 6-97
  wchar_t 6-97
charset
  Japanese 6-73, 6-81, 6-89
  Western languages 6-73, 6-81, 6-89
client SDK on NT
  hardware requirements 1-2
  software requirements 1-2
configuring server 1-17
configuring shared memory 1-11
configuring Speechify 1-1
currencies 3-35

D
dates 3-35
dictionaries
  abbreviation 5-61
  file-based 5-67
  language-specific 5-67
  main 5-60
  root 5-62
  voice-specific 5-67
dictionary
  abbreviation 5-67
  main 5-67
  root 5-67
dictionary processing of abbreviations
  annotations 3-50
digits
  floating point 3-34

E
end of sentence 3-45, 5-62

F
features 2-22
file-based dictionaries 5-67
floating point digits 3-34
fractions 3-34

G
guidelines
  implementation 2-31

I
implementation guidelines 2-31
International Radio Alphabet 3-47

J
Japanese, supported charsets 6-97

K
key, dictionary 5-60

L
language-specific dictionaries 5-67
latency 7-103

M
main dictionary 2-30, 5-60, 5-64, 5-67
max shared memory segment 1-11
monetary expressions 3-35

N
negative numbers 3-34
numbers
  cardinal 3-34
  negative 3-34
  ordinal 3-34
numbers and years
  annotations 3-49

O
operations aids 2-31
order of API functions 2-23
ordinal numbers 3-34

P
paging file 1-2, D-132
pauses
  inserting 3-44
performance 7-101
period, trailing (abbreviation dictionary) 5-61
phone numbers 3-35

R
real-time factor 7-103
result codes 6-71
root dictionary 2-30, 5-62, 5-64, 5-67

S
sample time line 2-26
SAPI 5 C-123
SHMMAX 1-11
sizing 7-101
social security numbers 3-35
Speechify
  configuring 1-1
  features 2-22
  installing 1-1
spellout modes 3-47
SPR tag 3-50
structure
  SWIttsAudioPacket 2-25, 6-76
  SWIttsBookMark 6-78
  SWIttsDictionaryEntry 6-74, 6-82, 6-83, 6-90
  SWIttsMessagePacket 6-77
  SWIttsPhonemeMark 6-78
  SWIttsWordMark 6-78
support services xii
SWItts_ALREADY_EXECUTING_API 6-71
SWItts_cbAudio 6-75
SWItts_cbBookmark 6-75
SWItts_cbDiagnostic 6-75
SWItts_cbEnd 6-75
SWItts_cbError 6-76
SWItts_cbLogError 6-76
SWItts_cbPhonememark 6-76
SWItts_cbPing 6-76
SWItts_cbPortClosed 6-76
SWItts_cbStart 6-76
SWItts_cbStatus 2-24, 6-75
SWItts_cbStopped 6-76
SWItts_cbWordmark 6-76
SWItts_CONNECT_ERROR 6-71
SWItts_ENGINE_ERROR 6-71
SWItts_ERROR_PORT_ALREADY_STOPPING 6-71
SWItts_ERROR_STOP_NOT_SPEAKING 6-71
SWItts_FATAL_EXCEPTION 6-71
SWItts_HOST_NOT_FOUND 6-71
SWItts_INVALID_PARAMETER 6-71
SWItts_INVALID_PORT 6-71
SWItts_MUST_BE_IDLE 6-71
SWItts_NO_ENGINE 6-71
SWItts_NO_MEMORY 6-71
SWItts_NO_MUTEX 6-71
SWItts_NO_THREAD 6-71
SWItts_NOT_EXECUTING_API 6-71
SWItts_PORT_ALREADY_SHUT_DOWN 6-71
SWItts_PORT_ALREADY_SHUTTING_DOWN 6-71
SWItts_PORT_SHUTTING_DOWN 6-71
SWItts_PROTOCOL_ERROR 6-71
SWItts_READ_ONLY 6-71
SWItts_SERVER_ERROR 6-71
SWItts_SOCKET_ERROR 6-71
SWItts_SSML_PARSE_ERROR 6-71
SWItts_SUCCESS 6-71
SWItts_UNINITIALIZED 6-71
SWItts_UNKNOWN_CHARSET 6-72
SWItts_UPDATE_DICT_PARTIAL_SUCCESS 6-72
SWItts_WINSOCK_FAILED 6-72
SWIttsAddDictionaryEntry() 2-31, 6-73
SWIttsAudioPacket structure 2-25, 6-76
SWIttsBookMark structure 6-78
SWIttsCallback() 6-73, 6-75
SWIttsClosePort() 6-80
SWIttsDeleteDictionaryEntry() 2-31, 6-81
SWIttsDictionaryEntry structure 6-74, 6-82, 6-83, 6-90
SWIttsGetDictionaryKeys() 2-31, 6-83
SWIttsGetParameter() 2-30, 6-86
SWIttsInit() 6-86, 6-88
SWITTSLOGDIAG A-111
SWITTSLOGTOFILE A-111
SWIttsLookupDictionaryEntry() 2-31, 6-89
SWITTSMAXLOGSIZE A-111
SWIttsMessagePacket structure 6-77
SWIttsOpenPort() 6-89, 6-91
  stack size 2-32
SWIttsPhonemeMark structure 6-78
SWIttsPing() 6-92
SWIttsResetDictionary() 6-93
SWIttsResult 6-71
SWIttsSetParameter() 2-30, 6-94
SWIttsSpeak() 2-31, 6-93, 6-96
SWIttsStop() 6-98
SWIttsTerm() 6-99
SWIttsWordMark structure 6-78
syllable boundaries 4-56
syllable stress 4-56
Symbolic Phonetic Representation 3-50

T
time line
  sample 2-26
times 3-35
translation value, dictionary 5-60
tts.audio.rate 6-86, 6-94
tts.audio.volume 6-86, 6-94
tts.audioformat.encoding 6-86
tts.audioformat.mimetype 6-86, 6-94
tts.audioformat.samplerate 6-87
tts.audioformat.width 6-87
tts.client.version 6-87
tts.marks.phoneme 6-87, 6-95
tts.marks.word 6-87, 6-95
tts.network.packetsize 6-87, 6-95
tts.reset 6-87, 6-95
tts.voice.gender 6-87
tts.voice.language 6-87
tts.voice.name 6-87

V
voice-specific dictionaries 5-67

W
white space, in Japanese text 6-97
