Various Speech Providers
Introduction
These views show the connections, building blocks and order of steps for an inbound call scenario, and how the various speech providers generate or process speech, using either the local (basic) engine or the modern AI-based cloud cognitive services of Google or Microsoft.
Note
Some steps in the views are simplified representations of reality, but they should give you a clearer understanding of the steps and building blocks involved. The direction of an arrow typically represents the direction in which a connection is initially made, after which signals or traffic can flow both ways. (For a step marked as "0", it is assumed that the connection has already been configured or established in an earlier process.)
View 1: Inbound Call, Local TTS Synthesizer
View 1 shows an inbound call in which the IVR is generated by the Local (basic) Text-To-Speech (TTS) generator. Interactive Voice Response, or IVR, is a telephone application that takes orders via telephone keypad or voice through a computer. By choosing menu options, the caller receives information without the intervention of a human operator, or is forwarded to the appropriate Agent. The Local TTS generator is a basic (legacy, i.e. developed by Microsoft in 2011) speech engine with support for just 26 languages/dialects and only 1 voice (female) per language. The pronunciation quality is fixed and varies per language.
View 2: Inbound Call, Local Speech Recognition Engine
View 2 shows an inbound call in which the IVR is generated by the Local (basic) Text-To-Speech (TTS) generator, and the Microsoft Local Speech Recognition (SR) Listener performs key-phrase recognition of the customer's IVR choice. The SR Listener is a basic (legacy, i.e. developed by Microsoft in 2011) speech engine with support for just 26 languages/dialects. The recognition quality is fixed and varies per language.
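To illustrate what key-phrase recognition of an IVR choice amounts to, the following minimal sketch maps recognized text to a menu option. It is not the Local SR Listener's implementation: the menu, the phrases and the function names are hypothetical, and the speech-to-text step itself is assumed to have already produced a transcript.

```python
import re

# Hypothetical IVR menu: each option maps to the key phrases that select it.
MENU_KEY_PHRASES = {
    "sales": ["sales", "buy", "place an order"],
    "support": ["support", "help", "technical problem"],
}


def normalize(text: str) -> str:
    """Lowercase the text and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def match_option(transcript: str, menu=MENU_KEY_PHRASES):
    """Return the first option whose key phrase occurs as whole words, else None."""
    padded = f" {normalize(transcript)} "
    for option, phrases in menu.items():
        for phrase in phrases:
            # Padding with spaces prevents "help" from matching "unhelpful".
            if f" {normalize(phrase)} " in padded:
                return option
    return None
```

A caller saying "I'd like to place an order, please" would be routed to the hypothetical "sales" option, while an utterance that contains no key phrase returns None so the flow can repeat the prompt.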
View 3: Inbound Call, Microsoft Cognitive Speech Services
View 3 shows an inbound call in which the IVR is processed by Microsoft Text-To-Speech Cognitive Services. The IVR is configured and maintained exclusively through a Dialogue Studio flow. As with Text-To-Speech processing, the Microsoft Speech-To-Text and Translation Cognitive Services can be accessed through the same pathways (subject to regional availability and performance).
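Cloud text-to-speech services such as Microsoft's accept prompts written in SSML (Speech Synthesis Markup Language). The sketch below shows roughly what such a prompt can look like. It is an illustration only: in practice the Dialogue Studio flow handles this, and the function name, the default voice name and the prompt text are assumptions rather than product configuration.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap an IVR prompt in a minimal SSML document.

    The voice name is an assumed example; use a voice available in your region.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects characters such as & and < inside the prompt text.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


ssml = build_ssml("Press 1 for sales & 2 for support")
```

Escaping the prompt text matters because IVR prompts often contain characters such as "&" that would otherwise make the SSML invalid.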
View 4: Inbound Call, Google Cloud AI Speech Services
View 4 shows an inbound call in which the IVR is processed by Google Cloud's Text-To-Speech AI services. The IVR is configured and maintained exclusively through a Dialogue Studio flow. As with Text-To-Speech processing, Google Cloud's Speech-To-Text, or any of the other emerging Google Cloud language AI services such as Natural Language AI or Dialogflow, can be accessed through the same pathways (subject to regional availability).
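For orientation, the sketch below builds the kind of JSON request body that the Google Cloud Text-to-Speech REST endpoint accepts. It only constructs the payload; nothing is sent and authentication is omitted. The function name, the default language and the audio encoding are assumptions for illustration, and in practice the Dialogue Studio flow issues the request.

```python
import json

# REST endpoint of the Google Cloud Text-to-Speech API (v1).
GOOGLE_TTS_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_google_tts_request(
    text: str,
    language_code: str = "en-US",
    audio_encoding: str = "LINEAR16",
) -> str:
    """Return the JSON body for a synthesize request (assumed defaults)."""
    body = {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": audio_encoding},
    }
    return json.dumps(body)


payload = build_google_tts_request("Welcome, please state the reason for your call.")
```

The payload would be sent as an authenticated POST to GOOGLE_TTS_URL; the response carries the synthesized audio as base64-encoded content.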