Voice assistants (bots), which either drive the dialogue between customers and the bot in an attempt to solve their issues, without agent participation, or apply AI protocols to assist the agent.

Here is an architecture diagram showing a typical implementation of a batch scenario:

Components of speech analytics technology

Whether the domain is post-call or real-time, Azure offers a set of mature and emerging technologies to help improve the customer experience.

Speech-to-text is the most sought-after feature in any call center solution. Because many of the downstream analytics processes rely on transcribed text, the word error rate (WER) metric is of utmost importance. Key challenges in call center transcription include the noise that's prevalent in the call center (for example, other agents speaking in the background), the rich variety of language locales and dialects, and the low quality of the actual telephone signal. WER is highly correlated with how well the acoustic and language models are trained for a specific locale. Therefore, it's important to be able to customize the model to your locale. Our latest Unified version 4.x models are the solution to both transcription accuracy and latency. Because they're trained with tens of thousands of hours of acoustic data and billions of bits of lexical information, Unified models are the most accurate in the market for transcribing call center data.

In the call center space, the ability to gauge whether customers have had a good experience is one of the most important areas of Speech analytics. The Microsoft Batch Transcription API offers sentiment analysis per utterance. You can aggregate the set of values that are obtained as part of a call transcript to determine the sentiment of the call for both your agents and the customer.

It's not uncommon for as much as 35 percent of a support call to be what's called non-talk time. Non-talk can occur in a number of different scenarios during a call.
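As a rough illustration of the two call-level measures described above, the following Python sketch aggregates per-utterance sentiment into one score per speaker and estimates the non-talk share of a call. The `Utterance` record, its field names, and the scoring rule (duration-weighted positive minus negative) are assumptions made for this example; they are not the Batch Transcription API's actual response schema or a prescribed aggregation method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical per-utterance record; not the actual API schema."""
    speaker: str     # for example "agent" or "customer"
    start: float     # seconds from the start of the call
    end: float       # seconds from the start of the call
    positive: float  # assumed sentiment score in [0, 1]
    negative: float  # assumed sentiment score in [0, 1]


def call_sentiment(utterances):
    """Return the duration-weighted mean of (positive - negative) per speaker."""
    weighted = defaultdict(float)
    total = defaultdict(float)
    for u in utterances:
        duration = u.end - u.start
        weighted[u.speaker] += duration * (u.positive - u.negative)
        total[u.speaker] += duration
    return {s: weighted[s] / total[s] for s in total if total[s] > 0}


def non_talk_fraction(utterances, call_length):
    """Return the share of the call not covered by any utterance."""
    talk = 0.0
    covered_until = 0.0
    for u in sorted(utterances, key=lambda u: u.start):
        # Count overlapping speech only once.
        start = max(u.start, covered_until)
        if u.end > start:
            talk += u.end - start
            covered_until = u.end
    return 1.0 - talk / call_length


# Made-up data for a 100-second call, for illustration only.
call = [
    Utterance("agent", 0.0, 10.0, positive=0.8, negative=0.1),
    Utterance("customer", 10.0, 20.0, positive=0.2, negative=0.6),
    Utterance("agent", 40.0, 50.0, positive=0.6, negative=0.2),
    Utterance("customer", 50.0, 60.0, positive=0.7, negative=0.1),
]
print(call_sentiment(call))
print(non_talk_fraction(call, call_length=100.0))
```

Weighting by duration keeps a long, unhappy utterance from being cancelled out by a brief "thanks"; an unweighted mean or a trend over time would be equally reasonable choices depending on what the analysis needs.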
Telephony data that's generated through landlines, mobile phones, and radios is ordinarily of low quality. This data is also narrowband, in the range of 8 kHz, which can create challenges when you're converting speech to text.
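The effect of conditions such as narrowband audio on transcription is usually quantified with the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognized text into a reference transcript, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. The function name and the normalization step (lowercasing and splitting on whitespace) are assumptions for illustration; production evaluations typically apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up transcripts, for illustration only.
print(word_error_rate("please reset my account password",
                      "please reset my count password"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.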