Overview
In this example, we take a closer look at the process of generating the audio for a single source file.
The first phase is, naturally, transforming the .docx file into a series of statements to be executed by Polly or Azure.

For each line to process, a call is made to Polly with the specified voice, with the text enclosed in a <speak> XML block. The web call either returns an error or opens a stream that can then be downloaded into a temporary file. Each line generates one audio file.
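As a rough illustration of this per-line step, here is a minimal Python sketch. It is not the tool's actual code: the function names are hypothetical, the text is assumed to be plain (no embedded SSML tags), and the Polly call assumes boto3 is installed with AWS credentials configured. The mp3 output format is also an assumption.

```python
import tempfile
from typing import Callable, List
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Wrap one line of plain text in a <speak> block, escaping XML characters."""
    return f"<speak>{escape(text)}</speak>"


def polly_synthesize(ssml: str, voice: str) -> bytes:
    """Send one SSML request to Amazon Polly and return the audio bytes.

    Assumes boto3 is installed and AWS credentials are configured.
    A service error surfaces as an exception raised by boto3.
    """
    import boto3  # imported lazily so the rest of the sketch runs without it

    response = boto3.client("polly").synthesize_speech(
        Text=ssml, TextType="ssml", VoiceId=voice, OutputFormat="mp3"
    )
    # The response carries a stream; reading it downloads the audio.
    return response["AudioStream"].read()


def synthesize_lines(
    lines: List[str],
    voice: str,
    synthesize: Callable[[str, str], bytes] = polly_synthesize,
) -> List[str]:
    """Write one temporary audio file per line and return the paths in order."""
    paths = []
    for line in lines:
        audio = synthesize(to_ssml(line), voice)
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
            tmp.write(audio)
            paths.append(tmp.name)
    return paths
```

Passing the synthesizer as a parameter keeps the loop independent of the service, so the same loop could drive an Azure call or a stub during testing.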
Once the process completes for all lines, sox.exe is used to concatenate the resulting files into a single audio file that matches the .docx one-to-one.
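The concatenation step can be sketched in Python as follows. The helper names are hypothetical and the exact arguments the tool passes to sox.exe are not documented here. The sketch relies only on the basic sox behavior of joining several input files into one output file when they are listed before it.

```python
import subprocess
from typing import List, Sequence


def build_sox_command(parts: Sequence[str], output: str, sox: str = "sox") -> List[str]:
    """Return the argument list that asks sox to join `parts` into `output`."""
    if not parts:
        raise ValueError("at least one input file is required")
    # Given several input files and one output file, sox concatenates the inputs.
    return [sox, *parts, output]


def concatenate(parts: Sequence[str], output: str, sox: str = "sox") -> None:
    """Run sox and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_sox_command(parts, output, sox), check=True)
```

For example, build_sox_command(["line1.wav", "line2.wav"], "chapter.wav") yields the arguments for a single sox invocation. The input files are expected to share the same sample rate and channel count.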

You can take a quick listen.
Caching
In order to avoid repeated calls to Azure or AWS for the exact same text read the exact same way by the same voice, ABGenenis uses a cache per voice. If the exact XML text to be sent to the web service is already in the software’s cache, the cached audio is used instead of calling the service. That way, a simple change in the text or the voice of one character requires only those segments to be reprocessed, saving a huge amount of precious dollars!
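One way such a per-voice cache could work is sketched below in Python. This is an assumption about the general technique, not the software's actual storage format: the directory layout, the SHA-256 key, the ".audio" file extension, and the function name are all hypothetical, and voice names are assumed to be valid directory names.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_synthesize(
    cache_root: Path,
    voice: str,
    ssml: str,
    synthesize: Callable[[str, str], bytes],
) -> bytes:
    """Return audio for (voice, ssml), calling `synthesize` only on a cache miss."""
    # One sub-directory per voice; the file name is a hash of the exact SSML text.
    key = hashlib.sha256(ssml.encode("utf-8")).hexdigest()
    entry = cache_root / voice / f"{key}.audio"
    if entry.exists():
        return entry.read_bytes()
    audio = synthesize(ssml, voice)
    entry.parent.mkdir(parents=True, exist_ok=True)
    entry.write_bytes(audio)
    return audio
```

Because the key is derived from the exact XML text, changing a single character of a line, or switching its voice, produces a miss for that segment only. Every unchanged segment is served from disk without a web call.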