It is a truth universally acknowledged – and by universally I mean among us techies – that Automatic Speech Recognition (ASR) is the cornerstone of NLU-based premium client service. This February’s Athens training (where it’s always sunny – or so we tell ourselves) was proof that our partners, tech companies and businesses from around the globe, recognize, with high accuracy and a low real-time factor (RTF), the value of Omilia’s solutions in the field.
My not-at-all-impossible mission was to show participants how to combine theory and practice in order to create speech recognition grammars that will provide the basis of human-like interaction in their speech applications. The sessions revolved around three axes: a) key concepts of ASR modelling, b) different methodologies for creating speech grammars, and c) tuning grammars for maximum accuracy (and low RTF, of course).
Our ASR team consists of many people, but I’m usually the one who takes on the training, as my everyday tasks involve more than ASR: from dialog design to implementation (speech, semantics and dialog flow) to tuning. I have been doing this for eleven years now (has it really been so long?) and have been teaching ASR for the past four. I’ve learned over the years that participants arrive with their own modus operandi and their own notions about how ASR should work, but what we do here is more advanced and more efficient.
This year’s training was rather streamlined, as we had incorporated the feedback we’d received in previous years, even though the core material has always been worthy of an Oscar (and if you’re looking for a host, I’m available!). Our partners, who come from both business and tech backgrounds, had a tough decision to make, as the training had to be done in parallel sessions: ASR or NLU (vanilla or chocolate, seaside or the mountains). The ASR sessions were mostly attended by business people, while the majority of the tech people chose to venture into the Minecraft of the NLU world, i.e. our Conversation Studio, and create their own Intents.
As far as the ASR sessions are concerned, I declare them a huge success, my own sweaty palms notwithstanding. The audience easily kept up with the material and the questions were interesting and to the point. The plan for the morning session was to cover the theoretical concepts, which are the trickiest to understand. The afternoon session the next day had to be more relaxed but equally informative. Never underestimate a good lunch, dessert included! We had some hands-on fun with the deepASR tools, compiling grammars using different recipes, and tuned our freshly baked grammars for high accuracy (and low RTF, of course).
These events offer opportunities to pick the brains of other industry people. Coffee breaks, lunches, dinners and drinks brought together professionals with common interests but also diverse ideas and backgrounds. We talked a lot about our approaches, concerns, difficulties, and success stories. What still strikes me as odd is that the industry has supposedly moved on from directed dialogs, yet there are few robust platforms for open-ended NLU applications, and so directed dialogs are still around. We will come back to that soon.
This is an ongoing process, since technology evolves and new tools are being brought to the fore. Training weeks will always be a cause for celebration for us. Not only because we enjoy sharing our breakthroughs with our partners, but also, on a more personal note, because I get to learn as much as I teach.
See you around,
Sofia Kasviki can work her way around every aspect of NLU applications, from solution design & coding to delivery & voice talent coaching. She is part of the Omilia team as a senior NLU engineer and team leader. She has worked on the actual deployment of AI conversational systems on more than three continents. She considers herself a human person, though some people have called her a robot.