
Cutting humans out of your AI testing

August 12, 2020

At HelloDone we’re trying to make things easier. For consumers that means integrating the everyday — info management, ticketing, payments and other tasks — into their messaging, their lived-in digital channels. For companies, that means delivering an automated conversation platform that allows more productive, efficient and better customer engagement, whilst saving time and money.

It all relies on the HelloDone platform — Cicero. This is a natural language processing platform, or Conversation Agent, that manages third-party and client information as well as live customer conversations. It ensures rich, live and productive interactions between customers and companies via messaging channels. For companies there is no fancy new app to develop, and the burden of providing legacy systems with modern interfaces is mitigated. Once Cicero accumulates client and consumer data, it can learn and report what else users would like to know or achieve. Cicero is the homo sapiens to the old chatbots' caveman.

The Artificial Intelligence we use at HelloDone is applied AI. It is not trying to make a sentient computer; instead it applies AI-inspired techniques to solve problems which, were they to be done by a human, would require intelligence.

For all its mathematical beauty, the conversation agent still needs maintaining. A living piece of AI that regularly interacts with humans, with all their quirks, inevitably involves significant complexity. Making sure such a system is simultaneously useful, secure, resilient, capable and safe is a challenge. HelloDone needs to know it is all of that and more, and at speed, for our clients and customers.

So how do I know Cicero is working well, to the level I expect? Even before I go to market? How can I make sure it is going to work with ten, twenty, a hundred thousand people using the system, making mistakes, talking nonsense? How can I make sure, quickly, that it really is useful, and doing what the client needs and expects?

As a team obsessed with AI and its uses, we thought: why not use a bit of "good old fashioned artificial intelligence" (aka GOFAI) to help make HelloDone, and Cicero, a better product? We wanted to make our lives easier. We wanted to speed up development of our software and processes, making a more manageable and effective natural language AI system. So, we made Torch.

Torch is a significant breakthrough in AI development practice, not AI technique: it wipes out hours of testing and so speeds up software development. With Torch, you can artificially run hundreds of thousands of differing conversations with an AI, improving your system at a rate and cost-effectiveness previously not possible. User interface designers have visual tools to design websites and mobile apps: Torch is conceptually the same for natural language conversations.

How to solve a problem in AI, with AI?

Torch was not easy to design, because of the intrinsic characteristics of AI that make it hard to validate. Many of the characteristics that make AI desirable are also the ones that make it hard to test. Artificial Intelligence is:

• Required to learn and adapt, so differences in behaviour, even in the same circumstances on one occasion from another, are not necessarily errors.

• Often non-deterministic, because non-deterministic algorithms give the best answers: for example, those starting with random weights or candidate solutions which then converge to a locally best answer.

• Designed to operate correctly in new circumstances: although we trained it on these English conversations, it should operate on those English conversations it has never encountered before.

• Tackling tasks where the definition of correct is moot. What is the “correct” summary of this Blog, for example?

• Tackling tasks where there are many good enough, but different, solutions. Different solutions may be found on different occasions.

• Designed to operate with uncertainty, where some input values may be missing or known only to very low accuracy.

Amazing but frustrating. The benefits of AI are therefore somewhat offset by the difficulty of testing, so many systems will only be partially validated. It is also why regulators who do not understand the above may try to impose on AI the validation requirements appropriate to deterministic, traditional IT systems.
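
To see why non-determinism defeats conventional testing, consider a toy sketch (all names here are invented for illustration, not part of Torch or Cicero): a component that may return either of two equally valid answers. An exact-match assertion would fail intermittently; checking properties of the answer instead passes for every valid output.

```python
import random

# Stand-in for a non-deterministic AI component: two equally valid
# phrasings, chosen at random (as if from random initial weights).
def summarise(text: str) -> str:
    return random.choice(["Testing AI is hard.", "AI is hard to test."])

# Property-based check: assert properties of the answer rather than
# an exact string, so valid-but-different outputs still pass.
def is_acceptable(summary: str) -> bool:
    return "hard" in summary.lower() and len(summary) < 80

for _ in range(100):
    assert is_acceptable(summarise("Why is AI hard to test?"))
```

This is the shift in mindset any AI test harness has to make: test that the behaviour is acceptable, not that it is identical run to run.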

Torch was designed to work within these constraints and remove the laborious work of natural language AI testing. Having people hold never-ending conversations with a natural language AI, noting down issues, fixing and repeating, is slow and expensive. We had to bring rigour and speed to the process.

Torch: the details

Torch is a domain-specific language in which to specify a conversation with a user, and so emulate that user. The Torch compiler reads a conversation specification, i.e. a Torch program, and generates a program which can interact with Cicero, our natural language platform, as if it were the user.
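
Torch's actual syntax is not public, but the idea can be sketched in miniature: a conversation specification as data, plus a driver that plays the user's side against an agent and checks each reply. Everything below — the spec format, `run_spec`, and the toy agent standing in for Cicero — is a hypothetical illustration, not the real system.

```python
# A conversation specification: what the simulated user says,
# and a keyword each agent reply is expected to contain.
spec = [
    {"say": "Where is my parcel?", "expect": "tracking"},
    {"say": "AB123456", "expect": "delivery"},
]

def run_spec(spec, agent):
    """Play the user's side of the conversation; return True if every
    agent reply contains the expected keyword."""
    for turn in spec:
        reply = agent(turn["say"])
        if turn["expect"] not in reply.lower():
            return False
    return True

# Toy agent standing in for the real conversation platform.
def toy_agent(utterance: str) -> str:
    if "parcel" in utterance.lower():
        return "Sure - what is your tracking number?"
    return "Your delivery is due tomorrow."

assert run_spec(spec, toy_agent)
```

The real value of compiling the specification, rather than hand-writing the driver, is that the same spec can then be run thousands of times with systematic variations.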

Torch conversation specifications are rigorous but not rigid. They can (and generally do) include the fact that the user may wander off subject, answer the wrong question, digress into FAQs, phrase everything differently and rarely express the same thing in the same way twice, and mis-spell nearly everything. Torch can emulate all of these, yet the specifications are surprisingly short.
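
One way such off-script behaviour can be emulated, sketched here under assumption (the `perturb` helper and paraphrase table are invented for illustration), is to randomly rewrite each scripted utterance before sending it: sometimes substituting a paraphrase, sometimes injecting a typo.

```python
import random

# Hypothetical paraphrase table: ways a real user might phrase
# the same request.
PARAPHRASES = {
    "where is my parcel?": ["wheres my package", "parcel location pls"],
}

def perturb(utterance: str, rng: random.Random) -> str:
    """Emulate a real user: sometimes paraphrase, sometimes mistype."""
    u = utterance
    # Sometimes substitute a known paraphrase of the utterance.
    if rng.random() < 0.5:
        u = rng.choice(PARAPHRASES.get(u.lower(), [u]))
    # Sometimes introduce a typo by swapping two adjacent characters.
    if rng.random() < 0.5 and len(u) > 3:
        i = rng.randrange(len(u) - 1)
        u = u[:i] + u[i + 1] + u[i] + u[i + 2:]
    return u

# Across many runs, the same scripted line yields varied user input.
variants = {perturb("Where is my parcel?", random.Random(s)) for s in range(50)}
assert len(variants) > 1
```

Running the same short specification through perturbations like these is how a compact spec can expand into hundreds of thousands of distinct conversations.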

Using Torch we obtain a number of benefits:

• The specification of a conversation from the user’s perspective. This can be agreed with a client so that they know what it will do for them, and we know what our system has to achieve.

• A simulation of that dialogue, where a person plays the part of Cicero. This means we can test the experience BEFORE implementing it. This solves the common IT problem that users rarely know what they want until after they get it.

• Testing Cicero to confirm that it does indeed satisfy that dialogue.

• Regression testing, when either the system or a conversation requirement is changed.

• Stress testing. Torch can run multiple copies of itself, so that thousands of end-users could be simulated at the same time.

• Monitoring that a production system is alive, through having Torch act as a virtual end-user.
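
The stress-testing point above amounts to running many simulated users concurrently. A minimal sketch, assuming the simulated conversation is wrapped in a function (`simulated_user` here is invented; in practice each call would be a network exchange with the platform):

```python
import concurrent.futures
import time

def simulated_user(user_id: int) -> float:
    """Run one scripted conversation and return its elapsed time.
    The loop body is a placeholder for real calls to the agent."""
    start = time.perf_counter()
    for utterance in ["hello", "where is my parcel?", "thanks"]:
        _ = len(utterance)  # placeholder for agent(utterance)
    return time.perf_counter() - start

# Run a thousand simulated users across a pool of worker threads,
# as a stress test might, and collect per-conversation timings.
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    timings = list(pool.map(simulated_user, range(1000)))

assert len(timings) == 1000
```

Collecting timings per conversation is what turns a load test into a measurement: you can see where response latency degrades as the user count grows.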

Ultimately, we have taken a problem with natural language AI — how you make sure it is correct — and solved it with AI. It's an oddly virtuous circle, and one that has extensive application for natural language AI and, I am sure, beyond.

HelloDone Blogs

In follow-up to this blog I will run through the ins and outs of the AI and business benefits behind Conversation Agents, how you can manage them more effectively, and look under the hood of Cicero.

If you would like to know more or you have any questions just drop me a comment below and we’ll be in touch.

Andrew Lea
Head of AI
