Conversation Agent engineering

As we’ve seen in my previous blog, Conversation Agents offer an “open” experience, rather than the “closed” experience of an app.  For the right business use case, they deliver greater user satisfaction, are cheaper to run, help identify new business opportunities, and cost less to build and maintain.  

But to be useful, secure, resilient, capable, and safe, Real Conversation Agents must be more than sophisticated and intelligent: they must also navigate many “real-world” issues.

Here Be Dragons

What are those issues?  Real Conversation Agents have to cope with:-

  • Receiving input from different devices, in different formats, in different character sets;
  • Avoiding spam or worse, such as denial-of-service attacks;
  • Identifying which language is in use, ideally from the text itself;
  • Understanding multiple natural languages;
  • Coping with spelling mistakes, typos, abbreviations, and even emojis, all of which are now normal;
  • Remembering what is typical for this user: “Id like 2 order my normal bear pls.” is entirely valid;
  • Understanding what is being said: converting text into a meaningful, semantically rich, and computationally tractable representation;
  • Holding a conversation: getting the facts it needs, even when they arrive out of order and interspersed with other conversations, digressions, or irrelevance;
  • Interacting with multiple back-end system APIs;
  • Security issues: not everyone who asks for a particular fact or action is entitled to it;
  • Data security: normally most of what is told to a Conversation Agent is private;
  • Exercising the duty of care.  People can grow to trust AIs, and some people may be feeling depressed, or even suicidal, and may share this with the AI.  They need help.
  • Knowing what it doesn’t know, and…
  • Escalating to a human agent, for example when a customer appears frustrated;
  • Taking the appropriate action: booking the ticket, ordering the cider, or whatever;
  • Having interchangeable knowledge to make these decisions;
  • Having “common sense”: answering frequent questions, without every single possible variation of the question and its answer having been thought of before.
  • Learning what else customers would like to know about, to extend the knowledge base;
  • Writing in language and idioms appropriate to the culture for which it is being generated;
  • Replying with variety, to keep up interest, and to appear intelligent and interesting;
  • Being able to take advantage of information when it is there, yet cope when it isn’t;
  • Reformatting replies for both rich environments (eg web pages) and poor ones (eg SMS).


Oh, and they also need to be resilient, robust, self-correcting, high performance, scalable, testable, regression-testable, and self-testable.


Cicero Overview

As a Conversation Agent, HelloDone’s Cicero enables commercial clients to engage with their users in natural language.  Whilst its components may not be unique - though possibly best of breed - the combined system offers unique advantages.   A quick tour of Cicero, illustrated and described below, will make this clear.

The subject area can be whatever is pertinent to the relationship: booking tickets or enquiring about train times with a rail company, or ordering cider and enquiring about delivery times with a brewery.

An interaction, one of many in a conversation, conceptually begins with the user making some sort of enquiry, simply by writing an SMS, iMessage, or WhatsApp message.  There is no new interface to learn nor is there a special syntax.  That message, and its subsequent reply, then passes through these steps (with these numbers corresponding to the diagram):-

  1. A multi-channel transport layer gets it from any of those possible messaging routes.
  2. Image understanding enables it to use images as well, but we aren’t saying how here(!).
  3. The message arrives at the supervisor, which determines which other components get to read and possibly react to the message.  It also logs interactions, for the system to learn from.
  4. Filter identifies spam, DoS or other cyber attacks, inappropriate content, nonsense, chatbot output, and so forth so that the supervisor can drop such messages.
  5. Escalate identifies messages which should be sent to a human agent.  This will depend on business rules, but may include service dissatisfaction, or even aspects of duty of care.
  6. Info bots are third party or open source chat bots that may have been plugged in.
  7. More interesting is the high performance intent recogniser (aka HelloDone Luminate): a neural net, previously trained on examples, designed to work out what is being asked: the intents.  With multiple language training, Cicero gains multi-lingual understanding.  Identified intents are handed over to specialist handlers, which may speak to agents to take action on the user’s behalf, for example booking tickets or ordering cider through external systems.
  8. Handlers may use the knowledge-base driven natural language generator.  It generates text in a range of human languages (as required), with variety and with different personalities.  It is able to choose what to say by reference to those details which are known at that point.
  9. The FAQ system (aka HelloDone Clarity) answers generic questions, whose answers do not depend on who is asking or their circumstances.  (Eg “when is the brewery open?”).  The knowledge base is designed to be easy to write, so it knows about synonyms for example.  Clarity can be part of a conversation as it can understand and even generate intents.  (In fact Clarity probably subsumes the capability of a traditional ChatBot on its own.)
  10. FallBack cuts in if the text has not been handled.
  11. The Channel Renderer knows which route the reply is taking back to the user, and can adjust formats accordingly, relieving other components of this complexity.
  12. Finally the multi-channel transport layer sends the reply back to the user.
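
To make the flow of steps 3 to 11 more concrete, here is a minimal sketch in Python of a supervisor that filters, escalates, tries handlers in turn, falls back, logs, and renders for the channel.  It is an illustration only, not Cicero’s actual code: every name in it (Message, Supervisor, is_unwanted, needs_human, faq_handler, fallback_handler, render) is hypothetical, and the keyword checks are toy stand-ins for the real Filter, Escalate, and FAQ components.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Message:
    user_id: str
    channel: str  # hypothetical channel names, e.g. "sms" or "whatsapp"
    text: str


# A handler returns a reply, or None if it declines the message.
Handler = Callable[[Message], Optional[str]]


def is_unwanted(msg: Message) -> bool:
    """Toy stand-in for the Filter step (not a real spam detector)."""
    return "free money" in msg.text.lower()


def needs_human(msg: Message) -> bool:
    """Toy stand-in for the Escalate step (a single business rule)."""
    return "complaint" in msg.text.lower()


def faq_handler(msg: Message) -> Optional[str]:
    """Toy stand-in for the FAQ step: answers one generic question."""
    if "open" in msg.text.lower():
        return "The brewery is open 9am to 5pm."
    return None


def fallback_handler(msg: Message) -> Optional[str]:
    """Stand-in for FallBack: always answers, so it goes last."""
    return "Sorry, I did not understand that."


def render(reply: str, channel: str) -> str:
    """Toy stand-in for the Channel Renderer: trims replies sent by SMS."""
    return reply[:160] if channel == "sms" else reply


@dataclass
class Supervisor:
    handlers: list[Handler]
    log: list[tuple[str, Optional[str]]] = field(default_factory=list)

    def handle(self, msg: Message) -> Optional[str]:
        reply: Optional[str] = None
        if is_unwanted(msg):
            pass  # drop the message: no reply at all
        elif needs_human(msg):
            reply = "Passing you to a human agent."
        else:
            # First handler to produce a reply wins.
            for handler in self.handlers:
                reply = handler(msg)
                if reply is not None:
                    break
        # Log every interaction so the system can learn from it later.
        self.log.append((msg.text, reply))
        return render(reply, msg.channel) if reply is not None else None
```

In this sketch the order of the handlers list decides priority, so the fallback handler is placed last, for example Supervisor([faq_handler, fallback_handler]).  Only render knows about channels, which mirrors the point in step 11 that other components are relieved of that complexity.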


At the top of the diagram, the central web interface is used for systematic testing, and the robot + expectations is where an automated AI powered regression testing system sits.  This is complex, because by its very nature neither people nor Cicero always work in the same order or generate the same text for the same communication.
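
One way to see why such testing is hard, and one simple way round part of the problem, is to check a reply against what it must convey rather than against an exact string.  The sketch below is a deliberately simple, non-AI stand-in for the idea, not a description of how HelloDone’s regression tester works: the function names and the shape of the expectation (a list of groups of acceptable phrases) are assumptions made for illustration.

```python
import re


def normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(cleaned.split())


def meets_expectation(reply: str, required: list[set[str]]) -> bool:
    """True if at least one phrase from every group appears in the reply.

    The groups may be satisfied in any order, and each group lists
    alternative wordings that all count as conveying the same fact.
    """
    padded = f" {normalise(reply)} "  # padding gives whole-word matching
    return all(
        any(f" {normalise(phrase)} " in padded for phrase in group)
        for group in required
    )


# Hypothetical expectation for a ticket-booking reply.
expectation = [{"2 tickets", "two tickets"}, {"york"}, {"booked", "reserved"}]

print(meets_expectation("I've reserved two tickets to York for you.", expectation))
print(meets_expectation("Booked! 2 tickets, York.", expectation))
print(meets_expectation("Two tickets to Leeds are booked.", expectation))
```

The first two replies differ in wording and order yet both satisfy the expectation, while the third fails because the destination is wrong.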


The information logged by the Supervisor is mined using unsupervised learning to find any “queries” or “FAQs” that are being asked but not satisfied.  In this way the “recall” is continuously increased and changes in user behaviours and expectations are tracked.
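
As a rough illustration of the idea, the sketch below groups unanswered messages from a log by word overlap, so that frequently asked but unsatisfied questions float to the top.  It is a much simpler technique (greedy clustering on Jaccard similarity) than whatever Cicero actually uses, and the log format (pairs of message text and reply, with None meaning unanswered), the function names, and the 0.5 threshold are all assumptions.

```python
from typing import Optional


def tokens(text: str) -> frozenset[str]:
    """Lower-cased words with surrounding punctuation stripped."""
    return frozenset(w.strip(".,!?").lower() for w in text.split()) - {""}


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Size of the overlap divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_unanswered(
    log: list[tuple[str, Optional[str]]], threshold: float = 0.5
) -> list[list[str]]:
    """Greedy single-pass clustering of messages that received no reply."""
    clusters: list[list[str]] = []
    for text, reply in log:
        if reply is not None:
            continue  # already satisfied, nothing to learn here
        for cluster in clusters:
            # Compare against the first message in each cluster.
            if jaccard(tokens(text), tokens(cluster[0])) >= threshold:
                cluster.append(text)
                break
        else:
            clusters.append([text])
    # Largest clusters first: the most common unmet needs.
    return sorted(clusters, key=len, reverse=True)
```

The largest clusters returned are candidates for new FAQ entries or intents; a human would still need to review them and write the answers.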


One final point: unless you are building your own Conversation Agent, there is no need to be dismayed by all this complexity.  The overall system is easy to integrate precisely because there are internal components which, under the hood, cope with the intrinsic complexity of the task.


My next blog is about the AI used to make the Cicero conversation agent effectively intelligent.
