In part one of this article [9] we discussed the different kinds of chatty AI interfaces and the merits of a mixed natural-language GUI interface.
Now we will dig a little deeper into what is underneath the covers of a Natural Language Application (NLA).
Natural Language Processing Components
Natural Language Processing (NLP) has been around since the 1950s. We will exclude speech-to-text interfaces from this part of the discussion; such interfaces have their own unique challenges, but ultimately provide much the same “text” to an NLA. We will also only discuss an English NLA. Languages with different glyphs, syntax, and grammar have to be dealt with separately.
NLP is a cross-discipline between Linguistics and Computer Science: it takes raw strings of text in a language and breaks them down into various components for classification. The pipeline usually consists of:
- Sentence boundary detection (finding the unique sentences in some text)
- Syntactic analysis (“tagging” the nouns and verbs)
- Special Entity Recognition (finding dates, times, and time intervals and putting them into a local context)
- Named Entity Recognition (NER) (recognizing names of places, people, etc.)
and sometimes
- Semantic analysis (assigning meaning to words in their context)
- Pragmatic analysis (finding the meaning of a sentence as a whole)
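As a toy illustration of the first few stages, here is a minimal, purely rule-based sketch (the regexes and the tiny lexicon are made up for the example; a real pipeline would use a proper NLP library):

```python
import re

def split_sentences(text):
    """Naive sentence boundary detection: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def find_special_entities(sentence):
    """Toy special-entity recognition: pick out dates written as YYYY-MM-DD."""
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", sentence)

def tag_tokens(sentence, lexicon):
    """Toy syntactic analysis: tag each token via a hand-built lexicon."""
    tokens = re.findall(r"[\w-]+", sentence)
    return [(t, lexicon.get(t.lower(), "UNK")) for t in tokens]

text = "The invoice arrived on 2023-04-01. Please pay it soon!"
sentences = split_sentences(text)
print(sentences)                            # the two detected sentences
print(find_special_entities(sentences[0]))  # the embedded date
print(tag_tokens("Please pay it", {"pay": "VERB"}))
```

Each stage feeds the next: the sentence splitter hands units to the tagger and entity finders, which in turn feed intent matching further downstream.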
The last two classifications, semantic and pragmatic analysis, are still at the forefront of research. Google made some strides using Word2Vec [1] in the early 2010s, and some work has progressed this into Sentence2Vec [2]. These methods represent words and complete sentence structures in high-dimensional vector spaces (300 up to 3,000 dimensions is not uncommon).
However, meaningful inferences cannot be made using such structures: Euclidean distance and cosine difference between vectors do not work well in higher-dimensional spaces [3].
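The effect behind [3] is easy to demonstrate: as dimensionality grows, similarities between random vectors bunch together, so “near” and “far” lose their discriminating power. A minimal sketch, with random Gaussian vectors standing in for word embeddings:

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def spread_of_similarities(dim, n=30, seed=0):
    """Spread (max minus min) of pairwise cosine similarities
    between n random Gaussian vectors of the given dimensionality."""
    rng = random.Random(seed)
    vecs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    sims = [cosine(vecs[i], vecs[j]) for i in range(n) for j in range(i + 1, n)]
    return max(sims) - min(sims)

# In 3 dimensions similarities span a wide range; in 3000 dimensions they
# concentrate near zero, so every vector looks roughly equidistant.
print(spread_of_similarities(3))
print(spread_of_similarities(3000))
```

This concentration is one reason naive nearest-neighbour lookups over raw embedding vectors disappoint in practice.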
Intent Matching
Neural networks are these days an integral part of intent matching [4]. I have created software for sophisticated training-set balancing, targeted synonym expansion, and automatic neural-network deployment to make it all look seamless.
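As a rough sketch of what training-set balancing and synonym expansion involve (the function names and data below are illustrative, not the actual software described above):

```python
import random

def balance_training_set(examples, seed=0):
    """Oversample under-represented intents so every intent has as many
    utterances as the largest one (a simple stand-in for more
    sophisticated balancing)."""
    rng = random.Random(seed)
    by_intent = {}
    for utterance, intent in examples:
        by_intent.setdefault(intent, []).append(utterance)
    target = max(len(u) for u in by_intent.values())
    balanced = []
    for intent, utterances in by_intent.items():
        padded = list(utterances)
        while len(padded) < target:
            padded.append(rng.choice(utterances))  # duplicate a random utterance
        balanced.extend((u, intent) for u in padded)
    return balanced

def expand_synonyms(utterance, synonyms):
    """Targeted synonym expansion: emit one variant per known synonym."""
    variants = [utterance]
    for word, alts in synonyms.items():
        if word in utterance.split():
            variants.extend(utterance.replace(word, alt) for alt in alts)
    return variants

data = [("open account", "ACCOUNT"), ("close account", "ACCOUNT"), ("pay bill", "BILLING")]
print(balance_training_set(data))
print(expand_synonyms("pay bill", {"bill": ["invoice", "statement"]}))
```

Without such balancing, a network trained on lopsided intent counts simply learns to favour the majority intent.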
I never got word/sentence vectors to work [3], although I built a prototype using a ball tree for storing and retrieving higher-dimensional objects efficiently.
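The ball-tree idea can be sketched in a few lines: every node bounds its points with a center and radius, and a query can skip any ball that provably cannot contain a closer point. This is a simplified sketch of the data structure, not the prototype itself:

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class BallTree:
    """Minimal ball tree: each node is a bounding ball (center, radius)
    over its points; leaves hold the points themselves."""

    def __init__(self, points, leaf_size=8):
        self.center = [sum(c) / len(points) for c in zip(*points)]
        self.radius = max(dist(self.center, p) for p in points)
        if len(points) <= leaf_size:
            self.points, self.left, self.right = points, None, None
        else:
            # split on the dimension with the widest spread
            d = max(range(len(points[0])),
                    key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
            pts = sorted(points, key=lambda p: p[d])
            mid = len(pts) // 2
            self.points = None
            self.left = BallTree(pts[:mid], leaf_size)
            self.right = BallTree(pts[mid:], leaf_size)

    def nearest(self, q, best=None):
        # prune: no point in this ball can be closer to q than
        # dist(q, center) - radius, so skip it if that already loses
        if best is not None and dist(q, self.center) - self.radius >= best[0]:
            return best
        if self.points is not None:
            for p in self.points:
                d = dist(q, p)
                if best is None or d < best[0]:
                    best = (d, p)
            return best
        best = self.left.nearest(q, best)
        return self.right.nearest(q, best)

rng = random.Random(1)
points = [[rng.gauss(0, 1) for _ in range(300)] for _ in range(200)]
tree = BallTree(points)
query = [rng.gauss(0, 1) for _ in range(300)]
d, p = tree.nearest(query)
print(d)  # distance from the query to its nearest stored vector
```

A production version would visit the nearer child first to tighten the pruning bound sooner; the structure and the pruning rule are the essential ideas.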
NLP and the Serverless Architecture
Bots and NLP pipelines are now commercialized: all of the popular cloud platforms, as well as IBM’s Watson, offer bots and NLP pipelines for hire.
However, for all their sophistication, these systems have limitations [5]. They usually support NER libraries and editors, but not semantic hierarchies. For instance, rather than having a flat set of named entities, it might be just as easy to encode the extra relationship information in a hypernym (generalization) graph.
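A hypernym graph can be as simple as a mapping from each concept to its immediate generalization; walking the chain answers “is-a” questions that a flat entity list cannot. The entries below are hypothetical:

```python
def hypernym_path(graph, concept):
    """Walk a hypernym (generalization) graph from a concept up to its root.
    Assumes the graph is acyclic (a proper generalization hierarchy)."""
    path = [concept]
    while concept in graph:
        concept = graph[concept]
        path.append(concept)
    return path

def is_a(graph, concept, ancestor):
    """True if `ancestor` generalizes `concept` anywhere up the chain."""
    return ancestor in hypernym_path(graph, concept)

# each concept maps to its immediate generalization (hypothetical entries)
hypernyms = {
    "bank teller": "bank employee",
    "bank employee": "person",
    "savings account": "account",
    "account": "financial product",
}

print(hypernym_path(hypernyms, "bank teller"))       # ['bank teller', 'bank employee', 'person']
print(is_a(hypernyms, "savings account", "financial product"))  # True
```

The payoff is that an intent written against “financial product” automatically covers “savings account” without enumerating every leaf entity.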
Such a structure is arbitrary, but so is language in the end [6]. Let us look at a more traditional software architecture versus a modern n-tier architecture [7].
This diagram was borrowed from “Using Kotlin in a Serverless Architecture with AWS Lambda” [8]. The right-hand side represents the more modern serverless architecture, with its layers divided into disconnected tiers. An API facade abstracts the services and proxy services.
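In code, the API facade amounts to a thin dispatcher in front of the services. Here is a hypothetical handler in the style of an AWS Lambda entry point; the route and service names are invented for the example:

```python
def handle_smalltalk(payload):
    """Hypothetical small-talk service behind the facade."""
    return {"reply": "Hello! How can I help?"}

def handle_billing(payload):
    """Hypothetical billing service behind the facade."""
    return {"reply": "Looking up invoice " + payload.get("invoice_id", "?")}

# the facade's routing table: intent name -> backing service
ROUTES = {
    "smalltalk": handle_smalltalk,
    "billing": handle_billing,
}

def handler(event, context=None):
    """Entry point in the style of an AWS Lambda handler: the facade
    inspects the request and proxies it to the matching service."""
    service = ROUTES.get(event.get("intent"))
    if service is None:
        return {"status": 404, "body": {"error": "unknown intent"}}
    return {"status": 200, "body": service(event.get("payload", {}))}

print(handler({"intent": "billing", "payload": {"invoice_id": "A-17"}}))
```

Because each route is just a function, individual services can be swapped for hosted NLP offerings, or for custom code, without the caller noticing.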
Why is this focus on the architecture so important? First of all, I put it that the Lambdas and NLP services provided by external platforms are neither powerful nor customizable enough. One could still use these services as part of some pipeline, but they do not form a “complete” pipeline that meets all our requirements. Such requirements include:
- custom grammatical analysis of special concepts (e.g. GST numbers, phone numbers, dates)
- date-time and date-range language requirements
- filtering out of date-times and date-ranges before intent matching
- creating a user context stack (i.e. which topics we “understand” the user has been on about, and in what order, with the most recently used having top priority)
- custom semantic markup of concepts, including multi-noun concepts (e.g. “bank teller”); this is akin to NER but more sophisticated
- what I call “aware synonyms”: a machine-learning system that assigns “word expansions” based on the context of a conversation.
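The user context stack from the list above can be sketched as a most-recently-used stack; this is a minimal sketch, not the actual implementation:

```python
class ContextStack:
    """Most-recently-used stack of conversation topics: pushing a topic
    that is already present moves it back to the top."""

    def __init__(self):
        self.topics = []

    def push(self, topic):
        if topic in self.topics:
            self.topics.remove(topic)   # re-mention promotes an old topic
        self.topics.append(topic)

    def current(self):
        """The topic the user was most recently on about, if any."""
        return self.topics[-1] if self.topics else None

ctx = ContextStack()
for topic in ["billing", "accounts", "billing"]:
    ctx.push(topic)
print(ctx.topics)     # ['accounts', 'billing']
print(ctx.current())  # 'billing'
```

When an utterance is ambiguous, the matcher can consult the stack top-down, preferring interpretations that fit the most recent topics.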
Only once these requirements have been met can one effectively start matching meaning to intent. Once this is done, we go from intent to action. We will discuss this some more in part 3, as that is where the real magic starts happening.