What is predictive search or auto complete search? – Part 1: The Must Have Features


If I ask what did the most to popularize this cool concept of predictive searching, I am sure most of us will credit the simple (? 😄) yet powerful Google Search implementation.

Predictive Search or Auto complete search Example

In the digital age, customers now expect every business to offer a simple yet powerful interface for searching its products, services, catalogues, documentation and so on. Gone are the days of multi-field search forms; no user would enjoy them now, particularly for flagship search journeys. And of course, if you have been around long enough to have implemented such searches without modern full-text search technologies, you will remember the pain of conditional if-else implementations and the sour taste of less-than-perfect results.

Very recently I had to implement a single-field predictive search for a project. It took some serious effort: reading through multiple documents and forums, then trying, testing and optimizing many approaches. I am writing this blog to share what I learned, in the hope that it eases your journey towards implementing predictive search in your own solutions.

In this multi-part article, I will first describe the most common features needed in predictive search implementations. As they say, it is always best to start by defining the problem. Then I will talk about the challenges I faced and the approaches I took to solve them.

I may show the UI to demonstrate the results and correctness of the approaches, but this article is not about the UI part of predictive search; it focuses on the server side / search side of the implementation. The UI part is typically not as complex: all you need is an open-source JavaScript autocomplete library to render the autocomplete control, which typically calls the server-side search REST API on every key press / key up event, passing the partial input text. The REST API in turn calls an indexing/search engine to complete the predictive search using the partial text, and returns the results as JSON to the browser, where the autocomplete library shows them as the innerHTML of a DIV tag.
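To make that request flow concrete, here is a minimal sketch of the server-side piece in Python. It is an illustration only, not the implementation from this series: the query parameter name `q`, the `handle_autocomplete` function and the in-memory `fake_search` stand-in for the search engine are all assumptions made for the example. The event shape follows the API Gateway proxy format, where query parameters arrive under `queryStringParameters`.

```python
import json
from typing import Callable, Dict, List

# Hypothetical stand-in for the real search engine call.
def fake_search(partial_text: str) -> List[Dict[str, str]]:
    securities = [{"name": "Infosys"}, {"name": "Larsen & Toubro"}]
    needle = partial_text.lower()
    return [s for s in securities if needle in s["name"].lower()]

def handle_autocomplete(
    event: dict,
    search: Callable[[str], List[Dict[str, str]]] = fake_search,
) -> dict:
    """Read the partial text from the request and return suggestions as JSON."""
    # "q" is an assumed parameter name; the value may be missing or None.
    params = event.get("queryStringParameters") or {}
    partial_text = (params.get("q") or "").strip()
    # Skip the search engine entirely for empty input.
    suggestions = search(partial_text) if partial_text else []
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"suggestions": suggestions}),
    }
```

Passing the search function in as a parameter keeps the handler testable without a running search cluster.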

My project scenario was a securities search, where a trader or investor is offered this predictive search capability through a UI control and also through voice-enabled search. The user can search for stocks from a single text field using any of the security attributes, such as stock symbol, ISIN number, stock name or security code. Or they could simply speak any of these into a voice-enabled interface. Typically, though, names and symbols are more popular than codes and numbers.

Having set the context for this multi-part blog, let me formally start the first part by answering a question –

What is Predictive Search?

I will answer this question by explaining the features that I consider “must-haves” in any predictive search implementation. Here you go –

Multi field searching –

The implementation should let a single search term be matched against multiple data fields in one go. From the single autocomplete UI field, or a single input attribute in the search API, the user should be able to search by security name, security symbol, ISIN number or security code. And obviously the caller should not have to say what type of search term was passed; the search implementation should intelligently match across all the relevant fields.
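As a rough illustration of the requirement (not of how a search engine does it internally), the sketch below checks one term against every searchable field of a few in-memory records. The record layout, the `search_all_fields` name and all identifier values other than the names and the ‘Infy’ symbol are made-up placeholders.

```python
from typing import Dict, List

# Sample records. The ISINs, codes and the second symbol are placeholders,
# not real identifiers.
SECURITIES: List[Dict[str, str]] = [
    {"name": "Infosys", "symbol": "Infy",
     "isin": "XX0000000001", "code": "000001"},
    {"name": "Larsen & Toubro", "symbol": "SYM2",
     "isin": "XX0000000002", "code": "000002"},
]

SEARCH_FIELDS = ("name", "symbol", "isin", "code")

def search_all_fields(term: str) -> List[Dict[str, str]]:
    """Return records where any searchable field starts with the term."""
    needle = term.strip().lower()
    if not needle:
        return []
    return [
        record for record in SECURITIES
        if any(record[field].lower().startswith(needle)
               for field in SEARCH_FIELDS)
    ]
```

The caller never says which field the term belongs to; a name fragment, a symbol or a code all go through the same single input.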

Starts with or Prefix style auto completions –

Given the first few characters of a data field, the search should match the record containing that field. This should hold even when the search term contains multiple words, or when the data in the field contains multiple words. So a search for “Lars”, “Tou” or “Lars Tou” should fetch the record for the security named “Larsen & Toubro”.
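One way to pin down this behaviour is a word-level prefix check: every word in the search term must be a prefix of some word in the field. The sketch below is a plain-Python illustration of that rule, with hypothetical helper names `tokenize` and `prefix_match`; a real search engine would typically achieve the same effect at index time.

```python
import re
from typing import List

def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def prefix_match(query: str, field_value: str) -> bool:
    """True if every query word is a prefix of some word in the field."""
    words = tokenize(field_value)
    query_words = tokenize(query)
    return bool(query_words) and all(
        any(word.startswith(q) for word in words) for q in query_words
    )

# All three search terms from the example find the same record.
for term in ("Lars", "Tou", "Lars Tou"):
    print(term, prefix_match(term, "Larsen & Toubro"))
```

A mid-word fragment such as “arsen” does not match under this rule, which is what separates prefix style from contains style.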

Contains style auto completions –

Given a continuous run of characters, not necessarily the starting characters of a word in a field, the match should still work and find the record. So a search for “info” should find “Infosys” and “Infobeans” as well as “Uniinfo”, though the first two should score higher than the third.
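The scoring expectation can be sketched with a two-level score: a name that starts with the term scores higher than one that merely contains it. The function names and score values here are assumptions for illustration; real engines compute relevance quite differently.

```python
from typing import List

def contains_score(term: str, name: str) -> int:
    """2 if the name starts with the term, 1 if it contains it, else 0."""
    t, n = term.lower(), name.lower()
    if n.startswith(t):
        return 2
    if t in n:
        return 1
    return 0

def contains_search(term: str, names: List[str]) -> List[str]:
    """Return matching names, highest score first."""
    scored = [(contains_score(term, name), name) for name in names]
    hits = [(score, name) for score, name in scored if score > 0]
    # sort() is stable, so names with equal scores keep their input order.
    hits.sort(key=lambda hit: -hit[0])
    return [name for _, name in hits]

print(contains_search("info", ["Uniinfo", "Infosys", "Cipla", "Infobeans"]))
# prints ['Infosys', 'Infobeans', 'Uniinfo']
```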

Auto correct –

Users have come to expect that systems correct their typos automatically, so a search for “Larson” should still return “Larsen” in the results. Likewise, voice-enabled search interfaces are becoming commonplace but are not perfect: a spoken search for “CIPLA” may actually land as a search for “SIPLA”, or one for “BHARTI” may land as “BHARATI”, and all of these should of course succeed 😄. That is why an auto-correction-capable search is such a necessity.
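All three examples above differ from the intended word by a single character edit, which is why typo tolerance is commonly expressed as a maximum edit (Levenshtein) distance. Here is a small self-contained sketch; the `fuzzy_match` helper and its default of one allowed edit are assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one table row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

def fuzzy_match(term: str, candidate: str, max_edits: int = 1) -> bool:
    """True if the term is within max_edits edits of the candidate."""
    return edit_distance(term.lower(), candidate.lower()) <= max_edits

print(fuzzy_match("Larson", "Larsen"))   # one substitution
print(fuzzy_match("SIPLA", "CIPLA"))     # one substitution
print(fuzzy_match("BHARATI", "BHARTI"))  # one extra character
```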

Ordering should matter, and yet not –

Let me explain 😄 with an example. A search for “inf eng” should find both “Infrastructure engineering” and “Engineering Infrastructure”, but the former should appear first in the results. Likewise, a search for “eng inf” should find both records, with the order reversed.
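In other words, word order should not decide whether a record matches, only where it ranks. A minimal sketch of that rule follows; the helper names are hypothetical, and it simply ranks records whose matched words appear in the same order as the search term ahead of the rest.

```python
import re
from typing import List, Optional

def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_positions(query_words: List[str],
                    words: List[str]) -> Optional[List[int]]:
    """Index of the first field word each query word prefixes, or None
    if any query word matches nothing."""
    positions = []
    for q in query_words:
        index = next((i for i, w in enumerate(words) if w.startswith(q)), None)
        if index is None:
            return None
        positions.append(index)
    return positions

def ordered_search(query: str, names: List[str]) -> List[str]:
    """Match regardless of word order, but rank same-order matches first."""
    query_words = tokenize(query)
    if not query_words:
        return []
    hits = []
    for name in names:
        positions = match_positions(query_words, tokenize(name))
        if positions is None:
            continue
        in_order = positions == sorted(positions)
        hits.append((0 if in_order else 1, name))
    hits.sort(key=lambda hit: hit[0])  # stable: ties keep input order
    return [name for _, name in hits]
```

With the two names from the example, `ordered_search("inf eng", ...)` lists “Infrastructure engineering” first, and `ordered_search("eng inf", ...)` lists “Engineering Infrastructure” first.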

Different search scheme for each field –

  • For text fields like the security name, which users remember easily, it is useful to have “contains” style auto completions.
  • For fields like the security symbol, which are business identifiers made up of character strings (e.g. ‘Infy’, ‘Bhartiartl’), it is useful to have only “starts with” or “prefix” style auto completions.
  • For something like security codes, which are hard-to-remember numeric / alphanumeric data (e.g. ‘INE129Z01016’, ‘526921’), it is useful to have only exact match searches.
  • Likewise, the autocorrect feature makes sense for the security name field, perhaps also for security symbols, but not for security code fields.
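The per-field rules above amount to a small lookup table from field to matching scheme. The sketch below shows that idea in plain Python; the field names, the `FIELD_SCHEMES` table and the sample code value are assumptions for illustration, and autocorrect is left out to keep it short.

```python
from typing import Callable, Dict, List

def contains(term: str, value: str) -> bool:
    return term in value

def starts_with(term: str, value: str) -> bool:
    return value.startswith(term)

def exact(term: str, value: str) -> bool:
    return term == value

# One matching scheme per field, following the list above.
FIELD_SCHEMES: Dict[str, Callable[[str, str], bool]] = {
    "name": contains,       # easy to remember, so match anywhere
    "symbol": starts_with,  # business identifier, so prefix only
    "code": exact,          # hard to remember, so exact match only
}

def matching_fields(term: str, record: Dict[str, str]) -> List[str]:
    """Names of the fields that match the term under their own scheme."""
    needle = term.lower()
    return [
        field for field, scheme in FIELD_SCHEMES.items()
        if scheme(needle, record[field].lower())
    ]

# The code value is a made-up placeholder.
record = {"name": "Infosys", "symbol": "Infy", "code": "000001"}
print(matching_fields("nfo", record))   # prints ['name']
print(matching_fields("0000", record))  # prints []
```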

Single search query –

All of the above features should be possible through a single search query, without multiple round trips to the search engine: one query, even though there are multiple data fields of various types with different search schemes. Importantly, the search algorithm should weigh the above considerations together and return results sorted by relevance.
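To give a feel for what “one query, several schemes” can look like, here is a hedged sketch of a request body in the bool / should style of the OpenSearch query DSL, built as a plain Python dictionary. It is not the query developed later in this series. The field names `code`, `symbol` and `name`, the boost values and the assumption that `code` and `symbol` are indexed as exact-value fields are all made up for illustration, and contains-style matching is not covered here.

```python
from typing import Any, Dict

def build_query(term: str, size: int = 10) -> Dict[str, Any]:
    """Combine an exact, a prefix and a fuzzy clause in one request body."""
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    # Exact match on the code field, weighted highest.
                    {"term": {"code": {"value": term, "boost": 10}}},
                    # Prefix match on the symbol field.
                    {"prefix": {"symbol": {"value": term, "boost": 5}}},
                    # Typo-tolerant match on the name field.
                    {"match": {"name": {"query": term,
                                        "fuzziness": "AUTO"}}},
                ],
                # A record qualifies if at least one clause matches.
                "minimum_should_match": 1,
            }
        },
    }
```

Because every clause sits in the same `should` list, the engine scores each record once and returns a single relevance-sorted result set in one round trip.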

Millisecond or sub-second performance –

Autocompletes and suggestions make sense only when they return intelligent results with sub-second response times. Otherwise, your user will have finished typing the whole field or, worse, browsed away to your competition’s site while your search API is still crunching on servers and networks 😄.

Effective search results Precedence or Relevance Scoring –

The precedence order in the results should be sensible: exact matches first, followed by records/fields starting with the search term, followed by contains-style matches, and lastly auto-corrections. Well, generally so … but you get the point…
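That precedence can be written down as numbered tiers, with results sorted by tier. The sketch below does so in plain Python under simplifying assumptions: it compares the term against the whole name, treats “auto-correction” as an edit distance of at most one, and uses a few made-up sample names (“Info”, “Indo”) next to the ones from this article.

```python
from typing import List, Optional

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one table row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]

def match_tier(term: str, name: str) -> Optional[int]:
    """0 exact, 1 starts with, 2 contains, 3 auto-corrected, None no match."""
    t, n = term.lower(), name.lower()
    if n == t:
        return 0
    if n.startswith(t):
        return 1
    if t in n:
        return 2
    if edit_distance(t, n) <= 1:
        return 3
    return None

def rank(term: str, names: List[str]) -> List[str]:
    """Return matching names ordered by tier, best tier first."""
    tiered = [(match_tier(term, name), name) for name in names]
    hits = [(tier, name) for tier, name in tiered if tier is not None]
    return [name for _, name in sorted(hits, key=lambda hit: hit[0])]

# "Info" and "Indo" are made-up names used only to populate each tier.
print(rank("info", ["Uniinfo", "Indo", "Infosys", "Info", "Cipla"]))
# prints ['Info', 'Infosys', 'Uniinfo', 'Indo']
```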

Data preparation or pre-processing

Users obviously cannot remember special characters such as the brackets and hyphens in security names like these: DB (International) Stock Brokers Limited, D-Link (India) Limited, Hi-Tech Pipes Limited. So it is appropriate to process them out before they affect the actual search. It is better to do this on the API side, or better still in the search engine itself if it allows such pre-processing. Otherwise, callers of the API may skip the pre-processing or do it incorrectly, reducing the effectiveness of the search. Likewise, depending on the search domain, you may want to remove characters like ‘.’ and ‘&’ from the search.

Commonplace terms like ‘Limited’ and ‘Ltd.’ add no differentiating value when searching security names, so the search engine should allow such terms to be ignored.
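As an illustration of this kind of pre-processing, the sketch below normalizes a security name into searchable words. The exact rules are assumptions chosen for the example: hyphens and dots are deleted so that “D-Link” becomes “dlink”, every other special character becomes a space, and the stop-word list contains only the two terms mentioned above. The same function would need to be applied to both the indexed data and the incoming search term.

```python
import re
from typing import List

# Commonplace terms that do not help tell securities apart.
STOP_WORDS = {"limited", "ltd"}

def normalize(text: str) -> List[str]:
    """Lower-case, strip special characters and drop commonplace terms."""
    lowered = text.lower()
    # Delete hyphens and dots: "d-link" -> "dlink", "ltd." -> "ltd".
    joined = re.sub(r"[-.]", "", lowered)
    # Turn brackets, '&' and any other special character into spaces.
    spaced = re.sub(r"[^a-z0-9]+", " ", joined)
    return [word for word in spaced.split() if word not in STOP_WORDS]

for name in ("DB (International) Stock Brokers Limited",
             "D-Link (India) Limited",
             "Hi-Tech Pipes Limited"):
    print(normalize(name))
```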

Implementation Stack

In my project I used Amazon OpenSearch Service as the core search engine. This is a managed service from AWS based on the open-source OpenSearch project (which is derived from Elasticsearch and Kibana). The AWS Free Tier covers OpenSearch Service, which is a great plus. My server-side search implementation consists of Python-based AWS Lambda functions that use the OpenSearch Python client library to connect to OpenSearch Service.

What’s coming in the later parts of this blog?

In the later parts of this article I shall incrementally build out the implementation, covering the must-have requirements listed in this part. I will share real code snippets, talk about the challenges and how they can be solved, what the options were and why I chose mine. Believe me, I went through a lot of trial and error, comparing and discarding approaches. So while there may be other ways to solve some, if not all, of the above requirements, I believe the approach here will save you significant time and present a holistic solution covering all of them.

So stay tuned, and at this stage please leave your comments and enquiries, particularly on the requirements listed above.

Other Parts in this article series –

Part 2 – How to implement autocomplete search API – Getting Basics out of the way!

Part 3 – Prefix, Contains and Fuzzy match – Feature by feature Autocomplete implementation
