Rebirth of chatbots
- VC Ramesh
- May 17, 2023
- 4 min read
A key consequence of ChatGPT's success is that it has ignited a rebirth of chatbots. Chatbots first went mainstream around 2016 and have since been widely deployed by enterprises big and small. However, the current generation of deployed chatbots is scripted (rule-based) and brittle. ChatGPT showed that a new generation of chatbots, based on large language models (LLMs), can finally fulfill the original promise of chatbots: a universal conversational interface to practically any software application.
Chatbots are often portrayed as just one business use-case of LLMs, which sells them short; I would argue that LLMs are, at their core, chatbots. People have also, rightly, been burned by the previous chatbot hype-cycle and are wary of the technical challenges of true multi-turn dialog. However, ChatGPT has demonstrated that LLM-based chatbots can provide a robust dialog interface to a multitude of tasks across domains. This was the original promise of chatbots, and the source of the initial wave of excitement in 2016. After all, conversation is the natural form of human-computer interaction. Thanks to LLMs in general, and ChatGPT in particular, that dream has finally come true.
For enterprise applications, LLMs need to be fine-tuned to specific domains (with niche vocabularies) and need to be able to interface with enterprise databases to query for information and perform back-end actions. Several recent technical developments have made this possible for LLMs:
Parameter-efficient fine-tuning (PEFT): Approaches such as Low-Rank Adaptation (LoRA) enable fine-tuning of LLMs on domain- and task-specific datasets with a small "add-on" adapter-based weight update. Instead of updating billions of parameters, as with traditional fine-tuning, PEFT-based methods update only a few million parameters (weights). An added advantage of these lightweight tuning approaches is that the adapters can be dynamically plugged in and out at inference / deployment time, while keeping the base LLM constant. The various PEFT approaches are complementary to each other; LoRA can be used in conjunction with other PEFT methods such as prompt tuning and prefix tuning. PEFT makes it cheap and fast to tailor LLMs to specific enterprises, domains, and tasks, while retaining the generality of the underlying foundation models.
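As a concrete illustration, here is a minimal LoRA setup sketched with the Hugging Face peft library; the base model name and the hyperparameter values are illustrative assumptions, not recommendations:

```python
# Minimal LoRA fine-tuning setup with the Hugging Face peft library.
# The base model and hyperparameters below are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

lora_config = LoraConfig(
    r=8,                # rank of the low-rank update matrices
    lora_alpha=16,      # scaling factor applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
# Only the small adapter matrices are trainable; the base weights stay frozen.
model.print_trainable_parameters()
```

With rank-8 adapters on the attention projections, only roughly 0.1% of the parameters are trainable; the resulting adapter checkpoint is a few megabytes and can be swapped in and out at run time without touching the base model.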
Quantization and mixed precision: Quantization converts model weights from floats to low-bit integer representations, for example 8-bit integers (and, recently, even 4-bit integers). Mixed-precision training refers to the use of 16-bit floats, instead of, or in combination with, the standard 32-bit floats as model parameters (weights). Lowering the precision from 32 bits to 16 bits generally works; lowering it further to 8 bits or below starts to affect accuracy. Quantization doesn't just cut the bit size; it rounds numbers off by, in effect, compressing their range. Quantization is noisy compression. For a practical code implementation that combines PEFT and quantization, check out this Google Colab notebook in Python.
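To make the "noisy compression" point concrete, here is a toy sketch in plain NumPy; production quantizers (e.g., bitsandbytes) are blockwise and considerably more sophisticated, so treat this as illustration only:

```python
import numpy as np

def quantize_int8(x):
    # Compress the float range [x.min(), x.max()] onto the 256 levels of int8.
    scale = (x.max() - x.min()) / 255.0
    zero_point = int(np.round(-x.min() / scale)) - 128
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    # Invert the mapping; the rounding error introduced above is not recoverable.
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(8).astype(np.float32)
q, scale, zp = quantize_int8(weights)
recovered = dequantize_int8(q, scale, zp)
print(recovered - weights)  # small but nonzero: quantization is noisy compression
```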
Plug-ins: ChatGPT recently introduced plug-ins as a means of allowing third-party enterprise APIs to connect with the LLM. While the following description is specific to the ChatGPT plug-in architecture, I believe the approach is general enough to apply to other LLMs. As of this writing, Google Bard does not appear to offer a comparable generic plug-in interface.
Plug-ins are particularly noteworthy and deserve more discussion. Let me start with excerpts from OpenAI's description: "Plugin developers expose one or more API endpoints, accompanied by a standardized manifest file and an OpenAPI specification. These define the plugin's functionality, allowing ChatGPT to consume the files and make calls to the developer-defined APIs. The AI model acts as an intelligent API caller. Given an API spec and a natural-language description of when to use the API, the model proactively calls the API to perform actions. For instance, if a user asks, 'Where should I stay in Paris for a couple nights?', the model may choose to call a hotel reservation plugin API, receive the API response, and generate a user-facing answer combining the API data and its natural language capabilities."
Third-party ChatGPT plug-in developers need to follow these four steps:
1. Create a manifest file and host it at yourdomain.com/.well-known/ai-plugin.json. The file includes metadata about your plugin (name, logo, etc.), details about the authentication required (type of auth, OAuth URLs, etc.), and an OpenAPI spec for the endpoints you want to expose (see the sketch after this list).
2. Register your plugin in the ChatGPT UI.
3. Users activate your plugin.
4. Users begin a conversation.
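To make the manifest step concrete, here is a minimal sketch of a plug-in backend for a hypothetical hotel-search service, built with FastAPI; the domain, plugin names, descriptions, and endpoint are placeholders, not a real service:

```python
# Sketch of a ChatGPT plug-in backend using FastAPI. The domain, plugin name,
# and endpoint below are hypothetical placeholders, not a real service.
from fastapi import FastAPI

app = FastAPI()

MANIFEST = {
    "schema_version": "v1",
    "name_for_human": "Hotel Finder",
    "name_for_model": "hotel_finder",
    "description_for_human": "Search for hotel rooms.",
    "description_for_model": "Use this when the user asks about hotel availability or booking.",
    "auth": {"type": "none"},
    "api": {"type": "openapi", "url": "https://yourdomain.com/openapi.json"},
    "logo_url": "https://yourdomain.com/logo.png",
    "contact_email": "support@yourdomain.com",
    "legal_info_url": "https://yourdomain.com/legal",
}

@app.get("/.well-known/ai-plugin.json")
def serve_manifest():
    # ChatGPT fetches this file to learn what the plug-in does and where its API lives.
    return MANIFEST

@app.get("/rooms")
def search_rooms(city: str, nights: int = 1):
    # Hypothetical search endpoint; ChatGPT would call it based on the OpenAPI spec.
    return {"city": city, "nights": nights, "results": []}
```

A convenient design consequence: FastAPI auto-generates an OpenAPI spec for the routes it serves and exposes it at /openapi.json, so the manifest's api.url can point straight at the framework's built-in spec.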
By standardizing on the OpenAPI spec, ChatGPT gives third-party service providers an easy way to quickly expose their API endpoints. A number of providers, such as Expedia and Slack, already offer ChatGPT plug-ins that let users access those services through the ChatGPT chatbot interface.
I am sure more technological progress is on the way. The emergence of open-source LLM alternatives to ChatGPT and Bard is particularly encouraging. Chatbot developers can now use these open-source LLMs, in conjunction with PEFT fine-tuning and quantization, to offer low-cost, robust conversational chatbots to enterprises of all sizes. These LLM-based chatbots are not script-based and hence far less brittle, and they generalize readily to multiple tasks and domains across enterprises. We are a step closer to the original chatbot vision: a robust conversational interface to all software applications.