2020-08-02 19:56

Four LLM trends since ChatGPT and their implications for AI builders

In October 2022, I published an article on LLM selection for specific NLP use cases, such as conversation, translation and summarisation. Since then, AI has taken a huge step forward. In this article, we will review some of the trends of the past months and their implications for AI builders. Specifically, we will cover task selection for autoregressive models, the evolving trade-offs between commercial and open-source LLMs, LLM integration, and the mitigation of failures in production.

1. Generative AI pushes autoregressive models, while autoencoding models are waiting for their moment.

For many AI companies, ChatGPT seems to have turned into the ultimate competitor. When I pitched my analytics startups in earlier days, I was frequently challenged: “What will you do if Google (Facebook, Alibaba, Yandex…) comes around the corner and does the same?” Now, the question du jour is: “Why can’t you use ChatGPT to do this?”

The short answer is: ChatGPT is great for many things, but it is far from covering the full spectrum of AI. The current hype is specifically about generative AI, not analytical AI or the rather new branch of synthetic AI [1]. What does this mean for LLMs? As described in my previous article, LLMs can be pre-trained with three objectives: autoregression, autoencoding and sequence-to-sequence (cf. also Table 1, column “Pre-training objective”). Typically, a model is pre-trained with one of these objectives, but there are exceptions. For example, UniLM [2] was pre-trained on all three objectives.

The fun generative tasks that have popularised AI in the past months are conversation, question answering and content generation, that is, tasks where the model learns to “generate” the next token, sentence and so on. These are best carried out by autoregressive models, which include the GPT family as well as most of the recent open-source models, like MPT-7B, OPT and Pythia.

Autoencoding models, which are better suited for information extraction, distillation and other analytical tasks, are resting in the background. But let’s not forget that the initial LLM breakthrough in 2018 happened with BERT, an autoencoding model. While this might feel like the Stone Age of modern AI, autoencoding models remain especially relevant for the many B2B use cases where the focus is on distilling concise insights that address specific business tasks. We might indeed witness another wave around autoencoding and a new generation of LLMs that excel at extracting