
Your Large Language Model - it's as Dumb as a Rock

Mar 9, 2023


Unless you’ve been living under a rock lately you likely think we’re entering some sort of AI-pocalypse. The sky is falling and the bots have come calling. There are endless reports of ChatGPT acing college-level exams, becoming self-aware, and even trying to break up people’s marriages! The way OpenAI and their ChatGPT product have been depicted, it’s a miracle we haven’t all unplugged our devices and shattered our screens. It seems like a sensible way to stop the AI overlords from taking control of our lives.

But never fear! I am here to tell you that large language models (LLMs) and their various compatriots are as dumb as the rocks we all might be tempted to smash them with. Well, ok, they are smart in some ways. But don’t fret—these models are not conscious, sentient, or intelligent at all. Here’s why.

Some Like it Bot: What’s an LLM?

Large Language Models (LLMs) actually do something quite simple. They take a given sequence of words and predict the most likely next word. Do that recursively, feeding each prediction back in as input, and add a little noise each time you make a prediction so your results are non-deterministic, and voilà! You have yourself a “generative AI” product like ChatGPT.

But what if we restate a description of LLMs a little more succinctly:

LLMs estimate an unknown word based on extending a known sequence of words.

It may sound fancy—revolutionary, even—but the truth is it’s actually old school. Like, really, really old school—it’s almost the exact definition of extrapolation, a common mathematical technique that’s existed since the time of Archimedes! If you take a step back, Large Language Models are nothing more than a fancy extrapolation algorithm. Last I checked nobody thinks their standard polynomial extrapolation algorithm is conscious or intelligent. So why exactly do so many believe LLMs are?
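To make the predict, sample, and append loop concrete, here is a minimal sketch. It is a toy bigram counter, not how ChatGPT is actually built: the corpus, the function names, and the `temperature` knob (standing in for the "extra noise") are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word, temperature=1.0, rng=random):
    """Sample a next word; a higher temperature flattens the odds (more noise)."""
    counts = follows.get(word)
    if not counts:
        return None  # nothing ever followed this word in training
    candidates = list(counts)
    # Raising counts to 1/temperature sharpens (T < 1) or flattens (T > 1) the odds.
    weights = [counts[w] ** (1.0 / temperature) for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def generate(follows, prompt, max_new_words=10, temperature=1.0, rng=random):
    """Repeatedly predict the next word and append it to the sequence."""
    words = prompt.split()  # assumes a non-empty prompt
    for _ in range(max_new_words):
        nxt = predict_next(follows, words[-1], temperature, rng)
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

# Made-up training text, purely for illustration.
corpus = "the rock is dumb and the model is dumb and the rock is a rock"
model = train_bigram(corpus)
print(generate(model, "the model", max_new_words=8, rng=random.Random(0)))
```

Every word it emits is one that followed the previous word somewhere in the training text. A real LLM conditions on far more than one previous word, but the loop has the same shape.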

Hear Ye, Hear Ye: What’s in an Audio Sample

Sometimes it’s easier to explain a complex topic by comparison. Let’s take a look at one of the most common human languages in existence: music. Picture a few hundred audio samples from Bob Dylan’s “Like a Rolling Stone.”

If I were to feed those samples into an algorithm and recursively extrapolate out for a few thousand more samples, I’d have generated some additional audio content. But there is a lot more information encoded in that generated audio than a few thousand raw sample values.

At the lowest level:

Pitch
Intensity
Timbre

At a higher level:

Melody
Harmony
Rhythm

And at an even higher level:

Genre
Tempo

So by simply extrapolating samples of audio, we generated all sorts of complex higher-level features of auditory or musical information. But pump the brakes! Did I just create AI Mozart? I don’t think so. It’s more like AI Muzak.
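Here is a rough sketch of that kind of recursive sample extrapolation. A pure 440 Hz tone stands in for the recording (a big simplification of an actual song), and the two-tap linear predictor, sample rate, and function names are assumptions chosen to keep the example short.

```python
import math

def fit_two_tap_predictor(samples):
    """Least-squares fit of x[n] ~ a*x[n-1] + b*x[n-2] (a tiny linear predictor)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for n in range(2, len(samples)):
        x1, x2, y = samples[n - 1], samples[n - 2], samples[n]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y
        r2 += x2 * y
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (r2 * s11 - r1 * s12) / det
    return a, b

def extrapolate(samples, a, b, count):
    """Recursively predict `count` new samples, feeding each prediction back in."""
    out = list(samples)
    for _ in range(count):
        out.append(a * out[-1] + b * out[-2])
    return out[len(samples):]

SAMPLE_RATE = 8000  # assumed sample rate in Hz
PITCH_HZ = 440.0    # assumed pitch of the toy "recording"
known = [math.sin(2 * math.pi * PITCH_HZ * n / SAMPLE_RATE) for n in range(200)]

a, b = fit_two_tap_predictor(known)
generated = extrapolate(known, a, b, 2000)

# For a pure tone, a = 2*cos(2*pi*f/fs), so the pitch can be read back from the fit.
recovered_hz = math.acos(a / 2) * SAMPLE_RATE / (2 * math.pi)
print(round(recovered_hz, 1))
```

Nobody told the predictor what pitch is, yet the recovered frequency should land very close to the assumed 440 Hz. The higher-level feature falls out of the extrapolation for free, with no intelligence required.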

An AI of Many Words: What’s Next?

It turns out that predicting the next word in a sequence of words will also generate more than just a few lines of text. There’s a lot of information encoded in those lines, including the structure of how humans speak and write, as well as general information and knowledge we’ve previously logged. Here’s just a small sample of things encoded in a sequence of words:

Vocabulary
Grammar/Part of Speech (PoS) tagging
Coreference resolution (pronoun dereferencing)
Named entity detection
Text categorization
Question answering
Abstractive summarization
Knowledge base

All of the information above can, in theory, be extracted by simply predicting the next word, much in the same way predicting the next musical sample gives us melody, harmony, rhythm, and more. And just like our music extrapolation algorithm didn’t produce the next Mozart, ChatGPT isn’t going to create the next Shakespeare (or the next horror movie villain, for that matter).

LLMs: Lacking Little Minds?

Large Language Models aren’t harbingers of digital doom, but that doesn’t mean they lack value. As an early adopter of this technology, I know it has its place. It’s integral to the work we do at Xembly, where I’m the co-founder and CTO. However, once you understand that LLMs are just glorified extrapolation algorithms, you gain a better understanding of the limitations of the technology and how best to use it.

Five Alive: How to Use LLMs So They Don’t Take Over the World

LLMs have huge potential. Just like any other tool, though, in order to extrapolate the most value, you have to use them properly. Here are five areas to consider as you incorporate LLMs into your life and work.

Information must be encoded in text
Extrapolation error with distance
Must be prompted
Limited short-term memory
Fixed in time with no long-term memory

Let’s dig a little deeper.

Information Must Be Encoded in Text

Yann LeCun probably said it best:

LLMs do *not* capture much of human thought, because most of human thought and all of animal thought is entirely non verbal.

The factual, logical, and physical reasoning mistakes that current LLMs make clearly show that they have *not* captured much of human thought. https://t.co/mc0kXJWcBg

— Yann LeCun (@ylecun) January 4, 2023

Humans are multi-modal input devices and many of the things we observe and are aware of that drive our behavior aren’t verbal (and hence not encoded in text). An example we contend with at Xembly is the prediction of action items from a meeting. It turns out that the statement “I’ll update the row in the spreadsheet” may or may not be a future commitment to do work. Language is nuanced, influenced by other real-time inputs like body language and hundreds of other human expressions. It’s entirely possible in this example that the task was completed in real-time during the meeting, and the spoken words weren’t an indication of future work at all.

Extrapolation Error with Distance

Like all extrapolation algorithms, the further you get from your source signal (or prompt, in the case of LLMs), the more likely you are to experience errors. Sometimes a single bad prediction, such as one that negates an otherwise affirmative statement or assigns the wrong gendered pronoun, can cause downstream errors in future predictions. These tiny errors can often lead to convincing responses that are factually inaccurate. In some cases, you may find LLMs return highly confident answers that are completely incorrect. These types of errors are referred to as hallucinations.

But both the small compounding slips and the hallucinations are really just forms of extrapolation error. The errors will be more pronounced when you make long predictions. This is especially true for content largely unseen by the underlying language model (for example, when trying to do long-form summarization of novel content).
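A back-of-the-envelope sketch shows why length hurts. It assumes each predicted word is wrong independently with the same small probability, which is a deliberate oversimplification, and the 99% accuracy figure is made up for illustration rather than measured.

```python
def chance_of_error_free_output(per_word_accuracy, length):
    """Probability that `length` consecutive predictions are all correct,
    assuming (simplistically) that each prediction errs independently."""
    return per_word_accuracy ** length

# 0.99 is an illustrative assumption, not a measured accuracy.
for length in (10, 100, 1000):
    print(length, round(chance_of_error_free_output(0.99, length), 4))
```

Under these assumptions, a 10-word answer comes out clean about 90% of the time, while a 100-word answer does so only about 37% of the time. Real errors are not independent, but the direction of the effect is the same: the longer the extrapolation, the more chances to drift.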

Must Be Prompted

Simply put, if you don’t provide input text an LLM will do nothing. So if you are expecting ChatGPT to act as a sage and give you unsolicited advice, you’ll be waiting a long time. Many of the features Xembly customers rave about are based on our product providing unsolicited guidance. Large Language Models are no help to us here.

Limited Short-Term Memory

LLMs generally only operate on a limited window of text. In the case of ChatGPT, that window is roughly 3000 words. What that means is that new information not already incorporated in the initial LLM training data can very quickly fall out of memory. This is especially problematic for long conversations where new corporate lingo may be introduced at the start of a conversation and never mentioned again. Once whatever buzzword is used falls out of the context window it will no longer contribute to any future prediction, which can be problematic when trying to summarize a conversation.
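The effect is easy to picture with a sliding window. This sketch counts whole words for simplicity (real models count tokens), and the buzzword, the filler text, and the function name are invented for the example.

```python
def visible_context(conversation_words, window_size):
    """Return only the most recent `window_size` words; older words are dropped."""
    if window_size <= 0:
        return []
    return conversation_words[-window_size:]

WINDOW_SIZE = 3000  # the rough ChatGPT figure quoted above, in words

# A made-up buzzword is defined once, then followed by a long stretch of other talk.
conversation = "synergize means to update the spreadsheet".split()
conversation += ["filler"] * 3500

context = visible_context(conversation, WINDOW_SIZE)
print("synergize" in conversation)  # True: the term is in the full transcript
print("synergize" in context)       # False: it is no longer in what the model sees
```

Anything outside `context` simply does not exist as far as the next prediction is concerned.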

Fixed in Time with no Long-term Memory

Every conversation you have with ChatGPT only exists for that session. Once you close that browser or exit your current conversation, there is no memory of what was said. That means you cannot depend on new words being understood in future conversations unless you reintroduce them within a new context window. So, if you introduce an LLM to a word it hasn’t seen before in a given session, you may find it uses that word correctly in subsequent responses. But if you enter a new session and have any hopes that the word will be used without introducing it in a new prompt, brace yourself: you will be disappointed.
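In code terms, each session behaves like a fresh object with an empty history. The `ChatSession` class below is a hypothetical stand-in written for this post, not any real chat API, and "flurb" is a made-up word.

```python
class ChatSession:
    """Hypothetical stand-in for one chat session; it only knows its own history."""

    def __init__(self):
        self.history = []  # starts empty for every new session

    def send(self, message):
        """Record a message in this session only."""
        self.history.append(message)

    def has_seen(self, term):
        """True if the term appeared anywhere in this session's messages."""
        return any(term in message.split() for message in self.history)

first = ChatSession()
first.send("In our team, a flurb is a meeting with no agenda")
print(first.has_seen("flurb"))   # True: defined earlier in this session

second = ChatSession()           # a brand-new session shares nothing with the first
print(second.has_seen("flurb"))  # False: the definition did not carry over

second.send("Reminder: a flurb is a meeting with no agenda")
print(second.has_seen("flurb"))  # True again, but only because it was reintroduced
```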

To Use an LLM or Not to Use an LLM

It’s a big question. LLMs are exceedingly powerful, and you should strongly consider using them as part of your NLP stack. I’ve found the greatest value of many of these LLMs is that they can potentially replace all the bespoke language models folks have been building for some time. You may not need those custom entity models, intent models, abstractive summarization models, etc. It’s quite possible that LLMs can accomplish all of these things at similar or better accuracy, while greatly reducing time to market for products that rely on this type of technology.
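One way to picture that consolidation: each bespoke model becomes a different piece of text handed to the same next-word predictor. The templates and the `build_prompt` helper below are hypothetical, the wording is illustrative rather than a tested recipe, and no real LLM API is called.

```python
# Hypothetical prompt templates; the wording is illustrative, not a tested recipe.
TASK_PROMPTS = {
    "entities": "List the people, companies, and dates mentioned in this text:\n{text}\nEntities:",
    "intent": "What is the speaker trying to accomplish in this text?\n{text}\nIntent:",
    "summary": "Summarize this text in one sentence:\n{text}\nSummary:",
}

def build_prompt(task, text):
    """Turn an NLP task into plain text for a next-word predictor to continue."""
    return TASK_PROMPTS[task].format(text=text)

utterance = "I'll update the row in the spreadsheet before Friday's meeting."
for task in TASK_PROMPTS:
    print(build_prompt(task, utterance))
    print("---")
```

Whatever text the model predicts after "Entities:", "Intent:", or "Summary:" is the answer, so one extrapolation engine covers three jobs that used to need three models.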

There are many items in the LLM plus column, but if you are hoping to have a thought-provoking intelligent conversation with ChatGPT, I suggest you walk outside and consult your nearest rock. You just might have a more engaging conversation!

This article first appeared on the Speech Wrecko blog at www.speechwrecko.com.