How to build an LLM – a primer

An LLM (Large Language Model) is an AI system that uses neural networks and deep learning to predict the next word in a sequence and thereby generate original content. Traditional AI, often called Machine Learning (ML), enabled the learning of patterns, classification, and prediction. Generative AI goes further: it can create new, original content such as text or computer code. Multimodal language models can handle images, audio, and video as well as text. Summarised here, at a high level, is how an LLM such as ChatGPT, Bard, Claude, or Gemini works, and how it is trained.

Since its introduction in 2017, virtually all LLMs have been built on the transformer architecture. The “T” in “ChatGPT” stands for “Transformer”; more specifically, ChatGPT is short for “Chat Generative Pre-trained Transformer”. The original transformer uses an encoder and a decoder (models in the GPT family use only the decoder half), and it represents words as numbers because computers work better with numbers.

When text is input into the system, the LLM represents the words as sequences of numbers called input embeddings. These embeddings live in a vector space, a mathematical space where words with similar meanings are mapped close to each other. The order of words in a sentence is also represented numerically, via positional encoding. Knowing the order of words allows the system to generate output text that is grammatically correct and syntactically meaningful.
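
The two ideas above can be sketched in a few lines. This is a toy illustration, not a real model: the vocabulary, embedding size, and random embedding values are all assumed for the example, and the sinusoidal position formula is the one from the original transformer paper (real models may learn positions instead).

```python
import numpy as np

# Toy vocabulary; real models use tens of thousands of subword tokens.
vocab = {"i": 0, "deposited": 1, "my": 2, "money": 3, "in": 4, "the": 5, "bank": 6}

d_model = 8  # embedding dimension (illustrative; production models use thousands)
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))  # learned during training

def sinusoidal_positions(seq_len, d_model):
    """Fixed sinusoidal positional encoding: a unique pattern per position."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

tokens = ["i", "deposited", "my", "money", "in", "the", "bank"]
ids = [vocab[t] for t in tokens]
# One vector per token: word meaning plus position information, added together.
x = embedding_table[ids] + sinusoidal_positions(len(ids), d_model)
print(x.shape)  # (7, 8)
```

The key point is the addition on the last line: the same word gets a slightly different vector depending on where it sits in the sentence.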

The LLM processes the input text (the prompt) with an encoder, which generates a series of hidden states capturing the meaning and context of the input. Output is generated by a decoder, which learns to predict the next word by examining the words before it; this is made possible by training the system on large amounts of text. Output embeddings convert the numbers used by the system back into text, and positional encoding ensures an appropriate order of words in a sentence. A “self-attention” layer, the mechanism transformer models are known for, then allows the model to home in on the most relevant words, further improving contextual awareness. Take the sentence “I deposited my money in the bank”: the word “money” allows the model to work out that the sentence refers to a financial bank, not a river bank.
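
The self-attention step can be made concrete with a minimal sketch. This is a single attention head with random weights, no masking and no batching; the sequence length and dimensions are assumed purely for illustration.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each token attends to each other token
    # Softmax over each row turns scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each token's output is a weighted mix of all tokens

rng = np.random.default_rng(1)
seq_len, d = 7, 8  # e.g. the 7 tokens of the "bank" sentence above
x = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (7, 8)
```

In a trained model, the weight matrices are learned so that, for the bank sentence, the row for “bank” puts high attention weight on “money”.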

Because the LLM works by manipulating numbers, with no inherent understanding of words in the way humans understand them, you can never be 100% sure whether the output is real or what is called a “hallucination”.

Two stages are involved when building an LLM. The first stage is pretraining, where billions (and sometimes trillions) of words are fed into the model so that it can learn what different words mean and how closely they are related. Pretraining uses a large amount of data from the internet, typically measured in petabytes (a petabyte is a million gigabytes). This stage is about knowledge and produces the base model; its purpose is to teach the system to generate text by predicting the next word in a sequence.
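
The “predict the next word” objective can be illustrated without any neural network at all. The sketch below uses simple bigram counts on a made-up corpus (real pretraining fits billions of neural-network parameters over subword tokens, not word counts), but the task is the same: given a word, predict what tends to follow it.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; real pretraining uses petabytes of internet text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count, for every word, which words were seen to follow it.
following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation observed during 'training'."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" ("the cat" seen twice, "the mat"/"the fish" once)
```

Generating text is then just repeating this step: predict a word, append it, and predict again from the extended sequence.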

Graphics Processing Units (GPUs), originally developed for high-speed graphics rendering (such as that required in computer gaming), have proved particularly useful in first-stage training of LLMs because of their massively parallel processing capability. This has led to a shortage of the hardware and a corresponding rapid rise in the stock price of GPU supplier Nvidia, propelling the company to a $2 trillion market capitalization and making it the third-largest corporation in the world.

Language models use layers of nodes to generate their predictions. Nodes are like gears in a machine: individually they lack purpose, but when trained to work together they can interpret complex data such as language. Initially, the connections between nodes are set randomly, so the model’s predictions are also random. But as the model is trained, the nodes learn to produce useful output by adjusting the weights and biases that connect them. These weights and biases are the model’s “parameters”, and their count is how a model’s size is usually quoted.
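
Counting parameters is straightforward once you see where they live. The sketch below builds a tiny two-layer network (layer sizes are illustrative) and counts its weights and biases; the same arithmetic, scaled up, gives the billions of parameters quoted for LLMs.

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [(16, 32), (32, 8)]  # (inputs, outputs) per layer; sizes chosen for illustration

params = 0
for n_in, n_out in layers:
    W = rng.normal(size=(n_in, n_out))  # weights: one per connection between nodes
    b = np.zeros(n_out)                 # biases: one per output node
    params += W.size + b.size

print(params)  # 16*32 + 32 + 32*8 + 8 = 808
```

A model advertised as “7 billion parameters” is doing exactly this count over hundreds of much larger layers.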

The second stage of building an LLM is fine-tuning. This further trains the pretrained model to perform a particular kind of task, such as answering questions or generating computer code. Human input is used during this stage: humans rate different outputs from the system, or feed in human-written Q&A documents, so that the system learns to adjust its outputs until they are closer to what we want. Using human ratings to steer the model in this way is known as RLHF, Reinforcement Learning from Human Feedback. This stage is all about aligning and tweaking the system so that outputs are better formatted and more likely to give the desired result. For example, if the system is asked a computer coding question, it learns to produce code as output. The AI industry is now able to replace more and more RLHF with computer-generated feedback.
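
One core piece of RLHF, the reward model, can be sketched in miniature. In real RLHF, humans compare pairs of model outputs and a reward model is fitted so preferred outputs score higher; that reward then steers further training. Everything below is a toy stand-in: outputs are represented as random 3-dimensional feature vectors, and the “human preferences” are simulated by giving preferred outputs a higher mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated rating data: (preferred_output, rejected_output) pairs.
# The +1.0 shift plays the role of whatever quality humans are rewarding.
pairs = [(rng.normal(size=3) + 1.0, rng.normal(size=3)) for _ in range(200)]

w = np.zeros(3)  # linear reward model: reward(x) = w . x
lr = 0.1
for _ in range(100):
    for good, bad in pairs:
        # Logistic (Bradley-Terry) model of the preference: maximise the
        # probability that the preferred output gets the higher reward.
        p = 1 / (1 + np.exp(-(w @ good - w @ bad)))
        w += lr * (1 - p) * (good - bad)  # push preferred rewards up

acc = np.mean([(w @ g > w @ b) for g, b in pairs])
print(f"training pairs ranked correctly: {acc:.0%}")
```

The fitted reward model now scores “good” outputs above “bad” ones most of the time; in full RLHF, the LLM itself is then optimised to produce outputs this reward model rates highly.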

Organisations can further enhance foundation LLMs by incorporating their own structured and unstructured data into the training, tailoring the model to their specific needs and gaining competitive advantage. There are obvious security challenges in using sensitive data in the system; I will elaborate on the specific security threats and issues in a subsequent blog post. Organisations use LLMs for content generation: in chatbots, and to assist with first drafts of press releases, blogs, marketing material, and social media posts. Thanks to their natural language understanding, they can also be used for translating text into different languages, summarization, sentiment analysis, and insight extraction.

This week, the Swedish buy-now-pay-later fintech company Klarna announced that its LLM-powered chatbot does the work of 700 full-time staff. The chatbot had 2.3 million conversations with customers in its first month. Klarna collaborated with OpenAI on its development and estimates it will add $40m in profit to the firm in a year. The company says the LLM matches human operators on customer satisfaction but operates much faster, solving most issues in under 2 minutes compared with an average of 11 minutes for human operators. The LLM also reduces repeat enquiries by 25% and handles queries in 35 languages across 23 countries. This is part of the growing evidence of the large productivity gains and cost savings to be had from LLMs.

The implementation of LLMs will affect companies providing human-based customer support services. This week French company Teleperformance lost $1.8b in market capitalization on the back of the news from Klarna.

An LLM-powered chatbot is useful for first-level support; however, we still need people to handle the more complex customer support issues that get escalated. A chatbot can do simpler tasks much better than humans by remembering all previous interactions with a particular customer: when they bought the product, what they bought, how they use it, what operating system they have, and so on. When LLMs translate, they can produce output not only in the required language but also in the specific dialect.

Will the uptake of LLMs lead to mass layoffs and unemployment as AI performs work better and more cheaply than humans? Klarna has not had any layoffs. Employees will focus on more meaningful work, leaving simpler, repetitive tasks to LLMs. LLMs will reduce costs, making it possible for more work to be done more cheaply. We may even soon see the creation of a billion-dollar unicorn with fewer than 10 people, or perhaps even one person leveraging AI.

LLMs are still in their infancy. It will be interesting to see which small companies are the winners in the emerging application layer of LLM utilization. Huge opportunities are opening up.

LLM developers compete to build the best LLM for specific purposes. Last month we saw the disastrous launch of Google’s Gemini, the result of inappropriate human intervention in stage-two training promoting DEI ideology. It caused Google’s market capitalization to plummet around 5%. The market sent a message to Google that it values truth and accuracy over manipulation and brainwashing. I suspect Google is too woke an organization to fix the problem and will simply try to hide its manipulation better. To produce better LLMs, organisations look for an edge in the training dataset. After training on as much surface-web data as can be gathered, competitors add in-house deep-web data, as Meta does with Facebook, and then need to look at what else they can add for competitive edge. Google, for example, has done a deal with Reddit for $60m per year to use its data for LLM training. This adds Training Acquisition Cost (TAC) to the project.

Current efforts are going into improving LLM performance, both the quality and the speed of output, and into scaling models down so they can run locally on devices such as smartphones. Learning through self-improvement is an avenue likely to yield further performance gains in the future.

Discover more from Dave Waterson on Security