Notes on Large Language Models (LLMs)
Introduction
A large language model (LLM) is a type of language model designed to understand and generate natural language. LLMs are trained on large amounts of text data using substantial computing resources. These models work by taking a piece of text and predicting the next word. Notable LLMs include OpenAI’s GPT-4, Meta’s LLaMA, Google’s PaLM, and Anthropic’s Claude.
It’s like a super smart computer program that has read a lot of books, websites, and other information. It can understand and generate human-like language, helping people by answering questions, providing information, and even chatting with them. It’s a bit like having a really clever robot friend who knows a whole bunch of stuff!
Essentially, it’s a tool that can process and generate human-like text based on the patterns and information it learned during its training.
Types of LLMs
Base LLM
A base LLM predicts the next word based on its text training data.
For example, if you write “once upon a time, there was a unicorn”, it may complete it by adding “that lived in a magical forest with all her unicorn friends”.
But if you prompt “What is the capital of France?”, it may answer with another set of questions, like “What is France’s largest city?” or “What is France’s population?”, because articles on the internet often list such questions about France.
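To make “predicts the next word from training data” concrete, here is a toy sketch: a bigram model that predicts the next word as the one most often seen after the current word in its training text. This is not how a real LLM works internally (those use neural networks over tokens), but the core task, next-word prediction, is the same.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "once upon a time there was a unicorn that lived in a forest"
model = train_bigram(corpus)
print(predict_next(model, "upon"))  # → "a"
```

A base LLM is this idea scaled up enormously: instead of counting word pairs, it learns statistical patterns over billions of tokens, which is why its completions reflect whatever text was common on the internet.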
Instruction Tuned LLM
An instruction-tuned LLM starts from a base LLM and is further fine-tuned on examples of instructions and good attempts at following them, so instead of merely continuing your text, it treats your prompt as a request and answers it directly.
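The difference shows up in how the two kinds of model are queried. A base LLM just continues raw text, while an instruction-tuned LLM is usually given a structured list of chat messages that gets flattened into tagged text for the model. A minimal sketch, where the `<|role|>` tags are illustrative (the exact format varies by model):

```python
# A base LLM simply continues this string:
base_prompt = "What is the capital of France?"

# An instruction-tuned LLM typically receives role-tagged messages:
chat_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

def to_training_text(messages):
    """Flatten chat messages into the tagged text the model was fine-tuned on.
    The tag syntax here is made up for illustration; real models define their own."""
    return "\n".join(f"<|{m['role']}|>\n{m['content']}" for m in messages)

print(to_training_text(chat_messages))
```

Because the model was fine-tuned on text in this shape, where the user turn is followed by a helpful assistant turn, it learns to respond with an answer (“Paris”) rather than with more questions.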
Model Limitations
Hallucination
The model makes statements that sound plausible but are not true.
Reducing hallucinations
Ask the model to first find the relevant information, then answer the question based only on that information.
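The two-step tactic above can be sketched as a prompt template: ask the model to quote the relevant passages first, then answer only from those quotes. The wording of the instructions is an illustrative assumption, and the resulting prompt would be sent to whatever LLM client you use.

```python
def build_grounded_prompt(document, question):
    """Build a prompt that asks the model to quote relevant passages
    before answering, reducing the chance of a made-up answer."""
    return (
        "First, find and quote the passages from the document that are "
        "relevant to the question. Then answer the question using only "
        "those quoted passages. If the document does not contain the "
        "answer, say so instead of guessing.\n\n"
        f"Document:\n{document}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    document="Paris is the capital and largest city of France.",
    question="What is the capital of France?",
)
print(prompt)
```

Forcing the model to ground its answer in quoted text gives it less room to improvise, and the explicit “say so instead of guessing” instruction offers an honest exit when the information simply isn’t there.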