The rise of the GPT series, and eventually ChatGPT, spawned a field of work called prompt engineering. People working in this field are experienced LLM users who know how to tweak prompts, i.e. the pieces of text fed into an LLM, to make the model behave the way you want it to.
However, sometimes you want the LLM to do something it doesn't know how to do out of the box. For example, it's possible to extract fields (such as price, discount, and so on) from a text using a generic LLM, or even a medium-sized language model, and without any fine-tuning. But you will need to instruct the model how to do it.
In this short article, you're going to take a look at precisely that. We'll define the practice of In-Context Learning (ICL), explain how it works, and show with ChatGPT how you can use it yourself. Let's take a look!
Let's start with a definition.
In-Context Learning, or ICL, can be defined as follows:
In-context learning (ICL) is a specific method of prompt engineering where demonstrations of the task are provided to the model as part of the prompt (in natural language). With ICL, you can use off-the-shelf large language models (LLMs) to solve novel tasks without the need for fine-tuning (Hopsworks, 2023).
In other words, you demonstrate the task inside the prompt itself, for example by providing one or more input/output pairs, and the model picks up the pattern without any update to its weights.
Here's a brief example of In-Context Learning.
Let's ask ChatGPT to generate a default text.
User: Write a commercial text for a house, including a variety of descriptors such as price, rating, location, and so forth. Make it 200 words long.
Here's what it comes up with:
ChatGPT: Discover your dream home nestled in the heart of the picturesque Woodland Hills neighborhood. This exquisite 4-bedroom, 3-bathroom oasis offers the perfect blend of luxury and comfort. Priced competitively at $650,000, this home is a true gem in the current market. Boasting a generous 2,500 square feet of living space, this residence has been impeccably maintained and boasts a 5-star rating from previous owners. The open-concept living area is flooded with natural light, creating an inviting atmosphere for family gatherings and entertaining. Situated on a spacious lot, this property offers a lush backyard with a pristine garden and a relaxing patio area. It's an ideal space for outdoor enthusiasts or those seeking a tranquil retreat. The location is unbeatable, with top-rated schools, parks, and shopping centers just minutes away. Commuting is a breeze with easy access to major highways. Don't miss this opportunity to make this house your forever home. Schedule a viewing today and experience the charm of Woodland Hills living. Act quickly, as this home won't stay on the market for long!
Suppose that I want a JSON object that looks similar to this:
{ "price": 650000, "rooms": { "bedrooms": 4, "bathrooms": 3 }, "price_per_square_foot": 260 }
I could of course apply a fine-tuning approach, where I optimize a base model (typically an open source one) with pairs of (text, JSON). This way, the model learns the expected structure and will likely produce such outputs when used afterwards.
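As a rough illustration, such a fine-tuning dataset boils down to a collection of (text, JSON) pairs. The snippet below is a minimal sketch in Python; the field names simply mirror the JSON object above, and real fine-tuning pipelines for open source base models each have their own required dataset format.

```python
# Minimal sketch of a (text, JSON) fine-tuning dataset.
# The keys mirror the JSON object shown above; this is an illustration,
# not the required format of any particular fine-tuning pipeline.
training_pairs = [
    {
        "text": (
            "This exquisite 4-bedroom, 3-bathroom oasis ... "
            "Priced competitively at $650,000 ... 2,500 square feet ..."
        ),
        "json": {
            "price": 650000,
            "rooms": {"bedrooms": 4, "bathrooms": 3},
            "price_per_square_foot": 260,
        },
    },
    # ... hundreds or thousands more pairs would typically be needed
]
```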
But today's LLMs are powerful enough, especially because they work in a conversational fashion (i.e., they use previous elements of the conversation when generating new outputs), that you can teach them this behavior at inference time, inside the prompt, without any fine-tuning. Starting with fine-tuning may thus even be a waste of your compute budget: perhaps you can simply apply ICL instead.
Let's see what happens if we add the expected model behavior to the prompt, in an ICL fashion, followed by a new text to extract information from:
I would like to extract price, information about bedrooms and bathrooms and price per square feet from a text. Here is an example text with an example JSON output. Do the same for the new text. Provide nothing but the output in JSON format.
Example text: Discover your dream home nestled in the heart of the picturesque Woodland Hills neighborhood. This exquisite 4-bedroom, 3-bathroom oasis offers the perfect blend of luxury and comfort. Priced competitively at $650,000, this home is a true gem in the current market.
Boasting a generous 2,500 square feet of living space, this residence has been impeccably maintained and boasts a 5-star rating from previous owners. The open-concept living area is flooded with natural light, creating an inviting atmosphere for family gatherings and entertaining.
Situated on a spacious lot, this property offers a lush backyard with a pristine garden and a relaxing patio area. It's an ideal space for outdoor enthusiasts or those seeking a tranquil retreat. The location is unbeatable, with top-rated schools, parks, and shopping centers just minutes away. Commuting is a breeze with easy access to major highways.
Don't miss this opportunity to make this house your forever home. Schedule a viewing today and experience the charm of Woodland Hills living. Act quickly, as this home won't stay on the market for long!
Example output:
{ "price": 650000, "rooms": { "bedrooms": 4, "bathrooms": 3 }, "price_per_square_foot": 260 }
Text to generate output from: Welcome to your new haven in the vibrant neighborhood of Brookside Meadows. This stunning 3-bedroom, 2.5-bathroom home is priced to sell at just $425,000, making it an exceptional value in today's market.
With a 4.8-star rating from previous homeowners, this property exudes quality and comfort. The spacious living area with a cozy fireplace provides the perfect spot to unwind after a long day. The kitchen features modern appliances and granite countertops, making it a chef's dream.
Step outside to your private backyard oasis, complete with a covered patio and a beautifully landscaped garden. It's the ideal space for outdoor gatherings and relaxation, with 1000 square feet in total.
Located in the heart of Brookside Meadows, you'll have easy access to trendy cafes, boutique shops, and parks. The neighborhood is known for its friendly community and excellent schools.
This home is a perfect fit for young families, professionals, or anyone seeking a stylish and convenient lifestyle. Don't miss out on this incredible opportunity. Schedule a viewing today and experience the charm of Brookside Meadows living!
Extracted output:
And here's what the model outputs:
{
  "price": 425000,
  "rooms": {
    "bedrooms": 3,
    "bathrooms": 2.5
  },
  "price_per_square_foot": 425
}
Let's see if it's correct:
| What | In text | In JSON | Correct? |
| --- | --- | --- | --- |
| Property price | $425,000 | 425000 | ✅ |
| Rooms | 3 bedrooms, 2.5 bathrooms | "bedrooms": 3, "bathrooms": 2.5 | ✅ |
| Price per square foot | $425,000 / 1,000 sq ft = 425 | 425 | ✅ |
Yes! This is how In-Context Learning (ICL) can be used to make models behave in specific ways without the need for fine-tuning 😎
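If you want to apply the same trick programmatically rather than in the ChatGPT interface, the sketch below shows one way to do it with the openai Python client. The model name, the truncated example texts and the exact prompt wording are assumptions for illustration; any chat-style LLM endpoint can be used in the same way.

```python
import json
from openai import OpenAI  # assumes the openai Python package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Truncated for brevity; use the full texts from the article.
example_text = "Discover your dream home ... Priced competitively at $650,000 ... 2,500 square feet ..."
example_json = '{"price": 650000, "rooms": {"bedrooms": 4, "bathrooms": 3}, "price_per_square_foot": 260}'
new_text = "Welcome to your new haven ... priced to sell at just $425,000 ... 1000 square feet ..."

# The ICL prompt: task description, one worked example, then the new text.
prompt = (
    "I would like to extract price, information about bedrooms and bathrooms "
    "and price per square foot from a text. Here is an example text with an "
    "example JSON output. Do the same for the new text. Provide nothing but "
    "the output in JSON format.\n\n"
    f"Example text: {example_text}\n"
    f"Example output: {example_json}\n\n"
    f"Text to generate output from: {new_text}\n"
    "Extracted output:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any capable chat model will do
    messages=[{"role": "user", "content": prompt}],
)

# Parse and sanity-check the extracted fields.
extracted = json.loads(response.choices[0].message.content)
print(extracted["price"], extracted["rooms"], extracted["price_per_square_foot"])
```

Because the prompt asks for nothing but JSON, the response can usually be parsed directly with json.loads; in production you would still want to guard against malformed output.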
Hopsworks. (2023, December 23). In context learning (ICL). Hopsworks - Batch and Real-time ML Platform. https://www.hopsworks.ai/dictionary/in-context-learning-icl
OpenAI. (n.d.). Saved chat. https://chat.openai.com/share/58b6693a-06db-4eb0-8a33-df6ce61b7623