Zero-Shot Learning: Unlocking New Possibilities in AI

August 7, 2024
by Alyssa Towns

As children, we often learn in a zero-shot way. 

Picture a young child learning about different fruits for the first time. They know what an apple and orange look like (and what colors they are). They haven’t seen a banana before, but we’ve told them bananas are yellow, curved like the moon, and have a peel around the part you eat. 

The next time you take them to the grocery store, they can point out a banana to you based on these descriptions.

In machine learning, zero-shot learning operates on a comparable principle.

Unlike traditional supervised learning methods, which require training models on vast amounts of labeled data that pair inputs with desired outputs, zero-shot learning allows models to generalize to classes they have never seen labeled examples of, drawing instead on auxiliary information such as descriptions of each class's attributes.

Zero-shot learning enables large language models (LLMs) to categorize information successfully without labeled datasets and frequent retraining. Businesses across sectors use these models for various tasks, including but not limited to translation, summarization, answering questions, content generation, and sentiment analysis.

How does zero-shot learning work? 

One of the key aspects of zero-shot learning is its reliance on semantic attributes and relationships between different classes or objects. 

Instead of explicitly training the model on a particular object, we train it to understand the characteristics and attributes that define various objects. For instance, in the case of recognizing animal species, the model might learn about the defining features of each species, such as the habitat, physical traits, and behaviors.

To illustrate, we might describe a koala as a small animal with a leathery nose and big fluffy ears that lives in eucalyptus trees. The model learns to recognize and understand these features through our description. Then, when the model encounters an image of a koala (remember, it hasn’t seen an image of one before), it uses the described features (i.e., “small animal,” “leathery nose,” and “big fluffy ears”) to make an educated guess that the animal in the image is a koala.
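To make this concrete, here’s a minimal sketch of attribute-based zero-shot classification. The attribute list, class descriptions, and hard-coded scores are illustrative assumptions; in a real system, a trained network would predict the attribute scores from the image.

```python
# A minimal sketch of attribute-based zero-shot classification.
# In practice, a trained network would predict attribute scores from an
# image; here we hard-code a plausible prediction for illustration.
import numpy as np

# Each class is described by attributes:
# [small, leathery_nose, fluffy_ears, lives_in_trees]
class_attributes = {
    "koala":    np.array([1.0, 1.0, 1.0, 1.0]),
    "kangaroo": np.array([0.0, 0.0, 0.0, 0.0]),
    "possum":   np.array([1.0, 0.0, 0.0, 1.0]),
}

def cosine(a, b):
    # Cosine similarity, with a small epsilon to avoid dividing by zero
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Pretend attribute scores predicted for an image the model has never seen
predicted = np.array([0.9, 0.8, 0.95, 0.85])

# Pick the class whose description best matches the predicted attributes
best = max(class_attributes, key=lambda c: cosine(predicted, class_attributes[c]))
print(best)  # -> koala
```

The model never needs a labeled koala image; matching predicted attributes against textual class descriptions is what lets it name a class it has never seen.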

While humans can indeed learn in a zero-shot way, it’s critical to understand that our learning process integrates experience, emotions, context, and deep understanding when we generalize information. In contrast, zero-shot learning in artificial intelligence (AI) relies strictly on data and patterns without personal experiences, feelings, and other human thoughts.


How does zero-shot learning help large language models? 

Zero-shot learning enables large language models, like ChatGPT and Gemini, to perform tasks they haven’t been explicitly trained on. These models can tackle new tasks based on instructions provided through natural language prompting. 

As LLMs are exposed to vast amounts of data, they develop a broad understanding of language, concepts, and tasks, along with the connections among them. This allows them to apply that knowledge to new functions without retraining each time.

For example, you can ask an LLM about a niche topic, and it will pull from its broad knowledge base to generate relevant content based on underlying attributes, even if it hasn’t been specifically trained on that topic. 
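Tackling a new task through prompting looks like this in practice: the task is described entirely in natural language, with no worked examples. Here’s a minimal sketch; `call_llm` is a hypothetical stand-in for whichever LLM client you use.

```python
# A zero-shot prompt: the task is described in plain language, with no
# labeled examples. `call_llm` is a hypothetical helper standing in for
# your LLM client of choice (OpenAI, Gemini, a local model, etc.).
def build_zero_shot_prompt(review: str) -> str:
    return (
        "Classify the sentiment of the following product review as "
        "'positive', 'negative', or 'neutral'. Reply with one word.\n\n"
        f"Review: {review}"
    )

prompt = build_zero_shot_prompt("The battery died after two days. Disappointed.")
# response = call_llm(prompt)  # expected output: "negative"
```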

Applications of zero-shot learning 

There are many ways to use zero-shot learning to complete AI tasks, but computer vision and natural language processing are two of the most common. 

Computer vision 

Similar to the example of recognizing an image of a koala without ever having seen one, zero-shot learning allows AI models to analyze pictures of new objects and identify them correctly. 

Rather than relying on vast training data for each new object, zero-shot learning allows models to understand and categorize new, unseen objects by connecting the information they already know with the new information they encounter. 
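Models like CLIP, which embed images and text in a shared space, are a common way to do this. The sketch below uses Hugging Face’s transformers library; the image path and candidate labels are illustrative assumptions.

```python
# A sketch of zero-shot image classification with CLIP: the model scores an
# image against text descriptions of classes it was never explicitly trained
# to recognize. Requires: pip install transformers pillow torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("animal_photo.jpg")  # hypothetical input image
labels = ["a photo of a koala", "a photo of a kangaroo", "a photo of a possum"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs):
    print(f"{label}: {p.item():.2f}")
```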

Natural language processing (NLP)

NLP is a significant application of zero-shot learning because it allows models to handle words, phrases, and categories they haven’t encountered previously, based on semantic similarities with ones they already know.

This capability is crucial for enterprises using chatbots or virtual assistants since it equips the models to handle new queries and provide quality customer service. 

Suppose a business trains a chatbot to handle questions about refunds and lost packages. If a new customer asks about a stolen package and a refund, the chatbot can use its knowledge of refunds and lost packages to provide a relevant answer.
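Hugging Face’s zero-shot-classification pipeline, which reframes classification as natural language inference, makes this easy to try. The query and labels below are illustrative assumptions.

```python
# A sketch of routing a new customer query with zero-shot classification.
# Requires: pip install transformers torch
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

query = "My package was stolen off my porch. Can I get my money back?"
labels = ["refund request", "lost or stolen package", "order status"]

# multi_label=True scores each label independently, since one query can
# touch several topics at once
result = classifier(query, candidate_labels=labels, multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```

Notice that the model needs only the label names, not a dataset of labeled tickets.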

Advantages of zero-shot learning 

Zero-shot learning offers some compelling advantages, including the following.

It doesn't require extensive amounts of labeled data 

Traditional supervised learning models require large labeled datasets to perform new tasks and recognize objects. On the other hand, zero-shot learning relies on descriptive attributes and features to identify new classes of information. It makes machine learning models more accessible to those without extensive training datasets or the time to collect and label them. 

Kelwin Fernandes, CEO of NILG.AI, said that the lack of data needed to train the AI models is one of the primary advantages of zero-shot learning. “It facilitates the adoption of AI systems even in scenarios where the target user has no data. For example, even if your company doesn't have any historical data about categorizing customer support tickets, as long as you can provide the names of the categories, it should be able to predict the right category for new tickets.”

It has scalability potential 

Zero-shot learning can scale efficiently to new areas, categories, and concepts without significant model retraining time. Suppose a business uses a model to assist with customer segmentation: as segments evolve, teams can share new descriptions, and the AI can adapt to meet those needs without being retrained.

It's cost-effective for small teams and researchers 

Since zero-shot learning minimizes the dependency on large datasets, it can help teams reduce the costs associated with data collection and annotation. This cost-effectiveness is particularly beneficial for research teams and small businesses that want to leverage AI solutions but lack the funding or resources to compile extensive labeled datasets. 

Limitations of zero-shot learning 

As with all forms of technology, zero-shot learning poses challenges worth considering before using these models.

It might yield lower accuracy compared to other learning methods

Recall that zero-shot learning relies on descriptive attributes and features to classify new information. While it benefits from not requiring a large labeled dataset, trainers must supply comprehensive class descriptions to support accurate predictions. Imprecise information can lead to misclassifications and categorization errors.

According to Dmytro Shevchenko, a data scientist at Aimprosoft, zero-shot learning is less effective for complex tasks that require context a model can only acquire through extensive training, which can lead to accuracy issues.

“Accurate results usually require training with multiple examples or fine-tuning. I can give an excellent example of medical image classification. ZSL may fail if a model needs to accurately classify medical images into rare diseases because it lacks specific knowledge. In this case, additional training or customization with examples is required,” Shevchenko said.

There are some bias and fairness concerns 

Zero-shot learning models can inherit biases from their training data or from the auxiliary information they use to classify new inputs. In other words, models can be biased toward the classes they’ve seen and may incorrectly force unseen inputs into those familiar classes.

Researchers Akanksha Paul, Narayanan C. Krishnan, and Prateek Munjal have proposed a new method, Semantically Aligned Bias Reducing (SABR), to reduce bias in zero-shot learning and mitigate these effects. 

It doesn't work well for complex or niche tasks 

Zero-shot learning is best suited for simple tasks that require general knowledge. Models trained using these techniques may struggle with more complex tasks requiring specialized knowledge and domain expertise. In such cases, another training technique with more labeled data and examples may be necessary for the best results. 

Fernandes noted, “Although current models tend to work well in general domain tasks, they become less accurate if you go into very niche applications (e.g., industrial applications), and you may need to train/fine-tune your custom models.”

Zero-shot vs. one-shot vs. few-shot models 

Zero-shot, one-shot, and few-shot learning are all techniques that help machine learning models predict new classes with minimal or no labeled data.

Zero-shot learning involves training machine learning models to recognize new classes without any labeled data. Instead of relying on labeled examples, these models utilize their existing knowledge and semantic similarities to make informed predictions. For instance, when identifying a koala, a zero-shot learning model might draw on its knowledge of similar animals and the koala’s described traits to make a reasonable prediction.

In one-shot learning, machine learning algorithms are trained to classify objects using a single example of each class. For example, a one-shot learning scenario in computer vision occurs when a deep learning model is presented with only one image and must quickly determine whether it is similar to or different from a reference image. This approach allows models to generalize from minimal data by focusing on similarities to make accurate predictions.

Few-shot learning expands on these principles by training AI models to generalize new data classes based on a few labeled samples per class. By considering a small number of examples, these models can make better, more accurate generalizations by extracting meaningful information from multiple instances. This method provides more training data, allowing the model to understand a data class better.
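The difference is easiest to see in the prompts themselves. Here’s a minimal sketch, with illustrative reviews:

```python
# Zero-shot vs. few-shot prompting for the same task: the few-shot prompt
# adds a handful of labeled examples; the zero-shot prompt adds none.
task = "Classify the sentiment of this review as positive or negative."
review = "Shipping was fast and the quality exceeded my expectations."

zero_shot = f"{task}\n\nReview: {review}\nSentiment:"

few_shot = (
    f"{task}\n\n"
    "Review: The item arrived broken.\nSentiment: negative\n\n"
    "Review: Works perfectly, great value.\nSentiment: positive\n\n"
    f"Review: {review}\nSentiment:"
)
```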

You get zero shots!

Zero-shot learning represents a significant step towards enabling machines to exhibit more human-like generalization and adaptability, albeit within the constraints of data-driven learning. 

Ultimately, zero-shot learning enables LLMs to handle tasks they weren’t explicitly taught or trained for. They rely on their existing knowledge and understanding of concepts and semantics to conduct simple tasks. 

While zero-shot learning is advantageous because it requires little to no labeled data, scales well, and is cost-effective, it isn’t well-suited to complex tasks and may yield lower accuracy.

Learn other ways to train large language models (LLMs) outside of zero-shot learning. 

Alyssa Towns

Alyssa Towns works in communications and change management and is a freelance writer for G2. She mainly writes SaaS, productivity, and career-adjacent content. In her spare time, Alyssa is either enjoying a new restaurant with her husband, playing with her Bengal cats Yeti and Yowie, adventuring outdoors, or reading a book from her TBR list.