Foundation models are large-scale, general-purpose machine learning models trained on broad datasets and designed to be adaptable to a wide range of downstream tasks through fine-tuning or prompting. These models are typically trained using self-supervised or unsupervised learning on massive corpora of text, images, code, or multimodal data, allowing them to develop rich and transferable representations of the data domain.
Examples of foundation models include GPT (Generative Pre-trained Transformer) for natural language processing, CLIP for vision-language tasks, and SAM (Segment Anything Model) for image segmentation. What distinguishes foundation models is not just their size (often containing billions of parameters), but their generality — they can perform well across tasks and domains with little or no task-specific supervision, enabling capabilities like zero-shot, few-shot, and in-context learning.
These models serve as the “foundation” for building specialized systems, which users adapt through fine-tuning on task-specific data, prompting, or in-context learning from a handful of examples.
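To make the prompting-based adaptation concrete, the sketch below shows how few-shot in-context learning works mechanically: labeled examples are packed directly into the prompt so the model can infer the task without any weight updates. No model is called here, and the task, labels, and example reviews are hypothetical; a real system would send the resulting string to a language model for completion.

```python
# Illustrative sketch: adapting a foundation model via few-shot prompting.
# Task-specific examples are placed in the prompt itself; the model is
# expected to continue the pattern, with no fine-tuning involved.

FEW_SHOT_EXAMPLES = [
    ("The battery died after two days.", "negative"),
    ("Setup took under a minute, fantastic.", "positive"),
]

def build_few_shot_prompt(query: str) -> str:
    """Assemble a sentiment-classification prompt from labeled examples."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # Leave the final label blank for the model to complete in-context.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

prompt = build_few_shot_prompt("The screen scratches far too easily.")
print(prompt)
```

Zero-shot prompting is the same idea with `FEW_SHOT_EXAMPLES` left empty: only the task instruction and the query are supplied, and the model must rely entirely on what it learned during pretraining.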
Despite their impressive performance, foundation models raise challenges around bias, fairness, interpretability, and environmental cost due to their scale. As a result, the development and deployment of foundation models are central to ongoing discussions in responsible AI.