Image captioning is the task of generating a natural language description (caption) for an image. It is a multimodal problem that combines computer vision and natural language processing. Typical modern approaches use a CNN (such as ResNet or VGG) to encode the image into a feature representation, then feed those features into an RNN- or Transformer-based decoder that generates a sentence word by word (usually trained on pairs of images and ground-truth captions). The model learns to associate visual concepts with language. For instance, given an image of a dog playing with a ball, a caption might be “A dog is playing fetch with a blue ball in a park.” Challenges include correctly identifying objects, their attributes, and their relations, and producing coherent, grammatically correct sentences. Evaluation metrics such as BLEU, METEOR, and CIDEr compare generated captions against human-written reference captions. Image captioning has applications in accessibility (describing images to visually impaired users) and content management.
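To make the evaluation step concrete, here is a minimal sketch of single-sentence BLEU scoring: clipped n-gram precision combined with a brevity penalty. This is a simplified, self-contained illustration of the idea, not the full corpus-level metric used in papers; the function names (`ngrams`, `bleu`) are invented for this example.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        if not cand_counts:
            return 0.0  # candidate too short to have any n-grams
        # Clip each candidate n-gram by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # no overlap at this n-gram order
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty: penalize candidates shorter than the closest reference.
    closest = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > closest else math.exp(1 - closest / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

For example, scoring the short caption “A dog is playing with a ball” against the longer reference above yields a score between 0 and 1: the unigram overlap is high, but longer n-gram mismatches and the brevity penalty pull it down. Production systems typically use established implementations (e.g., NLTK or SacreBLEU) rather than reimplementing the metric.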