Lightly RL Environments Catalogue 2026

Training and evaluating AI agents requires environments that reflect real operational complexity. This catalogue presents Lightly's curated set of reinforcement learning environments spanning IT service workflows, business intelligence tools, data processing pipelines, and finance, each designed for multi-step, tool-using agents across varying difficulty levels.

Gain insights into:

Business Intelligence Environments: Looker, Google Sheets, Power BI

Three environments covering the most common BI and data reporting tools. Agents must interpret ambiguous data requests, navigate live dashboards, and produce accurate outputs across multi-turn interactions. Each environment includes 130 to 160 tasks, and the catalogue reports benchmark results for three frontier models across difficulty levels.

Finance Environments: Accounting Workflows and Investment Banking

Two environments focused on high-stakes financial operations, from reconciling bank statements and processing invoices to preparing full company valuations using multi-source data. Tasks require agents to reason over structured financial data, coordinate across multiple connectors, and produce verifiable outputs under realistic enterprise constraints.

IT and Data Science Environments: Service Desk, Data Acquisition, Exploratory Analysis, Modeling

Four environments covering the end-to-end data science and IT operations stack. Agents handle password resets with security verification, query and clean messy datasets, identify predictive features from time series, and train and evaluate ML model ensembles, all in multi-turn, tool-connected settings with real-world data rather than mock inputs.

About Lightly's RL Environment Design

All environments in this catalogue share a common design philosophy: agents must interpret task context, plan multi-step actions, interact with external tools, and adapt based on intermediate feedback. This chapter covers what makes Lightly's environments model-agnostic and suitable for benchmarking frontier systems, and how to get in touch with the team for custom environment design and data workflows.
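To make the interaction pattern above concrete, here is a minimal Python sketch of a multi-turn, tool-using episode. It is an illustration only and not Lightly's actual API: `ToyServiceDeskEnv`, `scripted_policy`, `run_episode`, and the tool names `verify_identity` and `reset_password` are all hypothetical, loosely modelled on the password-reset task mentioned earlier. The policy is a plain callable, which is one simple way an environment can stay model-agnostic.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy type: maps (task description, feedback so far) to (tool, argument).
Policy = Callable[[str, list[str]], tuple[str, str]]


@dataclass
class ToyServiceDeskEnv:
    """Hypothetical toy environment: a reset succeeds only after identity verification."""
    max_turns: int = 5
    verified: bool = False
    done: bool = False
    turns: int = 0

    def reset(self) -> str:
        """Start a new episode and return the task description."""
        self.verified, self.done, self.turns = False, False, 0
        return "User alice requests a password reset."

    def step(self, tool: str, arg: str) -> tuple[str, float, bool]:
        """Apply one tool call and return (feedback, reward, done)."""
        self.turns += 1
        reward = 0.0
        if tool == "verify_identity":
            self.verified = arg == "alice"
            feedback = "identity verified" if self.verified else "verification failed"
        elif tool == "reset_password":
            if self.verified:
                feedback, reward, self.done = "password reset", 1.0, True
            else:
                feedback = "denied: identity not verified"
        else:
            feedback = f"unknown tool: {tool}"
        if self.turns >= self.max_turns:
            self.done = True  # turn budget exhausted
        return feedback, reward, self.done


def scripted_policy(task: str, history: list[str]) -> tuple[str, str]:
    """Stand-in for a model: attempts the reset, then adapts to the feedback."""
    if history and history[-1].startswith("denied"):
        return "verify_identity", "alice"
    return "reset_password", "alice"


def run_episode(env: ToyServiceDeskEnv, policy: Policy) -> tuple[float, list[str]]:
    """Run one episode and return the total reward and the feedback trace."""
    task = env.reset()
    history: list[str] = []
    total, done = 0.0, False
    while not done:
        tool, arg = policy(task, history)
        feedback, reward, done = env.step(tool, arg)
        history.append(feedback)
        total += reward
    return total, history


total, history = run_episode(ToyServiceDeskEnv(), scripted_policy)
print(total, history)
```

In this sketch the scripted policy first attempts the reset, is denied, verifies the user, and then succeeds, so the feedback trace shows the adapt-on-feedback behaviour in three turns. Swapping `scripted_policy` for a function that calls a real model would leave the environment code unchanged.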