
RAG Workbench Beta Launch

Written By
Sarmad Qadri

tl;dr: LastMile AI is proud to announce RAG Workbench Beta, a dedicated developer tool for evaluating, debugging, and optimizing RAG applications. RAG Workbench provides deeper transparency into your RAG app, offers a rich debugging tool to go from insight -> action, and enables unparalleled customization by harnessing your data to fine-tune evaluators for each of your use cases.

Sign up for early access!

Background

Innovations in LLMs have drastically outpaced the tools needed to test, evaluate, and debug these models and their applications. We’ve surveyed over 200 engineering leaders across industries including technology, finance, and energy, at enterprises of all sizes. They consistently surfaced the need for better testing harnesses and evaluations to guide the development of LLM applications and get to production faster.

Core Problems Today

RAG ≠ Enterprise RAG

Today’s Enterprise RAG architecture has grown far more complex than the basic RAG architecture (embeddings, vector store, retrieval function, and LLM generation). Enterprise RAG adds ingestion pipelines from various data sources (with users and permissions), dynamic chunking strategies, reranking models, output guardrails, and much more. Root-causing inaccuracies has become correspondingly more challenging.
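To make one of those added stages concrete, below is a minimal sketch of a dynamic chunking step, something the basic RAG diagram omits entirely. The function name and the sentence-based strategy are illustrative assumptions, not a prescribed implementation:

    # Illustrative sketch of a dynamic chunking stage: pack whole
    # sentences into chunks up to a size budget instead of cutting
    # at a fixed character offset.
    def chunk_document(text: str, max_chars: int = 200) -> list[str]:
        sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
        chunks, current = [], ""
        for sentence in sentences:
            if current and len(current) + len(sentence) + 1 > max_chars:
                chunks.append(current)  # budget exceeded: start a new chunk
                current = sentence
            else:
                current = (current + " " + sentence).strip()
        if current:
            chunks.append(current)
        return chunks

Every stage like this is another place an answer can silently go wrong, and another span a trace needs to capture.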

Evaluators are not aligned with Business KPIs

Determining whether an LLM application is performing well is a hard problem. Data science metrics (e.g. BLEU, ROUGE, perplexity) do not translate to what engineering leaders care about: the Business KPIs. Each use case is unique and requires customization to measure progress more precisely. The current state of evals is more vibes-based than science-based.

RAG Workbench Beta

We are proud to announce RAG Workbench, a developer tool for debugging, evaluating, and optimizing Enterprise Retrieval-Augmented Generation (RAG) pipelines. The tool has three core capabilities:

  1. Distributed Tracing

  2. Fine-tuned Evaluators

  3. Debugging & Experimentation

Distributed Tracing: RAG Workbench has first-class support for tracing both retrieval and data ingestion pipelines, allowing you to bridge offline/online systems and identify anomalies with greater accuracy. The tracing SDK is compatible with OpenTelemetry, an open standard protocol for tracing and observability.
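As a concrete illustration, here is a minimal sketch of instrumenting a retrieval step with the standard OpenTelemetry Python API (the opentelemetry-sdk package). Because the tracing SDK is OpenTelemetry-compatible, spans like these can flow into RAG Workbench; the search_documents helper and the console exporter are illustrative stand-ins, not part of the product:

    # Emit OpenTelemetry spans around a retrieval step. The console
    # exporter is for illustration; a real app would configure an
    # exporter pointing at its tracing backend.
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    tracer = trace.get_tracer("rag-app")

    def search_documents(query: str) -> list[str]:
        # Stand-in for a real vector-store lookup.
        corpus = ["doc about tracing", "doc about evals", "doc about chunking"]
        return [d for d in corpus if query.split()[0] in d]

    def retrieve(query: str) -> list[str]:
        # Wrap retrieval in a span so its latency and attributes
        # (query text, hit count) show up in the distributed trace.
        with tracer.start_as_current_span("retrieval") as span:
            span.set_attribute("rag.query", query)
            docs = search_documents(query)
            span.set_attribute("rag.num_docs", len(docs))
            return docs

    print(retrieve("tracing spans"))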

Fine-tuned Evaluators: LastMile AI has built in-house evaluator models that represent a step-function jump over the current state of the art, measuring metrics specific to RAG applications such as hallucination rate, response relevance, and more. The zero-shot models are 1/1000th the cost of GPT-4 and outperform GPT-4 base evaluations.
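To give a feel for what a RAG evaluator scores, here is a deliberately naive faithfulness heuristic: lexical overlap between response sentences and the retrieved context. It is an illustrative baseline only, and the fine-tuned models replace this kind of heuristic with learned, model-based scoring:

    # Naive faithfulness heuristic, for illustration only: score the
    # fraction of response sentences with high lexical overlap against
    # the retrieved context. Learned evaluators supersede this.
    def faithfulness_score(response: str, context: str) -> float:
        context_words = set(context.lower().split())
        sentences = [s.strip() for s in response.split(".") if s.strip()]
        if not sentences:
            return 0.0
        supported = 0
        for sentence in sentences:
            words = set(sentence.lower().split())
            overlap = len(words & context_words) / max(len(words), 1)
            if overlap > 0.5:  # threshold chosen arbitrarily for the sketch
                supported += 1
        return supported / len(sentences)

    # The unsupported second sentence halves the score (prints 0.5).
    print(faithfulness_score(
        "Tracing is OpenTelemetry compatible. The moon is cheese.",
        "the tracing sdk is compatible with opentelemetry",
    ))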

As part of RAG Workbench, we are expanding support for custom use cases and for fine-tuning evaluator models. We would love to help train the right evaluators for your unique use case.

Debugging & Experimentation: RAG Workbench also includes an interactive companion to help developers debug and iterate on any RAG application. Inspect retrieved context, rerun prompts against LLMs within the debugger, and keep an audit log of calls to your RAG application.

We are continuously releasing improvements to help developers systematically identify the best-performing configuration for their RAG pipelines.

RAG Workbench is self-hostable and designed for enterprises. For the Beta launch, we are opening up registration.

Sign up for RAG Workbench here: https://lastmileai.dev/products/ragworkbench

Getting Started

  1. Request access to RAG Workbench to receive your API key.

  2. Install the lastmile-eval package from PyPI, and follow the Getting Started | RAG Workbench guide (lastmileai.dev) to try it out; a minimal setup sketch follows below.
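In practice, step 2 is a standard pip install. The environment-variable name in this sketch is an assumption for illustration only; defer to the Getting Started guide for the exact configuration:

    # Step 2 above: install the SDK from PyPI.
    #   pip install lastmile-eval
    import os

    # Hypothetical variable name, shown only for illustration; the
    # Getting Started guide documents how to supply your API key.
    api_key = os.environ.get("LASTMILE_API_KEY")
    if not api_key:
        raise RuntimeError("Set your RAG Workbench API key before running evals.")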

📺 Video Tutorial - https://www.youtube.com/watch?v=HkvHs4A5xWU

We are constantly iterating to improve the daily lives of AI engineers, and we value every piece of feedback. Please reach out to the team on Discord to get in touch.

