Changelog

Libretto is continually releasing new features, improvements, and bug fixes. Here's a list of our most important updates:


Changes in October 2024

  • Rate limiting for API events and prompt template creation
  • Better handling of user authentication and authorization
  • More accurate exact text matching for evaluating responses
  • Boolean scoring options for generation criteria
  • Ability to persist evaluations without creating test cases
  • Better handling of database connections and timeouts
  • Improved caching for embeddings and other operations
  • Revamped sign-in page and authentication flow
  • New projects dashboard interface
  • Better parameter display and test case management
  • Improved organization management features

Changes in September 2024

  • Improved performance and loading times across the application, particularly for prompts pages with many prompts
  • Added ability to hide matching or passing results in comparison views for clearer analysis
  • Enhanced test case generation capabilities, including support for chat history variables
  • Added new model support for Perplexity 3.1 models (and deprecated 3.0 versions)
  • Improved CSV exports with clearer column headers and more descriptive filenames
  • Added tooltips showing better/worse distributions in comparison views
  • Added compact view toggle to show more data in tables
  • Added copy buttons for easy copying of parameters and responses
  • Improved visualization of complex response chains with better formatting and tooltips
  • Added tracking and logging of user access events
  • Fixed various UI issues including tooltip display and table scrolling behavior

Changes in August 2024

  • Added new evaluation types including Exact Text Match, Toxicity Check, and LLM-as-Judge to help assess prompt outputs
  • Improved chain monitoring interface to better track and visualize sequences of prompt calls
  • Enhanced test case management with better error handling and the ability to edit test parameters
  • Updated model defaults to use GPT-4o mini for better performance
  • Added ability to view human grades alongside LLM grades in test results
  • Improved JSON output display and formatting across the application
  • Added floating support button for easier access to help
  • Enhanced date picker interface on the production calls page
  • Fixed various UI issues including scrolling behavior and layout alignment
  • Added tooltips and better error messages throughout the interface
  • Improved handling of chat history in prompts
  • Added new Terms of Service page
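
The Exact Text Match evaluation above compares a model's output against a target string. As a rough illustration only (not Libretto's actual implementation), an exact-match check with optional case and whitespace normalization might look like this; `normalize` and `exactTextMatch` are hypothetical names:

```javascript
// Hypothetical sketch of an exact-text-match evaluation.
// These function names and options are illustrative, not Libretto APIs.
function normalize(text, { caseSensitive = false, trimWhitespace = true } = {}) {
  // Collapse runs of whitespace and trim the ends unless disabled.
  let result = trimWhitespace ? text.trim().replace(/\s+/g, " ") : text;
  return caseSensitive ? result : result.toLowerCase();
}

function exactTextMatch(response, target, options) {
  // Boolean score: true only when the normalized strings are identical.
  return normalize(response, options) === normalize(target, options);
}
```

Normalizing before comparing keeps the check strict on content while tolerant of incidental formatting differences in model output.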

Changes in July 2024

  • Added support for viewing and grading test cases with multimodal content (images + text)
  • Improved grading workflow UI with better scrolling, navigation between test cases, and starting at first ungraded item
  • Added ability to auto-generate evaluation criteria outside of prompt creation flow
  • Added pagination controls for browsing assistant threads, making it easier to navigate between conversations
  • Improved prompt editing experience with unsaved changes warning when navigating away
  • Added support for Google's Gemini 1.5 models with chat-style messaging
  • Enhanced JSON result viewing with better expansion controls and formatting
  • Fixed issues with test case generation and evaluation flows
  • Improved handling of tool calls and function responses in test cases
  • Added ability to share grading results between organization members
  • Updated criteria scoring to better handle both positive and negative rubrics
  • Fixed various UI bugs related to scrolling, navigation and data display

Changes in June 2024

  • Added Terms of Service requirement for users
  • Added support for the Claude 3.5 Sonnet model
  • Added ability to use fetch in JavaScript evaluations
  • Improved test case generation:
    • Now generates more test cases by default
    • Better spacing and alignment of UI elements
    • Added "Generate Test Cases" button for empty states
  • Enhanced evaluation features:
    • New alignment score formula for auto-calibration
    • Improved progress tracking during calibration
    • Added ability to create new judge evaluations
    • Added tooltips showing criteria and rubric details
  • Fixed several model-related issues:
    • Fixed temperature display in test summaries
    • Fixed Llama 3 instruction handling
    • Fixed context window sizes for models
    • Improved handling of Anthropic model details
  • UI Improvements:
    • Cleaner navigation styling
    • Better error displays in test case tables
    • Added support for pasting plain text in editors
  • Added support for Anthropic tool use in playground
  • Improved bulk upload functionality to support chat history variables
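
Allowing `fetch` in JavaScript evaluations means an evaluation can call out to an external service before scoring. As a hedged sketch only (the evaluation signature and endpoint are hypothetical, not Libretto's API), such an evaluation might look like:

```javascript
// Hypothetical sketch of a custom JavaScript evaluation that uses fetch.
// The function shape and the endpoint URL are illustrative assumptions.
async function evaluateResponse(response) {
  const res = await fetch("https://example.com/moderate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: response }),
  });
  if (!res.ok) throw new Error(`Moderation service returned ${res.status}`);
  const { flagged } = await res.json();
  // Pass the evaluation only when the external service does not flag the text.
  return !flagged;
}
```

This pattern lets an evaluation delegate judgments (moderation, fact lookup, custom scoring) to services outside the evaluation sandbox.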

Changes in May 2024

  • Added support for OpenAI's Assistants API Threads, enabling more complex conversational interactions
  • Added support for GPT-4o (gpt-4o), with vision capabilities, for both regular prompts and assistants
  • Added Perplexity models as new model options
  • Improved test case management:
    • Added ability to edit and review test cases in bulk
    • Added columns showing individual variables in test case tables
    • Made target outputs optional for test cases
    • Prevented duplicate test case creation
  • Enhanced project organization:
    • Added ability to archive projects and prompts
    • Added 404 pages for invalid projects/prompts
    • Improved project metrics and filtering
  • UI Improvements:
    • Added version dates in playground version selector
    • Improved table layouts and pagination
    • Added documentation link in header
    • Fixed various UI glitches and animation issues
  • Added ability to export test cases with full argument details
  • Added safeguards to prevent concurrent test runs of the same scenario
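
A common way to guard against concurrent runs of the same scenario is to track in-flight work keyed by scenario ID. This is a generic sketch of the technique, not Libretto's implementation; all names here are hypothetical:

```javascript
// Hypothetical concurrency guard: reject a scenario run while one is in flight.
const inFlight = new Set();

async function runScenarioOnce(scenarioId, runFn) {
  if (inFlight.has(scenarioId)) {
    throw new Error(`Scenario ${scenarioId} is already running`);
  }
  inFlight.add(scenarioId);
  try {
    return await runFn();
  } finally {
    // Always release the guard, even if the run throws.
    inFlight.delete(scenarioId);
  }
}
```

Releasing the guard in `finally` ensures a failed run does not permanently block the scenario.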

Changes in April 2024

  • Added a new Leaderboard page to compare performance across different prompt versions and models
  • Improved test case management:
    • Added ability to suggest test case outputs automatically
    • Added validation for function names and parameters
    • Fixed issues with test case argument editing and generation
    • Made test case list more organized with better sorting and filtering
  • Enhanced model support:
    • Added GPT-4 Turbo and GPT-4 Turbo 2024-04-09 models
    • Added Llama 3 support via Groq and Replicate
  • Improved experiment workflow:
    • Fixed issues with experiment status updates and cancellation
    • Added better handling of partial test run updates
    • Improved performance of experiment calculations
  • UI improvements:
    • Made playground layout more spacious and improved scrolling
    • Improved formatting of numbers and classification results
    • Added confirmation dialog for file deletion
    • Fixed various layout and scrolling issues
  • Performance optimizations:
    • Improved pagination and polling logic
    • Optimized database queries and caching
    • Reduced payload sizes for large test runs
    • Added better handling of rate limits