Changelog
Libretto continually releases new features, improvements, and bug fixes. Here are our most important updates:
Changes in October 2024
- Rate limiting for API events and prompt template creation
- Better handling of user authentication and authorization
- More accurate exact text matching for evaluating responses
- Boolean scoring options for generation criteria
- Ability to persist evaluations without creating test cases
- Better handling of database connections and timeouts
- Improved caching for embeddings and other operations
- Revamped sign-in page and authentication flow
- New projects dashboard interface
- Better parameter display and test case management
- Improved organization management features
Changes in September 2024
- Improved performance and loading times across the application, particularly for prompts pages with many prompts
- Added ability to hide matching or passing results in comparison views for clearer analysis
- Enhanced test case generation capabilities, including support for chat history variables
- Added new model support for Perplexity 3.1 models (and deprecated 3.0 versions)
- Improved CSV exports with clearer column headers and more descriptive filenames
- Added tooltips showing better/worse distributions in comparison views
- Added compact view toggle to show more data in tables
- Added copy buttons for easy copying of parameters and responses
- Improved visualization of complex response chains with better formatting and tooltips
- Added tracking of user access events and logging
- Fixed various UI issues including tooltip display and table scrolling behavior
Changes in August 2024
- Added new evaluation types including Exact Text Match, Toxicity Check, and LLM-as-Judge to help assess prompt outputs
- Improved chain monitoring interface to better track and visualize sequences of prompt calls
- Enhanced test case management with better error handling and the ability to edit test parameters
- Updated model defaults to use GPT-4o mini for better performance
- Added ability to view human grades alongside LLM grades in test results
- Improved JSON output display and formatting across the application
- Added floating support button for easier access to help
- Enhanced date picker interface on the production calls page
- Fixed various UI issues including scrolling behavior and layout alignment
- Added tooltips and better error messages throughout the interface
- Improved handling of chat history in prompts
- Added new Terms of Service page
Changes in July 2024
- Added support for viewing and grading test cases with multimodal content (images + text)
- Improved grading workflow UI with better scrolling, navigation between test cases, and starting at first ungraded item
- Added ability to auto-generate evaluation criteria outside of prompt creation flow
- Added pagination controls for browsing assistant threads, making it easier to navigate between conversations
- Improved prompt editing experience with unsaved changes warning when navigating away
- Added support for Google's Gemini 1.5 models with chat-style messaging
- Enhanced JSON result viewing with better expansion controls and formatting
- Fixed issues with test case generation and evaluation flows
- Improved handling of tool calls and function responses in test cases
- Added ability to share grading results between organization members
- Updated criteria scoring to better handle both positive and negative rubrics
- Fixed various UI bugs related to scrolling, navigation and data display
Changes in June 2024
- Added Terms of Service requirement for users
- Added support for the Claude 3.5 Sonnet model
- Added ability to use fetch in JavaScript evaluations
- Improved test case generation:
  - Now generates more test cases by default
  - Better spacing and alignment of UI elements
  - Added "Generate Test Cases" button for empty states
- Enhanced evaluation features:
  - New alignment score formula for auto-calibration
  - Improved progress tracking during calibration
  - Added ability to create new judge evaluations
  - Added tooltips showing criteria and rubric details
- Fixed several model-related issues:
  - Fixed temperature display in test summaries
  - Fixed Llama 3 instruction handling
  - Fixed context window sizes for models
  - Improved handling of Anthropic model details
- UI Improvements:
  - Cleaner navigation styling
  - Better error displays in test case tables
  - Added support for pasting plain text in editors
- Added support for Anthropic tool use in playground
- Improved bulk upload functionality to support chat history variables
Changes in May 2024
- Added support for OpenAI's Assistants API Threads, enabling more complex conversational interactions
- Added support for GPT-4o (gpt-4o), with vision, for both regular prompts and assistants
- Added Perplexity models as new model options
- Improved test case management:
  - Added ability to edit and review test cases in bulk
  - Added columns showing individual variables in test case tables
  - Made target outputs optional for test cases
  - Prevented duplicate test case creation
- Enhanced project organization:
  - Added ability to archive projects and prompts
  - Added 404 pages for invalid projects/prompts
  - Improved project metrics and filtering
- UI Improvements:
  - Added version dates in playground version selector
  - Improved table layouts and pagination
  - Added documentation link in header
  - Fixed various UI glitches and animation issues
- Added ability to export test cases with full argument details
- Added safeguards to prevent concurrent test runs of the same scenario
Changes in April 2024
- Added a new Leaderboard page to compare performance across different prompt versions and models
- Improved test case management:
  - Added ability to suggest test case outputs automatically
  - Added validation for function names and parameters
  - Fixed issues with test case argument editing and generation
  - Made test case list more organized with better sorting and filtering
- Enhanced model support:
  - Added GPT-4 Turbo and GPT-4 Turbo 2024-04-09 models
  - Added Llama 3 support via Groq and Replicate
- Improved experiment workflow:
  - Fixed issues with experiment status updates and cancellation
  - Added better handling of partial test run updates
  - Improved performance of experiment calculations
- UI improvements:
  - Made playground layout more spacious and improved scrolling
  - Improved formatting of numbers and classification results
  - Added confirmation dialog for file deletion
  - Fixed various layout and scrolling issues
- Performance optimizations:
  - Improved pagination and polling logic
  - Optimized database queries and caching
  - Reduced payload sizes for large test runs
- Added better handling of rate limits