Key Features

Testing & Evaluation

One of the first and biggest pain points in prompt engineering is testing a prompt meaningfully and really understanding how well it performs. Libretto not only makes it drop-dead simple to test your prompt template against a wide variety of metrics and test cases, but also makes evaluating and comparing prompt variations both intuitive and - dare we say - fun.


Evaluation Metrics

Automated LLM evaluation is tricky and depends heavily on the type of prompt. For some prompts, like sentiment analysis or categorization prompts, you can simply compare the LLM’s output string against the expected answer for each test case. If the LLM gets the right answer, it passes. But for prompts that are more generative, like customer service chats or retrieval-augmented generation, you may want to use a fuzzy string match or another LLM to grade the responses.
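As a rough illustration of the two simplest strategies above, here is a minimal Python sketch of an exact string comparison and a fuzzy string match. It is not Libretto's implementation: the function names `exact_match` and `fuzzy_match` and the 0.8 threshold are assumptions made for this example, and the fuzzy score comes from the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and trim whitespace so trivial formatting differences don't fail a test."""
    return text.strip().lower()


def exact_match(expected: str, actual: str) -> bool:
    """Suits classification-style prompts: the answer is either right or wrong."""
    return normalize(expected) == normalize(actual)


def fuzzy_match(expected: str, actual: str, threshold: float = 0.8) -> bool:
    """Suits generative prompts: pass if the texts are similar enough.

    The threshold is an illustrative assumption, not a recommended value.
    """
    ratio = SequenceMatcher(None, normalize(expected), normalize(actual)).ratio()
    return ratio >= threshold
```

For example, `exact_match("positive", " Positive\n")` passes, while a generative answer such as "Your order will ship on Monday." can still pass `fuzzy_match` against the reference "Your order ships on Monday." even though the wording differs.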

In Libretto, we have many different options for evaluating the LLM’s response so that you can tailor your evaluation strategy to your prompt. We can test sentiment, toxicity, JSON structure, custom subjective criteria, BLEU, ROUGE, BERTScore, embedding similarity, and even custom-written evaluations.
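To make one of these options concrete, a JSON structure check might look something like the sketch below. This is an illustration only, not Libretto's evaluator: the function name `has_json_structure` and the idea of passing a mapping from required keys to expected types are assumptions made for this example.

```python
import json


def has_json_structure(response: str, required_keys: dict[str, type]) -> bool:
    """Return True if the response parses as a JSON object with the required keys and types."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        # The LLM returned something that is not valid JSON at all
        return False
    if not isinstance(data, dict):
        # Valid JSON, but not an object (for example a bare list or string)
        return False
    return all(
        key in data and isinstance(data[key], expected_type)
        for key, expected_type in required_keys.items()
    )
```

A test case could then require, say, a string `sentiment` field and a float `score` field, and fail any response that omits one of them or returns free text instead of JSON.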


Test Cases

As you home in on the right prompt, you need to build up your set of test cases so that changes to your prompts are tested more rigorously. This quickly becomes a massive headache. When you change your prompt and re-run all your tests, you too often end up skimming spreadsheets to see how the LLM did, without gaining an in-depth picture of how performance really shifted.

At Libretto, we track all of the test cases you’ve built up for your various prompts, and keep a large library of ways to evaluate the answers that come back from LLMs.

Test Case Generation

Because creating new test cases is as difficult as it is crucial, we offer a structured way to generate new test cases by calling the LLMs themselves. This can drastically speed up the iterative test-case-to-prompt workflow, while also leaving room for the discovery of new edge cases.
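The general idea of LLM-driven test case generation can be sketched in a few lines of Python. This is a hypothetical outline, not how Libretto does it: `generate_test_cases` and its `complete` parameter (any function that sends a prompt string to an LLM and returns the text reply) are names invented for this example, and the wording of the request is only one possible choice.

```python
import json
from typing import Callable


def generate_test_cases(
    template: str,
    examples: list[dict],
    complete: Callable[[str], str],  # hypothetical: wraps whatever LLM client you use
    count: int = 3,
) -> list[dict]:
    """Ask an LLM for new test cases modeled on the existing ones."""
    request = (
        f"Prompt template:\n{template}\n\n"
        f"Existing test cases (JSON):\n{json.dumps(examples)}\n\n"
        f"Write {count} new test cases that cover unusual edge cases. "
        "Reply with a JSON array of objects using the same keys."
    )
    raw = complete(request)
    try:
        candidates = json.loads(raw)
    except json.JSONDecodeError:
        # The model did not return parseable JSON; treat it as no new cases
        return []
    if not isinstance(candidates, list):
        return []
    # Keep only well-formed objects that do not duplicate an existing case
    return [c for c in candidates if isinstance(c, dict) and c not in examples]
```

Generated cases are best treated as candidates to review before they join the test set, since the model can produce duplicates or cases that do not fit the prompt.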
