Working with Prompt Templates

Editing and Testing Within the Playground

If you weren't having fun prompt engineering with Libretto before, you certainly will now! The Playground is the home of Libretto's core prompt template editing tools, and is a great place to run tests or Experiments.


Model and Parameter Selection

Perhaps second in importance only to the content of your prompt template are the model and parameters you choose to call.

Models

On the left-hand side above the Editor in the Playground, you can find a dropdown of the LLM models Libretto currently supports. We are constantly adding the latest and greatest (stable) models to the Playground.

Some LLM providers and models differ in format and accepted inputs. The Editor may change to reflect those capabilities and restrictions when a new model is selected.

Model Parameters

For models that support it, Libretto allows you to adjust the Temperature of the LLM output. Temperature dictates the randomness of the response generated by the LLM model.

When generating text, models produce a probability distribution for the set of possible next words in the sequence. When the temperature is set low, the model tends to choose words or responses that are more likely, resulting in outputs that are safer and more deterministic. A temperature of 0.0 will always provide the most likely next word.

On the other hand, a higher temperature encourages the model to select less likely words and phrases, which can result in more creative and diverse responses, but often at the risk of reducing coherence and accuracy.
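To make the effect concrete, here is a minimal, illustrative sketch of temperature-scaled sampling in Python. This is not Libretto code; the function name and token scores are hypothetical, and real models work over vocabularies of tens of thousands of tokens.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick the next token index from raw model scores (logits).

    Low temperatures favor the most likely tokens; a temperature of
    (effectively) 0 always returns the single most likely token, while
    higher temperatures flatten the distribution so less likely tokens
    are chosen more often.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)
    if temperature <= 1e-6:
        return int(np.argmax(logits))        # deterministic, greedy choice
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())    # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Three candidate tokens with raw scores:
logits = [2.0, 1.0, 0.1]
print(sample_next_token(logits, temperature=0.0))   # always index 0
print(sample_next_token(logits, temperature=1.5))   # occasionally index 1 or 2
```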

Some LLM providers, like OpenAI, allow Temperature settings between 0.0 and 2.0, whereas others may restrict it to between 0.0 and 1.0. As you change models in the Playground, the Temperature control will change to only allow legal values for Temperature.

Use the slider above the Editor to set the Temperature to your desired value.


Template Editing

Similar to the prompt template creation step, you'll be presented with the editing tools to modify and add System, User, and Assistant inputs. After you're done editing, you can either Save this new version or Save & Run Tests, the latter of which will kick off a test run with the latest test cases and evaluation settings.

Don't worry about losing old versions of your prompt template, as each new edit creates a separate entry you can continue to use and reference.

Version Naming

Each new version of your prompt will have a corresponding name generated. If you'd like to edit this, click the Edit icon next to the version name under the Editor.

Switch Versions

Want to go back to an old version of your prompt template? Click on the name of the current version, and select the desired version from the dropdown under the Editor. This version will then load into the Editor.

Copy Prompt

If you'd like to copy the chat template (so you can export it into an API call for instance), select the Clipboard icon on the bottom-right of the Editor.
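For instance, assuming the copied template is a list of role/content messages mirroring the System, User, and Assistant inputs (the exact export format may vary), you could paste it into an OpenAI Chat Completions call along these lines:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A pasted chat template; replace with the messages you copied from the Editor.
messages = [
    {"role": "system", "content": "You are a helpful support assistant."},
    {"role": "user", "content": "Summarize this ticket: {ticket_text}"},
]

response = client.chat.completions.create(
    model="gpt-4o",      # match the model you selected in the Playground
    messages=messages,
    temperature=0.7,     # match the Temperature slider setting
)
print(response.choices[0].message.content)
```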


Tests and Experiments

To the bottom-right of the Editor and on the right-hand pane labeled "Tests", you'll see the ability to "Run Experiments" and "Run Tests".

Experiments

An Experiment focuses on discovering new Variants of your prompt. You can take a deeper dive into how to run an Experiment here.

You can cancel any currently running Experiments by clicking the X or Cancel button on the right-hand panel.

Tests

A Test Run consists of calling the LLM provider with the selected model and specified parameters for each test case within the prompt template. When the responses are returned, Libretto then automatically runs each of the evaluation metrics between each test case's target output and the LLM response.
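Conceptually, a test run behaves something like the following sketch. The names here are hypothetical and this is not Libretto's actual implementation; it is only meant to show the per-test-case loop described above.

```python
def run_tests(test_cases, call_llm, metrics):
    """Illustrative sketch: call the model once per test case, then score
    each response against that case's target output with every metric."""
    results = []
    for case in test_cases:
        response = call_llm(case.inputs)  # selected model + parameters
        scores = {
            metric.name: metric.score(case.target_output, response)
            for metric in metrics
        }
        results.append({"case": case, "response": response, "scores": scores})
    return results
```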

As these metrics are calculated, the right-hand panel of the Editor will be automatically updated with the latest results.

You can cancel any currently running tests by clicking the X or Cancel button on the right-hand panel.

If you'd like to re-run tests at any time, simply select the set of version/model pairings, and click Re-run.

Out of Date

If your set of test cases or metrics is updated, older test runs will show an "Out of Date" flag to let you know that you may no longer be comparing apples-to-apples. Simply re-run any "Out of Date" tests to use the new test settings.

Compare

Similar to re-running tests, you can view a detailed test-case-by-case breakdown and comparison by selecting the desired version/model pairings and clicking the Compare button.
