Fine-Tune Evaluation
Eval Datasets
Create reusable test prompt sets for consistent evaluation across model versions.
Eval datasets are collections of prompts you use to test your models. By using the same dataset across base and fine-tuned models, you ensure a fair comparison.
Dataset Structure
{
  "name": "customer-support-scenarios",
  "description": "50 customer support prompts across various categories",
  "prompts": [
    {
      "id": "complaint-1",
      "messages": [
        { "role": "user", "content": "I've been waiting 3 weeks for my order!" }
      ],
      "category": "complaint"
    },
    {
      "id": "inquiry-1",
      "messages": [
        { "role": "user", "content": "What's your return policy?" }
      ],
      "category": "inquiry"
    }
  ]
}

If the last message in a prompt is an assistant response, Lindr treats it as a pre-collected output. Otherwise, pass a monitor_id when running batch evals to generate responses from the model.
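For example, the first prompt below carries a pre-collected assistant output and would be scored as-is, while the second ends on a user turn and needs a model to generate the response. This is a minimal sketch: client.datasets.create and monitor_id come from this page, but client.evals.run_batch is an assumed method name and is shown only as a commented-out placeholder.

# Prompt with a pre-collected output: the trailing assistant message is evaluated directly.
precollected = {
    "id": "complaint-1-precollected",
    "messages": [
        {"role": "user", "content": "I've been waiting 3 weeks for my order!"},
        {"role": "assistant", "content": "I'm sorry for the delay. Let me check your order status."}
    ],
    "category": "complaint"
}

# Prompt ending on a user turn: a model (referenced by monitor_id) must generate the response.
open_ended = {
    "id": "inquiry-1",
    "messages": [{"role": "user", "content": "What's your return policy?"}],
    "category": "inquiry"
}

dataset = client.datasets.create(name="mixed-scenarios", prompts=[precollected, open_ended])

# Hypothetical batch-eval call; the method name is an assumption, monitor_id comes from the docs above.
# client.evals.run_batch(dataset_id=dataset.id, monitor_id="mon_123")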
Creating a Dataset
dataset = client.datasets.create(
    name="support-scenarios-v1",
    prompts=[
        {
            "id": "complaint-1",
            "messages": [{"role": "user", "content": "..."}],
            "category": "complaint"
        },
        # ... more prompts
    ]
)

print(f"Created dataset: {dataset.id}")
print(f"Prompt count: {dataset.prompt_count}")

Best Practices
- Diversity: Include prompts from different categories and edge cases
- Size: Aim for at least 50 prompts for statistically meaningful results
- Consistency: Use the same dataset for baseline and fine-tune evals
- Versioning: Create new dataset versions as your test cases evolve (see the sketch after this list)
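A minimal sketch of the versioning practice, using only the client.datasets.create call shown above: keep a version suffix in the dataset name and create a new dataset instead of editing the old one, so earlier eval runs stay comparable. Here v1_prompts is a stand-in for the prompt list behind support-scenarios-v1.

# Existing prompts plus a newly discovered edge case; create a new version rather than mutating v1.
updated_prompts = v1_prompts + [
    {
        "id": "refund-edge-1",
        "messages": [{"role": "user", "content": "I was charged twice and want a refund to a closed card."}],
        "category": "refund"
    }
]

dataset_v2 = client.datasets.create(
    name="support-scenarios-v2",  # bump the version suffix
    prompts=updated_prompts,
)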
JSON Import
Upload a JSON file with your prompts directly via the API or dashboard.
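If you keep prompt files in the structure shown under Dataset Structure, one way to import them through the SDK is to load the file and pass its fields to client.datasets.create. This is a sketch of that path only; the dashboard upload and any dedicated file-upload endpoint are not shown here.

import json

# Load a prompt file saved in the Dataset Structure format shown above.
with open("customer-support-scenarios.json") as f:
    spec = json.load(f)

dataset = client.datasets.create(
    name=spec["name"],
    prompts=spec["prompts"],
)
print(f"Imported {dataset.prompt_count} prompts as dataset {dataset.id}")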
Programmatic Creation
Build datasets dynamically from your existing test suites or production logs.
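As a sketch, assume production logs are available as a list of dicts with ticket_id, user_message, and topic fields (a hypothetical shape, not a Lindr API); you can map them onto the prompt structure and create a dataset with the same call as above.

# production_logs is a hypothetical list of dicts,
# e.g. {"ticket_id": "t-42", "user_message": "...", "topic": "billing"}
prompts = [
    {
        "id": f"log-{log['ticket_id']}",
        "messages": [{"role": "user", "content": log["user_message"]}],
        "category": log.get("topic", "uncategorized"),
    }
    for log in production_logs
]

dataset = client.datasets.create(name="prod-logs-2024-06", prompts=prompts)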