Fine-Tune Evaluation

Evaluation Workflow

Validate your fine-tuned model's personality in 5 steps. From baseline to production.

Most teams fine-tune their models and hope for the best. They do manual "vibe checks" or rely on generic benchmarks that don't capture behavioral changes. Lindr provides a systematic workflow to quantify personality shifts before and after fine-tuning. Use prompt-only datasets to run models via a monitor, or include assistant responses to analyze pre-collected outputs.

The 5-Step Process

Define Target Persona

Create a persona profile with target values for each of the 10 personality dimensions. This is what you want your fine-tuned model to sound like.

client = lindr.Client(api_key="lnd_...")

persona = client.personas.create(
  name="Empathetic Support Agent",
  dimensions=lindr.PersonalityDimensions(
    agreeableness=85,
    assertiveness=40,
    neuroticism=20,
    # ... 7 more dimensions
  )
)

Establish Baseline

Run your eval dataset through Lindr to capture your base model's personality profile. This becomes your comparison point.

baseline, summary = client.evals.batch(
  name="baseline-llama-3.2",
  persona_id=persona.id,
  dataset_id="support-scenarios",
  monitor_id="monitor-id",
  model_name="llama-3.2-8b"
)

Fine-Tune Your Model

Train your model using any method—LoRA, full fine-tune, DPO. Lindr is model-agnostic and works with any training approach.

# Use your preferred fine-tuning method
# Lindr doesn't care how you train
# Just bring the outputs

Post-Fine-Tune Evaluation

Run the same eval dataset on your fine-tuned model. Lindr computes personality scores for each response.

finetuned, _ = client.evals.batch(
  name="finetuned-v1",
  persona_id=persona.id,
  dataset_id="support-scenarios",
  monitor_id="monitor-id",
  model_name="llama-3.2-8b-my-finetune"
)

Compare & Validate

Generate a dimension-by-dimension comparison. See exactly what changed, whether it moved toward your target, and if you should ship.

comparison = client.comparisons.create(
  baseline_eval_id=baseline.id,
  candidate_eval_id=finetuned.id
)

print(comparison.recommendation)
print(comparison.overall_improvement)

What You Get

Quantifiable proof that your fine-tune achieved the desired personality shift
Per-dimension breakdown showing exactly what changed and by how much
Ship/review/reject recommendations based on target persona alignment
Same persona profiles carry forward to production monitoring

Running Batch Evals View All 10 Dimensions