Evaluation Workflow
Validate your fine-tuned model's personality in 5 steps, from baseline to production.
Most teams fine-tune their models and hope for the best. They do manual "vibe checks" or rely on generic benchmarks that don't capture behavioral changes. Lindr provides a systematic workflow to quantify personality shifts before and after fine-tuning. Use prompt-only datasets to run models via a monitor, or include assistant responses to analyze pre-collected outputs.
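As a rough illustration of the two dataset modes, the sketch below shows what prompt-only and pre-collected records might look like. The field names (`prompt`, `response`), the sample rows, and the helper `dataset_mode` are hypothetical and chosen only for illustration; they are not taken from Lindr's schema.

```python
from typing import Dict, List

# Hypothetical record shapes; the field names are assumptions, not Lindr's schema.
prompt_only: List[Dict[str, str]] = [
    {"prompt": "My order arrived damaged. What can I do?"},
    {"prompt": "How do I reset my password?"},
]

with_responses: List[Dict[str, str]] = [
    {
        "prompt": "My order arrived damaged. What can I do?",
        "response": "I'm sorry about that. Let's get a replacement sent out today.",
    },
]


def dataset_mode(records: List[Dict[str, str]]) -> str:
    """Classify a dataset as 'prompt_only' or 'pre_collected'.

    Mixed datasets are rejected so that every row is evaluated the same way.
    """
    if not records:
        raise ValueError("dataset is empty")
    has_response = [bool(r.get("response")) for r in records]
    if all(has_response):
        return "pre_collected"
    if not any(has_response):
        return "prompt_only"
    raise ValueError("dataset mixes rows with and without responses")
```

A check like this, run before uploading, would catch a dataset that is only partially filled in, which would otherwise blur the line between "run the model" and "score existing outputs".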
The 5-Step Process
Define Target Persona
Create a persona profile with target values for each of the 10 personality dimensions. This is what you want your fine-tuned model to sound like.
import lindr

client = lindr.Client(api_key="lnd_...")
persona = client.personas.create(
    name="Empathetic Support Agent",
    dimensions=lindr.PersonalityDimensions(
        agreeableness=85,
        assertiveness=40,
        neuroticism=20,
        # ... 7 more dimensions
    )
)
Establish Baseline
Run your eval dataset through Lindr to capture your base model's personality profile. This becomes your comparison point.
baseline, summary = client.evals.batch(
    name="baseline-llama-3.2",
    persona_id=persona.id,
    dataset_id="support-scenarios",
    monitor_id="monitor-id",
    model_name="llama-3.2-8b"
)
Fine-Tune Your Model
Train your model using any method, such as LoRA, full fine-tuning, or DPO. Lindr is model-agnostic and works with any training approach.
# Use your preferred fine-tuning method
# Lindr doesn't care how you train
# Just bring the outputs
Post-Fine-Tune Evaluation
Run the same eval dataset on your fine-tuned model. Lindr computes personality scores for each response.
finetuned, _ = client.evals.batch(
    name="finetuned-v1",
    persona_id=persona.id,
    dataset_id="support-scenarios",
    monitor_id="monitor-id",
    model_name="llama-3.2-8b-my-finetune"
)
Compare & Validate
Generate a dimension-by-dimension comparison. See exactly what changed, whether it moved toward your target, and whether you should ship.
comparison = client.comparisons.create(
    baseline_eval_id=baseline.id,
    candidate_eval_id=finetuned.id
)
print(comparison.recommendation)
print(comparison.overall_improvement)
What You Get
- Quantifiable proof that your fine-tune achieved the desired personality shift
- Per-dimension breakdown showing exactly what changed and by how much
- Ship/review/reject recommendations based on target persona alignment
- Same persona profiles carry forward to production monitoring
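To make the per-dimension breakdown concrete, here is a minimal sketch of how a comparison against a target persona could be computed by hand. It is not Lindr's actual scoring or recommendation logic: the functions `compare_to_target` and `overall_improvement`, the gap-reduction metric, and the baseline and candidate scores are all assumptions for illustration. Only the three target values come from the persona defined in step 1.

```python
from typing import Dict


def compare_to_target(
    target: Dict[str, float],
    baseline: Dict[str, float],
    candidate: Dict[str, float],
) -> Dict[str, Dict[str, float]]:
    """Report, per dimension, how far the candidate moved and in which direction."""
    report: Dict[str, Dict[str, float]] = {}
    for dim, goal in target.items():
        gap_before = abs(goal - baseline[dim])
        gap_after = abs(goal - candidate[dim])
        report[dim] = {
            "delta": candidate[dim] - baseline[dim],
            "gap_before": gap_before,
            "gap_after": gap_after,
            # Positive means the fine-tune closed part of the gap to the target.
            "improvement": gap_before - gap_after,
        }
    return report


def overall_improvement(report: Dict[str, Dict[str, float]]) -> float:
    """Mean gap reduction across dimensions (an illustrative aggregate)."""
    return sum(row["improvement"] for row in report.values()) / len(report)


# Target values from step 1; baseline and candidate scores are made up.
target = {"agreeableness": 85, "assertiveness": 40, "neuroticism": 20}
baseline_scores = {"agreeableness": 60, "assertiveness": 55, "neuroticism": 30}
candidate_scores = {"agreeableness": 80, "assertiveness": 45, "neuroticism": 35}

report = compare_to_target(target, baseline_scores, candidate_scores)
for dim, row in report.items():
    direction = "toward" if row["improvement"] > 0 else "away from"
    print(f"{dim}: {row['delta']:+} ({direction} target)")
print(f"overall improvement: {overall_improvement(report):.2f}")
```

In this toy data the aggregate is positive even though neuroticism drifts away from its target, which is the kind of regression a per-dimension view surfaces and a single overall number can hide.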