Fine-Tune Evaluation
A/B Model Comparison
Compare personality profiles between two models. See exactly what changed and whether it moved toward your target.
After running batch evaluations on your base model and fine-tuned model, you can create a comparison to see the dimension-by-dimension personality shift.
Creating a Comparison
# The two IDs come from the batch evaluations of the base and fine-tuned models.
comparison = lindr.comparisons.create(
    baseline_eval_id="uuid-of-base-model-eval",
    candidate_eval_id="uuid-of-finetuned-eval",
    name="llama-base-vs-finetune-v1",
)

Understanding the Diff Report
The comparison returns a detailed diff showing how each dimension changed:
Example Comparison Report
Dimension           Baseline → Candidate   Change   Direction
Agreeableness       52 → 70                +18%     ✓ Closer
Assertiveness       45 → 40                -5%      ✓ Closer
Neuroticism         25 → 28                +3%      ⚠ Regression
Conscientiousness   65 → 78                +13%     ✓ Closer
Openness            60 → 62                +2%      ✓ Closer

Overall Improvement: +12% toward target
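To make the "Closer" and "Regression" labels concrete, here is a minimal sketch of how a per-dimension diff could be computed locally. It does not use the Lindr SDK. The `diff_dimensions` helper, the `DimensionDiff` type, and the target scores are all hypothetical and chosen only so the result lines up with the example report. Lindr's own calculation, including how it derives the overall improvement figure, may differ.

```python
from dataclasses import dataclass


@dataclass
class DimensionDiff:
    name: str
    baseline: float
    candidate: float
    delta: float   # candidate minus baseline
    closer: bool   # True if the candidate is nearer the target than the baseline


def diff_dimensions(baseline, candidate, target):
    """Compare two score dicts dimension by dimension against a target."""
    diffs = []
    for name, before in baseline.items():
        after = candidate[name]
        goal = target[name]
        closer = abs(goal - after) < abs(goal - before)
        diffs.append(DimensionDiff(name, before, after, after - before, closer))
    return diffs


# Scores from the example report above.
baseline = {"Agreeableness": 52, "Assertiveness": 45, "Neuroticism": 25,
            "Conscientiousness": 65, "Openness": 60}
candidate = {"Agreeableness": 70, "Assertiveness": 40, "Neuroticism": 28,
             "Conscientiousness": 78, "Openness": 62}
# Hypothetical target profile (not given in the report).
target = {"Agreeableness": 75, "Assertiveness": 38, "Neuroticism": 20,
          "Conscientiousness": 80, "Openness": 65}

for d in diff_dimensions(baseline, candidate, target):
    label = "Closer" if d.closer else "Regression"
    print(f"{d.name:<18} {d.baseline}→{d.candidate} {d.delta:+} {label}")
```

Note that a negative change can still count as "Closer": Assertiveness drops from 45 to 40, but under the assumed target of 38 that is a move in the right direction.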
Recommendations
Based on the comparison, Lindr provides a recommendation:
- Ship: Overall improvement >10%, no major regressions
- Review: Mixed results, some dimensions improved while others regressed
- Reject: Overall regression or critical dimensions moved away from target