
Grok vs GPT-5.2 vs Claude Opus 4.5: A Cross-Vendor Personality Comparison

We evaluated 9,325 personality assessments across xAI's Grok models, OpenAI's GPT-5.2, and Anthropic's Claude Opus 4.5. Here's how the major frontier models compare on personality.

12 min read
Lindr Research

TL;DR

  • Large cross-vendor differences — Claude vs GPT shows effect sizes up to g = 1.39 (very large), while Grok falls between them
  • Claude is highest in openness and curiosity — 70.81 openness (+4.47 vs GPT) and 63.28 curiosity (+3.68 vs GPT)
  • GPT is highest in conscientiousness and ambition — 55.61 conscientiousness (+5.34 vs Claude) and 63.10 ambition
  • Grok occupies middle ground — Closer to Claude on openness, closer to GPT on extraversion
  • Model variance is 10.7% — 3x higher than intra-family comparisons, confirming distinct vendor personalities

The Models

We benchmarked four frontier models from three major AI vendors:

| Model | Vendor | Samples | Success Rate |
|---|---|---|---|
| Claude Opus 4.5 | Anthropic | 1,932 | 77.3% |
| GPT-5.2 | OpenAI | 2,436 | 97.4% |
| Grok 3 | xAI | 2,463 | 98.5% |
| Grok 4 | xAI | 2,494 | 99.8% |

Each model responded to 500 personality-probing prompts across 5 context conditions (professional, casual, customer support, sales, technical).

Results

Personality Profiles

Here's how each model scores across our 10 personality dimensions:

[Figure: Radar chart comparing personality profiles of Grok 3, Grok 4, GPT-5.2, and Claude Opus 4.5]
[Figure: Heatmap of personality scores by model and dimension]

Key Findings

Claude Opus 4.5 Personality

  • Highest Openness: 70.81
  • Highest Curiosity: 63.28
  • Highest Neuroticism: 61.43
  • Lowest Conscientiousness: 50.27

GPT-5.2 Personality

  • Highest Conscientiousness: 55.61
  • Highest Ambition: 63.10
  • Lowest Openness: 66.34
  • Lowest Extraversion: 55.27

Grok 3 Personality

  • Highest Extraversion: 57.95
  • Highest Resilience: 61.02
  • Highest Agreeableness: 56.84
  • Balanced middle ground overall

Grok 4 Personality

  • Highest Assertiveness: 50.92
  • Close to Grok 3 overall
  • Lower Neuroticism: 57.98
  • Higher Openness than Grok 3

Score Distributions

[Figure: Box plots showing score distributions by model]

Model Similarity

[Figure: Distance matrix showing model similarity]

GPT-5.2 and Claude Opus 4.5 are the most distant pair (Mahalanobis distance = 2.35). Grok 3 and Grok 4 are the most similar (distance = 0.77).
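The Mahalanobis distance measures separation between mean profiles while accounting for correlations among the 10 dimensions. A minimal sketch of the computation, using synthetic stand-in scores (the per-response data behind the published distances is not reproduced here):

```python
import numpy as np

# Synthetic stand-ins for per-response 10-dimension score matrices
# (rows = responses, columns = personality dimensions).
rng = np.random.default_rng(0)
claude_scores = rng.normal(60.0, 8.0, size=(200, 10))
gpt_scores = rng.normal(57.0, 8.0, size=(200, 10))

# Pooled covariance over both samples, then the Mahalanobis distance
# between the two mean profiles.
pooled_cov = np.cov(np.vstack([claude_scores, gpt_scores]).T)
inv_cov = np.linalg.inv(pooled_cov)
diff = claude_scores.mean(axis=0) - gpt_scores.mean(axis=0)
d = float(np.sqrt(diff @ inv_cov @ diff))
print(round(d, 2))
```

Unlike Euclidean distance, this metric down-weights differences along dimensions that co-vary strongly across responses.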

Statistical Analysis

Effect Sizes

We use Hedges' g with 95% bootstrap confidence intervals to measure the practical significance of differences. Effect sizes over 0.8 are considered large.
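As a sketch of this procedure: Hedges' g is Cohen's d with a small-sample bias correction, and the CI comes from resampling both groups with replacement. The score arrays below are illustrative stand-ins, not the actual per-response openness data:

```python
import numpy as np

def hedges_g(a, b):
    """Bias-corrected standardized mean difference (Hedges' g)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1)
                  + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    d = (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)
    return d * (1 - 3 / (4 * (na + nb) - 9))  # small-sample correction

def bootstrap_ci(a, b, n_boot=10_000, seed=0):
    """Percentile-bootstrap 95% CI for Hedges' g."""
    rng = np.random.default_rng(seed)
    gs = [hedges_g(rng.choice(a, len(a)), rng.choice(b, len(b)))
          for _ in range(n_boot)]
    return np.percentile(gs, [2.5, 97.5])

# Illustrative stand-ins for per-response openness scores.
rng = np.random.default_rng(1)
claude_open = rng.normal(70.8, 3.5, 500)
gpt_open = rng.normal(66.3, 3.5, 500)

g = hedges_g(claude_open, gpt_open)
lo, hi = bootstrap_ci(claude_open, gpt_open)
print(f"g = {g:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```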

[Figure: Forest plot showing Hedges' g effect sizes for all model pairs]

Claude vs GPT-5.2 (Largest Differences)

| Dimension | Hedges' g | 95% CI | Interpretation |
|---|---|---|---|
| Openness | 1.39 | [1.32, 1.46] | Very Large (Claude higher) |
| Conscientiousness | -1.19 | [-1.25, -1.12] | Large (GPT higher) |
| Curiosity | 1.03 | [0.97, 1.10] | Large (Claude higher) |
| Neuroticism | 0.92 | [0.85, 0.99] | Large (Claude higher) |
| Ambition | -0.71 | [-0.77, -0.65] | Medium (GPT higher) |

Grok vs GPT-5.2

| Dimension | Grok 4 vs GPT | Grok 3 vs GPT | Direction |
|---|---|---|---|
| Openness | -0.91 | -0.46 | Grok higher |
| Extraversion | -0.73 | -0.72 | Grok higher |
| Ambition | 0.60 | 0.27 | GPT higher |
| Conscientiousness | 0.49 | 0.36 | GPT higher |

Grok vs Claude

| Dimension | Grok 4 vs Claude | Grok 3 vs Claude | Direction |
|---|---|---|---|
| Neuroticism | -0.94 | -0.78 | Claude higher |
| Conscientiousness | 0.62 | 0.75 | Grok higher |
| Curiosity | -0.67 | -0.57 | Claude higher |
| Openness | -0.42 | -0.69 | Claude higher |

Key insight: Cross-vendor differences (g = 0.7–1.4) are much larger than within-vendor differences (g = 0.3–0.4). This confirms that each AI lab has cultivated a distinct “personality signature” in their models.

Variance Decomposition

[Figure: Variance decomposition showing model vs prompt vs context contributions]

Cross-Vendor Analysis

  • Model variance: 10.7%
  • Prompt variance: 36.2%
  • Context variance: 31.8%
  • Residual: 21.4%

vs Grok Family Only

  • Model variance: 3.2%
  • Roughly one-third the model variance of the cross-vendor comparison
  • Confirms within-family similarity

Model variance jumps from 3.2% to 10.7% when comparing across vendors. This 3x increase confirms that vendor identity is a major source of personality variation.
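For a balanced design, such shares can be obtained from a main-effects sum-of-squares partition. The sketch below uses synthetic data with additive model, prompt, and context effects; both the data and the additive assumption are illustrative, not the post's exact procedure:

```python
import numpy as np

# Synthetic balanced design: 4 models x 50 prompts x 5 contexts.
rng = np.random.default_rng(2)
n_models, n_prompts, n_contexts = 4, 50, 5
model_eff = rng.normal(0, 3, n_models)[:, None, None]
prompt_eff = rng.normal(0, 6, n_prompts)[None, :, None]
context_eff = rng.normal(0, 5, n_contexts)[None, None, :]
noise = rng.normal(0, 4, (n_models, n_prompts, n_contexts))
scores = 60 + model_eff + prompt_eff + context_eff + noise

grand_mean = scores.mean()
total_ss = np.sum((scores - grand_mean) ** 2)

def main_effect_share(collapse_axes):
    """Between-group sum of squares for one factor, as a share of total."""
    group_means = scores.mean(axis=collapse_axes)
    n_per_group = scores.size / group_means.size
    return n_per_group * np.sum((group_means - grand_mean) ** 2) / total_ss

shares = {
    "model": main_effect_share((1, 2)),
    "prompt": main_effect_share((0, 2)),
    "context": main_effect_share((0, 1)),
}
shares["residual"] = 1 - sum(shares.values())
print({k: round(v, 3) for k, v in shares.items()})
```

With the effect magnitudes chosen above, prompt variance dominates model variance, mirroring the qualitative pattern in the published decomposition.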

Factor Analysis

PCA with varimax rotation reveals three underlying factors explaining 79.5% of variance (KMO = 0.63):

[Figure: Factor loadings heatmap]

Factor 1: Resilience-Conscientiousness (45.6%)

High loadings: Resilience, Integrity, Conscientiousness, Ambition

Factor 2: Curiosity-Assertiveness (23.4%)

High loadings: Curiosity (+), Agreeableness (+), Assertiveness (-)

Factor 3: Extraversion-Assertiveness (10.5%)

High loadings: Extraversion, Assertiveness, Openness

Claude scores highest on Factor 2 (curiosity-driven), GPT highest on Factor 1 (conscientiousness-driven), and Grok balanced across all three factors.
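A numpy-only sketch of this pipeline: PCA of the standardized score matrix, then Kaiser's varimax rotation of the retained loadings. The data here is a random stand-in, and the implementation is a textbook version of varimax rather than the exact analysis code:

```python
import numpy as np

# Random stand-in for a (responses x 10 dimensions) score matrix.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each dimension

# PCA via SVD; loadings scaled by component standard deviations.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
loadings = Vt[:k].T * (S[:k] / np.sqrt(len(X) - 1))

def varimax(L, n_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loadings matrix (Kaiser's algorithm)."""
    p, k = L.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(n_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() < var * (1 + tol):
            break
        var = s.sum()
    return L @ R

rotated = varimax(loadings)
explained = float((loadings ** 2).sum() / 10)  # variance explained by 3 factors
print(round(explained, 3))
```

Because varimax is an orthogonal rotation, it redistributes variance among factors to make loadings more interpretable without changing each dimension's communality or the total variance explained.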

Context Sensitivity

[Figure: Context sensitivity comparison across models]

| Model | Overall Sensitivity | Interpretation |
|---|---|---|
| GPT-5.2 | 0.88 | Most stable across contexts |
| Claude Opus 4.5 | 1.42 | Moderately adaptive |
| Grok 3 | 2.54 | Highly context-sensitive |
| Grok 4 | 2.55 | Highly context-sensitive |

Grok models show 3x more context sensitivity than GPT-5.2. They adapt their personality more dramatically across professional, casual, and customer support contexts.
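The post does not spell out the sensitivity formula, but one plausible definition is the standard deviation of a model's per-context mean scores, averaged across dimensions. A sketch under that assumption, with illustrative numbers:

```python
import numpy as np

# Illustrative per-context mean scores for one model:
# rows = 5 contexts, columns = 10 dimensions. The real per-context
# matrices are not published in this post.
rng = np.random.default_rng(4)
context_means = 58 + rng.normal(0, 2.5, size=(5, 10))

# Assumed sensitivity metric: average across dimensions of the
# standard deviation of the per-context means.
sensitivity = float(context_means.std(axis=0, ddof=1).mean())
print(round(sensitivity, 2))
```

Under this definition, a model whose scores barely move between professional and casual contexts would score near zero, while a strongly adaptive model scores higher.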

Personality Cards

Z-score normalized personality profiles for each model, showing relative strengths and signature traits:

[Figure: Claude Opus 4.5 personality card]
[Figure: GPT-5.2 personality card]
[Figure: Grok 3 personality card]
[Figure: Grok 4 personality card]
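The z-score normalization behind these cards can be reproduced directly from the complete results table. This sketch standardizes each dimension across the four models so a card shows relative, not absolute, strengths:

```python
import numpy as np

dims = ["Openness", "Conscientiousness", "Extraversion", "Agreeableness",
        "Neuroticism", "Assertiveness", "Ambition", "Resilience",
        "Integrity", "Curiosity"]

# Mean scores from the complete results table
# (rows: Claude Opus 4.5, GPT-5.2, Grok 3, Grok 4).
scores = np.array([
    [70.81, 50.27, 57.30, 56.73, 61.43, 49.58, 61.54, 59.00, 50.69, 63.28],
    [66.34, 55.61, 55.27, 54.93, 58.00, 50.69, 63.10, 60.43, 51.16, 59.60],
    [68.15, 53.94, 57.95, 56.84, 58.46, 50.63, 62.46, 61.02, 51.75, 61.17],
    [69.42, 53.32, 57.88, 55.22, 57.98, 50.92, 61.54, 59.57, 50.16, 60.71],
])

# Z-score each dimension across the four models, so each card shows a
# model's traits relative to the cross-model mean for that dimension.
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
claude_card = dict(zip(dims, z[0].round(2)))
print(claude_card["Openness"])  # positive: Claude sits above the cross-model mean
```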

Complete Results Table

| Dimension | Claude | GPT-5.2 | Grok 3 | Grok 4 |
|---|---|---|---|---|
| Openness | **70.81** | 66.34 | 68.15 | 69.42 |
| Conscientiousness | 50.27 | **55.61** | 53.94 | 53.32 |
| Extraversion | 57.30 | 55.27 | **57.95** | 57.88 |
| Agreeableness | 56.73 | 54.93 | **56.84** | 55.22 |
| Neuroticism | **61.43** | 58.00 | 58.46 | 57.98 |
| Assertiveness | 49.58 | 50.69 | 50.63 | **50.92** |
| Ambition | 61.54 | **63.10** | 62.46 | 61.54 |
| Resilience | 59.00 | 60.43 | **61.02** | 59.57 |
| Integrity | 50.69 | 51.16 | **51.75** | 50.16 |
| Curiosity | **63.28** | 59.60 | 61.17 | 60.71 |

Bold = highest score for that dimension.

Why Do Frontier Models Have Different Personalities?

The large effect sizes between Claude and GPT (g = 1.39 for openness) compared to smaller differences within vendor families (g = 0.32 for Grok) suggest that personality is shaped by vendor-specific choices:

RLHF Objectives

Different labs optimize for different traits during reinforcement learning from human feedback. Claude's Constitutional AI emphasizes harmlessness and curiosity; GPT emphasizes helpfulness and task completion.

System Prompt Engineering

Baked-in system prompts shape baseline behavior. Anthropic's emphasis on thoughtfulness creates higher openness; OpenAI's emphasis on reliability creates higher conscientiousness.

Personality as Product Strategy

Models are products. Claude's “curious and thoughtful” persona differentiates from GPT's “efficient and capable” positioning. Grok positions as “balanced and adaptive.”

Context Sensitivity Design

Grok models adapt 3x more to context than GPT. This may be intentional design to provide more flexible personas for different use cases.

For a deeper dive, see: Why Do LLM Personalities Differ? Hypotheses from 13,825 Evaluations

Methodology

  • Prompts: 500 unique prompts targeting 10 personality dimensions
  • Contexts: 5 conditions (professional, casual, customer support, sales, technical)
  • Evaluations: 9,325 successful responses across 4 models
  • Scoring: Lindr personality analysis API (10-dimensional, 0-100 scale)
  • Generation: Temperature 0.7, max 1,024 tokens

Statistical Methods

  • Effect sizes: Hedges' g (bias-corrected) with 10,000-sample bootstrap 95% CIs
  • Variance decomposition: ANOVA-based partitioning (model, prompt, context, residual)
  • Factor analysis: PCA with varimax rotation; KMO = 0.63
  • Distance metrics: Mahalanobis, cosine, and Euclidean distances

Conclusion

Frontier models from different vendors have distinctly different personalities:

  1. Claude is the curious explorer — highest openness and curiosity, lower conscientiousness.
  2. GPT is the reliable executor — highest conscientiousness and ambition, lower openness.
  3. Grok is the adaptive generalist — balanced traits, highest context sensitivity.
  4. Cross-vendor effects are 3–4x larger than within-vendor — confirming distinct vendor personalities.

See also: Grok 3 vs Grok 4 Family Analysis | GPT-5.2 vs Claude Benchmark | Open-Source Model Benchmark

Monitor Your LLM Personality in Production

Route your LLM traffic through the Lindr gateway to continuously monitor personality drift, enforce brand consistency, and get real-time alerts when your AI's behavior changes.

# Replace your OpenAI base URL with Lindr
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.lindr.io/v1",
    api_key=os.environ["LINDR_API_KEY"],
)

# Your existing code works unchanged
response = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "..."}],
)
#grok #gpt-5 #claude-opus #xai #openai #anthropic #personality #research #effect-size #cross-vendor