
Module 7 · Advanced · 45 min

Measurement: AISOV, Sentiment, Hallucination Risk, and Narrative Gap

If your GEO report does not have thresholds and trend direction, it is a diary, not a scorecard. Build a measurement system that triggers action and makes GEO defensible to leadership.

Core message of this lesson

GEO becomes strategy only when measured with intent-aware metrics that trigger action. Without thresholds, owners, and trend logic, your dashboard is an expensive diary that describes activity but drives nothing.

By the end of this lesson

  • Good GEO metrics are intent-aware and decision-relevant. Raw mention rate is a vanity metric. Intent-weighted AISOV is a business metric.
  • AISOV needs context, not just volume. Segment by intent bucket and model before drawing any conclusions.
  • Narrative gap and hallucination risk are essential companions to sentiment. Never report sentiment without them.

Why this matters now

Without measurement rigor, GEO stays opinion-driven and underfunded. I have seen teams spend 2 hours a week on GEO reporting that could not answer one business question. A strong scorecard makes prioritization objective, impact defensible, and budget conversations winnable.

Deep explanation

Why vanity metrics are actively misleading your GEO program

Here is the pattern I see constantly. A team starts tracking AI mentions. The number goes up. They report it as progress. Leadership nods. Then nothing changes in pipeline quality, nobody can explain why, and the GEO program gets quietly deprioritized six months later.

Raw mention growth can look positive while commercial outcomes worsen. Mention volume does not tell you whether the model recommends you for the right use cases, positions you favorably against competitors, or describes you accurately on the claims that matter for buyer trust. A brand that is mentioned in 60% of prompts but framed as 'a budget alternative with limited enterprise features' in all of them is not winning. It is losing with high visibility.

A decision-ready scorecard must include presence quality (not just presence), framing accuracy (are the descriptions correct and favorable?), factual reliability (how often does the model get your facts wrong?), and risk concentration by intent (where are the errors happening in the buying journey?). The goal is to measure influence on buying behavior, not content activity.

AISOV done correctly: intent-weighted, not raw

AI Share of Voice is the metric everyone talks about and almost nobody measures correctly. AISOV should be interpreted as structured visibility quality, not just mention share. You need to know where you appear, how you are positioned, and whether those appearances happen in high-intent contexts that drive buying behavior.

Here is the distinction that matters: being mentioned in 40% of prompts means nothing if 35% of those are low-intent informational queries like 'What is GEO?' and only 5% are decision-stage queries like 'Best GEO platform for B2B mid-market.' Raw AISOV treats both equally. Intent-weighted AISOV tells you your actual commercial visibility is 5%, not 40%. That is a completely different story, and it is the story your leadership needs to hear.

Segment AISOV by cluster and model. Global averages hide where you are losing the prompts that matter most. You might have 80% AISOV in educational prompts and 10% in comparison prompts, and the global average of 45% makes everything look fine while your pipeline bleeds.
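The raw-versus-weighted distinction is simple arithmetic, so it is easy to build into a report. A minimal Python sketch, assuming hypothetical prompt-level results (each tagged with an intent bucket and a mentioned flag); the intent weights are illustrative assumptions you would tune to your own funnel, not fixed industry values:

```python
from collections import defaultdict

# Illustrative weights (assumptions): decision-stage prompts count most.
INTENT_WEIGHTS = {"educational": 0.1, "comparison": 0.6, "decision": 1.0}

def aisov(results):
    """Return raw AISOV, intent-weighted AISOV, and per-intent segments.

    `results` is a list of dicts like {"intent": "decision", "mentioned": True},
    one per evaluated prompt.
    """
    raw = sum(r["mentioned"] for r in results) / len(results)
    weighted_hits = sum(INTENT_WEIGHTS[r["intent"]] for r in results if r["mentioned"])
    weighted_total = sum(INTENT_WEIGHTS[r["intent"]] for r in results)
    per_intent = defaultdict(lambda: [0, 0])  # intent -> [mentions, prompts]
    for r in results:
        per_intent[r["intent"]][0] += r["mentioned"]
        per_intent[r["intent"]][1] += 1
    segments = {k: hits / total for k, (hits, total) in per_intent.items()}
    return raw, weighted_hits / weighted_total, segments
```

With 6 mentions in 8 educational prompts and 0 mentions in 2 decision prompts, raw AISOV reads a healthy 60% while the intent-weighted figure collapses to roughly 21%, and the decision segment shows the real 0%.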

Sentiment alone is incomplete: combine it with narrative gap and hallucination risk

Sentiment is the metric teams gravitate to because it feels intuitive: are AI answers positive or negative about us? But sentiment alone is dangerously incomplete. A neutral response can still damage strategy if it omits your key differentiator. A positive response can still be risky if it includes false claims that set wrong expectations.

Narrative gap tracks strategic alignment to your intended pillars. If your target positioning is 'workflow automation platform for operations teams' and the model describes you as 'a project management tool for small teams,' sentiment might be neutral or even positive, but the narrative gap is catastrophic. Hallucination risk tracks error severity and business exposure: how often does the model state something factually wrong about you, and in which prompt contexts?

Together, these three metrics (sentiment, narrative gap, hallucination risk) provide the context needed for reliable prioritization. Sentiment tells you the tone. Narrative gap tells you the strategic alignment. Hallucination risk tells you the factual accuracy. Any one metric alone gives you an incomplete and potentially misleading picture.
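One way to make "never report sentiment alone" operational is to triage every prompt across all three metrics at once, with accuracy outranking framing and framing outranking tone. A hypothetical sketch; the `PromptScore` fields, scales, and threshold values are all assumptions to adapt to your own scorecard:

```python
from dataclasses import dataclass

@dataclass
class PromptScore:
    sentiment: float      # -1 (negative) .. +1 (positive)
    narrative_gap: float  # 0 (on-message) .. 1 (fully off-pillar)
    hallucination: bool   # model stated a false claim about the brand

def triage(score: PromptScore) -> str:
    """Illustrative triage: accuracy first, then alignment, then tone."""
    if score.hallucination:
        return "correct-facts"      # false claims outrank everything else
    if score.narrative_gap > 0.5:
        return "fix-positioning"    # positive-but-wrong framing still loses
    if score.sentiment < 0:
        return "improve-tone"
    return "monitor"
```

Note how a glowing response with a hallucination still lands in "correct-facts", and a positive response framed in the wrong category lands in "fix-positioning": exactly the cases a sentiment-only report would miss.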

Scorecard governance: the 30-minute weekly review that actually drives action

A scorecard is useful only with owners, thresholds, and cadence. Without these three elements, metrics are interesting observations that produce no decisions. Here is the specific weekly review format I recommend. It takes 30 minutes and runs every Monday morning.

Monday review format (30 minutes):

  • First 10 minutes: check the trust score delta from last week. Has it improved, declined, or stayed flat? If it declined, why?
  • Second 10 minutes: scan the top-3 diagnosis issues by severity. Are there new critical or high-severity items? Flag any new severity changes for immediate discussion.
  • Final 10 minutes: review last sprint's corrections. Did they produce measurable improvement? If not, why not? Assign next actions with names and dates.

Monthly reviews (60 minutes) should evaluate trend durability over 4-week periods and question whether your scoring model itself needs updating. Are you measuring the right things? Are thresholds set correctly? This is how GEO measurement becomes an execution engine instead of a reporting artifact.

Mental model

Measure what changes buying decisions: inclusion, framing quality, factual trust, and risk movement by intent. Everything else is noise.

Framework
  1. Define KPI hierarchy

    Separate strategic KPIs (trust score, decision-stage AISOV, hallucination rate) from diagnostic sub-metrics (per-model mention rates, per-cluster sentiment). Strategic KPIs go in your leadership report. Diagnostic metrics go in your sprint planning.

  2. Segment by intent and model

    Track metrics where they matter commercially, not only in aggregate. Decision-stage AISOV, comparison-prompt positioning, and procurement-prompt accuracy are the segments that matter for pipeline. Everything else is context.

  3. Set thresholds and ownership

    Assign clear owners and intervention triggers for each strategic KPI. Example: 'If decision-stage AISOV drops below 30%, the content lead owns a correction sprint starting next Monday.' Without thresholds, metrics are decoration.

  4. Run fixed weekly review

    Use one 30-minute review ritual every Monday to align diagnosis, actions, and accountability. Same format every week. Same attendees. Same output template. Consistency is what makes this work.

  5. Close the loop with post-action analysis

    After every correction sprint, measure the delta. Link specific interventions to measured changes. Update your playbook with what worked and retire what did not. This is how you get better at GEO every month.

Applied case

Case: rising mentions, flat revenue impact, and a team about to lose its budget

A B2B marketing team at a $30M ARR security software company celebrated strong mention growth over three months. Their AISOV went from 25% to 55%. The CMO presented it as a GEO win. But sales-qualified pipeline quality did not improve, and win rates actually declined slightly. The board started asking hard questions.

The problem was their scorecard. It did not segment by intent, so high-funnel educational mentions (which were growing rapidly) masked decision-stage weakness. When they finally ran intent-weighted analysis, the real picture emerged: educational AISOV was 75% (great for brand awareness, low pipeline impact). Decision-stage AISOV was 12% (the actual driver of buyer shortlisting). Narrative-gap analysis showed the brand was still framed as 'a legacy security tool with a complex deployment model' in 8 of 10 procurement prompts.

Scorecard redesign and the recovery

They redesigned their scorecard with three changes. First, they split AISOV into intent-weighted segments with separate thresholds for each. Second, they added hallucination risk and narrative gap as co-equal metrics alongside sentiment. Third, they implemented the Monday 30-minute review with named owners and escalation triggers.

Interventions shifted from broad content publishing to decision-stage evidence corrections and comparison page rewrites. Within two cycles, decision-stage AISOV improved from 12% to 31%. The procurement-prompt narrative shifted from 'legacy tool' to 'modern deployment with self-serve option.' More importantly, the team could now explain exactly what they were doing, why it mattered, and how it was working, which saved the GEO program's budget for the next fiscal year.

Captoo execution playbook

Mission in Captoo

Operationalize a weekly GEO scorecard that drives corrective action, verifies outcome movement, and makes GEO performance defensible to leadership.

Where to click

Overview · SOV · Sentiment · Narrative gap · Unified Report

Execution steps

Step 1: Overview

Set top-line baseline

  • Record trust score and summary KPI status for the cycle. This is your weekly starting point.
  • Confirm target ranges for this sprint period. If you do not have thresholds, set them now: what score would trigger action?
Step 2: SOV

Track share dynamics

  • Review share movement by cluster and competitor, segmented by intent. Do not look at global averages.
  • Flag strategic clusters with negative trend over 2+ weeks. These need investigation in this week's review.
Step 3: Narrative gap

Assess narrative alignment

  • Score each priority pillar against model output. Is the gap closing, stable, or widening?
  • Escalate pillars with persistent negative deviation that has not responded to corrections. These may need external source work.
Step 4: Sentiment

Monitor trust quality

  • Check whether sentiment shifts align with factual reliability improvements. Sentiment improving while hallucinations persist is a false positive.
  • Separate tone changes from structural narrative corrections. You want both, but factual accuracy comes first.
Step 5: Unified Report

Publish and assign

  • Generate scorecard report with owner-specific actions and deadlines for each flagged item.
  • Use report outputs to lock next sprint backlog. No new priorities after Monday's review unless a critical incident is detected.

Decision rules (if/then)

  • If decision-stage AISOV declines for 2 consecutive weeks, prioritize comparative and proof interventions immediately. This is a pipeline emergency.
  • If sentiment rises but narrative gap worsens, prioritize positioning corrections. Positive tone with wrong framing is more dangerous than neutral tone with correct framing.
  • If hallucination risk exceeds your defined threshold on any model, trigger immediate correction protocol. Do not wait for the weekly review.
  • If metrics move but sales quality does not improve, reweight your cluster priorities. You may be measuring the wrong prompts.
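The first three rules are mechanical enough to run automatically against weekly metric snapshots before the Monday review. A sketch under assumed field names (`decision_aisov`, `narrative_gap`, and so on); the fourth rule, reweighting cluster priorities when sales quality stalls, needs human judgment and is deliberately left out:

```python
def weekly_actions(history):
    """Apply the mechanical decision rules to weekly metric snapshots.

    `history` is a list of per-week dicts, oldest first; the newest snapshot
    must also carry hallucination data. Field names are illustrative.
    """
    actions = []
    cur, prev = history[-1], history[-2]
    older = history[-3] if len(history) >= 3 else None
    # Rule 1: decision-stage AISOV down two consecutive weeks.
    if older and cur["decision_aisov"] < prev["decision_aisov"] < older["decision_aisov"]:
        actions.append("prioritize comparative and proof interventions")
    # Rule 2: sentiment up while narrative gap worsens.
    if cur["sentiment"] > prev["sentiment"] and cur["narrative_gap"] > prev["narrative_gap"]:
        actions.append("prioritize positioning corrections")
    # Rule 3: hallucination risk above threshold on any model.
    if any(v > cur["hallucination_threshold"]
           for v in cur["hallucination_by_model"].values()):
        actions.append("trigger immediate correction protocol")
    return actions
```

Running this on Friday's data means Monday's 30 minutes start from flagged actions, not from reading dashboards.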

Output artifact for your team

Weekly GEO Scorecard with KPI deltas, intent-weighted segments, risk alerts, action owners, and verification plan for each correction.

Success metrics to verify next cycle

  • Improved decision-stage AISOV over baseline within 3 sprint cycles.
  • Lower high-severity hallucination incidence, targeting below 10% of decision-stage prompts.
  • Stronger narrative alignment on priority pillars, measured by gap reduction.
  • Consistent owner-driven weekly action closure with documented outcomes every Monday.
Common mistakes
  • Tracking global averages that hide strategic cluster failures. Your overall AISOV can look great while decision-stage visibility collapses.
  • Treating sentiment as equivalent to positioning quality. A positive description in the wrong category is worse than a neutral description in the right category.
  • Running scorecards without thresholds or accountable owners. If nobody is responsible for a metric, nobody will fix it when it declines.
  • Reporting KPI movement without linking it to specific actions. 'Visibility improved 8%' is useless. 'Visibility improved 8% in decision-stage prompts after comparison page rewrite' is actionable intelligence.
Key takeaways
  • Good GEO metrics are intent-aware and decision-relevant. Raw mention rate is a vanity metric. Intent-weighted AISOV is a business metric.
  • AISOV needs context, not just volume. Segment by intent bucket and model before drawing any conclusions.
  • Narrative gap and hallucination risk are essential companions to sentiment. Never report sentiment without them.
  • Owners and thresholds make scorecards operational. Without them, your dashboard is an expensive diary.
  • The Monday 30-minute review is the minimum viable governance for a serious GEO program. Skip it and you lose your execution rhythm.


Move from lesson to execution

Apply this module on real prompts, real competitors, and real KPI movement inside your Captoo workspace.
