No black-box score. Here is exactly how 42 factors across seven categories build the GEO Score - with weightings, data sources, and honest limitations.
Seven categories. 42 weighted core factors, plus ~18 additional sub-checks (60 detail checks in total). Every factor corresponds to a real backend check in the scoring engine.
Not all factors carry equal weight. Structured data and technical fundamentals contribute most to the GEO Score.
Weightings are based on empirical observations across 9 AI engines (ChatGPT, Claude, Perplexity, Gemini, Copilot, DeepSeek, Grok, Z.AI, Kimi) over several months. Structured data (25%) and technical SEO (20%) dominate because AI systems primarily process machine-readable data - not natural-language text. The business data layer (5%) is deliberately low-weighted: missing feeds cost points, but perfect feeds barely lift the score.
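The weighting scheme above can be sketched as a simple weighted sum. Only the structured-data (25%), technical-SEO (20%), and business-data (5%) weights are stated in the text; the remaining four category names and weights below are illustrative placeholders chosen so the total reaches 100%, not the actual scoring configuration.

```python
# Hypothetical sketch of the GEO Score as a weighted category average.
# Weights for structured_data, technical_seo, and business_data come from
# the text; the other four are placeholders summing the total to 1.0.
CATEGORY_WEIGHTS = {
    "structured_data": 0.25,
    "technical_seo": 0.20,
    "content_quality": 0.15,   # placeholder
    "authority": 0.15,         # placeholder
    "freshness": 0.10,         # placeholder
    "ai_visibility": 0.10,     # placeholder
    "business_data": 0.05,
}

def geo_score(category_scores: dict[str, float]) -> float:
    """Combine per-category scores (0-100) into one weighted GEO Score."""
    assert abs(sum(CATEGORY_WEIGHTS.values()) - 1.0) < 1e-9
    # A missing category scores 0, so absent data always costs points.
    return sum(w * category_scores.get(cat, 0.0)
               for cat, w in CATEGORY_WEIGHTS.items())

print(geo_score({cat: 80.0 for cat in CATEGORY_WEIGHTS}))  # ≈ 80.0
```

Note how the structure mirrors the stated design intent: a perfect business-data layer can move the score by at most 5 points, while structured data alone accounts for a quarter of it.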
Honesty is part of the methodology. Here are the actual limits of our system.
We have no privileged access to the internal ranking algorithms of OpenAI, Anthropic, or Google. Our measurement is based on observable output behavior of the models.
A high GEO Score statistically increases the probability of citation - it does not guarantee it. AI answers are non-deterministic.
We measure domain-level visibility. No user profiles, no end-customer tracking - GDPR-oriented by design.
AI engine queries run on a typical 24-hour polling cycle. Real-time visibility is not technically measurable - no platform offers an API for it.
From technical data quality to measurable business impact - four levels that together provide a complete picture.
Foundation: Is the data machine-readable, complete, and fresh?
Is the company mentioned, cited, or recommended in AI answers?
How accurate and positive are AI statements about the company?
Does AI visibility translate to measurable traffic, leads, and revenue?
The GEO Score is not based on gut feeling. These publications form the empirical basis.
Aggarwal et al.
First systematic study on optimization for generative search engines. Defines citations, impressions, and share of voice as primary GEO metrics - basis for our factor categorization.
Stanford Human-Centered AI Institute
Hallucination rates by model and domain. Basis for our hallucination score and the calibration of our 9-engine test matrix.
Patronus AI / IBM Research
Detection benchmarks for factual errors in LLM outputs. Methodology adopted for our 8-layer hallucination detection (Layer 9 = optional AI-semantic extension).
Google LLC
Official specification for structured data. Defines which schema types Google (and thus Gemini) uses for answer generation.
Beconova is not an academic research institution. The cited sources underpin the theoretical foundations - the practical weightings derive from our own empirical measurements across 6+ months of production data.
If anything about our methodology is unclear, contact us directly. No buzzwords, no deflection.