This page discloses every methodological decision. Each layer is anchored in research, every finding traceable through verbatim evidence — especially for readers who are initially skeptical of our results.
The six dimensions and the 0–10 scale are Klyptra's operationalization of the Media Bias Taxonomy (Spinde et al. 2023) — not a direct part of BABE, which annotates binarily (biased/neutral). The scale is calibrated against BABE-style expert benchmarks. The five bands are not equally wide: news-agency reporting (dpa, AFP, Reuters) typically hits 8–9 — a value of 10 would be pure fact-listing with no narrative selection at all, practically unreachable. Downward, by contrast, there is more room: the propaganda band (0–2.9), three points wide, is the largest.
| Score | Band width | Label | Description |
| --- | --- | --- | --- |
| 9 – 10 | 1 point | sehr_objektiv | Near-neutral reporting. No evaluative adjectives, balanced sources, speculation clearly marked as such. News-agency level (dpa, AFP, Reuters). |
| 7 – 8.9 | 2 points | objektiv | Solid journalistic standards. Occasional evaluations detectable, but transparently marked as opinion. Multiple perspectives represented. |
| 5 – 6.9 | 2 points | moderat_biased | Clearly recognizable editorial line. Word choice with a tendency, one-sided source selection, but no systematic distortion. |
| 3 – 4.9 | 2 points | stark_biased | Consistently one-sided presentation. Loaded words without labeling, omission of exculpatory facts, emotional charging. |
| 0 – 2.9 | 3 points | propaganda | Facts are distorted, the other side not quoted or only as a straw man, sensational framing dominates. Widest band: there is more room downward than upward. |
The labels in this table are exactly the strings the analyzer outputs in the JSON and the permalink UI (in German) — no UI mapping in between.
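The band table can be read as a simple lookup. The following Python sketch maps a score to its label. The function name `band_label` and the treatment of values that fall between two bands (for example 8.95, assigned here to the lower band) are assumptions for illustration, not documented analyzer behavior.

```python
# Lower bounds and labels are taken from the band table.
# Assumption: a score belongs to the highest band whose lower bound it reaches.
BANDS = [
    (9.0, "sehr_objektiv"),
    (7.0, "objektiv"),
    (5.0, "moderat_biased"),
    (3.0, "stark_biased"),
    (0.0, "propaganda"),
]


def band_label(score: float) -> str:
    """Map a 0-10 score to its band label (hypothetical helper)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: 0.0 is the lowest bound")
```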
The six dimensions — in depth
What each dimension measures and where it comes from.
Framing
Media Bias Taxonomy (Spinde et al. 2023) · Framing bias
Which perspective is declared the narrative norm? Who is subject, who is object?
Operationalization
Active/passive constructions with political asymmetry
Order in which actors are named
Implicit attribution of blame through verb choice
Word choice
Media Bias Taxonomy (Spinde et al. 2023) · Lexical bias
Which words carry judgments without marking them? Loaded language in the narrow sense.
Operationalization
Verbatim identification of evaluative terms
Comparison with neutral synonyms
Density per 1000 words
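As an illustration of the density measure, here is a minimal Python sketch. It assumes that the evaluative terms have already been identified and are single words, and that words are split on non-word characters. The function name and the tokenization are assumptions, not the analyzer's actual procedure.

```python
import re


def density_per_1000_words(text: str, evaluative_terms: list[str]) -> float:
    """Occurrences of evaluative terms per 1000 words (hypothetical helper)."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    terms = {term.lower() for term in evaluative_terms}
    hits = sum(1 for word in words if word in terms)
    return 1000 * hits / len(words)
```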
Source diversity
Media Bias Taxonomy (Spinde et al. 2023) · Selection/Coverage
How many voices are quoted directly, how politically broad is the spectrum?
Operationalization
Number of directly quoted people / institutions
Political positioning of those quoted
Ratio of primary to secondary sources
In the book (Spinde 2025, Ch. 2), source/selection bias is a reporting-level construct that is, strictly speaking, measured across articles. Klyptra approximates it on the single text; the full cross-outlet analysis sits at Tier 2 (not in the score).
Fact / opinion
Media Bias Taxonomy (Spinde et al. 2023) · Epistemological bias
Is evaluation linguistically separated from observation — or sold as fact?
Operationalization
Marking of commentary (“claims”, “according to X”)
Forecasts vs. facts
Subjunctive discipline
This is a dimension of its own because German news language interweaves evaluation and observation especially tightly at the syntactic level (nominalization, modal verbs, subjunctive I/II); the English BABE annotation covers this only indirectly.
Completeness
Media Bias Taxonomy (Spinde et al. 2023) · Spin/Omission
What is left out? Which relevant background or counter-positions are missing?
Operationalization
Recognizably missing counter-positions to central claims
One-sided fact selection (cherry-picking)
Context gaps that distort the framing
Omission/spin bias is in part reporting-level in the book (what is missing across several articles). Klyptra assesses the gaps recognizable in the single text; the cross-article level is Tier 2.
Emotional balance
Media Bias Taxonomy (Spinde et al. 2023) · Phrasing/Sentiment
How strongly is it emotionally charged? Sensational or outrage language?
The six dimensions are the top-level axes. Beneath them, Klyptra maintains 27 specific bias patterns following the BiasScanner taxonomy (Menzner & Leidner 2024). Per article, 0–N patterns are identified — each with verbatim evidence, position in the text and a strength assessment.
Sub-categories are qualitative markers, not numeric sub-scores. If a top dimension such as “Completeness” is rated low, the sub-layer shows which concrete pattern carries the finding — e.g. Cherry-Picking or Whataboutism.
Word choice
word_choice
4 patterns
Lexical level — words that evaluate without marking the evaluation as such.
Word Choice Bias
Example: A “migrant” is consistently called an “intruder”.
Emotional Sensationalism
Example: “Nightmare scenario”, “shock diagnosis”, “mood of doom” as routine vocabulary.
Discrimination Bias
Example: Generalization about groups (“typical of …-migrants”), mentioning origin without relevance.
Smear / Praise Bias
Example: “scandalous attempt” for one party vs. “bold initiative” for the other, for the same kind of action.
Framing
framing
6 patterns
Narrative constructs — how a matter is framed in storytelling, independent of individual words.
Straw Man
Example: “The left wants every migrant to get a house immediately.” A caricature of the opposing position is attacked.
False Dichotomy
Example: “Either we cut taxes, or the country collapses.” Two options are suggested where many exist.
False Analogy
Example: “Just like back in 1933 …” for a current political debate with only a loose connection.
Insinuative Questioning
Example: “Why is the chancellor silent on the accusations?”, asked without the accusations themselves being substantiated.
Moving Goalposts
Example: “5% growth was expected, now it's 8%, so a failure.” The yardstick is adjusted after the result.
In-Group / Out-Group Bias
Example: Consistent “we Germans” vs. “them”, with collective assignment of guilt or virtue.
Source diversity
source_diversity
3 patterns
Source quality — who is heard, how they are quoted, whether the voices are classifiable.
Source Selection Bias
Example: Only one party's press office is quoted; the other side not at all or only paraphrased.
External Validation Bias
Example: A lobbyist is introduced as an “independent expert” without naming their interests.
Vague Attribution
Example: “Circles report …”, “according to insiders …” carry the central argument of the piece.
Fact–opinion separation
fact_opinion_separation
5 patterns
Linguistic discipline — whether evaluation is marked as evaluation or sold as fact.
Opinionated Bias
Example: “The government's catastrophic policy …”, an evaluative adjective in news mode.
Speculation Bias
Example: “This will undoubtedly end in disaster.” Forecast without subjunctive, without source.
Unsubstantiated Claims
Example: “Millions are affected”, a number without evidence, source or method.
Projection Bias
Example: “They only care about power.” Attribution of motive as a statement of fact.
Circular Reasoning
Example: “It is illegal because it breaks the law.” The justification repeats the claim.
Completeness
completeness
4 patterns
What is missing — relevant context, counter-arguments, evidence that goes unmentioned.
Cherry-Picking
Example: One study is cited; three methodologically comparable studies with the opposite result are not.
Anecdotal Evidence
Example: “Ms. M. from Hamburg says …” carries a trend finding; statistical data are missing.
Whataboutism
Example: Consistent deflection from the main accusation onto the behavior of other actors.
False Balance
Example: Climate scientists and climate deniers are presented as equivalent voices, although the evidence base is asymmetric.
Emotional balance
emotional_balance
5 patterns
Affective charge — how strongly and in which direction the text colors emotionally.
Ad Hominem
Example: “The incompetent minister …”: the person is attacked instead of the argument being refuted.
Causal Misunderstanding
Example: “Since X has governed, Y has fallen, so X is to blame.” Correlation as causation, without a mechanism.
Generalization
Example: “All politicians lie”, “the media” as a monolithic actor.
Commercial Bias
Example: A product report without distance; editorial content not separated from advertising.
Political Bias
Example: A consistent camp tendency across fact selection, word choice and sources.
Every sub-finding passes the same verbatim gate as the top-level analysis: findings without a quote that can be verified in the original text are discarded. In the JSON export of an analysis, the layer appears as sub_categories[] with parent_dimension, verbatim_quote, char_offset and bias_strength.
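The gate for sub-findings can be sketched in a few lines of Python. The field names `parent_dimension`, `verbatim_quote`, `char_offset` and `bias_strength` come from the export described above. The function name, the use of the first occurrence as the offset, and the example values are assumptions for illustration.

```python
def apply_verbatim_gate(text: str, findings: list[dict]) -> list[dict]:
    """Keep only findings whose verbatim_quote occurs exactly in the text.

    Hypothetical helper: the offset recorded is that of the first
    occurrence, which is an assumption.
    """
    kept = []
    for finding in findings:
        quote = finding.get("verbatim_quote", "")
        offset = text.find(quote) if quote else -1
        if offset == -1:
            continue  # no literal match: the finding is discarded
        kept.append({**finding, "char_offset": offset})
    return kept


# Illustrative input; the bias_strength values are made up.
article = "The government's catastrophic policy was criticized by the opposition."
candidates = [
    {"parent_dimension": "fact_opinion_separation",
     "verbatim_quote": "catastrophic policy", "bias_strength": "medium"},
    {"parent_dimension": "completeness",
     "verbatim_quote": "disastrous policy", "bias_strength": "high"},
]
sub_categories = apply_verbatim_gate(article, candidates)
```

In this example only the first candidate survives, because the second quote is a paraphrase that does not appear literally in the article.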
Actor analysis — PFA-light
Who is put in which light?
Person-Oriented Framing Analysis (Felix Hamborg 2023) extracts the named actors per article and describes how they are talked about. That is more concrete than any holistic score — and makes systematic asymmetries between actors visible.
Per actor
Each identified person (politician, scientist, citizen, …) gets four fields:
mentions_count
How often the actor appears — across all designations (see Coreference).
sentiment_score
Aggregated tone toward the actor on a scale from −1 (negative) to +1 (positive).
framing_devices
Up to five recurring stylistic devices per actor — e.g. “attribution of blame”, “hero narrative”, “victim staging”.
representative_quotes
Three to five verbatim quotes that carry the framing — the verbatim gate ensures each quote appears 1:1 in the text.
Cross-person analysis
From the individual actors, a distribution measure is computed that shows whether the article treats the people comparably in language.
sentiment_disparity
Difference between the most positive and most negative actor sentiment in the article — computed over all actors with mentions_count ≥ 2.
disparity = max(sentiments) − min(sentiments)
balance_assessment
A qualitative classification of the disparity: balanced / slightly_asymmetric / strongly_asymmetric. With fewer than 2 actors with sufficient mentions, the field returns not_applicable — instead of inventing a value.
Aggregation from models
cross_person_analysis is not taken from the language model but recomputed deterministically from the filtered actor data. This keeps the disparity metric always consistent with the reported sentiment values — even if the model would summarize differently internally.
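A minimal Python sketch of this deterministic recomputation follows. The disparity formula, the `mentions_count ≥ 2` filter and the four `balance_assessment` values are taken from the description above. The two thresholds that separate the classes are not documented here, so the values `0.3` and `0.8` are purely hypothetical placeholders.

```python
from typing import Optional


def sentiment_disparity(actors: list[dict], min_mentions: int = 2) -> Optional[float]:
    """max - min of sentiment_score over actors with enough mentions."""
    sentiments = [
        actor["sentiment_score"]
        for actor in actors
        if actor["mentions_count"] >= min_mentions
    ]
    if len(sentiments) < 2:
        return None  # too few qualifying actors: no value is invented
    return max(sentiments) - min(sentiments)


def balance_assessment(disparity: Optional[float],
                       slight: float = 0.3, strong: float = 0.8) -> str:
    """Classify the disparity; both thresholds are hypothetical."""
    if disparity is None:
        return "not_applicable"
    if disparity < slight:
        return "balanced"
    if disparity < strong:
        return "slightly_asymmetric"
    return "strongly_asymmetric"
```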
Coreference documentation
The same person, three names.
Political texts reference the same entity in several ways — by name, by role, by pronoun. Klyptra documents these cross-references explicitly so that mention counts and actor sentiment are not distorted by mere synonymy.
What happens
Per article, a list coreference_documentation.entities[] is reported. Each entity has a canonical_name and a list of all all_mentions[] found in the text.
mention_count is then recomputed deterministically as the sum of non-overlapping substring matches of all mentions in the text — not taken from the language model.
Example
In a report on Ukraine policy:
canonical_name
Volodymyr Zelensky
all_mentions
“Zelensky”
“the Ukrainian president”
“the head of state in Kyiv”
mention_count
7
Without this resolution, the actor would land in three different buckets — and the sentiment aggregation in the PFA layer would be distorted.
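The deterministic recount can be sketched as follows in Python. The sketch assumes case-sensitive matching and resolves overlaps by counting longer mentions first. Both choices, and the function name, are assumptions and not confirmed analyzer behavior.

```python
def count_mentions(text: str, all_mentions: list[str]) -> int:
    """Count non-overlapping substring matches of all mentions in the text."""
    taken: list[tuple[int, int]] = []  # character spans already counted
    # Longest mentions first, so a short name inside a longer mention
    # is not counted twice. dict.fromkeys removes duplicates, keeps order.
    ordered = sorted(dict.fromkeys(all_mentions), key=len, reverse=True)
    for mention in ordered:
        if not mention:
            continue
        start = text.find(mention)
        while start != -1:
            end = start + len(mention)
            if all(end <= s or start >= e for s, e in taken):
                taken.append((start, end))
                start = text.find(mention, end)
            else:
                start = text.find(mention, start + 1)
    return len(taken)
```

With the mentions `["President Zelensky", "Zelensky"]` and the text `"President Zelensky spoke. Zelensky left."`, the function counts two mentions rather than three, because the shorter name inside the longer match is skipped.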
Pipeline
From submitted text to substantiated result.
Every step is independently testable and logged. Anyone who questions a result can trace the chain back to the individual piece of evidence in the text.
01
Input
Submitted text
The article text to be checked is submitted directly (paste or file) — optionally with a title and source label. Klyptra analyzes exactly this text, not the outlet behind it.
02
Three language models independently rate the same text on all six dimensions and extract the detail layers in parallel: sub-category findings, actor mentions with sentiment, and coreference clusters. A chain-of-verification (5 control questions) reduces hallucinations.
gpt-5.4-mini · mistral-large-2512 · deepseek-v4-flash
03
Aggregation
Median + agreement
Numeric scores: median across the three models. Labels: majority vote. The spread of the individual judgments is reported per dimension as model agreement — it stays visible instead of vanishing into the average.
Median · Majority vote · Agreement report
04
Verbatim gate & markup
Evidence checked literally
Every finding must carry a quote that appears exactly in the original text — otherwise the evidence is discarded (the assessment remains marked as model-based). Verified evidence is highlighted in the text with a character offset. The result is a permalink with a 30-day TTL.
exact string match · char_offset markup · Permalink 30 days
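How verified evidence might be highlighted from its character offset is sketched below in Python. The marker strings `[[` and `]]`, the function name and the handling of overlapping spans are assumptions. The actual permalink UI may render highlights differently.

```python
def mark_evidence(text: str, evidence: list[dict],
                  open_tag: str = "[[", close_tag: str = "]]") -> str:
    """Wrap each verified quote in markers, located by its char_offset."""
    spans = []
    for item in evidence:
        start = item["char_offset"]
        end = start + len(item["verbatim_quote"])
        # Re-check the exact match before highlighting anything.
        if text[start:end] == item["verbatim_quote"]:
            spans.append((start, end))
    spans.sort()

    parts, cursor = [], 0
    for start, end in spans:
        if start < cursor:
            continue  # assumption: skip spans overlapping an earlier highlight
        parts.append(text[cursor:start])
        parts.append(open_tag + text[start:end] + close_tag)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)
```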
One analyzer, two uses
The same analyzer code runs in two contexts:
On-demand (via /analyse): the actual analysis of a submitted text — with a permalink, 30-day TTL.
Reference corpus (internal): a continuously co-analyzed corpus serves exclusively to calibrate the scale. It produces no public outlet profiles and does not feed into individual user analyses.
Ensemble
Three models, because none alone is trustworthy.
Language models have their own, model-specific biases. Klyptra picks three models from three different pre-training pipelines so that the blind spots of a single model become visible through the others. Aggregation is by median (numeric) and majority vote (labels) — all models carry equal weight, none is preferred.
GPT-5.4 mini
OpenAI (USA)
A lean, low-latency variant of the GPT-5.4 line — it carries the on-demand analysis without giving up the score consistency of the larger models.
Mistral Large 2512
Mistral AI (France / EU)
A European pre-training pipeline from an independent provider — it brings a different training basis than the US models.
DeepSeek V4 Flash
DeepSeek (China)
A third, independent pre-training corpus — if this model deviates from the consensus, the spread becomes visible as model agreement.
Aggregation
Numeric scores
Median across all 3 models. Robust against outliers of a single model.
Categorical labels
Majority vote. On a 1:1:1 split, the label falls back to that of the model whose score is nearest the median.
Model agreement
The spread of the three judgments is reported per dimension as high, medium or low agreement — separate from the model's confidence.
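The three aggregation rules can be sketched in Python as follows. Median, majority vote and the fallback on a 1:1:1 split follow the description above. The spread thresholds used to band agreement into high, medium and low are not documented on this page, so `1.0` and `2.5` are hypothetical values, as are the function names.

```python
from collections import Counter
from statistics import median


def aggregate_score(scores: list[float]) -> float:
    """Median across the model scores."""
    return median(scores)


def aggregate_label(labels: list[str], scores: list[float]) -> str:
    """Majority vote; without a majority, use the label of the model
    whose score is nearest the median."""
    label, votes = Counter(labels).most_common(1)[0]
    if votes > len(labels) / 2:
        return label
    mid = median(scores)
    nearest = min(range(len(scores)), key=lambda i: abs(scores[i] - mid))
    return labels[nearest]


def model_agreement(scores: list[float],
                    high_max_spread: float = 1.0,
                    medium_max_spread: float = 2.5) -> str:
    """Band the spread of the scores; both thresholds are hypothetical."""
    spread = max(scores) - min(scores)
    if spread <= high_max_spread:
        return "high"
    if spread <= medium_max_spread:
        return "medium"
    return "low"
```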
Scientific foundations
A peer-reviewed foundation, one methodology.
Klyptra is not a home-grown construct. Every methodological layer references a peer-reviewed source with a DOI. It differs from BiasScanner (our direct predecessor) in the three-model ensemble, the German-language specialization, the verbatim gate and the PFA layer.
Concept & 6 dimensions
Spinde et al. (2023) — Media Bias Taxonomy
The six dimensions operationalize the bias types of the Media Bias Taxonomy. BABE (Spinde et al. 2021, EMNLP-Findings) serves as an expert-annotated validation benchmark (binary biased/neutral) and IRR yardstick — the 0–10 scale itself is Klyptra's operationalization, not part of BABE.
Spinde et al. (2023) Media Bias Taxonomy, ACM Comput. Surv., arXiv:2312.16148 · BABE: EMNLP-Findings 2021, DOI: 10.18653/v1/2021.findings-emnlp.101 · consolidated in Spinde (2025), Springer, Open Access, DOI: 10.1007/978-3-658-47798-1
The 27 specific bias categories are placed as a sub-layer beneath the six Klyptra dimensions. BiasScanner is Klyptra's direct scientific predecessor. Not part of Spinde's work — an independent foundation.
Menzner & Leidner (2024) “Improved Models for Media Bias Detection and Subcategorization”, NLDB 2024, pp. 181–196. DOI: 10.1007/978-3-031-70239-6_13
Methodological honesty requires naming the weak points. This list is not exhaustive — contributions are welcome.
LLMs as annotators
Language models have documented biases (Horych et al. 2025). Ensemble + verbatim requirement reduce this but do not eliminate it. Klyptra is not an arbiter — it is a systematic, verifiable indicator.
A snapshot of one text
Klyptra rates exactly the submitted text — not the outlet, newsroom or author behind it. A single result is not a verdict on an outlet; no general “tendency” of a source can be derived from one analysis.
Language level, not factual accuracy
Klyptra measures language and framing — not whether claimed facts are correct. Fact-checking is a separate task (see Correctiv, dpa fact-check).
Evidence without literal coverage is discarded
The verbatim gate keeps only quotes that appear exactly in the text. Paraphrases are not output as evidence — some dimensions therefore deliberately appear without an evidence quote and are marked as a model-based assessment. Honesty over forced evidence.
Detail layers not yet ensemble-aggregated
The six top-level scores are aggregated across all three models (median) and reported with their spread. The detail layers (sub-categories, actor analysis, coreference) currently come from one of the three models — a true union/dedupe aggregation across all models is planned as the next stage. A deliberate MVP decision, not a bug.
German-language specialization
The three models are primarily pre-trained in English. Idiom, subjunctive discipline and irony detection can be thinner in German than in English. Few-shot examples compensate in part; a systematic German ground-truth evaluation is on the research roadmap.
Genre bias: commentary as news
The six dimensions are calibrated for news reporting. If a commentary is submitted, word choice and emotional balance swing strongly, as expected; Klyptra cannot yet reliably tell whether a text should be classified as a report or a commentary. Genre detection as a preliminary stage is documented in the methodology roadmap.
Bias annotation is constitutively subjective
Even trained experts reach an agreement of only Krippendorff α ≈ 0.40 on bias labels (Spinde 2025, Ch. 4). That is the documented maximum of the domain, not a sign of weakness. There is no strong “ground truth”; Klyptra's consistency aim targets this expert level, not objective truth.
Technically measurable is not socially relevant
Automated methods can report patterns that are statistically tangible but substantively irrelevant (Spinde 2025, Ch. 8). A score is a systematic indicator, not a final verdict on the significance of a text.
Your own standpoint colors perception
Readers perceive texts that contradict their position as more biased than comparable texts on their own side (hostile-media effect). Studies show that even bias visualizations do not resolve this effect (Spinde 2025, Ch. 7) — a result is read filtered through one's own stance.
Political classification is not a bias statement
Spinde (2025, Ch. 7) shows that a political classification does not increase bias perception — it communicates stance, not distortion. Klyptra's political, economic and social descriptors are a tendency indication and do not feed into the objectivity score.
Version history
When what changed.
Every single analysis carries a methodology_version tag. Existing analyses keep their version — methodology updates create no score drift in historical data.
v1.0
April 2026
6 top-level dimensions + verbatim-quote gate + multi-model ensemble. Scientific basis: Media Bias Taxonomy (Spinde et al. 2023), validated against BABE (2021).
v1.1
May 2026
Added: 27 bias sub-categories (BiasScanner), Person-Oriented Framing Analysis (PFA-light after Felix Hamborg), coreference documentation. Existing v1.0 analyses remain valid — the new layers are additive.
Reproducibility
Methodology, data, prompts.
Klyptra discloses its methodology: model versions, bias dimensions and the underlying research are documented on this page. Every analysis also carries a signature (system-prompt hash) that records its exact methodological state.
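One way such a signature could be derived is sketched below in Python. The page does not specify the hash function, so SHA-256 is an assumption. The key name `signature` and the helper names are hypothetical as well; `methodology_version` is the tag described in the version history.

```python
import hashlib


def prompt_signature(system_prompt: str) -> str:
    """Hex digest of the system prompt (SHA-256 is an assumption)."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()


def analysis_stamp(system_prompt: str, methodology_version: str) -> dict:
    """Metadata recording the methodological state of one analysis."""
    return {
        "methodology_version": methodology_version,
        "signature": prompt_signature(system_prompt),  # hypothetical key name
    }
```

Any change to the prompt text, including few-shot examples, yields a different digest, so two analyses with the same signature were produced under the same prompt.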
Methodology
Bias dimensions, scale and aggregation logic are described in full on this page.
Data
Every analysis is exportable as JSON and PDF for its creator.
Prompts
Versioned prompt templates incl. few-shot examples. Diff log for every change.