Every score on Podlens is explainable, tied to transcript evidence, and applied symmetrically across the political spectrum. This page is for anyone who wants to understand, verify, or challenge our scoring — journalists, researchers, and users who got a result they disagree with.
The political bias score is a single number from −100 (far left) to +100 (far right), with 0 representing balanced or neutral content. It is the primary output of every analysis.
The score is derived from four weighted signals detected in the transcript: language patterns, regulatory framing, value assumptions, and perspective omission.
Each signal is scored independently from the transcript, then combined into the final score. The model is instructed to evaluate these framing signals, not the political positions themselves.
±0–15 = Center
±16–35 = Lean left / Lean right
±36–60 = Left / Right
±61–80 = Strong left / Strong right
±81–100 = Far left / Far right
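For readers who think in code, here is a minimal sketch of how four independently scored signals could be combined and mapped onto the bands above. The signal names follow the list earlier on this page, but the weights, the linear combination, and the clamping are illustrative assumptions, not the production formula.

```typescript
// Sketch only: the real weights and combination are internal to Podlens;
// the values below are illustrative assumptions.
type BiasSignals = {
  languagePatterns: number;   // each signal scored on the same -100..+100 scale
  regulatoryFraming: number;
  valueAssumptions: number;
  perspectiveOmission: number;
};

const WEIGHTS: Record<keyof BiasSignals, number> = {
  languagePatterns: 0.3,
  regulatoryFraming: 0.25,
  valueAssumptions: 0.25,
  perspectiveOmission: 0.2,
};

function combineBiasScore(signals: BiasSignals): number {
  // Weighted average, clamped to the documented -100..+100 range.
  const raw = (Object.keys(WEIGHTS) as (keyof BiasSignals)[])
    .reduce((sum, key) => sum + WEIGHTS[key] * signals[key], 0);
  return Math.max(-100, Math.min(100, Math.round(raw)));
}

function biasBand(score: number): string {
  const magnitude = Math.abs(score);
  const direction = score < 0 ? "left" : "right";
  if (magnitude <= 15) return "Center";
  if (magnitude <= 35) return `Lean ${direction}`;
  if (magnitude <= 60) return direction === "left" ? "Left" : "Right";
  if (magnitude <= 80) return `Strong ${direction}`;
  return `Far ${direction}`;
}
```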
The score reflects how the episode is framed, not the topic itself or the guest's personal views. A conservative economist can be interviewed in a neutral, left-leaning, or right-leaning way depending on the questions asked and the framing of the host.
The single bias score is a useful summary, but it hides important distinctions: a podcast can be factually rigorous but politically one-sided, or politically balanced but missing critical perspectives. The 6-dimension framework separates these signals.
All 6 scores are visible on every tier, including free. Starter Lens and above unlock the transcript evidence quotes behind each score.
The host credibility score is a 0–100 rating of how rigorously a host holds their guests and themselves to evidential standards. It is episode-level — the same host can score differently across episodes depending on the guest and topic.
Pushback rate — what fraction of contestable claims received a follow-up question, challenge, or request for evidence. A host who accepts all guest claims without question scores 0 on this factor.
Citation quality — when the host introduces external information, how well-sourced is it? Links to primary sources, named institutions, or publication titles score higher than vague references.
Correction rate — does the host correct factual errors from previous episodes? Does the host acknowledge when a guest's claim is disputed by other evidence?
Consistency — are the same standards applied to guests across the political spectrum? A host who challenges left-leaning guests but not right-leaning ones (or vice versa) scores lower on consistency.
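As a rough illustration of how these four factors could roll up into the 0–100 score, here is a sketch. The factor names mirror the list above; the equal weights and the linear combination are assumptions, not the actual Podlens internals.

```typescript
// Illustrative sketch: factor names come from the docs above, but the
// weights and the formula are assumptions.
type CredibilityFactors = {
  pushbackRate: number;     // 0..1 share of contestable claims challenged
  citationQuality: number;  // 0..1 quality of sources the host introduces
  correctionRate: number;   // 0..1 willingness to correct and flag disputes
  consistency: number;      // 0..1 same standards across the spectrum
};

function hostCredibilityScore(f: CredibilityFactors): number {
  // Hypothetical equal weighting; returns the 0-100 episode-level score.
  const weighted =
    0.25 * f.pushbackRate +
    0.25 * f.citationQuality +
    0.25 * f.correctionRate +
    0.25 * f.consistency;
  return Math.round(100 * weighted);
}
```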
A low score does not mean the host is dishonest or bad — it means the episode functions more as a platform than an interrogation. Some of the world's most insightful long-form podcasts score low on credibility while scoring high on content depth. The score measures rigor, not quality.
Factual density measures how much of the episode's content is grounded in verifiable, sourced claims versus opinion, assertion, or speculation. It does not measure whether claims are correct — only whether they are supported.
The model distinguishes between:
Sourced claims — the speaker cites a study, institution, publication, or named expert. E.g. "According to the IMF's 2024 report..."
Verifiable claims — the claim references a publicly known fact that could be checked, even if not explicitly sourced. E.g. "NVIDIA's market cap passed $1 trillion in 2023."
Asserted claims — stated as fact without source or verification path. E.g. "AI will replace 40% of jobs within a decade." These reduce the score.
Opinion/speculation — clearly framed as personal view or prediction. These are excluded from the calculation entirely — opinion is not a flaw, only unsourced factual claims are.
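A minimal sketch of the arithmetic this implies, assuming a simple ratio of supported claims to all factual claims; the real weighting between claim types may differ.

```typescript
// Rough sketch of the density calculation described above; the exact
// formula is an assumption.
type ClaimCounts = {
  sourced: number;     // cites a study, institution, publication, or expert
  verifiable: number;  // checkable public fact, even if not explicitly sourced
  asserted: number;    // stated as fact with no source or verification path
  opinion: number;     // excluded from the calculation entirely
};

function factualDensity(c: ClaimCounts): number {
  // Opinion/speculation is left out of both numerator and denominator.
  const factualClaims = c.sourced + c.verifiable + c.asserted;
  if (factualClaims === 0) return 0;
  return Math.round((100 * (c.sourced + c.verifiable)) / factualClaims);
}
```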
Source diversity measures how many distinct, independent perspectives are present. A single guest in a two-hour conversation with no external sources represents minimal diversity — even if the host is excellent and the content is high-quality.
Source categories tracked: academic researchers, policy experts, industry insiders, civil society, independent journalists, government/regulatory bodies, and first-person accounts. Episodes that draw from multiple categories across ideological lines score highest.
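One simple way to picture this dimension is counting how many of the tracked categories appear in an episode. The category list below mirrors the one above; the mapping to a 0–100 score is an illustrative assumption and ignores the extra credit for ideological spread described above.

```typescript
// Sketch under assumptions: categories mirror the docs, but the scoring
// mapping is illustrative, not the real formula.
const SOURCE_CATEGORIES = [
  "academic researcher",
  "policy expert",
  "industry insider",
  "civil society",
  "independent journalist",
  "government/regulatory body",
  "first-person account",
] as const;

type SourceCategory = (typeof SOURCE_CATEGORIES)[number];

function sourceDiversity(categoriesPresent: Set<SourceCategory>): number {
  // More distinct, independent categories present -> higher diversity score.
  return Math.round((100 * categoriesPresent.size) / SOURCE_CATEGORIES.length);
}
```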
This dimension measures how much the language of the episode goes beyond neutral description into advocacy, emotional framing, or loaded terminology. It is deliberately not directional — the same language patterns that push left-leaning content toward a high score push right-leaning content equally.
The model looks for: emotionally charged vocabulary, calls to action, framing that assumes the listener shares a political prior, in-group/out-group language, and rhetorical devices that function more to persuade than inform.
Note: advocacy podcasts, political commentary, and opinion shows are expected to score higher on this dimension. The score is not a disqualifier — it is a descriptor. A listener who prefers explicitly opinionated content should weigh this score differently.
Omission risk is the hardest dimension to score — it requires knowing what was not said. The model assesses this by comparing the episode's coverage against a knowledge base of what topics, facts, and perspectives are typically associated with the subject matter.
Material omissions are things that would meaningfully change how a listener evaluates the episode's central claims. For example: an episode about NVIDIA's market dominance that doesn't mention the company's ongoing antitrust scrutiny has a high-materiality omission.
Non-material omissions are things that could have been discussed but weren't, and whose absence does not distort the listener's understanding. Every episode omits something; only omissions that affect the truthfulness of the implied picture are scored.
The model flags up to 3 high-materiality omissions per episode. Each is explained with what was missing and why it matters.
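A hypothetical shape for how a flagged omission might be represented, with the three-per-episode cap applied; the field names are assumptions based on the description above.

```typescript
// Assumed structure for a flagged omission; not the actual Podlens schema.
interface OmissionFlag {
  whatWasMissing: string;
  whyItMatters: string;
  materiality: "high"; // only high-materiality omissions are flagged
}

function topOmissions(candidates: OmissionFlag[]): OmissionFlag[] {
  // At most three high-materiality omissions are surfaced per episode.
  return candidates.slice(0, 3);
}
```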
Every finding on Podlens is grounded in the actual transcript. We do not score vibes — we score words. Each dimension score comes with up to two direct quotes from the episode that illustrate why the score landed where it did.
Transcript quotes are presented with timestamps so you can listen to the moment yourself. If a timestamp is unavailable (e.g. the source was an RSS audio file without chapter data), the quote is still provided without a timestamp.
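For illustration, the evidence behind a dimension score might be represented roughly like this; the field names and the optional timestamp are assumptions about the shape of the data, not the actual Podlens format.

```typescript
// Assumed shape of the evidence behind each dimension score.
interface EvidenceQuote {
  text: string;
  timestampSeconds?: number; // omitted when the source has no timing data
}

interface DimensionEvidence {
  dimension: string;       // e.g. "host_credibility"
  score: number;
  quotes: EvidenceQuote[]; // up to two direct quotes per dimension
}
```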
To report a finding you believe is inaccurate, email hello@podlens.app with the episode URL and the specific finding. We review all reports and update the community cache when findings are corrected.
The most important constraint on the Podlens analysis model is symmetry: the exact same criteria that detect left-leaning framing must detect right-leaning framing, and vice versa. This is not merely a design goal; it is a hard requirement baked into the prompt.
The model is explicitly instructed to evaluate language patterns and framing choices, not political positions. "Government should regulate AI" and "government should not regulate AI" are both treated as positions — neither is treated as the neutral baseline. What is scored is how each position is framed, sourced, and challenged.
We test our model regularly against episodes from across the political spectrum to verify that equivalent framing patterns produce equivalent scores regardless of direction. If you believe a result shows directional bias in our scoring, contact us — we take these reports seriously.
Language only. Podlens analyzes transcripts. Audio tone, visual content, guest body language, and off-the-record context are not captured. The same words spoken sarcastically or sincerely produce the same score.
Transcript quality. For episodes transcribed via audio (not YouTube captions), occasional transcription errors can affect quote accuracy. We use AssemblyAI with high accuracy settings, but complex audio with multiple speakers, heavy accents, or poor recording quality degrades transcript quality.
Model knowledge cutoff. The model cannot know about events after its training data cutoff. Omission risk for very recent topics may be underestimated because the model doesn't know what facts became available after recording.
Episode length sampling. Very long episodes (3+ hours) are analyzed using a smart sampling method: the first 3,000 words, a 3,000-word middle sample, and the last 3,000 words. This may miss significant moments in the middle of an unusually long episode.
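A minimal sketch of this sampling strategy, using the word counts from the description above; the midpoint choice and how the windows are stitched together are illustrative assumptions.

```typescript
// Sketch of the documented sampling for 3+ hour episodes: first 3,000 words,
// a 3,000-word middle window, and the last 3,000 words.
function sampleLongTranscript(transcript: string, window = 3000): string {
  const words = transcript.split(/\s+/).filter(Boolean);
  if (words.length <= window * 3) return transcript; // short enough to use whole

  const start = words.slice(0, window);
  const midStart = Math.floor(words.length / 2) - Math.floor(window / 2);
  const middle = words.slice(midStart, midStart + window);
  const end = words.slice(-window);

  return [start, middle, end].map((w) => w.join(" ")).join("\n[...]\n");
}
```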
Context-dependent framing. Some framing patterns are heavily context-dependent. Technical jargon that sounds loaded in a news context may be standard in an academic one. The model is trained to account for context but is not perfect.
Version history. The scoring model is updated over time as we improve accuracy and add dimensions. Scores are versioned — a score generated with an older model version may differ from one generated today. We flag version differences when comparing episodes.
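As a sketch of what that flag might look like in practice, assuming scores are stored alongside a model version string; the field names are hypothetical.

```typescript
// Illustrative version check; not the actual Podlens data model.
interface StoredScore {
  value: number;
  modelVersion: string; // e.g. "2024-11"
}

function versionMismatch(a: StoredScore, b: StoredScore): boolean {
  // Comparisons across different model versions get flagged for the reader.
  return a.modelVersion !== b.modelVersion;
}
```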