Every score on Podlens is explainable, tied to transcript evidence, and applied symmetrically across the political spectrum. This page is for anyone who wants to understand, verify, or challenge our scoring — journalists, researchers, and users who got a result they disagree with.
The political bias score is a single number from −100 (far left) to +100 (far right), with 0 representing balanced or neutral content. It is the primary output of every analysis.
The score is derived from four weighted signals detected in the transcript: language patterns, regulatory framing, value assumptions, and perspective omission.
Each signal is scored independently from the transcript, then combined into the final score. The model is instructed to evaluate these framing signals, not the political positions themselves.
±0–15 = Center
±16–35 = Lean left / Lean right
±36–60 = Left / Right
±61–80 = Strong left / Strong right
±81–100 = Far left / Far right
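For readers who think in code, here is a minimal sketch of how four independently scored signals could be combined and mapped onto the bands above. The signal names follow the list earlier on this page, but the weights, the linear combination, and the clamping are illustrative assumptions, not the production formula.

```typescript
// Sketch only: the real weights and combination are internal to Podlens;
// the values below are illustrative assumptions.
type BiasSignals = {
  languagePatterns: number;   // each signal scored on the same -100..+100 scale
  regulatoryFraming: number;
  valueAssumptions: number;
  perspectiveOmission: number;
};

const WEIGHTS: Record<keyof BiasSignals, number> = {
  languagePatterns: 0.3,
  regulatoryFraming: 0.25,
  valueAssumptions: 0.25,
  perspectiveOmission: 0.2,
};

function combineBiasScore(signals: BiasSignals): number {
  // Weighted average, clamped to the documented -100..+100 range.
  const raw = (Object.keys(WEIGHTS) as (keyof BiasSignals)[])
    .reduce((sum, key) => sum + WEIGHTS[key] * signals[key], 0);
  return Math.max(-100, Math.min(100, Math.round(raw)));
}

function biasBand(score: number): string {
  const magnitude = Math.abs(score);
  const direction = score < 0 ? "left" : "right";
  if (magnitude <= 15) return "Center";
  if (magnitude <= 35) return `Lean ${direction}`;
  if (magnitude <= 60) return direction === "left" ? "Left" : "Right";
  if (magnitude <= 80) return `Strong ${direction}`;
  return `Far ${direction}`;
}
```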
The score reflects how the episode is framed, not the topic itself or the guest's personal views. A conservative economist can be interviewed in a neutral, left-leaning, or right-leaning way depending on the questions asked and the framing of the host.
The single bias score is a useful summary, but it hides important distinctions: a podcast can be factually rigorous but politically one-sided, or politically balanced but missing critical perspectives. The 6-dimension framework separates these signals.
All 6 scores are visible on every tier, including free. Starter Lens and above unlock the transcript evidence quotes behind each score.
The host credibility score is a 0–100 rating of how rigorously a host holds their guests and themselves to evidential standards. It is episode-level — the same host can score differently across episodes depending on the guest and topic.
Pushback rate — what fraction of contestable claims received a follow-up question, challenge, or request for evidence. A host who accepts all guest claims without question scores 0 on this factor.
Citation quality — when the host introduces external information, how well-sourced is it? Links to primary sources, named institutions, or publication titles score higher than vague references.
Correction rate — does the host correct factual errors from previous episodes? Does the host acknowledge when a guest's claim is disputed by other evidence?
Consistency — are the same standards applied to guests across the political spectrum? A host who challenges left-leaning guests but not right-leaning ones (or vice versa) scores lower on consistency.
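As a rough illustration of how these four factors could roll up into the 0–100 score, here is a sketch. The factor names mirror the list above; the equal weights and the linear combination are assumptions, not the actual Podlens internals.

```typescript
// Illustrative sketch: factor names come from the docs above, but the
// weights and the formula are assumptions.
type CredibilityFactors = {
  pushbackRate: number;     // 0..1 share of contestable claims challenged
  citationQuality: number;  // 0..1 quality of sources the host introduces
  correctionRate: number;   // 0..1 willingness to correct and flag disputes
  consistency: number;      // 0..1 same standards across the spectrum
};

function hostCredibilityScore(f: CredibilityFactors): number {
  // Hypothetical equal weighting; returns the 0-100 episode-level score.
  const weighted =
    0.25 * f.pushbackRate +
    0.25 * f.citationQuality +
    0.25 * f.correctionRate +
    0.25 * f.consistency;
  return Math.round(100 * weighted);
}
```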
A low score does not mean the host is dishonest or bad — it means the episode functions more as a platform than an interrogation. Some of the world's most insightful long-form podcasts score low on credibility while scoring high on content depth. The score measures rigor, not quality.
Factual density measures how much of the episode's content is grounded in verifiable, sourced claims versus opinion, assertion, or speculation. It does not measure whether claims are correct — only whether they are supported.
The model distinguishes between:
Sourced claims — the speaker cites a study, institution, publication, or named expert. E.g. "According to the IMF's 2024 report..."
Verifiable claims — the claim references a publicly known fact that could be checked, even if not explicitly sourced. E.g. "NVIDIA's market cap passed $1 trillion in 2023."
Asserted claims — stated as fact without source or verification path. E.g. "AI will replace 40% of jobs within a decade." These reduce the score.
Opinion/speculation — clearly framed as personal view or prediction. These are excluded from the calculation entirely — opinion is not a flaw, only unsourced factual claims are.
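A minimal sketch of the arithmetic this implies, assuming a simple ratio of supported claims to all factual claims; the real weighting between claim types may differ.

```typescript
// Rough sketch of the density calculation described above; the exact
// formula is an assumption.
type ClaimCounts = {
  sourced: number;     // cites a study, institution, publication, or expert
  verifiable: number;  // checkable public fact, even if not explicitly sourced
  asserted: number;    // stated as fact with no source or verification path
  opinion: number;     // excluded from the calculation entirely
};

function factualDensity(c: ClaimCounts): number {
  // Opinion/speculation is left out of both numerator and denominator.
  const factualClaims = c.sourced + c.verifiable + c.asserted;
  if (factualClaims === 0) return 0;
  return Math.round((100 * (c.sourced + c.verifiable)) / factualClaims);
}
```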
Source diversity measures how many distinct, independent perspectives are present. A single guest in a two-hour conversation with no external sources represents minimal diversity — even if the host is excellent and the content is high-quality.
Source categories tracked: academic researchers, policy experts, industry insiders, civil society, independent journalists, government/regulatory bodies, and first-person accounts. Episodes that draw from multiple categories across ideological lines score highest.
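One simple way to picture this dimension is counting how many of the tracked categories appear in an episode. The category list below mirrors the one above; the mapping to a 0–100 score is an illustrative assumption and ignores the extra credit for ideological spread described above.

```typescript
// Sketch under assumptions: categories mirror the docs, but the scoring
// mapping is illustrative, not the real formula.
const SOURCE_CATEGORIES = [
  "academic researcher",
  "policy expert",
  "industry insider",
  "civil society",
  "independent journalist",
  "government/regulatory body",
  "first-person account",
] as const;

type SourceCategory = (typeof SOURCE_CATEGORIES)[number];

function sourceDiversity(categoriesPresent: Set<SourceCategory>): number {
  // More distinct, independent categories present -> higher diversity score.
  return Math.round((100 * categoriesPresent.size) / SOURCE_CATEGORIES.length);
}
```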
This dimension measures how much the language of the episode goes beyond neutral description into advocacy, emotional framing, or loaded terminology. It is deliberately not directional — the same language patterns that push left-leaning content toward a high score push right-leaning content equally.
The model looks for: emotionally charged vocabulary, calls to action, framing that assumes the listener shares a political prior, in-group/out-group language, and rhetorical devices that function more to persuade than inform.
Note: advocacy podcasts, political commentary, and opinion shows are expected to score higher on this dimension. The score is not a disqualifier — it is a descriptor. A listener who prefers explicitly opinionated content should weigh this score differently.
Omission risk is the hardest dimension to score — it requires knowing what was not said. The model assesses this by comparing the episode's coverage against a knowledge base of what topics, facts, and perspectives are typically associated with the subject matter.
Material omissions are things that would meaningfully change how a listener evaluates the episode's central claims. For example: an episode about NVIDIA's market dominance that doesn't mention the company's ongoing antitrust scrutiny has a high-materiality omission.
Non-material omissions are things that could have been discussed but weren't, and whose absence does not distort the listener's understanding. Every episode omits something; only omissions that affect the truthfulness of the implied picture are scored.
The model flags up to 3 high-materiality omissions per episode. Each is explained with what was missing and why it matters.
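A hypothetical shape for how a flagged omission might be represented, with the three-per-episode cap applied; the field names are assumptions based on the description above.

```typescript
// Assumed structure for a flagged omission; not the actual Podlens schema.
interface OmissionFlag {
  whatWasMissing: string;
  whyItMatters: string;
  materiality: "high"; // only high-materiality omissions are flagged
}

function topOmissions(candidates: OmissionFlag[]): OmissionFlag[] {
  // At most three high-materiality omissions are surfaced per episode.
  return candidates.slice(0, 3);
}
```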
Every finding on Podlens is grounded in the actual transcript. We do not score vibes — we score words. Each dimension score comes with up to two direct quotes from the episode that illustrate why the score landed where it did.
Transcript quotes are presented with timestamps so you can listen to the moment yourself. If a timestamp is unavailable (e.g. the source was an RSS audio file without chapter data), the quote is still provided without a timestamp.
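For illustration, the evidence behind a dimension score might be represented roughly like this; the field names and the optional timestamp are assumptions about the shape of the data, not the actual Podlens format.

```typescript
// Assumed shape of the evidence behind each dimension score.
interface EvidenceQuote {
  text: string;
  timestampSeconds?: number; // omitted when the source has no timing data
}

interface DimensionEvidence {
  dimension: string;       // e.g. "host_credibility"
  score: number;
  quotes: EvidenceQuote[]; // up to two direct quotes per dimension
}
```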
To report a finding you believe is inaccurate, email hello@podlens.app with the episode URL and the specific finding. We review all reports and update the community cache when findings are corrected.
The most important constraint on the Podlens analysis model is symmetry: the exact same criteria that detect left-leaning framing must detect right-leaning framing, and vice versa. This is not merely a design goal; it is a hard requirement baked into the prompt.
The model is explicitly instructed to evaluate language patterns and framing choices, not political positions. "Government should regulate AI" and "government should not regulate AI" are both treated as positions — neither is treated as the neutral baseline. What is scored is how each position is framed, sourced, and challenged.
We test our model regularly against episodes from across the political spectrum to verify that equivalent framing patterns produce equivalent scores regardless of direction. If you believe a result shows directional bias in our scoring, contact us — we take these reports seriously.
Language only. Podlens analyzes transcripts. Audio tone, visual content, guest body language, and off-the-record context are not captured. The same words spoken sarcastically or sincerely produce the same score.
Transcript quality. For episodes transcribed via audio (not YouTube captions), occasional transcription errors can affect quote accuracy. We use AssemblyAI with high accuracy settings, but complex audio with multiple speakers, heavy accents, or poor recording quality degrades transcript quality.
Model knowledge cutoff. The model cannot know about events after its training data cutoff. Omission risk for very recent topics may be underestimated because the model doesn't know what facts became available after recording.
Episode length sampling. Very long episodes (3+ hours) are analyzed using a smart sampling method: the first 3,000 words, a 3,000-word middle sample, and the last 3,000 words. This may miss significant moments in the middle of an unusually long episode.
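A minimal sketch of this sampling strategy, using the word counts from the description above; the midpoint choice and how the windows are stitched together are illustrative assumptions.

```typescript
// Sketch of the documented sampling for 3+ hour episodes: first 3,000 words,
// a 3,000-word middle window, and the last 3,000 words.
function sampleLongTranscript(transcript: string, window = 3000): string {
  const words = transcript.split(/\s+/).filter(Boolean);
  if (words.length <= window * 3) return transcript; // short enough to use whole

  const start = words.slice(0, window);
  const midStart = Math.floor(words.length / 2) - Math.floor(window / 2);
  const middle = words.slice(midStart, midStart + window);
  const end = words.slice(-window);

  return [start, middle, end].map((w) => w.join(" ")).join("\n[...]\n");
}
```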
Context-dependent framing. Some framing patterns are heavily context-dependent. Technical jargon that sounds loaded in a news context may be standard in an academic one. The model is trained to account for context but is not perfect.
Version history. The scoring model is updated over time as we improve accuracy and add dimensions. Scores are versioned — a score generated with an older model version may differ from one generated today. We flag version differences when comparing episodes.
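As a sketch of what that flag might look like in practice, assuming scores are stored alongside a model version string; the field names are hypothetical.

```typescript
// Illustrative version check; not the actual Podlens data model.
interface StoredScore {
  value: number;
  modelVersion: string; // e.g. "2024-11"
}

function versionMismatch(a: StoredScore, b: StoredScore): boolean {
  // Comparisons across different model versions get flagged for the reader.
  return a.modelVersion !== b.modelVersion;
}
```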