Data Sources
Every data point in TheWatchdogs comes from public, verifiable sources — official government records, representatives' own public statements, and their public social media accounts. We track official, campaign, and personal accounts where ownership has been verified. We don't rely on anonymous tips, leaked documents, or insider information.
All content is archived in full at the time of collection, including the complete text, timestamp, and source URL. Social media posts, video transcripts, and press releases are captured as published — if original content is later removed by the representative, our archived record remains. Government records such as floor votes, bill text, and Congressional Record transcripts are permanent public records and can always be independently verified.
Our data categories include floor votes, bill sponsorships, social media activity across multiple platforms, video transcripts from official channels, floor speeches from the Congressional Record, press releases, and contextual background from published news sources. All data is collected daily through automated systems and verified before inclusion in any analysis.
The Daily Process
Every day, our automated system runs a full data collection and analysis cycle for each tracked representative. The process collects new votes and legislative activity, gathers social media posts across platforms, processes new video transcripts and floor speeches, categorizes content by topic, and applies strict editorial guardrails throughout.
Cross-platform duplicate content is identified and removed before analysis, ensuring that each representative's activity is accurately represented without artificial inflation.
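One plausible way to implement this deduplication step is to normalize each post's text and hash it, keeping only the first occurrence. This is an illustrative sketch, not the production pipeline; the normalization rules (lowercasing, stripping URLs, collapsing whitespace) are assumptions.

```python
import hashlib
import re

def dedupe_posts(posts):
    """Drop cross-platform duplicates by hashing normalized post text.

    Normalization here (lowercase, strip URLs, collapse whitespace) is an
    illustrative assumption, not the actual production rule set.
    """
    seen, unique = set(), []
    for post in posts:
        text = re.sub(r"https?://\S+", "", post["text"].lower())
        text = re.sub(r"\s+", " ", text).strip()
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique
```

The same statement posted to two platforms, differing only in a link or spacing, collapses to a single record, so a representative's activity count is not inflated.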
What We Grade — And What We Don't
We grade consistency and conduct, not ideology.
A loyal conservative who promises to vote party line and does so consistently should score well. A bipartisan moderate who promises to work across the aisle but never does should score poorly. The grade measures honesty and consistency with constituents — not whether we agree with their positions.
We do not grade:
- Policy positions themselves — supporting or opposing any legislation is not a positive or negative signal
- Party loyalty — voting with your party is not inherently good or bad
- Ideology — conservative, moderate, and progressive members are evaluated by the same standard
- Media coverage — how often a representative appears in the news has no bearing on their grade
- Fundraising totals — how much money a representative raises is not part of the score
A conservative and a progressive can both receive an A. A member of any party can receive an F. The only question is: did you do what you said you'd do, and did you conduct yourself with basic dignity while doing it?
Accountability Score & Letter Grades
Each representative receives a composite accountability score (0–100) computed from five independently scored components. The score measures the consistency gap between what a representative says and what they do.
The Five Components
| Component | Weight | What It Measures |
|---|---|---|
| Vote Consistency | 35% | Do floor votes match the representative's stated positions across 24 policy topics? |
| Transparency | 25% | Does the representative state clear positions on major issues, and do they engage with media beyond friendly outlets? |
| Rhetorical Conduct | 15% | Does the representative's public language meet a nonpartisan standard of democratic discourse? |
| Constituent Alignment | 15% | Is donor geography in-district/in-state, does social media focus on local issues, and do they hold town halls? |
| Rhetoric Consistency | 10% | Do public statements align with one another, with no significant self-contradictions? |
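The composite score is a weighted sum of the five component scores. The sketch below encodes the weights from the table; the function shape and component keys are illustrative, and missing components are treated as neutral (per the confidence indicator note below) rather than penalized.

```python
# Weights taken directly from the component table above.
WEIGHTS = {
    "vote_consistency": 0.35,
    "transparency": 0.25,
    "rhetorical_conduct": 0.15,
    "constituent_alignment": 0.15,
    "rhetoric_consistency": 0.10,
}

def composite_score(components: dict) -> float:
    """Weighted 0-100 composite; a missing component scores neutral (50)."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        total += weight * components.get(name, 50.0)  # neutral, not penalized
    return round(total, 1)
```

Because the weights sum to 1.0, the composite stays on the same 0-100 scale as the components.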
How the Letter Grade Works — Hybrid Curve
The numeric score (0–100) is always the honest, unmanipulated number. It is never curved. What the letter grade does is place that score in context: how does this representative compare to their peers in the same chamber?
House members are graded on a House curve. Senators are graded on a Senate curve. This means the same numeric score might earn different letter grades in each chamber, depending on the distribution. The numeric score tells you the absolute performance; the letter grade tells you the relative standing.
Absolute excellence thresholds: Regardless of the curve, exceptionally high scores automatically earn top-tier grades, ensuring that genuine excellence is recognized even when the entire chamber performs well.
Hard floors: Very low scores cannot receive passing grades regardless of where they fall on the curve. This ensures that poor absolute performance is never masked by a weak peer group.
Why a curve? Because an absolute grading scale would be misleading. If every member of Congress scored between 65 and 85, an absolute A–F scale would cluster everyone into two or three letter grades with no differentiation. The curve ensures the letter grade always communicates meaningful relative standing. The numeric score remains the honest, uncurved number.
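The hybrid curve can be sketched as follows. The methodology states only that absolute excellence thresholds, hard floors, and a per-chamber curve exist; the specific cutoffs here (90, 40, and the percentile bands) are hypothetical numbers chosen for illustration.

```python
def letter_grade(score: float, chamber_scores: list) -> str:
    """Hybrid curve sketch: absolute thresholds first, then per-chamber percentile.

    The numeric cutoffs are illustrative assumptions, not the published values.
    """
    if score >= 90:   # absolute excellence: top grade regardless of the curve
        return "A"
    if score < 40:    # hard floor: no passing grade regardless of peer group
        return "F"
    # Relative standing: percentile among peers in the same chamber.
    rank = sum(1 for s in chamber_scores if s < score)
    pct = rank / max(len(chamber_scores), 1)
    if pct >= 0.8:
        return "A"
    if pct >= 0.6:
        return "B"
    if pct >= 0.4:
        return "C"
    if pct >= 0.2:
        return "D"
    return "F"
```

Note that `chamber_scores` would be House scores for a representative and Senate scores for a senator, which is why the same numeric score can map to different letters in each chamber.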
Confidence indicator: Each grade includes a confidence level based on data volume — Low (under 3 months of data), Medium (3–9 months), or High (9+ months). New members receive grades with a Low confidence flag rather than being excluded. Missing data is scored as neutral, not penalized.
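The confidence bands map directly to data volume, as a simple threshold function:

```python
def confidence_level(months_of_data: float) -> str:
    """Low (<3 months), Medium (3-9 months), High (9+ months)."""
    if months_of_data < 3:
        return "Low"
    if months_of_data < 9:
        return "Medium"
    return "High"
```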
Component Details
Vote Consistency (35%)
For each of the representative's stated or campaign positions across 24 policy topics, we identify votes on related bills. The score reflects the percentage of votes that align with stated positions.
Key principles:
- Inferred positions are excluded. We only measure against positions the representative has publicly stated or campaigned on. Contradicting a position we inferred from the voting record would be circular logic.
- Position specificity required. A broad claim like "I support veterans" is not specific enough to flag any veterans-related vote as a contradiction. The position must be specific enough to apply to the exact provision voted on.
- Omnibus bills treated carefully. Voting against a large spending package that funds something a member supports can reflect principled objection to other provisions — not hypocrisy.
- Procedural votes excluded. Motions to table, recommit, cloture, and similar procedural actions are never counted.
- Party-line skepticism. When nearly an entire party votes the same way, contradictions are flagged but weighted with additional context.
- Explained votes honored. If a representative publicly explains why they voted against their stated position, that vote is excluded from the penalty.
- Position changes respected. If a representative says "I changed my mind on this issue," we update the record. People are allowed to evolve — they just have to own it.
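Putting the exclusions above together, the vote consistency score reduces to an alignment rate over the votes that remain scoreable. The field names below are hypothetical; the filtering logic mirrors the listed principles.

```python
def vote_consistency(votes):
    """Percent of scoreable votes aligned with stated positions.

    A vote is scoreable only if it is non-procedural, not publicly explained,
    and tied to an explicitly stated (never inferred) position.
    """
    scoreable = [
        v for v in votes
        if not v["procedural"]                 # procedural votes never counted
        and not v["publicly_explained"]        # explained deviations excluded
        and v["stated_position"] is not None   # inferred positions excluded
    ]
    if not scoreable:
        return None  # no data: treated as neutral elsewhere, not penalized
    aligned = sum(1 for v in scoreable if v["vote"] == v["stated_position"])
    return 100.0 * aligned / len(scoreable)
```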
Transparency (25%)
Transparency measures two things: does the representative take clear positions on major issues, and do they engage with media outside their ideological comfort zone?
Positions coverage: We track whether the representative has stated positions on major policy topics. Representatives with clear, public positions score higher. Empty or vague positions reduce the score.
Media transparency: We analyze the representative's media appearances across outlets classified by ideological lean. A Republican who appears only on conservative media, or a Democrat who appears only on progressive media, scores lower than one who engages across the spectrum. Appearing on outlets that challenge your positions is a positive signal — it demonstrates willingness to face scrutiny. Representatives who avoid challenging media environments entirely receive a reduced score on this sub-component.
Rhetorical Conduct (15%)
This component exists because political science research is unambiguous: dehumanizing and inflammatory rhetoric from elected officials corrodes democratic norms, increases political violence, and harms democratic institutions — regardless of party. We track this not as a policy judgment but as a conduct standard.
A standardized keyword taxonomy applies to every representative regardless of party. Language is categorized by severity, with dehumanizing terms and incitement language weighted more heavily than common partisan rhetoric. A weighted rate is computed across all public posts, transcripts, and floor speeches.
Nonpartisan by design: The same words trigger the same penalty regardless of who says them. Dehumanizing language, incitement rhetoric, and personal attacks are flagged identically whether spoken by a Democrat or a Republican. The standard is conduct — not ideology.
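The severity-weighted rate described above could be computed as below. The taxonomy is standardized per the text, but the specific tiers and weight values shown are illustrative assumptions.

```python
# Hypothetical severity tiers and weights; dehumanizing and incitement language
# is weighted more heavily than common partisan rhetoric, per the methodology.
SEVERITY_WEIGHTS = {
    "dehumanizing": 5.0,
    "incitement": 5.0,
    "personal_attack": 2.0,
    "partisan": 1.0,
}

def weighted_conduct_rate(flags, total_items):
    """flags: severity-tier labels for flagged language;
    total_items: all posts, transcripts, and floor speeches analyzed."""
    if total_items == 0:
        return 0.0
    weighted = sum(SEVERITY_WEIGHTS.get(f, 1.0) for f in flags)
    return weighted / total_items
```

Because the same weight table applies to every member, identical language produces an identical penalty regardless of party.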
Academic basis: This component is grounded in peer-reviewed research on the effects of elite rhetoric on democratic institutions, including work by the Carnegie Endowment for International Peace (2023), research on inflammatory rhetoric and political violence in Security Studies (Piazza, 2023), and the Political Hostility Scale framework published in a Taylor & Francis journal (2025).
Constituent Alignment (15%)
Are they focused on their district, or performing for national cameras?
Three sub-components:
- Donor geography: What percentage of campaign donors are from the representative's state or district? Higher in-district and in-state donation rates score better.
- Local social media focus: Does the representative's social media content address local and district-level issues, or is it predominantly national talking points?
- Town hall engagement: Does the representative hold public town halls where constituents can ask questions? More frequent, in-person town halls score higher. (See the Town Hall Accountability Grade section below for the separate, more detailed grading scale.)
Rhetoric Consistency (10%)
Do their words match their other words? This component catches hypocrisy — specifically, representatives who publicly call for civility and then engage in the exact conduct they condemned.
Civility claims include explicit directives ("tone down the rhetoric"), self-imposed standards ("I believe in civil discourse"), and the implicit baseline accepted by taking office. When a representative makes such a claim and then violates it, that contradiction is flagged with a severity level based on how soon after the original claim the violation occurred.
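The proximity-based severity assignment could be sketched as follows. The methodology states only that severity depends on how close the violation is to the claim; the specific time windows here are illustrative assumptions.

```python
from datetime import date

def contradiction_severity(claim_date: date, violation_date: date) -> str:
    """Assign severity by how soon the violation follows the civility claim.

    The day thresholds are hypothetical, chosen only to illustrate the idea.
    """
    days = (violation_date - claim_date).days
    if days <= 7:
        return "major"      # violated within a week of the claim
    if days <= 90:
        return "moderate"
    return "minor"
```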
Note on the difference between Rhetorical Conduct and Rhetoric Consistency: Rhetorical Conduct measures behavior — how the representative actually speaks, regardless of what they've claimed. Rhetoric Consistency measures hypocrisy — the gap between what they say about civility and how they actually behave. A representative who is consistently inflammatory scores low on Conduct. A representative who claims to be civil while being inflammatory scores low on both.
Contradiction Detection
We track contradictions between representatives' stated positions and their actions. When a vote conflicts with a stated position, the system flags it with a severity level: major (direct reversal), moderate (significant inconsistency), or minor (subtle tension).
Only clear contradictions are flagged — gray areas and judgment calls are excluded.
If a representative publicly acknowledges changing their position on an issue, we update our records and do not flag it as a contradiction. People are allowed to evolve — they just have to own it.
Flags, Confirmation, and Score Impact
Automated detection runs daily across every tracked member of Congress. Every flag enters a review queue before it affects any score or public profile. A confirmed flag is included in the component score at full weight — the contradiction or conduct hit is real and counts. A dismissed flag is permanently excluded: it is never counted again, and the score adjusts on the next grading run. Items in the queue pending review do not affect scoring.
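The flag lifecycle reduces to a small state machine: flags start pending, only confirmed flags reach scoring, and dismissed flags are excluded for good. Class and method names below are illustrative.

```python
class Flag:
    """A detected contradiction or conduct hit awaiting editorial review."""

    def __init__(self, excerpt, severity):
        self.excerpt = excerpt
        self.severity = severity
        self.status = "pending"    # pending flags never affect scoring

    def confirm(self):
        self.status = "confirmed"  # counts at full weight on next grading run

    def dismiss(self):
        self.status = "dismissed"  # permanently excluded from scoring

def scoreable_flags(flags):
    """Only confirmed flags ever enter a component score."""
    return [f for f in flags if f.status == "confirmed"]
```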
No automated detection directly changes a public-facing grade without review.
Human Review & Continuous Improvement
Automated Detection
Daily pipeline runs contradiction detection (vote vs. stated position), rhetoric contradiction scanning (civility claims vs. inflammatory language), and rhetorical conduct analysis across all posts, transcripts, floor speeches, and press releases. Each flag is written to the review queue with a severity tier and the exact excerpt that triggered it.
Editorial Review
Every flag is reviewed against our editorial standards. Each item is confirmed, dismissed, or marked for further evidence. Dismissed items — false positives — are permanently excluded from scoring. Confirmed items are reflected in the public profile on the next grading run.
Progressive Automation
Our review pipeline combines automated detection with human editorial judgment. As accuracy improves through accumulated review decisions, routine classifications are progressively automated — but the editorial standards and methodology are always human-defined. Human review focuses increasingly on edge cases, novel patterns, and calibration rather than obvious calls.
Accuracy Improves Over Time
Every review decision is stored as structured training data. False positive rates inform threshold calibration. The more items reviewed, the more precisely the system can be tuned — reducing noise while increasing coverage of genuine patterns.
Open to Correction
Accountability tools should be accountable themselves. We publish this process because the integrity of the scores depends on the integrity of the review pipeline. If you believe a specific flag is a false positive or we've made an error, reach out at [email protected] — we investigate every report and correct the record when warranted.
AI Use & Editorial Guardrails
Our analysis is AI-assisted but operates within strict editorial guardrails designed to prevent the failure modes that make AI-generated content unreliable. The AI's job is to synthesize and summarize verified public data — not to editorialize or speculate.
Anti-Repetition
Each analysis is checked against recent coverage for the same representative. No recycled filler.
Anti-Hallucination
The AI verifies dates, vote counts, and specific claims against the source data. If it can't confirm a fact from the provided data, it doesn't include it.
Fairness Standard
Representatives who are actively legislating, holding town halls, and engaging constituents aren't nitpicked for minor inconsistencies. Accountability is proportional — we focus scrutiny where the gaps between rhetoric and action are significant.
Rhetoric Preservation
When a representative uses inflammatory or notable language, we quote it directly rather than paraphrasing. You should see their actual words, not our interpretation of them.
Source Verification
Every claim traces back to a vote record, a direct quote, a public document, or an official statement. Unverifiable claims are excluded.
These guardrails are continuously refined based on output review and feedback.
What AI Doesn't Do
AI does not choose which representatives to cover, does not decide what's "important" outside of the data it's given, does not generate opinions or editorial positions, and does not have access to any non-public information. It is a synthesis tool operating within defined guardrails on verified public data.
Town Hall Accountability Grade
This is a standalone grade, separate from the composite accountability score above.
Every member of Congress is graded on in-person constituent engagement over a rolling 24-month window. This grade reflects how accessible each representative makes themselves to the people they represent — not their legislative activity or policy positions.
Grading Scale (in-person town halls in the past 24 months)
- A — 4 or more in-person town halls
- B — 3 in-person town halls
- C — 2 in-person town halls
- D — 1 in-person town hall
- D− — No in-person events, but held telephone or virtual town halls
- F — No town hall of any kind
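The scale above is a direct mapping from event counts to letters, before any penalty is applied:

```python
def town_hall_grade(in_person: int, remote: int) -> str:
    """Map counts of town halls in the past 24 months to the grading scale.

    remote counts telephone and virtual town halls.
    """
    if in_person >= 4:
        return "A"
    if in_person == 3:
        return "B"
    if in_person == 2:
        return "C"
    if in_person == 1:
        return "D"
    return "D-" if remote > 0 else "F"
```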
Empty Chair Penalty
A grade drops one letter if constituents organized "empty chair" events and the representative's most recent empty chair event is more recent than their last in-person town hall — meaning they still haven't responded to constituent demand. If the representative held an in-person town hall after the empty chair event, no penalty applies.
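The penalty rule compares only two dates, so it can be sketched compactly. The grade ladder follows the scale above; the function shape is illustrative.

```python
from datetime import date

# Grade ladder from the scale above; the penalty moves one step down it.
LADDER = ["A", "B", "C", "D", "D-", "F"]

def apply_empty_chair_penalty(grade, last_town_hall, last_empty_chair):
    """Drop one letter if the newest empty chair event postdates the newest
    in-person town hall. Either date may be None if no such event occurred."""
    if last_empty_chair is None:
        return grade
    if last_town_hall is not None and last_town_hall > last_empty_chair:
        return grade  # the representative responded after the empty chair event
    i = LADDER.index(grade)
    return LADDER[min(i + 1, len(LADDER) - 1)]
```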
What Counts
In-person town halls open to the general public, telephone town halls, virtual town halls, and public office hours open to any constituent (not by invitation only).
What Doesn't Count
Invite-only roundtables, fundraisers, press conferences, committee hearings, or events where attendance was curated or restricted.
Data Sources
Official .house.gov and .senate.gov event pages, local news archives, LegiStorm, and verified constituent reports. Town hall data is research-based rather than scraped from a live API.
Corrections
If you know of a town hall we missed or have a correction, email [email protected]. We review all submissions and update the record when verified.
A note on transparency: We publish this methodology because we believe accountability tools should be accountable themselves. If you have questions about our process or spot an error in our coverage, we want to hear about it. Reach out at [email protected].