How to Interpret AI Reasoning and Scores Responsibly
A guide to understanding FounderScan's AI-generated scores and reasoning, including best practices for incorporating AI insights into your decision-making process.
FounderScan's AI provides scores and detailed reasoning for every evaluation. But what do these numbers actually mean? And how should you incorporate AI insights into your decision-making process?
This guide helps you interpret AI outputs responsibly and make better investment decisions.
Understanding the Scoring Scale
FounderScan uses a 1–10 scale for each criterion:
| Score | Meaning |
|-------|---------|
| 9–10 | Exceptional. Clear strength with strong evidence. |
| 7–8 | Strong. Meets or exceeds expectations with good evidence. |
| 5–6 | Adequate. Meets basic expectations but not a standout. |
| 3–4 | Weak. Below expectations or insufficient evidence. |
| 1–2 | Poor. Significant concerns or missing entirely. |
The overall score is a weighted average of criterion scores, with required criteria counting 2x.
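The weighting described above can be sketched as a small function. This is an illustration of the stated formula (required criteria count 2x), not FounderScan's actual implementation; the criterion names and scores are hypothetical.

```python
def overall_score(criteria: dict[str, tuple[float, bool]]) -> float:
    """Weighted average of criterion scores.

    criteria maps name -> (score on the 1-10 scale, is_required).
    Required criteria carry twice the weight of optional ones.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for score, required in criteria.values():
        weight = 2.0 if required else 1.0
        weighted_sum += score * weight
        total_weight += weight
    return weighted_sum / total_weight

example = {
    "Technical Team": (9, True),   # required -> weight 2
    "Market Size":    (7, False),  # optional -> weight 1
    "Traction":       (5, False),
}
print(overall_score(example))  # (9*2 + 7 + 5) / 4 = 7.5
```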
What the AI Considers
For each criterion, the AI examines:
- Application responses: What founders wrote about themselves and their company
- Enriched profiles: LinkedIn data, professional history, education
- Recent news: Press coverage, announcements, industry context
- Implicit signals: Consistency, specificity, depth of responses
The AI then synthesizes this information to assess how well the startup meets the criterion.
Reading the Reasoning
Every score comes with written reasoning. Here's how to interpret it:
Look for Cited Evidence
Good AI reasoning points to specific facts:
"The CTO has 8 years of ML experience including 3 years leading the computer vision team at Tesla. This directly relevant technical background supports a score of 9/10 for Technical Team."
Generic statements without evidence should be treated with more skepticism.
Note Confidence Indicators
The AI expresses uncertainty when data is limited:
"Based on the application, the team appears to have relevant experience, though limited LinkedIn data was available for the second co-founder. Score: 6/10 with medium confidence."
Lower confidence scores deserve more human scrutiny.
Watch for Hedging Language
Phrases like "may have," "could indicate," or "potentially" signal the AI is inferring rather than observing. These are reasonable inferences but less reliable than direct evidence.
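If you want a quick first pass over many reasoning texts, you can count hedging phrases programmatically. A minimal sketch, assuming a hand-picked phrase list (the list below is illustrative, not FounderScan's):

```python
# Flag AI reasoning that leans on hedging language rather than
# direct evidence. Phrase list is illustrative only.
HEDGES = ["may have", "could indicate", "potentially", "appears to"]

def hedge_count(reasoning: str) -> int:
    """Count occurrences of hedging phrases in a reasoning text."""
    text = reasoning.lower()
    return sum(text.count(h) for h in HEDGES)

sample = ("The founder may have enterprise sales experience, "
          "which could indicate go-to-market strength.")
print(hedge_count(sample))  # 2
```

A high count does not make the reasoning wrong; it simply marks scores that merit closer human reading.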
When to Trust the AI
AI-generated scores are most reliable when:
- Data is abundant: Full application responses, complete LinkedIn profiles, recent news coverage
- Criteria are objective: "Has technical co-founder" vs. "Founders are visionary"
- Evidence is clear: Specific facts support the conclusion
- Pattern matches training: Standard startup profiles in common industries
When to Apply Extra Scrutiny
Human review is especially important when:
Edge Cases and Outliers
Startups that don't fit typical patterns may be misjudged. A first-time founder with a breakthrough insight might score lower on "experience" but be exactly who you want to back.
Low-Data Situations
If LinkedIn enrichment failed or founders have minimal online presence, scores may be based on application text alone. This is less reliable.
Subjective Criteria
"Culture fit," "coachability," and "passion" are hard to assess from documents. Take these scores as directional, not definitive.
Exceptional Claims
If a startup claims exceptional traction or unique technology, verify independently. The AI can only assess what's written, not what's true.
Incorporating AI Into Your Process
We recommend a tiered review approach:
Tier 1: AI-Led Filtering (Top 20%, Bottom 40%)
Use AI scores to quickly identify:
- Top candidates (8.0+ overall): Fast-track to partner review
- Clear passes (< 5.0 overall): Likely not a fit; human spot-check recommended
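The triage rule above is simple enough to express directly. A sketch using the thresholds from this guide (8.0+ fast-track, below 5.0 likely pass); the tier labels are ours, not product terminology:

```python
def triage(overall: float) -> str:
    """Assign a review tier from the overall score (1-10 scale)."""
    if overall >= 8.0:
        return "fast-track"    # top candidates -> partner review
    if overall < 5.0:
        return "likely-pass"   # spot-check before declining
    return "human-review"      # middle band needs human judgment

print([triage(s) for s in (8.4, 6.1, 4.2)])
# ['fast-track', 'human-review', 'likely-pass']
```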
Tier 2: Human Review (Middle 40%)
The middle band requires human judgment. These startups meet basic criteria but don't obviously stand out. Look for:
- Compelling narratives the AI might miss
- Red flags in the details
- Culture or stage fit nuances
Tier 3: Deep Dives (Final Candidates)
For finalists, go beyond FounderScan:
- Reference calls with founders' former colleagues
- Customer interviews (if applicable)
- Technical deep dives with your experts
- In-person or video meetings
AI helps you efficiently reach this stage; humans make the final call.
Common Interpretation Mistakes
Treating Scores as Absolute Truth
A startup with 7.5 is not necessarily better than one with 7.3. Within a half-point, treat scores as equivalent and focus on qualitative differences.
Ignoring the Reasoning
The score is a summary; the reasoning is the substance. Two 7s might have very different stories: one solid across the board, another with a 9 in market and a 5 in team.
Over-trusting on Weak Data
If you see "limited data available" or "could not verify," the score is an educated guess. Weight it accordingly.
Forgetting Base Rates
In a highly competitive cohort, a 7.0 might be below average. Context matters—compare scores within the batch, not to an abstract ideal.
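One concrete way to apply this: translate a raw score into a percentile within the batch. A minimal sketch with a hypothetical cohort:

```python
def cohort_percentile(score: float, cohort: list[float]) -> float:
    """Percent of cohort scores strictly below the given score."""
    below = sum(1 for s in cohort if s < score)
    return 100.0 * below / len(cohort)

cohort = [7.9, 7.5, 7.2, 7.0, 6.8, 6.5, 6.1]
# In this (strong) hypothetical batch, a 7.0 sits below the median.
print(round(cohort_percentile(7.0, cohort), 1))
```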
Calibrating Over Time
Track how AI scores correlate with outcomes:
- Do high-scored companies perform better in your program?
- Are there systematic biases (e.g., favoring certain industries)?
- Do certain criteria predict success better than others?
Use this data to refine criteria and adjust your trust in different score ranges.
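A simple starting point for this tracking is to bucket past applicants by score band and compare outcome rates. The sketch below uses made-up records and bands matching the tiers in this guide:

```python
from collections import defaultdict

# (overall_score, succeeded_in_program) -- illustrative data only
records = [
    (8.5, True), (8.1, True), (7.8, True), (7.2, False),
    (6.9, True), (6.4, False), (5.2, False), (4.8, False),
]

buckets = defaultdict(lambda: [0, 0])  # band -> [successes, total]
for score, ok in records:
    band = "8.0+" if score >= 8.0 else "5.0-7.9" if score >= 5.0 else "<5.0"
    buckets[band][0] += ok
    buckets[band][1] += 1

for band, (wins, n) in sorted(buckets.items()):
    print(f"{band}: {wins}/{n} succeeded")
```

If high bands don't outperform lower ones over time, that's a signal to revisit your criteria or score thresholds.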
The Human-AI Partnership
FounderScan is designed to augment human judgment, not replace it. The AI handles:
- Volume: Processing hundreds of applications consistently
- Research: Gathering and synthesizing information
- Structure: Applying criteria systematically
You bring:
- Intuition: Pattern recognition from experience
- Relationships: Assessing fit and potential chemistry
- Context: Knowledge of your program's unique needs
- Accountability: Owning the final decision
Together, you make better decisions faster.
For more on setting up effective criteria, see our guide on designing evaluation criteria.