How to Interpret AI Reasoning and Scores Responsibly
A guide to understanding FounderScan's AI-generated scores and reasoning, including best practices for incorporating AI insights into your decision-making process.
FounderScan's AI provides scores and detailed reasoning for every evaluation. But what do these numbers actually mean? And how should you incorporate AI insights into your decision-making process?
This guide helps you interpret AI outputs responsibly and make better investment decisions.
Understanding the Scoring Scale
FounderScan uses a 1–10 scale for each criterion:
| Score | Meaning |
|-------|---------|
| 9–10 | Exceptional. Clear strength with strong evidence. |
| 7–8 | Strong. Meets or exceeds expectations with good evidence. |
| 5–6 | Adequate. Meets basic expectations but not a standout. |
| 3–4 | Weak. Below expectations or insufficient evidence. |
| 1–2 | Poor. Significant concerns or missing entirely. |
The overall score is a weighted average of criterion scores, with required criteria counting 2x.
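The weighting described above can be sketched as a small function. This is an illustration of the stated formula (required criteria count 2x), not FounderScan's actual implementation; the criterion names and scores are hypothetical.

```python
def overall_score(criteria: dict[str, tuple[float, bool]]) -> float:
    """Weighted average of criterion scores.

    criteria maps name -> (score on the 1-10 scale, is_required).
    Required criteria carry twice the weight of optional ones.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for score, required in criteria.values():
        weight = 2.0 if required else 1.0
        weighted_sum += score * weight
        total_weight += weight
    return weighted_sum / total_weight

example = {
    "Technical Team": (9, True),   # required -> weight 2
    "Market Size":    (7, False),  # optional -> weight 1
    "Traction":       (5, False),
}
print(overall_score(example))  # (9*2 + 7 + 5) / 4 = 7.5
```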
What the AI Considers
For each criterion, the AI examines:
- Application responses: What founders wrote about themselves and their company
- Enriched profiles: LinkedIn data, professional history, education
- Recent news: Press coverage, announcements, industry context
- Implicit signals: Consistency, specificity, depth of responses
The AI then synthesizes this information to assess how well the startup meets the criterion.
Reading the Reasoning
Every score comes with written reasoning. Here's how to interpret it:
Look for Cited Evidence
Good AI reasoning points to specific facts:
"The CTO has 8 years of ML experience including 3 years leading the computer vision team at Tesla. This directly relevant technical background supports a score of 9/10 for Technical Team."
Generic statements without evidence should be treated with more skepticism.
Note Confidence Indicators
The AI expresses uncertainty when data is limited:
"Based on the application, the team appears to have relevant experience, though limited LinkedIn data was available for the second co-founder. Score: 6/10 with medium confidence."
Lower confidence scores deserve more human scrutiny.
Watch for Hedging Language
Phrases like "may have," "could indicate," or "potentially" signal the AI is inferring rather than observing. These are reasonable inferences but less reliable than direct evidence.
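If you want a quick first pass over many reasoning texts, you can count hedging phrases programmatically. A minimal sketch, assuming a hand-picked phrase list (the list below is illustrative, not FounderScan's):

```python
# Flag AI reasoning that leans on hedging language rather than
# direct evidence. Phrase list is illustrative only.
HEDGES = ["may have", "could indicate", "potentially", "appears to"]

def hedge_count(reasoning: str) -> int:
    """Count occurrences of hedging phrases in a reasoning text."""
    text = reasoning.lower()
    return sum(text.count(h) for h in HEDGES)

sample = ("The founder may have enterprise sales experience, "
          "which could indicate go-to-market strength.")
print(hedge_count(sample))  # 2
```

A high count does not make the reasoning wrong; it simply marks scores that merit closer human reading.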
When to Trust the AI
AI-generated scores are most reliable when:
- Data is abundant: Full application responses, complete LinkedIn profiles, recent news coverage
- Criteria are objective: "Has technical co-founder" vs. "Founders are visionary"
- Evidence is clear: Specific facts support the conclusion
- Pattern matches training: Standard startup profiles in common industries
When to Apply Extra Scrutiny
Human review is especially important when:
Edge Cases and Outliers
Startups that don't fit typical patterns may be misjudged. A first-time founder with a breakthrough insight might score lower on "experience" but be exactly who you want to back.
Low-Data Situations
If LinkedIn enrichment failed or founders have minimal online presence, scores may be based on application text alone. This is less reliable.
Subjective Criteria
"Culture fit," "coachability," and "passion" are hard to assess from documents. Take these scores as directional, not definitive.
Exceptional Claims
If a startup claims exceptional traction or unique technology, verify independently. The AI can only assess what's written, not what's true.
Incorporating AI Into Your Process
We recommend a tiered review approach:
Tier 1: AI-Led Filtering (Top 20%, Bottom 40%)
Use AI scores to quickly identify:
- Top candidates (8.0+ overall): Fast-track to partner review
- Clear passes (< 5.0 overall): Likely not a fit; human spot-check recommended
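The triage rule above is simple enough to express directly. A sketch using the thresholds from this guide (8.0+ fast-track, below 5.0 likely pass); the tier labels are ours, not product terminology:

```python
def triage(overall: float) -> str:
    """Assign a review tier from the overall score (1-10 scale)."""
    if overall >= 8.0:
        return "fast-track"    # top candidates -> partner review
    if overall < 5.0:
        return "likely-pass"   # spot-check before declining
    return "human-review"      # middle band needs human judgment

print([triage(s) for s in (8.4, 6.1, 4.2)])
# ['fast-track', 'human-review', 'likely-pass']
```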
Tier 2: Human Review (Middle 40%)
The middle band requires human judgment. These startups meet basic criteria but don't obviously stand out. Look for:
- Compelling narratives the AI might miss
- Red flags in the details
- Culture or stage fit nuances
Tier 3: Deep Dives (Final Candidates)
For finalists, go beyond FounderScan:
- Reference calls with founders' former colleagues
- Customer interviews (if applicable)
- Technical deep dives with your experts
- In-person or video meetings
AI helps you efficiently reach this stage; humans make the final call.
Common Interpretation Mistakes
Treating Scores as Absolute Truth
A startup with 7.5 is not necessarily better than one with 7.3. Within a half-point, treat scores as equivalent and focus on qualitative differences.
Ignoring the Reasoning
The score is a summary; the reasoning is the substance. Two 7s might have very different stories: one solid across the board, another with a 9 in market and a 5 in team.
Over-trusting on Weak Data
If you see "limited data available" or "could not verify," the score is an educated guess. Weight it accordingly.
Forgetting Base Rates
In a highly competitive cohort, a 7.0 might be below average. Context matters—compare scores within the batch, not to an abstract ideal.
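One concrete way to apply this: translate a raw score into a percentile within the batch. A minimal sketch with a hypothetical cohort:

```python
def cohort_percentile(score: float, cohort: list[float]) -> float:
    """Percent of cohort scores strictly below the given score."""
    below = sum(1 for s in cohort if s < score)
    return 100.0 * below / len(cohort)

cohort = [7.9, 7.5, 7.2, 7.0, 6.8, 6.5, 6.1]
# In this (strong) hypothetical batch, a 7.0 sits below the median.
print(round(cohort_percentile(7.0, cohort), 1))
```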
Calibrating Over Time
Track how AI scores correlate with outcomes:
- Do high-scored companies perform better in your program?
- Are there systematic biases (e.g., favoring certain industries)?
- Do certain criteria predict success better than others?
Use this data to refine criteria and adjust your trust in different score ranges.
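A simple starting point for this tracking is to bucket past applicants by score band and compare outcome rates. The sketch below uses made-up records and bands matching the tiers in this guide:

```python
from collections import defaultdict

# (overall_score, succeeded_in_program) -- illustrative data only
records = [
    (8.5, True), (8.1, True), (7.8, True), (7.2, False),
    (6.9, True), (6.4, False), (5.2, False), (4.8, False),
]

buckets = defaultdict(lambda: [0, 0])  # band -> [successes, total]
for score, ok in records:
    band = "8.0+" if score >= 8.0 else "5.0-7.9" if score >= 5.0 else "<5.0"
    buckets[band][0] += ok
    buckets[band][1] += 1

for band, (wins, n) in sorted(buckets.items()):
    print(f"{band}: {wins}/{n} succeeded")
```

If high bands don't outperform lower ones over time, that's a signal to revisit your criteria or score thresholds.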
The Human-AI Partnership
FounderScan is designed to augment human judgment, not replace it. The AI handles:
- Volume: Processing hundreds of applications consistently
- Research: Gathering and synthesizing information
- Structure: Applying criteria systematically
You bring:
- Intuition: Pattern recognition from experience
- Relationships: Assessing fit and potential chemistry
- Context: Knowledge of your program's unique needs
- Accountability: Owning the final decision
Together, you make better decisions faster.
For more on setting up effective criteria, see our guide on designing evaluation criteria.