"Which Number Is Right?"

An interactive simulation of the #1 pain point that 1,385 agent skills fail to solve

Based on Real Data | 1,995 Skills Scanned | 366 Reddit Pain Points

You are a Senior Data Engineer. It's Monday morning.
Your pipeline ran overnight. All green. All tests passed.
Then the CFO's assistant sends a message...

09:15 AM — Slack Notification

Sarah Chen (CFO Office) 9:15 AM

Hey, the weekly revenue report in Looker shows $2.3M for last week but Finance pulled $2.7M directly from the Stripe Dashboard. That's a $400K gap. The CFO is asking which number is right. Can you look into this ASAP? She has a board meeting at 2pm.

Your first instinct — check the pipeline:

$ airflow dags list-runs --dag-id revenue_daily --state success

dag_run_id         | execution_date   | state
manual__2025-06-15 | 2025-06-15T04:00 | success
manual__2025-06-14 | 2025-06-14T04:00 | success
manual__2025-06-13 | 2025-06-13T04:00 | success

$ dbt test --select fct_daily_revenue

Running 7 tests...
not_null_fct_daily_revenue_date PASS
unique_fct_daily_revenue_date PASS
accepted_values_currency PASS
positive_total_revenue PASS
All 7 tests passed ✓

Everything is green. Every test passed. But the numbers are $400K off.

09:30 AM — The Investigation Begins

Write Manual SQL

Compare warehouse aggregates against Stripe numbers dimension by dimension

Check Observability Tools

Look at Monte Carlo / Elementary for anomalies
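The manual SQL option above, comparing aggregates dimension by dimension, can be sketched roughly as follows. This is a minimal illustration that runs against an in-memory SQLite database. The `stripe_export` table, the column names, and every amount are hypothetical stand-ins, and a real warehouse would use its own SQL dialect and schema.

```python
import sqlite3

# Hypothetical rows: (currency, amount_usd in whole dollars). Made up for illustration.
SOURCE_ROWS = [("usd", 1000), ("eur", 500), ("gbp", 300)]
WAREHOUSE_ROWS = [("usd", 1000), ("eur", 400), ("gbp", 250)]

# Aggregate each side by currency, then join the two aggregates.
# Note: an inner join hides currencies that exist on only one side.
COMPARE_SQL = """
SELECT s.currency,
       s.total           AS source_total,
       w.total           AS warehouse_total,
       w.total - s.total AS delta
FROM (SELECT currency, SUM(amount_usd) AS total
      FROM stripe_export GROUP BY currency) AS s
JOIN (SELECT currency, SUM(amount_usd) AS total
      FROM fct_daily_revenue GROUP BY currency) AS w
  ON w.currency = s.currency
ORDER BY s.currency
"""


def compare_by_currency(source_rows, warehouse_rows):
    """Load both sides into SQLite and return (currency, source, warehouse, delta) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stripe_export (currency TEXT, amount_usd INTEGER)")
    conn.execute("CREATE TABLE fct_daily_revenue (currency TEXT, amount_usd INTEGER)")
    conn.executemany("INSERT INTO stripe_export VALUES (?, ?)", source_rows)
    conn.executemany("INSERT INTO fct_daily_revenue VALUES (?, ?)", warehouse_rows)
    rows = conn.execute(COMPARE_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for currency, src, wh, delta in compare_by_currency(SOURCE_ROWS, WAREHOUSE_ROWS):
        flag = "match" if delta == 0 else "MISMATCH"
        print(f"{currency}: source={src} warehouse={wh} delta={delta:+d} {flag}")
```

Repeating the same query with a different grouping column (date, payment method) is what makes the manual route slow: each dimension is another hand-written comparison.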

The Reality: 5 Hours, Every Time

09:15  CFO asks "which number is right?" and you don't know
09:30  Check pipeline: all green, all tests passed
10:00  Write manual SQL to compare aggregates
10:30  Numbers don't match by $400K, but why?
11:00  Drill down by date, currency, payment method...
12:00  Found it: EUR/GBP are the discrepancy
13:00  Root cause: Stripe API v2025-04-16 changed `amount` from presentment to settlement currency. Schema unchanged. Tests passed. Meaning changed.
13:30  Fix stg_payments, re-run pipeline, verify
14:00  Reply to CFO, 5 hours after the question
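The 13:00 entry is the crux: the schema was unchanged and the tests passed, yet the meaning changed. A minimal sketch of why structural checks cannot see this is shown below. The check functions only loosely imitate the dbt tests named earlier, and the payment rows and amounts are hypothetical. The direction and size of the shift in a real pipeline depend on the exchange rates and on how the staging model converts.

```python
ALLOWED_CURRENCIES = {"usd", "eur", "gbp"}


def run_schema_checks(rows):
    """Structural checks only: they look at shape and ranges, never at meaning."""
    ids = [r["payment_id"] for r in rows]
    return {
        "not_null_amount": all(r["amount"] is not None for r in rows),
        "unique_payment_id": len(ids) == len(set(ids)),
        "accepted_values_currency": all(r["currency"] in ALLOWED_CURRENCIES for r in rows),
        "positive_amount": all(r["amount"] is not None and r["amount"] > 0 for r in rows),
    }


def total(rows):
    """Sum the amount column the way a naive revenue report would."""
    return sum(r["amount"] for r in rows)


# Hypothetical snapshot before the upstream change.
BEFORE = [
    {"payment_id": "p1", "currency": "usd", "amount": 10_000},
    {"payment_id": "p2", "currency": "eur", "amount": 20_000},
]
# Same columns and types afterwards, but the EUR row now carries a value
# denominated differently (hypothetical figure).
AFTER = [
    {"payment_id": "p1", "currency": "usd", "amount": 10_000},
    {"payment_id": "p2", "currency": "eur", "amount": 21_600},
]

if __name__ == "__main__":
    for name, rows in (("before", BEFORE), ("after", AFTER)):
        checks = run_schema_checks(rows)
        print(f"{name}: all checks pass={all(checks.values())} total={total(rows)}")
```

Both snapshots pass every check, and only a comparison against an independent source total reveals that the reported figure moved.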

The Worst Part

This will happen again. Next quarter, another API will change a field's meaning without changing its name. You won't know until someone asks why the numbers don't match.
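One cheap guard against this class of surprise is to record which API version produced each ingested record and compare it with the version the pipeline was built against. The sketch below assumes that version string is available per record (Stripe event objects, for example, include an `api_version` field). Whether the pipeline in this story stores it is an assumption, and `detect_version_drift` is a hypothetical helper rather than part of any library. It only catches changes that arrive with a version bump, not silent ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionDrift:
    expected: str  # version the transformations were written against
    observed: str  # version actually seen on ingested records


def detect_version_drift(expected, observed_versions):
    """Return one VersionDrift per distinct observed version that differs from the pin."""
    return [
        VersionDrift(expected, version)
        for version in sorted(set(observed_versions))
        if version != expected
    ]


if __name__ == "__main__":
    # Version strings taken from the scenario; the record list is hypothetical.
    seen = ["2024-12-18", "2025-04-16", "2025-04-16"]
    for drift in detect_version_drift("2024-12-18", seen):
        print(f"API version drift: expected {drift.expected}, observed {drift.observed}")
```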

We Scanned 1,995 Agent Skills

From SkillsMP, Claude Marketplace, GitHub, and 3 other sources. We then classified them by real user pain points.

1,995

Total skills scanned

1,385

Passed quality filter

74%

Marketing packaging

26%

Solve real pain

Pain Points That Have ZERO Skills Solving Them

What Reddit Says

r/dataengineering

"Month-end closing — why is it so painful? I build pipelines, transformations, and reporting layers... but the numbers never match."

r/devops

"Our observability costs are now higher than our AWS bill. We're spending $7.5M/yr on monitoring tools... and we still can't tell which number is right."

r/LocalLLaMA

"RAG pipeline — poor retrieval quality, wrong page numbers. How do you trust the output?"

Supply vs Demand: The Real Picture

The old taxonomy classifies skills by what tools do. The pain taxonomy classifies them by what users need.

Old Taxonomy (supply-driven)

"What category is the tool in?"

devops-cicd: 206 skills
testing-qa: 165 skills
development-tools: 149 skills
ai-ml: 113 skills
security: 110 skills

Looks healthy. Lots of skills everywhere.

Pain Taxonomy (demand-driven)

"What pain does the user have?"

manual-toil: 15 skills
tools-dont-connect: 7 skills
cant-find-root-cause: 2 skills
passed-but-wrong: 2 skills
nobody-trusts-the-data: 0 skills
costs-too-much: 0 skills
too-much-noise: 0 skills
breaks-when-things-change: 0 skills

4 major pain points have zero coverage.

The Honest Breakdown by Old Category

What % of skills in each category actually solve a real pain?

What 4:30 AM Should Look Like

Instead of 5 hours of panic, you get a pre-dawn alert before anyone notices.

Data Trust Bot 4:30 AM

RECONCILIATION ALERT

Source: Stripe API (balance transactions)
Warehouse: fct_daily_revenue
Period: 2025-06-09 to 2025-06-15

Source total: $2,700,000
Warehouse total: $2,300,000
Delta: -$400,000 (-14.8%)

Auto drill-down:
  USD: ✅ match ($1,900,000)
  EUR: ❌ -$180,000 (warehouse lower)
  GBP: ❌ -$120,000 (warehouse lower)

Root cause detected:
Stripe API version changed from 2024-12-18 → 2025-04-16.
Field payment_intent.amount now returns settlement currency (USD) instead of presentment currency.
Your stg_payments model applies FX conversion at line 23 — this double-converts.

Suggested fix: Remove CASE WHEN currency != 'usd' block in stg_payments.
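The arithmetic behind an alert like this is small. The sketch below computes the headline delta and percentage from the two totals, then checks how much of that delta the per-dimension drill-down accounts for. The `reconcile` function and its `tolerance_pct` threshold are hypothetical, while the input figures are the ones shown in the alert. With those figures the three currencies account for $300,000 of the $400,000 gap, so the sketch reports the remainder as unexplained, which a real drill-down would keep chasing along other dimensions.

```python
def reconcile(source_total, warehouse_total, drilldown, tolerance_pct=0.5):
    """Compare two totals and check whether per-dimension deltas explain the gap.

    drilldown maps a dimension value (here a currency) to warehouse-minus-source.
    tolerance_pct is an assumed alert threshold, not a recommended value.
    """
    delta = warehouse_total - source_total
    delta_pct = 100.0 * delta / source_total
    explained = sum(drilldown.values())
    return {
        "delta": delta,
        "delta_pct": round(delta_pct, 1),
        "alert": abs(delta_pct) > tolerance_pct,
        "explained": explained,
        "unexplained": delta - explained,
    }


if __name__ == "__main__":
    report = reconcile(
        source_total=2_700_000,
        warehouse_total=2_300_000,
        drilldown={"usd": 0, "eur": -180_000, "gbp": -120_000},
    )
    for key, value in report.items():
        print(f"{key}: {value}")
```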

Without (Today)

Detection: CFO finds it
Time to root cause: 5 hours
Business impact: Board meeting delayed
Trust impact: Data team credibility ↓

With Data Trust Chain

Detection: 4:30 AM auto-alert
Time to root cause: 5 minutes
Business impact: Fixed before anyone asks
Trust impact: Data team is the hero

Key Findings

The Skill Marketplace Truth

74%

of skills are documentation wrappers, SaaS marketing, or glorified templates

0

skills address "nobody trusts the data"

0

skills address "costs too much"

0

skills address "too much noise"

The Pattern

Most skills solve problems you have before things go wrong: how to set up a pipeline, how to configure monitoring, how to deploy to Kubernetes.

Almost none solve problems you have after things go wrong: why the numbers don't match, where the $400K went, which alert matters.

The skill marketplace is optimized for setup, not for survival.

Methodology

1,995 skills collected from 6 sources (SkillsMP, Claude Marketplace, GitHub, Apify, curated repos, ecosystem)
1,385 passed quality evaluation via Gemma 4 26B local model
152 sample skills reclassified by pain-point taxonomy
366 Reddit complaints scraped across 8 categories
Cross-validated against the Stack Overflow 2025 Developer Survey and the dbt State of Analytics Engineering 2026 report
18 multi-agent simulation conditions across 7 scenarios