"Which Number Is Right?"

An interactive simulation of the #1 pain point that 1,385 agent skills fail to solve

Based on Real Data | 1,995 Skills Scanned | 366 Reddit Pain Points

You are a Senior Data Engineer. It's Monday morning.
Your pipeline ran overnight. All green. All tests passed.
Then the CFO's assistant sends a message...

09:15 AM — Slack Notification

Sarah Chen (CFO Office) 9:15 AM
Hey, the weekly revenue report in Looker shows $2.3M for last week but Finance pulled $2.7M directly from the Stripe Dashboard. That's a $400K gap. The CFO is asking which number is right. Can you look into this ASAP? She has a board meeting at 2pm.

Your first instinct — check the pipeline:

$ airflow dags list-runs --dag-id revenue_daily --state success
dag_run_id | execution_date | state
manual__2025-06-15 | 2025-06-15T04:00 | success
manual__2025-06-14 | 2025-06-14T04:00 | success
manual__2025-06-13 | 2025-06-13T04:00 | success
$ dbt test --select fct_daily_revenue
Running 7 tests...
not_null_fct_daily_revenue_date PASS
unique_fct_daily_revenue_date PASS
accepted_values_currency PASS
positive_total_revenue PASS
All 7 tests passed ✓

Everything is green. Every test passed. But the numbers are $400K off.
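Why can every test pass while the total is wrong? The sketch below mirrors the four checks named in the dbt output as plain Python predicates and runs them against two hypothetical versions of the table: one with the intended figures and one that is understated. The rows, amounts, and the accepted currency list are invented for illustration and are not taken from the real pipeline.

```python
from datetime import date

# Assumed accepted_values list; the real dbt test config is not shown in the source.
ACCEPTED_CURRENCIES = {"usd", "eur", "gbp"}


def run_schema_checks(rows):
    """Mirror the four named tests; return check name -> passed?"""
    dates = [r["date"] for r in rows]
    return {
        "not_null_date": all(d is not None for d in dates),
        "unique_date": len(dates) == len(set(dates)),
        "accepted_values_currency": all(
            r["currency"] in ACCEPTED_CURRENCIES for r in rows
        ),
        "positive_total_revenue": all(r["total_revenue"] > 0 for r in rows),
    }


# Hypothetical daily rows (illustrative figures only).
correct = [
    {"date": date(2025, 6, 9), "currency": "usd", "total_revenue": 400_000},
    {"date": date(2025, 6, 10), "currency": "usd", "total_revenue": 380_000},
]
# Same shape, same types, lower amounts: the meaning changed, the schema did not.
understated = [dict(r, total_revenue=r["total_revenue"] - 50_000) for r in correct]

for name, table in [("correct", correct), ("understated", understated)]:
    checks = run_schema_checks(table)
    total = sum(r["total_revenue"] for r in table)
    print(name, "all passed:", all(checks.values()), "total:", total)
```

Both tables pass every check because the checks only constrain shape (nullability, uniqueness, allowed values, sign). None of them compares the total against an independent source.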

09:30 AM — The Investigation Begins

Write Manual SQL

Compare warehouse aggregates against Stripe numbers dimension by dimension

Check Observability Tools

Look at Monte Carlo / Elementary for anomalies

The Reality: 5 Hours, Every Time

09:15  CFO asks "which number is right?" and you don't know
09:30  Check pipeline: all green, all tests passed
10:00  Write manual SQL to compare aggregates
10:30  Numbers don't match by $400K, but why?
11:00  Drill down by date, currency, payment method...
12:00  Found it: EUR/GBP are the discrepancy
13:00  Root cause: Stripe API v2025-04-16 changed `amount` from presentment to settlement currency. Schema unchanged. Tests passed. Meaning changed.
13:30  Fix stg_payments, re-run pipeline, verify
14:00  Reply to CFO, 5 hours after the question
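The drill-down by date, currency, and payment method in the timeline is the same comparison repeated per dimension. A minimal sketch of that step follows, assuming both sides have already been exported as rows with a shared `amount_usd` column. The row layout, the `drill_down` helper, and all figures are hypothetical.

```python
from collections import defaultdict


def totals_by(rows, dimension):
    """Sum amount_usd per distinct value of the given dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["amount_usd"]
    return dict(totals)


def drill_down(source_rows, warehouse_rows, dimension, tolerance=0.01):
    """Return {dimension value: warehouse minus source} for values that disagree."""
    src = totals_by(source_rows, dimension)
    wh = totals_by(warehouse_rows, dimension)
    gaps = {}
    for key in sorted(set(src) | set(wh)):
        delta = wh.get(key, 0.0) - src.get(key, 0.0)
        if abs(delta) > tolerance:
            gaps[key] = delta
    return gaps


# Toy rows, invented for illustration.
source_rows = [
    {"date": "2025-06-09", "currency": "usd", "amount_usd": 100.0},
    {"date": "2025-06-09", "currency": "eur", "amount_usd": 50.0},
    {"date": "2025-06-10", "currency": "gbp", "amount_usd": 40.0},
]
warehouse_rows = [
    {"date": "2025-06-09", "currency": "usd", "amount_usd": 100.0},
    {"date": "2025-06-09", "currency": "eur", "amount_usd": 45.0},
    {"date": "2025-06-10", "currency": "gbp", "amount_usd": 30.0},
]

for dimension in ("date", "currency"):
    print(dimension, drill_down(source_rows, warehouse_rows, dimension))
```

In this toy data the date view shows a gap on every day, while the currency view isolates it to EUR and GBP, which is the kind of signal that points at a conversion problem.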

The Worst Part

This will happen again. Next quarter, another API will change a field's meaning without changing its name. You won't know until someone asks why the numbers don't match.
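One cheap guard against this class of change is to record the API version alongside each ingested batch and compare it with the version the staging model was written against. The sketch below assumes each ingested record carries an `api_version` field; that field name, the pinned value, and the `find_version_drift` helper are all assumptions for illustration, not a description of any particular connector.

```python
# Assumption: the version the staging model was written and tested against.
PINNED_API_VERSION = "2024-12-18"


def find_version_drift(records, pinned_version):
    """Return the sorted set of observed versions that differ from the pinned one."""
    seen = {r.get("api_version") for r in records}
    return sorted(v for v in seen if v is not None and v != pinned_version)


# Hypothetical ingested records; only the version metadata matters here.
records = [
    {"id": "rec_1", "api_version": "2024-12-18"},
    {"id": "rec_2", "api_version": "2025-04-16"},
    {"id": "rec_3", "api_version": "2025-04-16"},
]

drift = find_version_drift(records, PINNED_API_VERSION)
if drift:
    print("Unreviewed API versions in last batch:", drift)
```

A version check like this cannot say what changed, only that something might have. It works best as a trigger for a reconciliation run or a changelog review.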

We Scanned 1,995 Agent Skills

We pulled them from SkillsMP, Claude Marketplace, GitHub, and 4 other sources, then classified them by real user pain points.

1,995 total skills scanned
1,385 passed quality filter
74% marketing packaging
26% solve real pain

Pain Points That Have ZERO Skills Solving Them

What Reddit Says

r/dataengineering
"Month-end closing — why is it so painful? I build pipelines, transformations, and reporting layers... but the numbers never match."
r/devops
"Our observability costs are now higher than our AWS bill. We're spending $7.5M/yr on monitoring tools... and we still can't tell which number is right."
r/LocalLLaMA
"RAG pipeline — poor retrieval quality, wrong page numbers. How do you trust the output?"

Supply vs Demand: The Real Picture

The old taxonomy classifies skills by what tools do. The pain taxonomy classifies them by what users need.

Old Taxonomy (supply-driven)

"What category is the tool in?"

devops-cicd: 206 skills
testing-qa: 165 skills
development-tools: 149 skills
ai-ml: 113 skills
security: 110 skills

Looks healthy. Lots of skills everywhere.

Pain Taxonomy (demand-driven)

"What pain does the user have?"

manual-toil: 15 skills
tools-dont-connect: 7 skills
cant-find-root-cause: 2 skills
passed-but-wrong: 2 skills
nobody-trusts-the-data: 0 skills
costs-too-much: 0 skills
too-much-noise: 0 skills
breaks-when-things-change: 0 skills

4 major pain points have zero coverage.

The Honest Breakdown by Old Category

What % of skills in each category actually solve a real pain?

What 4:30 AM Should Look Like

Instead of 5 hours of panic, you get a pre-dawn alert before anyone notices.

Data Trust Bot 4:30 AM
RECONCILIATION ALERT

Source: Stripe API (balance transactions)
Warehouse: fct_daily_revenue
Period: 2025-06-09 to 2025-06-15

Source total: $2,700,000
Warehouse total: $2,300,000
Delta: -$400,000 (-14.8%)

Auto drill-down:
  USD: ✅ match ($1,900,000)
  EUR: ❌ -$180,000 (warehouse lower)
  GBP: ❌ -$120,000 (warehouse lower)

Root cause detected:
Stripe API version changed from 2024-12-18 → 2025-04-16.
Field payment_intent.amount now returns settlement currency (USD) instead of presentment currency.
Your stg_payments model applies FX conversion at line 23 — this double-converts.

Suggested fix: Remove CASE WHEN currency != 'usd' block in stg_payments.
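The top-level check behind an alert like this is small. Here is a minimal sketch using the two totals from the alert, assuming a relative tolerance of 0.5% before alerting. The `reconcile` function and the tolerance value are hypothetical, not part of any existing tool.

```python
def reconcile(source_total, warehouse_total, tolerance_pct=0.5):
    """Compare two totals; flag an alert when the relative gap exceeds the tolerance."""
    if source_total == 0:
        raise ValueError("source_total must be non-zero to compute a relative delta")
    delta = warehouse_total - source_total
    delta_pct = delta / source_total * 100
    return {
        "delta": delta,
        "delta_pct": round(delta_pct, 1),
        "alert": abs(delta_pct) > tolerance_pct,
    }


# Totals taken from the alert above.
result = reconcile(source_total=2_700_000, warehouse_total=2_300_000)
print(result)
```

The delta is measured relative to the source total, so -400,000 / 2,700,000 gives the -14.8% shown in the alert. A relative tolerance keeps the check usable across weeks with very different revenue.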

Without (Today)

Detection: CFO finds it
Time to root cause: 5 hours
Business impact: Board meeting delayed
Trust impact: Data team credibility ↓

With Data Trust Chain

Detection: 4:30 AM auto-alert
Time to root cause: 5 minutes
Business impact: Fixed before anyone asks
Trust impact: Data team is the hero

Key Findings

The Skill Marketplace Truth

74% of skills are documentation wrappers, SaaS marketing, or glorified templates
0 skills address "nobody trusts the data"
0 skills address "costs too much"
0 skills address "too much noise"

The Pattern

Most skills solve problems you have before things go wrong: how to set up a pipeline, how to configure monitoring, how to deploy to Kubernetes.

Almost none solve problems you have after things go wrong: why the numbers don't match, where the $400K went, which alert matters.

The skill marketplace is optimized for setup, not for survival.

Methodology