# Evidence Chain: Tracing the 10 Most Important Claims

**Date:** 2026-02-21
**Purpose:** Trace the evidence chain for the 10 most important claims/numbers in the Master Feasibility Report to assess confidence and identify weakest links.

---

## Confidence Tier Definitions

| Tier | Definition |
|------|-----------|
| **C1 Verified** | Claim confirmed by primary/authoritative source (official docs, API reference, legal statute) |
| **C2 Well-sourced** | Claim supported by multiple credible secondary sources |
| **C3 Inferred** | Claim derived from related data; not directly verified |
| **C4 Anecdotal** | Claim based on single source, blog post, or community report |

---

## The 10 Critical Evidence Chains

### 1. "65% of party-throwing is automatable today"

**Chain:** Master Feasibility Report (Section 8) --> Physical Tasks `_summary.md` (Section 6, pipeline table) + all 9 research summaries (task-by-task API assessment) --> individual API documentation checks

**Confidence:** C3 Inferred

**Derivation:** This number is synthesized from the task-by-task scorecard in the report. 63 tasks were identified across digital and physical categories. Each was classified based on whether a production API exists (verified per-API in the 9 research tracks). The 65% is a weighted estimate combining:
- 13 fully automatable tasks (21%)
- 10 semi-automatable tasks (16%)
- 29 outsourceable-via-gig-workers tasks (46%)
- 11 human-only tasks (17%)

The weighting accounts for the fact that digital tasks require less time than physical tasks, but physical tasks are mostly outsourceable. The raw percentage of non-human-only tasks is 83%, but the effort-weighted estimate is 65%.
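The raw shares above follow directly from the task counts and can be checked in a few lines of Python. This is a minimal sketch: the counts come from the list above, while the `effort_weighted_share` helper and any weights passed to it are hypothetical, since the report's actual effort weights are not reproduced in this document.

```python
# Task counts per category, taken from the scorecard summary above.
TASK_COUNTS = {
    "fully_automatable": 13,
    "semi_automatable": 10,
    "outsourceable": 29,
    "human_only": 11,
}

def raw_shares(counts):
    """Return each category's share of the total task count, in percent."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

def effort_weighted_share(counts, weights, automatable):
    """Effort-weighted percent of work falling in the `automatable` categories.

    `weights` maps category -> relative effort per task. These are
    hypothetical inputs; the values used in the report are not given here.
    """
    total = sum(counts[c] * weights[c] for c in counts)
    covered = sum(counts[c] * weights[c] for c in automatable)
    return 100 * covered / total

shares = raw_shares(TASK_COUNTS)
non_human_only = 100 - shares["human_only"]

print({name: round(pct) for name, pct in shares.items()})
print(round(non_human_only))  # raw share of tasks that are not human-only
```

With equal weights the helper reproduces the raw non-human-only share. Any weighting that assigns more effort per task to the human-only category pulls the result below that raw figure, which is the direction of the report's adjustment.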

**Weakest link:** The task list itself. There is no authoritative, canonical list of "all tasks required to throw a party." The 63-task list was constructed by the AI researcher from multiple party planning checklists and first-principles reasoning. Different parties would have different task compositions. The classification of each task depends on whether the relevant API actually works as documented -- and most were verified from documentation, not from live integration testing.

**What would change the number:** If any of the Tier 2 APIs (Instacart, TaskRabbit, Uber Guest Rides) are harder to access than documented, the semi-automatable percentage drops. If consumer venue/catering APIs emerge, the number rises significantly.

---

### 2. "Total estimated cost: $2,200-$5,500 for a 30-person party"

**Chain:** Master Feasibility Report (Section 4) --> Catering `_summary.md` (cost tables: $12-50/pp) + Venue `_summary.md` (city pricing tables: $100-500/hr) + Logistics `_summary.md` (supply costs: $150-600) + Physical Tasks `_summary.md` (labor costs: $400-1,200) + Music `_summary.md` (DJ: $600-1,700; Spotify: $11/mo) --> primary sources (Thumbtack, Peerspace, GreatEvent, Premier Staff, Bites By Braxtons)

**Confidence:** C2 Well-sourced (for individual line items); C3 Inferred (for the total)

**Component verification:**
- Venue ($400-$1,200): Peerspace city-specific pricing verified at C1 from peerspace.com listing pages and GreatEvent cost guide. NYC avg $185/hr, Chicago avg $120/hr confirmed.
- Catering ($360-$1,500): Drop-off at $12-20/pp verified at C2 from Thumbtack, Food Truck Club, Best Food Trucks. Buffet at $25-50/pp from Bites By Braxtons and Evolved Events.
- Supplies ($100-$400): Calculated at C3 from per-item pricing across multiple sources (My Mind's Eye, Pick Me Up Game, Bliss Curated Events).
- Labor ($0-$1,000): TaskRabbit rates at C1 from taskrabbit.com ($20-50/hr). Thumbtack wait staff at C1 ($25-35/hr). Agency fees at C2 (15-25%).
- Insurance ($150-$300): Verified at C1 from Peerspace support docs and VTM Miami.

**Weakest link:** The total is an aggregation of ranges, which compounds uncertainty. The line items span ranges of 2x to roughly 4x (and labor starts at $0), and the total inherits all that variance. The $2,200 low end assumes a home venue ($0), drop-off catering, minimal staffing, and DIY setup. The $5,500 high end assumes rented venue, buffet catering, full staffing, and DJ entertainment. Actual costs in NYC or SF could exceed the high end by 30-50%.
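To make the compounding concrete, the sketch below adds up the five component ranges listed under Component verification. It covers only those five line items (music, drinks, and transport are not in that list), so it is not expected to reproduce the report's $2,200-$5,500 total. The variable and function names are illustrative.

```python
# Component ranges in USD (low, high), copied from the component list above.
COMPONENTS = {
    "venue": (400, 1200),
    "catering": (360, 1500),
    "supplies": (100, 400),
    "labor": (0, 1000),
    "insurance": (150, 300),
}

def sum_ranges(ranges):
    """Add (low, high) ranges endpoint by endpoint."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high

low, high = sum_ranges(COMPONENTS)
print(f"${low:,}-${high:,}")
print(f"high/low ratio: {high / low:.1f}x")
```

Adding endpoints assumes every item lands at its extreme at the same time, so this gives the widest possible band rather than a likely one.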

---

### 3. "No major consumer venue marketplace offers a public API for booking"

**Chain:** Venue `_summary.md` (Q2) --> API Tracker (apitracker.io/a/peerspace showing no API reference) + Peerspace support center (no developer docs) + Giggster help center (no API references) + Splacer search (no results) + DesignMyNight developer docs (UK-focused, venue-operator API only) + iVvy developer docs (venue-operator API only)

**Confidence:** C2 Well-sourced

**Verification method:** Searched for "[platform name] API" and "[platform name] developer" for all six major platforms. Checked API Tracker, developer portals, and help centers. Confirmed presence of enterprise/B2B APIs (iVvy, DesignMyNight, Cvent) while confirming absence of consumer-facing APIs.

**Weakest link:** Absence of evidence is not evidence of absence. A private/partner API could exist at Peerspace or Giggster that is not publicly documented. The venue `_summary.md` explicitly flags this: "Would need to contact Peerspace directly for partnership API access." Additionally, this landscape could change at any time -- venue platforms may launch APIs in response to the agentic commerce trend.

---

### 4. "Social host liability laws exist in 43 US states"

**Chain:** Legal `_summary.md` (Q4) --> Insurance Information Institute (iii.org article on social host liability) + NCSL 50-state survey (social host liability for underage drinking statutes) + FindLaw (social host liability overview) --> individual state statutes

**Confidence:** C1 Verified

**The claim specifies:** 33 states have statutes assigning civil liability for injuries caused by minors provided alcohol. 31 states have criminal penalties for hosting underage drinking. The "43 states" figure comes from the Insurance Information Institute's broader count of states with any form of social host liability (statutory or common law).

**Weakest link:** The "43 states" figure is sourced from III.org, which is an insurance industry group -- authoritative but potentially expansive in its interpretation (it has a business interest in highlighting liability). The NCSL survey provides a more granular 50-state breakdown for the specific case of underage alcohol service. The 43-state number may include states with only common-law (court-established) rather than statutory liability, which is a meaningful legal distinction.

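**States with NO social host liability:** Delaware, Kentucky, North Carolina, West Virginia, and DC (per the legal research). This list names only four states, fewer than the seven implied by the 43-state count, so it should be treated as partial.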

---

### 5. "Instacart Developer Platform enables same-day delivery from 85,000+ stores"

**Chain:** Catering `_summary.md` (Section 6) + Logistics `_summary.md` (Q2-Q3) --> Instacart Developer Platform documentation (docs.instacart.com/developer_platform_api/) + Instacart press release (PRNewswire, 2024) + Instacart company page (instacart.com/company/business/developers)

**Confidence:** C1 Verified

**Key details verified:**
- 85,000+ stores across 1,500+ retail banners: from Instacart's own developer platform page
- 1B+ unique products: from Instacart press release
- Same-day delivery, as fast as 30 minutes: from IDP documentation
- Requires business registration in US or Canada: from IDP documentation

**Weakest link:** While the API documentation is real and verified, the actual developer experience has not been tested. The research notes that IDP "requires business registration" -- the application process, approval timeline, and potential rejection criteria are unknown. The "30-minute delivery" claim is Instacart's best case; actual delivery times depend on location, store proximity, and order complexity. Additionally, the 85,000-store figure is self-reported by Instacart.

---

### 6. "TaskRabbit has a documented Partner API for programmatic booking of setup/breakdown helpers"

**Chain:** Logistics `_summary.md` (Q5) + Physical Tasks `_summary.md` (Section 3) --> TaskRabbit Developer Hub (developer.taskrabbit.com/docs/overview-taskrabbit-home-services-api) + TaskRabbit API Reference (developer.taskrabbit.com/reference/projectavailability, /projectbook)

**Confidence:** C1 Verified (for the API's existence); CRITICAL CAVEAT (for current availability)

**What is verified:**
- Developer Hub exists at developer.taskrabbit.com
- API documentation version 2025-06 is published
- Three key endpoints documented: Check Service Availability, Bid Project Agreement, Book Project
- OAuth2 authentication required
- "Parties & Events" explicitly listed as a service category

**The critical caveat:** The Physical Tasks research discovered that the **Delivery by Dolly API is live**, but the **Home Services API (which covers party setup) is listed as "coming soon."** This is a significant gap between the API being documented and being available. The research accurately notes: "The Home Services API is not yet publicly available. This is a significant limitation for AI-directed automation."

**Weakest link:** The API documentation exists but the Home Services endpoints may not be callable yet. Partner approval is required regardless. This claim needs live verification before relying on it for a real party.

---

### 7. "Stripe's Agentic Commerce Protocol enables programmatic purchases"

**Chain:** Payments `_summary.md` (Section 1, 6) --> Stripe ACP documentation (docs.stripe.com/agentic-commerce/protocol) + Stripe blog post (stripe.com/blog/developing-an-open-standard-for-agentic-commerce) + OpenAI ACP developer docs (developers.openai.com/commerce/guides/get-started/) + GitHub (github.com/stripe/agent-toolkit)

**Confidence:** C1 Verified

**What is verified:**
- ACP launched September 2025, co-developed with OpenAI
- Open-source specification with four REST endpoints (Create, Update, Complete, Cancel Checkout)
- Powers ChatGPT's Instant Checkout feature
- Shared Payment Tokens (SPTs) scoped to specific merchants
- Agent Toolkit available in Python and TypeScript
- MCP server at mcp.stripe.com
- x402 protocol integration for USDC micropayments (50M+ transactions)

**Weakest link:** ACP is production-ready for merchants who have implemented it, but the relevant question is: how many party-related merchants accept ACP? As of February 2026, ACP is primarily used by e-commerce merchants on Shopify and similar platforms. It is unlikely that a local caterer, venue, or party supply store accepts ACP payments. The technology exists, but merchant adoption in party-relevant categories is unverified. For most party purchases, the AI would use virtual cards (Privacy.com) at traditional merchants rather than ACP.

---

### 8. "RSVP-to-attendance conversion is approximately 60-80% for casual parties"

**Chain:** Invitations `_summary.md` (Q7, conversion rate table) --> Glueup blog (glueup.com/blog/fix-high-event-rsvp-no-show-rate) + Splashthat blog (splashthat.com/blog/boost-event-attendance-rates) + After Work Wonders blog (afterworkwonders.com) + Nolimitsowasso (nolimitsowasso.com/show-rate/)

**Confidence:** C2 Well-sourced (for paid events); C3-C4 Inferred/Anecdotal (for casual parties)

**What the sources say:**
- Industry average no-show rate: 20-30% for paid events, 30-40% for free events (C2, from Glueup and Splashthat)
- "About 60% of invited guests attend a party" (C3, from After Work Wonders)
- "Rule of thirds" -- plan for 2/3 attendance (C4, party planning rule of thumb)
- Events charging $18+ saw ~95% attendance (C2, from Splashthat)
- Reconfirmation emails reduce no-shows by ~30% (C2, from Splashthat)

**Weakest link:** The paid-event no-show statistics come from event industry blogs that primarily track corporate and professional events, not private birthday parties. The casual-party-specific data is thin -- the "60%" and "rule of thirds" figures come from party advice blogs, not rigorous studies. The true range for a casual 30-person party could be anywhere from 50% to 90% depending on social closeness, geography, weather, and competing events. This number matters because it affects how many people to invite (catering and supply quantities depend on it).
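The effect on invitation counts can be sketched as simple arithmetic. The example below assumes a target of 30 attendees (the party size used throughout the report) and treats the attendance rate as a single fixed percentage, which is a simplification. The function name is hypothetical.

```python
def invites_needed(target_attendees, attendance_pct):
    """Smallest invite count whose expected attendance meets the target.

    Uses integer arithmetic (ceiling division) to avoid float rounding.
    """
    if not 0 < attendance_pct <= 100:
        raise ValueError("attendance_pct must be in (0, 100]")
    return (target_attendees * 100 + attendance_pct - 1) // attendance_pct

TARGET = 30  # attendees; assumed from the report's 30-person party

# Attendance rates spanning the 50-90% band discussed above.
for pct in (50, 60, 80, 90):
    print(f"{pct}% attendance -> invite {invites_needed(TARGET, pct)}")
```

Across the 50-90% band the invite list ranges from 34 to 60 people, which shows why the thin casual-party data feeds directly into catering and supply quantities.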

---

### 9. "Privacy.com API enables creation of single-use, merchant-locked virtual cards with spending limits"

**Chain:** Payments `_summary.md` (Section 2) --> Privacy.com API documentation (privacy-com.readme.io/docs/cards) + Privacy.com developer docs (privacy.com/developer/docs)

**Confidence:** C1 Verified

**What is verified from API docs:**
- POST https://api.privacy.com/v1/cards creates a card
- Five card types: SINGLE_USE, MERCHANT_LOCKED, UNLOCKED, DIGITAL_WALLET, PHYSICAL
- Spending limits configurable: per-transaction, monthly, annually, or lifetime (specified in cents)
- Cards can be paused or closed via API
- Enterprise tier required for PAN/CVV access (needed for the AI to actually use the card)

**Weakest link:** The Enterprise tier requirement for PAN/CVV access is critical. Without seeing the actual card number, Claude Code cannot use the card to make purchases on websites. The standard (non-enterprise) Privacy.com API creates cards that the human uses in their browser. For fully programmatic purchases, enterprise access is needed -- and the cost/requirements of enterprise access are not documented in the research. This gap could make the difference between "AI can spend money" and "AI can create a card that a human must use."
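For orientation, a card-creation call might be assembled as in the sketch below, which builds the request without sending it. The endpoint URL and card types are those quoted above. The JSON field names (`type`, `memo`, `spend_limit`, `spend_limit_duration`), the duration string, and the `api-key` Authorization scheme are assumptions that should be confirmed against the current Privacy.com API documentation before use.

```python
import json
import urllib.request

API_URL = "https://api.privacy.com/v1/cards"  # endpoint as quoted above

# Card types as listed above.
CARD_TYPES = {"SINGLE_USE", "MERCHANT_LOCKED", "UNLOCKED", "DIGITAL_WALLET", "PHYSICAL"}

def build_create_card_request(api_key, card_type, limit_dollars, duration, memo):
    """Build (but do not send) a card-creation request.

    Field names, duration strings, and the auth header format are
    ASSUMPTIONS for illustration, not confirmed API details.
    """
    if card_type not in CARD_TYPES:
        raise ValueError(f"unknown card type: {card_type}")
    body = {
        "type": card_type,
        "memo": memo,
        "spend_limit": round(limit_dollars * 100),  # limits are given in cents
        "spend_limit_duration": duration,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"api-key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_create_card_request(
    "YOUR_API_KEY", "SINGLE_USE", 600, "TRANSACTION", "party catering"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would attempt a real card creation. Even if that succeeds, whether the response exposes a usable PAN/CVV depends on the Enterprise tier question raised above.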

---

### 10. "The 'One Person + Claude Code' model is viable with approximately $1,892 total cost"

**Chain:** Master Feasibility Report (Section 4, "One Person + Claude Code" estimate) --> derived from: Catering `_summary.md` (drop-off $12-20/pp -> $600 for 30), Logistics `_summary.md` (supplies $150-600 -> $200 estimate), Physical Tasks `_summary.md` (TaskRabbit $35/hr x 2 workers x 3 hrs -> $350 with fees), Payments `_summary.md` (API costs minimal), Legal `_summary.md` (insurance $150-300 -> $200 estimate), Music `_summary.md` (Spotify Premium $11/mo), Invitations `_summary.md` (Twilio + Resend ~$5)

**Confidence:** C3 Inferred

**Assumptions baked in:**
1. Home/backyard venue ($0) -- eliminates the single largest variable cost
2. Drop-off catering at $20/pp -- top of the $12-20 verified range
3. BYOB + some hosted drinks ($200) -- highly variable assumption
4. Instacart supplies at $200 -- within verified $150-600 range
5. 2 TaskRabbit workers at $35/hr x 3 hrs + 15% fees ($350) -- verified hourly rates from TaskRabbit
6. 10 Uber vouchers at $15/ride ($150) -- per-ride estimate from research
7. Event insurance at $200 -- within verified $150-300 range

**Weakest link:** This estimate assumes a home venue, which eliminates $400-$1,200 in venue costs but introduces other costs not captured (utilities, wear and tear, homeowner's insurance implications). The catering estimate of $600 is realistic only for simple drop-off (pizza, BBQ platters, taco bars); upgrading to buffet service at $25-50/pp raises it to $750-$1,500. The labor estimate assumes gig workers are available on the desired date and at the quoted rates -- surge pricing, minimum booking hours, and platform fees could inflate this. The "BYOB + some hosted" drink estimate of $200 is essentially a guess.

The $1,892 number should be understood as a plausible low-to-mid estimate for the simplest viable version of this model, not as a reliable budget. A realistic planning budget would be **$2,500-$3,500** to account for cost overruns, surge pricing, substitutions, and the inevitable unplanned expenses.
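As a cross-check, the line items stated in this section can be summed and compared with the reported figure. The sketch below uses only the dollar amounts given in the chain and assumptions above; whether the report's $1,892 includes further items is not known from this document, and the dictionary labels are illustrative.

```python
# Line items in USD as stated in the chain and assumptions above.
LINE_ITEMS = {
    "catering (30 x $20)": 600,
    "drinks": 200,
    "supplies": 200,
    "taskrabbit labor": 350,
    "uber vouchers (10 x $15)": 150,
    "insurance": 200,
    "spotify premium": 11,
    "twilio + resend": 5,
}
REPORTED_TOTAL = 1892

itemized = sum(LINE_ITEMS.values())
gap = REPORTED_TOTAL - itemized

# Labor recomputed from the stated rate, headcount, hours, and 15% fee.
labor_base = 35 * 2 * 3                  # dollars before fees
labor_with_fee = labor_base * 115 / 100  # add the 15% fee

print(f"itemized: ${itemized:,}  gap vs reported: ${gap}")
print(f"labor from stated inputs: ${labor_with_fee:.2f}")
```

The items listed here come to $1,716, which is $176 short of the reported total, and the stated labor inputs yield about $242 rather than the $350 budgeted. The reported figure therefore contains amounts that this section does not itemize, which is a further reason to treat it as an estimate.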

---

## Model-Killing Uncertainties

These are assumptions that, if wrong, could fundamentally undermine the feasibility of the "Claude Code throws a party" concept:

| # | Uncertainty | Current Assumption | If Wrong... | Likelihood of Being Wrong |
|---|-----------|-------------------|-------------|--------------------------|
| 1 | **Instacart IDP access** | Business registration grants API access | Cannot programmatically order supplies; must use human for all shopping | Medium -- approval is not guaranteed |
| 2 | **TaskRabbit Home Services API launch** | API launches and is accessible to partners | Cannot programmatically book gig workers; must use Fancy Hands as intermediary | High -- currently "coming soon" |
| 3 | **Privacy.com Enterprise access** | Can obtain PAN/CVV for programmatic card use | AI cannot actually spend money autonomously; human must execute purchases | Medium -- enterprise tier pricing/requirements unknown |
| 4 | **Platform ToS enforcement** | Platforms won't detect/enforce against human accounts operated by AI | Account termination, voided bookings, potential legal claims | Medium -- AI-generated request patterns may be detectable |
| 5 | **Gig worker availability** | Workers available on party date at quoted rates | Must hire through agencies (more expensive) or recruit friends | Low-Medium -- depends on date/location |
| 6 | **Consumer catering API emergence** | A consumer catering API will eventually emerge | Catering ordering stays fully manual indefinitely | High -- no signals of this changing |
| 7 | **Regulatory crackdown on AI purchasing** | Current regulatory ambiguity continues | New regulations restrict AI-initiated transactions; require human-per-click authorization | Low-Medium -- rapid legislative interest |
| 8 | **Claude Code reliability for multi-hour orchestration** | Sessions stay stable through party-day coordination (6-8 hours) | Crashes, context window exhaustion, hallucination during live event | Medium -- Agent Teams is experimental |
| 9 | **Smart home device presence at venue** | Venue has or can accommodate Hue/Sonos/WiFi | No programmatic lighting or music control; must use manual systems | High for rented venues |
| 10 | **Social graph cold start** | Human provides adequate guest list | AI cannot reach the right people; invitations go to wrong contacts or miss key friends | Low -- this is explicitly a human task |

---

## Summary Assessment

The evidence chains reveal a research base that is:

- **Strong on API documentation** -- most API claims are C1 Verified from official developer docs
- **Strong on cost ranges** -- catering, venue, and labor costs are well-sourced from multiple platforms
- **Moderate on integration feasibility** -- APIs exist but live integration testing was not performed
- **Weak on real-world execution** -- no one has actually attempted this model end-to-end
- **Weak on edge cases** -- weather, vendor cancellations, gig worker no-shows, API outages are acknowledged but not stress-tested

The single most important untested assumption is whether the necessary API partner access (Instacart, TaskRabbit, Uber for Business, Privacy.com Enterprise) can actually be obtained by a single individual or small entity attempting this for the first time. The documentation says "yes, with business registration," but the practical experience of applying, waiting for approval, and configuring these integrations is undocumented and could take weeks or months.

**Bottom line:** The 65% automation figure is defensible based on available evidence, but it describes a theoretical ceiling based on API documentation. The achievable automation percentage for someone attempting this today, without pre-existing API partnerships, is likely closer to **40-50%** until the access pipeline is established.
