ADR-038: Official Project Apparel Design Specifications
Status: Proposed (but definitely happening)
Date: 2025-10-17
Deciders: Solo developer with questionable fashion sense
Technical Story: After discovering that literally nobody on GitHub is doing streaming entity resolution during LLM extraction with O(n) full-scan vector similarity, we need merchandise to celebrate this dubious achievement.
Context
The Discovery
Through extensive GitHub code search and academic literature review, we have determined that our approach to knowledge graph construction is either:
- Genuinely novel and underappreciated
- Obviously wrong and we're the only ones doing it
- So niche that it exists in a "nobody would bother searching for this" blind spot
Specifically, our system implements:
- Streaming entity resolution during ingestion (not batch post-hoc)
- Full-scan cosine similarity for concept matching (O(n), not HNSW)
- Recursive context-aware extraction (similar concepts inform new extraction)
- Evidence accumulation as first-class graph structure
- Self-healing semantic routing with convergence guarantees (future)
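The first, second, and fourth items above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `Concept` class, the `upsert` function, the toy two-dimensional embeddings, and the evidence strings are all hypothetical, and the 0.75 threshold is borrowed from the shirt design in this ADR. It is written in pure Python so it runs without dependencies.

```python
import math
from dataclasses import dataclass, field

SIMILARITY_THRESHOLD = 0.75  # assumption: value taken from the shirt's back print


@dataclass
class Concept:
    # Hypothetical record: a label, its embedding, and accumulated evidence.
    label: str
    embedding: list
    evidence: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def upsert(ontology, label, embedding, quote):
    """Merge into the closest existing concept, or create a new one."""
    best, best_score = None, -1.0
    for existing in ontology:  # full scan: O(n) per incoming concept
        score = cosine(embedding, existing.embedding)
        if score > best_score:
            best, best_score = existing, score
    if best is not None and best_score > SIMILARITY_THRESHOLD:
        best.evidence.append(quote)  # evidence accumulates on the matched concept
        return best
    created = Concept(label, list(embedding), [quote])
    ontology.append(created)
    return created


ontology = []
upsert(ontology, "graph database", [1.0, 0.0], "doc-1, paragraph 3")
upsert(ontology, "graph DB", [0.9, 0.1], "doc-2, paragraph 1")  # near-duplicate
upsert(ontology, "t-shirt", [0.0, 1.0], "doc-2, paragraph 7")   # unrelated
print([(c.label, len(c.evidence)) for c in ontology])
# → [('graph database', 2), ('t-shirt', 1)]
```

Resolution happens at ingestion time: the near-duplicate never becomes its own node, it only adds evidence to the concept it matched.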
Market Research Findings
Searches performed:
site:github.com "recursive upsert" graph database
# Result: No links found
site:github.com "vector similarity" "concept deduplication" knowledge graph
# Result: No links found
site:github.com LLM knowledge graph concept extraction entity resolution
# Result: Everyone does batch processing or skips deduplication entirely
Academic literature review:
- Most systems: Ingest fast → Deduplicate later (batch)
- Performance research: "Full scan is simple, suitable when dataset has <1M vectors"
- Our approach: "Graph-based entity resolution does not scale and is very hard"
- Our response: "Yes, and we're doing it anyway because quality > speed at current scale"
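A back-of-the-envelope calculation shows why scale matters here. Each lookup is O(n), but streaming in n concepts one at a time repeats that scan n times. The sketch below assumes the worst case, in which nothing ever merges and every arrival is compared against everything already stored; `total_comparisons` is a hypothetical helper, not part of the project.

```python
def total_comparisons(n: int) -> int:
    """Similarity checks needed to stream in n concepts when none of them merge.

    The i-th arrival (0-based) is compared against the i concepts already
    stored, so the total is 0 + 1 + ... + (n - 1) = n * (n - 1) / 2.
    """
    return n * (n - 1) // 2


for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} concepts -> {total_comparisons(n):,} comparisons")
# →     1,000 concepts -> 499,500 comparisons
# →   100,000 concepts -> 4,999,950,000 comparisons
# → 1,000,000 concepts -> 499,999,500,000 comparisons
```

Per-lookup cost grows linearly, but cumulative ingestion cost grows quadratically, which is roughly where the "fine at current scale" argument stops holding.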
The Emotional Journey
- Pride: "We built something cool!"
- Concern: "Wait, why isn't anyone else doing this?"
- Research: reads 15 papers on distributed graph architectures
- Understanding: "Oh, it's O(n) and doesn't scale to millions"
- Relief: "We already wrote a 1,000-line scaling solution document"
- Acceptance: "Time for t-shirts"
Decision
We will design official project apparel that:
- Celebrates technical obscurity - Only ~0.1% of people will understand the references
- Embraces the trade-offs - Acknowledges O(n) complexity without apology
- References the research - FENNEL, PowerGraph, The Bitter Lesson
- Maintains plausible deniability - Can be worn at conferences without explaining for 45 minutes
Design Specifications
Primary Design: "The Full-Scan Flex"
Front:
STREAMING ENTITY RESOLUTION
WITH O(n) COSINE SIMILARITY
DURING LLM EXTRACTION
(Ask me how I accumulate evidence)
Back:
    for concept in llm.extract():
        similarities = [
            cosine(concept, c)
            for c in ontology.concepts
        ]
        if max(similarities) > 0.75:
            merge_evidence()
        else:
            create_new()
Font: Monospace (obviously)
Colors: Dark theme (black shirt, neon green text) or Light theme (white shirt, terminal green)
Alternative Design 1: "The Academic Reference"
Front:
Back:
Inspired by:
• PowerGraph (2012) - Vertex-cut partitioning
• FENNEL (2014) - Streaming graph partitioning
• The Bitter Lesson (2019) - Computation > rules
Implemented by:
• Someone who will regret this at 100K concepts
Alternative Design 2: "The Conference Starter"
Front:
Back:
Trade-offs accepted:
✓ Perfect accuracy (100% recall)
✓ Evidence tracking per concept
✓ Context-aware extraction
✗ O(n) scaling (for now)
✗ Judgmental looks from FAANG engineers
Migration path ready:
→ HNSW indexes (94.5% recall, 161× faster)
→ FENNEL-style semantic sharding
→ Hub concept replication (vertex-cut)
Alternative Design 3: "The Minimalist"
Front:
Back:
Rationale: Maximum obscurity. Will confuse 99.9% of people. The 0.1% will either nod approvingly or start a 45-minute argument about pgvector.
Alternative Design 4: "The Honest One"
Front:
Back:
Current status:
• 363 commits of copyrighted content: REMOVED ✓
• Company references sanitized: DONE ✓
• GitHub stars: 1 (my own)
• O(n) complexity: ACCEPTED
• Scaling solution: RESEARCHED
• Regrets: NONE
For semantic queries < 100K concepts,
this is the right architecture.
Alternative Design 5: "The Warning Label"
Front:
Back:
Side effects may include:
• Arguing about cosine similarity thresholds
• Compulsive ADR writing (37+ documents)
• Researching papers from 2012 at 2am
• Creating 1,000-line scaling solution docs
• Joking about O(n) complexity
• Making t-shirts about niche technical decisions
If symptoms persist for more than 4 hours,
consult your local graph database expert.
Consequences
Positive
- Conference ice-breaker: Wearing this to a knowledge graph meetup will immediately identify fellow graph nerds
- Technical signaling: Shows depth of understanding (knows it's O(n), chose it anyway, has scaling plan)
- Humor as defense mechanism: If someone criticizes the approach, point to the shirt
- Documentation: These designs effectively document our architecture decisions in wearable form
- Recruitment tool: "I only hire people who understand the t-shirt"
Negative
- Explaining the joke kills the joke: Will spend 45 minutes explaining to curious non-technical people
- Imposter syndrome trigger: "Wait, did I really just make a t-shirt about Big O notation?"
- Fashion risk: Wearing code on a t-shirt is peak programmer aesthetic
- Existential questions: "Am I the only person who would wear this?"
- Economic inefficiency: Minimum order quantities mean 12 shirts, only need 1
Neutral
- Conversation starter: For better or worse, people will ask questions
- Memento: Physical artifact of the "discovery phase" when we realized we were the only ones doing this
- Future evidence: When we inevitably switch to HNSW + sharding, the t-shirt becomes vintage/ironic
Alternatives Considered
Alternative 1: No Merchandise
Pros:
- Save money
- Avoid looking ridiculous
- Maintain professional dignity
Cons:
- No fun
- Doesn't capture this specific moment in time
- Miss opportunity to celebrate technical obscurity
Decision: Rejected. The research already happened, might as well commemorate it.
Alternative 2: Serious/Professional Design
Example:
Pros:
- Won't confuse people
- Broadly understandable
- Could actually wear to work
Cons:
- Boring
- Doesn't capture the specific technical achievement
- Could be any project
Decision: Rejected. If we're making a t-shirt about this, go full nerd or go home.
Alternative 3: Just Buy a GraphQL T-Shirt
Pros:
- Already exists
- Ships immediately
- Graphs are graphs, right?
Cons:
- GraphQL ≠ Graph database
- Doesn't reference our specific architectural choices
- Everyone has a GraphQL shirt
Decision: Rejected. This is about celebrating a genuinely unusual approach, not just "graphs in general."
Implementation Details
Production Specifications
Printing method: Direct-to-garment (for code readability)
Fabric: 100% cotton, heavyweight (6oz minimum)
Sizing: Generous tech industry sizing (runs large)
QA testing: Must be readable from 6 feet away in conference lighting
Wash instructions: Cold water, inside out (protect the cosine similarity formula)
Target Audience
Primary: Solo developer (n=1)
Secondary: Conference attendees who understand the reference
Tertiary: Database engineers who will either love or hate it
Excluded: Anyone who thinks Neo4j and PostgreSQL are the same thing
Success Metrics
- Minimum viable success: 1 person at a conference nods knowingly
- Moderate success: Someone asks "wait, you do entity resolution during ingestion?"
- Maximum success: Starts a 45-minute technical debate about batch vs streaming
- Failure mode: "What's a cosine?"
Related ADRs
- ADR-016: Apache AGE Migration - The foundation that enables O(n) full scan
- ADR-030: Concept Deduplication Validation - Quality test suite that validates the approach
- DISTRIBUTED_SHARDING_RESEARCH.md: The 1,000-line document that proves we know this doesn't scale (and how to fix it)
- ADR-036: Universal Visual Query Builder - The UI that makes the graph actually usable
- ADR-037: Human-Guided Graph Editing - Future feature for when machines aren't enough
Appendix A: Rejected Slogans
For posterity, these were considered but didn't make the cut:
"I PUT THE 'O' IN O(n)"
Rejected: Too self-deprecating
"PGVECTOR? I BARELY KNOW HER"
Rejected: Too risqué for professional settings
"MY OTHER SHIRT IS ALSO ABOUT GRAPH DATABASES"
Rejected: Implies we have multiple graph database shirts (we don't... yet)
"RECURSIVE UPSERT OR BUST"
Rejected: Sounds vaguely threatening
"FRIENDS DON'T LET FRIENDS DO BATCH ENTITY RESOLUTION"
Rejected: Factually incorrect (batch ER is fine)
"POWERED BY NUMPY.DOT()"
Rejected: Too minimalist, loses the LLM extraction context
Appendix B: Conference Scenarios
Scenario 1: The Nod
Scenario 2: The Question
Stranger: "Why O(n)?"
You: "Quality over speed at current scale. We have a scaling plan."
Stranger: "HNSW?"
You: "HNSW plus FENNEL-style semantic sharding."
Stranger: *impressed nod*
Scenario 3: The Debate
Stranger: "You can't do entity resolution during ingestion!"
You: *gestures to shirt* "We can and we did."
Stranger: "But the performance—"
You: "161× slower than HNSW, yes. Also 100% recall vs 94.5%."
Stranger: "At what scale?"
You: "Currently < 100K concepts."
Stranger: "Oh, that's fine then."
*45-minute technical discussion ensues*
Scenario 4: The Misunderstanding
Non-technical person: "What does O(n) mean?"
You: *deep breath* "So, imagine you have a library..."
*20 minutes later*
You: "...and that's why linear search is acceptable for small datasets."
Non-technical person: *glazed eyes* "Cool shirt!"
Maintenance and Evolution
Version 1.0: Current Architecture (O(n) Full Scan)
- Accurate representation of implemented system
- Wearable documentation
- Conference conversation starter
Version 2.0: Post-HNSW Migration
- Add line: ~~O(n)~~ → O(log n) ✓
- Becomes vintage/ironic
- "I survived the full-scan era"
Version 3.0: Multi-Shard Architecture
- Update back to show FENNEL implementation
- Add: "Shards: 1 → n"
- Collector's item for architecture evolution
Conclusion
This ADR represents either:
1. The peak of technical self-awareness and humor
2. A cry for help
3. Both simultaneously
Regardless, it documents a genuine moment in the project's evolution: the discovery that our streaming entity resolution approach with O(n) full-scan similarity matching is genuinely unusual in the wild, yet thoroughly justified and already backed by a comprehensive scaling solution.
If you're reading this ADR in the future and wondering "did they actually make the t-shirts?" - the answer is almost certainly no. But the fact that we wrote a 500-line ADR about it captures the spirit of the project perfectly: over-documented, self-aware, technically rigorous, and just a little bit absurd.
References:
- PowerGraph (2012): Vertex-cut partitioning for power-law graphs
- FENNEL (2014): Streaming graph partitioning algorithm
- The Bitter Lesson (2019): Computation > hand-coded knowledge
- GitHub Search Results (2025): "No links found" × 3
- Our Therapist (TBD): Will discuss the t-shirt incident
Last Updated: 2025-10-17
Likelihood of Implementation: 30% (60% if we get more GitHub stars)
Regret Factor: TBD (check back after first conference)