From Vision Pro Flops to $1 Meetings: Earmark's Secret Weapon to Automate Your Entire Follow-Up Before You Hang Up
From Vision Pro Coaching to Real-Time Work Automation
The journey from niche presentation coaching to building a foundational productivity layer for enterprise teams is a testament to rapid, market-driven iteration. Earmark, founded by CEO Mark Barbir and Co-Founder Sanden Gocka, did not start with the ambition to overhaul meeting workflows. Initially, the team was engaged in a highly specific vertical: coaching clients on how to effectively present complex ideas, including those related to the then-nascent Apple Vision Pro. This early work, focused on distilling dense information into digestible narratives, provided crucial insights into the pain points surrounding high-stakes communication.
The pivot, as detailed in recent discussions featuring @ttorres, was a decisive move toward automating the output of meetings rather than just coaching the delivery. They recognized that the true bottleneck wasn't the presentation itself, but the overwhelming backlog of actionable tasks generated after the conversation concluded. Earmark transformed into a web-based assistant designed explicitly to ensure that unstructured conversations immediately translate into structured, actionable outputs—a true shift from mere documentation to instant execution.
The core realization driving this evolution was simple: summaries are inherently passive. While many AI notetakers churn out minutes that often languish in inboxes, Earmark was engineered to produce finished work. This distinction marks the dividing line between a helpful transcription service and a necessary production tool. What exactly constitutes "finished work" in the context of a technical meeting? It means generating user stories, drafting preliminary API specs, or even creating functional prototypes, all before the participants have time to hit "end call." This capability stems from an architecture designed not around a single processing step, but a parallel operational structure.
The Earmark Difference: Agents vs. Summaries
The crucial differentiator for Earmark lies in its sophisticated, multi-agent system running concurrently during live calls. Where standard tools offer a linear summary, Earmark deploys specialized digital personalities designed for specific tasks.
Finished Work Over Mere Notes
If a standard AI notetaker provides a recap of what was said, Earmark attempts to complete the next step in the workflow based on what was said. This means moving beyond recording historical data to actively producing future artifacts.
Parallel Agent Architecture
Imagine a digital SWAT team operating in real-time beneath the surface of your meeting. Earmark runs multiple agents concurrently. One agent might be parsing technical jargon, another drafting a formal requirement document, and a third cross-referencing security implications. This parallel processing allows for complex, multi-faceted outputs derived instantly from the spoken word.
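To make the parallel-agent idea concrete, here is a minimal sketch of several agents consuming the same live-transcript chunk concurrently. Earmark's actual implementation is not public; the agent names, their toy logic, and the `run_agents` orchestrator are all illustrative stand-ins (a real system would call an LLM inside each agent).

```python
import asyncio

# Hypothetical sketch: each "agent" is an async task that consumes the same
# transcript segment and produces a different artifact concurrently.

async def jargon_parser(segment: str) -> str:
    # A real agent would call an LLM; here we just collect acronyms.
    acronyms = [w for w in segment.split() if w.isupper() and len(w) > 1]
    return f"acronyms found: {acronyms}"

async def requirement_drafter(segment: str) -> str:
    # Placeholder for an agent drafting a formal requirement document.
    return f"DRAFT REQUIREMENT: {segment[:40]}..."

async def security_reviewer(segment: str) -> str:
    # Placeholder for an agent cross-referencing security implications.
    flagged = "password" in segment.lower() or "PII" in segment
    return f"security flag: {flagged}"

async def run_agents(segment: str) -> dict:
    # Run all agents in parallel on the same live-transcript chunk.
    results = await asyncio.gather(
        jargon_parser(segment),
        requirement_drafter(segment),
        security_reviewer(segment),
    )
    return dict(zip(["jargon", "requirement", "security"], results))

if __name__ == "__main__":
    print(asyncio.run(run_agents("Store the user password and SSN in the CRM")))
```

The key point is the `asyncio.gather` call: every agent sees the utterance at the same moment, so complex, multi-faceted outputs arrive together rather than in sequence.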
Template-Driven Task Execution
This agentic framework is heavily guided by templates that users or the system can invoke. Examples highlighted include:
- The Engineering Translator: Automatically converts abstract product desires into concrete, executable technical tasks.
- The Acronym Explainer: A vital tool in large organizations where jargon silos information, ensuring clarity across departments.
- The "Make Me Look Smart" Agent: A tongue-in-cheek name for an agent that structures vague discussion points into polished, coherent summaries suitable for executive review.
This template-driven approach ensures that the automation is not random but adheres to established organizational workflows, bridging the gap between casual discussion and formal deliverables.
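A template registry of this kind can be sketched in a few lines. The template names below mirror the article's examples, but the field layout and the prompt-rendering logic are assumptions, not Earmark's actual schema.

```python
# Hypothetical template registry; each entry pairs an instruction with a
# required output shape so the automation follows organizational workflows.
TEMPLATES = {
    "engineering_translator": {
        "instruction": "Convert the product request into executable technical tasks.",
        "output_format": "numbered task list",
    },
    "acronym_explainer": {
        "instruction": "Expand and define every acronym used in the transcript.",
        "output_format": "glossary table",
    },
    "make_me_look_smart": {
        "instruction": "Restructure vague discussion points into a polished summary.",
        "output_format": "executive summary",
    },
}

def render_prompt(template_name: str, transcript: str) -> str:
    """Build the LLM prompt for one template invocation."""
    t = TEMPLATES[template_name]
    return (
        f"{t['instruction']}\n"
        f"Respond as: {t['output_format']}\n"
        f"---\n{transcript}"
    )
```

Because every invocation flows through a named template, the output is predictable enough to drop straight into an established deliverable format instead of arriving as free-form prose.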
Simulating the Team: Personas and Contextual Understanding
One of the most innovative aspects of Earmark’s design is its ability to simulate the absent team members required to vet early-stage work. In any complex product development cycle, outputs must satisfy multiple stakeholders—engineering, legal, security, and accessibility compliance.
Simulated Stakeholders
Earmark incorporates predefined personas that act as immediate reviewers. As a Product Manager describes a new feature, an agent simulating the Security Architect might pipe up (digitally, via the transcript output) to flag potential data exposure risks inherent in the proposal. Similarly, a Legal persona can check for immediate compliance red flags, and an Accessibility agent can ensure design constraints are considered upfront. This shifts review cycles from multi-day delays to instantaneous feedback loops within the meeting itself.
Designing for the Extreme User
The founders strategically centered the product design around the Product Manager (PM) as the extreme user. PMs often bear the brunt of organizational communication friction, synthesizing technical details for executives, legal constraints for engineers, and user needs for design. By solving the PM's overwhelming follow-up burden—the creation of specs, tickets, and documentation—Earmark incidentally solves broader organizational communication deficits, benefiting every role downstream.
Technical Deep Dive: Affordability and Architecture
Shipping a powerful, real-time AI tool capable of complex reasoning is prohibitively expensive if relying on brute-force processing for every word spoken. The Earmark team cracked the cost barrier, transforming a potential enterprise blocker into a unique selling proposition.
Ephemeral Mode as a Feature
A significant technical challenge for enterprise adoption is data governance and privacy. Earmark’s solution, the "ephemeral mode," became a powerful attractor for security-conscious clients. This architecture dictates that data is processed in transit with minimal to no permanent storage on Earmark's servers. For many highly regulated industries, the promise of no long-term data retention is more valuable than superior feature parity.
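The contract behind ephemeral mode can be expressed as a simple lifecycle guarantee: transcript data lives only in process memory for the duration of the call and is cleared when the session ends. The sketch below is an assumption about the shape of that guarantee, not Earmark's actual storage layer.

```python
from contextlib import contextmanager

# Hypothetical sketch of "ephemeral mode": the transcript buffer exists only
# in process memory for the duration of the session and is cleared on exit,
# whether the meeting ends normally or the pipeline raises an error.

@contextmanager
def ephemeral_session():
    buffer = []  # never written to disk or a database
    try:
        yield buffer
    finally:
        buffer.clear()  # no long-term retention once the meeting ends
```

Wrapping all processing in a construct like this makes the no-retention promise auditable: there is simply no code path that persists the conversation.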
The Cost Challenge
The goal was radical cost reduction. Early testing showed per-meeting costs soaring to an unsustainable $70 using standard LLM calls. The mandate became clear: reduce that cost to under $1 per meeting.
Prompt Caching Mastery
The core of their cost optimization strategy revolves around prompt caching. LLMs charge per token, and re-sending the same context on every call is costly. Earmark structures its requests so that the stable material, common contextual prompts, standard organizational nomenclature, and repeatable formatting instructions, forms an identical prefix across calls. Providers can then reuse the already-processed prefix and bill those cached input tokens at a fraction of the normal rate, so each subsequent query pays full price only for the new transcript material instead of re-ingesting the entire context from scratch.
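The economics of prefix caching are easy to see with a back-of-the-envelope model. The rates and the 10% cached-token discount below are made-up placeholders (real providers publish their own per-model pricing), and the function is a simplification that ignores output-token cost.

```python
# Hypothetical cost sketch. Rates and the cached-input discount are assumed
# placeholders; real providers bill cached prompt-prefix tokens at a reduced
# rate when the prefix is identical across calls.

RATE_INPUT = 3.00 / 1_000_000   # $ per input token (assumed)
CACHED_DISCOUNT = 0.10          # cached tokens billed at 10% (assumed)

def meeting_cost(prefix_tokens: int, turn_tokens: list, caching: bool) -> float:
    """Total input cost when the same org/template prefix precedes every call."""
    cost = 0.0
    for i, turns in enumerate(turn_tokens):
        prefix_rate = RATE_INPUT
        if caching and i > 0:   # prefix is a cache hit after the first call
            prefix_rate = RATE_INPUT * CACHED_DISCOUNT
        cost += prefix_tokens * prefix_rate + turns * RATE_INPUT
    return cost
```

With a 50k-token organizational prefix and 100 agent calls of 500 new tokens each, the uncached total comes to about $15 of input cost while the cached total is under $2, the kind of order-of-magnitude drop that turns $70 meetings into sub-$1 ones once applied across the whole pipeline.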
Model Selection Nuances
In the constant race for "newer and bigger" models, Earmark demonstrated a nuanced understanding of LLM specialization. While newer models boast superior general intelligence, the team found that GPT-4.1 (or specific fine-tuned versions thereof) often provided superior, more controllable prose and structured output for the specific template-driven tasks they execute, leading to fewer hallucinations in core deliverables.
| Cost Benchmark | Initial Testing | Goal State | Optimization Technique |
|---|---|---|---|
| Cost Per Meeting | ~$70.00 | < $1.00 | Aggressive Prompt Caching |
| Latency Constraint | High (Batch Processing) | Real-Time (Sub-Second) | Parallel Agent Architecture |
| Data Retention | Default Storage | Ephemeral Mode | Enterprise Security Feature |
Beyond Simple Summaries: Next-Generation Search
As Earmark accumulates months of meeting data (even alongside ephemeral mode, deeper organizational memory can be built via secure user accounts), the challenge shifts from real-time generation to retrospective analysis across vast corpora of conversation.
Limits of Vector Search
Traditional RAG (Retrieval-Augmented Generation) systems rely heavily on vector similarity search. While excellent for finding documents about a topic, vector search struggles when users ask complex, cross-contextual, or analytical questions across thousands of hours of meetings. For example, "Where exactly in Q3 did we agree to de-scope Feature X based on the legal review mentioned in the initial kickoff?" is often too nuanced for simple vector indexing.
Building Agentic Search
Earmark is evolving its retrieval mechanism into a more robust agentic search architecture. This system doesn't rely on one tool but orchestrates several:
- Vector Search (RAG): For topical relevance.
- BM25/Keyword Search: For high-precision recall on specific names, dates, or identifiers.
- Metadata Queries: Leveraging structured data about who spoke, when, and in which project channel.
- Bespoke Summary Agents: Dedicated agents to interpret the retrieved raw snippets and synthesize a coherent analytical answer.
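The orchestration of those tools can be sketched as a small pipeline: a metadata filter narrows the candidate pool, a keyword score ranks what remains, and the top snippets would then be handed to a summary agent. Everything here is illustrative, the sample meetings are invented, and the term-overlap score is a crude stand-in for real BM25 and for the vector-similarity signal that would be blended in.

```python
from collections import Counter

# Hypothetical corpus: structured metadata (speaker, quarter) alongside text.
MEETINGS = [
    {"id": 1, "speaker": "PM", "quarter": "Q3",
     "text": "legal review flagged Feature X so we agreed to de-scope it"},
    {"id": 2, "speaker": "Eng", "quarter": "Q2",
     "text": "kickoff covered login flow and the initial spec"},
    {"id": 3, "speaker": "PM", "quarter": "Q3",
     "text": "sprint planning and ticket triage"},
]

def keyword_score(query: str, text: str) -> float:
    # Simplified term-overlap score standing in for BM25.
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    return sum(min(q[w], t[w]) for w in q)

def agentic_search(query: str, **metadata) -> list:
    # Step 1: metadata query (who spoke, when, which project channel).
    pool = [m for m in MEETINGS
            if all(m.get(k) == v for k, v in metadata.items())]
    # Step 2: keyword ranking; a vector-similarity score would be blended
    # in here, and the top hits passed to a bespoke summary agent.
    pool.sort(key=lambda m: keyword_score(query, m["text"]), reverse=True)
    return pool
```

Even this toy version shows why the hybrid matters: the metadata filter answers "Q3" exactly, which pure vector similarity cannot guarantee, while the keyword score pins down identifiers like "Feature X".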
Creating Actionable Artifacts
The goal of this enhanced search is not just to retrieve text, but to regenerate functional deliverables. If a user asks for the "latest product specification for the login flow," the system should assemble the relevant snippets from three different meetings over two months, run them through the specification-drafting agent, and output a version 1.1 spec, ready for review. This turns historical conversations into living, editable documentation.
Future Vision: The AI Chief of Staff
The current functionality—automating deliverables like specs and tickets—is merely the stepping stone. The founders envision Earmark evolving into a proactive AI Chief of Staff, managing the flow of work rather than just documenting it.
This future involves deep integration with external development tools. Imagine a PM defining a high-level requirement, and Earmark not only drafts the specification but immediately pushes partial code prototypes to environments like Cursor (an AI-native code editor) or visual mockups to V0 by Vercel while the team is still discussing the next sprint priority. The AI assistant moves beyond automating the follow-up to preemptively initiating the build. This represents a complete abstraction of administrative overhead, allowing builders and thinkers to focus solely on strategic decision-making.
This report is based on the digital updates shared on X. We've synthesized the core insights to keep you ahead of the curve.
