HeyLemon.ai Architecture Deep Dive

February 17, 2026 · Research Report

Executive Summary

Lemon (heylemon.ai) is a voice-to-action AI productivity agent built by Futureproof Lab, Inc., a Delaware-incorporated startup founded in 2024 and headquartered in New York. The product is a native macOS desktop app (Apple Silicon only) that lets users press the fn key to issue voice commands which are then executed as real tasks — replying to emails, drafting documents, searching the web, and more — all without leaving the current tab.

The company has raised $1.27M in pre-seed funding from Flybridge and has approximately 3 employees. The product is closely associated with Hassan W. Bhatti, a serial AI entrepreneur (Forbes 30 Under 30, co-founded CryptoNumerics, which was acquired by Snowflake, and co-hosts the Superhuman AI newsletter/podcast).

The marketing website is built on Framer, hosted on AWS (via Framer's infrastructure in ca-central-1). The desktop app itself uses real-time voice streaming, screen capture for contextual awareness, OAuth integrations with Google services, and third-party AI models for inference. Below is a detailed technical breakdown based on publicly available information.

1. What is HeyLemon.ai?

Lemon positions itself as "the first AI agent that turns your voice instructions into finished tasks." The core value proposition targets knowledge workers who spend hours typing and switching between tabs.

Core Features

  • Voice-to-Action: Press fn, speak a command, watch it execute
  • Email & Message Replies: Compose and send replies via voice (claims 12x faster)
  • Document Research & Creation: Draft documents from voice instructions
  • Instant Search: Voice-powered web and knowledge search
  • Feedback & Ideation: AI brainstorming assistant
  • Tone & Text Modification: Edit existing text by voice
  • Dictation: Pure speech-to-text transcription
  • Screen Context Awareness: Captures screenshots to understand what the user is looking at

Target audience: Knowledge workers, deep workers, and professionals who juggle multiple apps, docs, and communication tools throughout the day. Currently macOS only (Apple Silicon).

2. Frontend Architecture (Marketing Site)

  • Framework: Framer (confirmed via meta name="generator" content="Framer deac62f")
  • Server: Framer/7e53415 (from HTTP headers)
  • Rendering: SSG (Static Site Generation) — ssg-status;desc="optimized"
  • JS Bundles: Minimal — single async module (script_main.DuWXvIkF.mjs) + Framer analytics
  • Fonts: Inter (primary), DM Serif Display/Text, Solway, Helvetica — via Google Fonts + Framer CDN
  • Video: Cloudflare Stream for demo video (customer subdomain: customer-vfd9s68qudygmavo.cloudflarestream.com)
  • Analytics: Framer Events (events.framer.com/script?v=2) — no GA, Segment, or other trackers detected
  • Assets CDN: framerusercontent.com

The marketing site is a static site with no traditional SPA framework (no React/Next.js/Vue indicators). It's a design-to-code Framer project with responsive breakpoints at 390px, 500px, 810px, 1200px, and 1440px. Page weight is ~495KB HTML. The site has 5 pages: Home, About Us, Download, Privacy Policy, and Terms & Conditions.
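The Framer fingerprinting described above can be reproduced with a short script. The meta tag and Server header values are the ones quoted in this report; the helper function itself is illustrative, not part of any published tooling:

```python
import re

def detect_framer(html: str, headers: dict) -> dict:
    """Fingerprint a Framer-hosted site from its HTML and response headers.

    Looks for the <meta name="generator" content="Framer ..."> tag and the
    Framer/<build> Server header, as observed on heylemon.ai.
    """
    result = {}
    m = re.search(r'<meta\s+name="generator"\s+content="(Framer[^"]*)"', html)
    if m:
        result["generator"] = m.group(1)
    server = headers.get("Server", "")
    if server.startswith("Framer/"):
        result["server_build"] = server.split("/", 1)[1]
    return result

# Example using the values observed on heylemon.ai:
page = '<head><meta name="generator" content="Framer deac62f"></head>'
print(detect_framer(page, {"Server": "Framer/7e53415"}))
# → {'generator': 'Framer deac62f', 'server_build': '7e53415'}
```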

3. Desktop App Architecture

The actual product is a native macOS desktop application (Apple Silicon only), distributed as a standard .dmg file via drag-and-drop installation. Based on the product's behavior and privacy policy, here is what can be confirmed or reasonably inferred:

Confirmed Technical Details

  • Platform: macOS, Apple Silicon only (no Intel, no Windows, no Linux, no browser extension)
  • Activation: System-wide hotkey (fn key) — suggests deep OS integration, likely using macOS Accessibility APIs
  • Voice Processing: Audio is "streamed in real time" for transcription — voice recordings are not retained
  • Screen Capture: Captures screenshots "when explicitly activated" for contextual awareness — processed in memory only, never stored
  • Authentication: Uses a "third-party authentication provider" (not named)
  • OAuth Integrations: Supports Google OAuth (Gmail, Calendar, Drive, Contacts) and potentially other services
  • AI Inference: Uses "third-party AI systems" for response generation (per Terms of Service)
  • Web Search: Can process web search queries on behalf of the user
  • System Actions: Can open applications and URLs via OS mechanisms
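The "System Actions" bullet (opening applications and URLs via OS mechanisms) maps naturally onto macOS's built-in `open` command. A minimal sketch of how an agent might shell out to it; the function names are illustrative, not Lemon's actual API:

```python
import subprocess

def open_command(target: str, is_app: bool = False) -> list[str]:
    """Build the argv for macOS's `open` utility.

    `open -a Name` launches an application by name; `open <url>` hands a
    URL to the default handler (browser, mail client, etc.).
    """
    return ["open", "-a", target] if is_app else ["open", target]

def run_action(target: str, is_app: bool = False) -> None:
    # Kept separate from open_command so the command can be inspected
    # and tested without actually launching anything.
    subprocess.run(open_command(target, is_app), check=True)

print(open_command("Safari", is_app=True))   # → ['open', '-a', 'Safari']
print(open_command("https://heylemon.ai"))   # → ['open', 'https://heylemon.ai']
```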

Inferred Architecture (Educated Speculation)

Based on the macOS-only, Apple Silicon requirement, and the product's behavior:

  • Framework: Likely Swift/SwiftUI or Electron/Tauri — the Apple Silicon requirement and system-level integration (fn key, screen capture, accessibility) strongly suggest a native Swift app or at minimum a hybrid with native components
  • Voice Streaming: Likely uses Apple's Speech framework for on-device STT, or streams audio to a cloud STT service (Whisper API, Deepgram, or AssemblyAI)
  • Screen Understanding: Screenshot + multimodal AI model (e.g., GPT-4V/GPT-4o or Claude Vision) for contextual awareness
  • Local + Cloud Hybrid: Some processing on-device (audio capture, screen capture, UI overlay), with AI inference happening server-side
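The "streamed in real time, never retained" voice behavior can be sketched as a generator pipeline: audio chunks flow to a transcriber and go out of scope as soon as text comes back. `FakeSTT` is a stand-in for whichever cloud or on-device STT service Lemon actually uses, which is not public:

```python
from typing import Iterable, Iterator

class FakeSTT:
    """Stand-in for a streaming STT backend (Whisper API, Deepgram, etc.)."""
    def transcribe_chunk(self, chunk: bytes) -> str:
        # A real backend would return partial transcripts; here we just
        # decode the bytes to simulate text coming back.
        return chunk.decode("utf-8")

def stream_transcribe(chunks: Iterable[bytes], stt: FakeSTT) -> Iterator[str]:
    """Yield transcript fragments; no audio chunk outlives its transcription."""
    for chunk in chunks:
        yield stt.transcribe_chunk(chunk)
        # `chunk` is released here: nothing is buffered or written to disk,
        # matching the stated "voice recordings are not retained" behavior.

audio = [b"reply to ", b"the last email ", b"from Sam"]
print("".join(stream_transcribe(audio, FakeSTT())))
# → reply to the last email from Sam
```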

4. AI/ML Stack

The Terms of Service explicitly state: "Lemon uses third-party AI systems to generate responses and perform tasks." The specific AI provider(s) are not disclosed, but the capabilities suggest:

AI Pipeline (Inferred)

Step 1
Voice Input → Transcription
Real-time audio streaming to an STT service. Voice is not retained. Likely candidates: OpenAI Whisper API, Deepgram, or Apple Speech framework.
Step 2
Context Gathering
Screenshots captured in memory (if activated), plus any active OAuth data from connected services (Gmail, Calendar, etc.).
Step 3
Intent + Action Planning
The transcribed text + screen context is sent to an LLM for intent classification and action planning. Given the multimodal requirements (text + image), likely candidates are OpenAI GPT-4o or Anthropic Claude.
Step 4
Action Execution
The app performs the planned action: composing text, executing a search, interacting with OAuth-connected services, or triggering OS-level actions.
Step 5
Output Storage
Generated outputs (assistant responses, drafts, summaries) are stored persistently. Raw inputs (voice, screenshots, third-party data) are discarded.

Note: The privacy policy explicitly states that screenshots, voice recordings, and third-party integration data are never used to train AI models. This is consistent with using commercial API-based AI services (OpenAI, Anthropic) that offer data-processing agreements.

5. Infrastructure

  • Marketing Site Hosting: Framer (sites.framer.app), hosted on AWS (AS16509 Amazon.com, Inc.)
  • IP Addresses: 31.43.160.6, 31.43.161.6 (anycast, Amsterdam PoP)
  • Server Region: ca-central-1 (Canada — from server-timing header)
  • Domain Registrar: Namecheap (registrar-servers.com nameservers)
  • Email: Google Workspace (MX: aspmx.l.google.com, SPF includes _spf.google.com)
  • Video CDN: Cloudflare Stream
  • Asset CDN: Framer CDN (framerusercontent.com)
  • TLS: HTTP/2, HSTS enabled (max-age 31536000)

Parent company domain (futureprooflabs.ai) resolves to the same Framer IPs, confirming both sites are Framer-hosted. The backend API for the desktop app is not publicly exposed or discoverable — it likely runs on a separate cloud infrastructure (AWS, GCP, or similar) not linked to the marketing domain.
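The region and SSG status cited above come from the site's Server-Timing response header. A small parser for that header format; the full header value shown is reconstructed from the two fields quoted in this report, not captured verbatim:

```python
def parse_server_timing(value: str) -> dict[str, str]:
    """Parse a Server-Timing header value into {metric: description}."""
    out = {}
    for entry in value.split(","):
        parts = [p.strip() for p in entry.strip().split(";")]
        name, desc = parts[0], ""
        for p in parts[1:]:
            if p.startswith("desc="):
                desc = p[len("desc="):].strip('"')
        out[name] = desc
    return out

# Fields quoted in this report (combined header is illustrative):
header = 'ssg-status;desc="optimized", region;desc="ca-central-1"'
print(parse_server_timing(header))
# → {'ssg-status': 'optimized', 'region': 'ca-central-1'}
```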

6. Key Integrations

Google Workspace (Confirmed)

Gmail, Google Calendar, Google Drive, Google Contacts via OAuth. Explicitly detailed in privacy policy with Google API Limited Use compliance.
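A Google OAuth consent URL covering the four confirmed integrations can be sketched with the standard authorization-code flow. The scope URLs are real Google API scopes, but the exact scopes Lemon requests are unknown, and the client ID and redirect URI are placeholders:

```python
from urllib.parse import urlencode

GOOGLE_AUTH = "https://accounts.google.com/o/oauth2/v2/auth"

# Plausible scopes for the four confirmed Google integrations.
SCOPES = [
    "https://www.googleapis.com/auth/gmail.modify",
    "https://www.googleapis.com/auth/calendar",
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/contacts.readonly",
]

def consent_url(client_id: str, redirect_uri: str) -> str:
    """Build the authorization-code consent URL (arguments are placeholders)."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": " ".join(SCOPES),
        "access_type": "offline",   # request a refresh token
    }
    return f"{GOOGLE_AUTH}?{urlencode(params)}"

url = consent_url("CLIENT_ID", "http://localhost:8080/callback")
print(url.startswith(GOOGLE_AUTH))  # → True
```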

Third-Party AI (Confirmed)

Uses external AI systems for text generation. Specific provider not disclosed (likely OpenAI or Anthropic).

Third-Party Auth (Confirmed)

Uses an external authentication provider for account management and login sessions.

Web Search (Confirmed)

Can perform web searches on behalf of the user. Search provider not specified.

Cloudflare Stream (Confirmed)

Used for marketing site video hosting/delivery.

Other OAuth Services (Likely)

Privacy policy mentions generic "third-party services via OAuth" and access to "messages, emails, social or workspace content" — suggesting possible Slack, Microsoft, or other integrations.

7. Data Flow

┌─────────────────────────────────────────────────────────────┐
│                     USER'S MAC (Apple Silicon)               │
│                                                              │
│  ┌──────────┐   ┌──────────────┐   ┌──────────────────────┐ │
│  │ fn key   │──▶│  Lemon App   │──▶│ Screen Capture       │ │
│  │ pressed  │   │  (native)    │   │ (in-memory only)     │ │
│  └──────────┘   └──────┬───────┘   └──────────┬───────────┘ │
│                         │                       │             │
│  ┌──────────────────────▼───────────────────────▼──────────┐ │
│  │              Audio + Context Packaging                   │ │
│  └──────────────────────┬──────────────────────────────────┘ │
└─────────────────────────┼────────────────────────────────────┘
                          │ (HTTPS)
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                     LEMON BACKEND (Cloud)                    │
│                                                              │
│  ┌──────────┐   ┌──────────────┐   ┌──────────────────────┐ │
│  │ STT      │──▶│ LLM / AI     │──▶│ Action Router        │ │
│  │ Service  │   │ (GPT-4o /    │   │ (execute user intent)│ │
│  │          │   │  Claude?)    │   │                      │ │
│  └──────────┘   └──────────────┘   └──────────┬───────────┘ │
│                                                │             │
│  ┌─────────────────────────────────────────────▼──────────┐ │
│  │  OAuth Broker (Google, others)  │  Web Search Engine   │ │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                     RESPONSE TO CLIENT                       │
│  Generated text, action results, drafts → stored in app     │
│  Voice audio, screenshots, raw 3rd-party data → discarded   │
└─────────────────────────────────────────────────────────────┘
        

The data flow follows a strict privacy-first pattern: raw inputs (voice, screenshots, third-party data) are transient and never persisted. Only user-visible outputs (assistant responses, drafts, task history) are stored. Users can delete all stored data at any time, and account deletion permanently removes everything.
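That retention split (transient raw inputs, persisted outputs) can be expressed as a context manager that wipes raw inputs on exit whether or not processing succeeds. This is purely illustrative of the stated privacy model, not Lemon's implementation:

```python
from contextlib import contextmanager

@contextmanager
def transient_inputs(audio: bytearray, screenshot: bytearray):
    """Yield raw inputs for processing, then zero them out on exit."""
    try:
        yield audio, screenshot
    finally:
        # Wipe in place so no raw input survives the request, on success
        # or failure, mirroring "raw inputs are transient, never persisted".
        for buf in (audio, screenshot):
            buf[:] = b""

audio, shot = bytearray(b"pcm-audio"), bytearray(b"png-bytes")
outputs = []                      # only generated outputs persist
with transient_inputs(audio, shot) as (a, s):
    outputs.append(f"draft based on {len(a)} audio bytes")

print(outputs)                    # the persisted output survives
print(len(audio), len(shot))      # → 0 0
```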

8. Company & Team

  • Legal Entity: Futureproof Lab, Inc. (Delaware corporation)
  • Registered Address: 1111B South Governors Avenue, Dover, DE 19904
  • HQ: New York, NY
  • Parent Brand: Futureproof Labs (futureprooflabs.ai)
  • Founded: 2024
  • Team Size: ~3 employees (per PitchBook)
  • Funding: $1.27M pre-seed from Flybridge
  • Key Person: Hassan W. Bhatti — serial AI entrepreneur (Forbes 30 Under 30, sold CryptoNumerics to Snowflake, co-hosts Superhuman AI newsletter)
  • Known Team (LinkedIn): Touseef Ullah, Ghisuh Na, Sherjeel Syed

9. Business Model

The business model is not fully transparent from publicly available information: no pricing tiers have been published, and no timeline for a paid launch has been announced.

10. What Remains Unknown

  • Exact AI model provider(s) — OpenAI, Anthropic, or others
  • Specific STT service used for voice transcription
  • Backend language/framework (Go, Python, Node.js, etc.)
  • Backend hosting provider (AWS, GCP, or other)
  • Database technology used for storing assistant content and task history
  • Whether the desktop app is fully native (Swift) or hybrid (Electron/Tauri)
  • The identity of the "third-party authentication provider"
  • Web search provider used for the search feature
  • Specific pricing tiers and timeline for paid launch
  • Hassan Bhatti's exact role (founder? advisor? promoter?)

Sources