Wearable, on-device, bilingual

HearSense

Sound, made visible.

Live captions, severity-tiered alerts, and wrist haptics. All inference runs locally on the device.

  • UN SDG 10
  • WCAG 2.2 AA
  • On-device by default

The gap

Sound is information. Most of it never reaches deaf people.

1.5B

people live with hearing loss worldwide (WHO, 2024).

Hearing aids amplify. They don't translate.

Pipeline

Three steps, one continuous stream.

1

Listen

Mic array. 16 kHz stream. VAD-gated.

2

Understand

ASR, sound classes, prosody, reply intents.

3

Surface

Captions, alerts, haptic patterns.
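Below is a minimal sketch of the VAD gate in the Listen step. It assumes the detector (Silero in this stack) yields one speech probability per 32 ms frame. The threshold, the hangover length, and the name `gate_speech` are illustrative assumptions, not HearSense's actual settings. The hangover keeps a segment open through short pauses so that word endings are not clipped.

```python
from typing import List, Optional, Sequence, Tuple

def gate_speech(
    probs: Sequence[float],
    threshold: float = 0.5,
    hangover_frames: int = 8,
) -> List[Tuple[int, int]]:
    """Turn per-frame speech probabilities into (start, end) frame ranges, end exclusive.

    Hypothetical helper: a segment opens on the first frame at or above
    `threshold` and closes once more than `hangover_frames` consecutive
    frames fall below it.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first speech frame of the open segment
    silent_run = 0               # consecutive below-threshold frames inside it
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run > hangover_frames:
                # Close just after the last speech frame, not after the hangover.
                segments.append((start, i - silent_run + 1))
                start, silent_run = None, 0
    if start is not None:
        # Stream ended mid-segment: close it after the last speech frame.
        segments.append((start, len(probs) - silent_run))
    return segments
```

Only frames inside these ranges would be passed on to ASR. With 32 ms frames, a result of `(1, 5)` covers 32 ms to 160 ms of audio.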

What it does

Six things, end-to-end.

Live captions

Real-time speech-to-text in EN and PL.

Sound alerts

527 classes across three severity tiers.

Critical · Notify · Ambient
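Here is a sketch of one way the classifier's outputs could be folded into the three tiers. It rests on three assumptions: the classifier returns an independent score per AudioSet label, only a hand-picked subset of labels maps to a tier, and the most urgent tier above a threshold wins. The mapping, the threshold, and `classify_event` are hypothetical. The label strings are AudioSet class names.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tier assignment for a few AudioSet labels; the rest are ignored.
TIER_OF_LABEL: Dict[str, str] = {
    "Smoke detector, smoke alarm": "Critical",
    "Fire alarm": "Critical",
    "Siren": "Critical",
    "Doorbell": "Notify",
    "Baby cry, infant cry": "Notify",
    "Telephone bell ringing": "Notify",
    "Dog": "Ambient",
    "Rain": "Ambient",
}
TIER_RANK = {"Critical": 0, "Notify": 1, "Ambient": 2}  # lower = more urgent

def classify_event(scores: Dict[str, float], threshold: float = 0.3) -> Optional[Tuple[str, str]]:
    """Return (tier, label) for the most urgent mapped label above threshold, or None."""
    best = None
    for label, score in scores.items():
        tier = TIER_OF_LABEL.get(label)
        if tier is None or score < threshold:
            continue
        # Sort by urgency first, then by confidence within a tier.
        key = (TIER_RANK[tier], -score)
        if best is None or key < best[0]:
            best = (key, tier, label)
    return None if best is None else (best[1], best[2])
```

Ranking by tier before confidence means a moderately confident siren still outranks a very confident doorbell.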

Smart replies

Three tap-to-speak suggestions per turn.
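The stack lists Model2Vec for retrieval only, which means replies are selected from a fixed set rather than generated. The minimal sketch below assumes that the latest utterance and each reply intent have already been embedded. Toy 3-dimensional vectors stand in for real Model2Vec embeddings. The sketch ranks intents by cosine similarity and keeps the top three. `top_replies` and the intent names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_replies(query: Sequence[float], intents: Dict[str, Sequence[float]], k: int = 3) -> List[str]:
    """Return the k intent names whose embeddings are most similar to the query."""
    ranked = sorted(intents, key=lambda name: cosine(query, intents[name]), reverse=True)
    return ranked[:k]

# Toy embeddings for illustration, not real model output.
INTENTS: Dict[str, List[float]] = {
    "yes": [1.0, 0.0, 0.0],
    "no": [0.0, 1.0, 0.0],
    "please repeat": [0.0, 0.0, 1.0],
    "thanks": [0.7, 0.7, 0.0],
}
```

With 10 intents, the ranking step costs only ten short dot products per turn. Embedding the utterance itself is a separate cost that this sketch does not model.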

Name detection

Hears your name in any inflection.
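One reading of "any inflection" is grammatical. Polish names change form by case (Ania, Ani, Anię, Anią, Aniu), so matching only the base form would miss many mentions. The minimal sketch below takes that approach. It assumes the ASR transcript is available as text and that the user's name forms are listed up front. `NAME_FORMS` and `mentions_name` are hypothetical. Intonation and ASR misspellings are not handled.

```python
import re
import unicodedata
from typing import AbstractSet

# Hypothetical user "Ania": nominative, genitive/dative/locative,
# accusative, instrumental and vocative forms.
NAME_FORMS = frozenset({"ania", "ani", "anię", "anią", "aniu"})

def mentions_name(transcript: str, forms: AbstractSet[str] = NAME_FORMS) -> bool:
    """True if any whole word in the transcript is one of the name forms."""
    # NFC keeps "ę"/"ą" as single code points; casefold handles "ANIA".
    text = unicodedata.normalize("NFC", transcript).casefold()
    # \w matches Unicode letters, so Polish diacritics stay inside words.
    words = re.findall(r"\w+", text)
    return any(word in forms for word in words)
```

Whole-word matching keeps a longer name such as "Aniela" from triggering an alert meant for "Ania".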

Laughter sense

Knows when a room is laughing.
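AudioSet includes a "Laughter" class, so one plausible design is a smoothed score from the sound classifier. The minimal sketch below assumes one Laughter score per classifier hop. It averages the last few scores and switches on and off at separate thresholds (hysteresis), so a single chuckle or a brief lull does not flip the state. The window size, the thresholds, and `LaughterMeter` are hypothetical.

```python
from collections import deque
from typing import Deque

class LaughterMeter:
    """Hysteresis over a moving average of per-hop "Laughter" scores."""

    def __init__(self, window: int = 5, on_threshold: float = 0.5, off_threshold: float = 0.2) -> None:
        if off_threshold > on_threshold:
            raise ValueError("off_threshold must not exceed on_threshold")
        self.scores: Deque[float] = deque(maxlen=window)
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.laughing = False

    def update(self, score: float) -> bool:
        """Add one score and return whether the room currently counts as laughing."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return self.laughing  # wait for a full window before deciding
        mean = sum(self.scores) / len(self.scores)
        if not self.laughing and mean >= self.on_threshold:
            self.laughing = True
        elif self.laughing and mean < self.off_threshold:
            self.laughing = False
        return self.laughing
```

Because the off threshold sits below the on threshold, the state holds through a short quiet gap and clears only once the average has clearly dropped.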

Bilingual ASR

One model, two languages, auto-detect.

The stack

Open models, measured end-to-end.

Every model in the pipeline is open-source or replicable from a published paper.

ASR: NVIDIA Parakeet TDT 0.6B
Sound classes: EfficientAT mn10_as · 527 AudioSet classes
VAD: Silero
Prosody: pYIN / CREPE
Smart reply: Model2Vec (retrieval only)
Runtime: ONNX Runtime · CPU · aarch64
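pYIN and CREPE both estimate the fundamental frequency (F0) that prosody features are built on. The sketch below substitutes a much simpler technique, plain autocorrelation on one frame, only to illustrate what an F0 estimate is. It is neither pYIN (which adds probabilistic thresholds and HMM smoothing) nor CREPE (a neural network). `estimate_f0` and its 60 to 400 Hz search range are assumptions.

```python
import math
from typing import Optional, Sequence

def estimate_f0(frame: Sequence[float], sample_rate: int = 16000,
                fmin: float = 60.0, fmax: float = 400.0) -> Optional[float]:
    """Return an autocorrelation F0 estimate in Hz, or None for silence."""
    if not frame:
        return None
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]          # remove DC offset
    if not any(x):
        return None                        # all-zero frame: no pitch
    min_lag = int(sample_rate / fmax)      # shortest period searched
    max_lag = min(int(sample_rate / fmin), len(x) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else sample_rate / best_lag
```

Raw autocorrelation like this tends to make octave errors on real speech. That weakness is one likely reason to prefer pYIN or CREPE.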

Hardware

Three assemblies, wirelessly linked.

All ML runs locally. No cloud, ever.

Compute

Raspberry Pi CM5 (8 GB), quad-core Cortex-A76. Active cooling for continuous inference.

Neckband

MEMS mic array over I2S into an RP2040 acting as a USB Audio Class bridge. Captured at 32 kHz, decimated to 16 kHz in software.
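Decimating from 32 kHz to 16 kHz means low-pass filtering below the new 8 kHz Nyquist limit, then keeping every second sample. Skipping the filter would fold energy above 8 kHz back into the speech band. The minimal sketch below uses a 31-tap Hamming-windowed sinc filter cut off near 7 kHz. The tap count, the cutoff, and the function names are assumptions, not the HearSense implementation.

```python
import math
from typing import List, Sequence

def lowpass_taps(num_taps: int = 31, cutoff: float = 0.22) -> List[float]:
    """Hamming-windowed sinc low-pass FIR.

    `cutoff` is a fraction of the input rate: 0.22 * 32 kHz is about 7 kHz,
    leaving a transition band below the 8 kHz output Nyquist frequency.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # unity gain at DC

def decimate_by_2(samples: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Filter (centered, zero-padded at the edges) and keep every second sample."""
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):  # compute only the samples we keep
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + half - j
            if 0 <= idx < len(samples):
                acc += t * samples[idx]
        out.append(acc)
    return out
```

In a production pipeline, `scipy.signal.resample_poly(x, 1, 2)` or `scipy.signal.decimate(x, 2)` does the same job with a better-designed filter.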

Haptic wristband

ESP32-S3 over BLE. Four LRAs driven via DRV2605L. Five severity patterns, with a target latency under 150 ms.
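As described in TI's DRV2605L datasheet, each driver can play a short pattern from an eight-slot waveform sequencer (registers 0x04 to 0x0B). Each slot holds one of two things: an effect ID from the chip's built-in library (1 to 123), or, with bit 7 set, a pause of N × 10 ms. A zero slot ends the sequence. The minimal sketch below encodes one pattern into those register writes. The step format and `encode_pattern` are hypothetical. The I2C transfer and the GO trigger are left out.

```python
from typing import List, Sequence, Tuple

WAVESEQ1 = 0x04   # first waveform-sequencer register; slots run 0x04 to 0x0B
MAX_SLOTS = 8

def encode_pattern(steps: Sequence[Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Encode ("effect", id) / ("wait", ms) steps as (register, value) writes."""
    values: List[int] = []
    for kind, arg in steps:
        if kind == "effect":
            if not 1 <= arg <= 123:
                raise ValueError(f"effect id out of range: {arg}")
            values.append(arg)
        elif kind == "wait":
            units = round(arg / 10)            # wait slots count in 10 ms units
            if not 1 <= units <= 127:
                raise ValueError(f"wait out of range: {arg} ms")
            values.append(0x80 | units)        # bit 7 marks a wait slot
        else:
            raise ValueError(f"unknown step kind: {kind}")
    if len(values) > MAX_SLOTS:
        raise ValueError("pattern needs more than 8 slots")
    if len(values) < MAX_SLOTS:
        values.append(0)                       # a zero slot ends playback
    return [(WAVESEQ1 + i, v) for i, v in enumerate(values)]
```

For instance, two clicks 50 ms apart take four slots, including the terminator. Whether the 150 ms target holds also depends on the BLE hop and on effect lengths, which this sketch does not model.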

AR glasses (optional)

INMO Air3 over 5 GHz Wi-Fi. Captions appear in the field of view; the system also works without the glasses.

Power & weight

18650 lithium-ion cell. ≈211 g without glasses. About 1 to 1.3 h in demo mode.
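Battery runtime follows from simple energy arithmetic: cell energy (capacity × nominal voltage), multiplied by converter efficiency, divided by average system power. The sketch below uses made-up inputs purely for illustration. The 3000 mAh capacity, 3.6 V nominal voltage, 85% efficiency, and 8 W load are all assumptions, not measured HearSense values.

```python
def runtime_hours(capacity_mah: float, nominal_voltage_v: float,
                  efficiency: float, average_load_w: float) -> float:
    """Estimated runtime in hours for one cell feeding a regulated load."""
    if average_load_w <= 0:
        raise ValueError("average_load_w must be positive")
    energy_wh = capacity_mah / 1000 * nominal_voltage_v   # stored energy in Wh
    return energy_wh * efficiency / average_load_w        # delivered energy / power

# Illustrative inputs only, not measurements.
estimate = runtime_hours(3000, 3.6, 0.85, 8.0)
```

With these inputs the estimate comes to about 1.15 h. Halving the average load would roughly double it.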

Local, private

No cloud connection for the core features. Audio never leaves the device.

Numbers

Measured where it matters.

Bench numbers from a fixed reference-audio suite. Full user testing is the next stage.

  • ≈ 300 ms perceived caption latency
  • 527 sound classes
  • 10 smart-reply intents
  • Sub-ms reply retrieval

Our commitment

Designed honestly, from day one.

These are commitments and intentions. Where work hasn't happened yet, we say so plainly.

01

Translation, not a cure

We don't claim to fix deafness. HearSense translates sound into sight and touch.

02

Co-design intent

User testing with deaf and hard-of-hearing participants comes before we finalize the technology, not after.

03

Audio stays local

All inference on-device. No cloud, no audio leaves the body.

04

Accessibility by default

WCAG 2.2 AA across the device and this site, from the start.

Impact

One channel of the world, made accessible.

SDG 10

Reduced inequalities

Closing the information gap between hearing and deaf participants in the same room.

  • Works without a phone
  • Bilingual from day one
  • Open hardware, open models
  • WCAG 2.2 AA baseline