Building a Jarvis Voice System for Your Smart Home

Making Your House Talk: Piper TTS with a Custom Jarvis Voice

I wanted my house to sound like Jarvis. Not Alexa, not Google, not Siri — Jarvis. The AI butler voice from Iron Man. Calm, competent, slightly dry. The kind of voice that says "the garage has been open for ten minutes" and you actually want to hear it.

I got it running in one session. Fully local, zero subscription, and it sounds better than the cloud service I was paying $5/month for.

Why Not ElevenLabs?

I tried ElevenLabs first. It has a Home Assistant integration, the voice quality is excellent, and for $5/month you get enough characters for a smart home. I had it working. Here's why I stopped:

Latency: Every TTS request goes to ElevenLabs' API, gets synthesized, and comes back as audio. Round trip: 1-3 seconds depending on their server load. For a voice that's supposed to announce "alarm armed" right after you tap the button, that delay is noticeable and annoying.
Cloud dependency: Internet goes down, your house goes mute. The one time you really want a voice announcement — alarm triggered at 3 AM — is exactly when you don't want to depend on an API being reachable.
Cost at scale: $5/month is cheap. But it's $60/year for something a local model does for free. And once you start adding more announcements (garage open too long, person detected, welcome home, goodnight), you burn through the character allowance faster than you'd expect.

Piper TTS runs entirely on the Pi. No internet. No API. No subscription. Synthesis time: under 200ms for a typical announcement. And with the right voice model, it sounds great.

Installing Piper TTS

In Home Assistant, go to Settings → Add-ons → Add-on Store. Search for "Piper." Install the official Piper add-on. Start it. That's the base TTS engine.

It comes with default voices that sound like a GPS navigator from 2008. Don't judge Piper by these. The quality depends entirely on the model.

The Jarvis Voice Model

The model that makes this work: jgkawell/jarvis on HuggingFace. It's a community-trained Piper voice model that sounds remarkably close to the MCU Jarvis voice. Not perfect, but good enough that visitors to my house have commented on it unprompted.

Download jarvis-high.onnx and its config file, jarvis-high.onnx.json, from HuggingFace. Piper expects the config to sit next to the model under that exact name. Place both in /share/piper/ on your HA instance (accessible via Samba or SSH).
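A quick sanity check saves a confusing "voice not found" later. Below is a minimal sketch that assumes Piper's usual pairing convention (<name>.onnx next to <name>.onnx.json). The helper name missing_piper_files is my own, not part of Piper or Home Assistant.

```python
from pathlib import Path
from typing import List


def missing_piper_files(voice_dir: Path, voice_name: str) -> List[str]:
    """Return the files a Piper voice needs that are not present in voice_dir.

    Assumes the model <name>.onnx sits next to its config <name>.onnx.json.
    """
    required = [f"{voice_name}.onnx", f"{voice_name}.onnx.json"]
    return [name for name in required if not (voice_dir / name).is_file()]


if __name__ == "__main__":
    problems = missing_piper_files(Path("/share/piper"), "jarvis-high")
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("jarvis-high is ready for Piper")
```

Run it over SSH on the HA host. If it lists the .onnx.json file, the download most likely saved the config under a different name.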

In the Piper add-on configuration, you don't need to change anything: the voice is selected per service call. In your automations, call tts.speak with the tts.piper entity as the target and the option voice: jarvis-high. That's it.
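The same call can be made from outside HA through the REST API, which is handy for scripting voice tests. This is a sketch, not a finished client. It assumes a long-lived access token, the tts.piper entity ID from above, and a hypothetical media_player.living_room speaker. It only builds the request, so nothing is sent until you pass it to urlopen.

```python
import json
import urllib.request


def build_speak_request(base_url: str, token: str, media_player: str,
                        message: str, voice: str = "jarvis-high") -> urllib.request.Request:
    """Build a POST to Home Assistant's tts.speak action using the Piper entity."""
    payload = {
        "entity_id": "tts.piper",                # the Piper TTS entity
        "media_player_entity_id": media_player,  # where the audio plays
        "message": message,
        "options": {"voice": voice},             # picks the Jarvis model per call
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/tts/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Usage, with your own host and token: urllib.request.urlopen(build_speak_request("http://homeassistant.local:8123", TOKEN, "media_player.living_room", "Alarm armed.")).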

The Centralized Announcement Script

Don't scatter tts.speak calls across every automation. Create a centralized script that handles all announcement logic:

script.announce takes three inputs:

message: What to say
priority: normal, important, or critical
target: (optional) Override the default speaker routing
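In HA these inputs are script fields, but the contract is easy to state in code. A minimal Python sketch of that contract follows; the default of "normal" for priority is my assumption, since the script's actual defaults aren't specified here.

```python
from dataclasses import dataclass
from typing import Optional

PRIORITIES = ("normal", "important", "critical")


@dataclass(frozen=True)
class Announcement:
    message: str                  # what to say
    priority: str = "normal"      # assumed default
    target: Optional[str] = None  # optional speaker override

    def __post_init__(self) -> None:
        # Reject bad input early so a typo never silently drops an announcement.
        if not self.message.strip():
            raise ValueError("message must not be empty")
        if self.priority not in PRIORITIES:
            raise ValueError(f"priority must be one of {PRIORITIES}, got {self.priority!r}")
```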

The script handles routing logic internally:

Normal priority: Living room speaker during the day. Bedroom HomePod Mini in the evening. Nothing at all during quiet hours (11 PM to 7 AM).
Important priority: All common-area speakers. Still silent in bedrooms at night unless occupancy detected.
Critical priority: Every speaker in the house, maximum volume, overrides quiet hours. Reserved for alarm events only.
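The routing rules above boil down to one small decision function. Here is a sketch under several assumptions: the entity IDs are hypothetical, "evening" is assumed to start at 8 PM, and the important-priority bedroom rule is read as "bedroom included outside quiet hours, and during quiet hours only if occupancy is detected." The target override is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical media_player entity IDs: substitute your own.
LIVING_ROOM = "media_player.living_room"
KITCHEN = "media_player.kitchen"
BEDROOM = "media_player.bedroom_homepod_mini"
COMMON_AREAS = (LIVING_ROOM, KITCHEN)
ALL_SPEAKERS = COMMON_AREAS + (BEDROOM,)

QUIET_START, QUIET_END = 23, 7  # 11 PM to 7 AM, wrapping past midnight
EVENING_START = 20              # assumed hour where "day" becomes "evening"


@dataclass(frozen=True)
class Route:
    speakers: Tuple[str, ...]
    volume: Optional[float]  # None = leave the speaker's current volume alone


def in_quiet_hours(hour: int) -> bool:
    # The window crosses midnight, so it is an OR, not a simple range check.
    return hour >= QUIET_START or hour < QUIET_END


def route(priority: str, hour: int, bedroom_occupied: bool = False) -> Route:
    if priority == "critical":
        # Every speaker, full volume, quiet hours ignored.
        return Route(ALL_SPEAKERS, 1.0)
    quiet = in_quiet_hours(hour)
    if priority == "important":
        speakers = COMMON_AREAS
        # Bedroom joins outside quiet hours, or at night only if occupied.
        if not quiet or bedroom_occupied:
            speakers = speakers + (BEDROOM,)
        return Route(speakers, None)
    if priority == "normal":
        if quiet:
            return Route((), None)
        return Route((BEDROOM,) if hour >= EVENING_START else (LIVING_ROOM,), None)
    raise ValueError(f"unknown priority: {priority!r}")
```

The midnight wrap is the part that is easy to get wrong: a check like 23 <= hour < 7 is never true.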

The 9 Launch Automations

I deployed 9 voice automations on day one, all in a single package file (jarvis_tts.yaml):

Alarm armed: "Home security armed, [mode]."
Alarm disarmed: "Security disarmed. Welcome home."
Welcome home: "Welcome home, [name]." Only fires if the house was empty.
Alarm triggered: Critical priority. "SECURITY ALERT. [zone name] has been triggered." Every speaker, full volume.
Person detected (camera): "Person detected, [camera location]." Watches camera AI booleans.
Garage open too long: "The garage has been open for [duration]." Fires at 10 and 20 minutes.
Goodnight: "Goodnight. All doors locked, garage closed, alarm armed."
Door left open: "The [door name] has been open for 5 minutes."
System announcement: Generic template for HA restart notifications, backup completion, integration errors.
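Because every automation only supplies a message and a priority, the whole list reduces to a table of templates. The sketch below is illustrative: the event keys are made-up names, not HA entity IDs, and every priority other than the alarm trigger's "critical" is assumed to be "normal."

```python
from typing import Dict, Tuple

# Hypothetical catalogue: event name -> (message template, priority).
ANNOUNCEMENTS: Dict[str, Tuple[str, str]] = {
    "alarm_armed":     ("Home security armed, {mode}.", "normal"),
    "alarm_disarmed":  ("Security disarmed. Welcome home.", "normal"),
    "welcome_home":    ("Welcome home, {name}.", "normal"),
    "alarm_triggered": ("SECURITY ALERT. {zone} has been triggered.", "critical"),
    "person_detected": ("Person detected, {camera}.", "normal"),
    "garage_open":     ("The garage has been open for {duration}.", "normal"),
    "goodnight":       ("Goodnight. All doors locked, garage closed, alarm armed.", "normal"),
    "door_left_open":  ("The {door} has been open for 5 minutes.", "normal"),
    "system":          ("{text}", "normal"),
}


def render(event: str, **fields: str) -> Tuple[str, str]:
    """Fill the template for an event and return (message, priority)."""
    template, priority = ANNOUNCEMENTS[event]
    return template.format(**fields), priority
```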

Every one of these calls script.announce with a message and priority. None of them manage speaker routing or volume.
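What an automation keeps is just its trigger condition. Here is a sketch of two of them, the garage reminders at 10 and 20 minutes and the empty-house check for welcome home, with a plain callable standing in for script.announce. The function names and the "people home before arrival" count are hypothetical.

```python
from typing import Callable, Set

Announce = Callable[[str, str], None]  # (message, priority); stands in for script.announce

GARAGE_REMINDERS_MIN = (10, 20)


def garage_check(minutes_open: int, already_sent: Set[int], announce: Announce) -> None:
    """Announce every crossed threshold that has not been announced yet."""
    for threshold in GARAGE_REMINDERS_MIN:
        if minutes_open >= threshold and threshold not in already_sent:
            already_sent.add(threshold)
            announce(f"The garage has been open for {threshold} minutes.", "normal")


def welcome_home(name: str, people_home_before: int, announce: Announce) -> None:
    """Greet only when the house was empty before this person arrived."""
    if people_home_before == 0:
        announce(f"Welcome home, {name}.", "normal")
```

Neither function knows which speaker plays the message or how loud. That is entirely the announce script's job.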

Why This Pattern Scales

Notice how the voice automations follow the same sensor-boolean-timer pattern from last issue. The camera person detection doesn't know it triggers a voice announcement — it just flips a boolean. The garage timer doesn't know it triggers speech — it just fires a state change. The voice automations watch those same events independently.

I added 9 voice behaviors to my house without modifying a single existing automation. That's the payoff of the architecture.
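The decoupling is the classic publish/subscribe idea. Below is a toy model of it, not Home Assistant's real event bus: StateBus, the entity ID, and both handlers are hypothetical. The camera side only sets a boolean, and the voice side subscribes to that boolean without the camera code changing at all.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Listener = Callable[[str, object], None]


class StateBus:
    """Toy stand-in for a state machine: only a real change notifies listeners."""

    def __init__(self) -> None:
        self._states: Dict[str, object] = {}
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def listen(self, entity_id: str, callback: Listener) -> None:
        self._listeners[entity_id].append(callback)

    def set(self, entity_id: str, value: object) -> None:
        if self._states.get(entity_id) == value:
            return  # same value: not a state change, nobody is notified
        self._states[entity_id] = value
        for callback in self._listeners[entity_id]:
            callback(entity_id, value)


bus = StateBus()
spoken: List[str] = []


# Existing automation: the camera AI only flips a boolean. It knows nothing about voice.
def on_camera_person(bus: StateBus) -> None:
    bus.set("input_boolean.driveway_person", True)


# Voice automation added later: it subscribes to the same boolean.
def speak_person(entity_id: str, value: object) -> None:
    if value:
        spoken.append("Person detected, driveway.")


bus.listen("input_boolean.driveway_person", speak_person)
on_camera_person(bus)
```

Adding a tenth voice behavior means one more listen() call, and on_camera_person never changes.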

If you want the full Jarvis voice setup as a ready-to-install package — the voice model files, the centralized announcement script with priority routing and night mode, and all 9 automations pre-configured — the Jarvis Voice Pack gets you from zero to talking house in about 15 minutes.

Quick Tips

Use the high quality model: Piper models come in low, medium, and high quality. The high model (jarvis-high.onnx) sounds natural. The file is larger (~60MB) but synthesis is still under 200ms.
Test with Developer Tools: Call tts.speak directly from HA's Developer Tools with your message and voice option. Iterate on the message text until it sounds right.
Cancel ElevenLabs if you have it: If you installed ElevenLabs to try it, don't forget to cancel the subscription. Remove the integration entirely when you switch to Piper.

Next Issue

UPB lighting — the forgotten protocol that runs 54 lights in my house without WiFi, without Zigbee, and without a hub that needs a cloud connection. How powerline automation from 2005 became the most reliable part of my smart home.

— The Automated Home