Camera AI Without the Cloud: Vivotek VCA to Push Notification in 2 Seconds
Every camera company wants you on their cloud. Monthly subscription, your footage on their servers, latency measured in "whenever they feel like it." Arlo, Ring, Nest — same business model with different logos.
I run 4 cameras with on-device person detection, push notifications with snapshots to my phone, and zero cloud dependency. Total subscription cost: $0/month. Latency from person detected to notification on my phone: under 2 seconds.
Here's exactly how it works.
The Architecture
The pipeline has five stages. Each one is dead simple on its own:
Vivotek VCA detects a person on-camera (edge AI, no server needed)
Camera sends an HTTP webhook to Home Assistant
HA automation flips an input_boolean to ON
A timer starts for cooldown (prevents notification spam)
A separate automation watches the boolean, grabs a snapshot, sends a push notification
That's it. No Frigate. No cloud API. No image processing service running on your server. The AI runs on the camera's own chip.
Why Not HA's Built-in Image Processing?
Home Assistant has an image processing integration. You can point it at a camera feed and run TensorFlow or OpenCV on each frame. I tried it. Here's why I stopped:
CPU load: Processing 4 camera feeds at any useful frame rate pinned my Pi at 100%. HA became sluggish. Automations delayed.
Latency: By the time HA pulled a frame, processed it, and decided "that's a person," the person had walked past. 5-8 second delays were common.
False positives: Without a well-tuned model, every shadow, cat, and tree branch was a person. Notification fatigue killed it within a day.
The Vivotek FD9389 cameras have video content analysis (VCA) built into their firmware. The camera does the AI work. It's been trained on a real dataset, it runs at full frame rate, and it outputs a simple event: "person detected in zone." All HA has to do is receive a webhook and react — trivial work for even a Pi.
Setting Up the Camera Side
In the Vivotek web UI (http://camera-ip), go to Configuration → Applications. Don't use the Motion Detection tab — open the VCA tab instead. Enable "Intrusion Detection" or "Loitering Detection" with the object type set to "Human."
Draw your detection zone on the camera's preview image. Be specific — don't cover the entire frame. I exclude areas with trees, flags, anything that moves in wind. Tight zones = fewer false detections.
Under Event → Event Server, configure an HTTP notification target. Point it at your HA webhook URL:
http://your-ha-ip:8123/api/webhook/camera_person_CAMERA_NAME
One critical gotcha: Vivotek's testserver.cgi does NOT send real HTTP POSTs. I spent an entire session trying to figure out why my test webhooks weren't arriving in HA. The test button in the camera UI uses an internal test mechanism, not the actual HTTP POST that real VCA events use. You have to actually trigger the camera — walk in front of it — to test the webhook pipeline. Don't debug a working system because the test button lies to you.
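Because the test button doesn't send a real POST, the reliable way to verify the HA side in isolation is to simulate the camera's webhook yourself from any machine on the LAN. A sketch, assuming a hypothetical HA address and a webhook ID of camera_person_front_door:

```shell
# Simulate the VCA event the camera would send.
# HA webhooks accept POST and (by default) only local-network requests.
curl -X POST http://192.168.1.10:8123/api/webhook/camera_person_front_door
```

If your HA automation fires on this, the receiving side works, and any remaining problem is on the camera's event-server configuration.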
The HA Side
I use the sensor-boolean-timer pattern (more on this in the next issue — it's the most important architecture pattern in my entire system).
For each camera, I have:
input_boolean.camera_ai_CAMERA_NAME — represents "person detected at this camera"
timer.camera_ai_CAMERA_NAME_cooldown — 60-second cooldown to prevent spam
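The helper definitions are two short YAML blocks per camera. A sketch for one camera, using "front_door" as a placeholder name:

```yaml
# Helpers for one camera. "front_door" is a placeholder; repeat per camera.
input_boolean:
  camera_ai_front_door:
    name: "Person detected: front door"

timer:
  camera_ai_front_door_cooldown:
    duration: "00:01:00"  # 60-second cooldown between notifications
```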
Automation 1 (webhook receiver): When the webhook fires, check if the cooldown timer is idle. If yes, turn on the boolean and start the timer. If the timer is already running, ignore — we already notified.
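A minimal sketch of the receiver, assuming the placeholder entity names above (exact keys may vary slightly across HA versions):

```yaml
automation:
  - alias: "Front door - VCA webhook receiver"
    trigger:
      - platform: webhook
        webhook_id: camera_person_front_door
    condition:
      # Only act if the cooldown timer is not running
      - condition: state
        entity_id: timer.camera_ai_front_door_cooldown
        state: "idle"
    action:
      - service: input_boolean.turn_on
        target:
          entity_id: input_boolean.camera_ai_front_door
      - service: timer.start
        target:
          entity_id: timer.camera_ai_front_door_cooldown
```

If the timer is running, the condition fails and the webhook is silently dropped — that's the whole cooldown mechanism.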
Automation 2 (notification sender): When any camera boolean turns ON, grab a snapshot from that camera's still image URL, send a push notification with the image attached, then turn the boolean OFF.
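A sketch of the sender, again with placeholder entity and notify-service names (the snapshot path must be under a directory HA is allowed to write to, e.g. /config/www):

```yaml
automation:
  - alias: "Camera person notification"
    trigger:
      - platform: state
        entity_id: input_boolean.camera_ai_front_door
        to: "on"
    action:
      # Grab a still frame from the camera that fired
      - service: camera.snapshot
        target:
          entity_id: camera.front_door
        data:
          filename: "/config/www/snapshots/front_door.jpg"
      # Push it to the phone; /local/ maps to /config/www/
      - service: notify.mobile_app_my_iphone
        data:
          message: "Person detected at front door"
          data:
            image: "/local/snapshots/front_door.jpg"
      # Reset the boolean so the next detection can re-trigger
      - service: input_boolean.turn_off
        target:
          entity_id: input_boolean.camera_ai_front_door
```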
The still image URL for Vivotek cameras is: http://camera-ip/cgi-bin/viewer/video.jpg with Digest authentication. Set this as your generic camera's still_image_url in HA. The snapshot arrives in the notification instantly — no stream decoding, no RTSP overhead.
Why the Boolean Matters
You might ask: why not just send the notification directly from the webhook automation? Why the boolean intermediary?
Because the boolean is a semantic event that other automations can watch. Today it triggers a push notification. Tomorrow it might also flash a light, announce on speakers, or record a clip. I add those responses without touching the webhook automation. The detection side and the response side are completely decoupled.
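Adding a new response means adding a new automation that watches the same boolean, nothing more. A hypothetical second consumer:

```yaml
# Independent consumer of the same semantic event.
# The webhook automation doesn't know or care this exists.
automation:
  - alias: "Flash porch light on person detection"
    trigger:
      - platform: state
        entity_id: input_boolean.camera_ai_front_door
        to: "on"
    action:
      - service: light.turn_on
        target:
          entity_id: light.porch
        data:
          flash: short
```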
I have 4 cameras running this exact pattern. Adding a 5th camera takes about 3 minutes: create the boolean, create the timer, add one trigger to each automation, done.
Performance
From person entering the detection zone to notification appearing on my iPhone: consistently under 2 seconds. The bottleneck is Apple's push notification delivery, not anything in my pipeline. The webhook arrives in HA within 200ms of the VCA event. HA processes it in under 50ms. The snapshot fetch takes maybe 300ms. The rest is APNS latency.
CPU impact on my Pi: essentially zero. HA isn't doing any image processing. It receives a webhook, flips a boolean, fetches one JPEG, and sends a notification. That's less work than a light automation.
Quick Tips
Enable Smart IR and WDR Pro: On Vivotek cameras, these dramatically improve night detection. Smart IR adjusts intensity to avoid blowing out close objects. WDR Pro handles mixed lighting (porch light + dark yard). Better image = better VCA accuracy.
Set DNR to 60%: Digital Noise Reduction at 60% is the sweet spot — cleans up the feed enough for good VCA without smearing detail. At 100% it over-smooths and the VCA starts missing detections.
Use sub-streams for HA entities: Point your HA generic camera at the sub-stream (e.g., live1s2.sdp) for the live view, but use the full-resolution still image URL for snapshots. Live view doesn't need 5MP. Your notification snapshot does.
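Put together, one generic camera entity carries both: the sub-stream for live view and the full-resolution JPEG for snapshots. A sketch with a hypothetical camera IP (recent HA versions configure the generic camera through the UI, but the fields are the same):

```yaml
camera:
  - platform: generic
    name: front_door
    # Low-resolution sub-stream for the dashboard live view
    stream_source: "rtsp://192.168.1.50:554/live1s2.sdp"
    # Full-resolution JPEG for notification snapshots
    still_image_url: "http://192.168.1.50/cgi-bin/viewer/video.jpg"
    authentication: digest
    username: !secret vivotek_user
    password: !secret vivotek_pass
```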
Next Issue
The architecture pattern I keep referencing — Sensor, Boolean, Timer, Action. It's the single design decision that makes everything else in my system work. I'll break down exactly why decoupling sensors from actions matters, and how package files organized by domain (not room) keep 70+ automations manageable.
— The Automated Home