As a web developer, do you feel completely out of the loop on how to bring AI into your applications? Does the idea of training a computer-vision model sound like magic? Fear not: I'll show you how to add a sprinkle of AI magic to your apps without having to get a PhD in machine learning. We'll dive into Google's MediaPipe, a toolkit that makes adding AI features relatively straightforward.
We’ll get our hands dirty with a React demo, where I will walk you through real-time face and pose detection right in your browser. Hopefully you’ll feel motivated to try it out yourself!
Google MediaPipe is an open-source toolbox that lets you build and deploy machine learning models to chew through all sorts of tasks. In this article we'll focus on computer-vision tasks (pose and face landmark detection), but you can use it for plenty of other things as well (classification, gesture recognition, image generation, etc.). While MediaPipe is a vast playground for building custom AI pipelines, we're zeroing in on MediaPipe Solutions for this series.
MediaPipe Solutions are like pre-assembled Lego sets: customizable, ready-made models for common tasks—think detecting faces, tracking hands, estimating poses, and spotting objects. They package up the brain-hurting parts of machine learning, giving you a head start on integrating AI into your projects.
@mediapipe/tasks-vision
First we need to install the MediaPipe npm package for pose and face landmark detection. As a bonus, it ships with TypeScript type definitions out of the box:
npm i @mediapipe/tasks-vision
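To give you a taste of what the setup looks like before we wire it into React, here's a minimal sketch of initializing a FaceLandmarker for video input (the PoseLandmarker API follows the same pattern). The CDN and model URLs below are the Google-hosted paths from the MediaPipe documentation; you can swap them for self-hosted copies.

```ts
import { FilesetResolver, FaceLandmarker } from "@mediapipe/tasks-vision";

// Load the WASM runtime that powers the vision tasks (served here from a CDN).
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm"
);

// Create a face landmarker configured to run on video frames.
const faceLandmarker = await FaceLandmarker.createFromOptions(vision, {
  baseOptions: {
    // Google-hosted model asset; you can also self-host the .task file.
    modelAssetPath:
      "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task",
    delegate: "GPU",
  },
  runningMode: "VIDEO",
  numFaces: 1,
});

// In a render loop, feed video frames plus a timestamp to get landmarks.
const videoEl = document.querySelector("video")!;
const results = faceLandmarker.detectForVideo(videoEl, performance.now());
```

Each call returns a result object whose landmarks use normalized (0–1) coordinates, which makes them easy to scale and draw onto a canvas overlay—exactly what we'll do in the React demo.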