As a web developer, do you feel completely out of the loop on how to bring AI into your applications? Does the idea of training a computer-vision model sound like magic? Fear not: I'll show you how to add a sprinkle of AI magic to your apps without having to get a PhD in machine learning. We'll dive into Google's MediaPipe, a toolkit that makes adding AI features relatively straightforward.
We’ll get our hands dirty with a React demo, where I will walk you through real-time face and pose detection right in your browser. Hopefully you’ll feel motivated to try it out yourself!
Google MediaPipe is an open-source toolbox that lets you build and deploy machine learning models to chew through all sorts of tasks. In this article we'll focus on computer-vision tasks (pose and face landmark detection), but you can use it for plenty of other things as well (classification, gesture recognition, image generation, etc.). While MediaPipe is a vast playground for building custom AI pipelines, we're zeroing in on MediaPipe Solutions for this series.
MediaPipe Solutions are like pre-assembled Lego sets: customizable, ready-made models for common tasks—think detecting faces, tracking hands, estimating poses, and spotting objects. They package up the brain-hurting parts of machine learning, giving you a head start on integrating AI into your projects.
@mediapipe/tasks-vision
First we need to install the MediaPipe npm package for pose and face landmark detection. As a bonus, it ships with TypeScript type definitions out of the box:
npm i @mediapipe/tasks-vision
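To give you a taste of what the setup looks like before we wire it into React, here's a minimal sketch of initializing a FaceLandmarker for video input (the PoseLandmarker API follows the same pattern). The CDN and model URLs below are the Google-hosted paths from the MediaPipe documentation; you can swap them for self-hosted copies.

```ts
import { FilesetResolver, FaceLandmarker } from "@mediapipe/tasks-vision";

// Load the WASM runtime that powers the vision tasks (served here from a CDN).
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm"
);

// Create a face landmarker configured to run on video frames.
const faceLandmarker = await FaceLandmarker.createFromOptions(vision, {
  baseOptions: {
    // Google-hosted model asset; you can also self-host the .task file.
    modelAssetPath:
      "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task",
    delegate: "GPU",
  },
  runningMode: "VIDEO",
  numFaces: 1,
});

// In a render loop, feed video frames plus a timestamp to get landmarks.
const videoEl = document.querySelector("video")!;
const results = faceLandmarker.detectForVideo(videoEl, performance.now());
```

Each call returns a result object whose landmarks use normalized (0–1) coordinates, which makes them easy to scale and draw onto a canvas overlay—exactly what we'll do in the React demo.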