The client's motivation for launching the project: while eating, a person makes characteristic hand movements that can be recognized from the sensors of a smartwatch or fitness tracker. The number and frequency of these movements make it possible to estimate how much food was eaten and how quickly, which is useful for people who watch their diet. The task was to build a model that extracts this information from the sensor data.
What we had initially:
Project goals:
MIL Team's solution: first, we had an outsourced team annotate an additional 6% of the videos. Before sending the videos out for annotation, we anonymized them with an open face-detection model followed by blurring of the detected faces. Next, we fed the output of a pose estimation model, applied to the meal videos, into a gesture recognition model. We then ran the trained video annotation model on the remaining 91% of the videos to annotate them automatically, and used this annotation to train a gesture recognition model on IMU data. By correlating the responses of the two models on the video and the IMU time series, we synchronized the timestamps of the video and the sensors. The final model was trained on the sensor data with the already synchronized automatic annotation. We also solved a secondary task: classifying, from the sensor data, whether a person is standing while eating.
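As an illustration of the synchronization step, the sketch below aligns two gesture-probability series by finding the lag that maximizes their cross-correlation. This is a minimal example, not the project code; the names video_probs, imu_probs, and fps are hypothetical, and it assumes the IMU-model output has already been resampled to the video frame rate.

```python
import numpy as np

def estimate_time_offset(video_probs: np.ndarray, imu_probs: np.ndarray, fps: float) -> float:
    """Estimate the time shift (in seconds) between two gesture-probability
    series sampled at the same rate `fps`, via cross-correlation.

    video_probs: per-frame gesture probabilities from the video-based model
    imu_probs:   gesture probabilities from the IMU-based model,
                 resampled to the video frame rate (assumption)
    """
    # Normalize both series so the correlation is not dominated by scale.
    v = (video_probs - video_probs.mean()) / (video_probs.std() + 1e-8)
    s = (imu_probs - imu_probs.mean()) / (imu_probs.std() + 1e-8)

    corr = np.correlate(v, s, mode="full")   # correlation at every possible lag
    lag = int(corr.argmax()) - (len(s) - 1)  # lag (in frames) with maximal agreement
    return lag / fps                         # convert frames to seconds
```

With the offset estimated this way, the automatically annotated video labels and the raw sensor stream can be placed on a common timeline before training the final IMU model.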
Tools for building the model:
The model results:
Client: under NDA
Technological stack: Python, PyTorch
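Given the Python/PyTorch stack, a gesture recognition model over windows of IMU data could be sketched roughly as below. This is an illustrative 1D-convolutional classifier, not the actual architecture used in the project; the channel count, window length, and number of classes are assumptions.

```python
import torch
import torch.nn as nn

class IMUGestureNet(nn.Module):
    """Toy 1D-CNN over fixed-length windows of IMU data
    (assumed 6 channels: 3-axis accelerometer + 3-axis gyroscope)."""

    def __init__(self, in_channels: int = 6, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over the time axis
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        return self.classifier(self.features(x).squeeze(-1))


# Example: a batch of 8 two-second windows sampled at 50 Hz (hypothetical rate)
model = IMUGestureNet()
logits = model(torch.randn(8, 6, 100))   # -> shape (8, 2)
```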