Our client's vision was to create an app that would help end users track the food they eat in terms of macronutrients. Today, most people do not track or even know how much food they consume. This is mostly due to the barriers involved in measuring and tracking each plate of food: measuring the volume of each portion, calculating the macronutrients from that volume, tracking the total macronutrients consumed over time, and so on.

He envisioned a mobile application that would capture several images of a plate of food, then immediately process them to identify the type and amount of food the user was about to eat. Finally, the app would show the user the amount of macronutrients they were about to consume.

The client had a very focused vision of the product and a lot of domain knowledge, but lacked skills in AI and cloud development, both key to building the application. His vision was also backed by scientific studies but had yet to be tested in practice.


After analyzing the requirements thoroughly, 3XM assembled a team with expertise in AI & ML, supported by specialists in cloud development. The key to the solution was the set of machine learning components, backed by a scalable architecture managed and deployed through IaC (Infrastructure as Code). This infrastructure included components such as AWS Step Functions and Lambda functions that scaled automatically.
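As a rough illustration of how such a workflow could be described in code, the sketch below builds an Amazon States Language definition that chains three Lambda tasks. The state names, ARNs, and retry settings are hypothetical placeholders chosen for this example; the case study does not specify the actual workflow definition.

```python
import json

# Hypothetical Lambda ARNs: the real function names are not given in the case study.
SEGMENT_ARN = "arn:aws:lambda:us-east-1:123456789012:function:segment-food"
CLASSIFY_ARN = "arn:aws:lambda:us-east-1:123456789012:function:classify-food"
NUTRITION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:compute-macros"


def build_state_machine(segment_arn, classify_arn, nutrition_arn):
    """Return an Amazon States Language definition chaining three Task states."""
    steps = [
        ("SegmentImage", segment_arn),
        ("ClassifyPortions", classify_arn),
        ("ComputeMacros", nutrition_arn),
    ]
    states = {}
    for index, (name, arn) in enumerate(steps):
        state = {
            "Type": "Task",
            "Resource": arn,
            # Assumed retry policy: retry failed tasks with exponential backoff.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
        }
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]  # hand over to the next step
        else:
            state["End"] = True  # last step terminates the execution
        states[name] = state
    return {
        "Comment": "Sketch of a food image processing pipeline",
        "StartAt": steps[0][0],
        "States": states,
    }


definition = build_state_machine(SEGMENT_ARN, CLASSIFY_ARN, NUTRITION_ARN)
definition_json = json.dumps(definition, indent=2)
```

Keeping the definition as generated data rather than a hand-edited file is one way to make it reproducible from an IaC template; the resulting JSON string is what an IaC tool would receive as the state machine definition.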

Machine Learning Models

The machine learning models handled segmentation and classification of the food images. Downstream algorithms then combined this information to calculate the final macronutrients based on the FDA public database. Finally, all of the data was rendered into an FDA food label and sent to the user.
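The final calculation step can be sketched as follows, assuming the upstream models yield a food class and an estimated weight in grams for each portion. The nutrient table below holds made-up placeholder values per 100 g, not figures from the FDA database, and the calorie estimate uses the general 4/4/9 kcal-per-gram factors as an approximation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Macros:
    protein_g: float
    carbs_g: float
    fat_g: float

    def scaled(self, factor):
        """Return the macros multiplied by a portion-size factor."""
        return Macros(self.protein_g * factor, self.carbs_g * factor, self.fat_g * factor)

    def __add__(self, other):
        return Macros(
            self.protein_g + other.protein_g,
            self.carbs_g + other.carbs_g,
            self.fat_g + other.fat_g,
        )

    @property
    def calories(self):
        # General factors: 4 kcal/g protein, 4 kcal/g carbohydrate, 9 kcal/g fat.
        return 4 * self.protein_g + 4 * self.carbs_g + 9 * self.fat_g


# Placeholder values per 100 g, for illustration only (not real database entries).
NUTRIENTS_PER_100G = {
    "rice": Macros(2.5, 28.0, 0.5),
    "chicken": Macros(30.0, 0.0, 4.0),
}


def plate_macros(portions):
    """Sum macros over (food_class, grams) pairs detected on a plate."""
    total = Macros(0.0, 0.0, 0.0)
    for food_class, grams in portions:
        total = total + NUTRIENTS_PER_100G[food_class].scaled(grams / 100.0)
    return total


plate = [("rice", 200.0), ("chicken", 150.0)]
total = plate_macros(plate)
```

The totals computed here are the values that would then be laid out on the food label shown to the user.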

Two main models were used:

  • Pre-trained Semantic Segmentation model: Classifies the image at a fine-grained, pixel level. The segmentation output is represented as a grayscale image called a segmentation mask; in our case, the output was a mask for each portion of food on the plate. This model has 2 main components:
→ The backbone (or encoder): A network that produces reliable activation maps of features. We used a ResNet-101 pre-trained on ImageNet, a dataset containing more than 14 million images.
→ The decoder: A network that constructs the segmentation mask from the encoded activation maps. The decoder had to be retrained with the images from our datasets.
  • Food classification model: A neural network (NN) trained on infrared data preprocessed from the image datasets.
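To make the idea of a segmentation mask concrete, here is a minimal sketch of how a class-index mask could be split into one binary mask per food class, along with the share of the image each class covers. The mask is a small nested list to keep the example dependency-free (in practice it would typically be a NumPy array), and the use of label 0 for background is an assumption of this example.

```python
from collections import Counter

BACKGROUND = 0  # assumed label for non-food pixels


def split_mask(mask):
    """Return {class_id: binary_mask} for every non-background class in the mask."""
    class_ids = sorted({value for row in mask for value in row if value != BACKGROUND})
    return {
        class_id: [[1 if value == class_id else 0 for value in row] for row in mask]
        for class_id in class_ids
    }


def area_fractions(mask):
    """Return the fraction of image pixels covered by each non-background class."""
    counts = Counter(value for row in mask for value in row)
    total_pixels = sum(counts.values())
    return {
        class_id: count / total_pixels
        for class_id, count in counts.items()
        if class_id != BACKGROUND
    }


# Toy 3x4 mask with two food classes (1 and 2) on a background of zeros.
mask = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 2, 2, 0],
]

per_class_masks = split_mask(mask)
fractions = area_fractions(mask)
```

Note that this splits by class rather than by individual portion; two separate portions of the same food would share a mask in this simplified version.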

Although the segmentation model outputs a class for every pixel, we needed to leverage the preprocessed infrared data for better classification. Given the dataset sizes, we decided that the best approach, until the datasets grew, was to build two simple models instead of a single combined one.

Both models were trained and deployed using Amazon SageMaker.



Creating machine learning models required datasets, but very few of those available online met the requirements of the application. Throughout the project, the team worked side by side with the client to create the datasets needed to train these models. So that the models would generalize, the datasets were generated incrementally and iteratively, and a report of insights on the new data was sent to the client every time part of a dataset was delivered. In addition, through Amazon A2I (Augmented AI), a private labeling workforce was engaged to work with the team on creating ground truth annotations for each dataset.
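One kind of insight such a report might contain is how each delivery shifts the class balance of a dataset. The sketch below is an assumption about what that could look like, not the team's actual report; the class names and counts are invented for illustration.

```python
from collections import Counter


def batch_report(previous_labels, new_labels):
    """Summarize how a newly delivered batch changes the per-class label counts."""
    before = Counter(previous_labels)
    added = Counter(new_labels)
    after = before + added
    total = sum(after.values())
    rows = []
    for name in sorted(after):
        rows.append(
            {
                "class": name,
                "before": before[name],  # Counter returns 0 for unseen classes
                "added": added[name],
                "after": after[name],
                "share": after[name] / total,  # fraction of the whole dataset
            }
        )
    return rows


# Invented example: an existing dataset plus one new delivery.
previous = ["rice"] * 3 + ["chicken"]
delivery = ["chicken"] * 2 + ["salad"] * 2

report = batch_report(previous, delivery)
for row in report:
    print(f"{row['class']:<8} +{row['added']} -> {row['after']} ({row['share']:.1%})")
```

A table like this makes under-represented classes visible early, which helps decide what the next batch of images should focus on.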

Throughout the project, the team faced various challenges in implementing the architecture, especially in creating the machine learning models. Although the client's vision had a solid scientific background, it had yet to be tested against real-world scenarios. Extensive research and many experiments were needed to reach the best trade-offs.


The team created an MVP that met the following application requirements:

  • Real-time macronutrient estimation with AI, displayed as FDA labels.
  • Trained and deployed segmentation & classification machine learning models.
  • Maintainable and reproducible deployment of the AWS architecture.
  • Monthly and annual cloud cost estimate reports for the business model.



  • Lambda
  • Step Functions
  • EC2
  • DynamoDB
  • SageMaker
  • CloudWatch
  • S3
  • EventBridge
  • A2I (Augmented AI)
  • Amazon API Gateway
  • AWS CodeBuild
  • AWS Elastic Beanstalk



→Data Analytics & Machine Learning

  • MXNet
  • TensorFlow
  • Pandas
  • NumPy

→Image Visualization & Manipulation

  • Pillow
  • OpenCV