A recent episode of the popular HBO comedy “Silicon Valley” involved the creation of an app called “See Food,” which was designed to identify items of food from snapped images. To the chagrin of the app’s backers, however, the prototype could recognize only one food item, a hot dog; everything else was classified as “not hot dog.”
Unlike the fictional “See Food” app, the AI system developed by the MIT researchers – called Pic2Recipe – is able to look at a still image of a dish of food and then predict the ingredients and suggest similar recipes. Being able to do such analysis on photos, say the researchers, could help people learn recipes and better understand their eating habits.
“In computer vision, food is mostly neglected because we don’t have the large-scale datasets needed to make predictions,” says Yusuf Aytar, an MIT postdoc who co-wrote a paper about the system. “But seemingly useless photos on social media can actually provide valuable insight into health habits and dietary preferences.”
The countless online recipes with user-submitted photos, say the researchers, represent an opportunity to train machines to automatically understand food preparation by jointly analyzing ingredient lists, cooking instructions, and food images. To assemble the dataset for their system, the researchers scraped recipes from over two dozen popular cooking websites, extracting the relevant text content and downloading the associated linked images.
The result was “Recipe1M” – a database of over one million recipes annotated with information about the ingredients in a wide range of dishes. This database was then used to train a neural network to find patterns and make connections between food images and corresponding ingredients and recipes.
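The core idea of such cross-modal training is to map images and recipe text into a shared embedding space, so that a photo and its matching recipe land close together. The paper's actual architecture and loss are not described here, so the following is only a minimal toy sketch of that alignment idea: the encoders are faked with random vectors, and matched image/recipe pairs are made near-duplicates so the diagonal of the similarity matrix dominates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for learned encoders. In the real system, neural
# networks would map a food photo and a recipe's text into the same
# embedding space; here we just fabricate embeddings for three
# matched image/recipe pairs, making each pair nearly identical.
dim = 8
image_emb = rng.normal(size=(3, dim))
recipe_emb = image_emb + 0.05 * rng.normal(size=(3, dim))

def normalize(x):
    """Scale each row to unit length so dot products are cosines."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine similarity between every image and every recipe.
sim = normalize(image_emb) @ normalize(recipe_emb).T

# Training would push the diagonal (matched pairs) above the
# off-diagonal entries; with our fabricated pairs it already is.
best_match = np.argmax(sim, axis=1)
print(best_match)
```

With real data, a contrastive-style objective would be optimized so that this diagonal dominance emerges from learning rather than construction.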
The resulting Pic2Recipe system, when presented with an image of a food item, is able both to identify the food’s ingredients – like flour, eggs, and butter – and to suggest recipes based on similar images. The system performed best on desserts, which were especially well represented in the database, and least well on more “ambiguous” foods, such as sushi and smoothies.
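Suggesting recipes “based on similar images” amounts to a nearest-neighbor lookup: embed the query photo, then rank stored recipe embeddings by similarity. The sketch below illustrates that retrieval step only; the recipe names and embedding values are hypothetical placeholders, not data from Recipe1M.

```python
import numpy as np

def retrieve_top_k(query_emb, recipe_embs, k=2):
    """Rank stored recipe embeddings by cosine similarity to a
    query image embedding; return indices of the top-k matches."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    scores = norm(recipe_embs) @ norm(query_emb)
    return np.argsort(scores)[::-1][:k]

# Hypothetical pre-computed embeddings for three stored recipes.
recipes = ["brownies", "sushi rolls", "banana smoothie"]
recipe_embs = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0]])

# A query image whose embedding lands closest to "brownies".
query = np.array([0.9, 0.1, 0.0])
top = retrieve_top_k(query, recipe_embs)
print([recipes[i] for i in top])  # nearest recipes first
```

At the scale of a million recipes, an exact scan like this would be replaced by an approximate nearest-neighbor index, but the ranking principle is the same.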
The researchers are looking to improve the system and even add capabilities, such as inferring how a food was prepared or identifying variations of food types. Eventually, say the researchers, the system could be developed into a “dinner aide” app that helps people decide what meal to cook based on a dietary preference and a list of items in the fridge.
“This could potentially help people figure out what’s in their food when they don’t have explicit nutritional information,” says Nick Hynes, a graduate student at MIT’s Computer Science and Artificial Intelligence Laboratory and lead co-author of the paper. “For example, if you know what ingredients went into a dish but not the amount, you can take a photo, enter the ingredients, and run the model to find a similar recipe with known quantities, and then use that information to approximate your own meal.”
A Pic2Recipe online demo is available, where users can upload their own food images to test it out. For more, see “Learning Cross-modal Embeddings for Cooking Recipes and Food Images.” (PDF)