A team at Google has proposed using artificial intelligence technology to create a “bird’s eye” view of users’ lives using mobile phone data such as photos and search queries.
The idea, dubbed “Project Ellmann” after biographer and literary critic Richard David Ellmann, is to use LLMs like Gemini to ingest search results, recognize patterns in a user’s photos, create a chatbot, and “answer previously impossible questions,” according to a copy of a presentation viewed by CNBC. Ellmann’s goal is to be “your life story teller.”
It’s unclear whether the company plans to bring these features to Google Photos or another product. According to a company blog post, Google Photos has more than 1 billion users and 4 trillion photos and videos.
Project Ellmann is just one of many ways Google is proposing to develop or improve its products with AI technology. On Wednesday, Google unveiled its latest “most powerful” and advanced AI model, Gemini, which in some cases outperformed OpenAI’s GPT-4. The company plans to license Gemini to a variety of customers through Google Cloud so they can use it in their own applications. One of Gemini’s standout features is its multimodality, meaning it can process and understand information beyond text, including images, videos and audio.
According to documents seen by CNBC, a Google Photos product manager recently introduced Project Ellmann to the Gemini teams at an internal summit. They wrote that over the past few months, teams have concluded that large language models are the ideal technology to make this bird’s-eye approach to one’s life story a reality.
Ellmann could draw on context from biographies, previous moments and subsequent photos to describe a user’s photos in more detail than “just pixels with labels and metadata,” the presentation says. It suggests being able to identify a range of moments such as college years, years in the Bay Area, and years as a parent.
“We can’t answer tough questions or tell good stories without a bird’s-eye view of your life,” reads a description alongside a photo of a little boy playing in the dirt with a dog.
“We search through your photos and look at their tags and locations to identify a meaningful moment,” reads a presentation slide. “When we step back and understand your life in its entirety, your overarching story becomes clear.”
The presentation said large language models could infer moments such as the birth of a user’s child. “This LLM can use insights from further up the family tree to conclude that this is Jack’s birth and that he is James and Gemma’s first and only child.”
“One of the reasons an LLM is so powerful for this bird’s-eye view approach is that it is able to capture unstructured context from all different heights of that tree and use it to improve the understanding of other regions of the tree,” reads a slide alongside an illustration of the various “moments” and “chapters” of a user’s life.
The presenters gave another example: determining that a user had recently attended a class reunion. “It has been exactly 10 years since he graduated and there are many faces that have not been seen in 10 years, so this is probably a reunion,” the team concluded in their presentation.
The team also demonstrated “Ellmann Chat” with the description: “Imagine opening ChatGPT, but it already knows everything about your life. What would you ask?”
An example chat was displayed in which a user asks, “Do I have a pet?” to which it replies, “Yes, the user has a dog that was wearing a red raincoat,” and then gives the dog’s name and the names of the two family members it is most often seen with.
In another example chat, a user asked when their sibling last visited. Another asked it to list cities similar to where they live because they are thinking about moving. Ellmann gave answers to both.
Ellmann also presented a summary of the user’s eating habits, other slides showed. “You seem to like Italian food. There are several photos of pasta dishes as well as a photo of a pizza.” It also said that the user seemed to enjoy new food, as one of their photos showed a menu with a dish it did not recognize.
The technology also determined from the user’s screenshots what products the user wanted to purchase, what their interests were, and what their work and travel plans were, the presentation said. It also suggested that it would be able to know their favorite websites and applications, such as Google Docs, Reddit and Instagram.
A Google spokesperson told CNBC: “Google Photos has always used AI to help people find their photos and videos, and we’re excited about the potential of LLMs to enable even more helpful experiences. This was an early internal exploration and, as always, should we decide to introduce new features, we would take the time necessary to ensure that they are helpful to people and that protecting user privacy and security is a top priority.”
Big Tech’s Race to Create AI-Driven “Memories”
The proposed Project Ellmann could help Google in the arms race among tech giants to create more personalized life memories.
Google Photos and Apple Photos have for years served up “memories” and generated albums based on trends in users’ photos.
In November, Google announced that Google Photos can now use AI to group similar photos and organize screenshots into easy-to-find albums.
Apple announced in June that its latest software update would include the ability of its Photos app to recognize people, dogs and cats in their photos. It already sorts faces and allows users to search for them by name.
Apple also announced an upcoming Journal app that will use on-device AI to create personalized suggestions to prompt users to write passages describing their memories and experiences based on recent photos, locations, music and workouts.
But Apple, Google and other tech giants are still grappling with the complexities of appropriately displaying and identifying images.
For example, Apple and Google still avoid labeling gorillas after reports in 2015 that Google Photos had mislabeled Black people as gorillas. A New York Times investigation this year found that Apple and Google’s Android software, which underlies most of the world’s smartphones, had disabled the ability to visually search for primates for fear of labeling a person as an animal.
Companies like Google, Facebook and Apple have added controls over time to minimize unwanted reminders, but users have reported that they still sometimes pop up and require users to cycle through different settings to minimize them.
Source: www.cnbc.com