AI Enhances Machine Understanding of Visual Content

In today’s data-driven world, every decision a business makes should be grounded in analytics. However, a significant gap exists in many organizations: the inability to harness insights from visual data.

That’s where Coactive comes in. Founded by Cody Coleman ’13, MEng ’15, and William Gaviria Rojas ’13, the company has developed an AI-powered platform designed to interpret complex data forms, including images, audio, and video. This innovative approach allows businesses to extract valuable insights and make decisions more rapidly and effectively.

“The first big data revolution improved business utilization of structured datasets,” Coleman explains, referring specifically to data drawn from spreadsheets and tables. “Today, however, around 80 to 90 percent of global data is unstructured. In this new era of big data, organizations will need to process images, videos, and audio at scale, with AI playing a crucial role in this transformation.”

Coactive already works with notable media and retail firms to help them understand their visual content without tedious manual sorting and tagging. That lets companies deliver the right content to users quickly, remove inappropriate material, and analyze how particular content influences consumer behavior.

Beyond its practical applications, Coactive represents a vision where AI empowers humans to enhance efficiency and tackle new challenges. “The term coactive signifies working together harmoniously,” says Coleman. “Our ultimate goal is to foster collaboration between humans and machines. In an age where AI can either divide or unite us, we aspire for Coactive to be a unifying force, bestowing humans with new capabilities.”

Empowering machines with vision

Coleman and Gaviria Rojas first crossed paths during their freshman year through the MIT Interphase Edge program. Both went on to major in electrical engineering and computer science, and they collaborated on projects such as bringing MIT OpenCourseWare to universities across Mexico.

“That experience was a fine illustration of entrepreneurship,” Coleman reflects, recalling the OpenCourseWare initiative. “It was incredibly fulfilling to oversee both the business and software development aspects, which ultimately inspired me to launch my own web ventures and enroll in the MIT course ‘Founder’s Journey.’”

Coleman’s first taste of AI came during his graduate research at MIT’s Office of Digital Learning (now MIT Open Learning), where he used machine learning to explore how individuals learn within MITx—home to the university’s massive open online courses.

“It was extraordinary to realize that I could share my transformative MIT experience through digital learning,” Coleman comments. “By leveraging AI and machine learning, we can create adaptive educational systems that not only facilitate deep learning but also offer personalized experiences globally.”

After completing his studies at MIT, Coleman pursued a PhD at Stanford University, focusing on making AI more accessible. His research took him to firms like Pinterest and Meta, exploring applications of AI and machine learning.

“I gained insights into how top companies were leveraging AI to create business value, and that sparked the initial idea for Coactive. I thought, ‘What if we design an enterprise-grade operating system for content that harnesses multimodal AI?’” Coleman recalls.

Meanwhile, Gaviria Rojas moved to the Bay Area in 2020, where he took a position as a data scientist at eBay. During the move, he asked Coleman to help transport his couch, which led to a pivotal conversation.

“On that drive, we both recognized an impending explosion in data and AI,” Gaviria Rojas shares. “At MIT, we witnessed the big data revolution firsthand, observing technologies designed to extract value from data at scale. We realized a similar explosion was on the horizon, driven by enterprises gathering vast amounts of multimodal data—images, videos, audio, and text—needing a technological solution to harness it effectively: AI.”

The platform they developed—a versatile “AI operating system”—is model-agnostic, allowing for easy integration of improved AI models over time. Coactive includes pre-built applications enabling businesses to search their content, generate metadata, and conduct analytics to glean insights.

“Historically, computers interpreted the world through bytes, while humans did so visually. Now, with AI, machines can finally perceive the world like we do, blurring the lines between the digital and physical realms,” Coleman states.

Enhancing human-computer interaction

Reuters’ image database, a critical resource for journalists worldwide, previously relied on labor-intensive manual tagging. This slow and costly process often left reporters struggling to find relevant images during their searches.

“The inefficiency resulted in limited search results, despite many relevant images in the database,” Coleman explains. “Now, with Coactive’s AI Search feature, journalists can easily access pertinent content based on the AI’s comprehension of each image’s details.”

Reuters isn’t alone in facing content management challenges. Effective digital asset management is crucial for many media and retail organizations that typically depend upon manually entered metadata for sorting through their assets.

Another prominent customer, Fandom—one of the largest platforms hosting information about TV shows, video games, and movies—utilizes Coactive to analyze visual content within its communities, assisting in the removal of excessive or inappropriate material.

“Previously, Fandom needed 24 to 48 hours to review each new piece of content,” Coleman notes. “With Coactive, they’ve optimized their community guidelines and can analyze new content in an average of 500 milliseconds.”

The founders envision Coactive as a catalyst for a revolutionary shift in human-machine collaboration.

“Throughout the history of human-computer interaction, we’ve had to manipulate keyboards and mice to communicate with machines,” Coleman asserts. “Now, for the first time, we can interact naturally—sharing images and videos—and AI truly understands that content. This represents a fundamental shift in how we conceive of human-computer interactions. Coactive’s guiding vision reflects the need for a new operating system and a refreshed approach to working with content and AI.”

Photo credit & article inspired by: Massachusetts Institute of Technology
