Generative AI Startup Twelve Labs Works with AWS to Make Videos as Searchable as Text

Leading startup makes ‘needle-in-a-haystack’ video searches possible using natural language, turning the world’s largest unsearchable data source—video—into a trove of accessible information

Developers can now find specific movie scenes from decades of video archives, or assess video footage of athletes’ performances, with conversational queries

Twelve Labs uses AWS to train its multimodal foundation models up to 10% faster, while reducing training costs by more than 15%

LAS VEGAS--(BUSINESS WIRE)--At AWS re:Invent, Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company (NASDAQ: AMZN), today announced that Twelve Labs, a startup that uses multimodal artificial intelligence (AI) to bring human-like understanding to video content, is building and scaling its proprietary foundation models on AWS. Twelve Labs will use AWS technologies to accelerate the development of its foundation models that map natural language to what’s happening inside a video. This includes actions, objects, and background sounds, allowing developers to create applications that can search through videos, classify scenes, summarize, and split video clips into chapters.

Creating applications that can pinpoint any video moment or frame

Available on AWS Marketplace, these foundation models enable developers to create applications for semantic video search and text generation, serving media, entertainment, gaming, sports, and additional industries reliant on large volumes of video. For example, sports leagues can use the technology to streamline the process of cataloging vast libraries of game footage, making it easier to retrieve specific frames for live broadcasts. Additionally, coaches can use these foundation models to analyze a swimmer’s stroke technique or a sprinter’s starting block position, making adjustments that lead to better performance. Finally, media and entertainment companies can use Twelve Labs technology to create highlight reels from TV programs tailored to each viewer’s interests, such as compiling all action sequences in a thriller series featuring a favorite actor.
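For developers curious how semantic video search of this kind is commonly structured, the following is a minimal sketch, assuming each clip and each query has already been mapped to a numeric embedding vector. It does not use the Twelve Labs API or its models: the clip IDs, the three-dimensional vectors, and the function names are all hypothetical and chosen only for illustration. It ranks clips by cosine similarity to the query vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search_clips(query_vector, clip_index, top_k=2):
    """Return the top_k (clip_id, score) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vector, vector), clip_id)
        for clip_id, vector in clip_index.items()
    ]
    scored.sort(reverse=True)
    return [(clip_id, round(score, 3)) for score, clip_id in scored[:top_k]]

# Hypothetical embeddings: in a real system a multimodal model would
# produce these from the video and from the text of the query.
CLIP_INDEX = {
    "game1_00:12:05": [0.9, 0.1, 0.0],
    "game1_00:47:30": [0.1, 0.8, 0.2],
    "game2_01:03:10": [0.7, 0.3, 0.1],
}
QUERY = [1.0, 0.0, 0.0]  # stand-in for an embedded natural-language query

results = search_clips(QUERY, CLIP_INDEX)
print(results)
```

A production system would typically replace the linear scan with an approximate nearest-neighbor index, but the ranking idea stays the same.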

“Twelve Labs was founded on a vision to help developers build multimodal intelligence into their applications,” said Jae Lee, co-founder and CEO of Twelve Labs. “Nearly 80% of the world’s data is in video, yet most of it is unsearchable. We are now able to address this challenge, surfacing highly contextual videos to bring experiences to life, similar to how humans see, hear, and understand the world around us.”

“AWS has given us the compute power and support to solve the challenges of multimodal AI and make video more accessible, and we look forward to a fruitful collaboration over the coming years as we continue our innovation and expand globally,” added Lee. “We can accelerate our model training, deliver our solution safely to thousands of developers globally, and control compute costs—all while pushing the boundaries of video understanding and creation using generative AI.”

Generating accurate and insightful video summaries and highlights

Twelve Labs’ Marengo and Pegasus foundation models deliver groundbreaking video analysis that not only provides text summaries and audio translations in more than 100 languages, but also analyzes how words, images, and sounds all relate to one another, such as matching what’s said in speech to what’s shown in video. Content creators can also access exact moments, angles, or events within a show or game using natural language searches. For example, major sports leagues use Twelve Labs technology on AWS to automatically and rapidly create highlight reels from their extensive media libraries to improve the viewing experience and drive fan engagement.

“Twelve Labs is using cloud technology to turn vast volumes of multimedia data into accessible and useful content, driving improvements in a wide range of industries,” said Jon Jones, vice president and global head of Startups at AWS. “Video is a treasure trove of valuable information that has, until now, remained unavailable to most viewers. AWS has helped Twelve Labs build the tools needed to better understand and rapidly produce more relevant content.”

Accelerating and lowering the cost of model training

Twelve Labs uses Amazon SageMaker HyperPod to train its foundation models, which are capable of comprehending different data formats like videos, images, speech, and text all at once. This allows its models to unlock deeper insights compared to other AI models focused on just one data type. The training workload is split across multiple AWS compute instances working in parallel, which means Twelve Labs can train its foundation models for weeks or even months without interruption. Amazon SageMaker HyperPod provides everything needed to get AI models up to speed quickly, fine-tune their performance, and scale up operations seamlessly.
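The release does not say which parallelization strategy Twelve Labs uses. As one illustration of splitting a training workload across workers, the following is a minimal sketch of data parallelism, assuming a toy one-parameter linear model and equal-sized data shards. It does not use Amazon SageMaker HyperPod or any AWS API; the function names and data are hypothetical. Each simulated worker computes a gradient on its own shard, and the averaged gradient drives a single shared update.

```python
def shard(data, num_workers):
    """Split the dataset round-robin into one shard per worker."""
    return [data[i::num_workers] for i in range(num_workers)]

def local_gradient(w, examples):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in examples) / len(examples)

def data_parallel_step(w, data, num_workers, lr):
    """One training step: per-shard gradients, averaged, then one update.

    Assumes len(data) is a multiple of num_workers so shards are equal
    in size and the average matches the full-batch gradient.
    """
    grads = [local_gradient(w, s) for s in shard(data, num_workers)]
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

# Toy data generated from y = 2 * x, so training should recover w close to 2.
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

w = 0.0
for _ in range(50):
    w = data_parallel_step(w, DATA, num_workers=2, lr=0.05)
print(w)
```

In a real cluster each shard's gradient would be computed on a separate instance and combined over the network; here the workers run one after another in a single process to keep the sketch self-contained.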

Leveraging the scale of AWS to expand globally

As part of a three-year Strategic Collaboration Agreement (SCA), Twelve Labs will work with AWS to deploy its advanced video understanding foundation models across new industries and enhance its model training capabilities using Amazon SageMaker HyperPod. AWS Activate, a program that helps startups grow their business, has empowered Twelve Labs to scale its generative AI technology globally and unlock deeper insights from hundreds of petabytes of videos, down to split-second accuracy. This support includes hands-on expertise for optimizing machine learning performance and implementing go-to-market strategies. Additionally, AWS Marketplace enables Twelve Labs to seamlessly deliver its innovative video intelligence services to a global customer base.

About Amazon Web Services

Since 2006, Amazon Web Services has been the world’s most comprehensive and broadly adopted cloud. AWS has been continually expanding its services to support virtually any workload, and it now has more than 240 fully featured services for compute, storage, databases, networking, analytics, machine learning and artificial intelligence (AI), Internet of Things (IoT), mobile, security, hybrid, media, and application development, deployment, and management from 108 Availability Zones within 34 geographic regions, with announced plans for 18 more Availability Zones and six more AWS Regions in Mexico, New Zealand, the Kingdom of Saudi Arabia, Taiwan, Thailand, and the AWS European Sovereign Cloud. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—trust AWS to power their infrastructure, become more agile, and lower costs. To learn more about AWS, visit aws.amazon.com.

About Amazon

Amazon is guided by four principles: customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking. Amazon strives to be Earth’s Most Customer-Centric Company, Earth’s Best Employer, and Earth’s Safest Place to Work. Customer reviews, 1-Click shopping, personalized recommendations, Prime, Fulfillment by Amazon, AWS, Kindle Direct Publishing, Kindle, Career Choice, Fire tablets, Fire TV, Amazon Echo, Alexa, Just Walk Out technology, Amazon Studios, and The Climate Pledge are some of the things pioneered by Amazon. For more information, visit amazon.com/about and follow @AmazonNews.

Contacts

Amazon.com, Inc.
Media Hotline
Amazon-pr@amazon.com
www.amazon.com/pr
