Alphabet's (GOOG) Gemini just took a big swing at the AI video space. Starting this week, paying users on the Ultra and Pro plans can ask Gemini to turn a photo and a simple text description into an 8-second video with sound. The feature is powered by Veo 3, Google's latest video generation model, and will roll out on both web and mobile. Until now, the tech had been locked behind a standalone tool called Flow. By baking it into Gemini's core interface, Google is clearly moving to compete more directly with OpenAI, Runway, and fast-moving Chinese players like Alibaba and Kuaishou.
But early testing shows there's still a gap between ambition and output. In Bloomberg's trials, the AI successfully animated plants swaying in the wind and even created a talking cat. When asked to make a person breakdance, however, Gemini defaulted to a wave and altered the subject's facial features, sometimes even their race. Google responded that there is no instruction to change appearances, but acknowledged that the model may extrapolate from limited image data in unintended ways. Translation: it's still not ready for prime time with anything involving real human faces.
Still, there's a reason this matters. Video is the next AI battleground, and Google is betting Gemini can evolve beyond a chatbot into a full creative engine. For now, the tool seems better suited to animating objects, landscapes, or drawings than people. But with Veo 3 under the hood and Google promising steady improvements, especially in face animation, this rollout could be an early step toward something bigger. Investors tracking the generative AI space may want to watch closely as this arms race heats up.