Adobe Firefly - Game, Set, Match for Premiere?

ahalpert

Major Contributor

If it does what it says on the tin, Adobe Firefly will be a game-changer. It's easy to imagine them also incorporating things like a voice cloner, which is one of the only generative AI tools I've found a regular use for (aside from inpainting stills). And would Resolve or FCP be able to match the feature set? Theoretically, AI models are actually not that expensive to train. So maybe Resolve would try to achieve parity. But I don't think I've seen signs of FCP heading in this direction even though this sort of thing should be right up Apple's alley.

It would make me sad to leave FCP behind but this could finally entice me back to Premiere.
 
I guess it all comes down to how good it is, right? When it becomes very good in all situations, it will be useful, but that will also usher in an era where we're not needed as often. So you'll be able to do your reduced amount of work much more easily. Sad, but true.

I have used Photoshop since the '90s, and Adobe is very good at marketing. I still prefer to extract a subject from a background myself, with Bézier curves and attention to detail. Otherwise, there is always something the auto 'thingy of the day' misses, and it shows up on close inspection - meaning for professional work. But they have been touting how great all the auto tools are since then anyway... I know they are better now, but video is even more difficult to work with.
 
I guess it all comes down to how good it is, right? When it becomes very good in all situations, it will be useful, but that will also usher in an era where we're not needed as often. So you'll be able to do your reduced amount of work much more easily. Sad, but true.

I have used Photoshop since the '90s, and Adobe is very good at marketing. I still prefer to extract a subject from a background myself, with Bézier curves and attention to detail. Otherwise, there is always something the auto 'thingy of the day' misses, and it shows up on close inspection - meaning for professional work. But they have been touting how great all the auto tools are since then anyway... I know they are better now, but video is even more difficult to work with.
Well, I'm currently round-tripping for a couple things. If I need to expand a still image, I can usually (but not always) get better results from an AI than I can manually. It would be convenient to have that in my NLE.

And I've used a voice cloner for a couple interviews to fix/create sound bites. Honestly, that's a game-changer. There's no going back on that now. And while they haven't mentioned a voice cloner in Firefly, I wouldn't be surprised if it gets one eventually.

Also, a little bit of upscaling. I've used Topaz a few times lately, with mixed results.

As for the other promised capabilities -- I agree, it comes down to how good it is. But I think there's potential.

A lot of my interview-based edits start with a Word document from the producer, and I have to piece together the sound bites. If I were still using Premiere, I believe I could already do that there.

Will we not be needed as often? My guess is that, at least in the short to medium term, the editor is still needed but can accomplish more in the same time, and/or deliver a higher-quality product. Certainly, that could equate to fewer days worked... or fewer editors on hire. In any case, one must keep up with the times. But my current expectation is that they will still want a professional editor for most of the same work as now.
 
How about voice isolation from noisy background? Audio transcription? Object removal? Face recognition? Search within clips for spoken audio phrases? I could be totally wrong but my bet is that Adobe would not be able to keep pace with the evolving features in DaVinci Resolve Studio.

 
Tom, most of the features you list are already in Premiere Pro, or at least in beta. And fairly well implemented, imo. Searching non-transcribed audio for spoken phrases could, I'd think, be done, since Adobe already has machine speech-to-text transcription. Face recognition isn't there, afaik. Adobe's a pretty big company making a pretty big bet on AI (nothing at all unique there). OTOH, BMD's big enough to move ahead too, and they're a video-only company, so they're not distracted by cramming AI into Acrobat, Photoshop, Express, etc. And both companies can license new technology from all the AI startups.
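
The building blocks for that phrase search are pretty accessible now. A rough sketch, assuming the open-source Whisper model (file name and phrase are made up, and this is obviously not Adobe's implementation):

```python
# Minimal phrase search over untranscribed audio, using openai-whisper.
import whisper

model = whisper.load_model("base")          # small, CPU-friendly model
result = model.transcribe("interview.wav")  # hypothetical clip

phrase = "charity auction"
for seg in result["segments"]:              # segments carry timecodes
    if phrase.lower() in seg["text"].lower():
        print(f"{seg['start']:.1f}s-{seg['end']:.1f}s: {seg['text'].strip()}")
```

An NLE would index the transcript once and search instantly, but the core capability is just this.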

But why do you think Resolve will be able to outpace Premiere Pro?
 
How about voice isolation from noisy background? Audio transcription? Object removal? Face recognition? Search within clips for spoken audio phrases? I could be totally wrong but my bet is that Adobe would not be able to keep pace with the evolving features in DaVinci Resolve Studio.

Voice isolation - forgot about that one. Although FCP has pretty good non-"AI" voice isolation in the program, I've been going to audostudio.com for the real basket cases, on a pretty regular basis. And I wouldn't be surprised if tools like that have already cost sound mixers some pay. I don't think it'll ever sound as good as a good sound person doing a good job with the recording to begin with. But you can get to "good enough" (depending on your standards) for a lot less money and in more adverse conditions than before. It would be great to have tools of that caliber in-program.

By the way, a computer scientist with expertise in the field confirmed to me a year or two ago that nothing distinguishes "AI" from other computer programs. When we talk about "generative" AI, I suppose that's meaningful in terms of talking about programs that create audio/visual material from whole cloth. But they operate fundamentally the same way all other computer programs do.
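
To make that concrete, here's a toy sketch of a single neural-network layer in plain Python (weights are made up). It really is just loops and arithmetic, like any other program; a real model simply has vastly more numbers:

```python
# One "dense" layer with a ReLU activation, written as ordinary code.
def dense_layer(inputs, weights, biases):
    outputs = []
    for row, b in zip(weights, biases):      # one row of weights per neuron
        total = b
        for x, w in zip(inputs, row):
            total += x * w                   # multiply-accumulate
        outputs.append(max(0.0, total))      # ReLU: clamp negatives to zero
    return outputs

print(dense_layer([1.0, 2.0], [[0.5, -0.25], [1.0, 1.0]], [0.1, 0.0]))
# -> [0.1, 3.0]
```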

Relatedly, I heard that Meta/Instagram has rebranded its content-selection algorithm "AI." So now they can say that the posts you see on your feed have been curated by AI. But it's the same algorithm as before.
 
Tom, most of the features you list are already in Premiere Pro, or at least in beta. And fairly well implemented, imo. Searching non-transcribed audio for spoken phrases could, I'd think, be done, since Adobe already has machine speech-to-text transcription. Face recognition isn't there, afaik. Adobe's a pretty big company making a pretty big bet on AI (nothing at all unique there). OTOH, BMD's big enough to move ahead too, and they're a video-only company, so they're not distracted by cramming AI into Acrobat, Photoshop, Express, etc. And both companies can license new technology from all the AI startups.

But why do you think Resolve will be able to outpace Premiere Pro?
There's also a possibility that the major NLEs will converge in terms of AI capabilities. My understanding is that these models are actually not that expensive to train. So, maybe there will be some proprietary magic sauce that one company has for one or another specific tool. But I'm not sure there will be a huge IP disparity between them at the end of the day.

The only reason I'm bearish on the prospects of FCP is that they don't seem to be moving in that direction. Adobe came out with automatic transcription and edit-by-text a while back and I don't see signs of FCP following suit even with that.
 
And I've used a voice cloner for a couple interviews to fix/create sound bites. Honestly, that's a game-changer. There's no going back on that now. And while they haven't mentioned a voice cloner in Firefly, I wouldn't be surprised if it gets one eventually.

What are some examples of using the voice cloner?
 
But why do you think Resolve will be able to outpace Premiere Pro?

Jim, to me it feels less about the size or resources of a company and more about the vision and direction of its leadership. Adobe is owned by institutional investors and funds, publicly traded, and based on the subscription model.

Grant Petty is the founder and owner of Blackmagic Design. He wrote code for all their products. Resolve started as just a color grading app, then added editing, VFX, sound, and cloud-based collaboration, with free upgrades arriving fast and often. Resolve is wildly popular and growing. I don't sense the same passion behind Premiere Pro or FCP.
 
There's also a possibility that the major NLEs will converge in terms of AI capabilities. My understanding is that these models are actually not that expensive to train. So, maybe there will be some proprietary magic sauce that one company has for one or another specific tool. But I'm not sure there will be a huge IP disparity between them at the end of the day.

My understanding is that training wide-ranging generative AI models is expensive, and is likely to get more so for images as copyright issues (or at least lawsuits) come into play... I could be wrong. But for "simple" problems, it looks like generating useful models isn't terrible... Check out Ian Sampson's really useful Hush noise and reverb reduction tool; I think he built his model on Apple's Neural Engine. Hush is cool: https://hushaudioapp.com/
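
For a sense of scale on the "simple problems" side, here's a minimal sketch of training a tiny denoising model, assuming PyTorch and synthetic stand-in data (an illustration of the idea only, not how Hush actually works):

```python
# Train a tiny 1-D convolutional denoiser on synthetic "audio" frames.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.Conv1d(16, 1, kernel_size=9, padding=4),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(200):                    # minutes on a laptop
    clean = torch.randn(8, 1, 1024)        # stand-in for clean audio frames
    noisy = clean + 0.3 * torch.randn_like(clean)
    opt.zero_grad()
    loss = loss_fn(model(noisy), clean)    # learn to recover the clean signal
    loss.backward()
    opt.step()
```

Real tools train on large corpora of paired noisy/clean recordings, but the loop itself is this small.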

But I think I agree that at some point, the differences in the image-generating tools might not matter much to end users like us. I mean, it's going to matter, but if we want, we'll be able to access a lot of tools. Like how Adobe says that in Premiere you'll be able to use their Firefly generative AI, but also Sora from OpenAI, RunwayML, and Pika. Their pitch is "choose the tool that works best for your particular need." I'd guess the other NLEs etc. will allow something similar...

The only reason I'm bearish on the prospects of FCP is that they don't seem to be moving in that direction. Adobe came out with automatic transcription and edit-by-text a while back and I don't see signs of FCP following suit even with that.

Ya, I don't know what's up with FCP. The new Enhance Light and Color seems pretty cool. OTOH, did you see the new stuff in Apple's Logic Pro? StemSplitter and ChromaGlow seem neat, but this:

------
Session Players offer groundbreaking experiences for creators by providing a personal, AI-driven backing band that responds directly to feedback. Drummer took the music creation industry by storm when it debuted as one of the world’s first generative musicians more than a decade ago. Today, it gets even better with key improvements and the addition of a new virtual Bass Player and Keyboard Player. Session Players augment the live-playing experience while ensuring artists maintain full agency during any phase of their music-making process.
------

So different engineering teams (I think), but still some crosstalk. Hopefully something's already cooking, or at least the Logic team's passion will get the FCP team rolling. And Apple's hiring A LOT of people for their Machine Learning/Artificial Intelligence division... though I'd guess most of those are working on iPhone and perhaps Vision Pro stuff.

Going to be interesting to see how all this unfolds.

Jim, to me it feels less about the size or resources of a company and more about the vision and direction of its leadership. Adobe is owned by institutional investors and funds, publicly traded, and based on the subscription model.

Grant Petty is the founder and owner of Blackmagic Design. He wrote code for all their products. Resolve started as just a color grading app, then added editing, VFX, sound, and cloud-based collaboration, with free upgrades arriving fast and often. Resolve is wildly popular and growing. I don't sense the same passion behind Premiere Pro or FCP.

I was told by a couple people that a year or so ago, Adobe told all of their top programmers & researchers something like "finish your current explorations right away (or drop them if they don't seem likely to produce profitable apps/features), and focus on AI stuff." There are a lot of smart engineers and researchers at Adobe. And Premiere Pro, After Effects, Photoshop, etc are useful tools.

Same with Blackmagic. I don't know how much code Grant has written for the various parts of Resolve, but I do know that when BMD acquired DaVinci, Fusion, and Fairlight, a lot of the developers joined BMD and are still there doing cool stuff.

But thinking of all the ebbs and flows in editing systems and cameras in the last few decades and years, I have no idea what the future holds.

We'll probably all be filming with an iPhone 17 and editing (or having our phone edit) in CapCut or Canva... ;-)
 
But I think I agree that at some point, the differences in the image-generating tools might not matter much to end users like us. I mean, it's going to matter, but if we want, we'll be able to access a lot of tools. Like how Adobe says that in Premiere you'll be able to use their Firefly generative AI, but also Sora from OpenAI, RunwayML, and Pika. Their pitch is "choose the tool that works best for your particular need." I'd guess the other NLEs etc. will allow something similar...

It's interesting, those software tools. AFAIK, BMD hasn't said anything similar; the integration seems to be not third-party plugins but NVIDIA TensorRT deep-learning hardware and optimization. And the Resolve toolset isn't generative in terms of voices and images but restorative and accelerating: smooth slow motion, upscaling, object tracking, stabilization, audio panning, AI spatial noise reduction, performance boosting.

Of course, you would expect that, with the information coming from NVIDIA.
 
Totally!

The cool thing is that even the "expensive" NLEs are really cheap compared to the old days. And we don't need to buy VTRs... as AI continues to eat away at our livelihood. :oops:
 
What are some examples of using the voice cloner?
Two things, so far -- fixing/frankensteining sound bites. This we could do already by splicing in a "the" or whatever other word we need to supply from elsewhere in the script. But it's a thankless task, because sometimes I go through all the available alternatives of the word and don't find one that matches the intonation and register of the sound bite I'm trying to fix. It's faster and easier, with a more guaranteed result, to just generate what I need with the voice cloner. And I can splice in a three-word phrase, or whatever size chunk will blend most seamlessly, rather than a single word.

But the bigger deal is covering for producer errors by generating material they should have gotten on the day. For example, I did a video recently where we wanted to list the artists who were included in a charity auction as we showed their respective works on screen. But the producer didn't ask the interviewee to name them. We were going to put graphics of the names on-screen, which was not ideal. Presto, you can get them saying it with AI.

You can make transitional sentences. You can fix their deliveries. I did a video recently where the interviewee said "The other version of this painting is in another museum." We changed it in post to the specific name and location of the other museum. They didn't have to record more VO, I could just generate it with AI.

Even a concluding statement. We didn't have a nice final line on one video, so we generated one.

Generally, it seems best to use the shortest possible bits of AI speech, since real human voices sound less robotic. But the seamlessness of the blending is remarkable. It's a quantum leap forward in achievable script quality when your producer didn't nail their job.
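
For what it's worth, the splicing itself can be as simple as a couple of short crossfades. A rough sketch assuming pydub, with made-up file names and timings:

```python
# Splice a cloned phrase into the original track with short crossfades.
from pydub import AudioSegment

original = AudioSegment.from_file("interview.wav")
generated = AudioSegment.from_file("cloned_phrase.wav")  # from the voice cloner

before = original[:12_500]   # up to the flubbed phrase (milliseconds)
after = original[13_400:]    # everything after it

# Brief 15 ms crossfades help the cloned audio sit in the room tone.
patched = before.append(generated, crossfade=15).append(after, crossfade=15)
patched.export("interview_patched.wav", format="wav")
```

The hard part is still matching intonation and register; the cut itself is trivial.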

Of course, it's only usable when talent is off-screen. Although there was one sound bite with a sound issue where I generated the line and laid it in over the interview shot and the sync was close enough that it worked.
 
My understanding is that training wide-ranging generative AI models is expensive, and is likely to get more so for images as copyright issues (or at least lawsuits) come into play... I could be wrong. But for "simple" problems, it looks like generating useful models isn't terrible... Check out Ian Sampson's really useful Hush noise and reverb reduction tool; I think he built his model on Apple's Neural Engine. Hush is cool: https://hushaudioapp.com/
It's interesting because the most recent generation of AI seems to be an order of magnitude more expensive to train. GPT-3 only cost several million. Stanford trained a similar model for $600: https://newatlas.com/technology/stanford-alpaca-cheap-gpt/

But more recently: '“OpenAI’s GPT-4 used an estimated $78m worth of compute to train, while Google’s Gemini Ultra cost $191m for compute,” the report said.

Both of these models were released last year and represent a substantial leap in costs compared to the previous cost leader – Google’s PaLM model – which cost more than $12m worth of compute to train in 2022.' - https://www.cnbc.com/2023/03/13/cha...re-booming-but-at-a-very-expensive-price.html
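
Back-of-the-envelope from those quoted figures (reported compute only; real totals include staff, data, and failed runs):

```python
# Rough cost ratios from the numbers cited above.
palm_2022 = 12e6        # Google PaLM, 2022
gpt4 = 78e6             # OpenAI GPT-4
gemini_ultra = 191e6    # Google Gemini Ultra

print(f"GPT-4 vs PaLM:        {gpt4 / palm_2022:.1f}x")         # ~6.5x
print(f"Gemini Ultra vs PaLM: {gemini_ultra / palm_2022:.1f}x")  # ~15.9x
```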

If you look online, there are a ton of voice cloners, audio cleaners, and various other AI services available. I assume this is partly because they're not that expensive to build.

What will happen with copyright -- dunno. Hopefully it will kill AI in the cradle. But I would think the training would cost the same, and just the price of using the models would go up as some of the cash goes to copyright owners.

We'll probably all be filming with an iPhone 17 and editing (or having our phone edit) in CapCut or Canva... ;-)
I'd rather throw myself into Mount Doom.
 
It's interesting, those software tools. AFAIK, BMD hasn't said anything similar; the integration seems to be not third-party plugins but NVIDIA TensorRT deep-learning hardware and optimization. And the Resolve toolset isn't generative in terms of voices and images but restorative and accelerating: smooth slow motion, upscaling, object tracking, stabilization, audio panning, AI spatial noise reduction, performance boosting.

Of course, you would expect that, with the information coming from NVIDIA.
I think whoever can bring all the tools under one roof -- including generative tools -- will have a big advantage. Round-tripping sucks. And I'm currently spending more per month to subscribe to a few different AI websites than a Premiere subscription costs. (Which I don't have since I use FCP.)
 
It's interesting because the most recent generation of AI seems to be an order of magnitude more expensive to train. GPT-3 only cost several million. Stanford trained a similar model for $600: https://newatlas.com/technology/stanford-alpaca-cheap-gpt/

Ya, Stanford's Alpaca is pretty cool (a friend's kid worked on it). But remember, the students weren't paid ;) and their model was built on LLaMA, an open-source large language model built and maintained by Meta. Meta, I think, has spent A LOT of money on LLaMA, part of their $35B+ investment in AI. So the core investment is still pretty significant. And every spinoff that uses LLaMA, such as Alpaca, helps Meta train their models. But like you say, pretty amazing stuff in Alpaca for not much money.

Have you seen the annual AI Index? It's crazy long but each year IEEE Spectrum creates an article that summarizes a bunch of the report into a series of charts... Great for visual people like us. Here's what they published a month ago:

15 Graphs That Explain the State of AI in 2024

The AI Index tracks the generative AI boom, model costs, and responsible AI use

In January, a smart friend, who runs a really successful motion design firm, bought a bunch of Nvidia stock within an hour of Zuckerberg saying they were buying several billion dollars worth of Nvidia's H100 GPUs. His guess was that Amazon and everyone else would follow suit. That worked out for him. If LLMs plateau or at least slow down in improvement, he plans to invest in ARM Holdings stock, since the next step will be to cram the models onto small-biz/professional and consumer products. Something like that; he has more money to play with than I do.

Copyright: I really wonder how that will play out. I think some of the content holders (on the Getty & larger scale) want money for their stuff being used for training. But I've read what seems like interesting fair-use arguments that the AIs are transformative enough to do well in court. Look at 2 Live Crew (Campbell v Acuff-Rose), Jeff Koons, Warhol v Goldsmith...

I think whoever can bring all the tools under one roof -- including generative tools -- will have a big advantage. Round-tripping sucks.
Totally. And maybe that's the really impressive thing that Alpaca (and perhaps Adobe's embracing Pika, Sora, Runway & not just Firefly) shows. All these lower-level LLMs and other core AI technologies "just" need to be wrapped up into useful tools... And like you say, preferably from one or maybe two sources.

Speaking of which, remember Not Hotdog from the HBO series Silicon Valley? That was back in 2017, but a cool extension: The show actually created and released a Not Hotdog app. AND!!! The developer wrote a long article about how they did it. Perhaps old-fashioned tech now, but still a neat insight into how clever people can use high-level "easy" programming tools to leverage deep tech from Google, Meta, etc to build cool things. Here's the article and the clip:

How HBO’s Silicon Valley built “Not Hotdog” with mobile TensorFlow, Keras & React Native
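
A modern version of that recipe is even shorter. A minimal sketch assuming TensorFlow/Keras: reuse a pretrained backbone and bolt a binary hotdog/not-hotdog head on top (training data omitted):

```python
# Transfer learning: pretrained MobileNetV2 features + a tiny binary head.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False   # keep the pretrained features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(hotdog)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# train_ds would be a tf.data.Dataset of labeled hotdog / not-hotdog images:
# model.fit(train_ds, epochs=5)
```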

Great thread! I'm looking forward to the next posts.
 
Of course, it's only usable when talent is off-screen. Although there was one sound bite with a sound issue where I generated the line and laid it in over the interview shot and the sync was close enough that it worked.

I'm fine with off-screen audio being generated, but generating on-camera delivery in non-fiction applications is pretty unethical to me. We all have different boundaries for what's too far - how is it different from nice lighting, post audio tools, etc. making the person sound and look better than they normally do? Again, I'm fine with b-roll covering up a voice clone, but I'll never agree with an on-camera generated performance under the guise of genuine delivery in non-narrative work. I know the tech is here to stay, but I hope documentaries find a way to stay true; otherwise the entire category won't exist. I'm sure your example is somewhere in between, i.e. promotional in nature, using employees. I just think that if you're claiming you can speak well on camera (the whole point of a film shoot), then you have to be able to actually do that!

Maybe Resolve should require you to categorise the project type, and then prevent you from using AI tools if it falls under a certain category. I doubt this would happen, and I'm sort of joking, but it'd be cool to see some type of AI-free logo etc. that we can trust.
 
I'm fine with off-screen audio being generated, but generating on-camera delivery in non-fiction applications is pretty unethical to me. We all have different boundaries for what's too far - how is it different from nice lighting, post audio tools, etc. making the person sound and look better than they normally do? Again, I'm fine with b-roll covering up a voice clone, but I'll never agree with an on-camera generated performance under the guise of genuine delivery in non-narrative work. I know the tech is here to stay, but I hope documentaries find a way to stay true; otherwise the entire category won't exist. I'm sure your example is somewhere in between, i.e. promotional in nature, using employees. I just think that if you're claiming you can speak well on camera (the whole point of a film shoot), then you have to be able to actually do that!

Maybe Resolve should require you to categorise the project type, and then prevent you from using AI tools if it falls under a certain category. I doubt this would happen, and I'm sort of joking, but it'd be cool to see some type of AI-free logo etc. that we can trust.
Interesting that you mention some sort of “logo” that can be trusted.

Some of the newer stills cameras are being developed with a new internal security chip that generates a signed certificate (Content Credentials) in the metadata of each image, allowing photos to be verified through Adobe’s open-source Content Authenticity Initiative (CAI).

Leica has one model and some others are jumping on as well.

Only time will tell how much reach something like this gets, but with Adobe’s backing it has legs.

It’s also ironic that a company that is jumping headfirst into AI is also championing a resource for identifying it in stills.
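
The core idea behind those credentials is old-school public-key signing. A toy sketch assuming the Python cryptography package (the real CAI/C2PA spec is far more involved, with manifests, certificate chains, and hashed assertions):

```python
# Toy version of "camera signs the capture, anyone can verify later."
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

camera_key = ed25519.Ed25519PrivateKey.generate()  # lives in the camera chip

image_bytes = b"...raw sensor data..."             # stand-in for a capture
signature = camera_key.sign(image_bytes)           # stored in the metadata

public_key = camera_key.public_key()               # published by the maker
try:
    public_key.verify(signature, image_bytes)
    print("Verified: matches what the sensor captured")
except InvalidSignature:
    print("Image was altered after capture")
```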
 
I'm fine with off-screen audio being generated, but generating on-camera delivery in non-fiction applications is pretty unethical to me. We all have different boundaries for what's too far - how is it different from nice lighting, post audio tools, etc. making the person sound and look better than they normally do? Again, I'm fine with b-roll covering up a voice clone, but I'll never agree with an on-camera generated performance under the guise of genuine delivery in non-narrative work. I know the tech is here to stay, but I hope documentaries find a way to stay true; otherwise the entire category won't exist. I'm sure your example is somewhere in between, i.e. promotional in nature, using employees. I just think that if you're claiming you can speak well on camera (the whole point of a film shoot), then you have to be able to actually do that!

Maybe Resolve should require you to categorise the project type, and then prevent you from using AI tools if it falls under a certain category. I doubt this would happen, and I'm sort of joking, but it'd be cool to see some type of AI-free logo etc. that we can trust.
I was talking about a commercial application, not a documentary.

Even for a documentary -- while I'm not sure I would use AI-generated interview footage, I don't think it's as big a deal as using AI to generate vérité footage. And then it should be labeled "dramatization," like they already do for reenactments.

But it's not unthinkable to AI your doc interview because your subject flubbed a word and you want to correct it. Who cares? Or even -- he wanted to clarify/alter his statement or add something after the fact.

I think what matters more is the outcome you're trying to achieve. Are you trying to convey accurate information, or are you trying to misinform? And do you have permission from the subject?

We already slice and dice and rearrange our interviews six ways from Sunday. Not to distort their message but to distill it.

My first job was at a documentary production company. I was shocked to see them edit scenes or lines into a feature doc out of context. And there was one scene that they kept rearranging in the timeline of the film to see where it worked best in the narrative arc. I asked them if it was proper to use it out of sequence from when it happened in reality, and they just looked at me like I was naive.

That movie went on to win best documentary editing at Sundance, among other awards.

The purpose of the edits I've done which used AI is to sell paintings. Not to show that our specialists can speak well on camera. That's not even their job. Their job is to give accurate and useful information and advice to our clients.

I think actual documentaries may become more valuable if we sink into a cesspool of AI-generated muck. Time will tell.
 
Voice isolation - forgot about that one. Although FCP has pretty good non-"AI" voice isolation in the program, I've been going to audostudio.com for the real basket cases, on a pretty regular basis. And I wouldn't be surprised if tools like that have already cost sound mixers some pay.

I've been using Audo Studio for close to two years now. I recently re-tested it against the Resolve voice isolation feature, and it still bests it, so I keep using it. I haven't downloaded Resolve 19 yet - I'll probably wait until it's out of beta - so it's possible the refresh has pushed Resolve further ahead. I am pretty curious to try the Music Remixer; that's a pretty insane utility!
 