New AI Research: DragGAN – Pose Characters with AI
Sometimes, viewers, AI research is just too good to pass up. I have got to talk about this recent paper called “Drag Your GAN.” This is 100% going to be the future of creating characters with AI art. The amount of fine-grained control you get over these final images is just insane, and while it is especially well suited to characters, it can work for other things as well. Whenever I see something really revolutionary like this, I just have to talk about it.
Here is the project page for the research, again, “Drag Your GAN: Interactive Point-Based Manipulation on the Generative Image Manifold.” That is just fancy science speak for saying that you can drag your images around and manipulate them in real time. This is a very large paragraph of text, so I’ve had ChatGPT summarize it all for us and convert it into bullet points.
The goal here is to synthesize visual content with precise controllability. To satisfy this requirement, the pose, shape, expression, and layout of the generated objects need to be manipulable. Existing methods for controlling generative adversarial networks (GANs) lack flexibility and precision, and rely on manual annotations or 3D models. So, DragGAN allows users to drag points on an image to target positions, enabling interactive and precise manipulation. Manipulations performed with DragGAN can deform images and control pose, shape, expression, and layout across diverse categories, which we will see later. DragGAN is able to produce realistic outputs even in challenging scenarios, like hallucinating occluded content and maintaining object rigidity.
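To give a rough idea of what “drag points to target positions” could mean under the hood: as I understand the paper, DragGAN repeatedly nudges the generator’s latent code so that the features around each handle point move a small step toward its target. Here is a minimal sketch of that motion-supervision idea in PyTorch; the `G(w) -> (image, features)` API, the patch radius, and the learning rate are my assumptions for illustration, not the paper’s released code:

```python
import torch
import torch.nn.functional as F

def sample_patch(feat, center, radius):
    """Bilinearly sample a (2r+1) x (2r+1) patch of features around `center`.

    feat   : (1, C, H, W) intermediate feature map from the generator
    center : tensor (x, y) in feature-map pixel coordinates
    """
    _, _, H, W = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(-radius, radius + 1, dtype=torch.float32),
        torch.arange(-radius, radius + 1, dtype=torch.float32),
        indexing="ij",
    )
    # grid_sample wants (x, y) coordinates normalized to [-1, 1]
    px = (center[0] + xs) / (W - 1) * 2 - 1
    py = (center[1] + ys) / (H - 1) * 2 - 1
    grid = torch.stack([px, py], dim=-1).unsqueeze(0)   # (1, P, P, 2)
    return F.grid_sample(feat, grid, align_corners=True)

def drag_step(G, w, handle, target, radius=3, step=2.0, lr=2e-3):
    """One simplified motion-supervision step: make the features that sit a
    small step toward the target match the (frozen) features currently at
    the handle, by taking a gradient step on the latent code w."""
    w = w.detach().requires_grad_(True)
    _, feat = G(w)                           # assumed API: returns (image, features)

    d = target - handle
    d = d / (d.norm() + 1e-8) * step         # small displacement toward the target

    src = sample_patch(feat, handle, radius).detach()   # content as it is now
    dst = sample_patch(feat, handle + d, radius)        # where it should appear next
    loss = F.l1_loss(dst, src)

    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad                     # nudge the latent, not the pixels
    return w.detach()
```

The key design point is that the optimization happens in latent space rather than on pixels, so every intermediate frame is still a valid GAN sample, which is presumably why the edits stay so coherent.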
Qualitative and quantitative comparisons demonstrate the superiority of DragGAN in image manipulation and point tracking tasks. And this is one of my favorite parts: this method can also be applied to manipulate real images that have already been taken, and they do this through GAN inversion. So that’s cool and all, let’s check it out.
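Before we check it out, a quick note on that GAN inversion piece: to edit a real photo, you first have to find a latent code that reproduces it, and only then can you start dragging. Here is a minimal, hedged sketch of optimization-based inversion, reusing the same assumed `G` API; real pipelines typically start from the generator’s average latent and add a perceptual loss like LPIPS, while this uses a zero init and plain MSE for brevity:

```python
import torch
import torch.nn.functional as F

def invert(G, real_image, num_ws=18, steps=500, lr=0.05):
    """Minimal optimization-based GAN inversion: find a latent code w whose
    generated image matches `real_image` (shape (1, 3, H, W), values in [-1, 1])."""
    w = torch.zeros(1, num_ws, 512, requires_grad=True)  # assumed W+ latent shape
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        fake, _ = G(w)                 # same assumed (image, features) API as above
        loss = F.mse_loss(fake, real_image)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()   # this inverted w can now be dragged like a generated image
```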
So, viewers, before I play this clip, you’ll notice they have a layout, so there is actually some UI. It’s a little bit blurry, but we’re going to try to make all of this out. This seems to be the image that we are editing, and I think this might be a real image of a dog. We have a speed up-and-down control and some sort of size setting here. Here are your drag options, where you can add points, reset points, and start and stop dragging. They also have a mask setting with a flexible area and a fixed area, and of course the ability to reset the mask and set its radius. Very, very interesting. So let’s go ahead and play the demo; I think you guys are going to be very impressed and probably want your hands on this tech immediately.
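Just to make that blurry UI concrete, here is my rough guess at the state those buttons manipulate; every name and default here is hypothetical, a mental model rather than the actual demo code:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class DragSession:
    """Rough guess at the state behind the demo UI; all names hypothetical."""
    handle_points: List[Point] = field(default_factory=list)  # red "drag from" dots
    target_points: List[Point] = field(default_factory=list)  # blue "drag to" dots
    mask: Optional[object] = None   # binary map: flexible (editable) vs. fixed region
    mask_radius: int = 50           # brush radius for painting the mask
    speed: float = 1.0              # how far features move per step (the up/down control)
    dragging: bool = False          # toggled by the start / stop buttons

    def add_point_pair(self, handle: Point, target: Point) -> None:
        self.handle_points.append(handle)
        self.target_points.append(target)

    def reset_points(self) -> None:
        self.handle_points.clear()
        self.target_points.clear()
```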
You can see he makes a few points, and he’s able to resize the dog’s legs and give him a wider stance. He’s able to add some points on the nose and mouth and drag the mouth open, so the mouth is not only bigger but also more open, and you can adjust it as finely as you want. You can also bring the ears up, so the dog has completely differently shaped ears now. Again, more examples of the ears being reshaped. You can even make the dog sit down completely. This one is really, really cool: say we want the dog facing the camera; you can literally add a mask over his whole head, place the points, and drag his head so he’s actually looking at the camera, or drag it further so he’s looking in the other direction. This one is pretty interesting too: it completely changes the type of car it is, but it maintains a lot of the colors and shapes that the original vehicle had; you’re essentially just changing what vehicle it is.
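That mask step is worth a second look. Painting the head as flexible and leaving the rest fixed presumably turns into an extra loss term that pins down everything outside the mask while the drag happens. A tiny sketch of that idea; the function name and the weight are made up:

```python
import torch.nn.functional as F

def masked_motion_loss(feat, feat0, motion_loss, mask, fix_weight=10.0):
    """Respect the user's mask: add a reconstruction term that pins the
    features of the fixed region to their original values, so only the
    flexible (masked) area is free to move.

    feat : (1, C, H, W) current features    feat0 : features before editing
    mask : (1, 1, H, W) binary map, 1 = flexible, 0 = fixed
    """
    keep = F.l1_loss(feat * (1 - mask), feat0 * (1 - mask))
    return motion_loss + fix_weight * keep  # fix_weight is a made-up hyperparameter
```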
Again, you can completely drag it to change the whole shape of the vehicle, changing the entire styling while maintaining the exact same rims, the same color, and the same style of headlights. Really, really interesting, and it’s pretty crazy that it works on vehicles at all, to be honest, because vehicles are usually pretty difficult for these AI image generators. And again, these are real photos being manipulated. You can turn this thing into more of a van, or take this horse and give it a very wide stance. You can bring the horse down a little bit, or manipulate the cat and change which direction he is looking in. You can actually close his eyes up a little bit too, making the cat wink in this scenario.
Yeah, it just allows for some really insane fine-tuning of images you already have. You can essentially just drag things around and edit them on a whim, and you can completely change the shape of people’s faces. This is so much easier than Photoshop; doing stuff like this in Photoshop would take you hours and hours, and here it’s literally happening in real time with this “Drag Your GAN” technology. Really insane stuff. You can see there are all kinds of different use cases you can imagine for this: editing your photos at home, or actual professional products, such as this example we see here, editing all these different clothes. It’s like Photoshop but way quicker and way easier, and it honestly looks really coherent. This is impressive stuff. Here’s an example where they make cells different sizes, so you can see it’s really, really flexible in what it can edit. You can literally change what the Sun looks like in the sky; it’s pretty crazy, very versatile.
This is honestly hilarious: so many people are looking at this mind-blowing paper right now that the site hit an internal server error, and I can’t even really… Oh, there it goes; it came back, but some of these videos aren’t even loading properly on the website, which is pretty crazy. As you can see, this one is actually in real time, so we can get a real feel for how fast this is; he just clicks the start button, and it brings the dog right in. You guys can see exactly how it works here. He places a dot on the nose, which is his first point, and then the point in blue is where he wants that nose to be; then he simply clicks the start button, and it begins to drag it over, pretty slowly, but still: it’s generating all these images in real time, which is the main benefit of GANs; they are lightning fast. It brings the nose right over to that point, so the dog ends up staring directly at us, exactly as if he were looking straight into the camera. There’s no mystery to it, and as you can see, this is all stuff it had to hallucinate, because obviously it doesn’t know what is behind the dog, so it has to make some guesses.
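One detail hiding in that clip: after every optimization step the content has moved, so the method has to re-find where the handle point (the red dot on the nose) ended up before taking the next step. The paper does this with a nearest-neighbor search in feature space; here is a rough sketch under the same feature-map assumption as before:

```python
import torch

def track_point(feat, feat0_vec, prev, radius=12):
    """Re-locate the handle after an optimization step: search a small window
    around its previous position for the pixel whose feature vector best
    matches the handle's feature vector from the very first frame.

    feat      : (1, C, H, W) current feature map
    feat0_vec : (C,) feature vector at the handle in the initial image
    prev      : tensor (x, y), the handle's previous position
    """
    _, C, H, W = feat.shape
    x0, x1 = max(int(prev[0]) - radius, 0), min(int(prev[0]) + radius + 1, W)
    y0, y1 = max(int(prev[1]) - radius, 0), min(int(prev[1]) + radius + 1, H)
    window = feat[0, :, y0:y1, x0:x1]                        # (C, h, w) search window
    dist = (window - feat0_vec[:, None, None]).abs().sum(0)  # L1 distance per pixel
    idx = int(dist.argmin())
    dy, dx = divmod(idx, dist.shape[1])                      # unravel the flat index
    return torch.tensor([x0 + dx, y0 + dy], dtype=torch.float32)
```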
We’ve got another image of a dog here where he can just drag the nose and change the direction he’s looking in, and the dog isn’t really too screwed up; I mean, there’s maybe a little weirdness down here, but it does it with a lot of coherency, I must say. Again, he drags the dog down, and now the dog is growing in size and getting larger. Honestly, I could really see people using this technology right away. Character posing is a huge part of what people really care about with AI image generation; the vast majority of AI art generations you see from Midjourney or Stable Diffusion are characters, whether they’re human or not, and this looks like it’s able to change and pose those characters very, very accurately. You could do some really, really specific fine-tuning and editing of your images in real time with this, and you can upload ones that you’ve already created in the past.
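Putting the earlier sketches together, that “upload something you already made and drag it” workflow might look roughly like this; again, this is a sketch under the same assumed `G` API and hypothetical coordinates, not the released code:

```python
import torch

# Hypothetical end-to-end flow built from the sketches above.
w = invert(G, photo)                         # map an existing image into the GAN
with torch.no_grad():
    _, feat0 = G(w)
handle = torch.tensor([128.0, 96.0])         # e.g. the dog's nose (x, y)
target = torch.tensor([200.0, 96.0])         # where we dragged it to
feat0_vec = feat0[0, :, int(handle[1]), int(handle[0])]

for _ in range(200):                         # iterate until the handle arrives
    w = drag_step(G, w, handle, target)
    with torch.no_grad():
        _, feat = G(w)
    handle = track_point(feat, feat0_vec, handle)
    if (handle - target).norm() < 2.0:       # close enough to the target: stop
        break

with torch.no_grad():
    edited_image, _ = G(w)                   # the final dragged result
```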
Apparently, all those previous images were AI-generated, and now we’re going to try this on a real image of Joe Biden’s face, for some reason. We’re actually going to see how it manipulates a human face, which is a very difficult task, and this is a real human as well. You can slowly change his facial expression, and that worked pretty well; it’s very, very realistic there. It’s going to turn his whole face a little more toward the right-hand side, and that is working out pretty well too. It still looks kind of like Joe Biden, though a little bit weird, not exactly perfect. Yeah, I mean, we all know what he’s supposed to look like, so it’s pretty interesting.
Now they’re going to try the mask method, where they mask off his entire forehead and add a point there, so just his forehead is going to move downwards. We can bring his hairline back; I’m sure Joe Biden would love that in real life, but he’s not going to get it, since this is all AI-generated. You really can manipulate things: if you’ve got a receding hairline, this is the tool for you. Further on, they stretch his eyes open a little more so he’s a bit more wide-eyed, and honestly, the final result is kind of scary. It’s not exactly perfect; we can tell this image is heavily manipulated, but it’s not bad for a starting point, and for imaginary characters or images that are not real photos, this is going to be absolutely killer. It still works really well on animals, maybe not so much on humans. It’s also important to note that the code is going to be released in June.

In my opinion, this is a lot more dynamic than something like ControlNet, as promising as ControlNet is, and it further shows that GANs might actually be the future of AI-generated art. They are lightning fast and allow for these real-time morphs, transitions, and translations. If you tried to do that with Midjourney, it would take you hours and hours to manipulate your photos to perfection, because it takes so long to generate a Midjourney image. Instead, you can just generate one really nice image with Midjourney, toss it in here, and edit it with the GAN. Very, very cool stuff. Let me know what you guys think down in the comments below, and I will see you guys in the next one. Goodbye.