Der Kabelbinder wrote:
Does anyone happen to know a good tutorial on how to control image composition in SD v1.5?
What's the best way to keep the attributes of subject and background separate? Right now, phrasings like "X in the background" constantly get applied to the main subject. 🤔
I spent the Easter weekend experimenting with this...
The result of my latest attempts:
I also wrote up a few notes on it. Possibly to your chagrin, they're in English - sorry, not sorry. If in doubt, run them through Google Translate.
This is a trial for using Controlnet with multiple very different looking subjects, all taking part in the same scene.
I used some stock photo of an office meeting (notebook on table and everything) for an openpose controlnet, everything else was text2img and img2img.
This was practice for... well, lots of things.
Issues:
- Hands. Always hands. Though not so bad that you're likely to notice at first glance.
- Leg length on the kimono girl. I definitely should've put more effort into fine-tuning the ControlNet input at the beginning - that laziness bit me in the ass later.
- The green jacket doesn't fall right where it goes behind the arm.
- The chairs have wonky proportions and so does the table.
Successes:
- Body types are as different as intended
- Clothes and hair styles (as well as their colours) are on point and came out as intended
- One person sitting at a 90° angle to the camera and looking at someone else
- Spring season feeling, established
I've done a bit of experimenting with Latent Couple and Tiled Diffusion... as best I can tell:
Latent Couple is good for:
- creating cohesive scenes (relatively speaking)
- sticking to a theme
- working with input from an openpose control net
It simply cannot deal with upscaling images it creates, and without a ControlNet it's pretty bad at creating specific setups - or even at sticking to the number of characters described in the prompt.
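For anyone curious why Latent Couple behaves like this: the core idea (as I understand it - this is a toy numpy sketch, not the extension's actual code, and all names here are illustrative) is that the model denoises once per region prompt, and the per-prompt noise predictions are then blended with weighted region masks:

```python
# Toy illustration of Latent Couple-style region blending:
# eps = sum(w_i * m_i * eps_i) / sum(w_i * m_i)
import numpy as np

def blend_noise_predictions(eps_list, masks, weights):
    """Blend per-prompt noise predictions using region masks and weights."""
    num = np.zeros_like(eps_list[0])
    den = np.zeros_like(masks[0])
    for eps, m, w in zip(eps_list, masks, weights):
        num += w * m * eps
        den += w * m
    return num / np.maximum(den, 1e-8)  # avoid division by zero outside regions

# Toy "latent": a 2x4 grid split into a left and a right region
left = np.zeros((2, 4)); left[:, :2] = 1.0  # mask for prompt A
right = 1.0 - left                           # mask for prompt B
eps_a = np.full((2, 4), 1.0)                 # pretend prediction for prompt A
eps_b = np.full((2, 4), 3.0)                 # pretend prediction for prompt B

blended = blend_noise_predictions([eps_a, eps_b], [left, right], [1.0, 1.0])
# left half is driven by prompt A, right half by prompt B
```

Since each region only sees its own prompt, it's plausible that composition across region borders stays coherent mostly by luck (or by the shared base prompt) - which would fit the behaviour above.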
Tiled Diffusion is good for:
- getting the number of characters you intended where you intended them to be*
- upscaling images started with LC
It's pretty shit at creating more complex scenes wholesale, though, and in my testing it failed to turn an openpose ControlNet showing three people sitting into an image of three people sitting around one table. Instead I consistently got three fragments of different tables.
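That failure mode makes sense if you look at the tiling scheme behind Tiled Diffusion / MultiDiffusion: overlapping tiles are processed more or less independently and averaged in the overlap regions, so each tile can invent its own table. A toy numpy sketch of just the tiling/averaging part (the real extension does this on latents inside the sampler; a plain array and an identity "model" stand in here):

```python
# Toy illustration of overlapping-tile processing with overlap averaging
import numpy as np

def tiled_process(x, tile, stride, fn):
    """Apply fn to overlapping square windows of x, average the overlaps."""
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    count = np.zeros_like(x, dtype=float)
    for i in range(0, max(h - tile, 0) + 1, stride):
        for j in range(0, max(w - tile, 0) + 1, stride):
            out[i:i+tile, j:j+tile] += fn(x[i:i+tile, j:j+tile])
            count[i:i+tile, j:j+tile] += 1
    return out / count  # per-pixel average over all tiles that covered it

x = np.arange(64, dtype=float).reshape(8, 8)
y = tiled_process(x, tile=4, stride=2, fn=lambda t: t)  # identity "model"
```

With an identity fn the averaging is lossless, but with a generative model each tile only sees local context - which would explain why a table spanning several tiles comes out as several tables.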
So, my current best method for getting a specific scene: create an openpose control image from a stock photo or a 3D program, use that with Latent Couple at relatively low resolution to fine-tune my prompt and get the composition roughly right, then do one or two upscaling steps with Tiled Diffusion.
I have yet to experiment with combining more than just openpose. Sure, the openpose_bones rig for Blender gives me everything I really need, but I don't have the patience to fiddle around with multiple models to get everything in the right position...
Meanwhile, finding a stock photo that is exactly right for more than just openpose... good luck. Unless you spot an image and decide to turn it into anime style or something, I suppose.
*I've experimented with making the background the first or last region - putting the background in the first region seems to work a bit better, but it's not even close to being a fix, not if you want a background element in front of a character, such as people sitting around a table.
Trying to do the table as a separate foreground or background region just makes it worse, since the model tries to populate that region with more (tiny) people, amongst other problems.
Aitrepreneur also made a guide on this - the guy makes good guides...
PS: You can simply read out the image's parameters via PNG Info.