Don't rush to switch to Flux! Fresh SD3.5 hands-on tests show StabilityAI isn't dead yet!
Published: 2024-10-26 15:53:20
Source: 彩虹之眼
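After a wait of four full months, just when the SD community had given up hope and defected to Flux... SD3.5 arrived. It finally arrived. And, surprisingly, it did not disappoint.

On June 12, 2024, the SD3 Medium weights that SD users had been eagerly awaiting were released. That same day, a Reddit user's "girl on the lawn" SD3 test image had everyone laughing, and the general reaction was: what happened? We waited more than a year and this is all you can deliver? In fairness, SD3's prompt adherence and text rendering really are a huge improvement over SDXL. But for most of the SD community, the ability to generate human anatomy is extremely important, far more so than any other subject. After all, one of the biggest advantages of open-source text-to-image models is that prompts and images are not subject to moderation, and that the models offer enormous freedom to be fine-tuned into all kinds of ███. If the base model's understanding of the human body is this bad, does it really have any hope of becoming users' main model?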
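On July 5, 2024, StabilityAI officially stated: "SD3 Medium is still a work in progress. We aim to release a much improved version in the coming weeks." Given SAI's track record of delayed releases, almost nobody still expected anything from SD3. Many felt that SD could no longer make money at all, that even an "improved" version would not be much better, and that it was time to move on.

On August 1, 2024, Flux.1-dev and schnell burst onto the scene and delivered everything SD users had hoped for from SD3. A very high success rate on fingers, very strong prompt adherence, and far fewer artifacts led the major SD communities in China and abroad to quickly shift their focus to Flux. Within just two months, Flux had a lightweight version (NF4), GGUF quantized versions, several ControlNet models from different developers, several training UIs, and full support from cloud services such as Civitai. Not long ago, you could still regularly see "StabilityAI is dead" posts in the SD community on Reddit.

More than four months went by. Just when once-loyal SD users had fully migrated to Flux, SD3.5 Large arrived and put SD back in the game. I remember that in the first few hours after release, the Reddit SD community expected nothing from it. But two days later, more and more users were surprised to find that SD3.5's quality was actually not bad. Honestly, I did not have much confidence in it either. I had always assumed that even if the "improved version" shipped, its quality could not catch up with today's Flux. After testing it myself, though, I think it has a lot of potential.

(As an aside, I suspect they initially released only the 2B-parameter version partly to make money from the 8B version's API and partly because they figured many users' machines could not run 8B anyway. I wonder whether Flux's popularity also influenced their decision. Flux dev is a full 12B-parameter model, probably the largest open-source text-to-image model to date, yet because of its very high quality, enthusiastic and clever SD community members still worked out ways to run Flux on low-spec PCs. So I wonder: did they see that users will grit their teeth and run a 12B model as long as it is good enough, and only then decide to release the 8B version?)

All right, enough rambling. Let's evaluate SD3.5 in detail.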
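Open the stabilityai/stable-diffusion-3.5-large page linked below, click the download icon (the downward arrow) next to the sd3.5_large.safetensors model file, and put the file in models/checkpoints under your ComfyUI directory.
https://huggingface.co/stabilityai/stable-diffusion-3.5-large/tree/main

Open the text_encoders folder linked below and download clip_g, clip_l, and t5xxl (choose fp8 or fp16 depending on your hardware; if you have 16 GB of VRAM or more, I recommend going straight to fp16, which gives somewhat better results). Put them in models/clip.
https://huggingface.co/stabilityai/stable-diffusion-3.5-large/tree/main/text_encoders

Download the official ComfyUI workflow, SD3.5L_example_workflow.json, and load it in Comfy.
https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/SD3.5L_example_workflow.json

In the nodes, select the models you downloaded and adjust the parameters you want (I suggest staying within 24 steps, and a CFG of 3.5 is enough; you do not need the 40 steps used in the official workflow). Then you are ready to go!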
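The file placement described above can be sketched as a small Python helper. This is only a rough sketch under assumptions: the function names `plan_downloads` and `missing_files` are made up for illustration, the full `.safetensors` file names of the text encoders (in particular `t5xxl_fp16.safetensors` and `t5xxl_fp8_e4m3fn.safetensors`) are assumed and should be checked against the text_encoders listing, and the script downloads nothing. It only prints where each file should end up so you can verify your ComfyUI folders.

```python
from pathlib import Path

REPO_ID = "stabilityai/stable-diffusion-3.5-large"


def plan_downloads(comfy_root, vram_gb):
    """Return (file path in the repo, destination folder) pairs (hypothetical helper)."""
    root = Path(comfy_root)
    # The article recommends the fp16 T5 encoder with 16 GB of VRAM or more,
    # and fp8 otherwise. Both file names are assumptions, not confirmed here.
    if vram_gb >= 16:
        t5_file = "t5xxl_fp16.safetensors"
    else:
        t5_file = "t5xxl_fp8_e4m3fn.safetensors"
    checkpoints = root / "models" / "checkpoints"
    clip = root / "models" / "clip"
    return [
        ("sd3.5_large.safetensors", checkpoints),
        ("text_encoders/clip_g.safetensors", clip),  # assumed file name
        ("text_encoders/clip_l.safetensors", clip),  # assumed file name
        (f"text_encoders/{t5_file}", clip),
    ]


def missing_files(plan):
    """List the destination paths from the plan that do not exist on disk yet."""
    targets = [dest / Path(name).name for name, dest in plan]
    return [target for target in targets if not target.exists()]


if __name__ == "__main__":
    # Example: a 12 GB card, with ComfyUI checked out in ./ComfyUI.
    plan = plan_downloads("ComfyUI", vram_gb=12)
    for name, dest in plan:
        print(f"{REPO_ID}/{name} -> {dest / Path(name).name}")
    print("still missing:", [str(p) for p in missing_files(plan)])
```

Running it after downloading the files by hand is a quick way to confirm that nothing landed in the wrong folder before you load the workflow.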
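Prompt: a cinematic front view photo of a slim white male dryad emerging from a tree, his eyes closed with his head lowered, facing the viewer with his back on the tree. His arms and chest are made of green branches and white flowers, his hair made of brown vines and branches, his body fused with the tree trunk, his skin covered in moss and leaf, his shoulders and collar bone resembling pale human skin. The photo is taken with a 35mm lens capturing the essence of golden hour.

I have to admit SD3.5 beats Flux dev by a wide margin in this round. If you want to nitpick, Flux does render the "front view", "golden hour", and "hair made of branches" from my prompt better. But SD3.5's color and lighting are clearly much "prettier" than Flux's, arguably approaching the level of Midjourney V6.1. That level of aesthetics is rare among open-source models (I am still traumatized by AuraFlow... lol). I also came to realize that aesthetics are a very abstract thing: people have fine-tuned Flux specifically on MJ-generated images, but from what I have seen the results are still very far from real MJ output. (By the way, I don't know what black magic SD's official Replicate page uses, but this prompt only produces the "body made of branches" through the official paid channel. Running locally, it simply will not, and no amount of fiddling with CFG or Dynamic Thresholding helps.)

Prompt: full color portrait photo of a 20yo woman laying underwater, sunlight casting realistic water caustics on her face and body, she's wearing a gauze white dress.

... Can I say SD3.5 wins this round too? Not by much, but in my book it still wins. I picked this prompt precisely because earlier SD models and Flux had no idea what "water caustics" even are; only MJ and Ideogram could produce them. To be honest, SD3.5's water caustics are not great either, but I feel its understanding is relatively better. In Flux's image the subject does not seem to be underwater at all; it looks like an ordinary scene with a pool and some lighting added, which looks fake. On top of that, Flux's collarbone shadows are really... Of course, fine-tunes later fixed that, but here we are comparing the official base models.

Prompt: a top-down close-up photo of a slim attractive woman playing the piano, the camera focusing on her hands, she is wearing a red and black plaid skirt, the piano is shiny and black. The photo is taken in a bright room with soft diffused sunlight.

Ah... I would call this one a draw. Flux clearly renders hand structure better (it is famous for that, haha), but on closer inspection its fingers are a bit too uniform: the middle three fingers look identical. SD3.5's fingers look more natural in skin detail but are a bit too red (wait, Flux's hands are pretty red too, lol). The back of the right hand also has an inexplicable dent, and the skin at the web between the left thumb and index finger is clearly wrong. The biggest problem with the SD image is still the piano, haha: where did its right side go?! Still, I really admire how Flux handles pianos. The black keys have some problems, but from left to right the white keys are spaced and sized with remarkable uniformity, which is impressive.

Prompt: Amateur phone photo of a slim attractive white man, wearing unbuttoned long-sleeve white dress shirt and black pants. He is sitting on the front edge of bed, hand running through his hair, while looking at viewer. The photo was taken in a bedroom at night, the bedroom is dimly lit with warm color tone, the only light source is the bedside table lamp. The photo was posted to reddit in 2012. The image is grainy jpeg with motion blur and soft focus, a snapshot taken by amateur with deep focus, added digital sharpening and blurry, diffused poor lighting.

SD3.5 loses this round completely, with no chance to fight back. It missed most of what the prompt asked for: the build is not slim, he is not sitting on the front edge of the bed, he is not running a hand through his hair, the room is not dim enough, the bedside lamp is not the only light source, and it does not look like a casual amateur shot. SD3.5's output this round is SDXL level at best, and the legs even sink into the bed (classic SDXL, lol). In short, very bad.

Prompt: amateur side view photo of a slim white woman, she is cooking in kitchen and wearing a white apron, underneath the apron is a plain grey tshirt, she is looking at the food in the steel sauce pot, her head lowered, holding a wooden spoon in her right hand. The photo was taken from her left side by an amateur, taken with a smartphone in 2015, in a modern kitchen with soft diffused indoor lighting at night.

With this prompt Flux did not escape its shallow depth of field either, but overall SD is noticeably worse. The tied-up hair in particular makes no physical sense; a person only has so much hair... Also, the eye in profile looks off in the SD image: lower lashes do not look like that even from the side. Oh, and the pot handle SD generated is wrong too...

Prompt: low angle close-up photo of the Eiffel Tower, on a sunny day in Paris, center composition

SD3.5 be like: please stop, I'm begging you, I really can't win this one, waaah (☍﹏⁰).

Prompt: a photo of a seal bicolor ragdoll cat, it is facing camera, standing on its hind legs on a blue pillow, holding out one paw, wearing a wizard hat and a purple wizard robe, casting spells with its paw, silver sparkles swirling around its paw. The photo is taken in a spring garden in the morning, with bright diffused natural lighting.

Quality-wise they are close, but Flux's depth of field is way over the top, and Flux produced golden sparkles instead of silver sparkles (though it did understand "holding out one paw"). SD3.5's paws are rather poor, but Flux clearly has no idea what a seal bicolor ragdoll cat is (which is why I later trained my own LoRA...). SD3.5's cat is not the one I asked for either, but at least it looks like a point ragdoll...

Prompt: Photograph of a majestic cake adorned with intricate fondant decorations inspired by ocean waves. The whole cake has a base color of modest dark blue, surrounded by swirls of light blue layers shaped like ocean waves, the layers closing in from bottom to top, forming a curved shape like a blooming rosebud. On the outer side of the cake, pink and purple corals decorating the bottom of ocean wave fondant, resembling a beautiful tiara. The photo is taken in a room with simple dark background.

I have to admit SD3.5's aesthetic is really... on another level. It did not fully follow the "pink and purple corals decorating the bottom of ocean wave fondant" near the end of my prompt (the corals ended up on top rather than along the bottom edge), nor did it produce "forming a curved shape like a blooming rosebud", but I really love SD's color palette (especially the blue-green gradient, which is gorgeous) and the overall aesthetic. Flux followed the "rosebud" part correctly, but it also did not get that the decoration along the bottom should be coral-shaped rather than roses, and its outer waves look cheap, far from SD3.5's taste. (The purple ribbon on the cake stand in the SD image does... spoil the mood a bit, but that kind of thing is easy to edit out.)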
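Prompt: 1980s Retro manga style illustration of a slim young white man, with messy wavy light brown hair and fair skin, his head tilted to the side, his face clean-shaved. The image only portrays his face and chest. He is wearing a long clothing made of white waterlily flowers and green leaves, the thick layers of leaf covering his whole body, contrasting his blue eyes, a laurel leaf wreath on his head, while he looks at viewer. He is in a summer forest at dusk, soft diffused sunlight shining on him.

First, neither model knows what "1980s Retro manga" means. SD's looks like fairly generic modern illustration, and FLUX's seems to as well... Either way, both are far from what we generally think of as 1980s manga. FLUX understands "a long clothing made of white waterlily flowers and green leaves" much better than SD (though I already knew that...), and FLUX also produced the background I wanted. Probably because their training data is completely different, I find these two hard to compare...

Prompt: Vector clipart of a fluffy orange cat sitting on an office chair, facing a computer moniter, its one paw placed on keyboard, one paw placed on mouse, turning to look at viewer, simple pale pink background, bold line style.

First, in SD's image the cat is not sitting on the office chair, its paw is not on the keyboard, and the style is not the clean vector look I wanted. By comparison, FLUX has basically no problems other than making the cat cross-eyed.

Prompt: Crayon drawing of a chubby white duck on top of a tubby orange cat on top of a small capybara. All three animals stacked vertically, on the grass of a sunny garden.

First, what I asked for is a crayon drawing, so obviously I want something casual and rough, without the overly refined, overly strict lines in FLUX's image. SD's image does not look exactly like crayon either (more like fine colored markers), but it has the rough, hand-drawn-by-an-ordinary-person feel I wanted. Bonus points!

Prompt: Retro 16bit pixel game art of a grumpy penguin with wings, facing the viewer while holding a large board that says "IT'S PENGUIN, NOT PENGWING", sitting on ice in antarctica, the image is nostalgic and pixelated with vibrant colors.

I prefer FLUX's palette, but SD's output is probably closer to what people picture as "retro pixel game art": the pixel blocks are larger, and the colors are very vivid and high-contrast (lol, it hurts the eyes after a while). Both models rendered the text with ease. So I will call it a draw for now!

Prompt: 3D animation movie scene of a grumpy penguin with wings, facing the viewer while holding a large board that says "IT'S PENGUIN, NOT PENGWING", sitting on ice in antarctica, DreamWorks style.

Uh... sure, both are 3D animation style, but!!! SD's is just too ugly! I like SD's wings, but the face is really unattractive... FLUX feels closer to the style of The Penguins of Madagascar. That said, in FLUX's image the part where the sign-holding wing joins the body looks weird... as if it were just a thin bone... In SD's version the part holding the sign is weird too: why does it have fingers? XD

Prompt: A renaissance oil painting of a strange creature that resembles a chimera of fish and cat, the creature's upper body is a white angora cat, its lower body looks like tropical fish with iridescent fish scales. The creature is swimming in the sea, the water is dark blue colored.

I think FLUX did fairly well on the fish half, which does look like an oil painting, but the cat part looks too fake, the nose and mouth area seems off, and the eyes seem too big... SD's version matches what I expected. The hind legs are a bit broken, but I really do like SD's palette (FLUX did not even follow my "the water is dark blue colored").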