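Recently, Kuaishou open-sourced Kolors (可图), a text-to-image generation model with a deep understanding of both English and Chinese that can generate high-quality, photorealistic images. The technical report highlights several key contributions:
First, Kolors uses ChatGLM, built on the General Language Model (GLM), as its text encoder instead of the T5 language model used by Imagen and Stable Diffusion 3. This strengthens its understanding of both English and Chinese. Kolors also uses the multimodal large language model CogVLM to re-caption the training images with more detailed descriptions.
Second, Kolors is trained in two stages: a concept-learning stage and a quality-improvement stage. The quality-improvement stage uses a dedicated high-quality dataset to increase visual appeal, and image quality is further improved through optimized high-resolution training techniques.
Finally, the Kolors team proposed KolorsPrompts, a category-balanced benchmark used to guide Kolors' training and evaluation.
Experimental results show that Kolors performs strongly even with a U-Net backbone. In human evaluations it outperforms existing open-source models and reaches the level of Midjourney-v6. The code and weights of Kolors are open source!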
Code: https://github.com/Kwai-Kolors/Kolors
Model: https://modelscope.cn/models/Kwai-Kolors/Kolors
Technical report: https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf
Download options:
Download via the SDK:
# Download the model
from modelscope import snapshot_download
model_dir = snapshot_download('Kwai-Kolors/Kolors')
Download via git:
git clone https://www.modelscope.cn/Kwai-Kolors/Kolors.git
Download via the CLI:
modelscope download --model=Kwai-Kolors/Kolors --local_dir ./Kolors/
Following the open-source project https://github.com/kijai/ComfyUI-KwaiKolorsWrapper, we set up a ComfyUI environment for Kolors on the ModelScope community's free GPU compute and tried it out.
Environment
Run the Kolors model in a ModelScope community Notebook:
Setting up ComfyUI
Install ComfyUI from the latest source code:
#@title Environment Setup
from pathlib import Path

OPTIONS = {}
UPDATE_COMFY_UI = True  #@param {type:"boolean"}
INSTALL_COMFYUI_MANAGER = True  #@param {type:"boolean"}
INSTALL_KOLORS = True  #@param {type:"boolean"}
INSTALL_CUSTOM_NODES_DEPENDENCIES = True  #@param {type:"boolean"}
OPTIONS['UPDATE_COMFY_UI'] = UPDATE_COMFY_UI
OPTIONS['INSTALL_COMFYUI_MANAGER'] = INSTALL_COMFYUI_MANAGER
OPTIONS['INSTALL_KOLORS'] = INSTALL_KOLORS
OPTIONS['INSTALL_CUSTOM_NODES_DEPENDENCIES'] = INSTALL_CUSTOM_NODES_DEPENDENCIES

# Work under /mnt/workspace; resolve WORKSPACE only after changing into it
%cd /mnt/workspace/
current_dir = !pwd
WORKSPACE = f"{current_dir[0]}/ComfyUI"

# Clone ComfyUI on the first run only
![ ! -d $WORKSPACE ] && echo -= Initial setup ComfyUI =- && git clone https://github.com/comfyanonymous/ComfyUI
%cd $WORKSPACE

if OPTIONS['UPDATE_COMFY_UI']:
    !echo "-= Updating ComfyUI =-"
    !git pull

# ComfyUI-Manager custom node: clone if missing, then update
if OPTIONS['INSTALL_COMFYUI_MANAGER']:
    %cd $WORKSPACE/custom_nodes
    ![ ! -d ComfyUI-Manager ] && echo -= Initial setup ComfyUI-Manager =- && git clone https://github.com/ltdrdata/ComfyUI-Manager
    %cd ComfyUI-Manager
    !git pull

# Kolors wrapper nodes: use an absolute path so this also works when the Manager step is skipped
if OPTIONS['INSTALL_KOLORS']:
    %cd $WORKSPACE/custom_nodes
    ![ ! -d ComfyUI-KwaiKolorsWrapper ] && echo -= Initial setup KOLORS =- && git clone https://github.com/kijai/ComfyUI-KwaiKolorsWrapper.git
    %cd ComfyUI-KwaiKolorsWrapper
    !git pull

%cd $WORKSPACE

# Install custom-node dependencies with ComfyUI-Manager's helper script, if it is present
if OPTIONS['INSTALL_CUSTOM_NODES_DEPENDENCIES']:
    !pwd
    !echo "-= Install custom nodes dependencies =-"
    ![ -f "custom_nodes/ComfyUI-Manager/scripts/colab-dependencies.py" ] && python "custom_nodes/ComfyUI-Manager/scripts/colab-dependencies.py"
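If you would rather avoid IPython shell magics, you can write the "clone if missing, otherwise update" pattern used for ComfyUI and each custom node in plain Python. The sketch below is not part of the original notebook. `ensure_repo` is a hypothetical helper, and unlike the cell above it either clones or pulls; it does not pull right after a fresh clone. The injectable `run` parameter lets you check which command would run without touching the network.

```python
import os
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], None]


def _default_runner(cmd: List[str]) -> None:
    # Run the command and raise CalledProcessError on a non-zero exit code
    subprocess.run(cmd, check=True)


def ensure_repo(url: str, dest: str, run: Runner = _default_runner) -> List[str]:
    """Clone `url` into `dest` if the folder is missing, otherwise `git pull` inside it.

    Returns the command that was passed to `run`.
    """
    if not os.path.isdir(dest):
        cmd = ["git", "clone", url, dest]
    else:
        # `git -C <dir>` runs git as if started in <dir>, so no chdir is needed
        cmd = ["git", "-C", dest, "pull"]
    run(cmd)
    return cmd
```

For example, `ensure_repo("https://github.com/kijai/ComfyUI-KwaiKolorsWrapper.git", f"{WORKSPACE}/custom_nodes/ComfyUI-KwaiKolorsWrapper")` roughly mirrors the INSTALL_KOLORS branch without changing the notebook's working directory.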
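Download the model weights into ComfyUI's models directory (the paths below are relative to the ComfyUI folder):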
#@markdown ###Download standard resources
OPTIONS = {}
#@markdown **unet**
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/unet/diffusion_pytorch_model.fp16.safetensors" -P ./models/diffusers/Kolors/unet/
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/unet/config.json" -P ./models/diffusers/Kolors/unet/
#@markdown **encoder**
!modelscope download --model=ZhipuAI/chatglm3-6b-base --local_dir ./models/diffusers/Kolors/text_encoder/
#@markdown **vae**
!wget -c "https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix/resolve/master/sdxl.vae.safetensors" -P ./models/vae/ #sdxl-vae-fp16-fix.safetensors
#@markdown **scheduler**
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/scheduler/scheduler_config.json" -P ./models/diffusers/Kolors/scheduler/
#@markdown **modelindex**
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/model_index.json" -P ./models/diffusers/Kolors/
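`wget -c` resumes partial downloads, but an interrupted run can still leave files missing or empty, so it can help to check the layout before launching ComfyUI. The sketch below uses a hypothetical helper, `missing_kolors_files`. It only checks the files this cell writes under `./models/diffusers/Kolors` and treats zero-byte files as missing. It cannot detect truncated files, and it does not look at the VAE under `./models/vae/`.

```python
from pathlib import Path
from typing import List

# Relative paths written by the download cell above (under ./models/diffusers/Kolors)
EXPECTED_FILES = [
    "model_index.json",
    "unet/config.json",
    "unet/diffusion_pytorch_model.fp16.safetensors",
    "scheduler/scheduler_config.json",
]
# The text encoder comes from a separate ModelScope download, so only the folder is checked
EXPECTED_DIRS = ["text_encoder"]


def missing_kolors_files(root: str) -> List[str]:
    """List expected entries that are absent under `root`; zero-byte files count as absent."""
    base = Path(root)
    missing = [
        rel for rel in EXPECTED_FILES
        if not (base / rel).is_file() or (base / rel).stat().st_size == 0
    ]
    missing += [rel for rel in EXPECTED_DIRS if not (base / rel).is_dir()]
    return missing
```

In the notebook, `print(missing_kolors_files("./models/diffusers/Kolors") or "all expected files present")` gives a quick go/no-go before starting the server.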
Launch ComfyUI through a cloudflared tunnel:
!wget "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/cloudflared-linux-amd64.deb"
!dpkg -i cloudflared-linux-amd64.deb

%cd /mnt/workspace/ComfyUI
import subprocess
import threading
import time
import socket

def iframe_thread(port):
    # Poll until the ComfyUI server accepts TCP connections on the given port
    while True:
        time.sleep(0.5)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        result = sock.connect_ex(('127.0.0.1', port))
        sock.close()  # close on every attempt, including the successful one
        if result == 0:
            break
    print("\nComfyUI finished loading, trying to launch cloudflared (if it gets stuck here cloudflared is having issues)\n")
    # Open a temporary cloudflared tunnel; it logs the public URL on stderr
    p = subprocess.Popen(["cloudflared", "tunnel", "--url", "http://127.0.0.1:{}".format(port)],
                         stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
    for line in p.stderr:
        l = line.decode()
        if "trycloudflare.com " in l:
            print("This is the URL to access ComfyUI:", l[l.find("http"):], end='')
        # print(l, end='')  # uncomment to see all cloudflared logs

# Start the watcher in the background, then run ComfyUI on its default port 8188
threading.Thread(target=iframe_thread, daemon=True, args=(8188,)).start()
!python main.py --dont-print-server
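You can pull the polling and URL-scraping logic in `iframe_thread` out into small functions that are easier to test. This sketch rests on two assumptions: the quick-tunnel hostname is made of lowercase letters, digits and hyphens under `trycloudflare.com`, and a successful TCP connect means ComfyUI is ready. `extract_tunnel_url` and `wait_for_port` are hypothetical names. Unlike the loop above, `wait_for_port` gives up after a timeout instead of polling forever.

```python
import re
import socket
import time
from typing import Optional

# Assumed shape of a quick-tunnel URL: https://<lowercase-words>.trycloudflare.com
_TUNNEL_URL = re.compile(r"https://[a-z0-9-]+\.trycloudflare\.com")


def extract_tunnel_url(log_line: str) -> Optional[str]:
    """Return the first trycloudflare.com URL found in a cloudflared log line, or None."""
    match = _TUNNEL_URL.search(log_line)
    return match.group(0) if match else None


def wait_for_port(port: int, host: str = "127.0.0.1",
                  timeout: float = 300.0, interval: float = 0.5) -> bool:
    """Poll until something accepts TCP connections on host:port; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The context manager closes the probe socket whether or not it connected
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            if sock.connect_ex((host, port)) == 0:
                return True
        time.sleep(interval)
    return False
```

Matching the URL with a regex also avoids printing any trailing padding or border characters that may follow the URL on the same log line.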
Click "Load" in the menu on the right and load the workflow provided by the ComfyUI-KwaiKolorsWrapper project.
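Besides loading a workflow in the browser, you can also queue one in ComfyUI over HTTP by POSTing it to the server's `/prompt` endpoint. The sketch below assumes the workflow was exported in ComfyUI's API format (via "Save (API Format)") rather than as the regular UI workflow JSON. `build_prompt_request` is a hypothetical helper that only builds the request and does not send it.

```python
import json
import urllib.request
import uuid
from typing import Any, Dict


def build_prompt_request(workflow_api: Dict[str, Any], host: str = "127.0.0.1",
                         port: int = 8188, client_id: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST /prompt request that queues an API-format workflow."""
    payload = {"prompt": workflow_api, "client_id": client_id or str(uuid.uuid4())}
    return urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To submit it, call `urllib.request.urlopen(req)` while ComfyUI is running. The JSON response should identify the queued prompt.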
Text-to-image:
Image-to-image (a small white car):
GPU memory usage:
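To read these numbers from inside the notebook, one option is to query `nvidia-smi` in CSV mode. This sketch assumes `nvidia-smi` is on the PATH. `parse_memory_csv` and `gpu_memory_mib` are hypothetical helpers, and the values are in MiB because of the `nounits` format option.

```python
import subprocess
from typing import List, Tuple

QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,noheader,nounits"]


def parse_memory_csv(text: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' lines (MiB) into one (used, total) tuple per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_mib() -> List[Tuple[int, int]]:
    """Run nvidia-smi and return per-GPU (used, total) memory in MiB."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory_csv(out)
```

Run `gpu_memory_mib()` in a separate cell while an image is being generated to see peak usage.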
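Quality tests
Simple prompts
Complex prompts
Multi-entity generation holds up very well: the color of each entity can be controlled separately, and spatial relationships are rendered almost perfectly.
Multiple styles
Strong across a wide range of styles!
Text
Kolors can render simple text.
Diversity
Output diversity is fairly good.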
Performance test
At 1024 resolution on an A10 GPU, generating one image with 25 steps takes about 7 seconds.
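The 7-second figure translates into throughput as follows. This assumes the 7 s covers the whole generation, including text encoding and VAE decoding, so the per-step time is an upper bound for the denoising loop alone.

```python
TOTAL_SECONDS = 7.0  # measured time per image from the test above (assumed end-to-end)
STEPS = 25           # sampling steps used in that test

seconds_per_step = TOTAL_SECONDS / STEPS  # upper bound for one denoising step
steps_per_second = STEPS / TOTAL_SECONDS  # the "it/s" figure samplers usually report
images_per_hour = 3600 / TOTAL_SECONDS    # sequential generation, one image at a time

print(f"{seconds_per_step:.2f} s/step, {steps_per_second:.2f} it/s, ~{images_per_hour:.0f} images/hour")
```

This prints `0.28 s/step, 3.57 it/s, ~514 images/hour`.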