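I. Introduction
1.1 Background of AIGC Development
By composition, images can be classified according to their color and gray levels into binary images, grayscale images, indexed images, and RGB images, and image generation models can convert between these image types. In practice, a model's performance shows mainly in the quality and the diversity of the images it generates. Image generation is widely used in graphic design, game production, animation, and similar fields, and it also has considerable potential in medical image synthesis and analysis, compound synthesis, and drug discovery.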
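To make the image types above concrete, here is a minimal sketch in plain Python that converts a tiny RGB image to grayscale and then to a binary image. The helper names, the luma weights (0.299, 0.587, 0.114), and the threshold of 128 are illustrative assumptions rather than anything the article specifies; an image generation model learns far richer mappings than this fixed formula.

```python
# Illustrative only: fixed-formula conversions between image types.
# Pixels are plain Python lists; a real pipeline would use an array library.

def rgb_to_gray(rgb_image):
    """Convert rows of (r, g, b) pixels (0-255) to rows of gray levels (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def gray_to_binary(gray_image, threshold=128):
    """Map each gray level to 1 (at or above the threshold) or 0 (below it)."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray_image]


# A 2x2 RGB image: white, black, pure red, pure blue.
rgb = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = rgb_to_gray(rgb)
binary = gray_to_binary(gray)
```

Going in the other direction (binary or grayscale back to RGB) has no fixed formula, which is the kind of conversion a generative model is used for.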
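1.2 Key Stages of Technical Development
GAN generation stage:
Autoregressive generation stage:
Diffusion model generation stage:
1.3 Principles, Strengths, and Weaknesses of Mainstream Models
Diffusion Model
CLIP (Contrastive Language-Image Pre-training)
1.4 Current Trends in the AIGC Industry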
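II. Application Scenarios
2) Based on trip information, the large model is expected to automatically generate stylized material such as the following for the automotive content community and push it to users.
At the same time, to maximize consumer-side (C-end) traffic acquisition, the automaker set very high requirements for AIGC capability, with particular attention to the following details of the generated images:
Stylization of the generated image, and whether it fully follows the instructions
Color differences at the car logo and along edges
No incongruous splicing between the background and the car model, and so on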
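III. Implementation in Practice
3.1 Selecting an AIGC Image Generation Tool
ComfyUI is substantially better optimized for SDXL model inference than other UIs, generating images 10% to 25% faster than webui. It is highly customizable, giving users more precise and fine-grained control over the entire generation process, so experienced users can produce better images more easily. Workflows are easy to share as JSON files or as images, which improves efficiency. It is also developer-friendly: a workflow can be invoked through the API from any language simply by loading a JSON file in the same API format.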
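As a rough illustration of calling a workflow through the API, the sketch below builds (but does not send) the HTTP request that queues an API-format workflow. The `/prompt` path and the `{"prompt": ..., "client_id": ...}` payload follow the basic API example script that ships with ComfyUI, as far as I know; the server address, the function name, and the checkpoint file name are assumptions, and a service behind a gateway would normally also need its own endpoint and authentication header.

```python
import json
import urllib.request
import uuid


def build_prompt_request(workflow, server="http://127.0.0.1:8000", client_id=None):
    """Build the POST request that queues a workflow; sending is left to the caller."""
    payload = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return urllib.request.Request(
        f"{server}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A one-node workflow in API format (the file name is hypothetical).
workflow = {
    "4": {
        "inputs": {"ckpt_name": "model.safetensors"},
        "class_type": "CheckpointLoaderSimple",
    }
}
request = build_prompt_request(workflow, client_id="demo-client")
# To queue it on a running server: urllib.request.urlopen(request)
```

Because the request is plain JSON over HTTP, the same call can be reproduced in any language with an HTTP client.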
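The ComfyUI workflow configuration page
3.2 Confirming the Business Process
The first step, and one that is often overlooked, is to design the complete workflow around the business requirements.
3.3 Developing Custom Nodes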
class Example:
    """
    An example node

    Class methods
    -------------
    INPUT_TYPES (dict):
        Tell the main program input parameters of nodes.
    IS_CHANGED:
        optional method to control when the node is re executed.

    Attributes
    ----------
    RETURN_TYPES (`tuple`):
        The type of each element in the output tuple.
    RETURN_NAMES (`tuple`):
        Optional: The name of each output in the output tuple.
    FUNCTION (`str`):
        The name of the entry-point method. For example, if `FUNCTION = "execute"` then it will run Example().execute()
    OUTPUT_NODE ([`bool`]):
        If this node is an output node that outputs a result/image from the graph. The SaveImage node is an example.
        The backend iterates on these output nodes and tries to execute all their parents if their parent graph is properly connected.
        Assumed to be False if not present.
    CATEGORY (`str`):
        The category the node should appear in the UI.
    DEPRECATED (`bool`):
        Indicates whether the node is deprecated. Deprecated nodes are hidden by default in the UI, but remain
        functional in existing workflows that use them.
    EXPERIMENTAL (`bool`):
        Indicates whether the node is experimental. Experimental nodes are marked as such in the UI and may be subject to
        significant changes or removal in future versions. Use with caution in production workflows.
    execute(s) -> tuple || None:
        The entry point method. The name of this method must be the same as the value of property `FUNCTION`.
        For example, if `FUNCTION = "execute"` then this method's name must be `execute`, if `FUNCTION = "foo"` then it must be `foo`.
    """
    def __init__(self):
        pass

    @classmethod
    def INPUT_TYPES(s):
        """
        Return a dictionary which contains config for all input fields.
        Some types (string): "MODEL", "VAE", "CLIP", "CONDITIONING", "LATENT", "IMAGE", "INT", "STRING", "FLOAT".
        Input types "INT", "STRING" or "FLOAT" are special values for fields on the node.
        The type can be a list for selection.

        Returns: `dict`:
            - Key input_fields_group (`string`): Can be either required, hidden or optional. A node class must have property `required`
            - Value input_fields (`dict`): Contains input fields config:
                * Key field_name (`string`): Name of a entry-point method's argument
                * Value field_config (`tuple`):
                    + First value is a string indicate the type of field or a list for selection.
                    + Second value is a config for type "INT", "STRING" or "FLOAT".
        """
        return {
            "required": {
                "image": ("IMAGE",),
                "int_field": ("INT", {
                    "default": 0,
                    "min": 0,  # Minimum value
                    "max": 4096,  # Maximum value
                    "step": 64,  # Slider's step
                    "display": "number",  # Cosmetic only: display as "number" or "slider"
                    "lazy": True  # Will only be evaluated if check_lazy_status requires it
                }),
                "float_field": ("FLOAT", {
                    "default": 1.0,
                    "min": 0.0,
                    "max": 10.0,
                    "step": 0.01,
                    "round": 0.001,  # The value representing the precision to round to, will be set to the step value by default. Can be set to False to disable rounding.
                    "display": "number",
                    "lazy": True
                }),
                "print_to_screen": (["enable", "disable"],),
                "string_field": ("STRING", {
                    "multiline": False,  # True if you want the field to look like the one on the ClipTextEncode node
                    "default": "Hello World!",
                    "lazy": True
                }),
            },
        }
    RETURN_TYPES = ("IMAGE",)
    #RETURN_NAMES = ("image_output_name",)

    FUNCTION = "test"

    #OUTPUT_NODE = False

    CATEGORY = "Example"

    def check_lazy_status(self, image, string_field, int_field, float_field, print_to_screen):
        """
        Return a list of input names that need to be evaluated.

        This function will be called if there are any lazy inputs which have not yet been
        evaluated. As long as you return at least one field which has not yet been evaluated
        (and more exist), this function will be called again once the value of the requested
        field is available.

        Any evaluated inputs will be passed as arguments to this function. Any unevaluated
        inputs will have the value None.
        """
        if print_to_screen == "enable":
            return ["int_field", "float_field", "string_field"]
        else:
            return []
    def test(self, image, string_field, int_field, float_field, print_to_screen):
        if print_to_screen == "enable":
            print(f"""Your input contains:
                string_field aka input text: {string_field}
                int_field: {int_field}
                float_field: {float_field}
            """)
        # Invert the image as a stand-in for real processing.
        image = 1.0 - image
        return (image,)

    """
        The node will always be re executed if any of the inputs change but
        this method can be used to force the node to execute again even when the inputs don't change.
        You can make this node return a number or a string. This value will be compared to the one returned the last time the node was
        executed, if it is different the node will be executed again.
        This method is used in the core repo for the LoadImage node where they return the image hash as a string, if the image hash
        changes between executions the LoadImage node is executed again.
    """
    #@classmethod
    #def IS_CHANGED(s, image, string_field, int_field, float_field, print_to_screen):
    #    return ""


# Set the web directory, any .js file in that directory will be loaded by the frontend as a frontend extension
# WEB_DIRECTORY = "./somejs"

# Add custom API routes, using router
from aiohttp import web
from server import PromptServer


@PromptServer.instance.routes.get("/hello")
async def get_hello(request):
    return web.json_response("hello")


# A dictionary that contains all nodes you want to export with their names
# NOTE: names should be globally unique
NODE_CLASS_MAPPINGS = {
    "Example": Example
}

# A dictionary that contains the friendly/humanly readable titles for the nodes
NODE_DISPLAY_NAME_MAPPINGS = {
    "Example": "Example Node"
}
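The contract described in the docstrings (look up a class in `NODE_CLASS_MAPPINGS`, read `INPUT_TYPES`, then call the method named by `FUNCTION`) can be sketched without ComfyUI installed. This is a simplified picture and not ComfyUI's actual executor, which also handles links between nodes, caching, and lazy inputs; `InvertValue` and `run_node` are hypothetical names used only for illustration.

```python
class InvertValue:
    """Hypothetical stand-in node: same contract as Example, but on a float."""

    RETURN_TYPES = ("FLOAT",)
    FUNCTION = "run"
    CATEGORY = "Example"

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"value": ("FLOAT", {"default": 0.25})}}

    def run(self, value):
        return (1.0 - value,)


NODE_CLASS_MAPPINGS = {"InvertValue": InvertValue}


def run_node(class_type, **inputs):
    """Look up a node class, fill in defaults, and call its entry-point method."""
    node_cls = NODE_CLASS_MAPPINGS[class_type]
    kwargs = {}
    for name, field in node_cls.INPUT_TYPES()["required"].items():
        if name in inputs:
            kwargs[name] = inputs[name]
        elif len(field) > 1 and "default" in field[1]:
            kwargs[name] = field[1]["default"]
        else:
            raise KeyError(f"missing required input: {name}")
    # FUNCTION names the method to call, so resolve it dynamically.
    outputs = getattr(node_cls(), node_cls.FUNCTION)(**kwargs)
    if len(outputs) != len(node_cls.RETURN_TYPES):
        raise ValueError("output tuple does not match RETURN_TYPES")
    return outputs


result = run_node("InvertValue", value=0.25)
default_result = run_node("InvertValue")
```

The same dispatch explains why a renamed entry-point method must always be matched by an updated `FUNCTION` value.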
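Plugin node for qwen-max
from http import HTTPStatus

import dashscope


# Class name means "travel text generation".
class 旅行文本生成:
    def __init__(self):
        # DashScope API key (left blank here; supply your own).
        dashscope.api_key = ""

    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                # System prompt, kept in Chinese as it is sent to the model. In short: from the
                # user's Chinese description, write a complete, concrete prompt for a painting AI.
                # Content: a beautiful scenery image. Style: realistic, high-definition.
                # Actions: 1) pick one city on the route and use its best-known landmark
                # (e.g. Shanghai -> Oriental Pearl Tower); 2) extract weather words, which set
                # the overall tone; 3) extract mood and driving-experience words, which set the
                # tone together with the weather; 4) extract the date and infer the season as
                # the main color reference.
                "system_prompt": ("STRING", {
                    "default": """请根据我输入的中文描述,生成符合主题的完整提示词。生成后的内容服务于一个绘画AI,它只能理解具象的提示词而非抽象的概念。请严格遵守以下规则,规则如下:
#内容
根据文字生成一张与风景相关的优美的画面。
#风格
真实、高清、写实
#action
1.提取途径城市之一,根据此地点搜索当地最著名的景点或建筑,例如:上海,可提取上海东方明珠
2.提取有关天气的词汇,会决定于整个画面的色调
3.提取有关心情、驾驶体验的描述,与天气同时决定画面的色调
4.提取日期,判断季节,作为画面的主要色调参考
""",
                    "multiline": True
                }),
                # Sample trip record (Chinese): emoji tag, user text, trip time, total driving
                # time, distance, start and end addresses with weather, cities passed through,
                # team name, and vehicle.
                "query_prompt": ("STRING", {
                    "default": """- 用户标记emoji:出游
- 用户文字:新司机的五一出游!
- 出行时间:2024/5/2 下午10:38-2024/5/5 下午6:57
- 总驾驶时长:14小时28分钟
- 公里数:645.4km
- 起点:上海市黄浦区中山南路1891号-1893号
- 起点天气:晴天
- 终点:上海市闵行区申长路688号
- 终点天气:多云
- 途径城市:湖州市 无锡市 常州市
- 组队信息:欧阳开心的队伍
- 车辆信息:黑色一代
""",
                    "multiline": True
                })
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "生成绘画提示词"  # "generate painting prompt"
    CATEGORY = "旅行文本生成"

    def 生成绘画提示词(self, system_prompt, query_prompt):
        messages = [
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': query_prompt}
        ]
        response = dashscope.Generation.call(
            model="qwen-max",
            messages=messages,
            result_format='message'
        )
        if response.status_code == HTTPStatus.OK:
            # With result_format='message', the generated prompt is in the first choice.
            painting_prompt = response.output.choices[0].message.content
        else:
            raise RuntimeError('Request failed: Request id: %s, Status code: %s, error code: %s, error message: %s' % (
                response.request_id, response.status_code,
                response.code, response.message
            ))
        return (painting_prompt,)


# A dictionary that contains all nodes you want to export with their names
NODE_CLASS_MAPPINGS = {
    "旅行文本生成": 旅行文本生成
}

# A dictionary that contains the friendly/humanly readable titles for the nodes
# (display title: "generate travel text prompt")
NODE_DISPLAY_NAME_MAPPINGS = {
    "旅行文本生成": "生成旅行文本提示词"
}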
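In a service, the `query_prompt` text would presumably be assembled from a trip record rather than typed by hand. The sketch below shows one way to render such a record in the same `- label:value` layout as the default query above; `format_trip_query` and the idea of skipping empty fields are assumptions for illustration, and the labels are copied from the node's default text (start weather, end weather, cities passed through, team).

```python
def format_trip_query(fields):
    """Render (label, value) pairs as '- label:value' lines, skipping empty values."""
    lines = []
    for label, value in fields:
        if value is None or value == "":
            continue  # the trip record has no data for this field
        lines.append(f"- {label}:{value}")
    return "\n".join(lines) + "\n"


# Labels copied from the node's default query; values are sample data.
trip = [
    ("起点天气", "晴天"),
    ("终点天气", "多云"),
    ("途径城市", "湖州市 无锡市 常州市"),
    ("组队信息", None),  # missing, so it is left out of the prompt
]
query_prompt = format_trip_query(trip)
```

The resulting string can be passed as the `query_prompt` input of the node, so the system prompt stays fixed while the trip data varies per request.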
Plugin node for Wanx 2.0
from http import HTTPStatus
import random

import dashscope
from dashscope import ImageSynthesis


class ImageSynthesisNode:
    """
    A node for generating images based on a provided prompt.

    Class methods
    -------------
    INPUT_TYPES (dict):
        Define the input parameters of the node.
    IS_CHANGED:
        Optional method to control when the node is re-executed.

    Attributes
    ----------
    RETURN_TYPES (`tuple`):
        The type of each element in the output tuple.
    FUNCTION (`str`):
        The name of the entry-point method.
    CATEGORY (`str`):
        The category the node should appear in the UI.
    """

    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "prompt": ("STRING", {
                    "default": "",
                    "multiline": True
                })
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "generate_image_url"
    CATEGORY = "Image Synthesis"

    def __init__(self):
        # Set the API key (left blank here; supply your own).
        dashscope.api_key = ""

    def generate_image_url(self, prompt):
        # The negative prompt keeps cars and common defects out of the scenery image.
        negative_prompt_str = '(car:1.4), NSFW, nude, naked, porn, (worst quality, low quality:1.4), deformed iris, deformed pupils, (deformed, distorted, disfigured:1.3), cropped, out of frame, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, cloned face, (mutated hands and fingers:1.4), disconnected limbs, extra legs, fused fingers, too many fingers, long neck, mutation, mutated, ugly, disgusting, amputation, blurry, jpeg artifacts, watermark, watermarked, text, Signature, sketch'
        # Use a fresh random seed so that repeated calls produce different images.
        random_int = random.randint(1, 4294967290)
        rsp = ImageSynthesis.call(
            model='wanx2-t2i-lite',
            prompt=prompt,
            negative_prompt=negative_prompt_str,
            n=1,
            size='768*960',
            extra_input={'seed': random_int}
        )
        if rsp.status_code == HTTPStatus.OK:
            # Get the URL of the generated image.
            image_url = rsp.output.results[0].url
        else:
            raise RuntimeError('Request failed: Status code: %s, code: %s, message: %s' % (
                rsp.status_code, rsp.code, rsp.message
            ))
        return (image_url,)


# A dictionary that contains all nodes you want to export with their names
NODE_CLASS_MAPPINGS = {
    "ImageSynthesisNode": ImageSynthesisNode
}

# A dictionary that contains the friendly/humanly readable titles for the nodes
NODE_DISPLAY_NAME_MAPPINGS = {
    "ImageSynthesisNode": "Image Synthesis Node"
}

# Example call
if __name__ == '__main__':
    prompt = "A beautiful and realistic high-definition landscape scene, featuring the famous landmark of Wuxi, the Turtle Head Isle Park, as it is one of the cities passed through during the journey. The weather transitions from a clear, sunny day in the starting point, Shanghai, to a cloudy sky at the destination, also in Shanghai. The overall tone of the image reflects the transition from a bright, cheerful start to a more serene, calm atmosphere, with lush greenery and blooming flowers indicating the early summer season. The harmonious blend of natural beauty and man-made structures, along with the changing weather, creates a picturesque and tranquil setting"
    node = ImageSynthesisNode()
    # The node returns a one-element tuple, so unpack it before printing.
    (image_url,) = node.generate_image_url(prompt)
    print(f"Generated Image URL: {image_url}")
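The node returns only a URL, so a downstream step still has to fetch the image. A minimal sketch of that step, using only the standard library, is shown below; `filename_from_url` and `download_image` are hypothetical helpers, the URL is made up, and the assumption is that the returned link is a plain HTTP(S) download link whose path ends in the file name.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse, unquote
import urllib.request


def filename_from_url(image_url):
    """Extract the decoded file name from a URL, ignoring query parameters."""
    return PurePosixPath(unquote(urlparse(image_url).path)).name


def download_image(image_url, target_dir):
    """Download the image into target_dir and return the local path (needs network)."""
    target = Path(target_dir) / filename_from_url(image_url)
    with urllib.request.urlopen(image_url) as response, open(target, "wb") as out:
        out.write(response.read())
    return target


# Hypothetical URL in the general shape of a signed download link.
url = "https://example.com/results/abc%20123.png?Expires=1&Signature=xyz"
name = filename_from_url(url)
```

Deriving the name from the URL path rather than the full URL keeps signature and expiry parameters out of local file names.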
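3.4 PAI Service Deployment and Compute Selection
Standard Edition: suited to a single user working in the WebUI, or to calling the API against a single instance. It supports generating videos through the WebUI and can also be called through the API. Requests bypass the EAS interface: the frontend passes them directly to the backend server, and all requests are handled by the same backend instance.
API Edition: the system automatically converts the service to asynchronous mode, which suits high-concurrency scenarios. It can be called only through the API. If you need multiple instances, the API Edition is recommended.
Service configuration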
{
    "cloud": {
        "computing": {
            "instance_type": "ecs.gn8is-2x.8xlarge"
        },
        "networking": {
            "security_group_id": "sg-uf626dg02ts498gqoa2n",
            "vpc_id": "vpc-uf6usys7jvf2p7ugcyq1j",
            "vswitch_id": "vsw-uf6lv36zo7kkzyq9blyc6"
        }
    },
    "containers": [
        {
            "image": "eas-registry-vpc.cn-shanghai.cr.aliyuncs.com/pai-eas/comfyui:1.7-beta",
            "port": 8000,
            "script": "python main.py --listen --port 8000 --data-dir /deta-code-oss"
        }
    ],
    "metadata": {
        "cpu": 32,
        "enable_webservice": true,
        "gpu": 2,
        "instance": 1,
        "memory": 256000,
        "name": "jiashu16"
    },
    "name": "jiashu16",
    "options": {
        "enable_cache": true
    },
    "storage": [
        {
            "mount_path": "/deta-code-oss",
            "oss": {
                "path": "oss://ai4d-k4kulrqkyt37jhz1mv/482832/data-205381316445420758/",
                "readOnly": false
            },
            "properties": {
                "resource_type": "model"
            }
        }
    ]
}
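In this configuration, the `--data-dir` argument in the container script has to point at a path that a storage entry actually mounts, otherwise the service starts with an empty data directory. The sketch below is a small pre-deployment check written under that assumption; `check_data_dir` is a hypothetical helper and is not part of any PAI tooling.

```python
import json
import shlex


def check_data_dir(config):
    """Return the --data-dir paths that no storage entry mounts."""
    mounts = {entry["mount_path"] for entry in config.get("storage", [])}
    unmounted = []
    for container in config.get("containers", []):
        args = shlex.split(container.get("script", ""))
        for index, arg in enumerate(args):
            if arg == "--data-dir" and index + 1 < len(args):
                if args[index + 1] not in mounts:
                    unmounted.append(args[index + 1])
    return unmounted


# Only the fields relevant to the check are reproduced here.
config = json.loads("""
{
    "containers": [
        {"port": 8000,
         "script": "python main.py --listen --port 8000 --data-dir /deta-code-oss"}
    ],
    "storage": [
        {"mount_path": "/deta-code-oss"}
    ]
}
""")
problems = check_data_dir(config)
```

An empty result means every `--data-dir` path is backed by a mount; any path returned points at a mismatch to fix before deploying.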
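3.5 Mounting Nodes and Models
/custom_nodes: stores ComfyUI plugins. Once written, the qwen-max plugin node and the Wanx 2.0 plugin node must be uploaded to this folder.
/models: stores model files.
/output: stores the final output of each workflow.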
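Before uploading plugins and models, it can help to confirm that the mounted data directory has the three folders listed above. The sketch below does this with `pathlib` against a temporary directory; the helper names are hypothetical, and whether the service creates these folders on its own is not something the article states, so treat the check as a convenience.

```python
from pathlib import Path
import tempfile

EXPECTED_DIRS = ("custom_nodes", "models", "output")


def missing_dirs(data_dir):
    """Return the expected sub-directories that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


def ensure_layout(data_dir):
    """Create any expected sub-directory that is missing."""
    for name in EXPECTED_DIRS:
        (Path(data_dir) / name).mkdir(parents=True, exist_ok=True)


# Demonstrate on a throwaway directory instead of a real mount.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_dirs(tmp)
    ensure_layout(tmp)
    after = missing_dirs(tmp)
```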
3.6 Building the Service Interface on the Workflow JSON
Sample workflow API JSON (excerpt; the sample breaks off after node "7"):
{
  "4": {
    "inputs": {
      "ckpt_name": "基础模型XL _xl_1.0.safetensors"
    },
    "class_type": "CheckpointLoaderSimple",
    "_meta": {
      "title": "Checkpoint加载器(简易)"
    }
  },
  "6": {
    "inputs": {
      "text": [ "149", 0 ],
      "speak_and_recognation": true,
      "clip": [ "145", 1 ]
    },
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "CLIP文本编码器"
    }
  },
  "7": {
    "inputs": {
      "text": "*I* *Do* *Not* *Use* *Negative* *Prompts*",
      "speak_and_recognation": true,
      "clip": [ "145", 1 ]
    },
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "CLIP文本编码器"
    }
  }
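A service built on such a workflow JSON usually needs two small operations: overriding an input per request, and checking that every link points at a node that exists. In the API format shown above, a link is a two-element list of source node id and output index, such as `["149", 0]`. The sketch below implements both operations on a trimmed copy of the sample; `set_input` and `dangling_links` are hypothetical helper names, and the checkpoint file names are placeholders.

```python
import copy


def set_input(workflow, node_id, name, value):
    """Return a copy of the workflow with one node input overridden."""
    patched = copy.deepcopy(workflow)
    patched[node_id]["inputs"][name] = value
    return patched


def dangling_links(workflow):
    """List (node_id, input_name, source_id) for links to nodes that are absent."""
    missing = []
    for node_id, node in workflow.items():
        for name, value in node["inputs"].items():
            # A link is a two-element list: [source node id, output index].
            is_link = (
                isinstance(value, list)
                and len(value) == 2
                and isinstance(value[0], str)
                and isinstance(value[1], int)
            )
            if is_link and value[0] not in workflow:
                missing.append((node_id, name, value[0]))
    return missing


excerpt = {
    "4": {"inputs": {"ckpt_name": "base.safetensors"},  # placeholder name
          "class_type": "CheckpointLoaderSimple"},
    "6": {"inputs": {"text": ["149", 0], "clip": ["145", 1]},
          "class_type": "CLIPTextEncode"},
}
patched = set_input(excerpt, "4", "ckpt_name", "other.safetensors")
missing = dangling_links(excerpt)
```

Run on the excerpt, the check reports nodes "149" and "145" as missing, which is expected because the sample shows only part of the full workflow.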
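3.7 Engineering Architecture and Stability Assurance
This section focuses on the architecture design and stability assurance of the PAI ComfyUI + Bailian qwen + Bailian Wanx part. The upper-layer application is deployed on ACS or ECS, and this can be adjusted to the customer's actual environment and to the reuse of existing resources.
In addition, every generated image has to track the user's latest trip time closely, so the images are seasonal, following the peak and off-peak travel seasons. The whole system architecture, from the model layer to the application layer, therefore supports high QPS and elastic scaling.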
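IV. Pitfalls to Avoid in Technical Service
Synchronous invocation (Standard Edition): a Standard Edition service supports only synchronous calls. The client sends a request and waits for the result to be returned.
Asynchronous invocation (API Edition): an API Edition service supports only asynchronous calls. The client uses the EAS queue service to send requests to the input queue and retrieves results from the output queue by subscription.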
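The difference between the two calling styles can be illustrated with an in-process simulation. The sketch below uses only Python's standard `queue` and `threading` modules; it does not use the EAS queue service or its SDK, and `render` is a stand-in for the real image generation call.

```python
import queue
import threading


def render(prompt):
    """Stand-in for image generation; a real call would take seconds."""
    return f"image-for:{prompt}"


# Synchronous style (Standard Edition): send a request and block for the result.
sync_result = render("sunny lake")

# Asynchronous style (API Edition): push to an input queue, read an output queue.
input_queue = queue.Queue()
output_queue = queue.Queue()


def worker():
    while True:
        item = input_queue.get()
        if item is None:  # sentinel value: stop the worker
            break
        request_id, prompt = item
        output_queue.put((request_id, render(prompt)))


thread = threading.Thread(target=worker)
thread.start()
for request_id, prompt in enumerate(["sunny lake", "cloudy city"]):
    input_queue.put((request_id, prompt))  # returns immediately
input_queue.put(None)
thread.join()

# Collect results by request id, as a subscriber to the output queue would.
async_results = {}
while not output_queue.empty():
    request_id, image = output_queue.get()
    async_results[request_id] = image
```

The practical consequence is that client code written for one edition cannot simply be pointed at the other: an asynchronous client has to keep a request id and match it against results that arrive later.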