[Hands-On] Lessons from Applying Large-Model Engineering to Connected-Vehicle Scenarios

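I. Introduction

1.1 Background: The Development of AIGC

As one modality of AI-generated content, images have always played an important role in AIGC. Because image generation is so broadly applicable and practical, it has drawn considerable attention from both academia and industry. In recent years the technology has seen a series of key breakthroughs, from the classic GAN to today's mainstream diffusion models, along with successive algorithms and models built on them that perform better and generate higher-quality results, greatly expanding the application areas and prospects of image generation. For commercial deployment, however, faster and more stable generation, stronger controllability and diversity, and issues such as data privacy and intellectual property all still need to be worked out as image generation spreads across industries. In practice, a model's effectiveness shows mainly in the quality and diversity of the images it generates. Image generation is widely used in graphic design, game production, and animation, and it also has considerable potential in medical image synthesis and analysis, compound synthesis, and drug discovery.

By composition, images can be divided according to the number of colors and gray levels into binary, grayscale, indexed, and RGB images, and image generation models can convert between these image types.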

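1.2 Key Stages of Technical Development

As an important part of computer vision, image generation has gone through roughly three key stages:

GAN stage:

The generative adversarial network (GAN) was the mainstream image generation model of the previous generation. A GAN trains a generator and a discriminator against each other, steadily improving both the ability to generate and the ability to discriminate, so that the generator's output moves ever closer to real data and the images become realistic. Over the course of their development, however, GANs also showed problems such as unstable training, a lack of diversity in generated images, and mode collapse.

Autoregressive stage:

Autoregressive image generation was inspired by the success of pre-training in NLP. Using the self-attention mechanism of the Transformer architecture improved on the GAN training approach, making models more stable and generated images more plausible, but the inference speed and training cost of autoregressive models limited their practical use.

Diffusion-model stage:

Diffusion models have largely resolved the performance limitations of the earlier generations. They bring clear gains in training stability and output accuracy, and so quickly displaced GANs in applications. The large volume of cross-modal image generation required in industry additionally calls for CLIP: by training on text-image pairs, CLIP builds a link between the two modalities and markedly improves the speed and quality of generation.

At present, the mainstream image generation products with the best results in the industry are built mainly on diffusion models and CLIP.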

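1.3 How the Mainstream Models Work, and Their Strengths and Weaknesses

Diffusion Model

1. How it works: a diffusion model defines a Markov chain of diffusion steps that keeps adding random noise to the data until only pure Gaussian noise remains. It then learns the reverse diffusion process and generates an image by denoising step by step. By systematically perturbing the data distribution and then restoring it, the whole process takes on a gradual, step-by-step refinement character, which keeps the model stable and controllable.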
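
To make the forward (noising) half of that description concrete, here is a minimal sketch in plain Python. It assumes a DDPM-style linear noise schedule and the usual closed form for sampling a noised point directly from the clean data; the schedule values, the function names, and the three-value toy "image" are illustrative assumptions, not taken from the article. The learned reverse (denoising) process needs a trained network and is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, evenly spaced (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta[s]) for all s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t given x_0: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

x0 = [0.5, -0.25, 1.0]                        # toy "image" with three pixel values
x_early = q_sample(x0, 0, alpha_bars, rng)    # first step: still close to x0
x_late = q_sample(x0, 999, alpha_bars, rng)   # last step: almost pure Gaussian noise
```

Because the share of the original signal shrinks at every step, the last step carries almost none of it, which is the "pure Gaussian noise" end state the paragraph describes.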

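2. Strengths and weaknesses: because the forward and reverse diffusion processes are built on a Markov chain, a diffusion model can reconstruct real data more accurately and preserves image detail better, so its images are more realistic. Diffusion models perform particularly well in applications such as image inpainting and molecular graph generation. The many computation steps, however, make sampling slow, and the models generalize less well across data types.

CLIP (Contrastive Language-Image Pre-training)

1. How it works: CLIP is a cross-modal text-image pre-training model based on contrastive learning. Separate encoders extract features from the text and from the image and map both into a shared representation space, and the model is trained by computing how similar or dissimilar each text-image pair is. A generative model can then use these aligned representations to produce images that match a given text description.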
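
The training objective described above can be sketched as a symmetric contrastive loss over a batch of text-image pairs. The version below is a plain-Python illustration under stated assumptions: the embeddings are hand-written toy vectors standing in for encoder outputs, the temperature of 0.07 is an assumed value, and the function names are made up for this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Average of text-to-image and image-to-text cross-entropy; pair i matches i."""
    texts = [l2_normalize(v) for v in text_embs]
    images = [l2_normalize(v) for v in image_embs]
    n = len(texts)
    # logits[i][j]: scaled cosine similarity between text i and image j
    logits = [[sum(a * b for a, b in zip(t, im)) / temperature for im in images]
              for t in texts]
    text_to_image = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    image_to_text = sum(cross_entropy([logits[r][i] for r in range(n)], i)
                        for i in range(n)) / n
    return (text_to_image + image_to_text) / 2


# Toy batch: each text embedding points the same way as its paired image.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
# Same batch with the images swapped, so every pair is mismatched.
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 3.0], [2.0, 0.0]])
```

Minimizing this loss pulls matching text and image embeddings together and pushes mismatched ones apart, which is why the aligned batch scores far lower than the shuffled one.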

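2. Strengths and weaknesses: thanks to multimodal contrastive learning and pre-training, CLIP aligns text features with image features, so it needs no task-specific labeled data and performs very well on zero-shot image-text classification. It also captures text descriptions and image styles more accurately, and it can vary the non-essential details of an image without losing accuracy, which gives better diversity in generated images.

Because CLIP is essentially an image classification model, it has limitations in complex and abstract scenes; for example, results may be poor in tasks that involve time-series data or require reasoning and computation. In addition, CLIP's training quality depends on large-scale text-image pair datasets, so training consumes substantial resources.

1.4 Current Trends in the AIGC Industry

Building on the diffusion-model and CLIP architecture, a series of tool platforms that developers can use has emerged, accelerating the productivity and commercialization of AIGC image generation. The mainstream AIGC workflow tools in the industry today come down to two: ComfyUI and Web UI. The official community documentation of Midjourney and SD provides a comparison of the two tools.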

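II. Application Scenarios

In the new-energy vehicle industry, the interactivity and fun of connected-vehicle services will become a competitive moat. The way new-energy automakers serve car owners is also moving closer to how internet companies operate, and driving traffic through content interaction has become a priority for every automaker. A typical scenario looks like this:

1) After an owner takes a short or medium-distance holiday trip, the connected-vehicle platform and the in-vehicle chip record the trip information.

2) Based on that trip information, the large model is expected to automatically generate stylized material for the automaker's content community and push it to users.

To maximize consumer-side traffic, automakers also place very high demands on AIGC capability, with particular attention to the following details of the generated images:

Stylization: whether the image fully follows the instruction
Color differences at the car logo and along the car's edges
Background and car model composited without any visible mismatch, and so on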

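III. Implementation in Practice

3.1 Choosing an AIGC Image-Generation Tool

Overall, for production use in consumer-facing (to-C) scenarios, ComfyUI has a steeper learning curve but offers the following advantages over other Stable Diffusion runtimes:

For SDXL inference it is substantially better optimized than other UIs; image generation is 10%~25% faster than with webui.
It is highly customizable, giving users more precise, fine-grained control over the whole generation process; experienced users can produce better images more easily with ComfyUI.
Workflows are easy to share as json or as images, which improves efficiency.
It is developer-friendly: a workflow can be invoked through the API from any language simply by loading the same API-format json file.

Another key point is that PAI EAS offers scenario-based deployment, which makes choosing a ComfyUI version more flexible and convenient.

ComfyUI workflow configuration page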

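3.2 Confirming the Business Process

The first step, and one that is often overlooked, is to design the complete workflow around the business requirements.

Because the final AIGC output has to meet a high bar, reaching it usually takes a combination of models: large language models, large vision models, NLP, VAE, CLIP, and so on. ComfyUI image generation also generally takes a long time, so how nodes are arranged and chosen, whether they run serially or in parallel, and at which node a layer is added all deserve careful thought.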
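
One way to reason about the serial-versus-parallel question is to write the planned workflow down as a dependency graph and group the nodes into stages whose members do not depend on each other. The sketch below does that with Python's standard `graphlib`; the node names and the dependency graph are hypothetical placeholders for illustration, not the workflow actually used in this project.

```python
from graphlib import TopologicalSorter


def parallel_stages(dependencies):
    """Group nodes into stages; nodes in one stage have no dependency on each other.

    `dependencies` maps each node name to the nodes it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                  # raises graphlib.CycleError on circular wiring
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical workflow: text branch and car-photo branch meet at a compositing node.
workflow_dependencies = {
    "trip_to_prompt": [],
    "load_car_photo": [],
    "text_to_image": ["trip_to_prompt"],
    "segment_car": ["load_car_photo"],
    "compose_layers": ["text_to_image", "segment_car"],
}

stages = parallel_stages(workflow_dependencies)
```

Here the two branches can proceed side by side and only the final compositing step has to wait for both, so the total latency is bounded by the slower branch rather than the sum of all nodes.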

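3.3 Developing Custom Nodes

Once the workflow is designed, the next step is to check the open-source community for nodes that can be used as-is and to identify the nodes that will have to be developed in-house.

The standard ComfyUI nodes currently available on GitHub are all nodes for open-source models. The text-to-text and text-to-image steps in the pipeline above therefore require custom-written nodes for Tongyi Qianwen and Tongyi Wanxiang before they can be mounted in ComfyUI. Below is the "ComfyUI Custom Node Development Specification", compiled from the ComfyUI community's own description:

ComfyUI Custom Node Development Specification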

class Example:
    """
    A example node

    Class methods
    -------------
    INPUT_TYPES (dict):
        Tell the main program input parameters of nodes.
    IS_CHANGED:
        optional method to control when the node is re executed.

    Attributes
    ----------
    RETURN_TYPES (`tuple`):
        The type of each element in the output tuple.
    RETURN_NAMES (`tuple`):
        Optional: The name of each output in the output tuple.
    FUNCTION (`str`):
        The name of the entry-point method. For example, if `FUNCTION = "execute"` then it will run Example().execute()
    OUTPUT_NODE ([`bool`]):
        If this node is an output node that outputs a result/image from the graph. The SaveImage node is an example.
        The backend iterates on these output nodes and tries to execute all their parents if their parent graph is properly connected.
        Assumed to be False if not present.
    CATEGORY (`str`):
        The category the node should appear in the UI.
    DEPRECATED (`bool`):
        Indicates whether the node is deprecated. Deprecated nodes are hidden by default in the UI, but remain
        functional in existing workflows that use them.
    EXPERIMENTAL (`bool`):
        Indicates whether the node is experimental. Experimental nodes are marked as such in the UI and may be subject to
        significant changes or removal in future versions. Use with caution in production workflows.
    execute(s) -> tuple || None:
        The entry point method. The name of this method must be the same as the value of property `FUNCTION`.
        For example, if `FUNCTION = "execute"` then this method's name must be `execute`, if `FUNCTION = "foo"` then it must be `foo`.
    """
    def __init__(self):
        pass

    @classmethod
    def INPUT_TYPES(s):
        """
            Return a dictionary which contains config for all input fields.
            Some types (string): "MODEL", "VAE", "CLIP", "CONDITIONING", "LATENT", "IMAGE", "INT", "STRING", "FLOAT".
            Input types "INT", "STRING" or "FLOAT" are special values for fields on the node.
            The type can be a list for selection.

            Returns: `dict`:
                - Key input_fields_group (`string`): Can be either required, hidden or optional. A node class must have property `required`
                - Value input_fields (`dict`): Contains input fields config:
                    * Key field_name (`string`): Name of a entry-point method's argument
                    * Value field_config (`tuple`):
                        + First value is a string indicate the type of field or a list for selection.
                        + Second value is a config for type "INT", "STRING" or "FLOAT".
        """
        return {
            "required": {
                "image": ("IMAGE",),
                "int_field": ("INT", {
                    "default": 0,
                    "min": 0, #Minimum value
                    "max": 4096, #Maximum value
                    "step": 64, #Slider's step
                    "display": "number", # Cosmetic only: display as "number" or "slider"
                    "lazy": True # Will only be evaluated if check_lazy_status requires it
                }),
                "float_field": ("FLOAT", {
                    "default": 1.0,
                    "min": 0.0,
                    "max": 10.0,
                    "step": 0.01,
                    "round": 0.001, #The value representing the precision to round to, will be set to the step value by default. Can be set to False to disable rounding.
                    "display": "number",
                    "lazy": True
                }),
                "print_to_screen": (["enable", "disable"],),
                "string_field": ("STRING", {
                    "multiline": False, #True if you want the field to look like the one on the ClipTextEncode node
                    "default": "Hello World!",
                    "lazy": True
                }),
            },
        }

    RETURN_TYPES = ("IMAGE",)
    #RETURN_NAMES = ("image_output_name",)

    FUNCTION = "test"

    #OUTPUT_NODE = False

    CATEGORY = "Example"

    def check_lazy_status(self, image, string_field, int_field, float_field, print_to_screen):
        """
            Return a list of input names that need to be evaluated.

            This function will be called if there are any lazy inputs which have not yet been
            evaluated. As long as you return at least one field which has not yet been evaluated
            (and more exist), this function will be called again once the value of the requested
            field is available.

            Any evaluated inputs will be passed as arguments to this function. Any unevaluated
            inputs will have the value None.
        """
        if print_to_screen == "enable":
            return ["int_field", "float_field", "string_field"]
        else:
            return []

    def test(self, image, string_field, int_field, float_field, print_to_screen):
        if print_to_screen == "enable":
            print(f"""Your input contains:
                string_field aka input text: {string_field}
                int_field: {int_field}
                float_field: {float_field}
            """)
        #do some processing on the image, in this example I just invert it
        image = 1.0 - image
        return (image,)

    """
        The node will always be re executed if any of the inputs change but
        this method can be used to force the node to execute again even when the inputs don't change.
        You can make this node return a number or a string. This value will be compared to the one returned the last time the node was
        executed, if it is different the node will be executed again.
        This method is used in the core repo for the LoadImage node where they return the image hash as a string, if the image hash
        changes between executions the LoadImage node is executed again.
    """
    #@classmethod
    #def IS_CHANGED(s, image, string_field, int_field, float_field, print_to_screen):
    #    return ""

# Set the web directory, any .js file in that directory will be loaded by the frontend as a frontend extension
# WEB_DIRECTORY = "./somejs"


# Add custom API routes, using router
from aiohttp import web
from server import PromptServer

@PromptServer.instance.routes.get("/hello")
async def get_hello(request):
    return web.json_response("hello")


# A dictionary that contains all nodes you want to export with their names
# NOTE: names should be globally unique
NODE_CLASS_MAPPINGS = {
    "Example": Example
}

# A dictionary that contains the friendly/humanly readable titles for the nodes
NODE_DISPLAY_NAME_MAPPINGS = {
    "Example": "Example Node"
}

Following the development specification, qwen-max and wanx-v2 were each wrapped as a node by calling the Bailian API:

qwen-max plugin node

from http import HTTPStatus
import dashscope
import json


# Node that turns trip information into a painting prompt ("travel text generation").
class 旅行文本生成:

    def __init__(self):
        # temp: the API key is left blank here
        dashscope.api_key = ""

    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                # System prompt (Chinese): asks for a concrete, realistic, high-definition
                # landscape prompt built from a landmark of one city on the route, the
                # weather, the mood and driving experience, and the season of the date.
                "system_prompt": ("STRING", {
                    "default": """请根据我输入的中文描述，生成符合主题的完整提示词。生成后的内容服务于一个绘画AI，它只能理解具象的提示词而非抽象的概念。请严格遵守以下规则，规则如下：
#内容
根据文字生成一张与风景相关的优美的画面。
#风格
真实、高清、写实
#action
1.提取途径城市之一，根据此地点搜索当地最著名的景点或建筑，例如：上海，可提取上海东方明珠
2.提取有关天气的词汇，会决定于整个画面的色调
3.提取有关心情、驾驶体验的描述，与天气同时决定画面的色调
4.提取日期，判断季节，作为画面的主要色调参考
                """,
                    "multiline": True
                }),
                # Sample trip record (Chinese): emoji tag, user text, travel time, driving
                # duration, distance, start/end points and weather, cities passed, team, car.
                "query_prompt": ("STRING", {
                    "default": """- 用户标记emoji：出游
- 用户文字：新司机的五一出游！
- 出行时间：2024/5/2 下午10:38-2024/5/5 下午6:57
- 总驾驶时长：14小时28分钟
- 公里数：645.4km
- 起点：上海市黄浦区中山南路1891号-1893号
- 起点天气：晴天
- 终点：上海市闵行区申长路688号
- 终点天气：多云
- 途径城市：湖州市 无锡市 常州市
- 组队信息：欧阳开心的队伍
- 车辆信息：黑色一代
                            """,
                    "multiline": True
                })
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "生成绘画提示词"   # must match the method name below
    CATEGORY = "旅行文本生成"

    def 生成绘画提示词(self, system_prompt, query_prompt):
        messages = [
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': query_prompt}
        ]
        response = dashscope.Generation.call(
            model="qwen-max",
            messages=messages,
            result_format='message'
        )
        if response.status_code == HTTPStatus.OK:
            # Assuming the response contains the generated prompt in the 'output' field
            painting_prompt = response.output.choices[0].message.content
        else:
            raise Exception('Request failed: Request id: %s, Status code: %s, error code: %s, error message: %s' % (
                response.request_id, response.status_code,
                response.code, response.message
            ))
        return (painting_prompt,)


# A dictionary that contains all nodes you want to export with their names
NODE_CLASS_MAPPINGS = {
    "旅行文本生成": 旅行文本生成
}

# A dictionary that contains the friendly/humanly readable titles for the nodes
NODE_DISPLAY_NAME_MAPPINGS = {
    "旅行文本生成": "生成旅行本文提示词"
}
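
The node above leaves the API key as an empty string in the source. One common alternative, shown here only as a sketch, is to read the key from an environment variable when the node is created, so it never lands in the `custom_nodes` folder. The helper name `load_api_key` is made up for this example, and the variable name `DASHSCOPE_API_KEY` is an assumption to check against your own deployment.

```python
import os


def load_api_key(env_var="DASHSCOPE_API_KEY"):
    """Return the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before loading the node")
    return key


# Intended use inside a node's __init__ (assumption, shown as a comment):
#     dashscope.api_key = load_api_key()
```

Failing at node construction makes a missing key show up when the workflow loads, rather than as a failed request in the middle of image generation.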

Wanx 2.0 plugin node

from http import HTTPStatus
from urllib.parse import urlparse, unquote
from pathlib import PurePosixPath
import requests
import dashscope
from dashscope import ImageSynthesis
import random


class ImageSynthesisNode:
    """
    A node for generating images based on a provided prompt.

    Class methods
    -------------
    INPUT_TYPES (dict):
        Define the input parameters of the node.
    IS_CHANGED:
        Optional method to control when the node is re-executed.

    Attributes
    ----------
    RETURN_TYPES (`tuple`):
        The type of each element in the output tuple.
    FUNCTION (`str`):
        The name of the entry-point method.
    CATEGORY (`str`):
        The category the node should appear in the UI.
    """

    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "prompt": ("STRING", {
                    "default": "",
                    "multiline": True
                })
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "generate_image_url"
    CATEGORY = "Image Synthesis"

    def __init__(self):
        # Set the API key
        dashscope.api_key = ""

    def generate_image_url(self, prompt):
        negative_prompt_str = '(car:1.4), NSFW, nude, naked, porn, (worst quality, low quality:1.4), deformed iris, deformed pupils, (deformed, distorted, disfigured:1.3), cropped, out of frame, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, cloned face, (mutated hands and fingers:1.4), disconnected limbs, extra legs, fused fingers, too many fingers, long neck, mutation, mutated, ugly, disgusting, amputation, blurry, jpeg artifacts, watermark, watermarked, text, Signature, sketch'
        # Use a fresh random seed on every call so repeated prompts give different images
        random_int = random.randint(1, 4294967290)
        rsp = ImageSynthesis.call(
            model='wanx2-t2i-lite',
            prompt=prompt,
            negative_prompt=negative_prompt_str,
            n=1,
            size='768*960',
            extra_input={'seed': random_int}
        )
        if rsp.status_code == HTTPStatus.OK:
            # Get the URL of the generated image
            image_url = rsp.output.results[0].url
        else:
            raise Exception('Request failed: Status code: %s, code: %s, message: %s' % (
                rsp.status_code, rsp.code, rsp.message
            ))
        return (image_url,)


# A dictionary that contains all nodes you want to export with their names
NODE_CLASS_MAPPINGS = {
    "ImageSynthesisNode": ImageSynthesisNode
}

# A dictionary that contains the friendly/humanly readable titles for the nodes
NODE_DISPLAY_NAME_MAPPINGS = {
    "ImageSynthesisNode": "Image Synthesis Node"
}

# Example call
if __name__ == '__main__':
    # prompt = "A beautiful and realistic high-definition landscape scene, featuring the famous landmark of Wuxi, the Turtle Head Isle Park, as it is one of the cities passed through during the journey. The weather transitions from a clear, sunny day in the starting point, Shanghai, to a cloudy sky at the destination, also in Shanghai. This scenic drive, experienced by a new driver on a May Day trip, spans from the evening of May 2, 2024, to the late afternoon of May 5, 2024, covering a total distance of 645.4 kilometers. The mood is joyful and adventurous, with the team named \"OuYang's Happy Team\" enjoying the ride in a black ES. The overall tone of the image reflects the transition from a bright, cheerful start to a more serene, calm atmosphere, with lush greenery and blooming flowers indicating the early summer season. The harmonious blend of natural beauty and man-made structures, along with the changing weather, creates a picturesque and tranquil setting."
    prompt = "A beautiful and realistic high-definition landscape scene, featuring the famous landmark of Wuxi, the Turtle Head Isle Park, as it is one of the cities passed through during the journey. The weather transitions from a clear, sunny day in the starting point, Shanghai, to a cloudy sky at the destination, also in Shanghai. The overall tone of the image reflects the transition from a bright, cheerful start to a more serene, calm atmosphere, with lush greenery and blooming flowers indicating the early summer season. The harmonious blend of natural beauty and man-made structures, along with the changing weather, creates a picturesque and tranquil setting"
    node = ImageSynthesisNode()
    # generate_image_url returns a one-element tuple, so unpack it before printing
    (image_url,) = node.generate_image_url(prompt)
    print(f"Generated Image URL: {image_url}")

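3.4 PAI Service Deployment & Compute Selection

Of the mainstream cloud vendors compared so far, Alibaba Cloud's PAI and AWS's Bedrock support deploying multiple ComfyUI versions comparatively well and also adapt well to mounting resources. Note that deploying ComfyUI through PAI involves two editions:

Standard edition: suited to a single user working in the WebUI, or to calling the API against a single instance. Supports generating video through the WebUI and can also be called through the API. When a request is sent it bypasses the EAS interface; the frontend passes the request straight to the backend server, and all requests are handled by the same backend instance.
API edition: the system automatically converts the service to asynchronous mode, which suits high-concurrency scenarios. It can be called only through the API. If multiple instances are needed, the API edition is recommended.

For cost-effectiveness, the official documentation recommends GU30, A10, or T4 cards as the resource specification. The system default is GPU > ml.gu7i.c16m60.1-gu30.

In actual testing, we recommend deploying on an L20 card, which generates images somewhat faster than GU30. For cost-effectiveness, the configuration chosen was a single L20 card with 16 cores and 128 GB.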

Service configuration

{
    "cloud": {
        "computing": {
            "instance_type": "ecs.gn8is-2x.8xlarge"
        },
        "networking": {
            "security_group_id": "sg-uf626dg02ts498gqoa2n",
            "vpc_id": "vpc-uf6usys7jvf2p7ugcyq1j",
            "vswitch_id": "vsw-uf6lv36zo7kkzyq9blyc6"
        }
    },
    "containers": [
        {
            "image": "eas-registry-vpc.cn-shanghai.cr.aliyuncs.com/pai-eas/comfyui:1.7-beta",
            "port": 8000,
            "script": "python main.py --listen --port 8000 --data-dir /deta-code-oss"
        }
    ],
    "metadata": {
        "cpu": 32,
        "enable_webservice": true,
        "gpu": 2,
        "instance": 1,
        "memory": 256000,
        "name": "jiashu16"
    },
    "name": "jiashu16",
    "options": {
        "enable_cache": true
    },
    "storage": [
        {
            "mount_path": "/deta-code-oss",
            "oss": {
                "path": "oss://ai4d-k4kulrqkyt37jhz1mv/482832/data-205381316445420758/",
                "readOnly": false
            },
            "properties": {
                "resource_type": "model"
            }
        }
    ]
}
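
In a configuration like this, the `--data-dir` passed in the container `script` has to point at a path that a `storage` entry actually mounts, otherwise ComfyUI starts against an empty directory. The sketch below is a small pre-deployment check for that one relationship. The helper names are made up for this example, and the sample dictionary keeps only the two fields the check reads.

```python
import shlex


def data_dir_from_script(script):
    """Extract the value of --data-dir from a container start command, if present."""
    tokens = shlex.split(script)
    for index, token in enumerate(tokens):
        if token == "--data-dir" and index + 1 < len(tokens):
            return tokens[index + 1]
        if token.startswith("--data-dir="):
            return token.split("=", 1)[1]
    return None


def unmounted_data_dirs(config):
    """List every --data-dir that no storage entry mounts (empty list means OK)."""
    mounts = {entry.get("mount_path") for entry in config.get("storage", [])}
    missing = []
    for container in config.get("containers", []):
        data_dir = data_dir_from_script(container.get("script", ""))
        if data_dir is not None and data_dir not in mounts:
            missing.append(data_dir)
    return missing


# Trimmed-down copy of the service configuration: only the fields the check needs.
service_config = {
    "containers": [
        {"script": "python main.py --listen --port 8000 --data-dir /deta-code-oss"}
    ],
    "storage": [{"mount_path": "/deta-code-oss"}],
}

problems = unmounted_data_dirs(service_config)
```

An empty result means the start command and the mounts agree; any path in the list is a directory the container expects but the service does not provide.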

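3.5 Mounting Nodes and Models

After the service is deployed, the system automatically creates the following directory structure in the mounted OSS or NAS storage:

/custom_nodes: stores ComfyUI plugins. The qwen-max plugin node and the Wanx 2.0 plugin node, once written, need to be uploaded to this folder.
/models: stores model files.
/output: where the workflow's final output is stored.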
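
If the OSS or NAS storage is also reachable as a local path (for example through a mount on a build machine, which is an assumption and not something this article sets up), getting the plugin files into `custom_nodes` can be scripted. The sketch below copies node files into that folder; the function name and the example file names are hypothetical.

```python
import shutil
from pathlib import Path


def install_custom_nodes(node_files, mount_root):
    """Copy plugin .py files into <mount_root>/custom_nodes and return the new paths."""
    target_dir = Path(mount_root) / "custom_nodes"
    target_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for node_file in node_files:
        destination = target_dir / Path(node_file).name
        shutil.copy2(node_file, destination)   # keeps timestamps for easier auditing
        installed.append(destination)
    return installed


# Example call (hypothetical paths, left commented out so nothing is copied by accident):
# install_custom_nodes(["qwen_max_node.py", "wanx_node.py"], "/mnt/comfyui-data")
```

ComfyUI loads custom nodes at startup, so the service generally has to be restarted before newly copied nodes appear in the UI.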

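3.6 Building Service Interfaces on the Workflow JSON

Once the ComfyUI workflow is built, the main thing is to enable developer mode and export the workflow API json; after that, the workflow can be called through the API. (If you are interested in the details of building ComfyUI workflows, feel free to leave a comment.)

Sample workflow API json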

{
  "4": {
    "inputs": {
      "ckpt_name": "基础模型XL _xl_1.0.safetensors"
    },
    "class_type": "CheckpointLoaderSimple",
    "_meta": {
      "title": "Checkpoint加载器(简易)"
    }
  },
  "6": {
    "inputs": {
      "text": [
        "149",
        0
      ],
      "speak_and_recognation": true,
      "clip": [
        "145",
        1
      ]
    },
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "CLIP文本编码器"
    }
  },
  "7": {
    "inputs": {
      "text": "*I* *Do* *Not* *Use* *Negative* *Prompts*",
      "speak_and_recognation": true,
      "clip": [
        "145",
        1
      ]
    },
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "CLIP文本编码器"
    }
  }
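
With the exported json in hand, a service typically patches a few node inputs per request and submits the result. The sketch below builds such a request with the standard library only. It assumes the stock ComfyUI HTTP interface, where a workflow is queued by POSTing `{"prompt": ...}` to `/prompt`; the base URL, the token handling, and both helper names are illustrative assumptions, and a PAI EAS deployment (the API edition in particular) may expose a different path and authentication scheme, so check the service's invocation details first.

```python
import copy
import json
import urllib.request
import uuid


def with_node_input(workflow, node_id, input_name, value):
    """Return a copy of the workflow with one node input overridden."""
    updated = copy.deepcopy(workflow)
    updated[node_id]["inputs"][input_name] = value
    return updated


def build_prompt_request(base_url, workflow, token=None, client_id=None):
    """Build (but do not send) the POST request that queues a workflow."""
    payload = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = token   # assumption: service token passed as-is
    return urllib.request.Request(
        base_url.rstrip("/") + "/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# One node in the same shape as the exported sample (node id "7", CLIPTextEncode).
workflow = {
    "7": {
        "inputs": {"text": "*I* *Do* *Not* *Use* *Negative* *Prompts*"},
        "class_type": "CLIPTextEncode",
    }
}
patched = with_node_input(workflow, "7", "text", "watermark, text")
request = build_prompt_request("http://127.0.0.1:8188/", patched)

# Sending is left commented out because it needs a running ComfyUI instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Patching a deep copy keeps the exported template untouched, so concurrent requests can each derive their own variant from the same base workflow.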

3.7 Engineering Architecture and Stability Assurance