
An AI Terminology Quick-Reference List for Legal Professionals

Published: 2024-06-22 07:26:10

Author: 法律胶片卷



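Disclaimer: The statements and views in this article are my own responsibility and have nothing to do with the institution where I work. Part of the content was generated automatically by AI; if you find any error or infringement, please contact me by private message. Please credit the source when reposting.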

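June 19, 2024 will no doubt be remembered as a milestone. On that day Nvidia's market value overtook that of tech giant Microsoft, making it the most valuable company in the world. Whether we admit it or not, the AI era has arrived in full force. In this new era, internet-industry jargon is already out of date, and before anyone spent even a moment mourning its passing, a wave of novel and mysterious AI jargon rushed onto the battlefield.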

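Recently, while reading articles about AI, I noticed that alongside the rigorous technical terms there were also some "newly coined words". A writer only has to cram the words in, but the reader has a lot more to work out. Frankly, there are some new words I simply cannot parse. That made me wonder whether my own comprehension was failing, or whether these words were so rich and profound that they were impossible to pin down.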

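After some serious reflection, I decided to first attribute the problem to my own limited knowledge of AI. To close that gap, I resolved to start with the most basic terms and improve one step at a time.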

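Fortunately, I found a list of AI terms compiled in May 2023 by Stephanie Wilkins and Rhys Dipshan, two senior legal technology writers at law.com, a publication of the internationally known legal media company ALM. They updated the list in March 2024, making it more comprehensive and current.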

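With the help of powerful translation tools, I translated this valuable resource into Chinese, both as my own study notes and to share with colleagues in the legal profession, and with non-lawyers, who want to learn about AI and keep up with the times.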

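Terminology is the fastest way to build shared understanding: one term can replace a complicated explanation that might otherwise run to hundreds of words. When we talk about AI, we need to be clear about what we have in mind and also make sure that others understand what we are saying. Avoid words that look impressive but that only you understand. Plain language is enough to convey deep insight, and showing off really does add to the cost of communication.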

Algorithm (算法): In AI, a set of instructions or programming that tells a computer what to do so that the machine can learn to operate on its own to solve a specific problem or perform a specific task.

Artificial Intelligence, AI (人工智能): The branch of computer science focused on the theory, development and design of computer systems that have the ability to mimic human intelligence and thought or perform tasks that normally require human intelligence.

Bard: A chatbot tool released by Google in February 2023, based on the LaMDA large language model.

Chatbot (聊天机器人): A computer program that "converses" with its user. Rule- or flow-based chatbots deliver pre-written answers in response to questions and cannot deviate from this content. AI-based chatbots are more dynamic, can pull from larger databases of information, and can learn more over time. These are built on top of conversational AI.
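
To make the contrast concrete, here is a minimal sketch of the rule-based kind of chatbot described above. The keywords, answers and the function name `rule_based_reply` are hypothetical, invented purely for illustration; a real product would be far more elaborate.

```python
# Hypothetical rule table: keyword -> pre-written answer (illustrative only).
RULES = {
    "hours": "Our office is open 9:00-17:00, Monday to Friday.",
    "fee": "Initial consultations are free of charge.",
}
FALLBACK = "Sorry, I can only answer questions about hours or fees."


def rule_based_reply(message: str) -> str:
    """Return the first pre-written answer whose keyword appears in the message."""
    text = message.lower()
    for keyword, answer in RULES.items():
        if keyword in text:
            return answer
    # A rule-based bot cannot deviate from its script, so anything else
    # gets the same fallback line.
    return FALLBACK


print(rule_based_reply("What are your hours?"))
print(rule_based_reply("Can you summarise this contract?"))
```

The second question falls outside the script, so the bot can only return its fallback line; an AI-based chatbot is meant to handle that kind of open-ended request.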

ChatGPT: A commercially available chatbot from OpenAI, released on November 30, 2022. It was originally based on the GPT-3.5 large language model, also known as text-davinci-003, and is now based on GPT-4.

Continuous Active Learning, CAL (持续主动学习): An application of AI in which the system learns to correct itself, without the need for ongoing human supervision, because it has learned through supervised learning to discern between varying degrees of responsive and nonresponsive documents or concepts. In e-discovery, TAR 2.0 is an example of continuous active learning.

Conversational AI (会话AI): Technologies that use large volumes of data, machine learning and natural language processing to allow users to "talk to" the technology, by imitating human interaction through recognizing text and speech inputs. Conversational AI serves as the synthetic "brain" behind some chatbots.

Deep Learning (深度学习): A type of machine learning that uses neural networks to mimic the human brain, using three or more layers of training to enable the AI to cluster data and make predictions.

Foundational Model (基础模型): A large AI model trained on massive quantities of unlabeled data, usually through self-supervised learning, that can be used to accurately perform a wide range of tasks with minimal fine-tuning. Such tasks include natural language processing, image classification, question answering and more.

Garbage In, Garbage Out (垃圾进，垃圾出): An expression meaning that an AI system is only as good as the data on which it is trained. If an AI system is trained on inaccurate, biased or outdated data, its outputs will reflect those shortcomings.

Generative AI (生成性AI): A category of AI systems, including large language models, that can independently create unique, novel content, in the form of text, images, audio and more, based on the data they have previously been trained on. Unlike traditional AI systems, generative AI algorithms go beyond recognizing patterns and making predictions. Some advanced generative AI systems are not limited to their training datasets, and can learn to respond to questions or prompts containing information on which they were not previously trained. This is defined as zero-shot learning.

GPT (生成性预训练变换器): Generative Pre-trained Transformer; the prefix to various generations of large language models from the company OpenAI. For example, GPT-3 is the third generation of GPT models. GPT-1 was released in June 2018. GPT-2 was released in February 2019. GPT-3 was released in June 2020. GPT-3.5 was released in March 2022, with underlying models rolled out over the year, and text-davinci-003 receiving significant attention in late 2022. GPT-4 was released on March 14, 2023.

Graphics Processing Unit, GPU (图形处理单元): A type of efficient processor that is used to render graphics on a computer screen. GPUs are critical in the training of AI systems and large language models that require significant processing power.

Hallucination (幻觉): An instance where an AI system, when asked a question or given a prompt, provides a false, fictitious, yet convincing answer that it confidently presents as correct.

LaMDA (对话应用的语言模型): Language Model for Dialogue Applications, a large language model released by Google in May 2021.

Large Language Model, LLM (大型语言模型): A type of deep learning algorithm or machine learning model that can perform a variety of natural language processing tasks. These include reading, summarizing, classifying, predicting and generating words or sentences of text, translating text from one language to another, and answering questions or responding to prompts in a conversational manner. It performs these tasks based on knowledge gained from massive datasets and on supervised and reinforcement learning. LLMs are one kind of foundational model.

LLaMA: Large Language Model Meta AI, a large language model released by Meta in February 2023.

Machine Learning (机器学习): A broad branch of AI concerned with "teaching" AI systems to perform tasks, understand concepts or solve problems in a way that imitates intelligent human behavior, with the system gradually becoming more accurate as it is trained on more data.

Model (模型): An AI tool or algorithm based on a defined dataset that makes the decisions a human expert would make given the same information, but without human interference in the decision-making process. GPT-3, for example, is an AI model.

Multimodal AI (多模态AI): An AI system that is capable of processing multiple types of data, such as images, video or sound, in addition to text, in order to generate output.

Natural Language Processing, NLP (自然语言处理): A branch of AI and computer science that refers to the ability of computers or software to understand and read written and spoken language in the form of text and voice data, including intent and sentiment.

Neural Network (神经网络): A means of machine learning that mimics the human brain, and includes the ability for multiple layers of training to occur simultaneously. Neural networks are made up of millions of processing nodes and are central to deep learning.

Parameters (参数): Bits of knowledge or variables, which can be thought of as connections between concepts, that an AI model learns throughout its training process. Parameters are adjusted during training to achieve desired outputs from specific inputs. Generally speaking, the more parameters, the greater the AI's ability to understand and connect complex concepts. Therefore, the more parameters, the more advanced the AI model.
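
To give a feel for where parameter counts come from, the sketch below counts the adjustable numbers in a small, fully connected neural network. This is an illustrative assumption about architecture (each layer connected to the next by one weight per connection plus one bias per output node), and the layer sizes are made up; real language models use different, much larger structures.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of nodes in each layer, input first.
    Each pair of adjacent layers contributes n_in * n_out weights
    plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Hypothetical tiny network: 4 inputs, one hidden layer of 8 nodes, 3 outputs.
# (4*8 + 8) + (8*3 + 3) = 40 + 27 = 67 parameters.
print(count_parameters([4, 8, 3]))  # prints 67
```

The same arithmetic, applied to far wider and deeper networks, is how model sizes reach billions of parameters.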

Prompt (提示): The instruction given to an AI model or machine learning algorithm in order to generate a specific output.

Prompt Engineering (提示工程): Identifying and using the right prompts to produce the most useful or desirable outcomes from an AI tool.
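
One common prompt-engineering habit is to spell out the role, the task, the source material and the expected output format rather than asking a bare question. The helper below is only a sketch of that idea; the function name `build_prompt`, the field layout and the sample clause are assumptions for illustration and do not belong to any particular AI tool.

```python
def build_prompt(role: str, task: str, source_text: str, output_format: str) -> str:
    """Assemble a structured prompt from its parts (illustrative layout only)."""
    return (
        f"You are {role}.\n"
        f"Task: {task}\n"
        "Answer only from the text between the markers.\n"
        f"<<<\n{source_text}\n>>>\n"
        f"Output format: {output_format}"
    )


prompt = build_prompt(
    role="a careful legal assistant",
    task="Summarise the termination clause.",
    source_text="Either party may terminate on 30 days' written notice.",
    output_format="Two bullet points in plain English.",
)
print(prompt)
```

Whether this particular layout helps depends on the tool; the point is that each part of the instruction is made explicit and can be adjusted independently.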

Reinforcement Learning (强化学习): A machine learning technique used to train an AI model in which the AI system interactively learns by trial and error, incorporating feedback from its own actions and outputs.
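
Trial and error with feedback can be illustrated with a toy "multi-armed bandit": the program repeatedly picks one of several actions, observes a reward, and updates its estimate of how good each action is. This is a minimal sketch using the epsilon-greedy strategy, with made-up fixed rewards; it is not how large language models are actually trained.

```python
import random


def run_bandit(rewards, steps=500, epsilon=0.1, seed=0):
    """Learn which action pays best by trial and error (epsilon-greedy)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # current guess of each action's value
    counts = [0] * len(rewards)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: otherwise pick the action that currently looks best.
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]      # feedback from the environment
        counts[action] += 1
        # Move the estimate toward the observed reward (running average).
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


# Hypothetical environment: three actions with fixed rewards.
estimates = run_bandit([0.2, 1.0, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, estimates)
```

The program is never told which action is best; it finds out only from the rewards its own choices produce.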

Retrieval Augmented Generation, RAG (检索增强生成): A process for improving the output of a large language model that involves providing the LLM with pre-existing, outside information that gives the LLM context and allows it to ground its responses within a specified knowledge base.
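
The two steps in that definition, retrieving outside information and handing it to the model as context, can be sketched as follows. Everything here is a simplifying assumption: the three-sentence knowledge base is invented, retrieval is done by counting shared words (real systems typically use vector search), and the final call to a language model is left out.

```python
import re

# Hypothetical knowledge base (invented sentences, not legal advice).
KNOWLEDGE_BASE = [
    "The limitation period for contract claims in this example is three years.",
    "Office parking is available on level two of the building.",
    "Trademark registrations in this example last ten years and can be renewed.",
]


def words(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:top_k]


def build_grounded_prompt(question, documents):
    """Put the retrieved text in front of the question as context."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


question = "How long is the limitation period for contract claims?"
print(build_grounded_prompt(question, KNOWLEDGE_BASE))
```

The resulting prompt would then be sent to the LLM, which is asked to answer from the supplied context instead of relying only on what it absorbed in training.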

Robotic Process Automation, RPA (机器人流程自动化): A form of business process automation, also known as software robotics, that allows humans to use intelligent automation technology to define a set of instructions for the performance of high-volume, repetitive human tasks quickly and without error. While RPA technology shares similarities with AI and is often included in the same discussions, it is not a form of AI.

Self-Supervised Learning (自监督学习): A form of machine learning in which a model is fed unstructured data and automatically generates data labels; essentially, the model trains itself to differentiate between different parts of the input. Also known as predictive or pretext learning.
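
A simple way to see how labels can be generated automatically is next-word prediction: every position in a raw sentence yields a training example whose "label" is just the word that follows. The sketch below only builds those examples and does no training; the function name and the sample sentence are illustrative assumptions.

```python
def next_word_pairs(text, context_size=2):
    """Turn raw text into (context, next word) training pairs.

    No human labels anything: the label for each example is simply
    the word that follows the context in the text itself.
    """
    tokens = text.split()
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        pairs.append((context, tokens[i]))
    return pairs


for context, label in next_word_pairs("the court granted the motion"):
    print(context, "->", label)
# ('the', 'court') -> granted
# ('court', 'granted') -> the
# ('granted', 'the') -> motion
```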

Semi-Supervised Learning (半监督学习): A form of machine learning in which some of the input data is labeled. Semi-supervised learning is a mix of supervised and unsupervised learning.

Supervised Learning (监督学习): A form of machine learning in which a model is taught how to identify a certain concept or topic (for example, a specific type of document) by a person manually correcting the machine during the training process. In e-discovery, TAR 1.0 is an example of supervised learning.

Token (词元): In natural language processing, a sequence of characters that forms a semantic unit or plays a certain role in a written language. The process of breaking a stream of language into meaningful elements such as words or sentences is called tokenization (分词).
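
As a rough illustration of tokenization, the sketch below splits a sentence into word and punctuation tokens with a regular expression. This is a deliberately simple word-level approach chosen for readability; the tokenizers used by actual large language models generally split text into smaller sub-word pieces.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenize("Objection sustained, counsel."))
# ['Objection', 'sustained', ',', 'counsel', '.']
```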

Unsupervised Learning (无监督学习): A form of machine learning in which a model employs deep learning techniques to detect patterns in data without explicit training on labeled data.
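
Detecting patterns without labels can be shown with a very small clustering example. Note the substitution: instead of deep learning, this sketch uses k-means, a classic and much simpler clustering method, on one-dimensional data. The numbers (imagined document lengths in pages) and the choice of two groups are assumptions made for illustration.

```python
def kmeans_1d(values, iterations=10):
    """Group numbers into two clusters without any labels (k-means, k=2)."""
    # Start the two cluster centers at the smallest and largest value.
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [min((0, 1), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the average of the values assigned to it.
        for c in (0, 1):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


# Hypothetical document lengths in pages: short memos and long contracts.
labels, centers = kmeans_1d([2, 3, 4, 40, 45, 50])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)  # [3.0, 45.0]
```

Nobody told the program which documents were "short" or "long"; the two groups emerge from the data alone.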

Web Scraping (网络爬虫): Extracting data from websites, usually a large number of them, and using that extracted data to train AI models. The extracted data becomes the basis of learning that informs outputs later generated by AI and generative AI tools.

Zero-Shot Learning (零样本学习): The ability of an AI system to learn how to respond to questions or prompts, create new content or classify data on which it was not previously trained.

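Readers who want to study the source can follow the "Read the original" link. The list is currently being updated as AI develops, and personally I think it works well as a dictionary to consult.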