
Focusing on private deployment of enterprise RAG applications: Jina AI models arrive on Amazon SageMaker

Published: 2024-04-18 17:08:07 | Author: Jina AI

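This article explains how to build a RAG application on AWS (Amazon Web Services) with Jina AI's Embeddings and Reranker models. It covers each step: configuring an AWS account, setting up a Python environment, subscribing to the models, loading a dataset, launching the models, and building and indexing the dataset.
It also shows how to use FAISS for semantic search and how to integrate the Mistral-Instruct LLM to generate answers.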

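Jina AI's Embeddings and Reranker models are now officially available on Amazon SageMaker. Enterprises can easily deploy Jina AI's advanced AI models in their private AWS environments and rely on AWS's mature infrastructure for secure, stable, and consistent cloud services.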

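Through AWS Marketplace, SageMaker users get not only the leading 8k-token input context window and strong multilingual embeddings of Jina AI models, but also competitive pricing. Model transfer is free, prices are clearly listed, and billing is integrated directly into your AWS account, so everything stays transparent and simple.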

The Jina AI models currently available on Amazon SageMaker include:

Jina Embeddings v2 Base - English
Jina Embeddings v2 Small - English
Jina Embeddings v2 bilingual models:

German/English
Chinese/English
Spanish/English

Jina Embeddings v2 Base - Code
Jina Reranker v1 Base - English
Jina ColBERT v1 - English
Jina ColBERT Reranker v1 - English

For the full list of models, visit Jina AI's seller page on AWS Marketplace. A seven-day free trial is available.

https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy

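This article walks you through building a retrieval-augmented generation (RAG) application with components from Amazon SageMaker, using these models: Jina Embeddings v2 - English, Jina Reranker v1, and the Mistral-7B-Instruct large language model.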

You can also follow along with the Python code by downloading the notebook:

https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/sagemaker/sagemaker.ipynb

Retrieval-augmented generation

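RAG (retrieval-augmented generation) is a popular new paradigm. An application that uses a large language model (LLM) to answer user requests directly is very prone to generating false information. A RAG system instead gathers information relevant to the user's request from an external database and passes it to the language model as part of the input prompt. The language model's task is to combine this information into a coherent and accurate answer, which reduces misleading output and keeps the answer useful.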

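A RAG application usually consists of four components: a vector database that serves as the data source for retrieval; an information-retrieval system that processes user queries and fetches relevant data; a reranking system that selects the most suitable data as input for the language model; and a large language model that generates the answer for the user.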
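To make the division of labor concrete, here is a minimal, self-contained sketch of how the four components fit together. Everything in it is a hypothetical stand-in, not the tutorial's implementation: a three-sentence list plays the vector database, a shared-word count replaces both vector search and the reranker (a real reranker is a separate, more precise model), and `generate` simply echoes the prompt instead of calling an LLM.

```python
from typing import Callable, List

# Hypothetical toy corpus standing in for the vector database
DOCS = [
    "The first offshore wind farm was commissioned in 1991 in Denmark.",
    "Offshore wind turbines face combined wind and wave loads.",
    "The Kaplan-Meier estimator is a non-parametric survival estimator.",
]

def overlap_score(query: str, doc: str) -> int:
    """Toy relevance score: the number of distinct lowercase words shared."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int) -> List[str]:
    # Stand-in for vector search: the k highest-scoring documents
    return sorted(DOCS, key=lambda d: overlap_score(query, d), reverse=True)[:k]

def rerank(query: str, candidates: List[str], n: int) -> List[str]:
    # Stand-in for a reranker model: re-score the candidates and keep the top n
    return sorted(candidates, key=lambda d: overlap_score(query, d), reverse=True)[:n]

def rag_answer(question: str, generate: Callable[[str], str],
               k: int = 20, n: int = 3) -> str:
    candidates = retrieve(question, k)         # 1. retrieval from the data source
    context = rerank(question, candidates, n)  # 2. reranking
    prompt = "Context:\n" + "\n".join(f"{i}. {c}" for i, c in enumerate(context, 1))
    prompt += f"\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)                    # 3. generation by the LLM

# Echo the prompt instead of calling a real LLM
print(rag_answer("When was the first offshore wind farm commissioned?",
                 generate=lambda p: p, n=1))
```

In the real application, FAISS with Jina Embeddings v2 replaces `retrieve`, Jina Reranker v1 replaces `rerank`, and Mistral-Instruct replaces `generate`.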

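Embedding models and reranker models are also indispensable parts of this process. By analyzing the semantic similarity between texts, they improve retrieval precision so that the system responds better to user needs.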

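Specifically, an embedding model converts text into high-dimensional vectors whose spatial relationships reflect how semantically related the texts are. A reranker model then filters the retrieval results and picks the data that best matches the user's request, so that the information passed to the language model in the prompt is as relevant and specific as possible, which greatly improves answer quality.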
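"Spatial relationships" between embeddings are usually measured with cosine similarity: vectors pointing in similar directions represent related texts. A minimal NumPy sketch with made-up 4-dimensional vectors (real Jina Embeddings v2 Base vectors have 768 dimensions and come from the model, not from hand-written numbers):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up toy embeddings; a real embedding model would produce these from text
query = np.array([0.9, 0.1, 0.0, 0.2])
docs = {
    "offshore wind history": np.array([0.8, 0.2, 0.1, 0.3]),
    "survival analysis": np.array([0.0, 0.9, 0.4, 0.0]),
    "rail welding": np.array([0.1, 0.0, 0.9, 0.1]),
}

# Rank documents by similarity to the query, most similar first
ranking = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranking)  # ['offshore wind history', 'rail welding', 'survival analysis']
```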

How do the models perform on SageMaker?

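To evaluate the performance and stability of Jina Embeddings v2 Base - English as a SageMaker endpoint, we ran tests on a g4dn.xlarge instance. The test added one new simulated user every second; each user sends a request, waits for the response, and then repeats.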
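The article doesn't name the load-testing tool, so the following is only a scaled-down sketch of the same closed-loop ramp-up using asyncio. `fake_endpoint` is a hypothetical stand-in for a real embedding request, and the stop condition mirrors the "more than 5 API errors" rule reported below.

```python
import asyncio
import statistics
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List

@dataclass
class LoadTestState:
    stop_at: float
    max_errors: int
    errors: int = 0
    latencies: List[float] = field(default_factory=list)

    def should_stop(self) -> bool:
        return time.monotonic() >= self.stop_at or self.errors > self.max_errors

async def fake_endpoint(payload: str) -> str:
    # Hypothetical stand-in for an embedding request; replace with a real endpoint call
    await asyncio.sleep(0.01)
    return "ok"

async def user_loop(send: Callable[[str], Awaitable[str]], state: LoadTestState) -> None:
    # Closed-loop user: send one request, wait for the response, then repeat
    while not state.should_stop():
        start = time.monotonic()
        try:
            await send("some text to embed")
        except Exception:
            state.errors += 1
            continue
        state.latencies.append(time.monotonic() - start)

async def ramp_up(send: Callable[[str], Awaitable[str]], spawn_interval: float,
                  max_users: int, duration: float, max_errors: int = 5) -> LoadTestState:
    """Add one new user every `spawn_interval` seconds, up to `max_users` users."""
    state = LoadTestState(stop_at=time.monotonic() + duration, max_errors=max_errors)
    tasks = []
    for _ in range(max_users):
        if state.should_stop():
            break
        tasks.append(asyncio.create_task(user_loop(send, state)))
        await asyncio.sleep(spawn_interval)
    await asyncio.gather(*tasks)
    return state

# Scaled-down run: a new user every 0.05 s instead of every second
state = asyncio.run(ramp_up(fake_endpoint, spawn_interval=0.05, max_users=5, duration=0.5))
print(f"{len(state.latencies)} requests, {state.errors} errors, "
      f"median latency {statistics.median(state.latencies) * 1000:.1f} ms")
```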

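Summary of test results:

Requests shorter than 100 tokens:

Up to 150 concurrent users: response time consistently stayed below 100 milliseconds (ms).
More than 150 concurrent users: response time grew linearly with the number of users, reaching 1.5 seconds (s) at around 300 users. The test was stopped after more than 5 API errors.

Requests of 1,000 to 8,000 tokens:

Up to 20 concurrent users: response time consistently stayed below 8 s.
More than 20 concurrent users: response time grew linearly with the number of users, reaching 60 s at around 140 users. The test was stopped after more than 5 API errors.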

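Conclusions:

For most users, a g4dn.xlarge or g5.xlarge instance is sufficient for everyday embedding needs.
For large indexing jobs (which usually run far less often than search tasks), you may need a more powerful instance.
For a list of all available SageMaker instances, see the AWS EC2 overview.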

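Configure your AWS account

Create an AWS account

Before you start, you need an AWS account. If you don't have one, sign up on the AWS website: https://portal.aws.amazon.com/

Note: You cannot complete this tutorial with a free account, because Amazon does not provide free use of the SageMaker resources it requires. Even if you use Jina AI's seven-day free trial, you must add a payment method to your account to subscribe to the models.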

Set up the Python environment

Install the following tools and libraries in the Python environment you will use for this tutorial:

pip install awscli jina-sagemaker

Get access keys

You also need the access key ID and secret access key for your AWS account. For instructions, see the AWS guide to managing access keys:

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html

Choose an AWS region

Choose the AWS region you want to use. For information about regions, see:

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html

Set environment variables

In your Python script or Jupyter notebook, set the environment variables with the following code:

import os

os.environ["AWS_ACCESS_KEY_ID"] = "<YOUR_ACCESS_KEY_ID>"  # replace with your access key ID
os.environ["AWS_SECRET_ACCESS_KEY"] = "<YOUR_SECRET_ACCESS_KEY>"  # replace with your secret access key
os.environ["AWS_DEFAULT_REGION"] = "<YOUR_AWS_REGION>"  # replace with your AWS region
os.environ["AWS_DEFAULT_OUTPUT"] = "json"  # set the default output format to json

You can also configure these settings with the AWS command-line tool or in AWS configuration files on your local file system. See the AWS documentation for details.
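As a sketch of the configuration-file route: the AWS CLI and boto3 read credentials from ~/.aws/credentials and settings such as region and output from ~/.aws/config. The hypothetical helper below only renders the contents of both files for the default profile with Python's configparser; writing them to disk, and keeping the secret key out of version control, is left to you.

```python
import configparser
import io
from typing import Tuple

def render_aws_files(access_key_id: str, secret_access_key: str,
                     region: str, output: str = "json") -> Tuple[str, str]:
    """Return the text of ~/.aws/credentials and ~/.aws/config for the default profile."""
    # interpolation=None so that characters such as '%' in a secret are kept verbatim
    credentials = configparser.ConfigParser(interpolation=None)
    credentials["default"] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    config = configparser.ConfigParser(interpolation=None)
    config["default"] = {"region": region, "output": output}

    texts = []
    for parser in (credentials, config):
        buf = io.StringIO()
        parser.write(buf)
        texts.append(buf.getvalue())
    return texts[0], texts[1]

creds_text, config_text = render_aws_files(
    "<YOUR_ACCESS_KEY_ID>", "<YOUR_SECRET_ACCESS_KEY>", "us-east-1")
print(creds_text)
print(config_text)
```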


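Create a role

You also need an AWS role with sufficient permissions to use the resources this tutorial requires.

The role must:

Have AmazonSageMakerFullAccess enabled.
Have permission to make AWS Marketplace subscriptions, with all three of the following enabled: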

aws-marketplace:ViewSubscriptions
aws-marketplace:Unsubscribe
aws-marketplace:Subscribe

Or, your AWS account must already be subscribed to jina-embedding-model.
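For reference, the three Marketplace permissions above can be expressed as an identity-based IAM policy attached to the role. This is a minimal sketch: `"Resource": "*"` is an assumption you may want to narrow, and AmazonSageMakerFullAccess is a separate AWS managed policy that you attach to the same role.

```python
import json

# Minimal sketch of an IAM policy granting the three AWS Marketplace actions listed above
marketplace_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "aws-marketplace:ViewSubscriptions",
                "aws-marketplace:Unsubscribe",
                "aws-marketplace:Subscribe",
            ],
            "Resource": "*",  # assumption: not restricted to specific products
        }
    ],
}

# Paste the printed JSON into the IAM console's policy editor
print(json.dumps(marketplace_policy, indent=2))
```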

Store the role's ARN (Amazon Resource Name) in a variable named role:

role = "<YOUR_ROLE_ARN>"  # replace with your role's ARN

For more information, see the roles documentation on the AWS website.

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html

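Subscribe to the Jina AI models

This tutorial uses the Jina Embeddings v2 Base English model and the Jina Reranker v1 - English model. You need to subscribe to both on AWS Marketplace:

Jina Embeddings v2 Base English model: https://aws.amazon.com/marketplace/pp/prodview-jwbhofu3iesos
Jina Reranker v1 - English model (listed on Jina AI's seller page): https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy

Jina AI currently offers a 7-day free trial of its models. You pay for the AWS instances that run the models, but during the trial you are not charged extra for the models themselves.

After subscribing, get the model ARNs for your AWS region and store them in the variables embedding_package_arn and reranker_package_arn. The code in this tutorial refers to them by these names.

If you don't know how to get the ARNs, use the following code, which reads your region from the AWS_DEFAULT_REGION environment variable into the variable region and looks up the ARNs: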

region = os.environ["AWS_DEFAULT_REGION"]

def get_arn_for_model(region_name, model_name):
    model_package_map = {
        "us-east-1": f"arn:aws:sagemaker:us-east-1:253352124568:model-package/{model_name}",
        "us-east-2": f"arn:aws:sagemaker:us-east-2:057799348421:model-package/{model_name}",
        "us-west-1": f"arn:aws:sagemaker:us-west-1:382657785993:model-package/{model_name}",
        "us-west-2": f"arn:aws:sagemaker:us-west-2:594846645681:model-package/{model_name}",
        "ca-central-1": f"arn:aws:sagemaker:ca-central-1:470592106596:model-package/{model_name}",
        "eu-central-1": f"arn:aws:sagemaker:eu-central-1:446921602837:model-package/{model_name}",
        "eu-west-1": f"arn:aws:sagemaker:eu-west-1:985815980388:model-package/{model_name}",
        "eu-west-2": f"arn:aws:sagemaker:eu-west-2:856760150666:model-package/{model_name}",
        "eu-west-3": f"arn:aws:sagemaker:eu-west-3:843114510376:model-package/{model_name}",
        "eu-north-1": f"arn:aws:sagemaker:eu-north-1:136758871317:model-package/{model_name}",
        "ap-southeast-1": f"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/{model_name}",
        "ap-southeast-2": f"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/{model_name}",
        "ap-northeast-2": f"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/{model_name}",
        "ap-northeast-1": f"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/{model_name}",
        "ap-south-1": f"arn:aws:sagemaker:ap-south-1:077584701553:model-package/{model_name}",
        "sa-east-1": f"arn:aws:sagemaker:sa-east-1:270155090741:model-package/{model_name}",
    }

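    if region_name not in model_package_map:
        raise ValueError(f"No Jina AI model package ARN known for region {region_name!r}")
    return model_package_map[region_name]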

embedding_package_arn = get_arn_for_model(region, "jina-embeddings-v2-base-en")
reranker_package_arn = get_arn_for_model(region, "jina-reranker-v1-base-en")

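Load the dataset

In this tutorial, we use a collection of videos from the YouTube channel TU Delft Online Learning. The channel produces educational material on a variety of STEM subjects, and its content is CC-BY licensed.

Dataset source: the TU Delft Online Learning channel:

- https://www.youtube.com/@tudelftonlinelearning1226

Dataset processing:

We downloaded 193 videos from the channel and processed them with Whisper, OpenAI's open-source speech recognition model.
We used the smallest model, openai/whisper-tiny, to transcribe the videos.
The transcripts are collected in a CSV file, which you can download here: https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/sagemaker/tu_delft.csv

Each row of the CSV file contains the video title, the video's URL on YouTube, and the video's transcript.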

To load this data in Python, first install pandas and requests:

pip install requests pandas

Load the CSV data directly into a pandas DataFrame named tu_delft_dataframe:

import pandas

# Load the CSV file
tu_delft_dataframe = pandas.read_csv("https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/sagemaker/tu_delft.csv")

You can check the contents with the DataFrame's head() method.

You can also watch the videos at the URLs given in this dataset to verify that the speech recognition is roughly accurate.
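The code later reads the Title and Text columns and calls chunk_text on every transcript, which would fail on a missing (NaN) value. An optional sanity check, sketched here on hypothetical stand-in rows (clean_transcripts is not part of the original tutorial; the URL column is not used later, so it isn't checked):

```python
import pandas as pd

def clean_transcripts(df: pd.DataFrame) -> pd.DataFrame:
    """Check the columns used later and drop rows whose transcript is missing or blank."""
    missing = {"Title", "Text"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    has_text = df["Text"].fillna("").astype(str).str.strip() != ""
    # reset_index keeps row labels equal to positions, since later code mixes
    # labels from iterrows() with positional .iloc lookups
    return df[has_text].reset_index(drop=True)

# Hypothetical stand-in rows; with the real data, call clean_transcripts(tu_delft_dataframe)
sample = pd.DataFrame({
    "Title": ["Video A", "Video B", "Video C"],
    "Text": ["Some transcript. Another sentence.", None, "   "],
})
cleaned = clean_transcripts(sample)
print(cleaned.head())
```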

Launch Jina Embeddings v2

The code below launches an ml.g4dn.xlarge instance on AWS to run the embedding model. This can take a few minutes.

import boto3
from jina_sagemaker import Client

# Choose a name for your embedding endpoint. SageMaker endpoint names may
# contain only letters, digits, and hyphens (no underscores).
embeddings_endpoint_name = "jina-embedding"

embedding_client = Client(region_name=boto3.Session().region_name)
embedding_client.create_endpoint(
    arn=embedding_package_arn,
    role=role,
    endpoint_name=embeddings_endpoint_name,
    instance_type="ml.g4dn.xlarge",
    n_instances=1,
)

embedding_client.connect_to_endpoint(endpoint_name=embeddings_endpoint_name)

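Change instance_type to choose a different AWS instance type if needed.

⚠️ As soon as this command returns, AWS starts billing you by time: you are charged by the hour until you stop the instance. To stop it, follow the instructions at the end of this article.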

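Prepare and index the dataset

Now that the data is loaded and the Jina Embeddings v2 model is running, we can prepare and index the data. We will store it in a FAISS vector store. FAISS is an open-source library for efficient similarity search over dense vectors, widely used in AI applications.

First, install the remaining prerequisites for the RAG application:

pip install tqdm numpy faiss-cpu

Chunking

We need to split each transcript into smaller pieces so that several of them fit into the LLM's prompt. The code below splits transcripts at sentence boundaries so that, by default, each chunk holds as many whole sentences as fit within 128 words. A single sentence longer than the limit becomes a chunk of its own.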

def chunk_text(text, max_words=128):
    """
    Divide text into chunks, each containing as many full sentences as fit
    within `max_words` words. A single sentence longer than `max_words`
    becomes a chunk of its own.
    """
    sentences = text.split(".")
    chunk = []
    word_count = 0

    for sentence in sentences:
        sentence = sentence.strip()  # drop surrounding whitespace
        if not sentence:
            continue

        words_in_sentence = len(sentence.split())
        if word_count + words_in_sentence <= max_words:
            chunk.append(sentence)
            word_count += words_in_sentence
        else:
            # Yield the current chunk and start a new one
            if chunk:
                yield ". ".join(chunk) + "."
            chunk = [sentence]
            word_count = words_in_sentence

    # Yield the last chunk if it's not empty (joined the same way as the others)
    if chunk:
        yield ". ".join(chunk) + "."

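Get embeddings for each chunk

To store each chunk in the FAISS index, we need its embedding. We pass the text chunks to the Jina AI embedding model with the embedding_client.embed() method, then add the chunks and their embedding vectors to the pandas DataFrame tu_delft_dataframe as two new columns, chunks and embeddings: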

import numpy as np
from tqdm import tqdm

tqdm.pandas()

def generate_embeddings(text_df):
    # `text_df` is one DataFrame row (a pandas Series), as passed by apply(axis=1)
    chunks = list(chunk_text(text_df["Text"]))
    embeddings = []

    for chunk in chunks:
        response = embedding_client.embed(texts=[chunk])
        chunk_embedding = response[0]["embedding"]
        embeddings.append(np.array(chunk_embedding))

    text_df["chunks"] = chunks
    text_df["embeddings"] = embeddings
    return text_df

print("Embedding text chunks ...")
# Apply the function row by row: each call embeds all chunks of one transcript
tu_delft_dataframe = tu_delft_dataframe.apply(generate_embeddings, axis=1)
## In Google Colab or a Jupyter notebook, you can replace the line above
## with the following one to show a progress bar:
# tu_delft_dataframe = tu_delft_dataframe.progress_apply(generate_embeddings, axis=1)
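The loop above makes one request per chunk. Because embed() already takes a list (texts=[...]), it may be possible to send several chunks per request. The sketch below assumes, without confirmation from the article, that the response contains one entry per input text in input order and that the endpoint accepts your chosen batch size; embed_in_batches and fake_embed are hypothetical helpers.

```python
from typing import Callable, Dict, List

import numpy as np

def embed_in_batches(chunks: List[str],
                     embed: Callable[[List[str]], List[Dict]],
                     batch_size: int = 16) -> List[np.ndarray]:
    """Embed chunks with one call per batch instead of one call per chunk.

    Assumes `embed(texts)` returns one {"embedding": [...]} dict per input text,
    in the same order as the inputs.
    """
    vectors = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        response = embed(batch)
        if len(response) != len(batch):
            raise ValueError("embedding response does not match batch size")
        vectors.extend(np.array(item["embedding"]) for item in response)
    return vectors

# Hypothetical fake embedder for offline testing: (word count, character count)
def fake_embed(texts):
    return [{"embedding": [len(t.split()), len(t)]} for t in texts]

vecs = embed_in_batches(["a b", "c", "d e f"], fake_embed, batch_size=2)
print([v.tolist() for v in vecs])  # [[2, 3], [1, 1], [3, 5]]
```

With the tutorial's client, the call would look like `embed_in_batches(chunks, lambda texts: embedding_client.embed(texts=texts))`.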

Set up semantic search with FAISS

The code below creates a FAISS index and inserts the chunks and embedding vectors by iterating over tu_delft_dataframe:

import faiss

dim = 768  # output dimension of Jina Embeddings v2 Base
index_with_ids = faiss.IndexIDMap(faiss.IndexFlatIP(dim))
k = 0

doc_ref = dict()

for idx, row in tu_delft_dataframe.iterrows():
    embeddings = row["embeddings"]
    for i, embedding in enumerate(embeddings):
        normalized_embedding = np.ascontiguousarray(np.array(embedding, dtype="float32").reshape(1, -1))
        faiss.normalize_L2(normalized_embedding)
        # add_with_ids expects an int64 array with one id per vector
        index_with_ids.add_with_ids(normalized_embedding, np.array([k], dtype="int64"))
        doc_ref[k] = (row["chunks"][i], idx)
        k += 1
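For intuition about what IndexIDMap(IndexFlatIP(dim)) computes: after L2 normalization, the inner product of two vectors equals their cosine similarity, and a flat index scores every stored vector exhaustively and returns the top k best-first, together with your own ids. This NumPy sketch reproduces that behavior on random vectors for illustration; it is not a replacement for FAISS on large collections.

```python
import numpy as np

def normalize_rows(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row (faiss.normalize_L2 does the same, in place)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def flat_ip_search(db: np.ndarray, ids: np.ndarray, query: np.ndarray, k: int):
    """Exact inner-product search: return (scores, ids), highest score first."""
    scores = db @ query.ravel()      # one inner product per stored vector
    order = np.argsort(-scores)[:k]  # sort descending by score
    return scores[order], ids[order]

rng = np.random.default_rng(0)
db = normalize_rows(rng.normal(size=(5, 8)).astype("float32"))
ids = np.arange(100, 105)                # custom ids, as with IndexIDMap
query = normalize_rows(db[3:4] + 0.01)   # a query very close to the vector with id 103

scores, found = flat_ip_search(db, ids, query, k=3)
print(found[0], round(float(scores[0]), 3))
```

Because higher scores mean closer matches here, results from this kind of index should be kept in descending order of score.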

Launch the Jina Reranker v1 model

As with the Jina Embeddings v2 model above, this code launches an ml.g4dn.xlarge instance on AWS to run the reranker model. Again, this may take a few minutes.

import boto3
from jina_sagemaker import Client

# Choose a name for your reranker endpoint. SageMaker endpoint names may
# contain only letters, digits, and hyphens (no underscores).
reranker_endpoint_name = "jina-reranker"

reranker_client = Client(region_name=boto3.Session().region_name)
reranker_client.create_endpoint(
    arn=reranker_package_arn,
    role=role,
    endpoint_name=reranker_endpoint_name,
    instance_type="ml.g4dn.xlarge",
    n_instances=1,
)

reranker_client.connect_to_endpoint(endpoint_name=reranker_endpoint_name)

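Define the query functions

Next, we define a function that finds the transcript chunks most similar to a text query.

This is a two-step process:

Convert the user input into an embedding vector with the embedding_client.embed() method, just as we did when preparing the data.
Pass the embedding to the FAISS index to retrieve the best matches. The function below returns the 20 best matches by default; you can change this with the n parameter.

The function find_most_similar_transcript_segment ranks the stored embeddings by their cosine similarity to the query embedding (the inner product of L2-normalized vectors) and returns the best matches, most similar first.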

def find_most_similar_transcript_segment(query, n=20):
    # Assumes the query is short enough not to need chunking
    query_embedding = embedding_client.embed(texts=[query])[0]["embedding"]
    query_embedding = np.ascontiguousarray(np.array(query_embedding, dtype="float32").reshape(1, -1))
    faiss.normalize_L2(query_embedding)

    # Get the top n matches. With IndexFlatIP, D holds cosine similarities
    # (higher is better), already sorted in descending order.
    D, I = index_with_ids.search(query_embedding, n)

    results = []
    for score, index_id in zip(D[0], I[0]):
        if index_id == -1:  # FAISS pads with -1 when fewer than n vectors exist
            continue
        transcript_segment, doc_idx = doc_ref[index_id]
        results.append((transcript_segment, doc_idx, score))

    # Keep the best match first (highest similarity)
    results.sort(key=lambda x: x[2], reverse=True)

    return [(tu_delft_dataframe.iloc[r[1]]["Title"].strip(), r[0]) for r in results]

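We also define a function that sends the results of find_most_similar_transcript_segment to the reranker through reranker_client and returns only the three most relevant results. It calls the reranker with the reranker_client.rerank() method.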

def rerank_results(query_found, query, n=3):
    ret = reranker_client.rerank(
        documents=[f[1] for f in query_found],
        query=query,
        top_n=n,
    )
    return [query_found[r['index']] for r in ret[0]['results']]

Load Mistral-Instruct with JumpStart

In this tutorial, we use the Mistral-7B-Instruct model, available through Amazon SageMaker JumpStart, as the LLM component of the RAG system.

https://aws.amazon.com/blogs/machine-learning/mistral-7b-foundation-models-from-mistral-ai-are-now-available-in-amazon-sagemaker-jumpstart/

Run the following code to load and deploy Mistral-Instruct:

from sagemaker.jumpstart.model import JumpStartModel

jumpstart_model = JumpStartModel(model_id="huggingface-llm-mistral-7b-instruct", role=role)
model_predictor = jumpstart_model.deploy()

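The predictor for this LLM's endpoint is stored in the variable model_predictor.

⚠️ This model is also a billed AWS service, so don't forget to shut it down when you finish the tutorial. See the cleanup section at the end of this article to stop this deployment.

A prompt template for Mistral-Instruct

The code below uses Python's built-in string.Template class to create a Mistral-Instruct prompt template for this application. It assumes that each query is presented to the model together with three matching transcript chunks.

You can experiment with this template to adapt the application or to see whether you get better results.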

from string import Template

prompt_template = Template("""
  <s>[INST] Answer the question below only using the given context.
  The question from the user is based on transcripts of videos from a YouTube
    channel.
  The context is presented as a ranked list of information in the form of
    (video-title, transcript-segment), that is relevant for answering the
    user's question.
  The answer should only use the presented context. If the question cannot be
    answered based on the context, say so.

  Context:
  1. Video-title: $title_1, transcript-segment: $segment_1
  2. Video-title: $title_2, transcript-segment: $segment_2
  3. Video-title: $title_3, transcript-segment: $segment_3

  Question: $question

  Answer: [/INST]
""")

With this component, we now have all the parts of a complete RAG application.
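The template above hard-codes exactly three results. If you want to experiment with a different number of context passages, one option is to build the context list in a loop. build_prompt is a hypothetical variant, not part of the original tutorial: it keeps the template's instruction text and [INST] markers but uses f-strings instead of Template.

```python
from typing import List, Tuple

INSTRUCTIONS = (
    "Answer the question below only using the given context.\n"
    "The question from the user is based on transcripts of videos from a YouTube channel.\n"
    "The context is presented as a ranked list of information in the form of "
    "(video-title, transcript-segment), that is relevant for answering the user's question.\n"
    "The answer should only use the presented context. "
    "If the question cannot be answered based on the context, say so."
)

def build_prompt(question: str, results: List[Tuple[str, str]]) -> str:
    """Build a Mistral-Instruct prompt from any number of (title, segment) pairs."""
    context = "\n".join(
        f"{i}. Video-title: {title}, transcript-segment: {segment}"
        for i, (title, segment) in enumerate(results, start=1)
    )
    return (f"<s>[INST] {INSTRUCTIONS}\n\nContext:\n{context}\n\n"
            f"Question: {question}\n\nAnswer: [/INST]")

prompt = build_prompt(
    "When was the first offshore wind farm commissioned?",
    [("Offshore Wind Farm Technology - Course Introduction",
      "Since the first offshore wind farm commissioned in 1991 in Denmark, ...")],
)
print(prompt)
```

With the tutorial's functions, you would pass the output of rerank_results directly, for example `build_prompt(question, rerank_results(search_results, question, n=5))`.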

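Query the model

Querying the model takes three steps:

Search for relevant chunks based on the query.
Assemble the prompt.
Send the prompt to the Mistral-Instruct model and return its answer.

To search for relevant chunks, we use the find_most_similar_transcript_segment function defined above and then rerank its results with rerank_results: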

question = "When was the first offshore wind farm commissioned?"
search_results = find_most_similar_transcript_segment(question)
reranked_results = rerank_results(search_results, question)

You can inspect the search results in reranked order:

for title, text in reranked_results:
    print(title + "\n" + text + "\n")

Result:

Offshore Wind Farm Technology - Course Introduction
Since the first offshore wind farm commissioned in 1991 in Denmark, scientists and engineers have adapted and improved the technology of wind energy to offshore conditions. This is a rapidly evolving field with installation of increasingly larger wind turbines in deeper waters. At sea, the challenges are indeed numerous, with combined wind and wave loads, reduced accessibility and uncertain-solid conditions. My name is Axel Vire, I'm an assistant professor in Wind Energy at U-Delf and specializing in offshore wind energy. This course will touch upon the critical aspect of wind energy, how to integrate the various engineering disciplines involved in offshore wind energy. Each week we will focus on a particular discipline and use it to design and operate a wind farm.

Offshore Wind Farm Technology - Course Introduction
I'm a researcher and lecturer at the Wind Energy and Economics Department and I will be your moderator throughout this course. That means I will answer any questions you may have. I'll strengthen the interactions between the participants and also I'll get you in touch with the lecturers when needed. The course is mainly developed for professionals in the field of offshore wind energy. We want to broaden their knowledge of the relevant technical disciplines and their integration. Professionals with a scientific background who are new to the field of offshore wind energy will benefit from a high-level insight into the engineering aspects of wind energy. Overall, the course will help you make the right choices during the development and operation of offshore wind farms.

Offshore Wind Farm Technology - Course Introduction
Designed wind turbines that better withstand wind, wave and current loadsIdentify great integration strategies for offshore wind turbines and gain understanding of the operational and maintenance of offshore wind turbines and farmsWe also hope that you will benefit from the course and from interaction with other learners who share your interest in wind energyAnd therefore we look forward to meeting you online.

We can now insert the top three reranked results into the prompt template:

prompt_for_llm = prompt_template.substitute(
    question = question,
    title_1 = reranked_results[0][0],
    segment_1 = reranked_results[0][1],
    title_2 = reranked_results[1][0],
    segment_2 = reranked_results[1][1],
    title_3 = reranked_results[2][0],
    segment_3 = reranked_results[2][1],
)

Print the resulting string to see the prompt that is actually sent to the LLM:

print(prompt_for_llm)
<s>[INST] Answer the question below only using the given context.
  The question from the user is based on transcripts of videos from a YouTube
    channel.
  The context is presented as a ranked list of information in the form of
    (video-title, transcript-segment), that is relevant for answering the
    user's question.
  The answer should only use the presented context. If the question cannot be
    answered based on the context, say so.

  Context:
  1. Video-title: Offshore Wind Farm Technology - Course Introduction, transcript-segment: Since the first offshore wind farm commissioned in 1991 in Denmark, scientists and engineers have adapted and improved the technology of wind energy to offshore conditions.  This is a rapidly evolving field with installation of increasingly larger wind turbines in deeper waters.  At sea, the challenges are indeed numerous, with combined wind and wave loads, reduced accessibility and uncertain-solid conditions.  My name is Axel Vire, I'm an assistant professor in Wind Energy at U-Delf and specializing in offshore wind energy.  This course will touch upon the critical aspect of wind energy, how to integrate the various engineering disciplines involved in offshore wind energy.  Each week we will focus on a particular discipline and use it to design and operate a wind farm.
  2. Video-title: Offshore Wind Farm Technology - Course Introduction, transcript-segment: For example, we look at how to characterize the wind and wave conditions at a given location.  How to best place the wind turbines in a farm and also how to retrieve the electricity back to shore.  We look at the main design drivers for offshore wind turbines and their components.  We'll see how these aspects influence one another and the best choices to reduce the cost of energy.  This course is organized by the two-delfd wind energy institute, an interfaculty research organization focusing specifically on wind energy.  You will therefore benefit from the expertise of the lecturers in three different faculties of the university.  Aerospace engineering, civil engineering and electrical engineering.  Hi, my name is Ricardo Pareda.
  3. Video-title: Systems Analysis for Problem Structuring part 1B the mono actor perspective example, transcript-segment: So let's assume the demarcation of the problem and the analysis of objectives has led to the identification of three criteria.  The security of supply, the percentage of offshore power generation and the costs of energy provision.  We now reason backwards to explore what factors have an influence on these system outcomes.  Really, the offshore percentage is positively influenced by the installed Wind Power capacity at sea, a key system factor.  Capacity at sea in turn is determined by both the size and the number of wind farms at sea.  The Ministry of Economic Affairs cannot itself invest in new wind farms but hopes to simulate investors and energy companies by providing subsidies and by expediting the granting process of licenses as needed.

  Question: When was the first offshore wind farm commissioned?

  Answer: [/INST]

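Pass this prompt to the LLM with the model_predictor.predict() method:

answer = model_predictor.predict({"inputs": prompt_for_llm})

This returns a list; since we passed in only one prompt, it contains a single entry. Each entry is a dictionary whose response text is stored under the key generated_text: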

answer = answer[0]['generated_text']
print(answer)

Result:

The first offshore wind farm was commissioned in 1991. (Context: Video-title: Offshore Wind Farm Technology - Course Introduction, transcript-segment: Since the first offshore wind farm commissioned in 1991 in Denmark, ...)
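The call above sends only the prompt, so generation uses the endpoint's default settings. To control answer length or sampling, you can usually pass generation parameters in the same request body. The sketch below assumes the Hugging Face text-generation-inference payload format commonly used by JumpStart LLMs (an "inputs" key plus a "parameters" dict); make_llm_payload and extract_answer are hypothetical helpers, and you should confirm the supported parameter names for your model version.

```python
from typing import Any, Dict, Optional

def make_llm_payload(prompt: str, max_new_tokens: int = 256,
                     temperature: Optional[float] = None) -> Dict[str, Any]:
    """Build a request body with the prompt plus generation parameters."""
    parameters: Dict[str, Any] = {"max_new_tokens": max_new_tokens}  # cap on answer length
    if temperature is not None:
        # Enable sampling with the given temperature (should be > 0)
        parameters["do_sample"] = True
        parameters["temperature"] = temperature
    return {"inputs": prompt, "parameters": parameters}

def extract_answer(response: Any) -> str:
    """Return generated_text from the list-of-dicts response shown above."""
    return response[0]["generated_text"].strip()

payload = make_llm_payload("<s>[INST] Say hello. [/INST]")
print(payload["parameters"])
# With the deployed model: extract_answer(model_predictor.predict(payload))
print(extract_answer([{"generated_text": " Hello! "}]))
```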

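Let's simplify querying by writing a function that performs all of these steps: it takes a question string as its argument and returns the answer as a string: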

def ask_rag(question):
    search_results = find_most_similar_transcript_segment(question)
    reranked_results = rerank_results(search_results, question)
    prompt_for_llm = prompt_template.substitute(
        question = question,
        title_1 = reranked_results[0][0],
        segment_1 = reranked_results[0][1],
        title_2 = reranked_results[1][0],
        segment_2 = reranked_results[1][1],
        title_3 = reranked_results[2][0],
        segment_3 = reranked_results[2][1],
    )
    answer = model_predictor.predict({"inputs": prompt_for_llm})
    return answer[0]["generated_text"]

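Now we can ask a few more questions. The answers depend on what the video transcripts contain. For example, when the answer is in the data, we can ask detailed questions and get answers: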

ask_rag("What is a Kaplan Meyer estimator?")
The Kaplan Meyer estimator is a non-parametric estimator for the survival
function, defined for both censored and not censored data. It is represented
as a series of declining horizontal steps that approaches the truths of the
survival function if the sample size is sufficiently large enough. The value
of the empirical survival function obtained is assumed to be constant between
two successive distinct observations.
ask_rag("Who is Reneville Solingen?")
Reneville Solingen is a professor at Delft University of Technology in Global
Software Engineering. She is also a co-author of the book "The Power of Scrum."
answer = ask_rag("What is the European Green Deal?")
print(answer)
The European Green Deal is a policy initiative by the European Union to combat
climate change and decarbonize the economy, with a goal to make Europe carbon
neutral by 2050. It involves the use of green procurement strategies in various
sectors, including healthcare, to reduce carbon emissions and promote corporate
social responsibility.

We can also ask questions that go beyond the available information:

ask_rag("What countries export the most coffee?")
Based on the context provided, there is no clear answer to the user's
question about which countries export the most coffee as the context
only discusses the Delft University's cafeteria discounts and sustainable
coffee options, as well as lithium production and alternatives for use in
electric car batteries.
ask_rag("How much wood could a woodchuck chuck if a woodchuck could chuck wood?")
The context does not provide sufficient information to answer the question.
The context is about thermit welding of rails, stress concentration factors,
and a lyrics video. There is no mention of woodchucks or the ability of
woodchuck to chuck wood in the context.

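Try your own queries. You can also change how the LLM is prompted and see whether that improves your results.

Clean up resources

AWS charges for the models you use and for how long the AWS infrastructure running them is up, so remember to shut down all three AI models when you finish this tutorial:

The embedding model: embedding_client
The reranker model: reranker_client
The large language model: model_predictor

To shut down all three models, run the following code: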

# shut down the embedding endpoint
embedding_client.delete_endpoint()
embedding_client.close()
# shut down the reranker endpoint
reranker_client.delete_endpoint()
reranker_client.close()
# shut down the LLM endpoint
model_predictor.delete_model()
model_predictor.delete_endpoint()
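If one of these calls raises an exception (for example, because an endpoint was already deleted), the remaining lines don't run and the other endpoints keep billing. A small, hypothetical helper that attempts every step and reports which ones failed; afterwards, it's still worth confirming in the SageMaker console that no endpoints remain.

```python
from typing import Callable, List, Tuple

def run_all_cleanup(steps: List[Tuple[str, Callable[[], object]]]) -> List[str]:
    """Run every shutdown step even if an earlier one fails; return the names that failed."""
    failed = []
    for name, step in steps:
        try:
            step()
            print(f"ok: {name}")
        except Exception as exc:  # keep going so no endpoint is left running
            print(f"FAILED: {name}: {exc}")
            failed.append(name)
    return failed

# Usage with the objects from this tutorial:
# run_all_cleanup([
#     ("embedding endpoint", embedding_client.delete_endpoint),
#     ("embedding client", embedding_client.close),
#     ("reranker endpoint", reranker_client.delete_endpoint),
#     ("reranker client", reranker_client.close),
#     ("LLM model", model_predictor.delete_model),
#     ("LLM endpoint", model_predictor.delete_endpoint),
# ])

# Offline demonstration with stand-in steps, one of which fails
def broken():
    raise RuntimeError("endpoint not found")

failed = run_all_cleanup([("first", lambda: None), ("second", broken), ("third", lambda: None)])
print(failed)  # ['second']
```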

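Explore Jina AI models on AWS Marketplace today

On AWS Marketplace, Jina AI offers several embedding and reranker models. Enterprise users can start using our models right away while benefiting from the secure, stable, consistent, and cost-controlled cloud services that AWS provides.

We are committed to building practical AI tools and top-quality models that are easy to adopt and easy to integrate, so that enterprises can minimize investment and maximize returns.

Want to see what Jina AI can do for you? Go to AWS Marketplace, pick the models you need, and start a 7-day free trial.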
