AI知识库

53AI知识库

学习大模型的前沿技术与行业应用场景


医学GraphRAG案例研究:将医生记录转换为医学时序知识图谱
发布日期:2024-11-27 11:40:17 浏览次数: 1526 来源:知识图谱科技


摘要

该案例研究描述了如何将医生的病历转录转换为医学记录时序知识图谱,以进行更复杂的医疗数据分析和问题解答,并强调了WhyHow.AI的平台在该过程中的独特架构和优势。

Key Takeaways:

* 本研究使用合成医疗转录数据构建了一个时序医学记录知识图谱。

* 该知识图谱包含五种节点类型(病人、观察、免疫接种、病情和遭遇类型)和六种文本块类型,这些文本块包含与三元组相关的关键信息摘要。

* WhyHow.AI平台的独特之处在于其向量搜索、独立的三元组对象和JSON数据格式支持,这使得构建和查询时间知识图谱更加高效和准确。

* 该系统能够回答关于病人治疗、历史病历分析等复杂问题,并展示了其在多病人分析和跨多份转录本分析中的优势。

* 通过Claude等LLM工具,可以将非结构化文本数据转换为WhyHow.AI平台可接受的JSON格式,方便知识图谱构建。

* 与仅限向量的RAG系统相比,该方法在多病人分析和跨多份病历分析中具有显著优势。

*  整个过程耗时25个开发小时,其中大部分时间用于迭代完善知识图谱的模式。

正文

有兴趣将医生/患者的医疗记录和成绩单转换为时序知识图谱,以便您可以跨多个病史、时间段和患者提出复杂的问题吗?

在本案例研究中,我们展示了如何将医疗成绩单转换为您可以依赖用于 RAG 和分析目的的时序知识图谱。我们展示了针对这个系统的真实问答是什么,以及你可以通过这个系统实现什么样的商业结果。据我们所知,这里的步骤组合是一种相对较新的知识图谱实现。

使用的数据

出于数据隐私原因,我们使用了根据 Synthea 数据创建的医学成绩单合成数据集:https://synthea.mitre.org/downloads。以下是用作知识图谱创建输入数据的医学转录文本之一的示例。我们将这些转录数据与 Synthea 数据中的结构化医疗记录相结合。我们有 ~75 个转录本,涵盖 10 名患者(即每个患者有 5-10 个转录本)。以下是使用的成绩单示例:

新颖的知识图谱架构概述

节点:

我们有 5 种类型的节点:患者、观察、免疫、条件和遭遇类型

Triples 三元组 (样本列表):

患者 -> 遭遇 -> 遭遇

患者 -> 患有 -> 状况

患者 -> 接受 -> 免疫接种

患者 -> 进行测量 -> 观察

块:

块是作为独立对象的文本块。Chunk 与每个 Triple 相关联,并且可以有许多 Chunk 与单个 Triple 相关联。在这种情况下,Chunk 不是 Triple 的非结构化源,而是与每个 Triple 类型相关的摘要和关键点。因此,我们有 6 种类型的块:- 患者人口统计块、病情摘要块、访问块、观察块、免疫块和条件详细信息块。

与三元组关联的不同类型的 chunk 的示例如下所示:

1. Patient -> EncounterType
Triple: (Patient) -[had_encounter]-> (EncounterType)
- Chunk_ids link to specific visit instances
- Example Chunk: "Annual physical on 2024–01–15. BP 120/80, routine screenings
updated."


2. Patient -> Condition
Triple: (Patient) -[has_condition]-> (Condition)
- Chunk_ids link to condition episodes
- Example Chunk: "Diagnosed with hypertension on 2020–03–10. Status: active.
Managed with medication."


3. Patient -> Immunization
Triple: (Patient) -[received]-> (Immunization)
- Chunk_ids link to administration records
- Example Chunk: "Influenza vaccine administered on 2024–01–15."

4. Patient -> Observation
Triple: (Patient) -[has_measurement]-> (Observation)
- Chunk_ids link to measurement instances
- Example Chunk: "2024–01–15: Blood Pressure 120/80 mmHg, Weight 70kg."

链接到创建的图表:

https://main--whyhowai.netlify.app/public/graph/673032011997e08c8849316c

使用这种特定的图形架构,您可以将关键点和摘要与三元组相关联,然后您可以专注于通过非结构化搜索找到正确的三元组,然后以结构化的方式通过链接的块引入所有相关的关键信息。

WhyHow 图谱技术架构

WhyHow 图谱基础设施有一些独特之处,使我们能够以简单的方式构建此架构。

首先,通过向量搜索嵌入和检索 Triples,避免了必须使用 Text2Cypher 来识别节点、关系,然后构建 Cypher 查询才能找到正确的 Triple 的常见检索问题。事实证明,这可以显著提高检索准确性高达 3 倍。

其次,Triples 是 WhyHow 中的独立对象,你可以将 chunk 链接到它。这样,您就可以提取每个 Triple 要检索的关键信息,并在找到正确的 Triple 后将其直接引入上下文。这避免了必须以图形格式表示关键信息和上下文(使架构构建过程复杂化),并在初始非结构化向量搜索后以结构化方式引入信息。这在过程中类似于 LinkedIn 在其系统中应用知识图谱,其中“重现步骤”等关键信息以类似的方式表示和检索,而步骤本身则表示为单独的“块”/“节点”。

第三,WhyHow 接受 JSON 格式的数据,这允许任何提取框架之间直接无缝交互到图形创建中。在本例中,我们使用 Claude 将转录数据初始转换为必要的 JSON 结构,以加载到 WhyHow 中。如果你已经有 JSON 格式的信息,那么将数据加载到 WhyHow 中会容易得多。

第四,由于 Chunks 和检索过程在 WhyHow 系统中的设计方式,您可以轻松包含可用于控制答案构建方式的临时数据。时态数据在知识图谱中一直是一个很难建模的东西(以至于领先的 KG 专家通常建议不要这样做),但它显然是工作流程的重要组成部分。现有的方法甚至尝试对时态数据进行建模,都会尝试将其摄取到知识图谱本身中,然后根据结构化的 Cypher 查询进行检索,而我们的架构则独特地使用 LLM 来帮助筛选时态数据。

将 LLM 的强大功能与知识图谱等结构化知识表示相结合是实现业务成果的重要方式,我们认为这种时态知识图谱架构将有助于通过成功实施时态数据来释放大量商业价值。

使用的数据转换过程

首先,我们使用 Claude 将 transcript 信息转换为基于每个 transcript 的架构对齐信息集。除了来自结构化医疗记录的信息外,成绩单还转换为 JSON 摘要,如下所示:

PATIENT SUMMARY
Name: Joseph Crona
DOB: 2022–08–29
Age: 2 years
Gender: male
MRN: #dbfbaa

CURRENT MEASUREMENTS (as of 2024–08–05)
Height: 84.1cm (50th percentile)
Weight: 14.5kg (52nd percentile)
ALLERGIES
No known allergies

IMMUNIZATIONS
- DTaP: 2022–12–05, 2023–02–06, 2023–03–06, 2024–02–05
- Hepatitis A: 2023–11–06
- Hepatitis B: 2022–08–29, 2022–10–03, 2023–03–06
- Hib: 2022–12–05, 2023–02–06, 2023–11–06
- Influenza: 2023–03–06, 2024–08–05
- MMR: 2023–11–06
- PCV13: 2022–12–05, 2023–02–06, 2023–03–06, 2023–11–06
- Polio: 2022–12–05, 2023–02–06, 2023–03–06
- Rotavirus: 2022–12–05, 2023–02–06
- Varicella: 2023–11–06

MEDICAL HISTORY
- Viral sinusitis (disorder)
Onset: 2023–03–13
Status: resolved
Outcome: Resolved

GROWTH & DEVELOPMENT
- 2023–11–06: Body Weight: 12.7 kg
- 2024–02–05: Body Height: 79 cm
- 2024–02–05: Body Weight: 13.4 kg
- 2024–08–05: Body Height: 84.1 cm
- 2024–08–05: Body Weight: 14.5 kg
Development: Age-appropriate milestones met
- Gross motor: Age appropriate
- Fine motor: Age appropriate
- Language: Age appropriate
- Social: Age appropriate

PREVENTIVE CARE
Well-Child Visits:
- 2024–08–05: 2yo well visit - Development on track
- 2024–02–05: 1yo well visit - Development on track
- 2023–11–06: 1yo well visit - Development on track
- 2023–08–07: 1yo well visit - Development on track
- 2023–05–08: 9mo well visit - Age appropriate exam completed
- 2023–02–06: 6mo well visit - Age appropriate exam completed
- 2022–12–05: 4mo well visit - Age appropriate exam completed
- 2022–10–03: 2mo well visit - Age appropriate exam completed
- 2022–08–29: Newborn visit - Normal exam

FAMILY HISTORY
Mother: Healthy
Father: Healthy
Siblings: None documented

SOCIAL HISTORY
Living Situation: Lives with parents
Development: Meeting age-appropriate milestones
Sleep: Age-appropriate pattern
Nutrition: Age-appropriate diet

其次,我们将此 JSON 架构映射到 WhyHow 架构中,然后将所有信息导入到 WhyHow.AI KG Studio 中。

下面是最终加载到 WhyHow 中的 KG 结构的示例。

Knowledge Graph Structure (Timeless):


Nodes:
1. Patient Node
Structure: {
name: str, # "John Smith"
label: "Patient",
properties: {
gender: str, # FHIR gender
patient_type: str # "adult" | "pediatric"
},
chunk_ids: List[str] # Links to demographic chunks
}


2. EncounterType Node
Structure: {
name: str, # "Well-child visit" | "Annual physical"
label: "EncounterType",
properties: {
category: str, # "preventive" | "acute" | "chronic"
specialty: str # "primary_care" | "pediatrics" | "emergency"
},
chunk_ids: List[str] # Links to visit pattern chunks
}


3. Condition Node
Structure: {
name: str, # "Essential hypertension"
label: "Condition",
properties: {
category: str, # "chronic" | "acute" | "resolved"
system: str, # "respiratory" | "cardiovascular" | etc
is_primary: bool # True if primary diagnosis
},
chunk_ids: List[str] # Links to condition history chunks
}


4. Immunization Node
Structure: {
name: str, # "DTaP" | "MMR"
label: "Immunization",
properties: {
series: str, # "primary" | "booster"
target: str # "tetanus" | "measles" | etc
},
chunk_ids: List[str] # Links to immunization records
}


5. Observation Node
Structure: {
name: str, # "Blood Pressure" | "Height"
label: "Observation",
properties: {
category: str, # "vital" | "lab" | "growth"
unit: str # "mmHg" | "cm" | etc
},
chunk_ids: List[str] # Links to measurement records
}


Relations:
1. Patient -> EncounterType
Triple: (Patient) -[had_encounter]-> (EncounterType)
- Chunk_ids link to specific visit instances


2. Patient -> Condition
Triple: (Patient) -[has_condition]-> (Condition)
- Chunk_ids link to condition episodes


3. Patient -> Immunization
Triple: (Patient) -[received]-> (Immunization)
- Chunk_ids link to administration records


4. Patient -> Observation
Triple: (Patient) -[has_measurement]-> (Observation)
- Chunk_ids link to measurement instances


5. Condition -> EncounterType
Triple: (Condition) -[managed_in]-> (EncounterType)
- Links conditions to typical encounter types


6. Immunization -> EncounterType
Triple: (Immunization) -[given_during]-> (EncounterType)
- Links vaccines to visit types

然后,我们运行一个自定义提示,在每次自然语言查询后,将从知识图谱中检索到的三元组置于上下文中。

有了这个架构,一件有趣的事情是,我们现在可以轻松地继续将有关患者就诊、患者治疗和病情的信息添加到知识图谱中,因为只需向现有的三元组添加额外的块即可。如果 Patient 患有新疾病,则会将其他 Condition 节点添加到 Patient 节点。

此过程花费了 25 个开发小时,可分为以下几部分:

  • 2 小时 (8%) 用于查找和理解数据(探索性数据分析)

  • 18 小时 (72%) 用于迭代架构,并弄清楚图中应该包含哪些节点,哪些节点应该连接到什么,应该存在哪些块,它应该如何连接到各种三元组,用一组问题测试检索到的答案,并相应地迭代。

  • 2 小时 (8%) 用于编写代码以创建要加载的三元组集

  • 编写验证检查和输出检查花费了 3 小时 (12%) 来捕获任何错误

针对医疗记录知识图谱的问答

使用 WhyHow 自然语言图查询引擎从知识图谱中检索相关上下文后,用于构建答案的提示

    You are an AI assistant specializing in medical records analysis. 
Use the following information to answer the user's question.
The information is derived from a knowledge graph of patient medical records.

Relevant Nodes (these represent patients, encounters, and conditions):
{
node_context}

Relevant Relationships (these show connections between patients,
encounters, and conditions)
:
{
triple_context}

Relevant Medical Records:
{chunk_context}

Answer the question using the following steps:
1. Analyze the medical information, focusing on the specific topic of the
question
2. Identify key medical events, conditions, and temporal relationships
3. Formulate a clear and concise answer that directly addresses the
question
4. If there are any uncertainties in the medical information, mention
them explicitly
5. If you don't have enough information to answer confidently, say so
6. Consider temporal relationships between medical events
7. It's currently {datetime.now().strftime("%Y-%m-%d")}. Consider this
when interpreting dates
8. Maintain patient privacy by using only first names or initials when
necessary

Question: {question}

商业价值:患者治疗分析

历史患者分析

问题:Joseph Crona 在第一年接受了哪些类型的儿童健康检查和免疫接种?

答:

To determine the types of well-child visits and immunizations Joseph Crona 
received in his first year, we need to consider the relevant medical events
and their timing. Joseph was born in 2022, and it is currently November 9, 2024.
Therefore, his first year would span from 2022 to 2023.

**Well-Child Visits:**
- Joseph had an encounter categorized as a "Well child visit," which falls
under preventive care in pediatrics. This type of visit is typically part of
routine check-ups for children, especially in their first year.

**Immunizations:**
During his first year, Joseph received the following immunizations, all of
which are common for pediatric patients:
1. **Hep B, adolescent or pediatric** - Completed
2. **Hib (PRP-OMP)** - Completed
3. **Rotavirus, monovalent** - Completed
4. **IPV (Inactivated Poliovirus Vaccine)** - Completed
5. **DTaP (Diphtheria, Tetanus, and Pertussis)** - Completed
6. **Pneumococcal conjugate PCV 13** - Completed

These immunizations are typically administered during the first year of life
as part of the standard immunization schedule for infants.

In summary, Joseph Crona received a well-child visit and completed a series of
standard pediatric immunizations during his first year.

问题:对于中耳炎患者,记录的就诊和观察顺序是怎样的?

答:

To address the question regarding the sequence of encounters and observations 
for patients with otitis media, we will focus on the medical records of Miles
Dickinson, as he is the patient identified with this condition.

1. **Condition Onset:**
- Miles Dickinson was diagnosed with otitis media on June 10, 2024. The
condition is currently resolved.

2. **Encounters:**
- Miles had two types of encounters recorded:
- **Well Child Visit:** This is categorized as a preventive encounter in
pediatrics.
- **Encounter for Symptom:** This is categorized as a follow-up encounter in
primary care.

3. **Observations:**
- Various observations were recorded for Miles, including:
- **Pain Severity:** Recorded multiple times with scores ranging from 0 to 4.
- **Body Weight:** Recorded with values ranging from 3.2 kg to 8.2 kg.
- **Head Occipital-frontal Circumference and Percentile:** Recorded with
specific values in cm and percentiles.
- **Heart Rate:** Recorded with values ranging from 61 to 100 beats per minute.
- **Respiratory Rate:** Recorded with values ranging from 12 to 16 breaths per
minute.

4. **Temporal Sequence:**
- The otitis media diagnosis occurred on June 10, 2024. Observations and
encounters before this date may have been related to other conditions or
routine check-ups. Observations and encounters after this date could be
related to follow-up care for otitis media or other health assessments.

5. **Conclusion:**
- The sequence of encounters and observations for Miles Dickinson includes
preventive and follow-up encounters, along with various health observations.
The specific sequence related to otitis media is not detailed in terms of exact
dates for each observation or encounter, but the diagnosis date provides a
temporal marker for understanding subsequent medical activities.

If more detailed temporal data on each observation and encounter were
available, a more precise sequence could be established. However, based on
the current information, this is the overview of the sequence related to
otitis media for Miles Dickinson.

历史多患者分析

问题:有多少患者被诊断出患有急性支气管炎,典型的发病模式是什么?

答:

Based on the provided medical records, there are two patients diagnosed with 
acute bronchitis: Maximo Swaniawski and Bennie Herman.

### Onset Patterns:
- **Maximo Swaniawski**:
- Onset dates for acute bronchitis:
- October 24, 2014
- November 7, 2015
- November 29, 2018
- **Bennie Herman**:
- Onset dates for acute bronchitis:
- October 24, 2014
- November 7, 2015
- November 29, 2018

### Typical Onset Patterns:
The onset dates for both patients are identical, suggesting a pattern where
acute bronchitis episodes occurred in late October or November. This could
indicate a seasonal pattern, possibly related to colder weather or other
seasonal factors that might contribute to respiratory conditions.

Both patients have had their conditions resolved, indicating that the episodes
of acute bronchitis were acute and not chronic.

问题:哪些患者同时患有病毒性鼻窦炎和支气管炎,他们最早的发病日期是什么时候?

答:

To determine which patients had both viral sinusitis and bronchitis, 
we need to analyze the conditions associated with each patient and their
onset dates.

1. **Maximo Swaniawski**:
- Conditions: Acute bronchitis
- Earliest onset of bronchitis: 2014–10–24
2. **Cassandra Monahan**:
- Conditions: Viral sinusitis
- Earliest onset of sinusitis: 2014–12–12
3. **Bennie Herman**:
- Conditions: Acute bronchitis
- Earliest onset of bronchitis: 2014–10–24
4. **Deangelo Price**:
- Conditions: Viral sinusitis
- Earliest onset of sinusitis: 2014–12–12
5. **Joseph Crona**:
- Conditions: Viral sinusitis
- Earliest onset of sinusitis: 2014–12–12

From the information provided, none of the patients have both viral sinusitis
and bronchitis. Each patient has either one condition or the other, but not
both. Therefore, there are no patients with both conditions, and consequently,
no earliest onset dates for both conditions in a single patient.

与仅 Vector RAG 系统相比的优势

针对多个患者的卓越分析:

  • 在关于哪些患者同时患有支气管炎和病毒性鼻窦炎的问题中,我们必须能够知道患者是否被记录为患有或没有特定疾病。由于向量检索是关于识别相关块的,因此它无法检测患者是否绝对没有特定疾病。这意味着需要一个中间数据聚合,可以明确地表明患者 X 的名称没有 'diabetes' 节点。

多转录本分析和 RAG:

  • 在关于 Miles 和他在过去一年中的多次就诊的问题中,我们可以看到条件是跨多次就诊和多个成绩单的。这意味着需要一个中间数据聚合来映射 Patient 及其随时间推移产生的累积访问和观察。


参考文献:

  1. MDAgents:一个用于医疗决策的自适应大模型多智能体 - MIT&Google等

  2. 通过“AI科学家”智能体赋能生物医学科学发现 - 哈佛医学院等

  3. 斯坦福&哈佛医学院 - MMedAgent,一个用于医疗领域的多模态医疗AI智能体

  4. 喜讯|柯基数据中标两个“大模型+医学”国自然面上项目
  5. 哈佛医学院&辉瑞推出基于知识图谱的复杂医学问答智能体MedAI

  6. 通过知识图谱自动生成和丰富加速医学知识发现 - 哈佛大学等

  7. 医疗保健和医学领域的大模型综述 - 斯坦福&加州大学
  8. 医学GraphRAG:通过知识图谱检索增强实现安全医疗大语言模型 - 牛津大学最新论文
  9. 消除幻觉的知识图谱增强医学大模型 - "Nature"NPJ数字医学杂志
  10. Almanac: 一种用于临床医学的检索增强RAG大语言模型(2023vs2024版)
  11. “大模型+知识图谱”双轮驱动的医药数智化转型新范式-OpenKG TOC专家谈

  12. 医学AI专家Anthropic CEO万字长文预测人工智能将消除癌症、人类寿命翻倍,世界变得更美好

  13. 医疗保健和医学领域的大模型综述 - 斯坦福&加州大学

  14. OpenAI o1模型的医学初步研究:我们离人工智能医生更近了吗?

  15. 哈佛医学院将生成式人工智能纳入课程和临床实践,以培训下一代医生


53AI,企业落地应用大模型首选服务商

产品:大模型应用平台+智能体定制开发+落地咨询服务

承诺:先做场景POC验证,看到效果再签署服务协议。零风险落地应用大模型,已交付160+中大型企业

联系我们

售前咨询
186 6662 7370
预约演示
185 8882 0121

微信扫码

与创始人交个朋友

回到顶部

 
扫码咨询