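In vector retrieval, Top K means the retriever returns the K text chunks most similar to the query, and the score threshold filters out chunks whose similarity score falls below it. Rerank reorders the retrieved chunks; Dify supports several rerank models, such as Cohere's.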
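
To make the three knobs concrete, here is a minimal, generic sketch in plain Python. It is not Dify's internal code. It assumes chunks arrive as (text, embedding) pairs, uses cosine similarity as the score, and takes a hypothetical `rerank(query, chunk)` function standing in for a rerank model such as Cohere's. Applying them in the order threshold, then rerank, then Top K is an assumption for illustration.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Chunk = Tuple[str, Sequence[float]]  # (chunk text, embedding vector)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    chunks: List[Chunk],
    top_k: int = 5,
    score_threshold: float = 0.0,
    rerank: Optional[Callable[[str, str], float]] = None,  # hypothetical reranker
    query_text: str = "",
) -> List[Tuple[str, float]]:
    # 1. Score every chunk by vector similarity to the query.
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    # 2. Score threshold: drop chunks that are not similar enough.
    scored = [(text, score) for text, score in scored if score >= score_threshold]
    # 3. Rerank (optional): reorder with the second scorer, otherwise by similarity.
    #    The returned score is still the similarity score.
    if rerank is not None:
        scored.sort(key=lambda item: rerank(query_text, item[0]), reverse=True)
    else:
        scored.sort(key=lambda item: item[1], reverse=True)
    # 4. Top K: keep only the K best chunks.
    return scored[:top_k]
```

The Knowledge Retrieval node in the workflow below uses `top_k: 5` and `score_threshold: 0.8`, which correspond to the `top_k` and `score_threshold` arguments here.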

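I tried jinareader and Web Scraper. Comparing the two, the results were mixed: for some sites the scraped content was sufficient, while for others it pulled in a lot of irrelevant content.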

For the chart I tried the line, bar, and pie chart tools; the local project uses the pie chart.

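Input data: 2,3,4; 20,18,32          # separate groups with half-width (ASCII) semicolons, and values within a group with half-width commas

Input axis: tag1;tag2;tag3;   # separated by half-width (ASCII) semicolons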
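
The two input strings can be produced and checked with small helpers. This is a sketch of the format as described above (groups split on `;`, values split on `,`), not the chart tool's own parser; `format_data`, `parse_data`, `format_axis` and `parse_axis` are hypothetical names. Because `;` is the separator, a label that contains it would silently become two labels, so `format_axis` rejects it.

```python
from typing import List, Sequence


def format_data(groups: Sequence[Sequence[float]]) -> str:
    """[[2, 3, 4], [20, 18, 32]] -> '2,3,4;20,18,32'"""
    return ";".join(",".join(str(v) for v in group) for group in groups)


def parse_data(text: str) -> List[List[float]]:
    """'2,3,4; 20,18,32' -> [[2.0, 3.0, 4.0], [20.0, 18.0, 32.0]]"""
    return [
        [float(v) for v in group.split(",") if v.strip()]  # float() ignores surrounding spaces
        for group in text.split(";")
        if group.strip()  # skip empty groups, e.g. from a trailing ';'
    ]


def format_axis(labels: Sequence[str]) -> str:
    """['tag1', 'tag2'] -> 'tag1;tag2'"""
    for label in labels:
        if ";" in label:
            raise ValueError(f"axis label must not contain ';': {label!r}")
    return ";".join(labels)


def parse_axis(text: str) -> List[str]:
    """'tag1;tag2;tag3;' -> ['tag1', 'tag2', 'tag3'] (a trailing ';' is ignored)"""
    return [t.strip() for t in text.split(";") if t.strip()]
```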

1. Workflow1 DSL: classifying websites

```yaml
app:
  description: websiteCat
  icon: "\U0001F916"
  icon_background: '#FFEAD5'
  mode: workflow
  name: websiteCat
workflow:
  features:
    file_upload:
      image:
        enabled: false
        number_limits: 3
        transfer_methods:
        - local_file
        - remote_url
    opening_statement: ''
    retriever_resource:
      enabled: false
    sensitive_word_avoidance:
      enabled: false
    speech_to_text:
      enabled: false
    suggested_questions: []
    suggested_questions_after_answer:
      enabled: false
    text_to_speech:
      enabled: false
      language: ''
      voice: ''
  graph:
    edges:
    - data:
        sourceType: start
        targetType: tool
      id: 1717578663813-1717579685305
      source: '1717578663813'
      sourceHandle: source
      target: '1717579685305'
      targetHandle: target
      type: custom
    - data:
        sourceType: tool
        targetType: llm
      id: 1717579685305-1717579853399
      source: '1717579685305'
      sourceHandle: source
      target: '1717579853399'
      targetHandle: target
      type: custom
    - data:
        sourceType: llm
        targetType: end
      id: 1717579967241-1717579117964
      source: '1717579967241'
      sourceHandle: source
      target: '1717579117964'
      targetHandle: target
      type: custom
    - data:
        sourceType: llm
        targetType: knowledge-retrieval
      id: 1717579853399-1717580066887
      source: '1717579853399'
      sourceHandle: source
      target: '1717580066887'
      targetHandle: target
      type: custom
    - data:
        sourceType: knowledge-retrieval
        targetType: llm
      id: 1717580066887-1717579967241
      source: '1717580066887'
      sourceHandle: source
      target: '1717579967241'
      targetHandle: target
      type: custom
    nodes:
    - data:
        desc: ''
        selected: false
        title: Start
        type: start
        variables:
        - label: urls
          max_length: 256
          options: []
          required: true
          type: text-input
          variable: urls
      height: 90
      id: '1717578663813'
      position:
        x: 112.4877286321854
        y: 197.44333138123648
      positionAbsolute:
        x: 112.4877286321854
        y: 197.44333138123648
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        desc: ''
        outputs:
        - value_selector:
          - '1717579967241'
          - text
          variable: text
        selected: false
        title: End
        type: end
      height: 90
      id: '1717579117964'
      position:
        x: 799.2832195913106
        y: -74.46777660074163
      positionAbsolute:
        x: 799.2832195913106
        y: -74.46777660074163
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        desc: "\u6293\u53D6\u7F51\u7AD9\u4FE1\u606F"
        provider_id: google
        provider_name: google
        provider_type: builtin
        selected: true
        title: GoogleSearch
        tool_configurations:
          result_type: link
        tool_label: GoogleSearch
        tool_name: google_search
        tool_parameters:
          query:
            type: mixed
            value: '{{#1717578663813.urls#}}'
        type: tool
      height: 120
      id: '1717579685305'
      position:
        x: 198.59871386335033
        y: -74.46777660074163
      positionAbsolute:
        x: 198.59871386335033
        y: -74.46777660074163
      selected: true
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        context:
          enabled: true
          variable_selector:
          - '1717579685305'
          - text
        desc: "\u603B\u7ED3\u7F51\u7AD9\u5173\u952E\u8BCD"
        model:
          completion_params:
            temperature: 0
          mode: chat
          name: moonshot-v1-8k
          provider: moonshot
        prompt_template:
        - id: 4aa3aab4-11f5-41dc-a382-dd9831658f78
          role: system
          text: "Please extract the corresponding title or description keywords based\
            \ on the content of {{#1717579685305.text#}}\n\n1\uFF09Note. If this website\
            \ sells many different categories of things, it can be classified as shopping\n\
            2) Extract no more than 3 keywords, and the output format is:\n xx, xx,\
            \ xx\n"
        - id: dcd22bf7-6d91-4e6e-a843-ed266659d8d7
          role: user
          text: /
        selected: false
        title: LLM
        type: llm
        variables: []
        vision:
          enabled: false
      height: 128
      id: '1717579853399'
      position:
        x: 463.64815768773747
        y: 197.44333138123648
      positionAbsolute:
        x: 463.64815768773747
        y: 197.44333138123648
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        context:
          enabled: true
          variable_selector:
          - '1717580066887'
          - result
        desc: "\u6DA6\u8272\u7ED9\u51FA\u6700\u7EC8\u7684\u7F51\u7AD9\u6807\u7B7E"
        model:
          completion_params:
            temperature: 0
          mode: chat
          name: moonshot-v1-8k
          provider: moonshot
        prompt_template:
        - id: 0bb1c5a6-930e-4f74-be63-06474a77fe48
          role: system
          text: 'Please combine the website''s keywords{{#1717579853399.text#}} with
            the classification labels/proofreading retrieved from the knowledge base
            to provide the most suitable classification labels retrieved from the
            knowledge base{{#context#}},
            Attention:
            1) This tag should come from a tag retrieved from the knowledge base.
            Please select only the tag that best expresses this website and output
            only one category tag
            2) If you don''t know which category it is, just output Uncategorized
            3) Output only needs to output the website category  without detailed
            explanation,without ID
            output  format:
            xx'
        selected: false
        title: LLM 2
        type: llm
        variables: []
        vision:
          enabled: false
      height: 128
      id: '1717579967241'
      position:
        x: 799.2832195913106
        y: 197.44333138123648
      positionAbsolute:
        x: 799.2832195913106
        y: 197.44333138123648
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        dataset_ids:
        - 2307d992-5284-4c0b-b270-8436f497bfea
        desc: "\u68C0\u7D22\u5DF2\u6709\u7684\u7F51\u7AD9\u5206\u7C7B\u6807\u7B7E"
        multiple_retrieval_config:
          reranking_model:
            model: rerank-english-v2.0
            provider: cohere
          score_threshold: 0.8
          top_k: 5
        query_variable_selector:
        - '1717579853399'
        - text
        retrieval_mode: single
        selected: false
        single_retrieval_config:
          model:
            completion_params: {}
            mode: chat
            name: moonshot-v1-8k
            provider: moonshot
        title: Knowledge Retrieval
        type: knowledge-retrieval
      height: 122
      id: '1717580066887'
      position:
        x: 503.45190546262575
        y: -74.46777660074163
      positionAbsolute:
        x: 503.45190546262575
        y: -74.46777660074163
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    viewport:
      x: 40.67810562789691
      y: 254.34055609621913
      zoom: 0.6607535491528895
```
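
The DSL lists its edges out of execution order (the `LLM 2 -> End` edge appears third), so the pipeline is hard to read off the file. As a reading aid, the sketch below derives the node order from the edges with `graphlib.TopologicalSorter` (Python 3.9+). The node ids, titles and edges are copied from the Workflow1 DSL above; this is only an illustration and says nothing about how Dify's engine schedules nodes.

```python
from graphlib import TopologicalSorter

# Node ids and titles copied from the Workflow1 DSL
NODES = {
    "1717578663813": "Start",
    "1717579685305": "GoogleSearch",
    "1717579853399": "LLM",
    "1717580066887": "Knowledge Retrieval",
    "1717579967241": "LLM 2",
    "1717579117964": "End",
}

# (source, target) pairs from graph.edges, in the order they appear in the file
EDGES = [
    ("1717578663813", "1717579685305"),
    ("1717579685305", "1717579853399"),
    ("1717579967241", "1717579117964"),
    ("1717579853399", "1717580066887"),
    ("1717580066887", "1717579967241"),
]


def execution_order(nodes, edges):
    """Return node titles so that every node comes after all of its upstream nodes."""
    deps = {node_id: set() for node_id in nodes}  # node -> set of predecessors
    for src, dst in edges:
        deps[dst].add(src)
    return [nodes[n] for n in TopologicalSorter(deps).static_order()]


print(" -> ".join(execution_order(NODES, EDGES)))
# Start -> GoogleSearch -> LLM -> Knowledge Retrieval -> LLM 2 -> End
```

Read this way, the flow is: search the site, have LLM extract up to 3 keywords, retrieve candidate category tags from the knowledge base with those keywords, and let LLM 2 pick a single tag.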

2. Workflow2 DSL: drawing a pie chart of the category counts

```yaml
app:
  description: Batch&Visusalization
  icon: "\U0001F916"
  icon_background: '#FFEAD5'
  mode: workflow
  name: Batch&Visusalization
workflow:
  features:
    file_upload:
      image:
        enabled: false
        number_limits: 3
        transfer_methods:
        - local_file
        - remote_url
    opening_statement: ''
    retriever_resource:
      enabled: false
    sensitive_word_avoidance:
      enabled: false
    speech_to_text:
      enabled: false
    suggested_questions: []
    suggested_questions_after_answer:
      enabled: false
    text_to_speech:
      enabled: false
      language: ''
      voice: ''
  graph:
    edges:
    - data:
        sourceType: start
        targetType: tool
      id: 1717587078340-1717596240088
      source: '1717587078340'
      sourceHandle: source
      target: '1717596240088'
      targetHandle: target
      type: custom
    - data:
        sourceType: tool
        targetType: end
      id: 1717596240088-1717587757492
      source: '1717596240088'
      sourceHandle: source
      target: '1717587757492'
      targetHandle: target
      type: custom
    nodes:
    - data:
        desc: ''
        selected: false
        title: Start
        type: start
        variables:
        - label: tags
          max_length: 33024
          options: []
          required: true
          type: paragraph
          variable: tags
        - label: count
          max_length: 256
          options: []
          required: true
          type: text-input
          variable: count
      height: 116
      id: '1717587078340'
      position:
        x: 92.06703286554381
        y: 80
      positionAbsolute:
        x: 92.06703286554381
        y: 80
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        desc: ''
        provider_id: chart
        provider_name: chart
        provider_type: builtin
        selected: false
        title: Pie Chart
        tool_configurations: {}
        tool_label: Pie Chart
        tool_name: pie_chart
        tool_parameters:
          categories:
            type: mixed
            value: '{{#1717587078340.tags#}}'
          data:
            type: mixed
            value: '{{#1717587078340.count#}}'
        type: tool
      height: 54
      id: '1717596240088'
      position:
        x: 413.7520931625256
        y: 162.0976621428633
      positionAbsolute:
        x: 413.7520931625256
        y: 162.0976621428633
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        desc: ''
        outputs:
        - value_selector:
          - '1717596240088'
          - files
          variable: text
        selected: false
        title: End
        type: end
      height: 90
      id: '1717587757492'
      position:
        x: 663.9837808989655
        y: 169.00026104872103
      positionAbsolute:
        x: 663.9837808989655
        y: 169.00026104872103
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    viewport:
      x: -76.8851566982803
      y: 181.9020021881131
      zoom: 1.0705699835378846
```
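
In this workflow the Start node's `tags` feeds the pie chart's `categories` and `count` feeds its `data`, so both strings must list the same number of items in the same order. Below is a sketch of a helper that builds both from a list of category labels returned by the classification workflow. `build_chart_inputs` is a hypothetical name; the whitespace stripping, the `;` check and the length check are assumptions added for robustness. The 256 limit is copied from the `count` variable's `max_length` in the DSL above, and an over-long value may be rejected by the app.

```python
from collections import Counter
from typing import Dict, Iterable

# max_length of the `count` text-input in the Start node of the DSL above
COUNT_MAX_LENGTH = 256


def build_chart_inputs(categories: Iterable[str]) -> Dict[str, str]:
    """Count category labels and return the `tags` / `count` inputs for the chart workflow."""
    # Strip whitespace so 'Shopping' and 'Shopping\n' count as one label; skip blanks.
    counts = Counter(c.strip() for c in categories if c and c.strip())
    ordered = counts.most_common()  # highest count first; ties keep first-seen order
    for label, _ in ordered:
        if ";" in label:
            # ';' is the item separator, so this label would be split into two slices
            raise ValueError(f"category label contains ';': {label!r}")
    inputs = {
        "tags": ";".join(label for label, _ in ordered),
        "count": ";".join(str(n) for _, n in ordered),
    }
    if len(inputs["count"]) > COUNT_MAX_LENGTH:
        raise ValueError("count string is longer than the Start node's max_length")
    return inputs
```

For example, `build_chart_inputs(["Shopping", "News", "Shopping"])` gives `{"tags": "Shopping;News", "count": "2;1"}`.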

3. Python script: classify the sites in batch, then visualize the result

```python
import csv
import time  # used only by the optional sleep in Step 2
from collections import Counter

import requests

# Step 1: Read the site list from sites.csv (the file needs a "site" header column)
sites = []
with open('sites.csv', mode='r', encoding='utf-8') as file:
    reader = csv.DictReader(file)
    for row in reader:
        sites.append(row['site'])

# Step 2: Run the classification workflow (Workflow1) once per site
headers = {
    "Authorization": "Bearer xxxxxxxx",  # API key of the classification app
    "Content-Type": "application/json"
}
categories = []
for site in sites:
    body = {
        "inputs": {"urls": site},  # matches the Start node variable `urls`
        "response_mode": "blocking",
        "user": "abc-123"
    }
    response = requests.post("http://localhost/v1/workflows/run", headers=headers, json=body)
    if response.status_code == 200:
        response_data = response.json()
        if response_data.get('data') and response_data['data'].get('outputs'):
            # The End node exposes LLM 2's answer as `text`; strip stray whitespace/newlines
            category = response_data['data']['outputs']['text'].strip()
            categories.append(category)
        else:
            print(f"No 'outputs' in response for site: {site}")
            print(response_data)
    else:
        print(f"Failed to get category for site: {site}")
        print(response.text)
    # time.sleep(5)  # Uncomment to wait 5 seconds before the next request

# Step 3: Count the categories and sort them by count, highest first
category_counts = Counter(categories)
sorted_categories = sorted(category_counts.items(), key=lambda x: x[1], reverse=True)

# Build the ';'-separated strings the chart workflow expects
sorted_tags = ";".join(tag for tag, count in sorted_categories)
sorted_counts = ";".join(str(count) for tag, count in sorted_categories)

# Step 4: Run the visualization workflow (pie chart)
headers_vis = {
    "Authorization": "Bearer xxxxxxxxxxxx",  # API key of the visualization app
    "Content-Type": "application/json"
}
body_vis = {
    "inputs": {"tags": sorted_tags, "count": sorted_counts},
    "response_mode": "blocking",
    "user": "abc-123"
}
response_vis = requests.post("http://localhost/v1/workflows/run", headers=headers_vis, json=body_vis)
if response_vis.status_code == 200:
    response_data_vis = response_vis.json()
    if response_data_vis.get('data') and response_data_vis['data'].get('outputs'):
        # `text` holds the chart tool's `files` output: a list of file objects with a `url`
        image_url = response_data_vis['data']['outputs']['text'][0]['url']
        print(f"Visualization URL: {image_url}")
    else:
        print("No 'outputs' in visualization response.")
        print(response_data_vis)
else:
    print("Failed to generate visualization.")
    print(response_vis.text)
```
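
The script calls the classification workflow once per site, and the commented-out `time.sleep(5)` hints that pacing the requests was a concern. If a single failed call should not lose a site, a small retry helper with exponential backoff could wrap each request. `with_retries` is a hypothetical helper, not part of Dify or requests. The `sleep` parameter is injectable so the helper can be tested without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a retryable error wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise ValueError("attempts must be at least 1")
```

In the script, the request in Step 2 could become `with_retries(lambda: requests.post("http://localhost/v1/workflows/run", headers=headers, json=body, timeout=60))`. requests' exceptions derive from `IOError` (an alias of `OSError`), so connection errors and timeouts would be retried. A non-200 response would only be retried if the lambda also calls `response.raise_for_status()`. Keep in mind that each retry reruns the whole workflow, including its LLM calls.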
