
An automated hub for daily research papers, powered by n8n, Notion, and vibe coding. It aggregates, organizes, and surfaces the latest studies across AI, RAG, and other domains, and is designed for researchers, engineers, and knowledge enthusiasts.
The workflow fetches topic-specific research papers from arXiv on a daily schedule, processes and structures the data, and creates entries in a Notion database, with support for message delivery (email and IM) and downstream application development.
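The heart of the scheduled fetch is building a `submittedDate` window in arXiv's `YYYYMMDDHHMM` format and composing the export API query. A standalone sketch (not the exact n8n code-node code; the workflow itself offsets by two days to account for arXiv's announcement delay, parameterized here as `daysBack`):

```javascript
// Build a submittedDate window for `daysBack` days before `ref`,
// in arXiv's YYYYMMDDHHMM format (based on the local calendar date).
function dateWindow(daysBack, ref = new Date()) {
  const d = new Date(ref);
  d.setDate(d.getDate() - daysBack);
  const y = d.getFullYear();
  const m = String(d.getMonth() + 1).padStart(2, '0');
  const day = String(d.getDate()).padStart(2, '0');
  return { from: `${y}${m}${day}0000`, to: `${y}${m}${day}2359` };
}

// Compose the export API query URL for a search term.
function arxivQueryUrl(term, { from, to }) {
  return 'https://export.arxiv.org/api/query' +
    `?search_query=all:${term}+AND+submittedDate:[${from}+TO+${to}]`;
}

// Month index 8 = September, so the reference date is 2025-09-13.
console.log(arxivQueryUrl('RAG', dateWindow(1, new Date(2025, 8, 13))));
```

With `daysBack = 1` and a reference date of 2025-09-13, this yields a query containing `submittedDate:[202509120000+TO+202509122359]`.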
| Name | Category | Description (EN) | Link |
| --- | --- | --- | --- |
| n8n | Platform | Workflow and API service orchestration | n8n.io |
| Gemini-2.5-Flash | Model | Research paper summarization and processing | deepmind.google |
| Notion | Database | Knowledge and project database | notion.com |
| Gmail | Email | Email notifications and delivery | gmail.com |
| Feishu (Lark) | IM | Instant messaging and notification delivery | larksuite.com |
| ClawCloud | Deployment | n8n deployment service | run.claw.cloud |
| Dyad | Coding | Free, local, open-source vibe-coding tool | dyad.sh |
| GitHub | Code Hosting | Code publishing and version control | github.com |
| Qwen3-Coder-Plus-2025-07-22 | Model | Code generation model | qwenlm.github.io |
| Vercel | Deployment | Web app deployment platform | vercel.com |
| Recraft | AI Design | SVG illustration and asset generation | recraft.com |
| OpenAI GPT-5 | Model | Email HTML template design | chatgpt.com |
| Claude Sonnet 4 | Model | JavaScript code assistant | claude.ai |
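
Before the workflow JSON itself, it helps to see the parsing approach used by the `Data Extraction` code node below: arXiv returns an Atom XML feed, and the node splits out `<entry>` blocks and pulls tag contents with regular expressions rather than a full XML parser. The two core helpers, lifted from that node, can be exercised on a toy feed (the sample entry's title text is illustrative; the `id` matches the example used later in the cleaning rules):

```javascript
// Extract the text content of the first matching tag, collapsing whitespace.
function extractTagContent(xml, tagName) {
  const regex = new RegExp(`<${tagName}[^>]*>([\\s\\S]*?)</${tagName}>`, 'i');
  const match = xml.match(regex);
  return match ? match[1].trim().replace(/\s+/g, ' ') : '';
}

// Split an Atom feed into its <entry> inner blocks.
function extractEntries(xml) {
  const entries = [];
  const regex = /<entry[^>]*>([\s\S]*?)<\/entry>/gi;
  let m;
  while ((m = regex.exec(xml)) !== null) entries.push(m[1]);
  return entries;
}

// Minimal sample in the shape of an arXiv Atom response.
const sample = `
<feed>
  <opensearch:totalResults>1</opensearch:totalResults>
  <entry>
    <id>http://arxiv.org/abs/2409.06062v1</id>
    <title>MemoRAG: Memory-Augmented
      Retrieval for Long Contexts</title>
    <summary>An illustrative abstract.</summary>
  </entry>
</feed>`;

const [entry] = extractEntries(sample);
console.log(extractTagContent(entry, 'title'));
// → "MemoRAG: Memory-Augmented Retrieval for Long Contexts"
```

Note how the whitespace collapse rejoins titles that arXiv wraps across lines, which is exactly what the Notion `title` field needs.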

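One subtlety worked around by the `JSON FORMAT` code node in the workflow: the daily-summary prompt's example output repeats the key `"Number of papers"` (once as a title string, once as a count), and `JSON.parse` silently keeps only the last duplicate. The node therefore re-extracts the first occurrence from the raw fenced string with a regex. A minimal reproduction (the sample reply is hypothetical, and `Date` is quoted here, unlike in the prompt's example, so the string parses as JSON):

```javascript
// A model reply that fences its JSON and duplicates "Number of papers".
const reply = 'Here is the summary:\n```json\n{\n  "Number of papers": "2025-09-13 paper summary",\n  "Date": "2025-09-13",\n  "Number of papers": 2,\n  "SUMMARY_CN": "",\n  "SUMMARY_EN": ""\n}\n```';

// Pull the fenced JSON out of the reply.
const jsonStr = reply.match(/```json\n([\s\S]*?)\n```/)[1];

// JSON.parse keeps only the LAST occurrence of a duplicated key...
const data = JSON.parse(jsonStr);

// ...so recover the string-valued first occurrence from the raw text.
const titleMatch = jsonStr.match(/"Number of papers":\s*"([^"]+)"/);
const countMatch = jsonStr.match(/"Number of papers":\s*(\d+)/);

const result = {
  title: titleMatch ? titleMatch[1] : '',
  date: data.Date || '',
  paperCount: countMatch ? parseInt(countMatch[1], 10) : 0,
};
console.log(result);
```

The full exported n8n workflow JSON follows.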
{
"nodes": [
{
"parameters": {
"promptType": "define",
"text": "={{ $json.data }}",
"messages": {
"messageValues": [
{
"message": "You are a paper content analysis assistant. You can analyze and inspect JSON data, accurately identify the content in the `summary` field, make judgments, and enrich the data items. The main tasks are as follows:\n\n1. RAG Relevance and Labeling:\n - Analyze the `summary` field to determine whether the content is related to RAG (Retrieval-Augmented Generation) and assign labels.\n - For each data item, add three new fields:\n - `RAG_TF`: \"T\" if related, \"F\" if not\n - `RAG_REASON`: if not related, provide the reason in English; otherwise, leave empty\n - `RAG_Category`: if related, assign a category label based on the `summary` content (e.g., Framework / Application / …); otherwise, leave empty\n\n2. RAG Method Extraction:\n - Analyze the `summary` and extract the RAG method proposed in the paper.\n - Store it in the new field `RAG_NAME`.\n\n3. External Link Extraction:\n - Analyze the `summary` content for `github` or `huggingface` links.\n - If present, extract the URLs and populate the existing `github` and `huggingface` fields.\n - If not present, leave them unchanged.\n\nOutput Format: standard JSON\n\nExample:\n\nGiven a data item with the following `summary`:\n\n\"summary\":\"Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can still be insufficient for many applications. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions, however, do not hold in general long-context processing tasks. In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG features a dual-system architecture. 
First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, it generates draft answer\n"
}
]
},
"batching": {}
},
"type": "@n8n/n8n-nodes-langchain.chainLlm",
"typeVersion": 1.7,
"position": [
272,
0
],
"id": "7e9f18f1-edfe-4af6-835b-12fe16a99034",
"name": "Basic LLM Chain"
},
{
"parameters": {
"modelName": "=models/gemini-2.5-flash",
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
"typeVersion": 1,
"position": [
272,
144
],
"id": "92d37dc1-aaaf-47ec-987a-e6d23c93e055",
"name": "Google Gemini Chat Model",
"credentials": {
"googlePalmApi": {
"id": "ra9slZSGvLJTHQw1",
"name": "Google Gemini(PaLM) Api account"
}
}
},
{
"parameters": {
"jsCode": "// Code node: build the submittedDate query window\nconst now = new Date();\nconst yesterday = new Date(now);\nyesterday.setDate(now.getDate() - 2);\n\nconst y = yesterday.getFullYear();\nconst m = String(yesterday.getMonth() + 1).padStart(2, '0');\nconst d = String(yesterday.getDate()).padStart(2, '0');\n\nreturn [\n {\n json: {\n from: `${y}${m}${d}0000`,\n to: `${y}${m}${d}2359`\n }\n }\n];\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
-1664,
320
],
"id": "aaa67776-c308-443e-98f6-e1fe7035cbb5",
"name": "submittedDate:T-1"
},
{
"parameters": {
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "de0a5a7e-67dd-4dd0-8ccc-3406e17bd09c",
"leftValue": "={{ $json.paperCount }}",
"rightValue": 0,
"operator": {
"type": "number",
"operation": "notEquals"
}
}
],
"combinator": "and"
},
"options": {}
},
"type": "n8n-nodes-base.if",
"typeVersion": 2.2,
"position": [
-160,
16
],
"id": "c3685631-8bbd-409a-978a-fbb3e9847115",
"name": "If"
},
{
"parameters": {
"rule": {
"interval": [
{
"triggerAtHour": 6
}
]
}
},
"type": "n8n-nodes-base.scheduleTrigger",
"typeVersion": 1.2,
"position": [
-1856,
320
],
"id": "4dd24343-1872-472d-8d7d-4cd28a9dbabe",
"name": "Schedule Trigger"
},
{
"parameters": {
"rules": {
"values": [
{
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"leftValue": "={{ $json.type }}",
"rightValue": "feishu",
"operator": {
"type": "string",
"operation": "equals"
},
"id": "7b804f5e-6702-4d4a-99b9-3f06f8eb20d4"
}
],
"combinator": "and"
}
}
]
},
"options": {}
},
"type": "n8n-nodes-base.switch",
"typeVersion": 3.2,
"position": [
576,
720
],
"id": "a38b1b58-a6f6-4c6b-ba6e-f153980a220d",
"name": "FEISHU"
},
{
"parameters": {
"method": "POST",
"url": "=",
"sendBody": true,
"bodyParameters": {
"parameters": [
{
"name": "msg_type",
"value": "={{ $json.msg_type }}"
},
{
"name": "content",
"value": "={{ $json.content }}"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
800,
720
],
"id": "ac6b1c0d-b18e-4b42-b49e-8cb4daf0d384",
"name": "FEISHU POST"
},
{
"parameters": {
"rules": {
"values": [
{
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"leftValue": "={{ $json.type }}",
"rightValue": "gmail",
"operator": {
"type": "string",
"operation": "equals"
},
"id": "3222832c-bbf2-46a2-abd8-2bb14095b7bf"
}
],
"combinator": "and"
}
}
]
},
"options": {}
},
"type": "n8n-nodes-base.switch",
"typeVersion": 3.2,
"position": [
576,
544
],
"id": "9151ab18-379f-4d3b-8ca2-cf65c547e78d",
"name": "gmail"
},
{
"parameters": {
"sendTo": "xing.adam@gmail.com",
"subject": "={{ $json.subject }}",
"message": "={{ $json.message }}",
"options": {}
},
"type": "n8n-nodes-base.gmail",
"typeVersion": 2.1,
"position": [
800,
544
],
"id": "869f80ec-c14c-4d1e-ae11-bb6eb4c99e5d",
"name": "Send a message",
"webhookId": "cb0a1f30-59e0-4505-af24-db689d9c1f23",
"credentials": {
"gmailOAuth2": {
"id": "WoyY5hj4D93bD2Fp",
"name": "Gmail account"
}
}
},
{
"parameters": {
"modelId": {
"__rl": true,
"value": "models/gemini-2.5-flash-lite",
"mode": "list",
"cachedResultName": "models/gemini-2.5-flash-lite"
},
"messages": {
"values": [
{
"content": "You are a daily paper content summarization assistant capable of analyzing XML data. Your main tasks are as follows:\n\n1. Set the daily title field `Title`: {yyyy-mm-dd} paper summary\n2. Set the daily date field `Date`: yyyy-mm-dd\n3. Identify the `<opensearch:totalResults>` tag in the XML and set its numeric value to the field `Number of papers`.\n4. Provide a brief summary of all papers for the day, covering all topics. Set the Chinese summary as `SUMMARY_CN` and the English summary as `SUMMARY_EN`. Ensure that both summaries reflect the comprehensive summary of all papers for the day.\n5. Output format: standard JSON. If there are no papers for the day, set `Number of papers` to 0, but still include the `SUMMARY_CN` and `SUMMARY_EN` fields with empty content.\n\nExample: If there are papers:\n{\n \"Number of papers\":\"2025-09-13 paper summary\",\n \"Date\":2025-09-13,\n \"Number of papers\": 2,\n \"SUMMARY_CN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a KG based on climate publications to improve access and utilization of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm AGP-Static++ and enhancing dynamic graph support for better query and update efficiency.\",\n \"SUMMARY_EN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a domain-specific KG built from climate publications aimed at improving access and use of climate science literature. 
The second paper focuses on the AGP framework, proposing a new algorithm, AGP-Static++, and improving dynamic graph support, enhancing query and update efficiency.\"\n}\n\nIf the number of papers is 0, maintain the JSON structure:\n{\n \"Number of papers\":\"2025-09-13 paper summary\",\n \"Date\":2025-09-13,\n \"Number of papers\": 0,\n \"SUMMARY_CN\": \"\",\n \"SUMMARY_EN\": \"\"\n}",
"role": "model"
},
{
"content": "={{ $json.data }}"
}
]
},
"simplify": false,
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.googleGemini",
"typeVersion": 1,
"position": [
-1040,
320
],
"id": "3df82b76-e9c8-4b0b-a552-428f2fc12c97",
"name": "Message a model",
"credentials": {
"googlePalmApi": {
"id": "ra9slZSGvLJTHQw1",
"name": "Google Gemini(PaLM) Api account"
}
}
},
{
"parameters": {
"resource": "databasePage",
"databaseId": {
"__rl": true,
"value": "26fa136d-cee4-8092-8b85-cf9e9cbc424f",
"mode": "list",
"cachedResultName": "RAG Daily Paper Summary",
"cachedResultUrl": "https://www.notion.so/26fa136dcee480928b85cf9e9cbc424f"
},
"title": "={{ $json.title }}",
"simple": false,
"propertiesUi": {
"propertyValues": [
{
"key": "DATE|date",
"date": "={{ $json.date }}"
},
{
"key": "Number of papers|number",
"numberValue": "={{ $json.paperCount }}"
},
{
"key": "SUMMARY_EN|rich_text",
"textContent": "={{ $json.summaryEN }}"
},
{
"key": "SUMMARY_CN|rich_text",
"textContent": "={{ $json.summaryCN }}"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.notion",
"typeVersion": 2.2,
"position": [
800,
320
],
"id": "024c6399-857e-45a3-a15d-8b733e16da67",
"name": "RAG Daily Paper Summary",
"credentials": {
"notionApi": {
"id": "BNsFk38kgqvRDJpX",
"name": "Notion account"
}
}
},
{
"parameters": {
"jsCode": "const items = $input.all();\nconst response = items[0].json;\n\ntry {\n // Extract text content from Gemini API response\n // Note: response is directly an object, not an array\n const text = response.candidates[0].content.parts[0].text;\n \n // Extract JSON content\n const jsonMatch = text.match(/```json\\n([\\s\\S]*?)\\n```/);\n const jsonStr = jsonMatch[1];\n \n // Parse JSON\n const data = JSON.parse(jsonStr);\n \n // Manually handle duplicate keys - extract from original string\n const titleMatch = jsonStr.match(/\"Number of papers\":\\s*\"([^\"]+)\"/);\n const countMatch = jsonStr.match(/\"Number of papers\":\\s*(\\d+)/);\n \n // Construct result\n items[0].json = {\n title: titleMatch ? titleMatch[1] : '',\n date: data.Date || '',\n paperCount: countMatch ? parseInt(countMatch[1]) : 0,\n summaryCN: data.SUMMARY_CN || '',\n summaryEN: data.SUMMARY_EN || ''\n };\n \n} catch (error) {\n items[0].json = {\n error: error.message,\n originalData: response\n };\n}\n\nreturn items;\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
-688,
320
],
"id": "3282f989-a9a4-4d4f-aaf0-097fc0d72e0d",
"name": "JSON FORMAT"
},
{
"parameters": {
"content": "## 1. Data Retrieval\n### arXiv API\n\nThe arXiv provides a public API that allows users to query research papers by topic or by predefined categories.\n\n[arXiv API User Manual](https://info.arxiv.org/help/api/user-manual.html#arxiv-api-users-manual)\n\n**Key Notes:**\n\n1. **Response Format**: The API returns data as a typical *Atom Response*.\n2. **Timezone & Update Frequency**: \n - The arXiv submission process operates on a 24-hour cycle. \n - Newly submitted articles become available in the API only at midnight *after* they have been processed. \n - Feeds are updated daily at midnight Eastern Standard Time (EST). \n - Therefore, a single request per day is sufficient. \n3. **Request Limits**: \n - The maximum number of results per call (`max_results`) is **30,000**, \n - Results must be retrieved in slices of at most **2,000** at a time, using the `max_results` and `start` query parameters. \n4. **Time Format**: \n - The expected format is `[YYYYMMDDTTTT+TO+YYYYMMDDTTTT]`, \n - `TTTT` is provided in 24-hour time to the minute, in GMT.\n\n### Scheduled Task\n\n- **Execution Frequency**: Daily \n- **Execution Time**: 6:00 AM \n- **Time Parameter Handling (JS)**: \n According to arXiv’s update rules, the scheduled task should query the **previous day’s (T-1)** `submittedDate` data.\n\n",
"height": 768,
"width": 736
},
"type": "n8n-nodes-base.stickyNote",
"typeVersion": 1,
"position": [
-1984,
544
],
"id": "f1a331fa-d830-4656-b108-7e18e7430b04",
"name": "Sticky Note3"
},
{
"parameters": {
"url": "=https://export.arxiv.org/api/query?search_query=all:RAG+AND+submittedDate:[{{$json[\"from\"]}}+TO+{{$json[\"to\"]}}]",
"sendQuery": true,
"queryParameters": {
"parameters": [
{
"name": "={{ $json.from }}"
},
{
"name": "={{ $json.to }}"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
-1440,
320
],
"id": "ae855e91-2363-4b97-8933-761934b269fe",
"name": "arXiv API"
},
{
"parameters": {
"jsCode": "// Get current date\nconst now = new Date();\nconst year = now.getFullYear();\nconst month = String(now.getMonth() + 1).padStart(2, '0');\nconst day = String(now.getDate()).padStart(2, '0');\nconst date = `${year}-${month}-${day}`;\n\n// Get input data\nconst inputData = $input.first().json;\n\n// Generate message content\nconst messageContent = inputData.SUMMARY_CN;\n\n// Gmail message body\nconst gmailMessage = {\n subject: inputData.title || `Daily Paper Summary - ${date}`,\n message: `<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">\n<head>\n <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\" />\n <title> RAG Daily Paper Summary - ${date}</title>\n <style type=\"text/css\">\n /* Gmail safe styles */\n body {\n font-family: Arial, sans-serif;\n line-height: 1.4;\n margin: 0;\n padding: 0;\n background-color: #f9f9f9;\n color: #333333;\n }\n \n table {\n border-collapse: collapse;\n mso-table-lspace: 0pt;\n mso-table-rspace: 0pt;\n }\n \n .email-wrapper {\n width: 100%;\n background-color: #f9f9f9;\n padding: 40px 20px;\n }\n \n .email-container {\n width: 100%;\n max-width: 600px;\n margin: 0 auto;\n background-color: #ffffff;\n border-radius: 8px;\n box-shadow: 0 2px 12px rgba(0, 0, 0, 0.1);\n }\n \n .header {\n background-color: #2563eb;\n padding: 24px;\n text-align: center;\n border-radius: 8px 8px 0 0;\n }\n \n .header h1 {\n margin: 0 0 8px 0;\n font-size: 24px;\n font-weight: 600;\n color: #ffffff;\n }\n \n .date {\n font-size: 14px;\n color: #ffffff;\n opacity: 0.9;\n }\n \n .stats {\n background-color: #f1f5f9;\n padding: 16px 24px;\n font-size: 14px;\n color: #64748b;\n }\n \n .content {\n padding: 32px 24px 40px 24px;\n }\n \n .section {\n margin-bottom: 24px;\n }\n \n .section-title {\n font-size: 16px;\n 
font-weight: 600;\n color: #1e293b;\n margin-bottom: 12px;\n padding-bottom: 8px;\n border-bottom: 1px solid #e2e8f0;\n }\n \n .flag {\n display: inline-block;\n width: 20px;\n height: 14px;\n margin-right: 8px;\n border-radius: 2px;\n vertical-align: middle;\n }\n \n .flag-cn {\n background-color: #de2910;\n }\n \n .flag-en {\n background-color: #012169;\n }\n \n .summary {\n font-size: 14px;\n line-height: 1.6;\n color: #475569;\n padding: 16px;\n background-color: #f8fafc;\n border-radius: 6px;\n border-left: 3px solid #2563eb;\n }\n \n .divider {\n height: 1px;\n background-color: #e2e8f0;\n margin: 20px 0;\n border: none;\n }\n \n /* Mobile responsive */\n @media screen and (max-width: 600px) {\n .email-wrapper {\n padding: 20px 10px !important;\n }\n \n .header, .stats {\n padding: 20px 16px !important;\n }\n \n .content {\n padding: 24px 16px 32px 16px !important;\n }\n \n .email-container {\n border-radius: 0;\n }\n }\n \n /* Gmail specific fixes */\n .gmail-fix {\n display: none;\n }\n \n /* Outlook specific fixes */\n .ExternalClass {\n width: 100%;\n }\n \n .ExternalClass,\n .ExternalClass p,\n .ExternalClass span,\n .ExternalClass font,\n .ExternalClass td,\n .ExternalClass div {\n line-height: 100%;\n }\n </style>\n <!--[if mso]>\n <style type=\"text/css\">\n .email-container {\n width: 600px !important;\n }\n </style>\n <![endif]-->\n</head>\n<body>\n <table role=\"presentation\" class=\"email-wrapper\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n <tr>\n <td align=\"center\">\n <table role=\"presentation\" class=\"email-container\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n <!-- Header -->\n <tr>\n <td class=\"header\">\n <h1>RAG Daily Papers</h1>\n <div class=\"date\">${inputData.Date || date}</div>\n </td>\n </tr>\n \n <!-- Stats -->\n <tr>\n <td class=\"stats\">\n <strong>${inputData[\"Number of papers\"] || inputData.paperCount || 0} papers</strong> reviewed today\n </td>\n </tr>\n \n <!-- Content -->\n <tr>\n <td 
class=\"content\">\n <!-- Chinese Section -->\n <div class=\"section\">\n <h2 class=\"section-title\">\n 🇨🇳 Chinese\n </h2>\n <div class=\"summary\">\n ${inputData.SUMMARY_CN || inputData.summaryCN || 'No Chinese summary available'}\n </div>\n </div>\n \n <!-- Divider -->\n <hr class=\"divider\">\n \n <!-- English Section -->\n <div class=\"section\">\n <h2 class=\"section-title\">\n 🇺🇸 English\n </h2>\n <div class=\"summary\">\n ${inputData.SUMMARY_EN || inputData.summaryEN || 'No English summary available'}\n </div>\n </div>\n </td>\n </tr>\n </table>\n </td>\n </tr>\n </table>\n</body>\n</html>`\n};\n\n// Feishu message body\nconst feishuMessage = {\n msg_type: \"text\",\n content: {\n text: `Today ${$input.first().json.date} ${$input.first().json.paperCount} papers. ${$input.first().json.summaryEN} ${$input.first().json.summaryCN}`\n }\n};\n\n// n8n output format\nreturn [\n { json: { type: \"gmail\", ...gmailMessage } },\n { json: { type: \"feishu\", ...feishuMessage } }\n];\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
-128,
528
],
"id": "6f3df3be-a376-42e9-b0be-32c4fba5a8e2",
"name": "Message Construction"
},
{
"parameters": {
"content": "## 5. Message Push\n\nSet up two channels for message delivery: **EMAIL** and **IM**, and define the message format and content.\n\n### Email: Gmail\n\n**GMAIL OAuth 2.0 – Official Documentation** \n[Configure your OAuth consent screen](https://docs.n8n.io/integrations/builtin/credentials/google/oauth-single-service/?utm_source=n8n_app&utm_medium=credential_settings&utm_campaign=create_new_credentials_modal#configure-your-oauth-consent-screen)\n\n**Steps:**\n- Enable Gmail API \n- Create OAuth consent screen \n- Create OAuth client credentials \n- Audience: Add **Test users** under Testing status \n\n**Message format**: HTML \n(Model: OpenAI GPT — used to design an HTML email template)\n\n### IM: Feishu (LARK)\n\n**Bots in groups** \n[Use bots in groups](https://www.larksuite.com/hc/en-US/articles/360048487736-use-bots-in-groups)\n",
"height": 576,
"width": 1152
},
"type": "n8n-nodes-base.stickyNote",
"position": [
-176,
896
],
"typeVersion": 1,
"id": "2582c7df-9b15-4473-bc47-91cf6f7304e0",
"name": "Sticky Note"
},
{
"parameters": {
"resource": "databasePage",
"databaseId": {
"__rl": true,
"value": "26ba136d-cee4-8029-ad3d-e0e8ac64993f",
"mode": "list",
"cachedResultName": "RAG DAILY",
"cachedResultUrl": "https://www.notion.so/26ba136dcee48029ad3de0e8ac64993f"
},
"title": "={{ $json.title }}",
"simple": false,
"propertiesUi": {
"propertyValues": [
{
"key": "published|date",
"date": "={{ $json.published }}"
},
{
"key": "summary|rich_text",
"textContent": "={{ $json.summary }}"
},
{
"key": "id|rich_text",
"textContent": "={{ $json.id }}"
},
{
"key": "html_url|url",
"urlValue": "={{ $json.html_url }}"
},
{
"key": "pdf_url|url",
"urlValue": "={{ $json.pdf_url }}"
},
{
"key": "primary_category|rich_text",
"textContent": "={{ $json.primary_category }}"
},
{
"key": "github|url",
"ignoreIfEmpty": true,
"urlValue": "={{ $json.github }}"
},
{
"key": "huggingface|url",
"ignoreIfEmpty": true,
"urlValue": "={{ $json.huggingface }}"
},
{
"key": "RAG_TF|rich_text",
"textContent": "={{ $json.RAG_TF }}"
},
{
"key": "RAG_REASON|rich_text",
"textContent": "={{ $json.RAG_REASON }}"
},
{
"key": "RAG_Category|rich_text",
"textContent": "={{ $json.RAG_Category }}"
},
{
"key": "RAG_NAME|rich_text",
"textContent": "={{ $json.RAG_NAME }}"
},
{
"key": "updated|date",
"date": "={{ $json.updated }}"
},
{
"key": "author|multi_select",
"multiSelectValue": "={{ $json.authors }}"
},
{
"key": "category|multi_select",
"multiSelectValue": "={{ $json.categories }}"
}
]
},
"blockUi": {
"blockValues": [
{
"textContent": "={{ $json.summary }}"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.notion",
"typeVersion": 2.2,
"position": [
800,
0
],
"id": "f7ba78f8-19cb-492c-840c-3570d2865fb1",
"name": "RAG Daily papers",
"credentials": {
"notionApi": {
"id": "BNsFk38kgqvRDJpX",
"name": "Notion account"
}
}
},
{
"parameters": {
"jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data\n\nif (!xmlData) {\n return [{\n json: {\n error: \"XML data not found. Please ensure the input contains XML content\",\n message: \"Check the field names in the input data\",\n success: false\n }\n }];\n}\n\n// Function to format date-time\nfunction formatDateTime(isoString) {\n if (!isoString) return '';\n \n try {\n const date = new Date(isoString);\n if (isNaN(date.getTime())) return '';\n \n const year = date.getFullYear();\n const month = String(date.getMonth() + 1).padStart(2, '0');\n const day = String(date.getDate()).padStart(2, '0');\n const hours = String(date.getUTCHours()).padStart(2, '0');\n const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n \n return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n } catch (error) {\n return '';\n }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n const match = xml.match(regex);\n return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n // Fixed link extraction to fit actual XML format\n // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n const patterns = [\n new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n const authors = [];\n \n for (const block of authorBlocks) {\n const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n if (nameMatch && nameMatch[1].trim()) {\n authors.push(nameMatch[1].trim());\n }\n }\n \n return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n const categories = [];\n const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n let match;\n \n while ((match = regex.exec(entryXml)) !== null) {\n if (match[1]) {\n categories.push(match[1]);\n }\n }\n \n return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n // Handle namespace-prefixed primary category extraction\n const patterns = [\n /primary_category[^>]*term=\"([^\"]*)\"/i,\n /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// New: extract arxiv comment\nfunction extractArxivComment(entryXml) {\n const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n return commentMatch ? 
commentMatch[1].trim() : '';\n}\n\ntry {\n // Extract all entry blocks\n const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n const entries = [];\n let match;\n \n while ((match = entryRegex.exec(xmlData)) !== null) {\n entries.push(match[1]);\n }\n \n if (entries.length === 0) {\n return [{\n json: {\n error: \"No <entry> elements found\",\n message: \"Please check if the XML data format is correct\",\n success: false\n }\n }];\n }\n\n // Process each entry\n const processedData = [];\n let processedCount = 0;\n\n for (let i = 0; i < entries.length; i++) {\n const entryXml = entries[i];\n \n try {\n const item = {\n id: extractTagContent(entryXml, 'id'),\n updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n published: formatDateTime(extractTagContent(entryXml, 'published')),\n title: extractTagContent(entryXml, 'title'),\n summary: extractTagContent(entryXml, 'summary'),\n authors: extractAuthors(entryXml), // field name changed to authors, returns array\n html_url: extractLink(entryXml, 'text/html'),\n pdf_url: extractLink(entryXml, 'application/pdf'),\n primary_category: extractPrimaryCategory(entryXml),\n categories: extractCategories(entryXml), // field name changed to categories\n arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n github: '',\n huggingface: ''\n };\n\n // Validate required fields\n if (item.id && item.title) {\n processedData.push(item);\n processedCount++;\n }\n \n } catch (error) {\n console.log(`Error processing entry ${i+1}: ${error.message}`);\n // Continue processing next entry\n }\n }\n\n // Return processed results\n return [{\n json: {\n success: true,\n message: `Successfully processed ${processedCount} entries`,\n data: processedData,\n processing_time: new Date().toISOString()\n }\n }];\n\n} catch (error) {\n // Error handling\n return [{\n json: {\n error: \"An error occurred during processing\",\n message: error.message,\n success: false\n }\n }];\n}\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
112,
0
],
"id": "5d897d4d-968b-4336-bbee-d1d3b4dcae06",
"name": "Data Extraction"
},
{
"parameters": {
"jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data\n\nif (!xmlData) {\n return [{\n json: {\n error: \"XML data not found. Please ensure the input contains XML content\",\n message: \"Check the field names in the input data\",\n success: false\n }\n }];\n}\n\n// Function to format date-time\nfunction formatDateTime(isoString) {\n if (!isoString) return '';\n \n try {\n const date = new Date(isoString);\n if (isNaN(date.getTime())) return '';\n \n const year = date.getFullYear();\n const month = String(date.getMonth() + 1).padStart(2, '0');\n const day = String(date.getDate()).padStart(2, '0');\n const hours = String(date.getUTCHours()).padStart(2, '0');\n const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n \n return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n } catch (error) {\n return '';\n }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n const match = xml.match(regex);\n return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n // Fixed link extraction to fit actual XML format\n // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n const patterns = [\n new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n const authors = [];\n \n for (const block of authorBlocks) {\n const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n if (nameMatch && nameMatch[1].trim()) {\n authors.push(nameMatch[1].trim());\n }\n }\n \n return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n const categories = [];\n const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n let match;\n \n while ((match = regex.exec(entryXml)) !== null) {\n if (match[1]) {\n categories.push(match[1]);\n }\n }\n \n return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n // Handle namespace-prefixed primary category extraction\n const patterns = [\n /primary_category[^>]*term=\"([^\"]*)\"/i,\n /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// New: extract arxiv comment\nfunction extractArxivComment(entryXml) {\n const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n return commentMatch ? 
commentMatch[1].trim() : '';\n}\n\ntry {\n // Extract all entry blocks\n const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n const entries = [];\n let match;\n \n while ((match = entryRegex.exec(xmlData)) !== null) {\n entries.push(match[1]);\n }\n \n if (entries.length === 0) {\n return [{\n json: {\n error: \"No <entry> elements found\",\n message: \"Please check if the XML data format is correct\",\n success: false\n }\n }];\n }\n\n // Process each entry\n const processedData = [];\n let processedCount = 0;\n\n for (let i = 0; i < entries.length; i++) {\n const entryXml = entries[i];\n \n try {\n const item = {\n id: extractTagContent(entryXml, 'id'),\n updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n published: formatDateTime(extractTagContent(entryXml, 'published')),\n title: extractTagContent(entryXml, 'title'),\n summary: extractTagContent(entryXml, 'summary'),\n authors: extractAuthors(entryXml), // field name changed to authors, returns array\n html_url: extractLink(entryXml, 'text/html'),\n pdf_url: extractLink(entryXml, 'application/pdf'),\n primary_category: extractPrimaryCategory(entryXml),\n categories: extractCategories(entryXml), // field name changed to categories\n arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n github: '',\n huggingface: ''\n };\n\n // Validate required fields\n if (item.id && item.title) {\n processedData.push(item);\n processedCount++;\n }\n \n } catch (error) {\n console.log(`Error processing entry ${i+1}: ${error.message}`);\n // Continue processing next entry\n }\n }\n\n // Return processed results\n return [{\n json: {\n success: true,\n message: `Successfully processed ${processedCount} entries`,\n data: processedData,\n processing_time: new Date().toISOString()\n }\n }];\n\n} catch (error) {\n // Error handling\n return [{\n json: {\n error: \"An error occurred during processing\",\n message: error.message,\n success: false\n }\n }];\n}\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
592,
0
],
"id": "ae2d8994-7a52-4f7b-81fd-61c0538ba380",
"name": "JSON Format"
},
{
"parameters": {
"content": "## 3. Data Processing\n\nAnalyze and summarize paper data using AI, then standardize output as JSON.\n\n### Single Paper Basic Information Analysis and Enhancement \n### Daily Paper Summary and Multilingual Translation",
"height": 192,
"width": 656
},
"type": "n8n-nodes-base.stickyNote",
"position": [
-160,
-224
],
"typeVersion": 1,
"id": "8fbefc67-e9f7-4597-b935-d5f5895cf93c",
"name": "Sticky Note1"
},
{
"parameters": {
"content": "## 4. Data Storage: Notion Database\n\n- Create a corresponding database in Notion with the same predefined field names. \n- In Notion, create an integration under **Integrations** and grant access to the database. Obtain the corresponding **Secret Key**. \n- Use the Notion **\"Create a database page\"** node to configure the field mapping and store the data. \n\n**Notes** \n- **\"Create a database page\"** only adds new entries; data will not be updated. \n- The `updated` and `published` timestamps of arXiv papers are in **UTC**. \n- Notion **single-select** and **multi-select** fields only accept arrays. They do not automatically parse comma-separated strings. You need to format them as proper arrays. \n- Notion does not accept `null` values, which causes a **400 error**. \n",
"height": 368,
"width": 624
},
"type": "n8n-nodes-base.stickyNote",
"position": [
1024,
16
],
"typeVersion": 1,
"id": "884f2c40-4628-4376-a040-709e2db34c48",
"name": "Sticky Note2"
},
{
"parameters": {
"content": "## 2. **Data Extraction**\n\n### Data Cleaning Rules (Convert to Standard JSON)\n\n1. **Remove Header** \n - Keep only the `<entry></entry>` blocks representing paper items.\n\n2. **Single Item** \n - Each `<entry></entry>` represents a single item.\n\n3. **Field Processing Rules** \n - `<id></id>` ➡️ `id` \n Extract content. \n Example: `<id>http://arxiv.org/abs/2409.06062v1</id>` → `http://arxiv.org/abs/2409.06062v1` \n - `<updated></updated>` ➡️ `updated` \n Convert timestamp to `yyyy-mm-dd hh:mm:ss` \n - `<published></published>` ➡️ `published` \n Convert timestamp to `yyyy-mm-dd hh:mm:ss` \n - `<title></title>` ➡️ `title` \n Extract text content \n - `<summary></summary>` ➡️ `summary` \n Keep text, remove line breaks \n - `<author></author>` ➡️ `author` \n Combine all authors into an array \n Example: `[ \"Ernest Pusateri\", \"Anmol Walia\" ]` (for Notion multi-select field) \n - `<arxiv:comment></arxiv:comment>` ➡️ Ignore / discard \n - `<link type=\"text/html\">` ➡️ `html_url` \n Extract URL \n - `<link type=\"application/pdf\">` ➡️ `pdf_url` \n Extract URL \n - `<arxiv:primary_category term=\"cs.CL\">` ➡️ `primary_category` \n Extract `term` value \n - `<category>` ➡️ `category` \n Merge all `<category>` values into an array \n Example: `[ \"eess.AS\", \"cs.SD\" ]` (for Notion multi-select field) \n\n4. **Add Empty Fields** \n - `github` \n - `huggingface`\n",
"height": 912,
"width": 624
},
"type": "n8n-nodes-base.stickyNote",
"position": [
-1088,
544
],
"typeVersion": 1,
"id": "4991129d-9406-4c52-bd8f-87e2721c4a6f",
"name": "Sticky Note4"
}
],
"connections": {
"Basic LLM Chain": {
"main": [
[
{
"node": "JSON Format",
"type": "main",
"index": 0
}
]
]
},
"Google Gemini Chat Model": {
"ai_languageModel": [
[
{
"node": "Basic LLM Chain",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"submittedDate:T-1": {
"main": [
[
{
"node": "arXiv API",
"type": "main",
"index": 0
}
]
]
},
"If": {
"main": [
[
{
"node": "Data Extraction",
"type": "main",
"index": 0
}
]
]
},
"Schedule Trigger": {
"main": [
[
{
"node": "submittedDate:T-1",
"type": "main",
"index": 0
}
]
]
},
"FEISHU": {
"main": [
[
{
"node": "FEISHU POST",
"type": "main",
"index": 0
}
]
]
},
"gmail": {
"main": [
[
{
"node": "Send a message",
"type": "main",
"index": 0
}
]
]
},
"Message a model": {
"main": [
[
{
"node": "JSON FORMAT",
"type": "main",
"index": 0
}
]
]
},
"JSON FORMAT": {
"main": [
[
{
"node": "RAG Daily Paper Summary",
"type": "main",
"index": 0
},
{
"node": "If",
"type": "main",
"index": 0
},
{
"node": "Message Construction",
"type": "main",
"index": 0
}
]
]
},
"arXiv API": {
"main": [
[
{
"node": "Message a model",
"type": "main",
"index": 0
}
]
]
},
"Message Construction": {
"main": [
[
{
"node": "gmail",
"type": "main",
"index": 0
},
{
"node": "FEISHU",
"type": "main",
"index": 0
}
]
]
},
"Data Extraction": {
"main": [
[
{
"node": "Basic LLM Chain",
"type": "main",
"index": 0
}
]
]
},
"JSON Format": {
"main": [
[
{
"node": "RAG Daily papers",
"type": "main",
"index": 0
}
]
]
}
},
"pinData": {},
"meta": {
"instanceId": "a6011e4876c6b1225fa48dae1dbfa92e1932a633b3186bbb7bfd5c9e6ad2d878"
}
}

The arXiv provides a public API that allows users to query research papers by topic or by predefined categories.
https://info.arxiv.org/help/api/user-manual.html#arxiv-api-users-manual
Key Notes:
- The maximum number of results (`max_results`) is 30,000, but results must be retrieved in slices of at most 2,000 at a time, using the `max_results` and `start` query parameters.
- Date ranges use the format `[YYYYMMDDTTTT+TO+YYYYMMDDTTTT]`, where `TTTT` is provided in 24-hour time to the minute, in GMT.
- Example query for RAG papers filtered by `submittedDate`: `https://export.arxiv.org/api/query?search_query=all:rag+AND+submittedDate:[202509140000+TO+202509142359]`
- The feed's `<updated>` timestamp is reported as, e.g., `2025-09-19T00:00:00-04:00` (EDT). Daily scheduled tasks should be executed after this timestamp.
- Use the `start` and `max_results` parameters to paginate.
- Entry timestamps such as `2025-09-14T06:29:18Z` are in UTC.

Example response:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<link href="http://arxiv.org/api/query?search_query%3Dall%3Arag%20AND%20submittedDate%3A%5B202509140000%20TO%20202509142359%5D%26id_list%3D%26start%3D0%26max_results%3D10" rel="self" type="application/atom+xml"/>
<title type="html">ArXiv Query: search_query=all:rag AND submittedDate:[202509140000 TO 202509142359]&id_list=&start=0&max_results=10</title>
<id>http://arxiv.org/api/B2w5/U8KCkkmjkfs5ZT52WWxw2A</id>
<updated>2025-09-19T00:00:00-04:00</updated>
<opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">4</opensearch:totalResults>
<opensearch:startIndex xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:startIndex>
<opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">10</opensearch:itemsPerPage>
<entry>
<id>http://arxiv.org/abs/2509.11124v1</id>
<updated>2025-09-14T06:29:18Z</updated>
<published>2025-09-14T06:29:18Z</published>
<title>STASE: A spatialized text-to-audio synthesis engine for music generation</title>
<summary> While many text-to-audio systems produce monophonic or fixed-stereo outputs,
generating audio with user-defined spatial properties remains a challenge.
Existing deep learning-based spatialization methods often rely on latent-space
manipulations, which can limit direct control over psychoacoustic parameters
critical to spatial perception...
</summary>
<author>
<name>Tutti Chi</name>
</author>
<author>
<name>Letian Gao</name>
</author>
<author>
<name>Yixiao Zhang</name>
</author>
<arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">Accepted to LLM4Music @ ISMIR 2025</arxiv:comment>
<link href="http://arxiv.org/abs/2509.11124v1" rel="alternate" type="text/html"/>
<link title="pdf" href="http://arxiv.org/pdf/2509.11124v1" rel="related" type="application/pdf"/>
<arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.SD" scheme="http://arxiv.org/schemas/atom"/>
<category term="cs.SD" scheme="http://arxiv.org/schemas/atom"/>
<category term="eess.AS" scheme="http://arxiv.org/schemas/atom"/>
</entry>
...
</feed>

Use a Code node to compute the query's `submittedDate` range:

const now = new Date();
const yesterday = new Date(now);
yesterday.setDate(now.getDate() - 2); // note: a 2-day offset, likely to allow for arXiv's announcement delay
const y = yesterday.getFullYear();
const m = String(yesterday.getMonth() + 1).padStart(2, '0');
const d = String(yesterday.getDate()).padStart(2, '0');
return [
{
json: {
from: `${y}${m}${d}0000`,
to: `${y}${m}${d}2359`
}
}
];
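The `from`/`to` values produced above are interpolated into the arXiv query URL by the downstream HTTP Request node. A minimal sketch of that URL construction (the `all:rag` search term and the result limit are illustrative, taken from the example query above):

```javascript
// Build the arXiv API query URL from the from/to range produced above.
// Search term and max_results are illustrative assumptions.
function buildArxivQueryUrl(from, to, maxResults = 100) {
  const search = `all:rag+AND+submittedDate:[${from}+TO+${to}]`;
  return `https://export.arxiv.org/api/query?search_query=${search}&start=0&max_results=${maxResults}`;
}

console.log(buildArxivQueryUrl('202509140000', '202509142359', 10));
```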
Data Cleaning Rules (convert to standard JSON):
- Keep only the `<entry></entry>` blocks representing paper items.
- Each `<entry></entry>` represents a single item.
- `<id></id>` ➡️ `id`: extract content. Example: `<id>http://arxiv.org/abs/2409.06062v1</id>` → `http://arxiv.org/abs/2409.06062v1`
- `<updated></updated>` ➡️ `updated`: convert timestamp to `yyyy-mm-dd hh:mm:ss`
- `<published></published>` ➡️ `published`: convert timestamp to `yyyy-mm-dd hh:mm:ss`
- `<title></title>` ➡️ `title`: extract text content
- `<summary></summary>` ➡️ `summary`: keep text, remove line breaks
- `<author></author>` ➡️ `author`: combine all authors into an array, e.g., `["Ernest Pusateri", "Anmol Walia"]`, for Notion multi-select field
- `<arxiv:comment></arxiv:comment>` ➡️ ignore / discard
- `<link type="text/html">` ➡️ `html_url`: extract URL
- `<link type="application/pdf">` ➡️ `pdf_url`: extract URL
- `<arxiv:primary_category term="cs.CL">` ➡️ `primary_category`: extract `term` value
- `<category>` ➡️ `category`: merge all `<category>` values into an array, e.g., `["eess.AS", "cs.SD"]`, for Notion multi-select field
- Add empty fields: `github` and `huggingface`

Model: Claude Sonnet 4
// Get input data
const xmlData = $('HTTP Request1').first().json.data
if (!xmlData) {
return [{
json: {
error: "XML data not found. Please ensure the input contains XML content",
message: "Check the field names in the input data",
success: false
}
}];
}
// Function to format date-time
function formatDateTime(isoString) {
if (!isoString) return '';
try {
const date = new Date(isoString);
if (isNaN(date.getTime())) return '';
// Use UTC getters throughout so the date and time parts stay consistent
const year = date.getUTCFullYear();
const month = String(date.getUTCMonth() + 1).padStart(2, '0');
const day = String(date.getUTCDate()).padStart(2, '0');
const hours = String(date.getUTCHours()).padStart(2, '0');
const minutes = String(date.getUTCMinutes()).padStart(2, '0');
const seconds = String(date.getUTCSeconds()).padStart(2, '0');
return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;
} catch (error) {
return '';
}
}
// General function to extract tag content
function extractTagContent(xml, tagName) {
const regex = new RegExp(`<${tagName}[^>]*>([\\s\\S]*?)<\\/${tagName}>`, 'i');
const match = xml.match(regex);
return match ? match[1].trim().replace(/\s+/g, ' ') : '';
}
// Extract links
function extractLink(entryXml, linkType) {
// Fixed link extraction to fit actual XML format
// Format: <link href="..." rel="..." type="..."/>
const patterns = [
new RegExp(`<link[^>]*href="([^"]*)"[^>]*type="${linkType}"`, 'i'),
new RegExp(`<link[^>]*type="${linkType}"[^>]*href="([^"]*)"`, 'i')
];
for (const pattern of patterns) {
const match = entryXml.match(pattern);
if (match && match[1]) {
return match[1];
}
}
return '';
}
// Fixed author extraction function - returns array
function extractAuthors(entryXml) {
const authorBlocks = entryXml.match(/<author[^>]*>([\s\S]*?)<\/author>/gi) || [];
const authors = [];
for (const block of authorBlocks) {
const nameMatch = block.match(/<name[^>]*>(.*?)<\/name>/i);
if (nameMatch && nameMatch[1].trim()) {
authors.push(nameMatch[1].trim());
}
}
return authors; // Return array instead of string
}
// Extract categories
function extractCategories(entryXml) {
const categories = [];
const regex = /<category[^>]*term="([^"]*)"/gi;
let match;
while ((match = regex.exec(entryXml)) !== null) {
if (match[1]) {
categories.push(match[1]);
}
}
return categories;
}
// Extract primary category
function extractPrimaryCategory(entryXml) {
// Handle namespace-prefixed primary category extraction
const patterns = [
/primary_category[^>]*term="([^"]*)"/i,
/arxiv:primary_category[^>]*term="([^"]*)"/i
];
for (const pattern of patterns) {
const match = entryXml.match(pattern);
if (match && match[1]) {
return match[1];
}
}
return '';
}
// New: extract arxiv comment
function extractArxivComment(entryXml) {
const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\/arxiv:comment>/i);
return commentMatch ? commentMatch[1].trim() : '';
}
try {
// Extract all entry blocks
const entryRegex = /<entry[^>]*>([\s\S]*?)<\/entry>/gi;
const entries = [];
let match;
while ((match = entryRegex.exec(xmlData)) !== null) {
entries.push(match[1]);
}
if (entries.length === 0) {
return [{
json: {
error: "No <entry> elements found",
message: "Please check if the XML data format is correct",
success: false
}
}];
}
// Process each entry
const processedData = [];
let processedCount = 0;
for (let i = 0; i < entries.length; i++) {
const entryXml = entries[i];
try {
const item = {
id: extractTagContent(entryXml, 'id'),
updated: formatDateTime(extractTagContent(entryXml, 'updated')),
published: formatDateTime(extractTagContent(entryXml, 'published')),
title: extractTagContent(entryXml, 'title'),
summary: extractTagContent(entryXml, 'summary'),
authors: extractAuthors(entryXml), // field name changed to authors, returns array
html_url: extractLink(entryXml, 'text/html'),
pdf_url: extractLink(entryXml, 'application/pdf'),
primary_category: extractPrimaryCategory(entryXml),
categories: extractCategories(entryXml), // field name changed to categories
arxiv_comment: extractArxivComment(entryXml), // new arxiv comment
github: '',
huggingface: ''
};
// Validate required fields
if (item.id && item.title) {
processedData.push(item);
processedCount++;
}
} catch (error) {
console.log(`Error processing entry ${i+1}: ${error.message}`);
// Continue processing next entry
}
}
// Return processed results
return [{
json: {
success: true,
message: `Successfully processed ${processedCount} entries`,
data: processedData,
processing_time: new Date().toISOString()
}
}];
} catch (error) {
// Error handling
return [{
json: {
error: "An error occurred during processing",
message: error.message,
success: false
}
}];
}
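The extraction helpers can be exercised outside n8n against a trimmed-down `<entry>` like the sample above. A standalone sanity check (the helpers are repeated here so the snippet runs on its own):

```javascript
// Standalone sanity check for the regex-based extraction approach used in the node above.
function extractAuthors(entryXml) {
  const blocks = entryXml.match(/<author[^>]*>([\s\S]*?)<\/author>/gi) || [];
  return blocks
    .map(b => (b.match(/<name[^>]*>(.*?)<\/name>/i) || [])[1])
    .filter(n => n && n.trim())
    .map(n => n.trim());
}

function extractCategories(entryXml) {
  const cats = [];
  const regex = /<category[^>]*term="([^"]*)"/gi;
  let m;
  while ((m = regex.exec(entryXml)) !== null) if (m[1]) cats.push(m[1]);
  return cats;
}

const sample = `
<entry>
  <author><name>Tutti Chi</name></author>
  <author><name>Letian Gao</name></author>
  <category term="cs.SD"/>
  <category term="eess.AS"/>
</entry>`;

console.log(extractAuthors(sample));    // ["Tutti Chi", "Letian Gao"]
console.log(extractCategories(sample)); // ["cs.SD", "eess.AS"]
```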
Analyze and summarize paper data using AI, then standardize output as JSON.
Model: gemini-2.5-flash
You are a paper content analysis assistant. You can analyze and inspect JSON data, accurately identify the content in the `summary` field, make judgments, and enrich the data items. The main tasks are as follows:
1. RAG Relevance and Labeling:
- Analyze the `summary` field to determine whether the content is related to RAG (Retrieval-Augmented Generation) and assign labels.
- For each data item, add three new fields:
- `RAG_TF`: "T" if related, "F" if not
- `RAG_REASON`: if not related, provide the reason in English; otherwise, leave empty
- `RAG_Category`: if related, assign a category label based on the `summary` content (e.g., Framework / Application / …); otherwise, leave empty
2. RAG Method Extraction:
- Analyze the `summary` and extract the RAG method proposed in the paper.
- Store it in the new field `RAG_NAME`.
3. External Link Extraction:
- Analyze the `summary` content for `github` or `huggingface` links.
- If present, extract the URLs and populate the existing `github` and `huggingface` fields.
- If not present, leave them unchanged.
Output Format: standard JSON
Example:
Given a data item with the following `summary`:
"summary":"Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can still be insufficient for many applications. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions, however, do not hold in general long-context processing tasks. In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG features a dual-system architecture. First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, it generates draft answer
Process AI Response Content into Standard JSON using JS
Model: Claude Sonnet 4
// ------------------------
// 1. Get the current item's JSON output
// ------------------------
const raw = $json;
// ------------------------
// 2. Define a function to recursively find the text field
// ------------------------
function findText(obj) {
if (!obj || typeof obj !== 'object') return null;
if (obj.text && typeof obj.text === 'string') return obj.text;
for (const key of Object.keys(obj)) {
const result = findText(obj[key]);
if (result) return result;
}
return null;
}
// ------------------------
// 3. Find the AI output text
// ------------------------
const textBlock = findText(raw);
if (!textBlock) {
throw new Error("No text field found, please check the AI output structure");
}
// ------------------------
// 4. Use regex to remove ```json ``` wrapper and extract JSON
// ------------------------
const jsonMatch = textBlock.match(/```json\s*([\s\S]*?)```/i);
const jsonText = jsonMatch ? jsonMatch[1] : textBlock;
// ------------------------
// 5. Parse into standard JSON
// ------------------------
let parsedJson;
try {
parsedJson = JSON.parse(jsonText);
} catch (error) {
throw new Error("Failed to parse JSON: " + error.message);
}
// ------------------------
// 6. Return in standard format
// ------------------------
return parsedJson.map(item => ({ json: item }));
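The fence-stripping step above can be illustrated in isolation. This sketch builds the fence string at runtime only to keep the example readable; the regex is the same one the node uses:

```javascript
// Demonstrate extracting JSON from a fenced ```json block in model output.
const fence = '`'.repeat(3); // avoids literal fence markers inside this example
const textBlock = `Here is the result:\n${fence}json\n[{"RAG_TF": "T"}]\n${fence}`;

const jsonMatch = textBlock.match(/```json\s*([\s\S]*?)```/i);
const parsed = JSON.parse(jsonMatch ? jsonMatch[1] : textBlock);

console.log(parsed[0].RAG_TF); // "T"
```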
Model: gemini-2.5-flash
You are a daily paper content summarization assistant capable of analyzing XML data. Your main tasks are as follows:
1. Set the daily title field `Title`: {yyyy-mm-dd} paper summary
2. Set the daily date field `Date`: yyyy-mm-dd
3. Identify the `<opensearch:totalResults>` tag in the XML and set its numeric value to the field `Number of papers`.
4. Provide a brief summary of all papers for the day, covering all topics. Set the Chinese summary as `SUMMARY_CN` and the English summary as `SUMMARY_EN`. Ensure that both summaries reflect the comprehensive summary of all papers for the day.
5. Output format: standard JSON. If there are no papers for the day, set `Number of papers` to 0, but still include the `SUMMARY_CN` and `SUMMARY_EN` fields with empty content.
Example: If there are papers:
{
"Number of papers":"2025-09-13 paper summary",
"Date":2025-09-13,
"Number of papers": 2,
"SUMMARY_CN": "Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a KG based on climate publications to improve access and utilization of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm AGP-Static++ and enhancing dynamic graph support for better query and update efficiency.",
"SUMMARY_EN": "Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a domain-specific KG built from climate publications aimed at improving access and use of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm, AGP-Static++, and improving dynamic graph support, enhancing query and update efficiency."
}
If the number of papers is 0, maintain the JSON structure:
{
"Number of papers":"2025-09-13 paper summary",
"Date":2025-09-13,
"Number of papers": 0,
"SUMMARY_CN": "",
"SUMMARY_EN": ""
}

Process AI Response Content into Standard JSON using JS
Model: Claude Sonnet 4
const items = $input.all();
const response = items[0].json;
try {
// Extract text content from Gemini API response
// Note: response is directly an object, not an array
const text = response.candidates[0].content.parts[0].text;
// Extract JSON content
const jsonMatch = text.match(/```json\n([\s\S]*?)\n```/);
const jsonStr = jsonMatch ? jsonMatch[1] : text; // fall back to raw text if no fenced block
// Parse JSON
const data = JSON.parse(jsonStr);
// Manually handle duplicate keys - extract from original string
const titleMatch = jsonStr.match(/"Number of papers":\s*"([^"]+)"/);
const countMatch = jsonStr.match(/"Number of papers":\s*(\d+)/);
// Construct result
items[0].json = {
title: titleMatch ? titleMatch[1] : '',
date: data.Date || '',
paperCount: countMatch ? parseInt(countMatch[1]) : 0,
summaryCN: data.SUMMARY_CN || '',
summaryEN: data.SUMMARY_EN || ''
};
} catch (error) {
items[0].json = {
error: error.message,
originalData: response
};
}
return items;
Notes

- The `updated` and `published` timestamps of arXiv papers are in UTC.

Set up two channels for message delivery: email and IM, and define the message format and content.
Gmail OAuth 2.0, Official Documentation
Steps:
Message format: HTML (Model: OpenAI GPT, used to design the HTML email template)
Bots in groups
https://www.larksuite.com/hc/en-US/articles/360048487736-use-bots-in-groups
Message format: TEXT
{
"msg_type": "text",
"content": {
"text": "content"
}
}

Use JS to build a collection of messages for Gmail and Feishu delivery.
Model: Claude Sonnet 4
// Get current date
const now = new Date();
const year = now.getFullYear();
const month = String(now.getMonth() + 1).padStart(2, '0');
const day = String(now.getDate()).padStart(2, '0');
const date = `${year}-${month}-${day}`;
// Get input data
const inputData = $input.first().json;
// Generate message content
const messageContent = inputData.SUMMARY_CN;
// Gmail message body
const gmailMessage = {
subject: inputData.title || `Daily Paper Summary - ${date}`,
message: `<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title> RAG Daily Paper Summary - ${date}</title>
<style type="text/css">
/* Gmail safe styles */
body {
font-family: Arial, sans-serif;
line-height: 1.4;
margin: 0;
padding: 0;
background-color: #f9f9f9;
color: #333333;
}
table {
border-collapse: collapse;
mso-table-lspace: 0pt;
mso-table-rspace: 0pt;
}
.email-wrapper {
width: 100%;
background-color: #f9f9f9;
padding: 40px 20px;
}
.email-container {
width: 100%;
max-width: 600px;
margin: 0 auto;
background-color: #ffffff;
border-radius: 8px;
box-shadow: 0 2px 12px rgba(0, 0, 0, 0.1);
}
.header {
background-color: #2563eb;
padding: 24px;
text-align: center;
border-radius: 8px 8px 0 0;
}
.header h1 {
margin: 0 0 8px 0;
font-size: 24px;
font-weight: 600;
color: #ffffff;
}
.date {
font-size: 14px;
color: #ffffff;
opacity: 0.9;
}
.stats {
background-color: #f1f5f9;
padding: 16px 24px;
font-size: 14px;
color: #64748b;
}
.content {
padding: 32px 24px 40px 24px;
}
.section {
margin-bottom: 24px;
}
.section-title {
font-size: 16px;
font-weight: 600;
color: #1e293b;
margin-bottom: 12px;
padding-bottom: 8px;
border-bottom: 1px solid #e2e8f0;
}
.flag {
display: inline-block;
width: 20px;
height: 14px;
margin-right: 8px;
border-radius: 2px;
vertical-align: middle;
}
.flag-cn {
background-color: #de2910;
}
.flag-en {
background-color: #012169;
}
.summary {
font-size: 14px;
line-height: 1.6;
color: #475569;
padding: 16px;
background-color: #f8fafc;
border-radius: 6px;
border-left: 3px solid #2563eb;
}
.divider {
height: 1px;
background-color: #e2e8f0;
margin: 20px 0;
border: none;
}
/* Mobile responsive */
@media screen and (max-width: 600px) {
.email-wrapper {
padding: 20px 10px !important;
}
.header, .stats {
padding: 20px 16px !important;
}
.content {
padding: 24px 16px 32px 16px !important;
}
.email-container {
border-radius: 0;
}
}
/* Gmail specific fixes */
.gmail-fix {
display: none;
}
/* Outlook specific fixes */
.ExternalClass {
width: 100%;
}
.ExternalClass,
.ExternalClass p,
.ExternalClass span,
.ExternalClass font,
.ExternalClass td,
.ExternalClass div {
line-height: 100%;
}
</style>
<!--[if mso]>
<style type="text/css">
.email-container {
width: 600px !important;
}
</style>
<![endif]-->
</head>
<body>
<table role="presentation" class="email-wrapper" cellpadding="0" cellspacing="0" border="0">
<tr>
<td align="center">
<table role="presentation" class="email-container" cellpadding="0" cellspacing="0" border="0">
<!-- Header -->
<tr>
<td class="header">
<h1>RAG Daily Papers</h1>
<div class="date">${inputData.Date || date}</div>
</td>
</tr>
<!-- Stats -->
<tr>
<td class="stats">
<strong>${inputData["Number of papers"] || inputData.paperCount || 0} papers</strong> reviewed today
</td>
</tr>
<!-- Content -->
<tr>
<td class="content">
<!-- Chinese Section -->
<div class="section">
<h2 class="section-title">
🇨🇳 Chinese
</h2>
<div class="summary">
${inputData.SUMMARY_CN || inputData.summaryCN || 'No Chinese summary available'}
</div>
</div>
<!-- Divider -->
<hr class="divider">
<!-- English Section -->
<div class="section">
<h2 class="section-title">
🇺🇸 English
</h2>
<div class="summary">
${inputData.SUMMARY_EN || inputData.summaryEN || 'No English summary available'}
</div>
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>`
};
// Feishu message body
const feishuMessage = {
msg_type: "text",
content: {
text: `Today ${inputData.date} ${inputData.paperCount} papers. ${inputData.summaryEN} ${inputData.summaryCN}`
}
};
// n8n output format
return [
{ json: { type: "gmail", ...gmailMessage } },
{ json: { type: "feishu", ...feishuMessage } }
];
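The two-item return above fans out one Gmail item and one Feishu item from a single Code node; the downstream `gmail` and `FEISHU` branches each pick out their own item by the `type` field. A standalone sketch of that routing pattern:

```javascript
// Standalone sketch: route the two message items by their `type` field,
// mirroring how the gmail / FEISHU branches consume the node's output.
// The item bodies here are simplified placeholders.
const items = [
  { json: { type: 'gmail', subject: 'Daily Paper Summary', message: '<html>...</html>' } },
  { json: { type: 'feishu', msg_type: 'text', content: { text: 'Today 3 papers.' } } }
];

const gmailItems = items.filter(i => i.json.type === 'gmail');
const feishuItems = items.filter(i => i.json.type === 'feishu');

console.log(gmailItems.length, feishuItems.length); // 1 1
```

Keeping both payloads in one node avoids duplicating the summary-formatting logic per channel; each delivery branch only needs a filter on `type`.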


Tools:
Workflow Steps
Use n8n’s Notion integration node to build a simple and easy-to-use API service.
Compared with the official native API, the response returned by n8n's Notion database node is much simpler.
In the raw API response, each field carries multiple properties that are rarely needed in application development, and the nested format is cumbersome to handle, as the sample below shows.
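For application development, the nested `properties` object can be reduced to plain key/value pairs with a small helper. This is a sketch (not part of the workflow) covering only the property types that appear in the sample response below:

```javascript
// Flatten a Notion page's `properties` into simple key/value pairs.
// Covers the types seen in the sample response: url, rich_text, title,
// multi_select, and date. Other types fall through to null.
function flattenNotionPage(page) {
  const flat = {};
  for (const [key, prop] of Object.entries(page.properties)) {
    switch (prop.type) {
      case 'url':
        flat[key] = prop.url;
        break;
      case 'rich_text':
        flat[key] = prop.rich_text.map(t => t.plain_text).join('');
        break;
      case 'title':
        flat[key] = prop.title.map(t => t.plain_text).join('');
        break;
      case 'multi_select':
        flat[key] = prop.multi_select.map(o => o.name);
        break;
      case 'date':
        flat[key] = prop.date ? prop.date.start : null;
        break;
      default:
        flat[key] = null; // unhandled property types left empty
    }
  }
  return flat;
}
```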
{
"object": "list",
"results": [
{
"object": "page",
"id": "273a136d-cee4-8146-9a98-fc4d59187b2c",
"created_time": "2025-09-19T10:05:00.000Z",
"last_edited_time": "2025-09-19T10:05:00.000Z",
"created_by": {
"object": "user",
"id": "fe8dfa3a-3ca2-4bc2-9817-9f608b9cbfc5"
},
"last_edited_by": {
"object": "user",
"id": "fe8dfa3a-3ca2-4bc2-9817-9f608b9cbfc5"
},
"cover": null,
"icon": null,
"parent": {
"type": "database_id",
"database_id": "26ba136d-cee4-8029-ad3d-e0e8ac64993f"
},
"archived": false,
"in_trash": false,
"is_locked": false,
"properties": {
"html_url": {
"id": "%3AMFp",
"type": "url",
"url": "http://arxiv.org/abs/2509.14608v1"
},
"RAG_Category": {
"id": "%3BKcX",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "Security",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "Security",
"href": null
}
]
},
"RAG_REASON": {
"id": "Amf%3A",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "",
"href": null
}
]
},
"github": {
"id": "BfR_",
"type": "url",
"url": null
},
"author": {
"id": "DKyR",
"type": "multi_select",
"multi_select": [
{
"id": "940eae58-6b4c-4473-9807-698a6d48ea0c",
"name": "Shashank Shreedhar Bhatt",
"color": "orange"
},
{
"id": "18c77d02-8eea-4017-96e7-39d6e8bf848f",
"name": "Tanmay Rajore",
"color": "yellow"
},
{
"id": "b5283157-1e46-4f6d-ad0c-bd2a6d0d083e",
"name": "Khushboo Aggarwal",
"color": "green"
},
{
"id": "ab8ffe83-9718-48cc-bf6c-651d503d0ed6",
"name": "Ganesh Ananthanarayanan",
"color": "default"
},
{
"id": "256faf03-b7f1-464d-b206-ae02400e4764",
"name": "Ranveer Chandra",
"color": "yellow"
},
{
"id": "c058a343-839e-4cd7-bf96-9379e6122088",
"name": "Nishanth Chandran",
"color": "orange"
},
{
"id": "a6c3357f-6b25-4c14-aaed-993f1652dba6",
"name": "Suyash Choudhury",
"color": "blue"
},
{
"id": "adcd0e80-3044-4d47-a1d0-b39408603c53",
"name": "Divya Gupta",
"color": "default"
},
{
"id": "45e04054-961b-476b-980c-61c16202e7b0",
"name": "Emre Kiciman",
"color": "purple"
},
{
"id": "b26bc14f-4330-4bb4-a9eb-2644080710f9",
"name": "Sumit Kumar Pandey",
"color": "gray"
},
{
"id": "2e84929e-8e8a-4b1a-9479-b523932552d5",
"name": "Srinath Setty",
"color": "green"
},
{
"id": "e0bc3de8-d911-4804-ad7b-8f40693d4155",
"name": "Rahul Sharma",
"color": "red"
},
{
"id": "4d9f1267-7e95-47c8-ac3a-77c32d60cecb",
"name": "Teijia Zhao",
"color": "green"
}
]
},
"summary": {
"id": "E%7D%5Cx",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "Large language models (LLMs) are increasingly deployed in enterprise settings where they interact with multiple users and are trained or fine-tuned on sensitive internal data. While fine-tuning enhances performance by internalizing domain knowledge, it also introduces a critical security risk: leakage of confidential training data to unauthorized users. These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time. We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement. We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks. We take the position that only a deterministic and rigorous enforcement of fine-grained access control during both fine-tuning and RAG-based inference can reliably prevent the leakage of sensitive data to unauthorized recipients. We introduce a framework centered on the principle that any content used in training, retrieval, or generation by an LLM is explicitly authorized for \\emph{all users involved in the interaction}. Our approach offers a simple yet powerful paradigm shift for building secure multi-user LLM systems that are grounded in classical access control but adapted to the unique challenges of modern AI workflows. Our solution has been deployed in Microsoft Copilot Tuning, a product offering that enables organizations to fine-tune models using their own enterprise-specific data.",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "Large language models (LLMs) are increasingly deployed in enterprise settings where they interact with multiple users and are trained or fine-tuned on sensitive internal data. While fine-tuning enhances performance by internalizing domain knowledge, it also introduces a critical security risk: leakage of confidential training data to unauthorized users. These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time. We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement. We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks. We take the position that only a deterministic and rigorous enforcement of fine-grained access control during both fine-tuning and RAG-based inference can reliably prevent the leakage of sensitive data to unauthorized recipients. We introduce a framework centered on the principle that any content used in training, retrieval, or generation by an LLM is explicitly authorized for \\emph{all users involved in the interaction}. Our approach offers a simple yet powerful paradigm shift for building secure multi-user LLM systems that are grounded in classical access control but adapted to the unique challenges of modern AI workflows. Our solution has been deployed in Microsoft Copilot Tuning, a product offering that enables organizations to fine-tune models using their own enterprise-specific data.",
"href": null
}
]
},
"RAG_NAME": {
"id": "UR%3F%40",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "",
"href": null
}
]
},
"pdf_url": {
"id": "UkEI",
"type": "url",
"url": "http://arxiv.org/pdf/2509.14608v1"
},
"primary_category": {
"id": "%5DSvu",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "cs.CR",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "cs.CR",
"href": null
}
]
},
"published": {
"id": "h%5D%7CN",
"type": "date",
"date": {
"start": "2025-09-18T04:30:00.000+00:00",
"end": null,
"time_zone": null
}
},
"RAG_TF": {
"id": "j%5EmC",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "T",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "T",
"href": null
}
]
},
"id": {
"id": "skBo",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "http://arxiv.org/abs/2509.14608v1",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "http://arxiv.org/abs/2509.14608v1",
"href": null
}
]
},
"category": {
"id": "xfyy",
"type": "multi_select",
"multi_select": [
{
"id": "45922d11-589a-4011-b8ad-19fa2583fd29",
"name": "cs.CR",
"color": "blue"
},
{
"id": "49cc7a9c-4b92-4b33-ac3b-30b71d3858f4",
"name": "cs.AI",
"color": "green"
}
]
},
"updated": {
"id": "%7CUT%5D",
"type": "date",
"date": {
"start": "2025-09-18T04:30:00.000+00:00",
"end": null,
"time_zone": null
}
},
"huggingface": {
"id": "%7C%7DH%7C",
"type": "url",
"url": null
},
"title": {
"id": "title",
"type": "title",
"title": [
{
"type": "text",
"text": {
"content": "Enterprise AI Must Enforce Participant-Aware Access Control",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "Enterprise AI Must Enforce Participant-Aware Access Control",
"href": null
}
]
}
},
"url": "https://www.notion.so/Enterprise-AI-Must-Enforce-Participant-Aware-Access-Control-273a136dcee481469a98fc4d59187b2c",
"public_url": "https://dongou.notion.site/Enterprise-AI-Must-Enforce-Participant-Aware-Access-Control-273a136dcee481469a98fc4d59187b2c"
}
],
"next_cursor": null,
"has_more": false,
"type": "page_or_database",
"page_or_database": {},
"developer_survey": "https://notionup.typeform.com/to/bllBsoI4?utm_source=postman",
"request_id": "8472bbc3-45f3-49f5-8fda-e70a0bccf6e7"
}

In n8n, the Notion “Get many database pages” node provides a Simplify option: it returns a flattened version of the response instead of the raw API payload.
[
{
"id": "273a136d-cee4-8146-9a98-fc4d59187b2c",
"name": "Enterprise AI Must Enforce Participant-Aware Access Control",
"url": "https://www.notion.so/Enterprise-AI-Must-Enforce-Participant-Aware-Access-Control-273a136dcee481469a98fc4d59187b2c",
"property_html_url": "http://arxiv.org/abs/2509.14608v1",
"property_rag_category": "Security",
"property_rag_reason": "",
"property_github": null,
"property_author": [
"Shashank Shreedhar Bhatt",
"Tanmay Rajore",
"Khushboo Aggarwal",
"Ganesh Ananthanarayanan",
"Ranveer Chandra",
"Nishanth Chandran",
"Suyash Choudhury",
"Divya Gupta",
"Emre Kiciman",
"Sumit Kumar Pandey",
"Srinath Setty",
"Rahul Sharma",
"Teijia Zhao"
],
"property_summary": "Large language models (LLMs) are increasingly deployed in enterprise settings where they interact with multiple users and are trained or fine-tuned on sensitive internal data. While fine-tuning enhances performance by internalizing domain knowledge, it also introduces a critical security risk: leakage of confidential training data to unauthorized users. These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time. We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement. We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks. We take the position that only a deterministic and rigorous enforcement of fine-grained access control during both fine-tuning and RAG-based inference can reliably prevent the leakage of sensitive data to unauthorized recipients. We introduce a framework centered on the principle that any content used in training, retrieval, or generation by an LLM is explicitly authorized for \\emph{all users involved in the interaction}. Our approach offers a simple yet powerful paradigm shift for building secure multi-user LLM systems that are grounded in classical access control but adapted to the unique challenges of modern AI workflows. Our solution has been deployed in Microsoft Copilot Tuning, a product offering that enables organizations to fine-tune models using their own enterprise-specific data.",
"property_rag_name": "",
"property_pdf_url": "http://arxiv.org/pdf/2509.14608v1",
"property_primary_category": "cs.CR",
"property_published": {
"start": "2025-09-18T04:30:00.000+00:00",
"end": null,
"time_zone": null
},
"property_rag_tf": "T",
"property_id": "http://arxiv.org/abs/2509.14608v1",
"property_category": [
"cs.CR",
"cs.AI"
],
"property_updated": {
"start": "2025-09-18T04:30:00.000+00:00",
"end": null,
"time_zone": null
},
"property_huggingface": null,
"property_title": "Enterprise AI Must Enforce Participant-Aware Access Control"
}
]

Construct filters (JSON) based on the query parameters your application needs. ⚠️ Note: all date and time values must be in UTC.
Model: OpenAI GPT-5

// Filter for the RAG DAILY table. A Notion date filter object accepts a
// single comparison operator, so the one-day window is expressed as two
// conditions combined with `and`.
const date = $json.query?.date;
const rag = "T";
let filter;
if (date) {
filter = {
and: [
{
property: "updated",
date: { on_or_after: `${date}T00:00:00.000Z` }
},
{
property: "updated",
date: { before: `${date}T23:59:59.999Z` }
},
{
property: "RAG_TF",
rich_text: { equals: rag }
}
]
};
} else {
filter = {
property: "RAG_TF",
rich_text: { equals: rag }
};
}
return [{ json: { filter: JSON.stringify(filter) } }];
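The emitted filter can be inspected outside n8n with a standalone sketch of the same logic (`buildRagFilter` is an illustrative helper, not part of the workflow; the date window is split into two conditions because each Notion date filter object takes a single comparison operator):

```javascript
// Standalone, runnable version of the Code-node logic for inspection.
function buildRagFilter(date) {
  const rag = "T";
  if (!date) {
    // No date: only keep RAG-related entries.
    return { property: "RAG_TF", rich_text: { equals: rag } };
  }
  // Date given: restrict "updated" to that UTC day and keep RAG entries.
  return {
    and: [
      { property: "updated", date: { on_or_after: `${date}T00:00:00.000Z` } },
      { property: "updated", date: { before: `${date}T23:59:59.999Z` } },
      { property: "RAG_TF", rich_text: { equals: rag } },
    ],
  };
}

console.log(JSON.stringify(buildRagFilter("2025-09-15"), null, 2));
```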
Filter for the RAG Daily Paper Summary table, which keys on a DATE property:

const date = $json.query?.date;
let filter;
if (date) {
// Round-trip through Date to validate the input and normalize it to YYYY-MM-DD.
const inputDate = new Date(date + "T00:00:00.000Z");
const formattedDate = inputDate.toISOString().split("T")[0];
filter = {
property: "DATE",
date: {
equals: formattedDate
}
};
} else {
filter = {};
}
return [{ json: { filter: JSON.stringify(filter) } }];
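Downstream, the stringified filter is posted to Notion's database query endpoint (in n8n this is typically an HTTP Request node). A rough sketch of the request it would build — `buildQueryRequest` is a hypothetical helper, and `NOTION_TOKEN` / the database ID are placeholders, not values from the workflow above:

```javascript
// Builds a request descriptor for Notion's database query endpoint.
// The token and database ID are assumed environment/config values.
function buildQueryRequest(databaseId, filterJson) {
  return {
    url: `https://api.notion.com/v1/databases/${databaseId}/query`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.NOTION_TOKEN}`,
      "Notion-Version": "2022-06-28",
      "Content-Type": "application/json",
    },
    // The Code node emits a JSON string, so parse it back into an object here.
    body: JSON.stringify({ filter: JSON.parse(filterJson) }),
  };
}
```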
Dyad is a local, open-source tool that lets you freely plug in API services for Vibe Coding. Note, however, that Dyad’s development mode is not a packaged agent: it cannot generate image assets for website design on its own, so external image resources must be integrated separately.
Code Model: qwen3-coder-plus-2025-07-22
Image Model: Recraft, supports generating and exporting SVG images
# Purpose
Design a RAG Daily Papers calendar card website that fetches data from a Notion database.
# Notion Data Sources
The data comes from the Notion API, consisting of 2 tables: RAG Daily Paper Summary and RAG DAILY.
## RAG Daily Paper Summary
### Daily Paper Summary Table:
- property_title: Title
- property_date: Date in UTC
- property_number_of_papers: Number of papers per day
- property_summary_en: English summary
- property_summary_cn: Chinese summary
### API Request:
https://<YOUR_WEBHOOK_URL>?date={date}
- date format: YYYY-MM-DD, optional. If no date parameter is provided, all data is returned; if date is provided, only data for that date is returned.
- HTTP Method: GET
- Authentication: None
#### Example Requests:
- Get all data: https://<YOUR_WEBHOOK_URL>
- Get data for 2025-09-15: https://<YOUR_WEBHOOK_URL>?date=2025-09-15
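For application code consuming this endpoint, a minimal client sketch (the base URL stands in for the placeholder webhook above; `buildRequestUrl` and `fetchDailySummaries` are illustrative names):

```javascript
// Builds the webhook URL with an optional date parameter.
function buildRequestUrl(baseUrl, date) {
  return date ? `${baseUrl}?date=${encodeURIComponent(date)}` : baseUrl;
}

// Plain GET, no authentication, JSON response.
async function fetchDailySummaries(baseUrl, date) {
  const res = await fetch(buildRequestUrl(baseUrl, date));
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```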
#### Example RESPONSE:
{
"id": "271a136d-cee4-8177-bf61-df44bdb63010",
"name": "2025-09-15 paper summary",
"url": "https://www.notion.so/2025-09-15-paper-summary-271a136dcee48177bf61df44bdb63010",
"property_summary_en": "Today's papers focus on Retrieval-Augmented Generation (RAG) and its applications and improvements across various domains. One paper introduces the HiChunk framework and HiCBench benchmark for evaluating and enhancing RAG document chunking quality. Another paper develops a GPU-accelerated RAG Telegram assistant to provide academic support for students in an 'Introduction to Parallel Processing' course. A different study presents RAGs-to-Riches, a RAG-like few-shot learning method to improve Large Language Model (LLM) role-playing. MMORE is an open-source pipeline for Massive Multimodal Open Retrieval-Augmented Generation and Extraction, supporting diverse document formats. FinGEAR is a retrieval framework tailored for financial documents, using mapping guidance for enhanced answer retrieval to tackle complex documents like 10-Ks. One research explores the adaptation and evaluation of Multimodal Large Language Models (MLLMs) for Adolescent Idiopathic Scoliosis (AIS) self-management, proposing a 'Divide and Conquer' framework incorporating spinal keypoint prompting and RAG. Finally, SAQ introduces a novel vector quantization method, employing code adjustment and dimension segmentation to advance approximate nearest neighbor search (ANNS) and RAG.",
"property_date": {
"start": "2025-09-16T07:28:00.000+00:00",
"end": null,
"time_zone": null
},
"property_summary_cn": "今日的论文主要集中在检索增强生成(RAG)及其在不同领域的应用和改进。其中一篇论文提出了HiChunk框架和HiCBench基准,用于评估和改进RAG的文档分块质量。另一篇论文开发了一个GPU加速的RAG Telegram助手,为“并行处理导论”课程的学生提供学术支持。还有一篇论文介绍了RAGs-to-Riches框架,一种类RAG的少样本学习方法,用于提高大型语言模型(LLM)的角色扮演能力。MMORE是一个用于大规模多模态开放检索增强生成和提取的开源流水线,支持多种文档格式。FinGEAR是一个针对金融文件设计的检索框架,通过映射指导增强答案检索,以解决10-K文件等复杂检索问题。一篇研究探讨了多模态大型语言模型在青少年特发性脊柱侧弯(AIS)自我管理中的应用,并提出了一个“分而治之”的框架,结合了脊柱关键点提示和RAG。最后,SAQ提出了一种新的向量量化方法,通过代码调整和维度分割来推动向量量化在近似最近邻搜索(ANNS)和RAG中的应用。",
"property_number_of_papers": 7,
"property_title": "2025-09-15 paper summary"
}
## RAG DAILY
### Paper List Table:
- property_title: Paper title
- property_updated: Last updated time (UTC)
- property_published: Published time (UTC)
- property_id: Paper ID / link
- property_summary: Paper abstract
- property_html_url: Paper HTML preview URL
- property_pdf_url: Paper PDF preview URL
- property_primary_category: Primary category
- property_category: Categories (multi-select tags)
- property_github: Related GitHub project URL
- property_huggingface: Related HuggingFace project URL
- property_rag_tf: Whether the paper is RAG-related, T/F
- property_rag_reason: Reason for not being RAG-related
- property_rag_category: RAG technology category (may be empty)
- property_rag_name: RAG technology name (may be empty)
- property_author: Authors (multi-select tags)
### API Request:
https://<YOUR_WEBHOOK_URL>?date={date}
- date format: YYYY-MM-DD, optional. If no date parameter is provided, all data is returned; if date is provided, only data for that date is returned.
- HTTP Method: GET
- Authentication: None
#### Example Requests:
- Get all data: https://<YOUR_WEBHOOK_URL>
- Get data for 2025-09-15: https://<YOUR_WEBHOOK_URL>?date=2025-09-15
#### Example RESPONSE:
{
"id": "271a136d-cee4-81ee-851e-d74884293f20",
"name": "SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Dimension Segmentation",
"url": "https://www.notion.so/SAQ-Pushing-the-Limits-of-Vector-Quantization-through-Code-Adjustment-and-Dimension-Segmentation-271a136dcee481ee851ed74884293f20",
"property_html_url": "http://arxiv.org/abs/2509.12086v1",
"property_rag_category": "Supporting Technology",
"property_rag_reason": "",
"property_github": null,
"property_author": [
"Hui Li",
"Shiyuan Deng",
"Xiao Yan",
"Xiangyu Zhi",
"James Cheng"
],
"property_summary": "Approximate Nearest Neighbor Search (ANNS) plays a critical role in applications such as search engines, recommender systems, and RAG for LLMs. Vector quantization (VQ), a crucial technique for ANNS, is commonly used to reduce space overhead and accelerate distance computations. However, despite significant research advances, state-of-the-art VQ methods still face challenges in balancing encoding efficiency and quantization accuracy. To address these limitations, we propose a novel VQ method called SAQ. To improve accuracy, SAQ employs a new dimension segmentation technique to strategically partition PCA-projected vectors into segments along their dimensions. By prioritizing leading dimension segments with larger magnitudes, SAQ allocates more bits to high-impact segments, optimizing the use of the available space quota. An efficient dynamic programming algorithm is developed to optimize dimension segmentation and bit allocation, ensuring minimal quantization error. To speed up vector encoding, SAQ devises a code adjustment technique to first quantize each dimension independently and then progressively refine quantized vectors using a coordinate-descent-like approach to avoid exhaustive enumeration. Extensive experiments demonstrate SAQ's superiority over classical methods (e.g., PQ, PCA) and recent state-of-the-art approaches (e.g., LVQ, Extended RabitQ). SAQ achieves up to 80% reduction in quantization error and accelerates encoding speed by over 80x compared to Extended RabitQ.",
"property_rag_name": "",
"property_pdf_url": "http://arxiv.org/pdf/2509.12086v1",
"property_primary_category": "cs.DB",
"property_published": {
"start": "2025-09-16T07:28:00.000+00:00",
"end": null,
"time_zone": null
},
"property_rag_tf": "T",
"property_id": "http://arxiv.org/abs/2509.12086v1",
"property_category": [
"cs.DB",
"cs.DS",
"cs.IR"
],
"property_updated": {
"start": "2025-09-16T07:28:00.000+00:00",
"end": null,
"time_zone": null
},
"property_huggingface": null,
"property_title": "SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Dimension Segmentation"
}
# Page Design Requirements
Values in code will be represented using ${} notation as described below.
## Basic Page Info:
- title: RAG Daily Papers
- SEO Configuration:
-- <title>RAG Daily Papers - Latest RAG Research from arXiv</title>
-- <meta name="description" content="RAG Daily Papers curates the latest Retrieval-Augmented Generation (RAG) research papers from arXiv. Updated daily with summaries for researchers, engineers, and AI enthusiasts.">
-- <meta name="keywords" content="RAG, Retrieval-Augmented Generation, arXiv, AI papers, NLP, LLM, daily research, machine learning">
-- <meta name="robots" content="index, follow">
## Header
- title: RAG Daily Papers
- subtitle: Stay updated with the latest RAG research, every day.
- Total papers ${RAG DAILY, COUNT all entries}
- Data Source: arXiv
- button: NOTION button, ${Open in a new window: <YOUR_NOTION_LINK>}
## Body
### Date Filter
Default unselected; selecting a date sets ${RAG DAILY, date param = selected date}
### Content List
- Information groups sorted by: ${RAG DAILY, grouped by property_updated, descending order inside each group}
- Info Group:
-- Group title: YYYY/MM/DD DayOfWeek, ${RAG DAILY, convert property_updated UTC to local display}
-- Group title note: "Displayed based on paper updated time"
-- Group summary: ${RAG Daily Paper Summary, property_date in UTC}
-- Card contents:
--- Paper title: ${RAG DAILY, property_title}
--- Paper summary: ${RAG DAILY, property_summary}
--- RAG technology name: ${RAG DAILY, property_rag_name}
--- RAG category description: ${RAG DAILY, property_rag_category}
--- Paper categories: tags ${RAG DAILY, property_category}
--- Project icons: GitHub, HuggingFace ${RAG DAILY, if property_github/property_huggingface not null, show icon}
--- Click event: Show more in a modal
-- Modal contents:
--- Section 1:
---- RAG technology name: ${RAG DAILY, property_rag_name}
---- RAG category description: ${RAG DAILY, property_rag_category}
--- Section 2:
---- Paper title: ${RAG DAILY, property_title}
---- Paper published: ${RAG DAILY, property_published, UTC}
---- Paper updated: ${RAG DAILY, property_updated, UTC}
---- Authors: tags ${RAG DAILY, property_author}
---- Original link: hyperlink ${RAG DAILY, property_id}
--- Section 3:
---- Paper summary: ${RAG DAILY, property_summary}
---- Categories: tags ${RAG DAILY, property_category}
--- Paper buttons:
---- PDF Online: ${RAG DAILY, property_pdf_url}, open in new window
---- HTML Online: ${RAG DAILY, property_html_url}, open in new window
--- Project buttons:
---- GitHub: ${RAG DAILY, property_github if not null}, open in new window
---- HuggingFace: ${RAG DAILY, property_huggingface if not null}, open in new window
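The group-title conversion in the Content List above (property_updated UTC → local "YYYY/MM/DD DayOfWeek") can be sketched as follows; `formatGroupTitle` is an illustrative helper, and the UTC default is chosen here for reproducibility — a browser would pass the user's zone instead:

```javascript
// Formats a UTC ISO timestamp as "YYYY/MM/DD DayOfWeek" in a given time zone.
function formatGroupTitle(utcIso, timeZone = "UTC") {
  const d = new Date(utcIso);
  // The en-CA locale yields an ISO-like YYYY-MM-DD date string.
  const ymd = new Intl.DateTimeFormat("en-CA", { timeZone }).format(d);
  const weekday = new Intl.DateTimeFormat("en-US", { timeZone, weekday: "long" }).format(d);
  return `${ymd.replace(/-/g, "/")} ${weekday}`;
}
```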
## Footer
- copyright: dongou.tech
- email: <YOUR_EMAIL>
# Page Development Guidelines:
1. The website is fully English; translate all non-English text.
2. Style: Notion-like, clean and minimal.
3. Layout: Accommodate varying content lengths and empty content gracefully.
4. Privacy & Security: Do not expose API keys in front-end code.
5. Performance: Avoid fetching all fields in one request; use lazy loading.
6. Timezone: UTC
