
An automated hub for daily research papers, powered by n8n, Notion, and vibe coding. It aggregates, organizes, and surfaces the latest studies across AI, RAG, and other domains, and is designed for researchers, engineers, and knowledge enthusiasts.
The workflow fetches topic-specific research papers from arXiv on a daily schedule, processes and structures the data, and creates entries in a Notion database, with support for message delivery (email and IM) and downstream application development.
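The heart of the scheduled fetch is building a `submittedDate` window in arXiv's `YYYYMMDDHHMM` format and composing the export API query. A standalone sketch (not the exact n8n code-node code; the workflow itself offsets by two days to account for arXiv's announcement delay, parameterized here as `daysBack`):

```javascript
// Build a submittedDate window for `daysBack` days before `ref`,
// in arXiv's YYYYMMDDHHMM format (based on the local calendar date).
function dateWindow(daysBack, ref = new Date()) {
  const d = new Date(ref);
  d.setDate(d.getDate() - daysBack);
  const y = d.getFullYear();
  const m = String(d.getMonth() + 1).padStart(2, '0');
  const day = String(d.getDate()).padStart(2, '0');
  return { from: `${y}${m}${day}0000`, to: `${y}${m}${day}2359` };
}

// Compose the export API query URL for a search term.
function arxivQueryUrl(term, { from, to }) {
  return 'https://export.arxiv.org/api/query' +
    `?search_query=all:${term}+AND+submittedDate:[${from}+TO+${to}]`;
}

// Month index 8 = September, so the reference date is 2025-09-13.
console.log(arxivQueryUrl('RAG', dateWindow(1, new Date(2025, 8, 13))));
```

With `daysBack = 1` and a reference date of 2025-09-13, this yields a query containing `submittedDate:[202509120000+TO+202509122359]`.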
| Name | Category | Description (EN) | Link |
| --- | --- | --- | --- |
| n8n | Platform | Workflow and API service orchestration | n8n.io |
| Gemini-2.5-Flash | Model | Research paper summarization and processing | deepmind.google |
| Notion | Database | Knowledge and project database | notion.com |
| Gmail | Email | Email notifications and delivery | gmail.com |
| Feishu (Lark) | IM | Instant messaging and notification delivery | larksuite.com |
| ClawCloud | Deployment | n8n deployment service | run.claw.cloud |
| Dyad | Coding | Free, local, open-source vibe-coding tool | dyad.sh |
| GitHub | Code Hosting | Code publishing and version control | github.com |
| Qwen3-Coder-Plus-2025-07-22 | Model | Code generation model | qwenlm.github.io |
| Vercel | Deployment | Web app deployment platform | vercel.com |
| Recraft | AI Design | SVG illustration and asset generation | recraft.com |
| OpenAI GPT-5 | Model | Email HTML template design | chatgpt.com |
| Claude Sonnet 4 | Model | JavaScript code assistant | claude.ai |
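
Before the workflow JSON itself, it helps to see the parsing approach used by the `Data Extraction` code node below: arXiv returns an Atom XML feed, and the node splits out `<entry>` blocks and pulls tag contents with regular expressions rather than a full XML parser. The two core helpers, lifted from that node, can be exercised on a toy feed (the sample entry's title text is illustrative; the `id` matches the example used later in the cleaning rules):

```javascript
// Extract the text content of the first matching tag, collapsing whitespace.
function extractTagContent(xml, tagName) {
  const regex = new RegExp(`<${tagName}[^>]*>([\\s\\S]*?)</${tagName}>`, 'i');
  const match = xml.match(regex);
  return match ? match[1].trim().replace(/\s+/g, ' ') : '';
}

// Split an Atom feed into its <entry> inner blocks.
function extractEntries(xml) {
  const entries = [];
  const regex = /<entry[^>]*>([\s\S]*?)<\/entry>/gi;
  let m;
  while ((m = regex.exec(xml)) !== null) entries.push(m[1]);
  return entries;
}

// Minimal sample in the shape of an arXiv Atom response.
const sample = `
<feed>
  <opensearch:totalResults>1</opensearch:totalResults>
  <entry>
    <id>http://arxiv.org/abs/2409.06062v1</id>
    <title>MemoRAG: Memory-Augmented
      Retrieval for Long Contexts</title>
    <summary>An illustrative abstract.</summary>
  </entry>
</feed>`;

const [entry] = extractEntries(sample);
console.log(extractTagContent(entry, 'title'));
// → "MemoRAG: Memory-Augmented Retrieval for Long Contexts"
```

Note how the whitespace collapse rejoins titles that arXiv wraps across lines, which is exactly what the Notion `title` field needs.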

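One subtlety worked around by the `JSON FORMAT` code node in the workflow: the daily-summary prompt's example output repeats the key `"Number of papers"` (once as a title string, once as a count), and `JSON.parse` silently keeps only the last duplicate. The node therefore re-extracts the first occurrence from the raw fenced string with a regex. A minimal reproduction (the sample reply is hypothetical, and `Date` is quoted here, unlike in the prompt's example, so the string parses as JSON):

```javascript
// A model reply that fences its JSON and duplicates "Number of papers".
const reply = 'Here is the summary:\n```json\n{\n  "Number of papers": "2025-09-13 paper summary",\n  "Date": "2025-09-13",\n  "Number of papers": 2,\n  "SUMMARY_CN": "",\n  "SUMMARY_EN": ""\n}\n```';

// Pull the fenced JSON out of the reply.
const jsonStr = reply.match(/```json\n([\s\S]*?)\n```/)[1];

// JSON.parse keeps only the LAST occurrence of a duplicated key...
const data = JSON.parse(jsonStr);

// ...so recover the string-valued first occurrence from the raw text.
const titleMatch = jsonStr.match(/"Number of papers":\s*"([^"]+)"/);
const countMatch = jsonStr.match(/"Number of papers":\s*(\d+)/);

const result = {
  title: titleMatch ? titleMatch[1] : '',
  date: data.Date || '',
  paperCount: countMatch ? parseInt(countMatch[1], 10) : 0,
};
console.log(result);
```

The full exported n8n workflow JSON follows.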
{
"nodes": [
{
"parameters": {
"promptType": "define",
"text": "={{ $json.data }}",
"messages": {
"messageValues": [
{
"message": "You are a paper content analysis assistant. You can analyze and inspect JSON data, accurately identify the content in the `summary` field, make judgments, and enrich the data items. The main tasks are as follows:\n\n1. RAG Relevance and Labeling:\n - Analyze the `summary` field to determine whether the content is related to RAG (Retrieval-Augmented Generation) and assign labels.\n - For each data item, add three new fields:\n - `RAG_TF`: \"T\" if related, \"F\" if not\n - `RAG_REASON`: if not related, provide the reason in English; otherwise, leave empty\n - `RAG_Category`: if related, assign a category label based on the `summary` content (e.g., Framework / Application / …); otherwise, leave empty\n\n2. RAG Method Extraction:\n - Analyze the `summary` and extract the RAG method proposed in the paper.\n - Store it in the new field `RAG_NAME`.\n\n3. External Link Extraction:\n - Analyze the `summary` content for `github` or `huggingface` links.\n - If present, extract the URLs and populate the existing `github` and `huggingface` fields.\n - If not present, leave them unchanged.\n\nOutput Format: standard JSON\n\nExample:\n\nGiven a data item with the following `summary`:\n\n\"summary\":\"Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can still be insufficient for many applications. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions, however, do not hold in general long-context processing tasks. In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG features a dual-system architecture. 
First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, it generates draft answer\n"
}
]
},
"batching": {}
},
"type": "@n8n/n8n-nodes-langchain.chainLlm",
"typeVersion": 1.7,
"position": [
272,
0
],
"id": "7e9f18f1-edfe-4af6-835b-12fe16a99034",
"name": "Basic LLM Chain"
},
{
"parameters": {
"modelName": "=models/gemini-2.5-flash",
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
"typeVersion": 1,
"position": [
272,
144
],
"id": "92d37dc1-aaaf-47ec-987a-e6d23c93e055",
"name": "Google Gemini Chat Model",
"credentials": {
"googlePalmApi": {
"id": "ra9slZSGvLJTHQw1",
"name": "Google Gemini(PaLM) Api account"
}
}
},
{
"parameters": {
"jsCode": "// Code node: build the submittedDate query window\nconst now = new Date();\nconst yesterday = new Date(now);\nyesterday.setDate(now.getDate() - 2);\n\nconst y = yesterday.getFullYear();\nconst m = String(yesterday.getMonth() + 1).padStart(2, '0');\nconst d = String(yesterday.getDate()).padStart(2, '0');\n\nreturn [\n {\n json: {\n from: `${y}${m}${d}0000`,\n to: `${y}${m}${d}2359`\n }\n }\n];\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
-1664,
320
],
"id": "aaa67776-c308-443e-98f6-e1fe7035cbb5",
"name": "submittedDate:T-1"
},
{
"parameters": {
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "de0a5a7e-67dd-4dd0-8ccc-3406e17bd09c",
"leftValue": "={{ $json.paperCount }}",
"rightValue": 0,
"operator": {
"type": "number",
"operation": "notEquals"
}
}
],
"combinator": "and"
},
"options": {}
},
"type": "n8n-nodes-base.if",
"typeVersion": 2.2,
"position": [
-160,
16
],
"id": "c3685631-8bbd-409a-978a-fbb3e9847115",
"name": "If"
},
{
"parameters": {
"rule": {
"interval": [
{
"triggerAtHour": 6
}
]
}
},
"type": "n8n-nodes-base.scheduleTrigger",
"typeVersion": 1.2,
"position": [
-1856,
320
],
"id": "4dd24343-1872-472d-8d7d-4cd28a9dbabe",
"name": "Schedule Trigger"
},
{
"parameters": {
"rules": {
"values": [
{
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"leftValue": "={{ $json.type }}",
"rightValue": "feishu",
"operator": {
"type": "string",
"operation": "equals"
},
"id": "7b804f5e-6702-4d4a-99b9-3f06f8eb20d4"
}
],
"combinator": "and"
}
}
]
},
"options": {}
},
"type": "n8n-nodes-base.switch",
"typeVersion": 3.2,
"position": [
576,
720
],
"id": "a38b1b58-a6f6-4c6b-ba6e-f153980a220d",
"name": "FEISHU"
},
{
"parameters": {
"method": "POST",
"url": "=",
"sendBody": true,
"bodyParameters": {
"parameters": [
{
"name": "msg_type",
"value": "={{ $json.msg_type }}"
},
{
"name": "content",
"value": "={{ $json.content }}"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
800,
720
],
"id": "ac6b1c0d-b18e-4b42-b49e-8cb4daf0d384",
"name": "FEISHU POST"
},
{
"parameters": {
"rules": {
"values": [
{
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"leftValue": "={{ $json.type }}",
"rightValue": "gmail",
"operator": {
"type": "string",
"operation": "equals"
},
"id": "3222832c-bbf2-46a2-abd8-2bb14095b7bf"
}
],
"combinator": "and"
}
}
]
},
"options": {}
},
"type": "n8n-nodes-base.switch",
"typeVersion": 3.2,
"position": [
576,
544
],
"id": "9151ab18-379f-4d3b-8ca2-cf65c547e78d",
"name": "gmail"
},
{
"parameters": {
"sendTo": "xing.adam@gmail.com",
"subject": "={{ $json.subject }}",
"message": "={{ $json.message }}",
"options": {}
},
"type": "n8n-nodes-base.gmail",
"typeVersion": 2.1,
"position": [
800,
544
],
"id": "869f80ec-c14c-4d1e-ae11-bb6eb4c99e5d",
"name": "Send a message",
"webhookId": "cb0a1f30-59e0-4505-af24-db689d9c1f23",
"credentials": {
"gmailOAuth2": {
"id": "WoyY5hj4D93bD2Fp",
"name": "Gmail account"
}
}
},
{
"parameters": {
"modelId": {
"__rl": true,
"value": "models/gemini-2.5-flash-lite",
"mode": "list",
"cachedResultName": "models/gemini-2.5-flash-lite"
},
"messages": {
"values": [
{
"content": "You are a daily paper content summarization assistant capable of analyzing XML data. Your main tasks are as follows:\n\n1. Set the daily title field `Title`: {yyyy-mm-dd} paper summary\n2. Set the daily date field `Date`: yyyy-mm-dd\n3. Identify the `<opensearch:totalResults>` tag in the XML and set its numeric value to the field `Number of papers`.\n4. Provide a brief summary of all papers for the day, covering all topics. Set the Chinese summary as `SUMMARY_CN` and the English summary as `SUMMARY_EN`. Ensure that both summaries reflect the comprehensive summary of all papers for the day.\n5. Output format: standard JSON. If there are no papers for the day, set `Number of papers` to 0, but still include the `SUMMARY_CN` and `SUMMARY_EN` fields with empty content.\n\nExample: If there are papers:\n{\n \"Number of papers\":\"2025-09-13 paper summary\",\n \"Date\":2025-09-13,\n \"Number of papers\": 2,\n \"SUMMARY_CN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a KG based on climate publications to improve access and utilization of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm AGP-Static++ and enhancing dynamic graph support for better query and update efficiency.\",\n \"SUMMARY_EN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a domain-specific KG built from climate publications aimed at improving access and use of climate science literature. 
The second paper focuses on the AGP framework, proposing a new algorithm, AGP-Static++, and improving dynamic graph support, enhancing query and update efficiency.\"\n}\n\nIf the number of papers is 0, maintain the JSON structure:\n{\n \"Number of papers\":\"2025-09-13 paper summary\",\n \"Date\":2025-09-13,\n \"Number of papers\": 0,\n \"SUMMARY_CN\": \"\",\n \"SUMMARY_EN\": \"\"\n}",
"role": "model"
},
{
"content": "={{ $json.data }}"
}
]
},
"simplify": false,
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.googleGemini",
"typeVersion": 1,
"position": [
-1040,
320
],
"id": "3df82b76-e9c8-4b0b-a552-428f2fc12c97",
"name": "Message a model",
"credentials": {
"googlePalmApi": {
"id": "ra9slZSGvLJTHQw1",
"name": "Google Gemini(PaLM) Api account"
}
}
},
{
"parameters": {
"resource": "databasePage",
"databaseId": {
"__rl": true,
"value": "26fa136d-cee4-8092-8b85-cf9e9cbc424f",
"mode": "list",
"cachedResultName": "RAG Daily Paper Summary",
"cachedResultUrl": "https://www.notion.so/26fa136dcee480928b85cf9e9cbc424f"
},
"title": "={{ $json.title }}",
"simple": false,
"propertiesUi": {
"propertyValues": [
{
"key": "DATE|date",
"date": "={{ $json.date }}"
},
{
"key": "Number of papers|number",
"numberValue": "={{ $json.paperCount }}"
},
{
"key": "SUMMARY_EN|rich_text",
"textContent": "={{ $json.summaryEN }}"
},
{
"key": "SUMMARY_CN|rich_text",
"textContent": "={{ $json.summaryCN }}"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.notion",
"typeVersion": 2.2,
"position": [
800,
320
],
"id": "024c6399-857e-45a3-a15d-8b733e16da67",
"name": "RAG Daily Paper Summary",
"credentials": {
"notionApi": {
"id": "BNsFk38kgqvRDJpX",
"name": "Notion account"
}
}
},
{
"parameters": {
"jsCode": "const items = $input.all();\nconst response = items[0].json;\n\ntry {\n // Extract text content from Gemini API response\n // Note: response is directly an object, not an array\n const text = response.candidates[0].content.parts[0].text;\n \n // Extract JSON content\n const jsonMatch = text.match(/```json\\n([\\s\\S]*?)\\n```/);\n const jsonStr = jsonMatch[1];\n \n // Parse JSON\n const data = JSON.parse(jsonStr);\n \n // Manually handle duplicate keys - extract from original string\n const titleMatch = jsonStr.match(/\"Number of papers\":\\s*\"([^\"]+)\"/);\n const countMatch = jsonStr.match(/\"Number of papers\":\\s*(\\d+)/);\n \n // Construct result\n items[0].json = {\n title: titleMatch ? titleMatch[1] : '',\n date: data.Date || '',\n paperCount: countMatch ? parseInt(countMatch[1]) : 0,\n summaryCN: data.SUMMARY_CN || '',\n summaryEN: data.SUMMARY_EN || ''\n };\n \n} catch (error) {\n items[0].json = {\n error: error.message,\n originalData: response\n };\n}\n\nreturn items;\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
-688,
320
],
"id": "3282f989-a9a4-4d4f-aaf0-097fc0d72e0d",
"name": "JSON FORMAT"
},
{
"parameters": {
"content": "## 1. Data Retrieval\n### arXiv API\n\nThe arXiv provides a public API that allows users to query research papers by topic or by predefined categories.\n\n[arXiv API User Manual](https://info.arxiv.org/help/api/user-manual.html#arxiv-api-users-manual)\n\n**Key Notes:**\n\n1. **Response Format**: The API returns data as a typical *Atom Response*.\n2. **Timezone & Update Frequency**: \n - The arXiv submission process operates on a 24-hour cycle. \n - Newly submitted articles become available in the API only at midnight *after* they have been processed. \n - Feeds are updated daily at midnight Eastern Standard Time (EST). \n - Therefore, a single request per day is sufficient. \n3. **Request Limits**: \n - The maximum number of results per call (`max_results`) is **30,000**, \n - Results must be retrieved in slices of at most **2,000** at a time, using the `max_results` and `start` query parameters. \n4. **Time Format**: \n - The expected format is `[YYYYMMDDTTTT+TO+YYYYMMDDTTTT]`, \n - `TTTT` is provided in 24-hour time to the minute, in GMT.\n\n### Scheduled Task\n\n- **Execution Frequency**: Daily \n- **Execution Time**: 6:00 AM \n- **Time Parameter Handling (JS)**: \n According to arXiv’s update rules, the scheduled task should query the **previous day’s (T-1)** `submittedDate` data.\n\n",
"height": 768,
"width": 736
},
"type": "n8n-nodes-base.stickyNote",
"typeVersion": 1,
"position": [
-1984,
544
],
"id": "f1a331fa-d830-4656-b108-7e18e7430b04",
"name": "Sticky Note3"
},
{
"parameters": {
"url": "=https://export.arxiv.org/api/query?search_query=all:RAG+AND+submittedDate:[{{$json[\"from\"]}}+TO+{{$json[\"to\"]}}]",
"sendQuery": true,
"queryParameters": {
"parameters": [
{
"name": "={{ $json.from }}"
},
{
"name": "={{ $json.to }}"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
-1440,
320
],
"id": "ae855e91-2363-4b97-8933-761934b269fe",
"name": "arXiv API"
},
{
"parameters": {
"jsCode": "// Get current date\nconst now = new Date();\nconst year = now.getFullYear();\nconst month = String(now.getMonth() + 1).padStart(2, '0');\nconst day = String(now.getDate()).padStart(2, '0');\nconst date = `${year}-${month}-${day}`;\n\n// Get input data\nconst inputData = $input.first().json;\n\n// Generate message content\nconst messageContent = inputData.SUMMARY_CN;\n\n// Gmail message body\nconst gmailMessage = {\n subject: inputData.title || `Daily Paper Summary - ${date}`,\n message: `<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">\n<head>\n <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\" />\n <title> RAG Daily Paper Summary - ${date}</title>\n <style type=\"text/css\">\n /* Gmail safe styles */\n body {\n font-family: Arial, sans-serif;\n line-height: 1.4;\n margin: 0;\n padding: 0;\n background-color: #f9f9f9;\n color: #333333;\n }\n \n table {\n border-collapse: collapse;\n mso-table-lspace: 0pt;\n mso-table-rspace: 0pt;\n }\n \n .email-wrapper {\n width: 100%;\n background-color: #f9f9f9;\n padding: 40px 20px;\n }\n \n .email-container {\n width: 100%;\n max-width: 600px;\n margin: 0 auto;\n background-color: #ffffff;\n border-radius: 8px;\n box-shadow: 0 2px 12px rgba(0, 0, 0, 0.1);\n }\n \n .header {\n background-color: #2563eb;\n padding: 24px;\n text-align: center;\n border-radius: 8px 8px 0 0;\n }\n \n .header h1 {\n margin: 0 0 8px 0;\n font-size: 24px;\n font-weight: 600;\n color: #ffffff;\n }\n \n .date {\n font-size: 14px;\n color: #ffffff;\n opacity: 0.9;\n }\n \n .stats {\n background-color: #f1f5f9;\n padding: 16px 24px;\n font-size: 14px;\n color: #64748b;\n }\n \n .content {\n padding: 32px 24px 40px 24px;\n }\n \n .section {\n margin-bottom: 24px;\n }\n \n .section-title {\n font-size: 16px;\n 
font-weight: 600;\n color: #1e293b;\n margin-bottom: 12px;\n padding-bottom: 8px;\n border-bottom: 1px solid #e2e8f0;\n }\n \n .flag {\n display: inline-block;\n width: 20px;\n height: 14px;\n margin-right: 8px;\n border-radius: 2px;\n vertical-align: middle;\n }\n \n .flag-cn {\n background-color: #de2910;\n }\n \n .flag-en {\n background-color: #012169;\n }\n \n .summary {\n font-size: 14px;\n line-height: 1.6;\n color: #475569;\n padding: 16px;\n background-color: #f8fafc;\n border-radius: 6px;\n border-left: 3px solid #2563eb;\n }\n \n .divider {\n height: 1px;\n background-color: #e2e8f0;\n margin: 20px 0;\n border: none;\n }\n \n /* Mobile responsive */\n @media screen and (max-width: 600px) {\n .email-wrapper {\n padding: 20px 10px !important;\n }\n \n .header, .stats {\n padding: 20px 16px !important;\n }\n \n .content {\n padding: 24px 16px 32px 16px !important;\n }\n \n .email-container {\n border-radius: 0;\n }\n }\n \n /* Gmail specific fixes */\n .gmail-fix {\n display: none;\n }\n \n /* Outlook specific fixes */\n .ExternalClass {\n width: 100%;\n }\n \n .ExternalClass,\n .ExternalClass p,\n .ExternalClass span,\n .ExternalClass font,\n .ExternalClass td,\n .ExternalClass div {\n line-height: 100%;\n }\n </style>\n <!--[if mso]>\n <style type=\"text/css\">\n .email-container {\n width: 600px !important;\n }\n </style>\n <![endif]-->\n</head>\n<body>\n <table role=\"presentation\" class=\"email-wrapper\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n <tr>\n <td align=\"center\">\n <table role=\"presentation\" class=\"email-container\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n <!-- Header -->\n <tr>\n <td class=\"header\">\n <h1>RAG Daily Papers</h1>\n <div class=\"date\">${inputData.Date || date}</div>\n </td>\n </tr>\n \n <!-- Stats -->\n <tr>\n <td class=\"stats\">\n <strong>${inputData[\"Number of papers\"] || inputData.paperCount || 0} papers</strong> reviewed today\n </td>\n </tr>\n \n <!-- Content -->\n <tr>\n <td 
class=\"content\">\n <!-- Chinese Section -->\n <div class=\"section\">\n <h2 class=\"section-title\">\n 🇨🇳 Chinese\n </h2>\n <div class=\"summary\">\n ${inputData.SUMMARY_CN || inputData.summaryCN || 'No Chinese summary available'}\n </div>\n </div>\n \n <!-- Divider -->\n <hr class=\"divider\">\n \n <!-- English Section -->\n <div class=\"section\">\n <h2 class=\"section-title\">\n 🇺🇸 English\n </h2>\n <div class=\"summary\">\n ${inputData.SUMMARY_EN || inputData.summaryEN || 'No English summary available'}\n </div>\n </div>\n </td>\n </tr>\n </table>\n </td>\n </tr>\n </table>\n</body>\n</html>`\n};\n\n// Feishu message body\nconst feishuMessage = {\n msg_type: \"text\",\n content: {\n text: `Today ${$input.first().json.date} ${$input.first().json.paperCount} papers. ${$input.first().json.summaryEN} ${$input.first().json.summaryCN}`\n }\n};\n\n// n8n output format\nreturn [\n { json: { type: \"gmail\", ...gmailMessage } },\n { json: { type: \"feishu\", ...feishuMessage } }\n];\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
-128,
528
],
"id": "6f3df3be-a376-42e9-b0be-32c4fba5a8e2",
"name": "Message Construction"
},
{
"parameters": {
"content": "## 5. Message Push\n\nSet up two channels for message delivery: **EMAIL** and **IM**, and define the message format and content.\n\n### Email: Gmail\n\n**GMAIL OAuth 2.0 – Official Documentation** \n[Configure your OAuth consent screen](https://docs.n8n.io/integrations/builtin/credentials/google/oauth-single-service/?utm_source=n8n_app&utm_medium=credential_settings&utm_campaign=create_new_credentials_modal#configure-your-oauth-consent-screen)\n\n**Steps:**\n- Enable Gmail API \n- Create OAuth consent screen \n- Create OAuth client credentials \n- Audience: Add **Test users** under Testing status \n\n**Message format**: HTML \n(Model: OpenAI GPT — used to design an HTML email template)\n\n### IM: Feishu (LARK)\n\n**Bots in groups** \n[Use bots in groups](https://www.larksuite.com/hc/en-US/articles/360048487736-use-bots-in-groups)\n",
"height": 576,
"width": 1152
},
"type": "n8n-nodes-base.stickyNote",
"position": [
-176,
896
],
"typeVersion": 1,
"id": "2582c7df-9b15-4473-bc47-91cf6f7304e0",
"name": "Sticky Note"
},
{
"parameters": {
"resource": "databasePage",
"databaseId": {
"__rl": true,
"value": "26ba136d-cee4-8029-ad3d-e0e8ac64993f",
"mode": "list",
"cachedResultName": "RAG DAILY",
"cachedResultUrl": "https://www.notion.so/26ba136dcee48029ad3de0e8ac64993f"
},
"title": "={{ $json.title }}",
"simple": false,
"propertiesUi": {
"propertyValues": [
{
"key": "published|date",
"date": "={{ $json.published }}"
},
{
"key": "summary|rich_text",
"textContent": "={{ $json.summary }}"
},
{
"key": "id|rich_text",
"textContent": "={{ $json.id }}"
},
{
"key": "html_url|url",
"urlValue": "={{ $json.html_url }}"
},
{
"key": "pdf_url|url",
"urlValue": "={{ $json.pdf_url }}"
},
{
"key": "primary_category|rich_text",
"textContent": "={{ $json.primary_category }}"
},
{
"key": "github|url",
"ignoreIfEmpty": true,
"urlValue": "={{ $json.github }}"
},
{
"key": "huggingface|url",
"ignoreIfEmpty": true,
"urlValue": "={{ $json.huggingface }}"
},
{
"key": "RAG_TF|rich_text",
"textContent": "={{ $json.RAG_TF }}"
},
{
"key": "RAG_REASON|rich_text",
"textContent": "={{ $json.RAG_REASON }}"
},
{
"key": "RAG_Category|rich_text",
"textContent": "={{ $json.RAG_Category }}"
},
{
"key": "RAG_NAME|rich_text",
"textContent": "={{ $json.RAG_NAME }}"
},
{
"key": "updated|date",
"date": "={{ $json.updated }}"
},
{
"key": "author|multi_select",
"multiSelectValue": "={{ $json.authors }}"
},
{
"key": "category|multi_select",
"multiSelectValue": "={{ $json.categories }}"
}
]
},
"blockUi": {
"blockValues": [
{
"textContent": "={{ $json.summary }}"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.notion",
"typeVersion": 2.2,
"position": [
800,
0
],
"id": "f7ba78f8-19cb-492c-840c-3570d2865fb1",
"name": "RAG Daily papers",
"credentials": {
"notionApi": {
"id": "BNsFk38kgqvRDJpX",
"name": "Notion account"
}
}
},
{
"parameters": {
"jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data\n\nif (!xmlData) {\n return [{\n json: {\n error: \"XML data not found. Please ensure the input contains XML content\",\n message: \"Check the field names in the input data\",\n success: false\n }\n }];\n}\n\n// Function to format date-time\nfunction formatDateTime(isoString) {\n if (!isoString) return '';\n \n try {\n const date = new Date(isoString);\n if (isNaN(date.getTime())) return '';\n \n const year = date.getFullYear();\n const month = String(date.getMonth() + 1).padStart(2, '0');\n const day = String(date.getDate()).padStart(2, '0');\n const hours = String(date.getUTCHours()).padStart(2, '0');\n const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n \n return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n } catch (error) {\n return '';\n }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n const match = xml.match(regex);\n return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n // Fixed link extraction to fit actual XML format\n // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n const patterns = [\n new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n const authors = [];\n \n for (const block of authorBlocks) {\n const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n if (nameMatch && nameMatch[1].trim()) {\n authors.push(nameMatch[1].trim());\n }\n }\n \n return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n const categories = [];\n const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n let match;\n \n while ((match = regex.exec(entryXml)) !== null) {\n if (match[1]) {\n categories.push(match[1]);\n }\n }\n \n return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n // Handle namespace-prefixed primary category extraction\n const patterns = [\n /primary_category[^>]*term=\"([^\"]*)\"/i,\n /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// New: extract arxiv comment\nfunction extractArxivComment(entryXml) {\n const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n return commentMatch ? 
commentMatch[1].trim() : '';\n}\n\ntry {\n // Extract all entry blocks\n const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n const entries = [];\n let match;\n \n while ((match = entryRegex.exec(xmlData)) !== null) {\n entries.push(match[1]);\n }\n \n if (entries.length === 0) {\n return [{\n json: {\n error: \"No <entry> elements found\",\n message: \"Please check if the XML data format is correct\",\n success: false\n }\n }];\n }\n\n // Process each entry\n const processedData = [];\n let processedCount = 0;\n\n for (let i = 0; i < entries.length; i++) {\n const entryXml = entries[i];\n \n try {\n const item = {\n id: extractTagContent(entryXml, 'id'),\n updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n published: formatDateTime(extractTagContent(entryXml, 'published')),\n title: extractTagContent(entryXml, 'title'),\n summary: extractTagContent(entryXml, 'summary'),\n authors: extractAuthors(entryXml), // field name changed to authors, returns array\n html_url: extractLink(entryXml, 'text/html'),\n pdf_url: extractLink(entryXml, 'application/pdf'),\n primary_category: extractPrimaryCategory(entryXml),\n categories: extractCategories(entryXml), // field name changed to categories\n arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n github: '',\n huggingface: ''\n };\n\n // Validate required fields\n if (item.id && item.title) {\n processedData.push(item);\n processedCount++;\n }\n \n } catch (error) {\n console.log(`Error processing entry ${i+1}: ${error.message}`);\n // Continue processing next entry\n }\n }\n\n // Return processed results\n return [{\n json: {\n success: true,\n message: `Successfully processed ${processedCount} entries`,\n data: processedData,\n processing_time: new Date().toISOString()\n }\n }];\n\n} catch (error) {\n // Error handling\n return [{\n json: {\n error: \"An error occurred during processing\",\n message: error.message,\n success: false\n }\n }];\n}\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
112,
0
],
"id": "5d897d4d-968b-4336-bbee-d1d3b4dcae06",
"name": "Data Extraction"
},
{
"parameters": {
"jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data\n\nif (!xmlData) {\n return [{\n json: {\n error: \"XML data not found. Please ensure the input contains XML content\",\n message: \"Check the field names in the input data\",\n success: false\n }\n }];\n}\n\n// Function to format date-time\nfunction formatDateTime(isoString) {\n if (!isoString) return '';\n \n try {\n const date = new Date(isoString);\n if (isNaN(date.getTime())) return '';\n \n const year = date.getFullYear();\n const month = String(date.getMonth() + 1).padStart(2, '0');\n const day = String(date.getDate()).padStart(2, '0');\n const hours = String(date.getUTCHours()).padStart(2, '0');\n const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n \n return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n } catch (error) {\n return '';\n }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n const match = xml.match(regex);\n return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n // Fixed link extraction to fit actual XML format\n // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n const patterns = [\n new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n const authors = [];\n \n for (const block of authorBlocks) {\n const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n if (nameMatch && nameMatch[1].trim()) {\n authors.push(nameMatch[1].trim());\n }\n }\n \n return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n const categories = [];\n const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n let match;\n \n while ((match = regex.exec(entryXml)) !== null) {\n if (match[1]) {\n categories.push(match[1]);\n }\n }\n \n return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n // Handle namespace-prefixed primary category extraction\n const patterns = [\n /primary_category[^>]*term=\"([^\"]*)\"/i,\n /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n ];\n \n for (const pattern of patterns) {\n const match = entryXml.match(pattern);\n if (match && match[1]) {\n return match[1];\n }\n }\n return '';\n}\n\n// New: extract arxiv comment\nfunction extractArxivComment(entryXml) {\n const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n return commentMatch ? 
commentMatch[1].trim() : '';\n}\n\ntry {\n // Extract all entry blocks\n const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n const entries = [];\n let match;\n \n while ((match = entryRegex.exec(xmlData)) !== null) {\n entries.push(match[1]);\n }\n \n if (entries.length === 0) {\n return [{\n json: {\n error: \"No <entry> elements found\",\n message: \"Please check if the XML data format is correct\",\n success: false\n }\n }];\n }\n\n // Process each entry\n const processedData = [];\n let processedCount = 0;\n\n for (let i = 0; i < entries.length; i++) {\n const entryXml = entries[i];\n \n try {\n const item = {\n id: extractTagContent(entryXml, 'id'),\n updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n published: formatDateTime(extractTagContent(entryXml, 'published')),\n title: extractTagContent(entryXml, 'title'),\n summary: extractTagContent(entryXml, 'summary'),\n authors: extractAuthors(entryXml), // field name changed to authors, returns array\n html_url: extractLink(entryXml, 'text/html'),\n pdf_url: extractLink(entryXml, 'application/pdf'),\n primary_category: extractPrimaryCategory(entryXml),\n categories: extractCategories(entryXml), // field name changed to categories\n arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n github: '',\n huggingface: ''\n };\n\n // Validate required fields\n if (item.id && item.title) {\n processedData.push(item);\n processedCount++;\n }\n \n } catch (error) {\n console.log(`Error processing entry ${i+1}: ${error.message}`);\n // Continue processing next entry\n }\n }\n\n // Return processed results\n return [{\n json: {\n success: true,\n message: `Successfully processed ${processedCount} entries`,\n data: processedData,\n processing_time: new Date().toISOString()\n }\n }];\n\n} catch (error) {\n // Error handling\n return [{\n json: {\n error: \"An error occurred during processing\",\n message: error.message,\n success: false\n }\n }];\n}\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
592,
0
],
"id": "ae2d8994-7a52-4f7b-81fd-61c0538ba380",
"name": "JSON Format"
},
{
"parameters": {
"content": "## 3. Data Processing\n\nAnalyze and summarize paper data using AI, then standardize output as JSON.\n\n### Single Paper Basic Information Analysis and Enhancement \n### Daily Paper Summary and Multilingual Translation",
"height": 192,
"width": 656
},
"type": "n8n-nodes-base.stickyNote",
"position": [
-160,
-224
],
"typeVersion": 1,
"id": "8fbefc67-e9f7-4597-b935-d5f5895cf93c",
"name": "Sticky Note1"
},
{
"parameters": {
"content": "## 4. Data Storage: Notion Database\n\n- Create a corresponding database in Notion with the same predefined field names. \n- In Notion, create an integration under **Integrations** and grant access to the database. Obtain the corresponding **Secret Key**. \n- Use the Notion **\"Create a database page\"** node to configure the field mapping and store the data. \n\n**Notes** \n- **\"Create a database page\"** only adds new entries; data will not be updated. \n- The `updated` and `published` timestamps of arXiv papers are in **UTC**. \n- Notion **single-select** and **multi-select** fields only accept arrays. They do not automatically parse comma-separated strings. You need to format them as proper arrays. \n- Notion does not accept `null` values, which causes a **400 error**. \n",
"height": 368,
"width": 624
},
"type": "n8n-nodes-base.stickyNote",
"position": [
1024,
16
],
"typeVersion": 1,
"id": "884f2c40-4628-4376-a040-709e2db34c48",
"name": "Sticky Note2"
},
{
"parameters": {
"content": "## 2. **Data Extraction**\n\n### Data Cleaning Rules (Convert to Standard JSON)\n\n1. **Remove Header** \n - Keep only the `<entry></entry>` blocks representing paper items.\n\n2. **Single Item** \n - Each `<entry></entry>` represents a single item.\n\n3. **Field Processing Rules** \n - `<id></id>` ➡️ `id` \n Extract content. \n Example: `<id>http://arxiv.org/abs/2409.06062v1</id>` → `http://arxiv.org/abs/2409.06062v1` \n - `<updated></updated>` ➡️ `updated` \n Convert timestamp to `yyyy-mm-dd hh:mm:ss` \n - `<published></published>` ➡️ `published` \n Convert timestamp to `yyyy-mm-dd hh:mm:ss` \n - `<title></title>` ➡️ `title` \n Extract text content \n - `<summary></summary>` ➡️ `summary` \n Keep text, remove line breaks \n - `<author></author>` ➡️ `author` \n Combine all authors into an array \n Example: `[ \"Ernest Pusateri\", \"Anmol Walia\" ]` (for Notion multi-select field) \n - `<arxiv:comment></arxiv:comment>` ➡️ Ignore / discard \n - `<link type=\"text/html\">` ➡️ `html_url` \n Extract URL \n - `<link type=\"application/pdf\">` ➡️ `pdf_url` \n Extract URL \n - `<arxiv:primary_category term=\"cs.CL\">` ➡️ `primary_category` \n Extract `term` value \n - `<category>` ➡️ `category` \n Merge all `<category>` values into an array \n Example: `[ \"eess.AS\", \"cs.SD\" ]` (for Notion multi-select field) \n\n4. **Add Empty Fields** \n - `github` \n - `huggingface`\n",
"height": 912,
"width": 624
},
"type": "n8n-nodes-base.stickyNote",
"position": [
-1088,
544
],
"typeVersion": 1,
"id": "4991129d-9406-4c52-bd8f-87e2721c4a6f",
"name": "Sticky Note4"
}
],
"connections": {
"Basic LLM Chain": {
"main": [
[
{
"node": "JSON Format",
"type": "main",
"index": 0
}
]
]
},
"Google Gemini Chat Model": {
"ai_languageModel": [
[
{
"node": "Basic LLM Chain",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"submittedDate:T-1": {
"main": [
[
{
"node": "arXiv API",
"type": "main",
"index": 0
}
]
]
},
"If": {
"main": [
[
{
"node": "Data Extraction",
"type": "main",
"index": 0
}
]
]
},
"Schedule Trigger": {
"main": [
[
{
"node": "submittedDate:T-1",
"type": "main",
"index": 0
}
]
]
},
"FEISHU": {
"main": [
[
{
"node": "FEISHU POST",
"type": "main",
"index": 0
}
]
]
},
"gmail": {
"main": [
[
{
"node": "Send a message",
"type": "main",
"index": 0
}
]
]
},
"Message a model": {
"main": [
[
{
"node": "JSON FORMAT",
"type": "main",
"index": 0
}
]
]
},
"JSON FORMAT": {
"main": [
[
{
"node": "RAG Daily Paper Summary",
"type": "main",
"index": 0
},
{
"node": "If",
"type": "main",
"index": 0
},
{
"node": "Message Construction",
"type": "main",
"index": 0
}
]
]
},
"arXiv API": {
"main": [
[
{
"node": "Message a model",
"type": "main",
"index": 0
}
]
]
},
"Message Construction": {
"main": [
[
{
"node": "gmail",
"type": "main",
"index": 0
},
{
"node": "FEISHU",
"type": "main",
"index": 0
}
]
]
},
"Data Extraction": {
"main": [
[
{
"node": "Basic LLM Chain",
"type": "main",
"index": 0
}
]
]
},
"JSON Format": {
"main": [
[
{
"node": "RAG Daily papers",
"type": "main",
"index": 0
}
]
]
}
},
"pinData": {},
"meta": {
"instanceId": "a6011e4876c6b1225fa48dae1dbfa92e1932a633b3186bbb7bfd5c9e6ad2d878"
}
}

The arXiv provides a public API that allows users to query research papers by topic or by predefined categories.
https://info.arxiv.org/help/api/user-manual.html#arxiv-api-users-manual
Key Notes:
- The maximum number of results (`max_results`) is 30,000, but results must be retrieved in slices of at most 2,000 at a time, using the `max_results` and `start` query parameters.
- Date ranges use the format `[YYYYMMDDTTTT+TO+YYYYMMDDTTTT]`, where `TTTT` is provided in 24-hour time to the minute, in GMT.
- Example query for RAG papers filtered by `submittedDate`: `https://export.arxiv.org/api/query?search_query=all:rag+AND+submittedDate:[202509140000+TO+202509142359]`
- The feed's `<updated>` timestamp is reported as, e.g., `2025-09-19T00:00:00-04:00` (EDT). Daily scheduled tasks should be executed after this timestamp.
- Use the `start` and `max_results` parameters to paginate.
- Entry timestamps such as `2025-09-14T06:29:18Z` are in UTC.

Example response:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<link href="http://arxiv.org/api/query?search_query%3Dall%3Arag%20AND%20submittedDate%3A%5B202509140000%20TO%20202509142359%5D%26id_list%3D%26start%3D0%26max_results%3D10" rel="self" type="application/atom+xml"/>
<title type="html">ArXiv Query: search_query=all:rag AND submittedDate:[202509140000 TO 202509142359]&id_list=&start=0&max_results=10</title>
<id>http://arxiv.org/api/B2w5/U8KCkkmjkfs5ZT52WWxw2A</id>
<updated>2025-09-19T00:00:00-04:00</updated>
<opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">4</opensearch:totalResults>
<opensearch:startIndex xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:startIndex>
<opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">10</opensearch:itemsPerPage>
<entry>
<id>http://arxiv.org/abs/2509.11124v1</id>
<updated>2025-09-14T06:29:18Z</updated>
<published>2025-09-14T06:29:18Z</published>
<title>STASE: A spatialized text-to-audio synthesis engine for music generation</title>
<summary> While many text-to-audio systems produce monophonic or fixed-stereo outputs,
generating audio with user-defined spatial properties remains a challenge.
Existing deep learning-based spatialization methods often rely on latent-space
manipulations, which can limit direct control over psychoacoustic parameters
critical to spatial perception...
</summary>
<author>
<name>Tutti Chi</name>
</author>
<author>
<name>Letian Gao</name>
</author>
<author>
<name>Yixiao Zhang</name>
</author>
<arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">Accepted to LLM4Music @ ISMIR 2025</arxiv:comment>
<link href="http://arxiv.org/abs/2509.11124v1" rel="alternate" type="text/html"/>
<link title="pdf" href="http://arxiv.org/pdf/2509.11124v1" rel="related" type="application/pdf"/>
<arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.SD" scheme="http://arxiv.org/schemas/atom"/>
<category term="cs.SD" scheme="http://arxiv.org/schemas/atom"/>
<category term="eess.AS" scheme="http://arxiv.org/schemas/atom"/>
</entry>
...
</feed>

Use a Code node to compute the query's `submittedDate` range:

const now = new Date();
const yesterday = new Date(now);
yesterday.setDate(now.getDate() - 2); // note: a 2-day offset, likely to allow for arXiv's announcement delay
const y = yesterday.getFullYear();
const m = String(yesterday.getMonth() + 1).padStart(2, '0');
const d = String(yesterday.getDate()).padStart(2, '0');
return [
{
json: {
from: `${y}${m}${d}0000`,
to: `${y}${m}${d}2359`
}
}
];
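The `from`/`to` values produced above are interpolated into the arXiv query URL by the downstream HTTP Request node. A minimal sketch of that URL construction (the `all:rag` search term and the result limit are illustrative, taken from the example query above):

```javascript
// Build the arXiv API query URL from the from/to range produced above.
// Search term and max_results are illustrative assumptions.
function buildArxivQueryUrl(from, to, maxResults = 100) {
  const search = `all:rag+AND+submittedDate:[${from}+TO+${to}]`;
  return `https://export.arxiv.org/api/query?search_query=${search}&start=0&max_results=${maxResults}`;
}

console.log(buildArxivQueryUrl('202509140000', '202509142359', 10));
```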
Data Cleaning Rules (convert to standard JSON):
- Keep only the `<entry></entry>` blocks representing paper items.
- Each `<entry></entry>` represents a single item.
- `<id></id>` ➡️ `id`: extract content. Example: `<id>http://arxiv.org/abs/2409.06062v1</id>` → `http://arxiv.org/abs/2409.06062v1`
- `<updated></updated>` ➡️ `updated`: convert timestamp to `yyyy-mm-dd hh:mm:ss`
- `<published></published>` ➡️ `published`: convert timestamp to `yyyy-mm-dd hh:mm:ss`
- `<title></title>` ➡️ `title`: extract text content
- `<summary></summary>` ➡️ `summary`: keep text, remove line breaks
- `<author></author>` ➡️ `author`: combine all authors into an array, e.g., `["Ernest Pusateri", "Anmol Walia"]`, for Notion multi-select field
- `<arxiv:comment></arxiv:comment>` ➡️ ignore / discard
- `<link type="text/html">` ➡️ `html_url`: extract URL
- `<link type="application/pdf">` ➡️ `pdf_url`: extract URL
- `<arxiv:primary_category term="cs.CL">` ➡️ `primary_category`: extract `term` value
- `<category>` ➡️ `category`: merge all `<category>` values into an array, e.g., `["eess.AS", "cs.SD"]`, for Notion multi-select field
- Add empty fields: `github` and `huggingface`

Model: Claude Sonnet 4
// Get input data
const xmlData = $('HTTP Request1').first().json.data
if (!xmlData) {
return [{
json: {
error: "XML data not found. Please ensure the input contains XML content",
message: "Check the field names in the input data",
success: false
}
}];
}
// Function to format date-time
function formatDateTime(isoString) {
if (!isoString) return '';
try {
const date = new Date(isoString);
if (isNaN(date.getTime())) return '';
// Use UTC getters throughout so the date and time parts stay consistent
const year = date.getUTCFullYear();
const month = String(date.getUTCMonth() + 1).padStart(2, '0');
const day = String(date.getUTCDate()).padStart(2, '0');
const hours = String(date.getUTCHours()).padStart(2, '0');
const minutes = String(date.getUTCMinutes()).padStart(2, '0');
const seconds = String(date.getUTCSeconds()).padStart(2, '0');
return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;
} catch (error) {
return '';
}
}
// General function to extract tag content
function extractTagContent(xml, tagName) {
const regex = new RegExp(`<${tagName}[^>]*>([\\s\\S]*?)<\\/${tagName}>`, 'i');
const match = xml.match(regex);
return match ? match[1].trim().replace(/\s+/g, ' ') : '';
}
// Extract links
function extractLink(entryXml, linkType) {
// Fixed link extraction to fit actual XML format
// Format: <link href="..." rel="..." type="..."/>
const patterns = [
new RegExp(`<link[^>]*href="([^"]*)"[^>]*type="${linkType}"`, 'i'),
new RegExp(`<link[^>]*type="${linkType}"[^>]*href="([^"]*)"`, 'i')
];
for (const pattern of patterns) {
const match = entryXml.match(pattern);
if (match && match[1]) {
return match[1];
}
}
return '';
}
// Fixed author extraction function - returns array
function extractAuthors(entryXml) {
const authorBlocks = entryXml.match(/<author[^>]*>([\s\S]*?)<\/author>/gi) || [];
const authors = [];
for (const block of authorBlocks) {
const nameMatch = block.match(/<name[^>]*>(.*?)<\/name>/i);
if (nameMatch && nameMatch[1].trim()) {
authors.push(nameMatch[1].trim());
}
}
return authors; // Return array instead of string
}
// Extract categories
function extractCategories(entryXml) {
const categories = [];
const regex = /<category[^>]*term="([^"]*)"/gi;
let match;
while ((match = regex.exec(entryXml)) !== null) {
if (match[1]) {
categories.push(match[1]);
}
}
return categories;
}
// Extract primary category
function extractPrimaryCategory(entryXml) {
// Handle namespace-prefixed primary category extraction
const patterns = [
/primary_category[^>]*term="([^"]*)"/i,
/arxiv:primary_category[^>]*term="([^"]*)"/i
];
for (const pattern of patterns) {
const match = entryXml.match(pattern);
if (match && match[1]) {
return match[1];
}
}
return '';
}
// New: extract arxiv comment
function extractArxivComment(entryXml) {
const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\/arxiv:comment>/i);
return commentMatch ? commentMatch[1].trim() : '';
}
try {
// Extract all entry blocks
const entryRegex = /<entry[^>]*>([\s\S]*?)<\/entry>/gi;
const entries = [];
let match;
while ((match = entryRegex.exec(xmlData)) !== null) {
entries.push(match[1]);
}
if (entries.length === 0) {
return [{
json: {
error: "No <entry> elements found",
message: "Please check if the XML data format is correct",
success: false
}
}];
}
// Process each entry
const processedData = [];
let processedCount = 0;
for (let i = 0; i < entries.length; i++) {
const entryXml = entries[i];
try {
const item = {
id: extractTagContent(entryXml, 'id'),
updated: formatDateTime(extractTagContent(entryXml, 'updated')),
published: formatDateTime(extractTagContent(entryXml, 'published')),
title: extractTagContent(entryXml, 'title'),
summary: extractTagContent(entryXml, 'summary'),
authors: extractAuthors(entryXml), // field name changed to authors, returns array
html_url: extractLink(entryXml, 'text/html'),
pdf_url: extractLink(entryXml, 'application/pdf'),
primary_category: extractPrimaryCategory(entryXml),
categories: extractCategories(entryXml), // field name changed to categories
arxiv_comment: extractArxivComment(entryXml), // new arxiv comment
github: '',
huggingface: ''
};
// Validate required fields
if (item.id && item.title) {
processedData.push(item);
processedCount++;
}
} catch (error) {
console.log(`Error processing entry ${i+1}: ${error.message}`);
// Continue processing next entry
}
}
// Return processed results
return [{
json: {
success: true,
message: `Successfully processed ${processedCount} entries`,
data: processedData,
processing_time: new Date().toISOString()
}
}];
} catch (error) {
// Error handling
return [{
json: {
error: "An error occurred during processing",
message: error.message,
success: false
}
}];
}
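The extraction helpers can be exercised outside n8n against a trimmed-down `<entry>` like the sample above. A standalone sanity check (the helpers are repeated here so the snippet runs on its own):

```javascript
// Standalone sanity check for the regex-based extraction approach used in the node above.
function extractAuthors(entryXml) {
  const blocks = entryXml.match(/<author[^>]*>([\s\S]*?)<\/author>/gi) || [];
  return blocks
    .map(b => (b.match(/<name[^>]*>(.*?)<\/name>/i) || [])[1])
    .filter(n => n && n.trim())
    .map(n => n.trim());
}

function extractCategories(entryXml) {
  const cats = [];
  const regex = /<category[^>]*term="([^"]*)"/gi;
  let m;
  while ((m = regex.exec(entryXml)) !== null) if (m[1]) cats.push(m[1]);
  return cats;
}

const sample = `
<entry>
  <author><name>Tutti Chi</name></author>
  <author><name>Letian Gao</name></author>
  <category term="cs.SD"/>
  <category term="eess.AS"/>
</entry>`;

console.log(extractAuthors(sample));    // ["Tutti Chi", "Letian Gao"]
console.log(extractCategories(sample)); // ["cs.SD", "eess.AS"]
```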
Analyze and summarize paper data using AI, then standardize output as JSON.
Model: gemini-2.5-flash
You are a paper content analysis assistant. You can analyze and inspect JSON data, accurately identify the content in the `summary` field, make judgments, and enrich the data items. The main tasks are as follows:
1. RAG Relevance and Labeling:
- Analyze the `summary` field to determine whether the content is related to RAG (Retrieval-Augmented Generation) and assign labels.
- For each data item, add three new fields:
- `RAG_TF`: "T" if related, "F" if not
- `RAG_REASON`: if not related, provide the reason in English; otherwise, leave empty
- `RAG_Category`: if related, assign a category label based on the `summary` content (e.g., Framework / Application / …); otherwise, leave empty
2. RAG Method Extraction:
- Analyze the `summary` and extract the RAG method proposed in the paper.
- Store it in the new field `RAG_NAME`.
3. External Link Extraction:
- Analyze the `summary` content for `github` or `huggingface` links.
- If present, extract the URLs and populate the existing `github` and `huggingface` fields.
- If not present, leave them unchanged.
Output Format: standard JSON
Example:
Given a data item with the following `summary`:
"summary":"Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can still be insufficient for many applications. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions, however, do not hold in general long-context processing tasks. In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG features a dual-system architecture. First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, it generates draft answer
Process AI Response Content into Standard JSON using JS
Model: Claude Sonnet 4
// ------------------------
// 1. Get the current item's JSON output
// ------------------------
const raw = $json;
// ------------------------
// 2. Define a function to recursively find the text field
// ------------------------
function findText(obj) {
if (!obj || typeof obj !== 'object') return null;
if (obj.text && typeof obj.text === 'string') return obj.text;
for (const key of Object.keys(obj)) {
const result = findText(obj[key]);
if (result) return result;
}
return null;
}
// ------------------------
// 3. Find the AI output text
// ------------------------
const textBlock = findText(raw);
if (!textBlock) {
throw new Error("No text field found, please check the AI output structure");
}
// ------------------------
// 4. Use regex to remove ```json ``` wrapper and extract JSON
// ------------------------
const jsonMatch = textBlock.match(/```json\s*([\s\S]*?)```/i);
const jsonText = jsonMatch ? jsonMatch[1] : textBlock;
// ------------------------
// 5. Parse into standard JSON
// ------------------------
let parsedJson;
try {
parsedJson = JSON.parse(jsonText);
} catch (error) {
throw new Error("Failed to parse JSON: " + error.message);
}
// ------------------------
// 6. Return in standard format
// ------------------------
return parsedJson.map(item => ({ json: item }));
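The fence-stripping step above can be illustrated in isolation. This sketch builds the fence string at runtime only to keep the example readable; the regex is the same one the node uses:

```javascript
// Demonstrate extracting JSON from a fenced ```json block in model output.
const fence = '`'.repeat(3); // avoids literal fence markers inside this example
const textBlock = `Here is the result:\n${fence}json\n[{"RAG_TF": "T"}]\n${fence}`;

const jsonMatch = textBlock.match(/```json\s*([\s\S]*?)```/i);
const parsed = JSON.parse(jsonMatch ? jsonMatch[1] : textBlock);

console.log(parsed[0].RAG_TF); // "T"
```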
Model: gemini-2.5-flash
You are a daily paper content summarization assistant capable of analyzing XML data. Your main tasks are as follows:
1. Set the daily title field `Title`: {yyyy-mm-dd} paper summary
2. Set the daily date field `Date`: yyyy-mm-dd
3. Identify the `<opensearch:totalResults>` tag in the XML and set its numeric value to the field `Number of papers`.
4. Provide a brief summary of all papers for the day, covering all topics. Set the Chinese summary as `SUMMARY_CN` and the English summary as `SUMMARY_EN`. Ensure that both summaries reflect the comprehensive summary of all papers for the day.
5. Output format: standard JSON. If there are no papers for the day, set `Number of papers` to 0, but still include the `SUMMARY_CN` and `SUMMARY_EN` fields with empty content.
Example: If there are papers:
{
"Number of papers":"2025-09-13 paper summary",
"Date":2025-09-13,
"Number of papers": 2,
"SUMMARY_CN": "Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a KG based on climate publications to improve access and utilization of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm AGP-Static++ and enhancing dynamic graph support for better query and update efficiency.",
"SUMMARY_EN": "Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a domain-specific KG built from climate publications aimed at improving access and use of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm, AGP-Static++, and improving dynamic graph support, enhancing query and update efficiency."
}
If the number of papers is 0, maintain the JSON structure:
{
"Number of papers":"2025-09-13 paper summary",
"Date":2025-09-13,
"Number of papers": 0,
"SUMMARY_CN": "",
"SUMMARY_EN": ""
}

Process AI Response Content into Standard JSON using JS
Model: Claude Sonnet 4
const items = $input.all();
const response = items[0].json;
try {
// Extract text content from Gemini API response
// Note: response is directly an object, not an array
const text = response.candidates[0].content.parts[0].text;
// Extract JSON content
const jsonMatch = text.match(/```json\n([\s\S]*?)\n```/);
const jsonStr = jsonMatch ? jsonMatch[1] : text; // fall back to raw text if no fenced block
// Parse JSON
const data = JSON.parse(jsonStr);
// Manually handle duplicate keys - extract from original string
const titleMatch = jsonStr.match(/"Number of papers":\s*"([^"]+)"/);
const countMatch = jsonStr.match(/"Number of papers":\s*(\d+)/);
// Construct result
items[0].json = {
title: titleMatch ? titleMatch[1] : '',
date: data.Date || '',
paperCount: countMatch ? parseInt(countMatch[1]) : 0,
summaryCN: data.SUMMARY_CN || '',
summaryEN: data.SUMMARY_EN || ''
};
} catch (error) {
items[0].json = {
error: error.message,
originalData: response
};
}
return items;
Notes

- The `updated` and `published` timestamps of arXiv papers are in UTC.

Set up two channels for message delivery: email and IM, and define the message format and content.
Gmail OAuth 2.0, Official Documentation
Steps:
Message format: HTML (Model: OpenAI GPT, used to design the HTML email template)
Bots in groups
https://www.larksuite.com/hc/en-US/articles/360048487736-use-bots-in-groups
Message format: TEXT
{
"msg_type": "text",
"content": {
"text": "content"
}
}

Use JS to build a collection of messages for Gmail and Feishu delivery.
Model: Claude Sonnet 4
// Get current date
const now = new Date();
const year = now.getFullYear();
const month = String(now.getMonth() + 1).padStart(2, '0');
const day = String(now.getDate()).padStart(2, '0');
const date = `${year}-${month}-${day}`;
// Get input data
const inputData = $input.first().json;
// Generate message content
const messageContent = inputData.SUMMARY_CN;
// Gmail message body
const gmailMessage = {
subject: inputData.title || `Daily Paper Summary - ${date}`,
message: `<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title> RAG Daily Paper Summary - ${date}</title>
<style type="text/css">
/* Gmail safe styles */
body {
font-family: Arial, sans-serif;
line-height: 1.4;
margin: 0;
padding: 0;
background-color: #f9f9f9;
color: #333333;
}
table {
border-collapse: collapse;
mso-table-lspace: 0pt;
mso-table-rspace: 0pt;
}
.email-wrapper {
width: 100%;
background-color: #f9f9f9;
padding: 40px 20px;
}
.email-container {
width: 100%;
max-width: 600px;
margin: 0 auto;
background-color: #ffffff;
border-radius: 8px;
box-shadow: 0 2px 12px rgba(0, 0, 0, 0.1);
}
.header {
background-color: #2563eb;
padding: 24px;
text-align: center;
border-radius: 8px 8px 0 0;
}
.header h1 {
margin: 0 0 8px 0;
font-size: 24px;
font-weight: 600;
color: #ffffff;
}
.date {
font-size: 14px;
color: #ffffff;
opacity: 0.9;
}
.stats {
background-color: #f1f5f9;
padding: 16px 24px;
font-size: 14px;
color: #64748b;
}
.content {
padding: 32px 24px 40px 24px;
}
.section {
margin-bottom: 24px;
}
.section-title {
font-size: 16px;
font-weight: 600;
color: #1e293b;
margin-bottom: 12px;
padding-bottom: 8px;
border-bottom: 1px solid #e2e8f0;
}
.flag {
display: inline-block;
width: 20px;
height: 14px;
margin-right: 8px;
border-radius: 2px;
vertical-align: middle;
}
.flag-cn {
background-color: #de2910;
}
.flag-en {
background-color: #012169;
}
.summary {
font-size: 14px;
line-height: 1.6;
color: #475569;
padding: 16px;
background-color: #f8fafc;
border-radius: 6px;
border-left: 3px solid #2563eb;
}
.divider {
height: 1px;
background-color: #e2e8f0;
margin: 20px 0;
border: none;
}
/* Mobile responsive */
@media screen and (max-width: 600px) {
.email-wrapper {
padding: 20px 10px !important;
}
.header, .stats {
padding: 20px 16px !important;
}
.content {
padding: 24px 16px 32px 16px !important;
}
.email-container {
border-radius: 0;
}
}
/* Gmail specific fixes */
.gmail-fix {
display: none;
}
/* Outlook specific fixes */
.ExternalClass {
width: 100%;
}
.ExternalClass,
.ExternalClass p,
.ExternalClass span,
.ExternalClass font,
.ExternalClass td,
.ExternalClass div {
line-height: 100%;
}
</style>
<!--[if mso]>
<style type="text/css">
.email-container {
width: 600px !important;
}
</style>
<![endif]-->
</head>
<body>
<table role="presentation" class="email-wrapper" cellpadding="0" cellspacing="0" border="0">
<tr>
<td align="center">
<table role="presentation" class="email-container" cellpadding="0" cellspacing="0" border="0">
<!-- Header -->
<tr>
<td class="header">
<h1>RAG Daily Papers</h1>
<div class="date">${inputData.Date || date}</div>
</td>
</tr>
<!-- Stats -->
<tr>
<td class="stats">
<strong>${inputData["Number of papers"] || inputData.paperCount || 0} papers</strong> reviewed today
</td>
</tr>
<!-- Content -->
<tr>
<td class="content">
<!-- Chinese Section -->
<div class="section">
<h2 class="section-title">
🇨🇳 Chinese
</h2>
<div class="summary">
${inputData.SUMMARY_CN || inputData.summaryCN || 'No Chinese summary available'}
</div>
</div>
<!-- Divider -->
<hr class="divider">
<!-- English Section -->
<div class="section">
<h2 class="section-title">
🇺🇸 English
</h2>
<div class="summary">
${inputData.SUMMARY_EN || inputData.summaryEN || 'No English summary available'}
</div>
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>`
};
// Feishu message body
const feishuMessage = {
msg_type: "text",
content: {
text: `Today ${inputData.date} ${inputData.paperCount} papers. ${inputData.summaryEN} ${inputData.summaryCN}`
}
};
// n8n output format
return [
{ json: { type: "gmail", ...gmailMessage } },
{ json: { type: "feishu", ...feishuMessage } }
];
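The two-item return above fans out one Gmail item and one Feishu item from a single Code node; the downstream `gmail` and `FEISHU` branches each pick out their own item by the `type` field. A standalone sketch of that routing pattern:

```javascript
// Standalone sketch: route the two message items by their `type` field,
// mirroring how the gmail / FEISHU branches consume the node's output.
// The item bodies here are simplified placeholders.
const items = [
  { json: { type: 'gmail', subject: 'Daily Paper Summary', message: '<html>...</html>' } },
  { json: { type: 'feishu', msg_type: 'text', content: { text: 'Today 3 papers.' } } }
];

const gmailItems = items.filter(i => i.json.type === 'gmail');
const feishuItems = items.filter(i => i.json.type === 'feishu');

console.log(gmailItems.length, feishuItems.length); // 1 1
```

Keeping both payloads in one node avoids duplicating the summary-formatting logic per channel; each delivery branch only needs a filter on `type`.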


Tools:
Workflow Steps
Use n8n’s Notion integration node to build a simple and easy-to-use API service.
Compared with the official native API, the response returned by n8n's Notion database node is much simpler.
In the raw API response, each field carries multiple properties that are rarely needed in application development, and the nested format is cumbersome to handle, as the sample below shows.
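For application development, the nested `properties` object can be reduced to plain key/value pairs with a small helper. This is a sketch (not part of the workflow) covering only the property types that appear in the sample response below:

```javascript
// Flatten a Notion page's `properties` into simple key/value pairs.
// Covers the types seen in the sample response: url, rich_text, title,
// multi_select, and date. Other types fall through to null.
function flattenNotionPage(page) {
  const flat = {};
  for (const [key, prop] of Object.entries(page.properties)) {
    switch (prop.type) {
      case 'url':
        flat[key] = prop.url;
        break;
      case 'rich_text':
        flat[key] = prop.rich_text.map(t => t.plain_text).join('');
        break;
      case 'title':
        flat[key] = prop.title.map(t => t.plain_text).join('');
        break;
      case 'multi_select':
        flat[key] = prop.multi_select.map(o => o.name);
        break;
      case 'date':
        flat[key] = prop.date ? prop.date.start : null;
        break;
      default:
        flat[key] = null; // unhandled property types left empty
    }
  }
  return flat;
}
```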
{
"object": "list",
"results": [
{
"object": "page",
"id": "273a136d-cee4-8146-9a98-fc4d59187b2c",
"created_time": "2025-09-19T10:05:00.000Z",
"last_edited_time": "2025-09-19T10:05:00.000Z",
"created_by": {
"object": "user",
"id": "fe8dfa3a-3ca2-4bc2-9817-9f608b9cbfc5"
},
"last_edited_by": {
"object": "user",
"id": "fe8dfa3a-3ca2-4bc2-9817-9f608b9cbfc5"
},
"cover": null,
"icon": null,
"parent": {
"type": "database_id",
"database_id": "26ba136d-cee4-8029-ad3d-e0e8ac64993f"
},
"archived": false,
"in_trash": false,
"is_locked": false,
"properties": {
"html_url": {
"id": "%3AMFp",
"type": "url",
"url": "http://arxiv.org/abs/2509.14608v1"
},
"RAG_Category": {
"id": "%3BKcX",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "Security",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "Security",
"href": null
}
]
},
"RAG_REASON": {
"id": "Amf%3A",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "",
"href": null
}
]
},
"github": {
"id": "BfR_",
"type": "url",
"url": null
},
"author": {
"id": "DKyR",
"type": "multi_select",
"multi_select": [
{
"id": "940eae58-6b4c-4473-9807-698a6d48ea0c",
"name": "Shashank Shreedhar Bhatt",
"color": "orange"
},
{
"id": "18c77d02-8eea-4017-96e7-39d6e8bf848f",
"name": "Tanmay Rajore",
"color": "yellow"
},
{
"id": "b5283157-1e46-4f6d-ad0c-bd2a6d0d083e",
"name": "Khushboo Aggarwal",
"color": "green"
},
{
"id": "ab8ffe83-9718-48cc-bf6c-651d503d0ed6",
"name": "Ganesh Ananthanarayanan",
"color": "default"
},
{
"id": "256faf03-b7f1-464d-b206-ae02400e4764",
"name": "Ranveer Chandra",
"color": "yellow"
},
{
"id": "c058a343-839e-4cd7-bf96-9379e6122088",
"name": "Nishanth Chandran",
"color": "orange"
},
{
"id": "a6c3357f-6b25-4c14-aaed-993f1652dba6",
"name": "Suyash Choudhury",
"color": "blue"
},
{
"id": "adcd0e80-3044-4d47-a1d0-b39408603c53",
"name": "Divya Gupta",
"color": "default"
},
{
"id": "45e04054-961b-476b-980c-61c16202e7b0",
"name": "Emre Kiciman",
"color": "purple"
},
{
"id": "b26bc14f-4330-4bb4-a9eb-2644080710f9",
"name": "Sumit Kumar Pandey",
"color": "gray"
},
{
"id": "2e84929e-8e8a-4b1a-9479-b523932552d5",
"name": "Srinath Setty",
"color": "green"
},
{
"id": "e0bc3de8-d911-4804-ad7b-8f40693d4155",
"name": "Rahul Sharma",
"color": "red"
},
{
"id": "4d9f1267-7e95-47c8-ac3a-77c32d60cecb",
"name": "Teijia Zhao",
"color": "green"
}
]
},
"summary": {
"id": "E%7D%5Cx",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "Large language models (LLMs) are increasingly deployed in enterprise settings where they interact with multiple users and are trained or fine-tuned on sensitive internal data. While fine-tuning enhances performance by internalizing domain knowledge, it also introduces a critical security risk: leakage of confidential training data to unauthorized users. These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time. We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement. We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks. We take the position that only a deterministic and rigorous enforcement of fine-grained access control during both fine-tuning and RAG-based inference can reliably prevent the leakage of sensitive data to unauthorized recipients. We introduce a framework centered on the principle that any content used in training, retrieval, or generation by an LLM is explicitly authorized for \\emph{all users involved in the interaction}. Our approach offers a simple yet powerful paradigm shift for building secure multi-user LLM systems that are grounded in classical access control but adapted to the unique challenges of modern AI workflows. Our solution has been deployed in Microsoft Copilot Tuning, a product offering that enables organizations to fine-tune models using their own enterprise-specific data.",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "Large language models (LLMs) are increasingly deployed in enterprise settings where they interact with multiple users and are trained or fine-tuned on sensitive internal data. While fine-tuning enhances performance by internalizing domain knowledge, it also introduces a critical security risk: leakage of confidential training data to unauthorized users. These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time. We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement. We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks. We take the position that only a deterministic and rigorous enforcement of fine-grained access control during both fine-tuning and RAG-based inference can reliably prevent the leakage of sensitive data to unauthorized recipients. We introduce a framework centered on the principle that any content used in training, retrieval, or generation by an LLM is explicitly authorized for \\emph{all users involved in the interaction}. Our approach offers a simple yet powerful paradigm shift for building secure multi-user LLM systems that are grounded in classical access control but adapted to the unique challenges of modern AI workflows. Our solution has been deployed in Microsoft Copilot Tuning, a product offering that enables organizations to fine-tune models using their own enterprise-specific data.",
"href": null
}
]
},
"RAG_NAME": {
"id": "UR%3F%40",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "",
"href": null
}
]
},
"pdf_url": {
"id": "UkEI",
"type": "url",
"url": "http://arxiv.org/pdf/2509.14608v1"
},
"primary_category": {
"id": "%5DSvu",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "cs.CR",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "cs.CR",
"href": null
}
]
},
"published": {
"id": "h%5D%7CN",
"type": "date",
"date": {
"start": "2025-09-18T04:30:00.000+00:00",
"end": null,
"time_zone": null
}
},
"RAG_TF": {
"id": "j%5EmC",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "T",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "T",
"href": null
}
]
},
"id": {
"id": "skBo",
"type": "rich_text",
"rich_text": [
{
"type": "text",
"text": {
"content": "http://arxiv.org/abs/2509.14608v1",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "http://arxiv.org/abs/2509.14608v1",
"href": null
}
]
},
"category": {
"id": "xfyy",
"type": "multi_select",
"multi_select": [
{
"id": "45922d11-589a-4011-b8ad-19fa2583fd29",
"name": "cs.CR",
"color": "blue"
},
{
"id": "49cc7a9c-4b92-4b33-ac3b-30b71d3858f4",
"name": "cs.AI",
"color": "green"
}
]
},
"updated": {
"id": "%7CUT%5D",
"type": "date",
"date": {
"start": "2025-09-18T04:30:00.000+00:00",
"end": null,
"time_zone": null
}
},
"huggingface": {
"id": "%7C%7DH%7C",
"type": "url",
"url": null
},
"title": {
"id": "title",
"type": "title",
"title": [
{
"type": "text",
"text": {
"content": "Enterprise AI Must Enforce Participant-Aware Access Control",
"link": null
},
"annotations": {
"bold": false,
"italic": false,
"strikethrough": false,
"underline": false,
"code": false,
"color": "default"
},
"plain_text": "Enterprise AI Must Enforce Participant-Aware Access Control",
"href": null
}
]
}
},
"url": "https://www.notion.so/Enterprise-AI-Must-Enforce-Participant-Aware-Access-Control-273a136dcee481469a98fc4d59187b2c",
"public_url": "https://dongou.notion.site/Enterprise-AI-Must-Enforce-Participant-Aware-Access-Control-273a136dcee481469a98fc4d59187b2c"
}
],
"next_cursor": null,
"has_more": false,
"type": "page_or_database",
"page_or_database": {},
"developer_survey": "https://notionup.typeform.com/to/bllBsoI4?utm_source=postman",
"request_id": "8472bbc3-45f3-49f5-8fda-e70a0bccf6e7"
}

In n8n, the Notion “Get many database pages” node provides a Simplify option: it returns a flattened version of the response instead of the raw API payload.
[
{
"id": "273a136d-cee4-8146-9a98-fc4d59187b2c",
"name": "Enterprise AI Must Enforce Participant-Aware Access Control",
"url": "https://www.notion.so/Enterprise-AI-Must-Enforce-Participant-Aware-Access-Control-273a136dcee481469a98fc4d59187b2c",
"property_html_url": "http://arxiv.org/abs/2509.14608v1",
"property_rag_category": "Security",
"property_rag_reason": "",
"property_github": null,
"property_author": [
"Shashank Shreedhar Bhatt",
"Tanmay Rajore",
"Khushboo Aggarwal",
"Ganesh Ananthanarayanan",
"Ranveer Chandra",
"Nishanth Chandran",
"Suyash Choudhury",
"Divya Gupta",
"Emre Kiciman",
"Sumit Kumar Pandey",
"Srinath Setty",
"Rahul Sharma",
"Teijia Zhao"
],
"property_summary": "Large language models (LLMs) are increasingly deployed in enterprise settings where they interact with multiple users and are trained or fine-tuned on sensitive internal data. While fine-tuning enhances performance by internalizing domain knowledge, it also introduces a critical security risk: leakage of confidential training data to unauthorized users. These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time. We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement. We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks. We take the position that only a deterministic and rigorous enforcement of fine-grained access control during both fine-tuning and RAG-based inference can reliably prevent the leakage of sensitive data to unauthorized recipients. We introduce a framework centered on the principle that any content used in training, retrieval, or generation by an LLM is explicitly authorized for \\emph{all users involved in the interaction}. Our approach offers a simple yet powerful paradigm shift for building secure multi-user LLM systems that are grounded in classical access control but adapted to the unique challenges of modern AI workflows. Our solution has been deployed in Microsoft Copilot Tuning, a product offering that enables organizations to fine-tune models using their own enterprise-specific data.",
"property_rag_name": "",
"property_pdf_url": "http://arxiv.org/pdf/2509.14608v1",
"property_primary_category": "cs.CR",
"property_published": {
"start": "2025-09-18T04:30:00.000+00:00",
"end": null,
"time_zone": null
},
"property_rag_tf": "T",
"property_id": "http://arxiv.org/abs/2509.14608v1",
"property_category": [
"cs.CR",
"cs.AI"
],
"property_updated": {
"start": "2025-09-18T04:30:00.000+00:00",
"end": null,
"time_zone": null
},
"property_huggingface": null,
"property_title": "Enterprise AI Must Enforce Participant-Aware Access Control"
}
]

Construct filters (JSON) based on the query parameters your application needs. ⚠️ Note: all date and time values must be in UTC.
Model: OpenAI GPT-5

// Filter for the RAG DAILY table. A Notion date filter object accepts a
// single comparison operator, so the one-day window is expressed as two
// conditions combined with `and`.
const date = $json.query?.date;
const rag = "T";
let filter;
if (date) {
filter = {
and: [
{
property: "updated",
date: { on_or_after: `${date}T00:00:00.000Z` }
},
{
property: "updated",
date: { before: `${date}T23:59:59.999Z` }
},
{
property: "RAG_TF",
rich_text: { equals: rag }
}
]
};
} else {
filter = {
property: "RAG_TF",
rich_text: { equals: rag }
};
}
return [{ json: { filter: JSON.stringify(filter) } }];
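The emitted filter can be inspected outside n8n with a standalone sketch of the same logic (`buildRagFilter` is an illustrative helper, not part of the workflow; the date window is split into two conditions because each Notion date filter object takes a single comparison operator):

```javascript
// Standalone, runnable version of the Code-node logic for inspection.
function buildRagFilter(date) {
  const rag = "T";
  if (!date) {
    // No date: only keep RAG-related entries.
    return { property: "RAG_TF", rich_text: { equals: rag } };
  }
  // Date given: restrict "updated" to that UTC day and keep RAG entries.
  return {
    and: [
      { property: "updated", date: { on_or_after: `${date}T00:00:00.000Z` } },
      { property: "updated", date: { before: `${date}T23:59:59.999Z` } },
      { property: "RAG_TF", rich_text: { equals: rag } },
    ],
  };
}

console.log(JSON.stringify(buildRagFilter("2025-09-15"), null, 2));
```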
Filter for the RAG Daily Paper Summary table, which keys on a DATE property:

const date = $json.query?.date;
let filter;
if (date) {
// Round-trip through Date to validate the input and normalize it to YYYY-MM-DD.
const inputDate = new Date(date + "T00:00:00.000Z");
const formattedDate = inputDate.toISOString().split("T")[0];
filter = {
property: "DATE",
date: {
equals: formattedDate
}
};
} else {
filter = {};
}
return [{ json: { filter: JSON.stringify(filter) } }];
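Downstream, the stringified filter is posted to Notion's database query endpoint (in n8n this is typically an HTTP Request node). A rough sketch of the request it would build — `buildQueryRequest` is a hypothetical helper, and `NOTION_TOKEN` / the database ID are placeholders, not values from the workflow above:

```javascript
// Builds a request descriptor for Notion's database query endpoint.
// The token and database ID are assumed environment/config values.
function buildQueryRequest(databaseId, filterJson) {
  return {
    url: `https://api.notion.com/v1/databases/${databaseId}/query`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.NOTION_TOKEN}`,
      "Notion-Version": "2022-06-28",
      "Content-Type": "application/json",
    },
    // The Code node emits a JSON string, so parse it back into an object here.
    body: JSON.stringify({ filter: JSON.parse(filterJson) }),
  };
}
```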
Dyad is a local, open-source tool that lets you freely plug in API services for Vibe Coding. Note, however, that Dyad’s development mode is not a packaged agent: it cannot generate image assets for website design on its own, so external image resources must be integrated separately.
Code Model: qwen3-coder-plus-2025-07-22
Image Model: Recraft, supports generating and exporting SVG images
# Purpose
Design a RAG Daily Papers calendar card website that fetches data from a Notion database.
# Notion Data Sources
The data comes from the Notion API, consisting of 2 tables: RAG Daily Paper Summary and RAG DAILY.
## RAG Daily Paper Summary
### Daily Paper Summary Table:
- property_title: Title
- property_date: Date in UTC
- property_number_of_papers: Number of papers per day
- property_summary_en: English summary
- property_summary_cn: Chinese summary
### API Request:
https://<YOUR_WEBHOOK_URL>?date={date}
- date format: YYYY-MM-DD, optional. If no date parameter is provided, all data is returned; if date is provided, only data for that date is returned.
- HTTP Method: GET
- Authentication: None
#### Example Requests:
- Get all data: https://<YOUR_WEBHOOK_URL>
- Get data for 2025-09-15: https://<YOUR_WEBHOOK_URL>?date=2025-09-15
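For application code consuming this endpoint, a minimal client sketch (the base URL stands in for the placeholder webhook above; `buildRequestUrl` and `fetchDailySummaries` are illustrative names):

```javascript
// Builds the webhook URL with an optional date parameter.
function buildRequestUrl(baseUrl, date) {
  return date ? `${baseUrl}?date=${encodeURIComponent(date)}` : baseUrl;
}

// Plain GET, no authentication, JSON response.
async function fetchDailySummaries(baseUrl, date) {
  const res = await fetch(buildRequestUrl(baseUrl, date));
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```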
#### Example RESPONSE:
{
"id": "271a136d-cee4-8177-bf61-df44bdb63010",
"name": "2025-09-15 paper summary",
"url": "https://www.notion.so/2025-09-15-paper-summary-271a136dcee48177bf61df44bdb63010",
"property_summary_en": "Today's papers focus on Retrieval-Augmented Generation (RAG) and its applications and improvements across various domains. One paper introduces the HiChunk framework and HiCBench benchmark for evaluating and enhancing RAG document chunking quality. Another paper develops a GPU-accelerated RAG Telegram assistant to provide academic support for students in an 'Introduction to Parallel Processing' course. A different study presents RAGs-to-Riches, a RAG-like few-shot learning method to improve Large Language Model (LLM) role-playing. MMORE is an open-source pipeline for Massive Multimodal Open Retrieval-Augmented Generation and Extraction, supporting diverse document formats. FinGEAR is a retrieval framework tailored for financial documents, using mapping guidance for enhanced answer retrieval to tackle complex documents like 10-Ks. One research explores the adaptation and evaluation of Multimodal Large Language Models (MLLMs) for Adolescent Idiopathic Scoliosis (AIS) self-management, proposing a 'Divide and Conquer' framework incorporating spinal keypoint prompting and RAG. Finally, SAQ introduces a novel vector quantization method, employing code adjustment and dimension segmentation to advance approximate nearest neighbor search (ANNS) and RAG.",
"property_date": {
"start": "2025-09-16T07:28:00.000+00:00",
"end": null,
"time_zone": null
},
"property_summary_cn": "今日的论文主要集中在检索增强生成(RAG)及其在不同领域的应用和改进。其中一篇论文提出了HiChunk框架和HiCBench基准,用于评估和改进RAG的文档分块质量。另一篇论文开发了一个GPU加速的RAG Telegram助手,为“并行处理导论”课程的学生提供学术支持。还有一篇论文介绍了RAGs-to-Riches框架,一种类RAG的少样本学习方法,用于提高大型语言模型(LLM)的角色扮演能力。MMORE是一个用于大规模多模态开放检索增强生成和提取的开源流水线,支持多种文档格式。FinGEAR是一个针对金融文件设计的检索框架,通过映射指导增强答案检索,以解决10-K文件等复杂检索问题。一篇研究探讨了多模态大型语言模型在青少年特发性脊柱侧弯(AIS)自我管理中的应用,并提出了一个“分而治之”的框架,结合了脊柱关键点提示和RAG。最后,SAQ提出了一种新的向量量化方法,通过代码调整和维度分割来推动向量量化在近似最近邻搜索(ANNS)和RAG中的应用。",
"property_number_of_papers": 7,
"property_title": "2025-09-15 paper summary"
}
## RAG DAILY
### Paper List Table:
- property_title: Paper title
- property_updated: Last updated time (UTC)
- property_published: Published time (UTC)
- property_id: Paper ID / link
- property_summary: Paper abstract
- property_html_url: Paper HTML preview URL
- property_pdf_url: Paper PDF preview URL
- property_primary_category: Primary category
- property_category: Categories (multi-select tags)
- property_github: Related GitHub project URL
- property_huggingface: Related HuggingFace project URL
- property_rag_tf: Whether the paper is RAG-related, T/F
- property_rag_reason: Reason for not being RAG-related
- property_rag_category: RAG technology category (may be empty)
- property_rag_name: RAG technology name (may be empty)
- property_author: Authors (multi-select tags)
### API Request:
https://<YOUR_WEBHOOK_URL>?date={date}
- date format: YYYY-MM-DD, optional. If no date parameter is provided, all data is returned; if date is provided, only data for that date is returned.
- HTTP Method: GET
- Authentication: None
#### Example Requests:
- Get all data: https://<YOUR_WEBHOOK_URL>
- Get data for 2025-09-15: https://<YOUR_WEBHOOK_URL>?date=2025-09-15
#### Example RESPONSE:
{
"id": "271a136d-cee4-81ee-851e-d74884293f20",
"name": "SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Dimension Segmentation",
"url": "https://www.notion.so/SAQ-Pushing-the-Limits-of-Vector-Quantization-through-Code-Adjustment-and-Dimension-Segmentation-271a136dcee481ee851ed74884293f20",
"property_html_url": "http://arxiv.org/abs/2509.12086v1",
"property_rag_category": "Supporting Technology",
"property_rag_reason": "",
"property_github": null,
"property_author": [
"Hui Li",
"Shiyuan Deng",
"Xiao Yan",
"Xiangyu Zhi",
"James Cheng"
],
"property_summary": "Approximate Nearest Neighbor Search (ANNS) plays a critical role in applications such as search engines, recommender systems, and RAG for LLMs. Vector quantization (VQ), a crucial technique for ANNS, is commonly used to reduce space overhead and accelerate distance computations. However, despite significant research advances, state-of-the-art VQ methods still face challenges in balancing encoding efficiency and quantization accuracy. To address these limitations, we propose a novel VQ method called SAQ. To improve accuracy, SAQ employs a new dimension segmentation technique to strategically partition PCA-projected vectors into segments along their dimensions. By prioritizing leading dimension segments with larger magnitudes, SAQ allocates more bits to high-impact segments, optimizing the use of the available space quota. An efficient dynamic programming algorithm is developed to optimize dimension segmentation and bit allocation, ensuring minimal quantization error. To speed up vector encoding, SAQ devises a code adjustment technique to first quantize each dimension independently and then progressively refine quantized vectors using a coordinate-descent-like approach to avoid exhaustive enumeration. Extensive experiments demonstrate SAQ's superiority over classical methods (e.g., PQ, PCA) and recent state-of-the-art approaches (e.g., LVQ, Extended RabitQ). SAQ achieves up to 80% reduction in quantization error and accelerates encoding speed by over 80x compared to Extended RabitQ.",
"property_rag_name": "",
"property_pdf_url": "http://arxiv.org/pdf/2509.12086v1",
"property_primary_category": "cs.DB",
"property_published": {
"start": "2025-09-16T07:28:00.000+00:00",
"end": null,
"time_zone": null
},
"property_rag_tf": "T",
"property_id": "http://arxiv.org/abs/2509.12086v1",
"property_category": [
"cs.DB",
"cs.DS",
"cs.IR"
],
"property_updated": {
"start": "2025-09-16T07:28:00.000+00:00",
"end": null,
"time_zone": null
},
"property_huggingface": null,
"property_title": "SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Dimension Segmentation"
}
# Page Design Requirements
Values in code will be represented using ${} notation as described below.
## Basic Page Info:
- title: RAG Daily Papers
- SEO Configuration:
-- <title>RAG Daily Papers - Latest RAG Research from arXiv</title>
-- <meta name="description" content="RAG Daily Papers curates the latest Retrieval-Augmented Generation (RAG) research papers from arXiv. Updated daily with summaries for researchers, engineers, and AI enthusiasts.">
-- <meta name="keywords" content="RAG, Retrieval-Augmented Generation, arXiv, AI papers, NLP, LLM, daily research, machine learning">
-- <meta name="robots" content="index, follow">
## Header
- title: RAG Daily Papers
- subtitle: Stay updated with the latest RAG research, every day.
- Total papers ${RAG DAILY, COUNT all entries}
- Data Source: arXiv
- button: NOTION button, ${Open in a new window: <YOUR_NOTION_LINK>}
## Body
### Date Filter
Default unselected; selecting a date sets ${RAG DAILY, date param = selected date}
### Content List
- Information groups sorted by: ${RAG DAILY, grouped by property_updated, descending order inside each group}
- Info Group:
-- Group title: YYYY/MM/DD DayOfWeek, ${RAG DAILY, convert property_updated UTC to local display}
-- Group title note: "Displayed based on paper updated time"
-- Group summary: ${RAG Daily Paper Summary, property_date in UTC}
-- Card contents:
--- Paper title: ${RAG DAILY, property_title}
--- Paper summary: ${RAG DAILY, property_summary}
--- RAG technology name: ${RAG DAILY, property_rag_name}
--- RAG category description: ${RAG DAILY, property_rag_category}
--- Paper categories: tags ${RAG DAILY, property_category}
--- Project icons: GitHub, HuggingFace ${RAG DAILY, if property_github/property_huggingface not null, show icon}
--- Click event: Show more in a modal
-- Modal contents:
--- Section 1:
---- RAG technology name: ${RAG DAILY, property_rag_name}
---- RAG category description: ${RAG DAILY, property_rag_category}
--- Section 2:
---- Paper title: ${RAG DAILY, property_title}
---- Paper published: ${RAG DAILY, property_published, UTC}
---- Paper updated: ${RAG DAILY, property_updated, UTC}
---- Authors: tags ${RAG DAILY, property_author}
---- Original link: hyperlink ${RAG DAILY, property_id}
--- Section 3:
---- Paper summary: ${RAG DAILY, property_summary}
---- Categories: tags ${RAG DAILY, property_category}
--- Paper buttons:
---- PDF Online: ${RAG DAILY, property_pdf_url}, open in new window
---- HTML Online: ${RAG DAILY, property_html_url}, open in new window
--- Project buttons:
---- GitHub: ${RAG DAILY, property_github if not null}, open in new window
---- HuggingFace: ${RAG DAILY, property_huggingface if not null}, open in new window
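The group-title conversion in the Content List above (property_updated UTC → local "YYYY/MM/DD DayOfWeek") can be sketched as follows; `formatGroupTitle` is an illustrative helper, and the UTC default is chosen here for reproducibility — a browser would pass the user's zone instead:

```javascript
// Formats a UTC ISO timestamp as "YYYY/MM/DD DayOfWeek" in a given time zone.
function formatGroupTitle(utcIso, timeZone = "UTC") {
  const d = new Date(utcIso);
  // The en-CA locale yields an ISO-like YYYY-MM-DD date string.
  const ymd = new Intl.DateTimeFormat("en-CA", { timeZone }).format(d);
  const weekday = new Intl.DateTimeFormat("en-US", { timeZone, weekday: "long" }).format(d);
  return `${ymd.replace(/-/g, "/")} ${weekday}`;
}
```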
## Footer
- copyright: dongou.tech
- email: <YOUR_EMAIL>
# Page Development Guidelines:
1. The website is fully English; translate all non-English text.
2. Style: Notion-like, clean and minimal.
3. Layout: Accommodate varying content lengths and empty content gracefully.
4. Privacy & Security: Do not expose API keys in front-end code.
5. Performance: Avoid fetching all fields in one request; use lazy loading.
6. Timezone: UTC
