Building an AI-Powered Support Automation System with n8n
Published on 2026-11-27 • 5 min read
Support tickets are the heartbeat of any engineering organization. They reveal recurring issues, knowledge gaps, and opportunities for improvement. But they also represent a significant operational burden—triaging, routing, resolving, and documenting solutions across hundreds or thousands of tickets.
What if your support system could learn from every resolved ticket and automatically suggest solutions to new issues? That's exactly what we built using n8n, Claude AI, and vector embeddings.
The Problem
Our team manages infrastructure and platform support through a centralized GitHub repository. We were facing several interconnected challenges. Every new issue required manual triage—someone had to read it, understand the context, and assign it to the right team. Once an issue was resolved, the solution often stayed buried in that specific ticket thread, leading to lost knowledge. Users kept opening tickets for problems we'd already solved, and our support engineers spent hours searching through old tickets to find similar issues.
We needed a system that could automatically learn from resolved tickets and provide instant, context-aware assistance for new ones.
The Solution: A Two-Part Automation System
We built two interconnected n8n workflows that work together to create a self-learning support system:
1. GitHub Issue Automation Workflow
This workflow monitors our GitHub support repository and handles issues throughout their lifecycle:
When a new issue is opened:
- AI Analysis: Claude 3.5 Sonnet analyzes the issue content
- Knowledge Base Search: Uses vector similarity search to find related past issues
- Solution Suggestion: Generates a suggested solution based on historical resolutions
- Team Assignment: Automatically determines which team should own the issue based on content analysis
- Human Approval: Posts to Google Chat for team review before taking action
- GitHub Comment: Adds an AI-generated comment with the suggested solution and similar past issues
When an issue is closed:
- Thread Analysis: Reads the entire issue thread including all comments
- Solution Extraction: Identifies the actual working solution (ignoring failed attempts)
- Structured Summarization: Creates a knowledge base entry with:
- Problem statement
- Root cause analysis
- Step-by-step solution
- Error messages and logs
- Owning team
- Vector Embedding: Converts the summary to embeddings using AWS Bedrock Titan
- Knowledge Storage: Saves to PostgreSQL with PGVector for semantic search
2. Google Chat Bot Workflow
This workflow provides our team with instant access to the knowledge base through conversational commands:
The /suggest command reads the entire chat thread context, queries the knowledge base for similar issues, and suggests solutions with references to past tickets—all within the same thread. The /save command allows manual addition of solutions to the knowledge base, which is particularly useful for documenting tribal knowledge.
The /troubleshoot <application_name> command integrates with our observability platform via MCP protocol to perform real-time diagnostics. It checks synthetic metrics for the last 5 minutes, identifies error spikes and latency issues, extracts the 5 most recent error spans, and synthesizes a root cause analysis. The entire flow is optimized for speed with targeted queries and small time windows.
Technical Architecture
System Overview
graph TB
GitHub[GitHub Issues]
GChat[Google Chat]
n8n[n8n Workflows]
Bedrock[AWS Bedrock<br/>Claude 3.5 + Titan Embeddings]
PGVector[(PostgreSQL<br/>PGVector)]
Observability[Observability Platform<br/>MCP Server]
GitHub -->|Webhook| n8n
GChat -->|Webhook| n8n
n8n -->|Query/Embed| Bedrock
n8n -->|Vector Search| PGVector
n8n -->|Store Knowledge| PGVector
n8n -->|Post Updates| GitHub
n8n -->|Send Messages| GChat
n8n -->|Query Metrics| Observability
Bedrock -->|Generate| n8n
PGVector -->|Similar Issues| n8n
Observability -->|Diagnostics| n8n
style n8n fill:#3ecf8e
style Bedrock fill:#FF9900
style PGVector fill:#336791
AI Layer
The AI layer uses Claude 3.5 Sonnet via AWS Bedrock for reasoning and synthesis, while Amazon Titan Embed v2 handles vector representations. JSON schema validation ensures consistent knowledge base entries through structured output.
Data Layer
PostgreSQL with the PGVector extension serves as our vector database, enabling cosine similarity search with top-K retrieval. Each vector stores comprehensive metadata including issue number, URL, solution, owning team, and root cause analysis.
Integration Layer
The GitHub API provides webhooks for issue events and REST API access for comments and labels. Google Chat API handles webhook triggers and message posting, while our observability platform connects via Model Context Protocol for metrics and traces.
Orchestration
n8n provides the visual workflow automation that connects all components. Switch nodes route different issue actions based on conditional logic, and approval gates enable human-in-the-loop oversight for AI-generated content.
Workflow Diagrams
New Issue Flow: AI-Powered Triage
sequenceDiagram
participant User
participant GitHub
participant n8n
participant AI as AWS Bedrock<br/>(Claude)
participant KB as Knowledge Base<br/>(PGVector)
participant Team as Google Chat
User->>GitHub: Opens new issue
GitHub->>n8n: Webhook trigger
n8n->>KB: Vector search for similar issues
KB-->>n8n: Top 5 similar issues
n8n->>AI: Analyze issue + KB results
AI-->>n8n: Suggested solution + team
n8n->>Team: Post for approval
Team-->>n8n: Approve/Reject
n8n->>GitHub: Add comment with solution
n8n->>GitHub: Add team label
GitHub-->>User: Notification
Closed Issue Flow: Knowledge Capture
sequenceDiagram
participant User
participant GitHub
participant n8n
participant AI as AWS Bedrock<br/>(Claude)
participant KB as Knowledge Base<br/>(PGVector)
participant Team as Google Chat
User->>GitHub: Closes issue
GitHub->>n8n: Webhook trigger
n8n->>GitHub: Fetch all comments
GitHub-->>n8n: Issue thread
n8n->>AI: Extract solution from thread
AI-->>n8n: Structured summary<br/>(problem, root cause, solution)
n8n->>Team: Post summary for approval
Team-->>n8n: Approve/Reject
n8n->>AI: Generate embeddings
AI-->>n8n: Vector representation
n8n->>KB: Store with metadata
KB-->>n8n: Confirmation
Chat Bot Flow: Instant Support
sequenceDiagram
participant User
participant GChat as Google Chat
participant n8n
participant AI as AWS Bedrock<br/>(Claude)
participant KB as Knowledge Base<br/>(PGVector)
User->>GChat: /suggest <problem>
GChat->>n8n: Webhook with thread context
n8n->>GChat: Fetch full thread history
GChat-->>n8n: All messages
n8n->>KB: Semantic search
KB-->>n8n: Relevant past solutions
n8n->>AI: Generate response with context
AI-->>n8n: Solution + similar issues
n8n->>GChat: Reply in thread
GChat-->>User: Instant answer
Key Features
1. Intelligent Team Routing
The AI analyzes issue content and maps it to the appropriate engineering team based on the domain (infrastructure, platform, databases, networking, developer tools, etc.). It uses both label analysis and content understanding, so even unlabeled issues get routed correctly.
2. Solution Quality Control
The AI is trained to extract only the final working solution, ignoring all the trial-and-error that happened along the way. It looks for "turning point" comments where the user confirms the fix worked.
3. Precision Over Recall
When suggesting solutions, we prioritize accuracy over completeness. The prompt instructs the AI to include exact error codes and CLI commands, avoid empathetic filler in favor of pure technical solutions, and only mention approaches that actually worked. If no solution is found, the system explicitly says so rather than hallucinating an answer.
4. Speed-Optimized Troubleshooting
The /troubleshoot command uses a tiered retrieval strategy that dramatically reduces latency. It starts with synthetic metrics (pre-aggregated data) and only fetches raw spans if errors are detected. By limiting time windows to reduce data volume and requesting only specific attributes needed for diagnosis, we've reduced troubleshooting latency from minutes to seconds.
Real-World Impact
After deploying this system, we've seen transformative results. Triage time has dropped by 60%, as the AI handles initial classification and routing. Common problems that previously required engineer time now receive instant answers. The knowledge base preserves expertise that survives team turnover, and its accuracy continues to improve as it learns from each new resolution.
Perhaps most valuable is what the system has revealed about our infrastructure. By clustering similar issues, it's surfaced patterns we didn't know existed—recurring problems that pointed to underlying infrastructure issues we could fix proactively rather than reactively.
Lessons Learned
1. Structured Output is Critical
Early versions used free-form AI responses, which were inconsistent and hard to search. Switching to JSON schema validation with structured output parsers made the knowledge base much more reliable.
2. Human Approval Prevents Mistakes
We initially tried fully automated responses, but found edge cases where the AI suggested incorrect solutions. Adding a Google Chat approval gate lets our team catch errors before they reach users.
3. Metadata Matters
Storing rich metadata (team, error messages, solution type) alongside vectors enables powerful filtering and analytics. We can now track which teams have the most recurring issues.
4. Start Small with Time Windows
Our first troubleshooting queries searched days of data and timed out. Narrowing to minutes made queries fast and still caught 95% of real-time issues.
The Prompts: Prompt Engineering in Action
The quality of this system comes down to carefully crafted prompts. Here are the actual prompts we use:
Prompt 1: Knowledge Base Extraction (for closed issues)
You are a Principal Site Reliability Engineer (SRE) creating a "Knowledge Base"
entry from a raw GitHub Issue.
### INPUT DATA:
I will provide you with the Issue Title, Description, and the entire Comment History.
### YOUR GOAL:
Synthesize a technical "Root Cause & Solution" document. The output must be
specific enough that another engineer could copy-paste the solution to fix the
same error without reading the original thread.
### STRICT ANALYSIS RULES:
1. **Prioritize Code & Logs:** If the text contains error logs, stack traces,
or configuration snippets (YAML/JSON), these are the most important parts.
You MUST preserve specific error codes (e.g., "Exit Code 137", "OOMKilled").
2. **Find the "Turning Point":** Look for the comment where the user confirms
the fix (e.g., "That worked!", "Merged PR #123"). The solution is likely
in the comment *immediately preceding* this confirmation.
3. **Ignore Abandoned Paths:** If the users discussed 3 potential fixes but
only the 3rd one worked, completely ignore the first two. Do not mention
"We tried X and Y first." Only report the final working solution.
### OUTPUT FORMAT:
Extract these technical details into the JSON structure provided:
- problem: Summary of the user's original issue
- root_cause: What actually caused it
- solution: Step-by-step fix or code snippet
- error_messages: Specific error logs mentioned
- owning_team: The team responsible based on content analysis
Prompt 2: Solution Suggestion (for new issues)
You are a Senior Site Reliability Engineer (SRE) acting as a Tier 2 Support
specialist. Your goal is to provide immediate, actionable solutions to
infrastructure and platform issues based on the Technical Knowledge Base (KB)
available to you via your tools.
### OPERATIONAL GUIDELINES:
1. **Tool Usage**: Use the 'Postgres PGVector Store' tool to search for past
resolutions. Search using technical keywords from the user's issue (e.g.,
specific error codes, service names, or labels).
2. **Prioritize Precision**: If the KB entry contains specific error codes
(e.g., "Exit Code 137"), YAML snippets, or CLI commands, include them
exactly as they appear.
3. **No Fluff**: Do not use empathetic fillers like "I'm sorry you're having
this issue." Start immediately with the solution or the root cause analysis.
4. **Final Working Fix Only**: Do not mention "trial and error" or paths that
didn't work. Provide only the confirmed solution.
5. **Team Mapping**: Based on the content and the 'owning_team' found in the
KB, you must classify the response into one of the allowed team categories.
### RESPONSE FORMAT:
Your final response must follow the structure required by the Output Parser:
1. **solution**: A clear, technical explanation of the fix.
2. **similar_issues**: Extract the 'issue_number' from the metadata of the
items retrieved from the vector store.
3. **owning_team**: The team from the list above.
If no recorded solution is found in the Knowledge Base after searching, set
the solution to "No recorded solution found in the Knowledge Base. Please
escalate to the relevant domain team." and set the owning_team to "Unknown".
Prompt 3: Real-Time Troubleshooting
### High-Speed Troubleshooting Specialist
**Core Directive:** Prioritize **Time-to-Insight**. Do not request broad
datasets. Use a "tiered retrieval" strategy to provide the user with an
answer in the shortest possible time.
Use a small time window, e.g., the last hour.
### Optimized Workflow:
#### Phase A: The 5-Minute Pulse (Latency: ~500ms)
- Query **Synthetic Metrics** (not raw spans) for the last **5 minutes**.
- Goal: Identify if there is a spike in `error_count` or `p99_latency`.
- Logic: If metrics are healthy, report "No immediate issues in the last
5 minutes" and ask if the user wants to look further back.
#### Phase B: Targeted Error Extraction (Latency: ~1-2s)
- If Phase A shows errors, request **only the 5 most recent spans** where
`error=true`.
- Constraint: Do **not** request all span attributes. Request only: `span.id`,
`service.name`, `exception.message`, and `http.status_code`.
#### Phase C: Root Cause Synthesis
- Based on those 5 spans, identify the common denominator (e.g., all failing
spans point to the same `db.system`).
### Speed-Oriented Response Guidelines:
- **Summarize, Don't List:** Do not print a list of 10 spans. Say: "Found 42
errors in the last 5 mins; the primary cause is a `500 Internal Server Error`
on the `/auth` endpoint."
- **Be Succinct:** Use bullet points. Avoid conversational filler.
- **Early Exit:** If the application name is not found in the first tool call,
stop and ask for clarification immediately.
### Safety & Performance Constraints:
- **Max Time Window:** Never default to a window larger than **15 minutes**
unless explicitly asked.
- **Payload Limit:** Limit tool output to the top 10 results.
These prompts demonstrate several key techniques:
- Role Assignment: Giving the AI a specific persona (Principal SRE, Tier 2 Support)
- Clear Constraints: Explicit rules about what to include/exclude
- Output Structure: Defining exactly what format is expected
- Performance Optimization: Specifying time windows and data limits
- Failure Handling: What to do when no solution is found
Code Highlights
Here's how we extract the owning team using structured output:
{
"type": "object",
"properties": {
"owning_team": {
"type": "string",
"enum": [
"Infrastructure Team",
"Platform Team",
"Database Team",
"Networking Team",
"Developer Tools Team",
"Security Team",
"Observability Team",
"Unknown"
]
}
},
"required": ["owning_team"]
}
And here's the prompt that ensures we only extract working solutions:
### STRICT ANALYSIS RULES:
1. Prioritize Code & Logs: Preserve specific error codes
2. Find the "Turning Point": Look for confirmation comments
3. Ignore Abandoned Paths: Only report the final working solution
Future Enhancements
We're planning to add:
- Proactive Issue Detection: Monitor metrics and create issues automatically when anomalies are detected
- Solution Validation: Track if suggested solutions actually resolved the issue
- Multi-Repo Support: Extend beyond our main support tracker
- Slack Integration: Bring the bot to where more teams already collaborate
Conclusion
Building a self-learning support system isn't just about automation—it's about creating institutional memory that grows smarter over time. By combining n8n's visual workflow flexibility with Claude's reasoning capabilities and vector search's semantic understanding, we've transformed our support process from reactive and manual to proactive and intelligent.
The best part? This entire system runs on infrastructure we already had (AWS Bedrock, PostgreSQL) and required no custom application code. n8n's visual workflow builder made it easy to iterate and experiment until we found the right architecture.
If you're drowning in support tickets, consider this: every resolved ticket is training data for your next automation. The question isn't whether to build a system like this—it's how soon you can start learning from your own solutions.