TL;DR
You can build an AI ticket-tagging pipeline in 2-4 hours using OpenAI’s API and either a no-code automation tool or a short Python script. Every new ticket gets classified into your predefined categories within seconds of arrival. You need a support platform with API access, an OpenAI account, and either Make or a basic Python environment.
What You Need Before You Start
- A support platform with API access: Zendesk (Team plan or above), Freshdesk (Growth plan), or Intercom (Starter plan)
- An OpenAI API account with at least $5 in credits (gpt-4o-mini keeps costs under $1 per 1,000 tickets)
- At least 100-200 real historical tickets with verified tags to use as a test set
- A defined tag taxonomy of 10-20 categories before you write a single line of code
- Python 3.10+ OR a Make account (free tier covers up to 1,000 operations per month)
- Basic familiarity with JSON (not required if you use Make)
- Google Sheets or Excel for your tag mapping document
Step 1: Audit Your Existing Tags and Build a Taxonomy
Before you touch any AI, you need a clean tag list. Pull your last 6 months of tickets and count how often each tag appears. Export them to a CSV, open it in Google Sheets, and use =COUNTIF($B:$B, A2) to tally frequency by tag name. You will almost always find 40-plus tags created ad hoc, with heavy overlap and a long tail that appears fewer than 5 times.
Delete or merge any tag used fewer than 10 times in 6 months. Group the survivors into a flat list of 10-20 categories. A clean taxonomy looks like this:
billing_issue
account_access
feature_request
bug_report
onboarding
integration_help
data_export
cancellation
general_inquiry
Keep it flat. Nested tags like billing > invoice > duplicate confuse both agents and AI models. Write a one-sentence definition for each tag in a second column. That definition feeds directly into your prompt in Step 3.
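If you would rather run the frequency audit in Python than in a spreadsheet, the sketch below shows one way to do it. It assumes you have already loaded each ticket's tags into a list; the function name audit_tags and the sample data are hypothetical, and the threshold of 10 is the one suggested above.

```python
from collections import Counter

def audit_tags(ticket_tags, min_count=10):
    """Split tags into keepers and merge/delete candidates by frequency.

    ticket_tags: iterable of tag lists, one list per ticket.
    min_count: tags used fewer times than this are flagged for review.
    """
    counts = Counter(tag for tags in ticket_tags for tag in tags)
    keep = {t: n for t, n in counts.items() if n >= min_count}
    review = {t: n for t, n in counts.items() if n < min_count}
    return keep, review

# Hypothetical data standing in for a 6-month export.
sample = [["billing_issue"]] * 12 + [["refund_q3_promo"]] * 3
keep, review = audit_tags(sample)
print("keep:", keep)
print("merge or delete:", review)
```

The review dictionary is your long tail: decide for each entry whether it merges into a surviving category or disappears.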
You should now see a single spreadsheet with 10-20 tags and a plain-English description of each one.
Step 2: Export a Sample Dataset for Testing
You need a ground-truth test set before you automate anything. Without it, you have no way to measure whether the AI is actually tagging correctly.
Export 100-200 tickets where you already know the correct tag. In Zendesk, go to Reporting > Explore, select your ticket view, and export as CSV. In Freshdesk, go to Reports > Export Tickets and filter by date range.
Your CSV needs at minimum these four columns:
ticket_id, subject, body_first_300_chars, correct_tag
Truncate ticket bodies to 300 characters. You do not need the full transcript for classification, and shorter inputs reduce API cost significantly. Strip HTML tags from the body before saving:
import re

def clean_body(text):
    # Remove anything that looks like an HTML tag, then keep the first 300 characters.
    text = re.sub(r'<[^>]+>', '', text)
    return text[:300].strip()
You should now see a clean CSV with 100-200 rows, each with a verified correct tag in the last column.
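A minimal sketch of turning a raw export into that four-column layout follows. The input keys id, subject, description, and tag are assumptions about what your export contains, so rename them to match your platform; build_test_rows and write_test_csv are hypothetical helper names.

```python
import csv
import io
import re

FIELDNAMES = ["ticket_id", "subject", "body_first_300_chars", "correct_tag"]

def clean_body(text):
    # Same cleaning rule as in this step: strip HTML tags, keep 300 characters.
    text = re.sub(r"<[^>]+>", "", text)
    return text[:300].strip()

def build_test_rows(raw_rows):
    """Map raw export rows (assumed keys) onto the test-set columns."""
    rows = []
    for r in raw_rows:
        if not r.get("tag"):
            continue  # a ticket without a verified tag cannot be ground truth
        rows.append({
            "ticket_id": r["id"],
            "subject": r["subject"].strip(),
            "body_first_300_chars": clean_body(r["description"]),
            "correct_tag": r["tag"],
        })
    return rows

def write_test_csv(rows, fileobj):
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical raw rows; the second one is skipped because it has no tag.
raw = [
    {"id": 1, "subject": " Charged twice ", "description": "<p>I was billed twice.</p>", "tag": "billing_issue"},
    {"id": 2, "subject": "Hello", "description": "Just saying hi", "tag": ""},
]
rows = build_test_rows(raw)
buffer = io.StringIO()
write_test_csv(rows, buffer)
print(buffer.getvalue())
```

In practice you would pass an open file instead of the in-memory buffer used here.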
Step 3: Write Your Classification Prompt
The prompt is where most people get it wrong. A vague prompt produces vague tags. A specific prompt with your tag definitions produces consistent, repeatable results.
Your prompt has three parts: a system message, the tag list with definitions, and the ticket content. Here is a template that works well with gpt-4o-mini:
system_prompt = """
You are a customer support ticket classifier.
Classify the ticket into exactly ONE of the following categories.
Return only the category name, nothing else.
Categories:
- billing_issue: questions about charges, invoices, or payment failures
- account_access: login problems, password resets, locked accounts
- feature_request: suggestions for new functionality
- bug_report: something in the product is broken or behaving unexpectedly
- onboarding: setup help or getting started questions
- integration_help: connecting to third-party tools
- data_export: exporting or downloading account data
- cancellation: requests to cancel or downgrade
- general_inquiry: anything that does not fit the above categories
"""
user_prompt = f"Subject: {subject}\n\nTicket: {body}"
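Keep the category list identical to your taxonomy spreadsheet from Step 1. The general_inquiry fallback gives the model a safe default when it is uncertain, which makes it much less likely to invent a new tag. It does not rule that out entirely, so check every reply against your tag list before you write it back to a ticket.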
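One way to keep the prompt and the spreadsheet in sync is to generate the category list from a single dictionary and to validate the model's reply against the same dictionary. This is a sketch under assumptions: TAXONOMY is abbreviated to three entries, and build_system_prompt and normalize_tag are hypothetical helper names, not part of any library.

```python
# Abbreviated taxonomy for illustration; load yours from the Step 1 sheet.
TAXONOMY = {
    "billing_issue": "questions about charges, invoices, or payment failures",
    "account_access": "login problems, password resets, locked accounts",
    "general_inquiry": "anything that does not fit the above categories",
}

def build_system_prompt(taxonomy):
    """Assemble the system message so it always matches the taxonomy."""
    lines = [
        "You are a customer support ticket classifier.",
        "Classify the ticket into exactly ONE of the following categories.",
        "Return only the category name, nothing else.",
        "Categories:",
    ]
    lines += [f"- {name}: {desc}" for name, desc in taxonomy.items()]
    return "\n".join(lines)

def normalize_tag(raw, taxonomy, fallback="general_inquiry"):
    """Map a raw model reply onto a known tag, or the fallback if unknown."""
    cleaned = raw.strip().strip("`'\".").lower()
    return cleaned if cleaned in taxonomy else fallback

print(build_system_prompt(TAXONOMY))
print(normalize_tag(" Billing_Issue.\n", TAXONOMY))
print(normalize_tag("refund_request", TAXONOMY))
```

Running every reply through normalize_tag means an unexpected answer lands in the fallback bucket instead of creating a stray tag in your support platform.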
You should now see a complete prompt template ready to paste into your script or automation tool.
Step 4: Test the Prompt Against Your Sample Dataset
Run your 100-200 ground-truth tickets through the prompt before building any automation. This gives you an accuracy baseline.
import openai
import pandas as pd

# system_prompt is the template string defined in Step 3.
client = openai.OpenAI(api_key="your_key_here")
df = pd.read_csv("sample_tickets.csv")

def classify_ticket(subject, body):
    """Send one ticket to the model and return the tag it replies with."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Subject: {subject}\n\nTicket: {body}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# The body column is named body_first_300_chars in the Step 2 CSV.
df["ai_tag"] = df.apply(
    lambda row: classify_ticket(row["subject"], row["body_first_300_chars"]), axis=1
)
accuracy = (df["ai_tag"] == df["correct_tag"]).mean()
print(f"Accuracy: {accuracy:.1%}")
df.to_csv("test_results.csv", index=False)
Set temperature=0 to make results as repeatable as possible; the API does not guarantee identical output on every call, but variation at this setting is small. Anything below 80% accuracy means your tag definitions need more specificity or your taxonomy has overlapping categories. Open the output CSV, filter for rows where ai_tag != correct_tag, and read each misclassified ticket. Adjust the definition of the category that caused the confusion.
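To see which categories get confused with each other, it can help to count the disagreeing pairs rather than read the rows one by one. The sketch below assumes each row is a dictionary with correct_tag and ai_tag keys, as in the results file; confusion_pairs is a hypothetical helper and the rows are made-up examples.

```python
from collections import Counter

def confusion_pairs(rows):
    """Count (correct_tag, ai_tag) pairs where the two disagree."""
    return Counter(
        (r["correct_tag"], r["ai_tag"])
        for r in rows
        if r["ai_tag"] != r["correct_tag"]
    )

# Made-up results for illustration.
rows = [
    {"correct_tag": "billing_issue", "ai_tag": "billing_issue"},
    {"correct_tag": "billing_issue", "ai_tag": "cancellation"},
    {"correct_tag": "billing_issue", "ai_tag": "cancellation"},
    {"correct_tag": "onboarding", "ai_tag": "integration_help"},
]
pairs = confusion_pairs(rows)
for (correct, predicted), n in pairs.most_common():
    print(f"{correct} tagged as {predicted}: {n}")
```

The pair at the top of the list points at the two definitions that most need sharpening.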
You should now see an accuracy score printed to the terminal and a CSV showing exactly where the AI agreed and disagreed with your manual labels.
Step 5: Set Up the OpenAI Call in Make (No-Code Option)
If Python is not your preferred path, Make handles this without writing code. Create a new scenario and add these modules in order:
- Webhook (instant) – receives new ticket data from your support platform
- HTTP > Make a request – calls the OpenAI Chat Completions API
- JSON > Parse JSON – extracts the tag from the response body
- Zendesk > Update Ticket (or Freshdesk equivalent) – writes the tag back
For the HTTP module, configure it as POST to https://api.openai.com/v1/chat/completions with headers Authorization: Bearer YOUR_API_KEY and Content-Type: application/json. Use this raw JSON body:
{
  "model": "gpt-4o-mini",
  "temperature": 0,
  "messages": [
    {"role": "system", "content": "YOUR SYSTEM PROMPT HERE"},
    {"role": "user", "content": "Subject: {{1.subject}}\n\nTicket: {{1.description}}"}
  ]
}
Replace {{1.subject}} and {{1.description}} with the actual field references from your webhook module. Make’s visual mapper handles this with drag and drop.
You should now see the scenario saving without errors and a test run showing a valid tag name in the HTTP response body.
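For reference, the value the Parse JSON module needs to map is the content field of the first choice's message. The Python sketch below shows the same extraction against a trimmed, hand-written sample body; extract_tag is a hypothetical helper, and a real response carries additional fields such as id and usage.

```python
import json

def extract_tag(response_body):
    """Pull the assistant's reply out of a Chat Completions response body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"].strip()

# Trimmed sample of the response shape, written by hand for illustration.
sample_body = json.dumps({
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "billing_issue\n"},
            "finish_reason": "stop",
        }
    ]
})
tag = extract_tag(sample_body)
print(tag)
```

In Make, the equivalent mapping is the path choices, first item, message, content from the parsed output.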
Step 6: Write the Tag Back to Your Support Platform
Getting the tag from the AI is only half the job. You need to write it back to the ticket record.
Zendesk:
import requests

def tag_zendesk_ticket(ticket_id, tag, api_token, subdomain, email):
    url = f"https://{subdomain}.zendesk.com/api/v2/tickets/{ticket_id}.json"
    # Sending "tags" replaces the ticket's whole tag list with this one tag.
    payload = {"ticket": {"tags": [tag]}}
    response = requests.put(
        url,
        json=payload,
        auth=(f"{email}/token", api_token),  # API-token basic auth
    )
    return response.status_code
Freshdesk:
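import requests

def tag_freshdesk_ticket(ticket_id, tag, api_key, domain):
    url = f"https://{domain}.freshdesk.com/api/v2/tickets/{ticket_id}"
    # Sending "tags" replaces the ticket's whole tag list with this one tag.
    payload = {"tags": [tag]}
    # Freshdesk uses the API key as the username and any string as the password.
    response = requests.put(url, json=payload, auth=(api_key, "X"))
    return response.status_code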
Note that both APIs replace the existing tags array on a PUT request. If you want to append the AI tag without removing any human-assigned tags, fetch the current tags first and merge the lists before sending the PUT.
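The merge itself is a few lines. This sketch assumes you have already fetched the ticket's current tags (for example with a GET on the same ticket URL) into a list; merge_tags is a hypothetical helper name.

```python
def merge_tags(existing_tags, ai_tag):
    """Return the existing tags plus the AI tag, preserving order, no duplicates."""
    merged = list(existing_tags)  # copy so the caller's list is not modified
    if ai_tag not in merged:
        merged.append(ai_tag)
    return merged

current = ["vip", "urgent"]
print(merge_tags(current, "billing_issue"))
print(merge_tags(["billing_issue"], "billing_issue"))
```

You would then send the merged list as the tags value in the PUT payload instead of the single-item list.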
You should now see the AI-generated tag appearing on the ticket inside your support platform dashboard within a few seconds of the ticket arriving.
Step 7: Deploy the Automation as a Webhook Trigger
Connect everything so it fires on every new ticket automatically.
In Zendesk, go to Settings > Triggers > Add Trigger. Set the condition to “Ticket is Created” and add an action: “Notify by Webhook.” Point it at your Make webhook URL or your hosted Python endpoint. Pass ticket_id, subject, and description as JSON in the request body.
In Freshdesk, go to Admin > Automation > Ticket Creation Rules. Add a rule with condition “Ticket is Created” and action “Trigger Webhook” with the same payload structure.
If you are running the Python script server-side, wrap it in a Flask route:
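from flask import Flask, request, jsonify

# classify_ticket comes from Step 4 and tag_zendesk_ticket from Step 6.
app = Flask(__name__)

@app.route("/classify", methods=["POST"])
def classify():
    data = request.json
    # The trigger sends ticket_id, subject, and description.
    tag = classify_ticket(data["subject"], data["description"])
    # Replace ... with your api_token, subdomain, and email.
    tag_zendesk_ticket(data["ticket_id"], tag, ...)
    return jsonify({"tag": tag})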
Host this on any VPS or a free-tier Railway instance. The whole response cycle takes under 3 seconds for a typical ticket.
You should now see new tickets arriving in your support platform with the AI tag already applied within 5-10 seconds of creation.
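Webhook payloads are easy to misconfigure, so you may want to check the incoming JSON before calling the model. The sketch below assumes the three fields named in this step; parse_ticket_payload is a hypothetical helper, and the 300-character cut mirrors the truncation from Step 2.

```python
REQUIRED_FIELDS = ("ticket_id", "subject", "description")

def parse_ticket_payload(data):
    """Validate a webhook payload and return the cleaned fields.

    Raises ValueError if the payload is not an object or lacks a field.
    """
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return {
        "ticket_id": data["ticket_id"],
        "subject": str(data["subject"]).strip(),
        "description": str(data["description"])[:300].strip(),
    }

ticket = parse_ticket_payload(
    {"ticket_id": 7, "subject": " Cannot log in ", "description": "x" * 400}
)
print(ticket["subject"], len(ticket["description"]))
```

In a Flask route you could catch the ValueError and return a 400 response so a bad trigger configuration shows up immediately in the webhook logs.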
Step 8: Monitor Accuracy Weekly for the First Month
Automation without monitoring drifts. Set a calendar reminder to review 50 randomly sampled auto-tagged tickets each week for the first four weeks.
Build a review sheet in Google Sheets with these columns: ticket_id, ai_tag, correct_tag, match. Have a team member spot-check them and mark each row Y or N. Track your weekly accuracy score in a running chart. If accuracy drops below 80%, look for new ticket topics that your original taxonomy did not cover.
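If you export the review sheet, the sampling and the weekly score can be scripted. This is a sketch, not a required part of the workflow: sample_for_review and weekly_accuracy are hypothetical helpers, and the Y/N marks are the ones from the match column described above.

```python
import random

def sample_for_review(ticket_ids, k=50, seed=None):
    """Pick up to k ticket IDs at random, without repeats."""
    rng = random.Random(seed)
    ids = list(ticket_ids)
    return rng.sample(ids, min(k, len(ids)))

def weekly_accuracy(match_column):
    """Share of rows marked Y; returns None if nothing was reviewed."""
    marks = [m.strip().upper() for m in match_column if m.strip()]
    if not marks:
        return None
    return marks.count("Y") / len(marks)

picked = sample_for_review(range(1000, 1200), k=50, seed=1)
score = weekly_accuracy(["Y"] * 43 + ["N"] * 7)
print(len(picked), f"{score:.0%}")
```

Passing a fixed seed makes a given week's sample reproducible if someone needs to re-check it.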
Two common drift signals to watch: a new product feature launches and creates a ticket type you never defined, or a billing cycle creates a spike in one category that the model starts over-applying to adjacent topics.
For more guidance on keeping AI workflows reliable in production, see /category/automation/ and the related guide on monitoring AI classification pipelines.
You should now see a weekly accuracy score that stays above 85% once your taxonomy stabilizes after the first month.
Common Mistakes To Avoid
- Using too many tags. More than 20 categories cuts accuracy sharply. Overlapping definitions are the main cause of misclassification, not the model’s capability.
- Skipping the test dataset. Deploying straight to production without ground-truth testing means you have no idea if accuracy is 60% or 95%.
- Sending full ticket transcripts. Long email threads with quoted replies add noise and increase API cost. Use the first 300 characters of the initial message only.
- Forgetting the fallback category. Without a general_inquiry bucket, the model picks the closest wrong tag instead of flagging uncertainty.
- Overwriting human tags with AI tags. Use a dedicated field or a prefixed tag like ai:billing_issue so agents see the AI suggestion without losing their manual classifications.
- Leaving temperature at default. Set it to 0 for all classification tasks. Variability is useful for creative generation but actively harmful for routing decisions.
When To Level Up
This approach works well up to roughly 500 tickets per day. Beyond that, sequential API calls create a backlog unless you parallelize with async Python or a queue-based architecture. You will also hit limits when you need multi-label tagging (one ticket tagged both billing_issue and bug_report), sentiment scoring alongside classification, or dynamic routing to specific agent queues based on tag plus priority.
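As a rough illustration of the parallel option, the sketch below uses a thread pool from the standard library rather than asyncio, since the blocking classify function from Step 4 can be reused unchanged. classify_many is a hypothetical helper, fake_classify is a stand-in so the example runs without an API key, and max_workers=8 is an arbitrary starting point you would tune against your rate limits.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_many(tickets, classify_fn, max_workers=8):
    """Classify tickets concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(lambda t: classify_fn(t["subject"], t["body"]), tickets)
        )

def fake_classify(subject, body):
    # Stand-in for the real API call, for illustration only.
    return "billing_issue" if "invoice" in body.lower() else "general_inquiry"

tickets = [
    {"subject": "Question", "body": "My invoice is wrong"},
    {"subject": "Hi", "body": "Where is the changelog?"},
]
results = classify_many(tickets, fake_classify)
print(results)
```

Threads suit this workload because each call spends most of its time waiting on the network.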
At that scale, purpose-built tools like Intercom Fin or Forethought handle classification natively without API wiring. They ship with pre-trained models on support data, which gives you a faster accuracy baseline than a general-purpose GPT model tuned with a custom prompt.
A third inflection point is compliance. The DIY approach logs tags but not reasoning chains. Some enterprise or regulated environments need explainability reports that show why a ticket was classified a certain way. Vendor platforms often include those reports out of the box.
Before you migrate to any vendor tool, document your taxonomy and prompt definitions. Every platform you evaluate will need the same category list to benchmark against your current accuracy. Browse the full toolkit for scaling support automation at /category/automation/.
Frequently Asked Questions
Does this work with Zendesk’s built-in AI features?
Zendesk has native intent detection on higher plans, but it uses a fixed taxonomy you cannot fully customize. The approach in this guide lets you define exactly which tags matter for your product and tune definitions to match your customers’ actual language.
How much does the OpenAI API cost for tagging at scale?
With gpt-4o-mini, each classification costs roughly $0.0002. Tagging 10,000 tickets per month runs about $2. Costs rise if you send full ticket transcripts instead of truncated first messages, so the 300-character truncation in Step 2 matters more than it looks.
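The arithmetic is simple enough to check for your own volume. The per-ticket figure below is the rough estimate from this answer, not a published price, and monthly_cost is a hypothetical helper.

```python
def monthly_cost(tickets_per_month, cost_per_ticket=0.0002):
    """Estimated monthly API spend in dollars at a flat per-ticket cost."""
    return tickets_per_month * cost_per_ticket

for volume in (1_000, 10_000, 50_000):
    print(f"{volume} tickets: ${monthly_cost(volume):.2f}")
```

Swap in your own measured cost per ticket once you have a month of real usage data.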
What accuracy should I expect out of the box?
A well-written prompt with clear, non-overlapping tag definitions typically hits 80-90% accuracy on the first run. The biggest gains come from reducing tag overlap and sharpening definitions, not from switching to a more expensive model.
Can this handle multilingual support tickets?
Yes. gpt-4o-mini classifies correctly in most major languages without changes to your prompt. You can write tag definitions in English, and the model reliably handles tickets in Spanish, French, German, and other widely used languages. For less common languages, run a native-language sample test before going live.
What if a ticket genuinely fits two categories?
The single-label setup in this guide picks one. If multi-label classification matters, update the system prompt to return a comma-separated list and modify your tag-writing function to apply multiple tags. Multi-label prompts produce more errors than single-label ones, so test thoroughly before deploying. See /ai-classification-for-support-teams/ for a full walkthrough.
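If you do switch to a comma-separated reply, the parsing side might look like the sketch below. It assumes the model returns tag names separated by commas; parse_multi_label is a hypothetical helper and ALLOWED is an abbreviated tag set for illustration.

```python
# Abbreviated tag set; use your full taxonomy in practice.
ALLOWED = {"billing_issue", "bug_report", "account_access", "general_inquiry"}

def parse_multi_label(raw, allowed, fallback="general_inquiry"):
    """Split a comma-separated reply into known tags, dropping unknown ones."""
    tags = []
    for part in raw.split(","):
        tag = part.strip().lower()
        if tag in allowed and tag not in tags:
            tags.append(tag)
    return tags or [fallback]

print(parse_multi_label("billing_issue, Bug_Report, refund", ALLOWED))
print(parse_multi_label("something else entirely", ALLOWED))
```

Unknown names are dropped rather than written back, and an empty result falls back to the catch-all category.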
Bottom Line
Tagging support tickets with AI is a 2-4 hour project once your taxonomy is clean and your test dataset is ready. Audit your existing tags down to 10-20 flat categories with sharp definitions. Write a classification prompt, test it against 100-200 real tickets, and hit 80-plus percent accuracy before wiring anything to production. Connect the OpenAI call to your support platform via webhook using Make or a lightweight Flask app. Monitor accuracy weekly for the first month and update the prompt whenever new ticket types emerge. At $0.0002 per ticket, even 50,000 tickets a month costs $10 in API calls, which is trivial compared to the triage time you recover. For more automation patterns that cut manual work in customer success workflows, browse the full toolkit at /category/automation/.