How to predict churn without hiring an ML engineer - Data Research Analysis Collection

TL;DR

You can build a working churn prediction model in a weekend using your existing subscription data, a handful of behavioral signals, and a no-code ML tool like Obviously AI. The whole setup takes four to six hours spread across two sessions. You need a CSV export from your billing system, basic spreadsheet skills, and optionally a free account on a no-code ML platform.

What You Need Before You Start

Billing data export: 12+ months of subscription history with cancellation dates. Stripe, Chargebee, and Recurly all let you export this as CSV from their dashboard.
Product usage data: login frequency, feature adoption, and session length per user. Mixpanel, Amplitude, or even your own database logs work fine.
Google Sheets or Excel: any version from 2019 onward. You will write a few formulas, nothing complex.
Python (optional): Python 3.10+ with pandas and scikit-learn if you want to go beyond the spreadsheet. A free Google Colab account covers this at no cost.
Obviously AI account: free trial covers up to 10,000 rows. Akkio and Pecan AI are solid alternatives.
A clear definition of “churned”: decide this before you open a single file. In this guide, churned means a cancelled subscription, not just an inactive account. Write it down somewhere you will not lose it.
At least 200 historical churned customers: below that number, any model you build will be unreliable.

Step 1: Export Your Subscription and Usage Data

Go into your billing system and pull a customer-level CSV. For Stripe, navigate to Customers > Export and select a 12-month date range. Make sure the export includes: customer ID, subscription start date, subscription end date (blank if still active), plan name, and monthly recurring revenue.

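Next, export your product usage data from Mixpanel or your analytics tool. You want one row per customer with columns like: total sessions in the last 30 days, features used in the last 30 days, days since last login, and support tickets opened. For customers who have already cancelled, measure these windows relative to a date before their cancellation rather than today; otherwise every churned customer looks inactive simply because they have already left.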

Merge the two files by customer ID in Google Sheets using VLOOKUP:

=VLOOKUP(A2, UsageData!$A:$F, 2, FALSE)

Replace column index 2 with whichever column you need from the usage sheet.

You should now see a single flat table where each row is one customer and each column is either a subscription attribute or a usage metric, with no blanks in the customer ID column.
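
If you are on the optional Python route, the same merge can be sketched with pandas. This is a minimal sketch, not the only way to do it: the two small DataFrames stand in for your real CSV exports, and the column names (customer_id, sessions_30d, and so on) are assumptions you should swap for whatever your billing and analytics tools actually produce.

```python
import pandas as pd

# Hypothetical stand-ins for the two exports; in practice load them with pd.read_csv(...)
billing = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3"],
    "subscription_start": ["2023-01-10", "2023-03-02", "2023-05-20"],
    "subscription_end": ["2023-09-01", None, None],
    "plan_name": ["pro", "basic", "pro"],
    "mrr": [99, 29, 99],
})
usage = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "sessions_30d": [2, 14],
    "days_since_last_login": [41, 3],
})

# A left join keeps every billing customer, like a VLOOKUP against the usage sheet.
# validate raises an error if a customer ID is duplicated in either file.
merged = billing.merge(
    usage, on="customer_id", how="left", validate="one_to_one", indicator=True
)

# Customers with no matching usage row are marked "left_only"
missing_usage = merged.loc[merged["_merge"] == "left_only", "customer_id"].tolist()
print(f"{len(merged)} customers, missing usage data for: {missing_usage}")
```

The indicator column is worth a glance before you move on: a long list of customers without usage rows usually means the customer IDs are formatted differently in the two systems.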

Step 2: Add Your Churn Label

This is the column your model learns from. Add a column called churned and fill it with 1 for customers who cancelled and 0 for everyone still active.

In Google Sheets, if your cancellation date is in column E:

=IF(E2="", 0, 1)

If you have trial customers who never converted, treat them separately. Either exclude them entirely or create a second model for trial-to-paid conversion. Mixing trial abandonments with genuine subscription cancellations confuses the model and produces nonsense predictions.

Check your label distribution. If fewer than 10% of your rows are churned, you have a class imbalance problem. Note this down because you will come back to it when you train the model in step 5 and validate it in step 6.

You should now see a churned column of 1s and 0s, and a COUNTIF at the bottom confirming roughly how many customers fall into each bucket.
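
The same label and distribution check, sketched in pandas for anyone working in a notebook. The column name subscription_end and the sample rows are assumptions for illustration; the 10% cutoff is the one from the paragraph above.

```python
import pandas as pd

# Hypothetical sample; subscription_end is blank or missing for active customers
customers = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3", "c4"],
    "subscription_end": ["2023-09-01", None, "", None],
})

# Treat both missing values and empty strings as "still active"
has_end_date = customers["subscription_end"].fillna("").astype(str).str.strip() != ""
customers["churned"] = has_end_date.astype(int)

# Share of rows labelled churned; the mean of a 0/1 column is the churn rate
churn_rate = customers["churned"].mean()
print(customers["churned"].value_counts().to_dict())
print(f"churn rate: {churn_rate:.0%}")

if churn_rate < 0.10:
    print("Fewer than 10% churned: note the class imbalance for later steps")
```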

Step 3: Choose Your Leading Indicators

You do not need 50 features. You need 4 to 8 that actually correlate with cancellation. Based on patterns across B2B SaaS products, these tend to matter most:

days since last login (high = risky)
number of logins in last 30 days (low = risky)
number of core features used in last 30 days (low = risky)
support tickets opened in last 90 days (high can signal frustration, especially when combined with low usage)
time since onboarding completed (customers who never fully onboard churn faster)
expansion or contraction of seats or plan tier

Drop any column missing more than 20% of its values. Fill remaining gaps with the column median, not zero. Zero implies the customer had zero activity, which distorts the signal.

You should now have a clean feature set of 4 to 8 columns plus your churned label, with no blank cells anywhere in the table.
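
A short pandas sketch of the cleaning rule above: drop columns with more than 20% missing values, then fill what is left with the column median. The feature names and values here are made up for illustration.

```python
import pandas as pd

# Hypothetical feature table; None marks a missing value
features = pd.DataFrame({
    "days_since_last_login": [3, 41, None, 12, 7],
    "logins_30d": [14, 1, 6, None, 9],
    "nps_score": [None, None, 8, None, 9],  # 60% missing, so it gets dropped
    "churned": [0, 1, 0, 0, 0],
})

# Share of missing values per column
missing_share = features.isna().mean()

# Keep only columns with at most 20% missing
keep = missing_share[missing_share <= 0.20].index
cleaned = features[keep].copy()

# Fill remaining gaps with each column's median (never touch the label)
feature_cols = [c for c in cleaned.columns if c != "churned"]
cleaned[feature_cols] = cleaned[feature_cols].fillna(cleaned[feature_cols].median())

print(cleaned)
```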

Step 4: Build a Simple Risk Score in Google Sheets

Before touching any ML tool, build a manual scoring model. It forces you to think about which signals matter most and gives you a baseline to beat later.

Assign weights based on your domain knowledge. A simple additive score works fine:

=( (days_since_login/MAX_days)*10*0.4 ) + ( (30-logins_last_30d)/30*10*0.3 ) + ( (10-features_used)/10*10*0.3 )

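The formula above scales each input inline and assumes at most 30 logins and 10 core features in the window. More generally, normalize each input to a 0-10 scale first so the weights are comparable:

Normalized = (value - MIN(column)) / (MAX(column) - MIN(column)) * 10

For signals where low values mean risk (logins, features used), use 10 - Normalized so that a higher number always means higher risk.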

Bucket the final score into Low (0 to under 3), Medium (3 to under 6), and High (6 to 10). Pull a random sample of 20 customers from the High bucket and check whether those customers actually churned historically. If fewer than half did, your weights need adjusting.

You should now see a risk tier column you can sort and filter to find your most at-risk customers right now, today, before any ML model exists.
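
To make the arithmetic concrete, here is the same scoring idea as a small plain-Python sketch. It uses the 0.4 / 0.3 / 0.3 weights from the formula in this step; the four sample customers and the helper names (min_max_0_10, tier) are invented for illustration, and the tier boundaries assume a score of exactly 3 counts as Medium and exactly 6 counts as High.

```python
def min_max_0_10(values, higher_is_riskier=True):
    """Scale values to 0-10; invert when low values mean higher risk."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread, so no signal
    scaled = [(v - lo) / (hi - lo) * 10 for v in values]
    return scaled if higher_is_riskier else [10 - s for s in scaled]


def tier(score):
    """Bucket a 0-10 score into Low, Medium, or High."""
    if score < 3:
        return "Low"
    if score < 6:
        return "Medium"
    return "High"


# Hypothetical customers, one value per customer in each list
days_since_login = [2, 10, 45, 30]
logins_30d = [25, 12, 0, 3]
features_used = [8, 5, 1, 2]

risk_days = min_max_0_10(days_since_login)                      # high = risky
risk_logins = min_max_0_10(logins_30d, higher_is_riskier=False)  # low = risky
risk_features = min_max_0_10(features_used, higher_is_riskier=False)

scores = [
    0.4 * d + 0.3 * l + 0.3 * f
    for d, l, f in zip(risk_days, risk_logins, risk_features)
]
tiers = [tier(s) for s in scores]

for s, t in zip(scores, tiers):
    print(f"{s:.2f} -> {t}")
```

Because each input is scaled against the minimum and maximum in your own data, the scores are relative to your current customer base, which is what you want for a sortable priority list.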

Step 5: Train a Real Model With a No-Code ML Tool

The manual score is useful, but a trained model picks up on combinations of signals you would not think to weight manually. Obviously AI lets you upload the CSV you built in steps 1 through 3 and trains a classification model in under five minutes.

Go to Obviously AI and create a free account.
Click “New Project” and upload your customer feature CSV.
Set the Target Column to churned.
Click “Train.” The platform runs feature importance analysis and selects the best algorithm automatically.

Akkio is a strong alternative if you want a slightly different interface or need built-in Salesforce connectors. Both tools handle class imbalance internally using SMOTE or similar resampling techniques, which matters if your churn rate sits below 10%.

You should now see an accuracy metric and a feature importance chart showing which signals the model weighted most heavily. If the top feature surprises you, investigate it before trusting any outputs.
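
If you would rather take the optional Python route than a no-code tool, a training run might look roughly like the sketch below. Treat it as one reasonable starting point under stated assumptions, not what Obviously AI or Akkio do internally: the data is synthetic, the two features are placeholders for your own table, and logistic regression with class_weight="balanced" is swapped in here as a simple way to account for a low churn rate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only; replace with your own feature table
rng = np.random.default_rng(0)
n = 500
days_since_login = rng.integers(0, 60, n)
logins_30d = rng.integers(0, 30, n)

# Made-up relationship: long absence and few logins raise churn probability
logit = 0.08 * days_since_login - 0.15 * logins_30d - 1.0
churned = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([days_since_login, logins_30d])

# Hold out 25% of customers; stratify keeps the churn rate equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, churned, test_size=0.25, stratify=churned, random_state=0
)

# Scale features, then fit a classifier that reweights the rarer class
model = make_pipeline(StandardScaler(), LogisticRegression(class_weight="balanced"))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)              # hard 0/1 predictions
y_pred_proba = model.predict_proba(X_test)  # one probability column per class

print("held-out customers:", len(y_test))
```

The variable names y_test, y_pred, and y_pred_proba are chosen so they can be passed straight into the validation metrics in the next step.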

Step 6: Validate the Model Output (Do Not Skip This)

Accuracy alone is a misleading metric for churn prediction. A model that predicts “nobody churns” on a 5% churn-rate dataset is 95% accurate and completely useless.

Look at these three metrics instead:

Precision: of the customers the model flagged as likely to churn, what percentage actually did?
Recall: of the customers who actually churned, what percentage did the model catch?
AUC-ROC: a score above 0.75 is a reasonable starting point for a simple churn model.
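
A quick worked example shows why accuracy misleads here. The sketch below computes accuracy, precision, and recall from raw counts for 1,000 customers at a 5% churn rate; the second model's numbers are hypothetical and only there for contrast.

```python
def accuracy_precision_recall(tp, fp, fn, tn):
    """Compute the three metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # nothing flagged -> 0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# 1,000 customers, 50 of whom churn. This "model" predicts nobody churns.
naive = accuracy_precision_recall(tp=0, fp=0, fn=50, tn=950)

# Hypothetical model: flags 80 accounts, 30 of which really churn.
useful = accuracy_precision_recall(tp=30, fp=50, fn=20, tn=900)

for name, (acc, prec, rec) in [("naive", naive), ("useful", useful)]:
    print(f"{name}: accuracy={acc:.1%} precision={prec:.1%} recall={rec:.1%}")
```

The naive model scores higher on accuracy (95% against 93%) while catching no churners at all, which is why precision and recall are the numbers to watch.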

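If you used Python with scikit-learn, generate the report with a few lines:

# y_test: true labels; y_pred: predicted labels;
# y_pred_proba: output of model.predict_proba(X_test), one column per class
from sklearn.metrics import classification_report, roc_auc_score

print(classification_report(y_test, y_pred))
# Column 1 holds the predicted probability of the positive (churned) class
print("AUC:", roc_auc_score(y_test, y_pred_proba[:, 1]))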

For Obviously AI or Akkio, both platforms surface precision and recall in the model report tab without any code.

You should now be able to say: “this model catches X% of churners, and Y% of its churn predictions are correct.”

Step 7: Score Your Active Customers and Export a Priority List

Apply the trained model to your current customer base. Export a fresh usage CSV for all active customers, run it through the same model, and get a churn probability score for each account.

In Obviously AI, click “Predict” and upload the new file. The platform returns a CSV with a predicted_churned column and a probability score between 0 and 1.

Sort by probability descending. The top 10 to 20 accounts are where your retention effort goes first. For each flagged account, check which feature drove the score highest and use that in your outreach.

If an account scores high because of “days since last login,” a re-engagement email sequence makes sense. If the driver is “support tickets last 90 days,” escalate to a personal customer success call immediately rather than an automated touch.

You should now have a prioritised list of at-risk accounts with a concrete, data-backed talking point for each one.
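
The sort-and-match logic above is simple enough to sketch in plain Python. The rows, the top_driver field, and the PLAYBOOK mapping are all hypothetical; your export will have its own column names, and the two actions are the ones suggested in this step.

```python
# Hypothetical scored rows, shaped like a prediction export
scored = [
    {"account": "Acme", "churn_probability": 0.91, "top_driver": "days_since_last_login"},
    {"account": "Globex", "churn_probability": 0.34, "top_driver": "logins_30d"},
    {"account": "Initech", "churn_probability": 0.78, "top_driver": "support_tickets_90d"},
]

# Map each risk driver to the outreach suggested in this guide
PLAYBOOK = {
    "days_since_last_login": "re-engagement email sequence",
    "support_tickets_90d": "personal customer success call",
}


def priority_list(rows, top_n=20):
    """Return the top_n riskiest accounts, each with a suggested action."""
    ranked = sorted(rows, key=lambda r: r["churn_probability"], reverse=True)
    return [
        {**r, "action": PLAYBOOK.get(r["top_driver"], "review manually")}
        for r in ranked[:top_n]
    ]


top_accounts = priority_list(scored, top_n=2)
for row in top_accounts:
    print(row["account"], row["churn_probability"], row["action"])
```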

Step 8: Connect the Model to Your Existing Workflow

A prediction list sitting in a spreadsheet nobody checks is worthless. Connect it to wherever your team actually works.

Option A: Slack alert via Zapier
Upload the scored CSV to a Google Sheet on a schedule. Set up a Zapier Zap that triggers when a new row appears with churn_probability above 0.7 and posts a Slack message to your CS channel with the account name and the top risk factor.

Option B: Push into your CRM
Use the customer ID to match records in HubSpot or Salesforce and update a custom field called “Churn Risk Score.” Your CS team filters their pipeline view by this field and works the highest-risk accounts without opening a spreadsheet.

See our customer data platform comparison for tools that can automate this connection at scale.

You should now see new churn risk scores appearing in your CRM or Slack without anyone manually running the model each week.
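
If you ever outgrow Zapier, the Option A rule is easy to express in code. The sketch below only builds the message payloads and does not send anything; the row fields are hypothetical, and the {"text": ...} shape is the basic JSON body a Slack incoming webhook accepts.

```python
import json

THRESHOLD = 0.7  # same cutoff as the Zapier rule in Option A


def build_alerts(rows, threshold=THRESHOLD):
    """Return one Slack-style message payload per account above the threshold."""
    alerts = []
    for row in rows:
        if row["churn_probability"] > threshold:
            text = (
                f"Churn risk {row['churn_probability']:.0%}: {row['account']} "
                f"(top factor: {row['top_driver']})"
            )
            alerts.append({"text": text})
    return alerts


# Hypothetical scored rows; an account at exactly 0.70 is not "above" the cutoff
rows = [
    {"account": "Acme", "churn_probability": 0.91, "top_driver": "days_since_last_login"},
    {"account": "Initech", "churn_probability": 0.70, "top_driver": "support_tickets_90d"},
    {"account": "Globex", "churn_probability": 0.34, "top_driver": "logins_30d"},
]

alerts = build_alerts(rows)
print(json.dumps(alerts, indent=2))
```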

Step 9: Schedule a Monthly Model Refresh

Models drift. Customer behaviour changes. A model trained on last year’s data gets stale faster than you expect, especially after a major product update or a pricing change.

Set a recurring calendar block for the first Monday of each month. The refresh takes about 30 minutes:

Export a fresh 12-month customer dataset from your billing and analytics tools.
Recalculate the churn labels based on cancellations in the new date window.
Re-train the model in Obviously AI or your Python notebook.
Compare the new AUC-ROC against the previous month. If it drops more than 5 points (0.05 on the 0-to-1 scale), investigate which features changed in distribution.

You should now have a documented, repeatable refresh process that keeps your predictions accurate as your product and customer base evolve.
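
The comparison in the last refresh step can be reduced to two small checks, sketched here in plain Python. The AUC values and feature samples are invented, and comparing medians is just one simple way to spot a shifted feature, chosen here as an assumption rather than a prescribed method.

```python
from statistics import median


def auc_dropped(previous_auc, current_auc, max_drop=0.05):
    """True when AUC-ROC fell by more than max_drop (5 points by default)."""
    return (previous_auc - current_auc) > max_drop


def median_shifts(old_features, new_features):
    """Change in median per feature between two monthly snapshots."""
    return {
        name: median(new_features[name]) - median(old_features[name])
        for name in old_features
    }


# Hypothetical feature samples from two monthly exports
last_month = {"logins_30d": [12, 9, 15, 11], "days_since_last_login": [3, 8, 5, 4]}
this_month = {"logins_30d": [6, 5, 9, 7], "days_since_last_login": [4, 9, 5, 3]}

needs_review = auc_dropped(previous_auc=0.82, current_auc=0.75)
shifts = median_shifts(last_month, this_month)

if needs_review:
    # Print the features with the largest absolute shift first
    for name, shift in sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True):
        print(f"{name}: median changed by {shift:+.1f}")
```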

Common Mistakes To Avoid

Labeling inactive users as churned. Churn means cancelled subscription. A user who has not logged in for 60 days but is still paying is not churned. Polluting your labels with inactivity makes the model predict the wrong thing entirely.
Using data that is not available at prediction time. If you train on “support tickets opened the week before churn,” that feature only exists in hindsight. Use features you can observe 30 to 60 days before the expected churn date.
Ignoring class imbalance. A 5% churn rate means 95% of rows say “not churned.” Most simple models just learn to predict the majority class. Always check precision and recall, never just accuracy.
Never validating the feature importance output. If “account ID” or “customer email domain” shows up as a top predictor, something went wrong in your data prep. Feature importance should make business sense to a non-technical person.
Running the model once and forgetting it. Customer behaviour shifts constantly. Monthly refreshes are not optional; they are the difference between a useful tool and a liability.
Sharing the full scored list with your whole team without training them. A salesperson calling a customer to say “we noticed you might cancel” is damaging. Scope access to CS leads only and brief them on using the score as a conversation trigger, not a confession.

When To Level Up

This workflow handles most early-stage SaaS scenarios well. It breaks down when your customer base grows past 50,000 accounts and you need scores updated daily rather than weekly. It also struggles when you have complex multi-product behaviour or multiple cohorts with very different usage patterns that a single flat feature table cannot represent.

At that point you are looking at dedicated customer intelligence platforms like Gainsight or Totango, or a data warehouse pipeline feeding a custom XGBoost model retrained on a schedule. Both paths require either a data engineer or a dedicated RevOps hire who can own the pipeline.

For most SaaS companies under 10,000 accounts, the approach in this guide is sufficient and cheaper than any enterprise platform. When you start asking “why is my model wrong on specific customer segments” rather than “does this even work at all,” that is the signal you are ready for something heavier. Browse the data analysis tools section for comparisons of next-tier platforms and read our roundup of best no-code ML tools for small teams to see how these tools compare side by side before you commit to one.

Frequently Asked Questions

Do I need historical churn data to build a churn prediction model?
Yes. You need at least 200 examples of churned customers for the model to learn meaningful patterns. If your product is new and you have fewer cancelled accounts than that, focus on the manual risk scoring approach in step 4 until you accumulate more data.

What is a good AUC-ROC score for a churn model?
Anything above 0.75 is workable for early intervention. Above 0.85 is solid. Below 0.70, your features are probably not capturing the right signals and you should revisit which columns you included or how you defined churn.

How often should I retrain the model?
Monthly is the right default for most SaaS products. If you ship major feature changes or adjust pricing, retrain immediately after the change stabilises, usually two to four weeks post-launch.

Can I use this approach if my churn rate is below 5%?
You can, but you need to address class imbalance explicitly. Use SMOTE oversampling in Python (available in the imbalanced-learn package) or choose a no-code tool that handles it automatically. Also raise your probability threshold for flagging accounts so you are not flooding your CS team with false positives.

What if I do not have product usage data, only billing data?
Start with billing signals alone: days since signup, plan tier, MRR trend, and whether the customer has ever expanded. Billing-only models are weaker but still useful. Add usage data as soon as you can instrument it because even basic login frequency makes a meaningful difference to model accuracy.

Bottom Line

Predicting churn without an ML engineer is genuinely achievable for any SaaS founder with 12 months of billing data and a few hours to invest. You export your data, build a clean feature table, validate a manual scoring formula, train a model with a no-code tool, and pipe the outputs into your existing CS workflow. The whole thing runs on free or near-free tools until you scale past a point where the complexity justifies a bigger investment. Revisit and retrain monthly, watch your precision and recall metrics, and keep your CS team focused on the highest-probability accounts with a specific reason to reach out. That loop alone catches most preventable churn before it hits your MRR. For your next step, explore the data analysis tools section to find platforms that can automate more of this pipeline as your team grows.