dbt for analysts (no engineering background required) — 2026 guide
most analytics tutorials about dbt assume you are already a data engineer. they jump into Jinja templating, macros, and CI/CD pipelines on page two and lose any analyst who just wants to clean some data. the irony is that dbt was originally designed for analysts. it is the tool that gives you SQL-based transformation power without forcing you to become a software engineer.
if you write SQL (even just SELECT statements) and you have ever found yourself maintaining a tangled chain of saved queries, materialized views, or staged tables that nobody else can read, dbt is the tool that fixes that for you. it turns SQL transformations into versioned, tested, documented models that anyone on your team (including future you) can understand.
this tutorial is for analysts, solopreneurs, and small-team data people who know SQL well enough to write a JOIN but have never touched dbt. you will set up a dbt project from scratch, build your first models, write tests, and deploy a small pipeline. by the end you will have a working dbt project and the muscle memory to refactor your spaghetti SQL into a documented graph. this is for the analyst who is ready to graduate from “queries scattered across tools” to “modular, tested, documented data”.
what dbt actually is
dbt (data build tool) is a SQL-based transformation framework. you write SELECT statements (called “models”) and dbt handles the orchestration: which models depend on which, what order to run them in, and how to materialize the results.
for analysts in 2026, it is the bridge between writing one-off queries and building maintainable data models, without learning Python, Spark, or full software engineering. dbt Core is open source and free; dbt Cloud has a free tier with one developer seat.
dbt does not move data. it transforms data already loaded into a warehouse like Snowflake, BigQuery, Redshift, Postgres, or DuckDB.
what dbt is great at
- turning messy ad-hoc SQL into modular, reusable models
- documenting data transformations so others (and future you) can read them
- testing data quality automatically (uniqueness, nulls, referential integrity)
- versioning transformations through git
- running incremental models to speed up large data updates
- generating data lineage diagrams
what dbt is not for
- moving data from source systems into a warehouse (use Fivetran, Stitch, Airbyte, or custom ELT)
- visualization (use Tableau, Looker Studio, or Power BI)
- machine learning training (though dbt-ml is starting to bridge this)
- streaming/real-time pipelines
dbt fits in the T of ELT (Extract-Load-Transform). it lives between your raw warehouse data and your BI dashboards.
prerequisites
- comfort writing SELECT statements with JOINs and aggregations
- a SQL-accessible data warehouse (BigQuery, Snowflake, Redshift, Postgres, DuckDB)
- Python installed (for dbt Core) or use dbt Cloud (browser-based)
- git basics (commit, push, pull)
- 2 to 3 hours for first end-to-end setup
if you do not have a warehouse, the easiest free path is DuckDB running locally. from dbt's point of view it behaves like a warehouse, but it stores everything in a single file on your machine.
step 1: choose dbt Core or dbt Cloud
| feature | dbt Core | dbt Cloud (free tier) |
|---|---|---|
| cost | free | free for 1 developer |
| install | pip install dbt-core | browser-based, no install |
| IDE | bring your own (VS Code, etc.) | built-in IDE |
| scheduling | bring your own (cron, Airflow) | built-in scheduler |
| version control | bring your own git | git built-in |
| documentation site | yes | yes |
for solopreneurs and individual analysts in 2026, the dbt Cloud free tier is usually the easier start: one developer seat, scheduled jobs, and a built-in IDE at no cost.
for this tutorial I will use commands that work with either; small differences noted as we go.
step 2: set up your warehouse
if you have a real warehouse (Snowflake, BigQuery, Redshift, Postgres), use it. if not, set up DuckDB:
- install DuckDB: pip install duckdb
- create a directory for your project.
- the dbt-duckdb adapter handles the rest once we configure the project.
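to have something to model, you also need raw tables inside the DuckDB file. a minimal sketch using the DuckDB CLI (the dev.duckdb file name and the CSV names are just examples, not part of any tutorial dataset):

-- run inside: duckdb dev.duckdb
CREATE SCHEMA IF NOT EXISTS raw;
CREATE TABLE raw.customers AS SELECT * FROM read_csv_auto('customers.csv');
CREATE TABLE raw.orders AS SELECT * FROM read_csv_auto('orders.csv');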
[SCREENSHOT: warehouse selection screen during dbt project setup]
step 3: install dbt and create a project
dbt Core path
- install dbt with the right adapter for your warehouse: pip install dbt-duckdb (use dbt-snowflake, dbt-bigquery, dbt-postgres, etc. for other warehouses)
- initialize a project: dbt init my_first_project. dbt asks for connection details; fill them in.
- cd into the project: cd my_first_project
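dbt init writes your connection details into profiles.yml (by default in ~/.dbt/). for DuckDB the profile is tiny; a sketch, assuming the project name my_first_project and a local dev.duckdb file:

my_first_project:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: dev.duckdb
      threads: 1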
dbt Cloud path
- sign up at cloud.getdbt.com.
- connect a git repo (GitHub or GitLab).
- connect your warehouse.
- dbt Cloud creates the project structure for you.
step 4: understand the dbt project structure
a fresh dbt project has these directories:
| directory | purpose |
|---|---|
| models/ | your SQL models (the SELECT statements) |
| tests/ | custom data tests |
| macros/ | reusable SQL snippets (Jinja-based) |
| seeds/ | small CSV files loaded as tables |
| snapshots/ | slowly-changing-dimension snapshots |
| analyses/ | one-off analytical queries |
| dbt_project.yml | project config |
| profiles.yml | warehouse connection (often kept outside the repo) |
for your first project, focus on models/ and dbt_project.yml. ignore the rest until you need it.
[SCREENSHOT: dbt project directory structure in VS Code or dbt Cloud IDE]
step 5: write your first model
models are SQL files that contain a single SELECT statement.
create models/staging/stg_customers.sql:
{{ config(materialized='view') }}
SELECT
customer_id,
TRIM(LOWER(email)) AS email,
first_name,
last_name,
created_at
FROM {{ source('raw', 'customers') }}
WHERE customer_id IS NOT NULL
a few things to notice:
- {{ config(materialized='view') }} tells dbt to create this as a view (other options: table, incremental, ephemeral)
- {{ source('raw', 'customers') }} is dbt's reference syntax; it points to a source table you have declared elsewhere
- the SELECT is just SQL, with light cleanup (trimming, lowercasing email)
create another model that depends on this one. models/marts/dim_customers.sql:
{{ config(materialized='table') }}
SELECT
c.customer_id,
c.email,
c.first_name || ' ' || c.last_name AS full_name,
c.created_at,
COUNT(o.order_id) AS lifetime_orders,
SUM(o.total) AS lifetime_revenue
FROM {{ ref('stg_customers') }} c
LEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_id
GROUP BY 1, 2, 3, 4
{{ ref('stg_customers') }} is dbt’s reference between models. dbt understands the dependency: dim_customers depends on stg_customers and stg_orders.
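the stg_orders model referenced above is not shown in this tutorial; a minimal sketch saved as models/staging/stg_orders.sql could look like this (column names such as order_date and total are assumptions about the raw orders table):

{{ config(materialized='view') }}
SELECT
    order_id,
    customer_id,
    order_date,
    total
FROM {{ source('raw', 'orders') }}
WHERE order_id IS NOT NULL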
[SCREENSHOT: dbt Cloud IDE showing two model files side by side]
step 6: declare your sources
create models/staging/_sources.yml:
version: 2

sources:
  - name: raw
    schema: raw
    tables:
      - name: customers
      - name: orders
this tells dbt that “raw.customers” and “raw.orders” are external source tables your models depend on.
step 7: run your models
dbt run
dbt parses every model, figures out the dependency order, and runs them.
output looks like:
Running with dbt=1.7.0
Found 2 models, 0 tests, ...
1 of 2 START sql view model main.stg_customers ............ [RUN]
1 of 2 OK created sql view model main.stg_customers ....... [OK in 0.4s]
2 of 2 START sql table model main.dim_customers ........... [RUN]
2 of 2 OK created sql table model main.dim_customers ...... [OK in 0.8s]
Completed successfully
each model is now materialized in your warehouse as a view or table.
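you can also run a subset of the graph with dbt's selection syntax; for example, this rebuilds stg_customers and everything downstream of it:

dbt run --select stg_customers+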
step 8: write your first test
dbt has built-in tests for the most common data quality issues.
add a test schema file at models/staging/_models.yml:
version: 2

models:
  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - not_null
run the tests:
dbt test
dbt runs each test against the materialized model. failures are flagged.
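as with dbt run, you can scope tests to a single model:

dbt test --select stg_customers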
[SCREENSHOT: dbt test output showing pass/fail status]
built-in tests:
- unique: column has no duplicates
- not_null: column has no nulls
- accepted_values: column values match a declared list
- relationships: foreign key values exist in another table
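the last two are declared in the same YAML schema files; a sketch for a hypothetical status column on stg_orders (the accepted values are assumptions):

models:
  - name: stg_orders
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('stg_customers')
              field: customer_id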
custom tests can be written as SQL.
step 9: build the documentation site
dbt docs generate
dbt docs serve
dbt generates a static documentation site that includes:
- every model with its description
- every column with its description
- the lineage graph (a visual DAG of dependencies)
- the SQL behind each model
- the tests configured
[SCREENSHOT: dbt docs site showing lineage graph with model nodes and edges]
this is one of dbt’s killer features. the docs site auto-updates as your models evolve.
step 10: add descriptions
descriptions are added in the YAML schema files:
models:
  - name: dim_customers
    description: "One row per customer with lifetime metrics."
    columns:
      - name: customer_id
        description: "Primary key. Source: raw.customers.customer_id."
      - name: full_name
        description: "Concatenated first and last name."
      - name: lifetime_revenue
        description: "Sum of all order totals for this customer."
descriptions show up in dbt docs serve and in any BI tool that reads dbt metadata.
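for longer descriptions, dbt also supports doc blocks: markdown files in models/ containing {% docs %} blocks that YAML descriptions can reference. a sketch (the block name is arbitrary):

{% docs customer_id %}
Primary key for customers. One row per customer; sourced from raw.customers.
{% enddocs %}

then in the YAML: description: '{{ doc("customer_id") }}'.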
step 11: schedule the pipeline
dbt Core path
set up a cron job or use Airflow, GitHub Actions, or Dagster.
example GitHub Actions workflow that runs daily:
on:
  schedule:
    - cron: '0 6 * * *'

jobs:
  dbt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install dbt-duckdb
      - run: dbt run
      - run: dbt test
dbt Cloud path
- in dbt Cloud, go to Deploy → Jobs.
- create a job: name it “Daily refresh”.
- set the schedule (e.g., daily at 6am).
- set commands: dbt run, then dbt test.
dbt Cloud handles the rest.
comparing dbt to alternatives
| tool | cost | learning curve | best for |
|---|---|---|---|
| dbt Core | free | medium-high | SQL-comfortable, custom infra |
| dbt Cloud | free tier (1 dev) | medium | analysts wanting managed |
| Tableau Prep | $75/month | medium | non-SQL Tableau users |
| Power Query | free with Excel/Power BI | medium | non-SQL Excel users |
| Coalesce | $$$ | medium | enterprise dbt alternative |
| SQLMesh | free | high | dbt’s main 2026 competitor |
dbt is the right choice if you write SQL and want a maintainable, tested transformation layer. for non-SQL alternatives, see Power Query in Excel tutorial 2026 and tableau prep builder tutorial.
common mistakes
1. one giant model with 200 lines of SQL
dbt’s value is modularity. break long queries into staging models, intermediate models, and marts. typical pattern: stg_X (cleaned source), int_X (joined intermediate), mart_X (analysis-ready).
2. ignoring tests
models without tests are not trustworthy. at minimum, add unique and not_null to every primary key.
3. hardcoding table references
never write FROM raw.customers directly. always use {{ source('raw', 'customers') }} so dbt understands lineage.
4. skipping descriptions
undocumented models become unmaintainable in months. write at least a one-sentence description for every model and primary key.
5. running everything every time
for large tables, use materialized='incremental' so dbt only processes new rows.
the staging-intermediate-mart pattern (industry standard)
most well-organized dbt projects follow a three-layer pattern.
staging models
one staging model per source table, in models/staging/. minimal transformation: rename columns to consistent conventions, cast types, basic filtering. naming: stg_<source>__<table>.sql.
example: stg_stripe__customers.sql, stg_stripe__charges.sql, stg_app__users.sql.
staging is your firewall against upstream schema changes. when a source column gets renamed, you update one staging model, and downstream models keep working.
intermediate models
joins and aggregations that are reused across multiple marts, in models/intermediate/. naming: int_<concept>.sql.
example: int_customer_orders.sql joining stg_customers with stg_orders, used by both dim_customers and fct_orders downstream.
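a sketch of what that intermediate model could look like (column names carried over from the staging sketches earlier):

-- models/intermediate/int_customer_orders.sql
SELECT
    c.customer_id,
    o.order_id,
    o.order_date,
    o.total
FROM {{ ref('stg_customers') }} c
LEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_id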
mart models
the final analysis-ready tables, organized by business domain, in models/marts/<domain>/. naming: dim_* for dimensions, fct_* for facts.
example: marts/sales/dim_customers.sql, marts/sales/fct_orders.sql.
this layering keeps your project navigable as it grows past 20 to 30 models.
materialization strategies
dbt supports four main materializations:
| materialization | how it builds | best for |
|---|---|---|
| view | recomputed on every query | small tables, fast iteration |
| table | rebuilt fully on each dbt run | medium tables, frequent reads |
| incremental | only new/changed rows on each run | large tables, append-only data |
| ephemeral | inlined into other models, not materialized | reusable logic, no need to query alone |
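you can set these per model with config() as shown earlier, or as folder-level defaults in dbt_project.yml. a sketch assuming the staging/marts layout and the project name my_first_project:

models:
  my_first_project:
    staging:
      +materialized: view
    marts:
      +materialized: table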
incremental models are the key to scaling dbt to large data. example:
{{ config(materialized='incremental', unique_key='order_id') }}
SELECT * FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE order_date > (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
the is_incremental() block runs only on subsequent runs, processing only the newer rows.
tests beyond the built-in four
dbt’s built-in tests (unique, not_null, accepted_values, relationships) cover most cases. for the rest, write custom tests.
singular tests
a SQL file in tests/ that returns rows when something is wrong. example: tests/no_negative_revenue.sql:
SELECT * FROM {{ ref('fct_orders') }} WHERE revenue < 0
if any rows return, the test fails.
generic tests (parameterized)
write reusable tests as macros in tests/generic/. example: a test that ensures a column is always a positive number.
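a sketch of that positive-number test as a generic test (the test name is arbitrary). save it as tests/generic/is_positive.sql, then list is_positive under any column's tests in YAML:

{% test is_positive(model, column_name) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} <= 0
{% endtest %}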
dbt_utils tests
the dbt_utils package provides additional tests like equal_rowcount, at_least_one, expression_is_true. install via packages.yml.
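installing dbt_utils is two steps: declare it in packages.yml at the project root, then run dbt deps. a sketch (pin the version to whatever release is current; the one below is only an example):

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1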
connecting dbt to your wider stack
dbt is the transformation layer. before and after:
- before dbt (loading data into warehouse): Fivetran, Stitch, Airbyte
- after dbt (visualization): Looker Studio complete tutorial 2026, tableau prep builder tutorial
- alternative for non-SQL users: Power Query in Excel tutorial 2026
- the warehouse layer underneath dbt: Supabase for data analysis 2026, PostgreSQL for analysts
- broader analytics stack context: best AI tools for data analysis 2026, best free data analysis tools 2026
a typical small-team modern stack: source apps → Fivetran → Snowflake/BigQuery → dbt → Looker Studio. dbt sits in the middle, turning raw into analysis-ready.
conclusion
dbt is the tool that takes analysts from “writing one-off SQL” to “maintaining real data models”. the leap is intellectual more than technical: you stop thinking about queries and start thinking about a graph of related, tested, documented transformations.
the 11 steps above are the path to your first working dbt project. the SQL is familiar. the dbt-specific concepts (ref, source, materialization, tests) are small and learnable in a couple of hours. the docs site and lineage graph become genuinely useful by week two.
start with a small project. take the most-used set of saved queries you have and rebuild them as 3 to 5 dbt models. add tests on the primary keys. write descriptions. run dbt docs generate. share the docs link with someone. that single exercise will make you understand why teams that adopt dbt almost never go back to scattered SQL files.
if you write SQL daily and your queries have outgrown a single file, dbt is the next tool to learn in 2026. set up dbt Cloud free this week and build your first project. the analytical clarity gain is real and immediate.