Skip to main content

Creating Datasets

A dataset represents a dataset in your data warehouse. Metriql automatically creates datasets from your dbt models, sources, and seeds. A dataset has three important properties:

dimension lets you drill down into the dataset, it can be used to filter query results. It can be either a column in your table or an SQL expression that processes a single row and returns a value.

measure is a field that uses an SQL aggregate function, such as count, sum, or average. Measures can be used to aggregate dimensions. For example, measures for a Sales model might include total items sold (a count), total sales price (a sum), and the average sales price (an average).

relation defines a join in between the datasets. Metriql automatically generates SQL join logic to bring in all fields from another dataset correctly then the user analyzes a specific dataset.

Create datasets from dbt metrics#

info

The feature is in beta and the API is unstable for now.

We support dbt's metrics natively. The datasets that is created via dbt metrics only have single measures and the dimensions are inherited from the base model / source / seed if you use meta.metriql in them. Please refer to dbt's metrics feature to learn more about the definitions.

Create datasets from dbt models#

You can map your columns as dimensions and measures under the meta property of columns as follows:

models:
- name: customers
columns:
- name: country_code
meta:
metriql.dimension:
type: string
- name: city
meta:
metriql.dimension:
type: string
- name: total_customers
description: total number of customers defined as count(*) in sql
meta:
metriql.measure:
aggregation: sum

In addition to column mapping, you can also create custom measures and dimensions by defining them under meta of the model:

models:
- name: customers
meta:
metriql:
measures:
total_rows:
aggregation: count
dimensions:
full_location:
sql: CONCAT({TABLE}.country, {TABLE}.city)
type: string

You can see the full list of properties that you can use under model.meta and column.meta here.

Create datasets from dbt sources#

In case you want to create models that point to tables in your database, you can make use of dbt's source properties as follows:

sources:
- name: raw_events
tables:
- name: pageview
meta:
metriql:
measures:
total_pageviews:
aggregation: count
columns:
- name: url
meta:
metriql.dimension:
type: string

If you're analyzing the time-series data, you can also define mappings so that Metriql understands your data in a better way and lets you access specific features such as funnel and retention:

models:
- name: events
meta:
metriql:
mappings:
event_timestamp: event_occurred_at
userId: user_id

Create datasets from dbt seeds#

Exposing your seeds to the end-users#

Since seeds are usually used to enrich the data, we don't create models from seeds by default. However; you can expose them to the end-users if you enable the seeds to have appeared in dbt docs:

seeds:
- name: countries
docs:
show: true
meta:
metriql:
measures:
#
columns:
- name: iso_code
meta:
metriql.dimension:
- name: user_friendly_name
meta:
metriql.dimension:
type: string
models:
- name: pageviews
meta:
metriql:
relations:
countries:
to: ref('countries')
sql: "{TABLE}.country = {TARGET}.iso_code"
type: left_join
relationship: many_to_many

Metriql automatically creates the relation from pageview to a hidden countries model so that the users can drill down by user_friendly_name when they're analyzing pageviews.

Supported tests#

We make use of your dbt test definitions in order to understand your data in a better way. Here is the list of column tests we currently support:

  • unique marks the dimension as primary key.
  • not_null hides the is not set filter to the user in the user interface.
  • accepted_values makes Metriql suggests the values to the end-user in the user interface. It disables the automatic suggestion in favor of performance.