Querying datasets
When you model the data and create datasets, you can analyze the data using the REST API or JDBC Adapter. Metriql provides different reporting features for different use-cases and lets you interact with the data in different ways.
You can use the measure
and dimension
references of your datasets, filter the data using them and calculate different metrics depending on the report type. See the available report types:
Name | Description | Mappings (* is required) |
---|---|---|
Segmentation | Filter, drilldown and calculate the metrics within one dataset | event_timestamp |
SQL | Run SQL queries with additional Jinja syntax | - |
Funnel | Analyze customer event data and understand the user journey | user_id*, event_timestamp* |
Retention | Analyze customer event data and build cohort tables | user_id*, event_timestamp* |
tip
If you're using Metriql in a BI tool, you will most likely be executing Segmentation queries.
Here are the common properties of the available report types:
dataset
#
You can reference datasets using the ref
or source
functions in your dbt project as follows:
dataset: source('events', 'pageview')
Alternatively, you can override the dataset name using meta.metriql.name
property as follows:
models:- eventsmeta:metriql:name: events
and reference the events
model as follows:
dataset: events # alternative to ref('events')
measure
#
You can reference the measures by their name. If you're using column.meta.metriql.measure
, you need to specify the name of the measure explicity as follows:
models:- eventsmeta:metriql:measures:total_events:aggregation: countcolumns:- name: revenuemeta:metriql.measure:aggregation: sumname: total_revenue
For the events
dataset here, total_events
and total_revenue
are the available measures that you can reference in the queries.
dimension
#
Metriql automatically maps all your columns as dimensions. If the column name has non-alphanumeric characters, you should define name
of the dimension explicitly to be able to reference as follows:
columns:- name: "test column"meta:metriql.dimension:name: test_column
If you're referencing a date
, timestamp
, time
dimension, you can use the timeframes as follows:
dimension_name::month_of_year
filter
#
You can filter datasets by their dimensions and measures. There are different operators available for different types.
You can reference dimensions in the filter as shown below. You can also reference mappings with :
prefix. For example, use :user_id
in dimension if you want to filter by user_id.
filters:and:- dimension: countryoperator: equalsvalue: USA- or:- {dimension: city, operator: equals, value: San Francisco}- {dimension: city, operator: equals, value: New York}
If the value is an array, Metriql will merge the filters with OR
otherwise; use AND
. For the query above, Metriql will compile it to the following SQL expression:
WHERE country = 'USA' AND (city = 'San Franscisco' OR city = 'New York')
or, you can reference a measure in the filter:
filters:- measure: total_revenue # adds a HAVING condition to the queryvalue: value_for_operator
The filter object can either have and
and or
property that is an array or a single filter that has one of measure
, dimension
, and mapping
with operator
and value
properties.
#
OperatorFor different field types, Metriql offers different operators to filter the data.
#
For all types:is_set
and is_not_set
can be used for NULL checks. The value
must not be set.
filters: [{measure: vendor_type, operator: is_set}]
integer
, decimal
, double
, long
:#
For not_equals
, less_than
, equals
, greater_than
. The value
must be a numeric value
filters: [{measure: time_spent_on_page, operator: greater_than, value: 10}]
boolean
:#
For is
can be used for boolean equity checks. The value
can be either true
or false
filters: [{dimension: is_upgraded, operator: is, value: true}]
timestamp
:#
For equals
, less_than
, greater_than
, between
. The value
can be defined as absolute or relative values.
#
Relative valuesYou can add timestamp filters relative to the current timestamp as follows:
filters: [{dimension: created_at, operator: between, value: '1 day'}]
The filter above compiles to the following SQL:
WHEREcreated_at >= CAST(DATEADD(DAY, -1, to_date(date_trunc('day', CURRENT_TIMESTAMP)) AS TIMESTAMP))AND created_at < CAST(DATEADD(DAY, 1, to_date(date_trunc('day', CURRENT_TIMESTAMP)) AS TIMESTAMP))
Metriql filters the data from the beginning of the previous day to the end of the current day for 1 day
.
The date period can be either minute
, hour
, day
, week
, month
, or year
. Metriql also supports the plural versions such as minutes
.
#
Absolute valuesIf you want to select based on absolute date intervals, the value must be an object as follows:
filters: [{dimension: created_at, operator: between, value: {start: '2020-01-10', end: '2020-01-20'}}]
The filter above compiles to the following SQL:
created_at >= CAST('2020-01-10' AS TIMESTAMP) AND created_at < CAST('2020-01-20' AS TIMESTAMP)
tip
If you set the timezone defined in your config file, all the timestamp references are wrapped with timezone conversion function. For example, in Snowflake it's CONVERT_TIMEZONE('UTC', CAST('2020-01-20' AS TIMESTAMP))
date
:#
For equals
, less_than
, greater_than
, between
. The value
can be defined as absolute or relative values similar to timestamp type.
time
:#
For equals
, less_than
, greater_than
. The value
should be a string as follows:
filters: [{dimension: occurred_at_time, operator: equals, value: '16:00'}]
If you set the timezone defined in your config file, the time value will be shifted for your timezone.
array_string
:#
For includes
, not_includes
. You can use the filters as follows:
filters: [{dimension: items, operator: includes, value: 'array_item'}]
Metriql actually has types array_*
for all primitive types and the value can be set based on the primitive type as follows:
filters: [{dimension: array_integer_dimension, operator: includes, value: 4}]
#
Referencing fields from other datasets through relationsIf you define a relation between multiple datasets, you can also access the measures and dimensions in the target datasets.
seeds:- name: countriesmeta:metriql:measures:total_countries:aggregation: countmodels:- name: pageviewsmeta:metriql:relations:country:to: ref('countries')sql: "{TABLE}.country = {TARGET}.iso_code"type: left_joinrelationship: many_to_many
In this case, the reference countries.total_countries
from the dataset pageviews
automatically creates a join in your SQL query. Learn more about the relations here.