Data schema

Attribution collects a number of data points of which can later be exported. Data scientists, analysts and developers can use the schema below to design custom reports on top of this data.

Attribution exports provide the same data in the same format as used in your Attribution Dashboard to build all available reports, this data is in a raw form (as collected) and does not have any attribution model applied.

It is not possible to export data with any attribution model applied because your model is dependent on many factors you select on the dashboard, for example your current filters, date range, and model type etc. Also note that exports do not contain any pre-build reports.

The graph below displays the break down of Attribution's raw data. Some tables contain normalised data, which means the same data could be present in multiple tables but in different formats. For example the params table stores parsed URL parameters originally stored in events.uri column.

Interactive Data Schema

Events

The events table contains all events and pageview data generated by the track() and page() snippet calls and also any server-side sent events as well. This table does not contain the raw form of event properties instead they are stored in properties table.

Column Name	Type	Description
`id`	`BIGINT`	The Unique ID of an associated pageview or track event. Primary key for this table and is referenced from other tables.
`name`	`VARCHAR`	The event name. For pageview's `Loaded a Page` or `Viewed * Page` is used depending how you fire the `page()` snippet method.
`ip`	`VARCHAR`	IP address of visitor.
`created_at`	`TIMESTAMP`	Timestamp when event was written into Attribution database.
`user_id`	`BIGINT`	References `users.id`. This is internal user ID generated by Attribution, not the USER_ID used for tracking users.
`browser_id`	`BIGINT`	References `browsers.id`.
`project_id`	`BIGINT`	Internal ID of your project.
`time`	`TIMESTAMP`	Timestamp when event was fired.
`referring_url`	`VARCHAR`	HTTP Referer URL of the pageview.
`referring_host`	`VARCHAR`	Hostname part of `referring_url`.
`revenue`	`BIGINT`	Revenue property value in cents.
`visitor_id`	`BIGINT`	Internal ID of visitor. References `visitors.id`.
`updated_at`	`TIMESTAMP`	Timestamp when event was last updated in database.
`uri`	`VARCHAR`	Pageview URL at capture time.
`uri_path`	`VARCHAR`	URL path of `uri`.
`self_referral`	`BOOLEAN`	`TRUE` if `referring_host` matches `uri` hostname, `FALSE` otherwise.
`message_id`	`VARCHAR`	Unique event ID generated by library which sent the event.
`source`	`VARCHAR`	Name of the source where event was captured from.
`type`	`VARCHAR`	`p` for pageview events, e.g. `page()` snippet command, `t` for custom events sent by `track()` snippet command.

Visits (sessions)

The visits table has variable names and depends on our internal ID for your project, if your internal project ID is 1234 - you'd have visits_1234 table in your data export. The visits table contains all the information you see as Visits in Dashboard.

Visits are touchpoints, usually a visit is the first pageview in the user session (learn more about what a visit is), we don't call it sessions because session are a group of pageviews and events, while visit are just the first pageview, which contains information about the source of the session (it could be URL parameters, referring website or combination of both). The source of the visit (and the whole session following it) is called a Filter in Attribution system.

🚧
Heads up!
This table contains the most value information needed to build custom reports on top of Attribution's data.
This table is NOT incrementally exportable as most of the other tables, this means you will have to drop the existing visits table and load in full with every export. The reason for that is that visits depend on filters and filters are often added/updated/removed, especially by integration and your marketing team, which means that the visits table is re-created based on information from events and filters table.
In other words visits are the result of applying filters to events and finding all the matches, for example if you have a filter which has a rule "URL Paramter utm_source is partner" than any pageview which has "utmsource=partner" in it's URL parameters will be written into_visits table.

Column Name	Type	Description
`id`	`BIGINT`	References `events.id`, unique for visits table. Can be used as Primary Key.
`visitor_id`	`BIGINT`	References `events.visitor_id` and `visitors.id`.
`visit_time`	`TIMESTAMP`	References `events.time`.
`filter`	`BIGINT`	References `xNNNN_filters.id` where `xNNNN_filters.type` is `filter`. Replace `NNNN` with your internal Project ID, e.g. 1234.
`company_id`	`BIGINT`	If the number is positive it matches `visitor_id` from above, if the number is negative it references `company.id`. If visitor belongs to a company (has `company_id` trait) this number would be negative, you need to apply modulus to reference `company.id`.
`visit_type`	`VARCHAR`	`v` is for regular visit (pageview); `i` is for influence touchpoint used for TV Attribution.
`user_id`	`BIGINT`	References `events.user_id` `and users.id`. Can be NULL.
`original_created_at`	`TIMESTAMP`	Matches `visitor.original_created_at`. This column contains value of createdAt trait that if passed to `identify()` call. Can be NULL.
`path`	`VARCHAR`	Matches `events.uri_path` but with the exception that NULL indicates for no path or `/`. Can be NULL.

Most of the columns are excessive in the visits table but they are added to help optimize complex queries, you can skip loading selected columns and just load the two most important -id and filter. The rest of the columns can be fetched by joining events and other linked tables.

Filters

The filters table has variable name which depends on our internal ID for your project, if your internal project ID is 1234 - you'd have x1234_filters_v2 table in your data export. The filters table contains all the information you see as Filters and Channels in Dashboard.

The dashboard has tree-like structure called a "filter tree". The Filters table contains information about both channels (type=group) and filters (type=filter) from the Dashboard, since handling tree-like structures in SQL could be tricky we tried to flatten everything into filters table for easier use.

When joining this table with visits make sure you only join WHERE "type"='filter'. You will only get filter level reporting, if you want to have channel level reporting you need to sum all filter metrics by it's channel (use top_parent_id, parent_id or channel_N_id to group).

🚧
Important notes
This table is NOT incrementally exportable, because filters are added/updated/removed often by automated integrations or your marketing team - it's hard to keep track of all the changes, so it's easier just to truncate and load this table again. Also the original order of the channels and filters as in Dashboard is not preserved here.
Primary Key of this table is composite (id, type), and technically you may have two rows with the same id which are not duplicates.
This table doesn't include "Groups" like "Paid Traffic" and others manually created.

Column Name	Type	Description
`id`	`BIGINT`	ID of filter or channel. Not unique.
`name`	`VARCHAR`	Name of the filter or channel.
`type`	`VARCHAR`	Could be `filter` (Filter) or `group` (Channel).
`label`	`VARCHAR`	Attribution or integration qualification of filter/channel, could take different values to help identify what this entity is about. Can be NULL.
`integration`	`VARCHAR`	Contains technical identifier of the integration, e.g. `bing` is for "Microsoft Ads" and etc.
`parent_id`	`BIGINT`	References `id` where `type` is `group`. ID of the parent channel to which this filter/channel belongs to.
`top_parent_id`	`BIGINT`	References `id` where `type` is `group`. ID of the top parent channel to which this filter/channel belongs to.
`ordinal`	`INTEGER`	Position of the filter/channel inside its parent group.
`level`	`INTEGER`	Nesting distance from the root level.
`sort_index`	`INTEGER`	Global position index of the filter/channel inside whole filter tree. Use in `ORDER BY` to get the same order of groups and filters as in dashboard.
`path`	`VARCHAR`	JSON encoded Array of the full path (top to bottom) of channels to which filter or channel belongs to.
`channel_N_name`	`VARCHAR`	Channel name on `N` level of the tree relative to current filter/channel.
`channel_N_id`	`BIGINT`	Channel ID on `N` level of the tree relative to current filter/channel.
`path_level_N`	`VARCHAR`	Path name on level `N` is represented as a string with channel names joined by the `>>` (space, right arrow, right arrow, space) string as separator. This value does not include the filter name itself.

Note:

📘
The columns channel_N_name, channel_N_id, and path_level_N represent data for each depth level N in your filter tree. Only the first 5 levels are exported as individual columns. If your filter tree contains more than five levels of nesting, use the path column instead — it captures the full hierarchy of parent filter names.

To recreate your Attribution dashboard from exported filters table, please run this SQL query:

SELECT
    repeat('▷ ', level) || name,
    *
FROM
    xNNN_filters_v2
ORDER BY
    sort_index;

Filters (legacy, pre-2024)

Currently Attribution exports the filters table in the new format. The legacy format is effective for projects who set up data export before 2024.

The filters table has variable name which depends on our internal ID for your project, if your internal project ID is 1234 - you'd have x1234_filters table in your data export. The filters table contains all the information you see as Filters and Channels in Dashboard.

🚧
Important notes
This table is NOT incrementally exportable, because filters are added/updated/removed often by automated integrations or your marketing team - it's hard to keep track of all the changes, so it's easier just to truncate and load this table again. Also the original order of the channels and filters as in Dashboard is not preserved here.
Primary Key of this table is composite (id, type), and technically you may have two rows with the same id which are not duplicates.

Column Name	Type	Description
`id`	`BIGINT`	ID of fliter of channel. Not unique.
`parent_group_id`	`BIGINT`	References `id` where `type` is `group`. ID of the parent channel to which this channel or filter belongs to.
`top_parent_group_id`	`BIGINT`	References `id` where `type` is `group`. ID of the TOP parent channel to which this channel or filter belongs to.
`type`	`VARCHAR`	Could be `filter` (Filter) or `group` (Channel).
`name`	`VARCHAR`	Name of the filter or channel.
`label`	`VARCHAR`	Attribution or integration qualification of filter or channel, could take different values to help identify what this entity is about. Can be NULL.
`integration`	`VARCHAR`	Contains technical identifier of the integration, e.g. `bing` is for "Microsoft Ads" and etc.
`path`	`VARCHAR`	JSON encoded Array of the full path (top to bottom) of channels to which filter or channel belongs to.

Visitors

The visitor table is populated every time a new visitor is tracked or identified, this table contains information about both identified and anonymous visitors.

If a visitor comes to your website as anonymous and later identifies themselves Attribution will keep two records - first the anonymous visitor and the second migrated_to visitor.
Important note this table does not contain USER_ID which you pass in with your identify() call, and is instead stored in the users table.

Column Name	Type	Description
`id`	`BIGINT`	Unique ID. The primary key for this table, it is referenced from other tables.
`user_id`	`BIGINT`	References `users.id`.
`browser_id`	`BIGINT`	References `browsers.id`.
`project_id`	`BIGINT`	Internal ID of your project.
`updated_at`	`TIMESTAMP`	Timestamp when visitor was last updated.
`ip`	`VARCHAR`	Visitor IP address.
`traits`	`VARCHAR`	JSON hash of visitor traits.
`email`	`VARCHAR`	Email extracted from `traits`.
`company_id`	`BIGINT`	References `companies.id`. Company to which visitor belongs to.
`migrated_to`	`BIGINT`	References `id` if this visitor was aliased or merged with another visitor.
`original_created_at`	`TIMESTAMP`	The time when visitor or user was created/registered in your database, `created_at` is extracted from `traits`.

Users

This table primarily contains identifier which is USER_ID passed by you in identify() and track() calls. These should be users that have been identified by your tracking system (Segment, Shopify, Heap, Attribution manual methods)

Column Name	Type	Description
`id`	`BIGINT`	Unique ID. The primary key for this table, is referenced from other tables.
`identifier`	`VARCHAR`	USER_ID you passed in `identify()` call or `user_id` used in `track()`. This is your user id which you pass to our system.
`created_at`	`TIMESTAMP`	Timestamp when user was written into Attribution database first time.
`project_id`	`BIGINT`	Internal ID of your project.
`updated_at`	`TIMESTAMP`	Timestamp when visitor was last updated.
`original_created_at`	`TIMESTAMP`	DEPRECATED. Use `visitor.original_created_at` instead.

Amounts

This table contains information about your integrations and manually entered spend amounts. Each row contains information about spend amount per single filter per date.

Column Name	Type	Description
`id`	`BIGINT`	Unique ID. The primary key for this table.
`value`	`BIGINT`	Spent amount in cents.
`created_at`	`TIMESTAMP`	Timestamp when user was written into Attribution database first time.
`filter_id`	`BIGINT`	References `visits.filter` and `xNNNN_filters.id AND xNNNN_filters.type = 'filter'` tables.
`source`	`VARCHAR`	Integration name if this amount is automatically pulled.
`date`	`DATE`	Date for which spend amount if collected.
`updated_at`	`TIMESTAMP`	Timestamp when visitor was last updated.
`project_id`	`BIGINT`	Internal ID of your project.
`amount_range_id`	`BIGINT`	Internal use only. Reference amount ranged set if this spend is part of it.
`deleted`	`BOOLEAN`	If this spend was deleted. When you query or join this table make sure to always add `deleted = FALSE` condition.
`original_value`	`BIGINT`	If currency conversion was applied it contains original value in cents.
`original_currency`	`VARCHAR`	If currency conversion was applied it contains ISO code of the initial currency.
`conversion_rate`	`DECIMAL(18, 6)`	Currency conversion rate applied at the moment of conversion.
`currency_converted_at`	`TIMESTAMP`	Timestamp when currency was converted.

Params

This table represents URL parameters found in pageviews URLs. Each row contains one pair of normalized and parsed URL parameters from events.uri field. If you have pageview event in events table with uri like https://example.com/?utm_source=partner&utm_campaign=AWESOME then it will result in two records being added to this table. First record would have key = utm_source and value = partner, second would have key = utm_campaign and value = awesome. Note that both key and value would be downcased and trimmed to 32 and 128 characters accordingly. If you need the original values please use events.uri instead.

Column Name	Type	Description
`key`	`VARCHAR`	Parameter key.
`value`	`VARCHAR`	Parameter value.
`project_id`	`BIGINT`	Internal ID of your project.
`time`	`TIMESTAMP`	Timestamp when event was fired. Matches `events.time`.
`event_id`	`BIGINT`	References `events.id`.
`updated_at`	`TIMESTAMP`	Timestamp when parameter was last updated.
`id`	`BIGINT`	Unique ID. The primary key for this table.

Properties

This table represents custom event properties passed with track() call. Each row contains one pair of properties. If you have called track('Custom Event', { plan: 'Starter', revenue: '$50.00' }) if will create one record in events table and two records in properties table. First record would have key = plan and value = Starter, second would have key = revenue and value = $50.00. Note that both key and value would be sorted as they been passed, no transformations are applied.

Column Name	Type	Description
`id`	`BIGINT`	Unique ID. The primary key for this table.
`key`	`VARCHAR`	Property key.
`value`	`VARCHAR`	Property value.
`project_id`	`BIGINT`	Internal ID of your project.
`event_id`	`BIGINT`	References `events.id`.
`updated_at`	`TIMESTAMP`	Timestamp when parameter was last updated.

Companies

This table contains information about visitor (user) company. Visitor could be assigned to a company by defining company_id and company_name traits when calling identify() call.

Column Name	Type	Description
`id`	`BIGINT`	Unique ID. The primary key for this table.
`project_id`	`BIGINT`	Internal ID of your project.
`identifier`	`VARCHAR`	`company_id` you pass in traits of `identify()` call. This is your company id which you pass to our system.
`updated_at`	`TIMESTAMP`	Timestamp when parameter was last updated.
`name`	`VARCHAR`	`company_name` passed in traits of `identify()` call.
`traits`	`VARCHAR`	JSON hash of company traits.

Browsers

This table contains information about visitor browser (user_agent) and Anonymous ID (cookie_id). When Visitor first comes to website Attribution Snippet generates unique Anonymous ID this identificator is stored in browser LocalStorage (cookies) and passed to Attribution tracking endpoint for all snippet calls (identify(), track(), alias()).

Column Name	Type	Description
`id`	`BIGINT`	Unique ID. The primary key for this table.
`cookie_id`	`VARCHAR`	Also known as Anonymous ID, this is the exact value of cookie stored in visitor browser Local Storage under `_attrb` key. Calling `Attribution.user().anonymousId();` in JavaScript will return you this value.
`created_at`	`TIMESTAMP`	Timestamp when browser information was written into Attribution database first time.
`user_agent`	`VARCHAR`	Exact string of brwoser User-Agent header. It can be used to detect platform and device visitor is using.
`updated_at`	`TIMESTAMP`	Timestamp when parameter was last updated.
`project_id`	`BIGINT`	Internal ID of your project.

If you have any questions on this please feel free to contact [email protected]