Snowflake

📘
This article is for data engineers
This guide describes the steps required to give Attribution access to your Snowflake database so that it can run the ETL process.

Before you continue with this guide, please make sure that you have completed the Data Export for Amazon S3, Azure Blob Storage, or Google Cloud Storage, since it is a requirement for the Snowflake ETL.

Create Database/User/Role for Attribution

Review and run the following SQL commands:

CREATE DATABASE attribution;
CREATE ROLE attribution_etl_role;
CREATE USER attribution;
GRANT ROLE attribution_etl_role TO USER attribution;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE attribution_etl_role; -- 👈 Replace "COMPUTE_WH" with warehouse name if needed
GRANT OWNERSHIP ON DATABASE attribution TO ROLE attribution_etl_role;
GRANT OWNERSHIP ON SCHEMA attribution.public TO ROLE attribution_etl_role;
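
To confirm that the commands above took effect, the grants can be inspected with SHOW GRANTS. This is a minimal sketch, assuming the role and user names were used unchanged:

```sql
-- Privileges held by the role: expect USAGE on the warehouse and
-- OWNERSHIP on the ATTRIBUTION database and its PUBLIC schema.
SHOW GRANTS TO ROLE attribution_etl_role;

-- Roles granted to the user: expect ATTRIBUTION_ETL_ROLE.
SHOW GRANTS TO USER attribution;
```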

Configure Attribution's Public Key

Attribution uses key-pair authentication to connect to your Snowflake instance. Run the following SQL command to assign Attribution's public key to the Snowflake user:

ALTER USER attribution SET RSA_PUBLIC_KEY='MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAvYccwUpgH/fhB5olKgLoBX6YkEp2k7TirVMxZsaAfJRyEJz/J2VZIsW6AnvoZMir1uoo3O1piLC546Pbs6kjMVCV/vUpUQDmXesKXTKrZnHzZW4d/N6UZ2o2jdYfWOmfz+4N9f3pfAvgSdkH+UXMCAfi4TlSyNXp66tLHZ2tN2PIDabXGIQMEAUpbwgVF5N0QfheRol8THnUrJuEdw2smiEXLDqYKo9nCA66df5vggzVi3SLCH4+yiGRVPASD3pp+7Q2GBKXRdUDDPPiqLKAIPJJRNlKO1fNpZ4j1tUdU2J5INmVFjJG14bcpU6gS7rl5R83wJHutmhNhFHNVt6LwwIDAQAB';
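
To check that the key was stored, you can describe the user. This is a minimal sketch, assuming the user name above was used unchanged:

```sql
-- After the ALTER USER above, the RSA_PUBLIC_KEY_FP property in the output
-- should contain a SHA256 fingerprint instead of being empty.
DESC USER attribution;
```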

Access to cloud storage for Snowflake

You will need to provide us with access credentials and the URL of the cloud storage location where the data is exported. These credentials are used inside your Snowflake account to connect to the external storage for data ingestion.

For Amazon S3 - the S3 URL of the bucket, plus the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY of a separate AWS user that has access to the bucket where the data is exported. For additional information, refer to the Snowflake documentation.
For Azure Blob Storage - a Blob SAS token and the Blob URL. Please follow the Snowflake documentation to generate a token with the required permissions.
For Google Cloud Storage - the Service Account Key credential file (you should already have created it for the data export). For additional information, refer to the Snowflake documentation.

Alternatively, you can create a STORAGE INTEGRATION on your side. This gives your Snowflake account direct access to the data, so you don't need to share any cloud storage credentials with us. In this scenario, please send us the name of the storage integration created in Snowflake and make sure that the attribution user can access it:

GRANT USAGE ON INTEGRATION attribution TO ROLE attribution_etl_role;

Note that "attribution" in the query above is the storage integration name; replace it if you named your integration differently.
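
For reference, creating a storage integration for Amazon S3 might look like the sketch below. The IAM role ARN and bucket path are hypothetical placeholders, and the statement must be run by a role that is allowed to create integrations (ACCOUNTADMIN by default). Azure and Google Cloud Storage use different provider-specific parameters, so please consult the Snowflake documentation for those.

```sql
-- Hypothetical values: replace the role ARN and bucket path with your own.
CREATE STORAGE INTEGRATION attribution
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/attribution-export-read'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-export-bucket/attribution/');

-- Shows STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID, which are
-- needed for the trust policy of the IAM role on the AWS side.
DESC INTEGRATION attribution;
```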

Please send the requested credentials to us by secure email or through another disposable, one-time channel.

Snowflake ODBC connection string

To find your ODBC connection string in Snowflake:

Click the account icon (bottom left)
Select "Connect a tool to Snowflake"
Go to "Connectors/Drivers"
Select "ODBC Connection string"
Choose database "ATTRIBUTION.PUBLIC" and Connection Method: "password"

If you use a custom SCHEMA, WAREHOUSE, ROLE, or other parameter, you may need to adjust your Snowflake ODBC connection string to match the sample below. The string must be entered as a single line without line breaks in the Attribution connection dialog. If you followed the instructions above exactly, you don't need to modify the copied connection string. The sample below is split across lines for readability:

DRIVER=SnowflakeDSIIDriver;
Locale=en-US;
SERVER=account.us-east-1.aws.snowflakecomputing.com;
PORT=443;
ACCOUNT=account.us-east-1.aws;
DATABASE=ATTRIBUTION;
SCHEMA=PUBLIC;
WAREHOUSE=ATTRIBUTIONAPP_WH;
SSL=on;
QUERY_TIMEOUT=270;
UID=attribution;
ROLE=ATTRIBUTION_ETL_ROLE;
AUTHENTICATOR=SNOWFLAKE_JWT

Note: Do not include PRIV_KEY_FILE or PRIV_KEY_FILE_PWD in the connection string. These are provided separately through the Attribution setup form and are handled automatically.

For more details on ODBC parameters, see the Snowflake ODBC documentation.

The Process

The Attribution ETL Service performs the following actions once per day:

Connect to Snowflake instance;
CREATE TABLEs for schema;
CREATE STAGE (external) to access exported data on cloud storage;
Truncate non-updatable tables and load new data from STAGE;
MERGE updates for updatable tables from STAGE;
Run a number of SELECT queries to verify that data is loaded correctly;
INSERT log records.
Once the process is finished, the data is ready to use as-is. You can build reports and views on top of it - no further action is needed. Please refer to Data schema for more information.
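
To illustrate the truncate-and-load and MERGE steps listed above, here is a rough sketch of the general pattern in Snowflake SQL. It is not the actual statements the ETL service runs: the stage name, table names, columns, bucket URL, and CSV file format are all hypothetical and chosen only for illustration.

```sql
-- Hypothetical names throughout (attribution_stage, example_events,
-- example_customers, bucket URL); not Attribution's real schema.
CREATE STAGE IF NOT EXISTS attribution.public.attribution_stage
  URL = 's3://my-export-bucket/attribution/'
  STORAGE_INTEGRATION = attribution
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);  -- file format is an assumption

-- Non-updatable table: full reload. TRUNCATE also clears the table's load
-- metadata, so COPY INTO loads the staged files again.
TRUNCATE TABLE attribution.public.example_events;
COPY INTO attribution.public.example_events
  FROM @attribution.public.attribution_stage/example_events/;

-- Updatable table: upsert rows read directly from the staged files.
MERGE INTO attribution.public.example_customers AS t
USING (
  SELECT $1 AS id, $2 AS name, $3 AS updated_at
  FROM @attribution.public.attribution_stage/example_customers/
) AS s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET t.name = s.name, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (id, name, updated_at) VALUES (s.id, s.name, s.updated_at);

-- Simple post-load check.
SELECT COUNT(*) FROM attribution.public.example_events;
```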

🚧
Snowpipe
Snowpipe IS NOT USED by the Attribution ETL Service. Snowpipe is mainly designed to load append-only, linear data, whereas Attribution data is updatable: data loaded once can change later, so old records need to be updated or deleted. While it is possible to load the data with a PIPE alone, you would need to implement versioning of the loaded data yourself.
If you want to handle data loading on your end, you can still use Snowpipe, but you will need to make sure that only current data is used (e.g. previously imported data that has since been overwritten is ignored). There are a number of ways to do this, such as using VIEWs or ignoring old records by setting a version tag on each record. However, we do not recommend these methods unless you are proficient with such scenarios.
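
As an illustration of the view-based approach, the sketch below keeps only the newest version of each record. It assumes a hypothetical append-only table example_customers_raw fed by your own pipe, with id as the record key and loaded_at as the version tag set at load time; none of these names come from the Attribution schema.

```sql
-- Hypothetical table and column names; adapt to your own pipe target.
CREATE OR REPLACE VIEW attribution.public.example_customers_current AS
SELECT *
FROM attribution.public.example_customers_raw
-- Keep only the most recently loaded row for each id.
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY loaded_at DESC) = 1;
```

Reports would then query example_customers_current instead of the raw table, so overwritten versions are never visible.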