Snowflake
This article is for data engineers
This guide will provide you with the steps you need to perform in order to give Attribution access to your Snowflake database to execute the ETL process.
Before you continue with this guide please make sure that you have completed the Data Export for Amazon S3, Azure Blob Storage, or Google Cloud Storage since it is requirement for ETL for Snowflake.
Create Database/User/Role for Attribution
Review and run next SQL commands:
CREATE DATABASE attribution;
CREATE ROLE attribution_etl_role;
CREATE USER attribution PASSWORD='CREATE_A_SECURE_PASSWORD' MUST_CHANGE_PASSWORD=TRUE; -- 👈 Please change this
GRANT ROLE attribution_etl_role TO USER attribution;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE attribution_etl_role; -- 👈 Replace "COMPUTE_WH" with warehouse name if needed
GRANT OWNERSHIP ON DATABASE attribution TO ROLE attribution_etl_role;
GRANT OWNERSHIP ON SCHEMA attribution.public TO ROLE attribution_etl_role;
Once the steps above are done, please provide us in a secure way with next information:
YOUR_ACCOUNT - Your Snowflake account (e.g. kl20032
)
SNOWFLAKE_REGION - e.g. us-east-1
YOUR_PASSWORD - password you've set for attribution
user
COMPUTE_WH - your warehouse name
Access to cloud storage for Snowflake
You will need to provide us with access credentials and URL to your cloud storage provider where the data is exported to. This access credential would be used inside your Snowflake to connect to external storages for data ingestion.
- For Amazon S3 - S3 URL to the bucket, create separate AWS user with
AWS_ACCESS_KEY_ID
,AWS_SECRET_ACCESS_KEY
- which will have access to your bucket where the data is exported. For addition information refer to Snowflake documentation. - For Azure Blob Storage - Blob SAS token and Blob URL. Please follow Snowflake documentation to generate token with required permissions.
- For Google Cloud Storage - Service Account Key credential file (you already should have created it for data export). For addition information refer to Snowflake documentation.
Alternatively you can create STORAGE INTEGRATION on your side, this will give your Snowflake direct access to the data and you wouldn't need to share any cloud storage credentials with us. In this scenario please send us the storage integration name created in Snowflake and make sure that attribution
user can access it.
Please attach the requested credentials and send it to us in a secure email or disposable way.
The Process
Attribution ETL Service will perform next actions once per day:
- Connect to Snowflake instance;
- CREATE TABLEs for schema;
- CREATE STAGE (external) to access exported data on cloud storage;
- Truncate non-updatable tables and load new data from STAGE;
- MERGE updates for updatable tables from STAGE;
- Perform number of SELECT queries to verify that data is loaded correctly;
- INSERT log records.
- Once process is finished you are ready to use data as-is. You can build reports and views on top of data - no further actions are needed. Please refer to Data schema for more information.
Snowpipe - status as of early 2023
Snowpipe IS NOT USED by Attribution ETL Service. The reason for that is that Snowpipe is mainly designed to load append-only linear data, where Attribution data is updatable - which means data loaded once can be updated in future so old records needs to be updated or deleted. That kind of scenarios are sometimes not possible to implement with PIPE alone.
If you want to handle data loading on your end it's still possible to use Snowpipes but you will need to make sure that only actual data is used (e.g. previously imported data which was overwritten is ignored). There are number of ways to do it using VIEWs or ignoring old records by setting version tag on each record, however we do not recommend these methods unless you are proficient with such scenarios and know what you are doing.
Updated 7 months ago