This section describes the requirements, access privileges, and other features of HVR when using Snowflake for replication. For information about compatibility and supported versions of Snowflake with HVR platforms, see Platform Compatibility Matrix.
For information about the supported data types and mapping of data types in source DBMS to the corresponding data types in target DBMS or file format, see Data Type Mapping.
To quickly set up replication using Snowflake, see Quick Start for HVR - Snowflake.
HVR requires that the Snowflake ODBC driver is installed on the machine from which HVR connects to Snowflake. For more information on downloading and installing the Snowflake ODBC driver, see Snowflake Documentation.
This section lists and describes the connection details required for creating a Snowflake location in HVR.
|The hostname or IP address of the machine on which the Snowflake server is running. |
|The port on which the Snowflake server is expecting connections. |
|The name of the Snowflake role to use. |
|The name of the Snowflake warehouse to use. |
|The name of the Snowflake database. |
|The name of the default Snowflake schema to use. |
|The username to connect HVR to the Snowflake Database. |
|The password of the User to connect HVR to the Snowflake Database.|
Linux / Unix
Driver Manager Library
|The optional directory path where the ODBC Driver Manager Library is installed. For a default installation, the ODBC Driver Manager Library is available at /usr/lib64 and does not need to be specified. When unixODBC is installed in, for example, /opt/unixodbc-2.3.1, this would be /opt/unixodbc-2.3.1/lib.|
|The directory path where the odbc.ini and odbcinst.ini files are located. For a default installation, these files are available at /etc and do not need to be specified. When unixODBC is installed in, for example, /opt/unixodbc-2.3.1, this would be /opt/unixodbc-2.3.1/etc. The odbcinst.ini file should contain information about the Snowflake ODBC Driver under the heading [SnowflakeDSIIDriver].|
|The user-defined (installed) ODBC driver to connect HVR to the Snowflake server.|
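As a quick sanity check on the odbcinst.ini requirement above, the following minimal Python sketch tests whether the contents of such a file include the [SnowflakeDSIIDriver] heading. The function name and the sample driver path are illustrative assumptions; they are not part of HVR or of the Snowflake ODBC driver.

```python
import configparser

# Sample odbcinst.ini contents. The Driver path is hypothetical.
SAMPLE_ODBCINST = """\
[ODBC Drivers]
SnowflakeDSIIDriver=Installed

[SnowflakeDSIIDriver]
Driver=/hypothetical/path/libSnowflake.so
"""


def has_snowflake_driver(odbcinst_text: str,
                         heading: str = "SnowflakeDSIIDriver") -> bool:
    """Return True if the given odbcinst.ini text has a [heading] section."""
    parser = configparser.ConfigParser(
        interpolation=None,   # ini values may contain '%' characters
        allow_no_value=True,  # tolerate keys without '=value'
        strict=False,         # tolerate duplicate sections or keys
    )
    parser.read_string(odbcinst_text)
    # Section names are matched case-sensitively.
    return parser.has_section(heading)


if __name__ == "__main__":
    print(has_snowflake_driver(SAMPLE_ODBCINST))
```

To check a real installation, pass the text of the odbcinst.ini file from the configured directory (for a default installation, /etc/odbcinst.ini).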
Integrate and Refresh Target
HVR supports integrating changes into a Snowflake location. This section describes the configuration requirements for integrating changes (using Integrate and Refresh) into a Snowflake location. For the list of supported Snowflake versions into which HVR can integrate changes, see Integrate changes into location in Capabilities.
HVR uses the Snowflake ODBC driver to write data to Snowflake during continuous Integrate and row-wise Refresh. However, the preferred methods for writing data to Snowflake are Integrate with /Burst and Bulk Refresh using staging as they provide better performance (see section 'Burst Integrate and Bulk Refresh' below).
When performing the refresh operation using slicing (option -S), a refresh job is created for each slice and refreshes only the rows contained in that slice. These refresh jobs must not be run in parallel; schedule them one after another to avoid the risk of corruption on a Snowflake target location.
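The "one after another" rule can be illustrated with a minimal Python sketch. It assumes each slice's refresh job is represented by a hypothetical callable that returns True on success; how the job is actually started is outside the scope of this sketch. Stopping at the first failure is a design choice made here, not documented HVR behavior.

```python
from typing import Callable, Iterable, List


def run_slices_sequentially(jobs: Iterable[Callable[[], bool]]) -> List[bool]:
    """Run slice refresh jobs strictly one at a time, in the given order.

    Each job is a hypothetical callable that returns True on success.
    The next job starts only after the previous one has returned, so no
    two slices ever write to the Snowflake target at the same time.
    Execution stops at the first failing job.
    """
    results: List[bool] = []
    for job in jobs:
        succeeded = job()  # blocks until this slice has finished
        results.append(succeeded)
        if not succeeded:
            break  # do not start later slices after a failure
    return results
```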
Grants for Integrate and Refresh Target
The User should have permission to read and change replicated tables.
The User should have permission to create and drop HVR state tables.
The User should have permission to create and drop tables when HVR Refresh will be used to create target tables.
Burst Integrate and Bulk Refresh
When Integrate runs with parameter /Burst, or during Bulk Refresh, HVR can stream data into a target database straight over the network into a bulk loading interface specific for each DBMS (e.g. direct-path-load in Oracle), or else HVR puts data into a temporary directory ('staging file') before loading it into the target database. For more information about staging files on Snowflake, see Snowflake Documentation.
For best performance, HVR performs Integrate with /Burst and Bulk Refresh into Snowflake using staging files. HVR implements Integrate with /Burst and Bulk Refresh (with file staging) into Snowflake as follows:
- HVR first stages data into the configured staging platform:
  - Snowflake Internal Staging using Snowflake ODBC driver (default)
  - AWS or Google Cloud Storage using cURL library
  - Azure Blob FS using HDFS-compatible libhdfs API
- HVR then uses Snowflake SQL command 'copy into' to ingest data from the staging directories into the Snowflake target tables.
HVR supports the following cloud platforms for staging files:
Snowflake Internal Staging
By default, HVR stages data on the Snowflake internal staging before loading it into Snowflake while performing Integrate with /Burst and Bulk Refresh. To use the Snowflake internal staging, it is not required to define action LocationProperties on the corresponding Integrate location.
Snowflake on AWS
HVR can be configured to stage the data on AWS S3 before loading it into Snowflake. To stage the data on AWS S3 and perform Integrate with /Burst and Bulk Refresh, the following are required:
- An AWS S3 location (bucket) - to store temporary data to be loaded into Snowflake. For more information about creating and configuring an S3 bucket, refer to AWS Documentation.
- An AWS user with 'AmazonS3FullAccess' policy - to access this location. For more information, refer to the AWS documentation.
- Define action LocationProperties on the Snowflake location with the following parameters:
  - /StagingDirectoryHvr: the location where HVR will create the temporary staging files (e.g. s3://my_bucket_name/).
  - /StagingDirectoryDb: the location from where Snowflake will access the temporary staging files. If /StagingDirectoryHvr is an Amazon S3 location, the value for /StagingDirectoryDb should be the same as /StagingDirectoryHvr.
  - /StagingDirectoryCredentials: the AWS security credentials. The supported formats are 'aws_access_key_id="key";aws_secret_access_key="secret_key"' or 'role="AWS_role"'. For how to obtain your AWS credentials or Instance Profile Role, refer to the AWS documentation.
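The two credential formats listed above can be illustrated with a small Python helper. This is a minimal sketch: the function name and its validation rules are assumptions made for illustration, and only the two output formats come from the text above.

```python
from typing import Optional


def aws_staging_credentials(access_key_id: Optional[str] = None,
                            secret_access_key: Optional[str] = None,
                            role: Optional[str] = None) -> str:
    """Build a /StagingDirectoryCredentials value for S3 staging.

    Pass either a key pair or a role name, never both.
    """
    if role is not None:
        if access_key_id or secret_access_key:
            raise ValueError("give either a role or a key pair, not both")
        return f'role="{role}"'
    if not (access_key_id and secret_access_key):
        raise ValueError("access_key_id and secret_access_key are both required")
    return (f'aws_access_key_id="{access_key_id}";'
            f'aws_secret_access_key="{secret_access_key}"')


if __name__ == "__main__":
    print(aws_staging_credentials("key", "secret_key"))
    print(aws_staging_credentials(role="AWS_role"))
```

In practice, read the secret values from a secure store rather than hard-coding them.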
If the S3 bucket used for the staging directory does not reside in the default us-east-1 region, the region of the S3 bucket (e.g. eu-west-2 or ap-south-1) must be explicitly specified. To set the S3 bucket region, define the following action on the Snowflake location:
Environment /Name=HVR_S3_BOOTSTRAP_REGION /Value=s3_bucket_region
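The region rule can be expressed as a short Python sketch that returns the action text only when it is needed. The function name is a hypothetical helper, and the sketch assumes the region is supplied as a plain region code such as eu-west-2.

```python
from typing import Optional

DEFAULT_S3_REGION = "us-east-1"


def s3_region_action(bucket_region: str) -> Optional[str]:
    """Return the Environment action text for a non-default bucket region.

    Returns None when the bucket is in the default region, because no
    action is required in that case.
    """
    region = bucket_region.strip().lower()
    if region == DEFAULT_S3_REGION:
        return None
    return f"Environment /Name=HVR_S3_BOOTSTRAP_REGION /Value={region}"


if __name__ == "__main__":
    print(s3_region_action("eu-west-2"))
```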
Snowflake on Azure
HVR can be configured to stage the data on Azure BLOB storage before loading it into Snowflake. To stage the data on Azure BLOB storage and perform Integrate with /Burst and Bulk Refresh, the following are required:
- An Azure BLOB storage location - to store temporary data to be loaded into Snowflake
- An Azure user (storage account) - to access this location. For more information, refer to the Azure Blob storage documentation.
- Define action LocationProperties on the Snowflake location with the following parameters:
  - /StagingDirectoryHvr: the location where HVR will create the temporary staging files (e.g. wasbs://myblobcontainer).
  - /StagingDirectoryDb: the location from where Snowflake will access the temporary staging files. If /StagingDirectoryHvr is an Azure location, this parameter should have the same value.
  - /StagingDirectoryCredentials: the Azure security credentials. The supported format is "azure_account=azure_account;azure_secret_access_key=secret_key".
A Hadoop client must be present on the machine from which HVR will access the Azure Blob FS. Internally, HVR uses the WebHDFS REST API to connect to the Azure Blob FS. Azure Blob FS locations can only be accessed through HVR running on Linux or Windows. HVR does not need to be installed on the Hadoop NameNode, although this is possible. For more information about installing a Hadoop client, refer to Apache Hadoop Releases.
Hadoop Client Configuration
The following are required on the machine from which HVR connects to Azure Blob FS:
- Hadoop 2.6.x client libraries with Java 7 Runtime Environment or Hadoop 3.x client libraries with Java 8 Runtime Environment. For downloading Hadoop, refer to Apache Hadoop Releases.
- Set the environment variable $JAVA_HOME to the Java installation directory. Ensure that this is the directory that has a bin folder, e.g. if the Java bin directory is d:\java\bin, $JAVA_HOME should point to d:\java.
- Set the environment variable $HADOOP_COMMON_HOME or $HADOOP_HOME or $HADOOP_PREFIX to the Hadoop installation directory, or the hadoop command line client should be available in the path.
- One of the following configurations is recommended:
  - Set $HADOOP_CLASSPATH=$HADOOP_HOME/share/hadoop/tools/lib/*
  - Create a symbolic link for $HADOOP_HOME/share/hadoop/tools/lib in $HADOOP_HOME/share/hadoop/common or any other directory present in the classpath.
Since the binary distribution available on the Hadoop website lacks Windows-specific executables, a warning about being unable to locate winutils.exe is displayed. This warning can be ignored when the Hadoop library is used only for client operations to connect to an HDFS server using HVR. However, performance on the integrate location would be poor due to this warning, so it is recommended to use a Windows-specific Hadoop distribution to avoid it. For more information about this warning, refer to Hadoop Wiki and Hadoop issue HADOOP-10051.
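The environment requirements listed above can be checked with a minimal Python sketch. The function name and the wording of the messages are illustrative assumptions; the checks mirror only what the list states (a JAVA_HOME directory that contains bin, and either one of the three Hadoop variables or a hadoop command on the path).

```python
import shutil
from pathlib import Path
from typing import List, Mapping

HADOOP_VARS = ("HADOOP_COMMON_HOME", "HADOOP_HOME", "HADOOP_PREFIX")


def check_hadoop_client_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in the given environment mapping."""
    problems: List[str] = []

    # JAVA_HOME must be the directory that contains the bin folder.
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not (Path(java_home) / "bin").is_dir():
        problems.append("JAVA_HOME does not contain a bin directory")

    # Either one of the Hadoop variables is set, or hadoop is on the PATH.
    hadoop_home = next((env[v] for v in HADOOP_VARS if env.get(v)), None)
    if hadoop_home is None:
        if shutil.which("hadoop", path=env.get("PATH", "")) is None:
            problems.append(
                "no Hadoop home variable is set and hadoop is not on the PATH"
            )
    return problems


if __name__ == "__main__":
    import os
    for problem in check_hadoop_client_env(os.environ):
        print(problem)
```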
Verifying Hadoop Client Installation
To verify the Hadoop client installation:
- The $HADOOP_HOME/bin directory of the Hadoop installation should contain the hadoop executables.
Execute the following commands to verify Hadoop client installation:
If the Hadoop client installation is verified successfully then execute the following command to check the connectivity between HVR and Azure Blob FS:
To execute this command successfully and avoid the error "ls: Password fs.adl.oauth2.client.id not found", a few properties need to be defined in the file core-site.xml available in the Hadoop configuration folder (e.g. <path>/hadoop-2.8.3/etc/hadoop). The properties to be defined differ based on the Mechanism (authentication mode). For more information, refer to section 'Configuring Credentials' in Hadoop Azure Blob FS Support documentation.
Verifying Hadoop Client Compatibility with Azure Blob FS
To verify the compatibility of the Hadoop client with Azure Blob FS, check if the following JAR files are available in the Hadoop client installation location ($HADOOP_HOME/share/hadoop/tools/lib):
Snowflake on Google Cloud Storage
HVR can be configured to stage the data on Google Cloud Storage before loading it into Snowflake. To stage the data on Google Cloud Storage and perform Integrate with /Burst and Bulk Refresh, the following are required:
- A Google Cloud Storage location - to store temporary data to be loaded into Snowflake
- A Google Cloud user (storage account) - to access this location.
- Configure the storage integrations to allow Snowflake to read and write data into a Google Cloud Storage bucket. For more information, see Configuring an Integration for Google Cloud Storage in Snowflake documentation.
- Define action LocationProperties on the Snowflake location with the following parameters:
  - /StagingDirectoryHvr: the location where HVR will create the temporary staging files (e.g. gs://mygooglecloudstorage_bucketname).
  - /StagingDirectoryDb: the location from where Snowflake will access the temporary staging files. If /StagingDirectoryHvr is a Google Cloud Storage location, this parameter should have the same value.
  - /StagingDirectoryCredentials: the Google Cloud Storage credentials. The supported format is "gs_access_key_id=key;gs_secret_access_key=secret_key;gs_storage_integration=integration_name for google cloud storage".
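For all three cloud staging platforms described in this section, /StagingDirectoryDb should carry the same value as /StagingDirectoryHvr. The following minimal Python sketch checks that rule. The function name is a hypothetical helper, and the accepted URL prefixes are taken only from the example locations shown in this section (s3://, wasbs://, gs://); other forms may also be valid.

```python
from typing import List

# Prefixes taken from the example staging locations in this section.
CLOUD_PREFIXES = ("s3://", "wasbs://", "gs://")


def check_staging_directories(staging_hvr: str, staging_db: str) -> List[str]:
    """Return a list of problems with a pair of staging directory values."""
    problems: List[str] = []
    if not staging_hvr.startswith(CLOUD_PREFIXES):
        problems.append(
            "/StagingDirectoryHvr does not start with s3://, wasbs:// or gs://"
        )
    elif staging_db != staging_hvr:
        # The comparison is exact: a missing trailing slash counts as a mismatch.
        problems.append(
            "/StagingDirectoryDb should have the same value as /StagingDirectoryHvr"
        )
    return problems


if __name__ == "__main__":
    print(check_staging_directories("gs://mygooglecloudstorage_bucketname",
                                    "gs://mygooglecloudstorage_bucketname"))
```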
Compare and Refresh Source
The User should have permission to read replicated tables.