Requirements for Snowflake

  Since v5.2.3/16

This section describes the requirements, access privileges, and other features of HVR when using Snowflake for replication. For information about the Capabilities supported by HVR on Snowflake, see Capabilities for Snowflake.

For information about compatibility and supported versions of Snowflake with HVR platforms, see Platform Compatibility Matrix.

ODBC Connection

HVR requires that the Snowflake ODBC driver is installed on the machine from which HVR will connect to Snowflake. For more information on downloading and installing the Snowflake ODBC driver, see the Snowflake Documentation.

Location Connection

This section lists and describes the connection details required for creating a Snowflake location in HVR.

Database Connection

Server

The hostname or IP address of the machine on which the Snowflake server is running.
Example: www.snowflakecomputing.com

Port

The port on which the Snowflake server is expecting connections.
Example: 443

Role

The name of the Snowflake role to use.
Example: admin

Warehouse

The name of the Snowflake warehouse to use.
Example: snowflakewarehouse

Database

The name of the Snowflake database.
Example: mytestdb

Schema

The name of the default Snowflake schema to use.
Example: snowflakeschema

User

The username to connect HVR to the Snowflake Database.
Example: hvruser

Password

The password of the User to connect HVR to the Snowflake Database.

Linux / Unix

Driver Manager Library

The optional directory path where the ODBC Driver Manager Library is installed. For a default installation, the ODBC Driver Manager Library is available at /usr/lib64 and does not need to be specified. For example, when unixODBC is installed in /opt/unixodbc-2.3.1, this would be /opt/unixodbc-2.3.1/lib.

ODBCSYSINI

The directory path where the odbc.ini and odbcinst.ini files are located. For a default installation, these files are available at /etc and do not need to be specified. For example, when unixODBC is installed in /opt/unixodbc-2.3.1, this would be /opt/unixodbc-2.3.1/etc. The odbcinst.ini file should contain information about the Snowflake ODBC Driver under the heading [SnowflakeDSIIDriver] (see the example entry after this table).

ODBC Driver

The user-defined (installed) ODBC driver used to connect HVR to the Snowflake server.
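
For example, an odbcinst.ini entry registering the Snowflake ODBC driver could look like the following. The Driver path shown here is an assumption based on a default Linux installation of the driver; use the path where the driver is actually installed on your machine.

    [SnowflakeDSIIDriver]
    Description = Snowflake ODBC Driver
    Driver      = /usr/lib64/snowflake/odbc/lib/libSnowflake.so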

Integrate and Refresh Target

HVR supports integrating changes into a Snowflake location. This section describes the configuration requirements for integrating changes (using Integrate and Refresh) into a Snowflake location.

Grants for Integrate and Refresh Target

  • The User should have permission to read and change replicated tables.

    grant select, insert, update, delete, truncate on table tbl to role hvruser

  • The User should have permission to create and drop HVR state tables.

  • The User should have permission to create and drop tables when HVR Refresh will be used to create target tables (a combined example follows this list).

    grant usage, modify, create table on schema database.schema to role hvruser
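
Taken together, and using the example names from the Location Connection section above (database mytestdb, schema snowflakeschema, role hvruser), the grants might look as follows. This is only a sketch; substitute your own database, schema, table, and role names.

    -- allow HVR to use the schema, create/drop its state and target tables
    grant usage, modify, create table on schema mytestdb.snowflakeschema to role hvruser;
    -- allow HVR to read and change each replicated table (tbl is a placeholder)
    grant select, insert, update, delete, truncate on table mytestdb.snowflakeschema.tbl to role hvruser;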

Burst Integrate and Bulk Refresh

HVR allows you to perform Integrate with Burst and Bulk Refresh on Snowflake (it uses the Snowflake "COPY INTO" feature for maximum performance).
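
For reference, a bulk load from an external stage in Snowflake looks roughly like the statement below. HVR generates and executes such statements itself, so this is only an illustration; the table name, bucket, credentials, and file format are placeholders and assumptions, not the exact statement HVR issues.

    copy into mytestdb.snowflakeschema.tbl
      from 's3://my_bucket_name/'
      credentials = (aws_key_id='key' aws_secret_key='secret_key')
      file_format = (type = csv);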

Snowflake on AWS

For Snowflake on AWS, HVR requires the following to perform Integrate with Burst and Bulk Refresh:

  1. An AWS S3 location - to store temporary data to be loaded into Snowflake.
  2. An AWS user with the 'AmazonS3FullAccess' policy - to access this location.
    For more information, refer to the AWS documentation.
  3. Define action LocationProperties on the Snowflake location with the following parameters (illustrated together after this list):
    • /StagingDirectoryHvr: the location where HVR will create the temporary staging files (e.g., s3://my_bucket_name/).
    • /StagingDirectoryDb: the location from which Snowflake will access the temporary staging files. If /StagingDirectoryHvr is an Amazon S3 location, the value for /StagingDirectoryDb should be the same as /StagingDirectoryHvr.
    • /StagingDirectoryCredentials: the AWS security credentials. The supported formats are 'aws_access_key_id="key";aws_secret_access_key="secret_key"' or 'role="AWS_role"'. For information on obtaining your AWS credentials or Instance Profile Role, refer to the AWS documentation.
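
For illustration, with the placeholder values used above, the three parameters on the LocationProperties action would carry values like the following (this is just a parameter/value listing, not a command syntax):

    /StagingDirectoryHvr          s3://my_bucket_name/
    /StagingDirectoryDb           s3://my_bucket_name/
    /StagingDirectoryCredentials  aws_access_key_id="key";aws_secret_access_key="secret_key"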

Snowflake on Azure

  Since v5.5.5/4

For Snowflake on Azure, HVR requires the following to perform Integrate with Burst and Bulk Refresh:

  1. An Azure Blob storage location - to store temporary data to be loaded into Snowflake.
  2. An Azure user (storage account) - to access this location. For more information, refer to the Azure Blob storage documentation.
  3. Define action LocationProperties on the Snowflake location with the following parameters:
    • /StagingDirectoryHvr: the location where HVR will create the temporary staging files (e.g. wasbs://myblobcontainer).
    • /StagingDirectoryDb: the location from where Snowflake will access the temporary staging files. If /StagingDirectoryHvr is an Azure location, this parameter should have the same value.
    • /StagingDirectoryCredentials: the Azure security credentials. The supported format is "azure_account=azure_account;azure_secret_access_key=secret_key".
  4. A Hadoop client should be present on the machine from which HVR will access the Azure Blob FS. Internally, HVR uses the WebHDFS REST API to connect to the Azure Blob FS. Azure Blob FS locations can only be accessed through HVR running on Linux or Windows; HVR does not need to be installed on the Hadoop NameNode, although it is possible to do so. For more information about installing the Hadoop client, refer to Apache Hadoop Releases.

    Hadoop Client Configuration

    The following are required on the machine from which HVR connects to Azure Blob FS:

    • Hadoop 2.6.x client libraries with Java 7 Runtime Environment or Hadoop 3.x client libraries with Java 8 Runtime Environment. For downloading Hadoop, refer to Apache Hadoop Releases.
    • Set the environment variable $JAVA_HOME to the Java installation directory.
    • Set the environment variable $HADOOP_COMMON_HOME or $HADOOP_HOME or $HADOOP_PREFIX to the Hadoop installation directory, or ensure that the hadoop command line client is available in the PATH.
    • One of the following configurations is recommended (a shell sketch follows the note below):
      • Add $HADOOP_HOME/share/hadoop/tools/lib to the Hadoop classpath.
      • Create a symbolic link for $HADOOP_HOME/share/hadoop/tools/lib in $HADOOP_HOME/share/hadoop/common or any other directory present in the classpath.

        Since the binary distribution available on the Hadoop website lacks Windows-specific executables, a warning about being unable to locate winutils.exe is displayed. This warning can be ignored when the Hadoop library is used only for client operations to connect to an HDFS server using HVR. However, performance on an integrate location would be poor in this situation, so it is recommended to use a Windows-specific Hadoop distribution to avoid this warning. For more information about this warning, refer to the Hadoop Wiki and the Hadoop issue HADOOP-10051.
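
    One possible way to implement the first option is Hadoop's standard HADOOP_CLASSPATH environment variable; the following is only a sketch, assuming a Bourne-style shell.

      # Append the tools libraries to the classpath used by the hadoop command.
      export HADOOP_CLASSPATH="$HADOOP_HOME/share/hadoop/tools/lib/*:$HADOOP_CLASSPATH"
      # Confirm that the tools/lib jars now appear in the reported classpath.
      $HADOOP_HOME/bin/hadoop classpath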

    Verifying Hadoop Client Installation

    To verify the Hadoop client installation,

    1. The $HADOOP_HOME/bin directory of the Hadoop installation should contain the hadoop executables.
    2. Execute the following commands to verify Hadoop client installation:

      $JAVA_HOME/bin/java -version
      $HADOOP_HOME/bin/hadoop version
      $HADOOP_HOME/bin/hadoop classpath
      
    3. If the Hadoop client installation is verified successfully, execute the following command to check the connectivity between HVR and the Azure Blob FS:

      To execute this command successfully and avoid the error "ls: Password fs.adl.oauth2.client.id not found", a few properties need to be defined in the file core-site.xml available in the Hadoop configuration folder (e.g., <path>/hadoop-2.8.3/etc/hadoop). The properties to be defined differ depending on the Mechanism (authentication mode); a sample property is shown after the command below. For more information, refer to the section 'Configuring Credentials' in the Hadoop Azure Blob FS Support documentation.

      $HADOOP_HOME/bin/hadoop fs -ls wasbs://myblobcontainer/
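
      As an example, for the storage-account-key authentication mode, the hadoop-azure connector typically expects a property of the following form in core-site.xml. The account name and key are placeholders, and other authentication modes require different properties; see the Hadoop Azure documentation referenced above.

      <property>
        <name>fs.azure.account.key.myaccount.blob.core.windows.net</name>
        <value>your_storage_account_access_key</value>
      </property>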

    Verifying Hadoop Client Compatibility with Azure Blob FS

    To verify the compatibility of the Hadoop client with Azure Blob FS, check whether the following JAR files are available in the Hadoop client installation location ($HADOOP_HOME/share/hadoop/tools/lib):

    hadoop-azure-<version>.jar
    azure-storage-<version>.jar
    

Compare and Refresh Source

  • The User should have permission to read replicated tables.

    grant select on table tbl to role hvruser