Requirements for Hive ACID
This section describes the requirements, access privileges, and other features of HVR when using Hive ACID (Atomicity, Consistency, Isolation, Durability) for replication. For information about the capabilities supported by HVR on Hive ACID, see Capabilities for Hive ACID.
For information about compatibility and supported versions of Hive ACID with HVR platforms, see Platform Compatibility Matrix.
HVR uses an ODBC connection to access the Hive ACID server. One of the following ODBC drivers must be installed on the machine from which HVR connects to the Hive ACID server: HortonWorks ODBC driver 2.1.7 (and above) or Cloudera ODBC driver 2.5.12 (and above).
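For reference, on Linux these drivers register themselves in odbcinst.ini. A minimal entry might look like the following; the driver name and .so path are illustrative and depend on the driver vendor, version, and installation directory actually used:

```ini
[Hortonworks Hive ODBC Driver]
Description = Hortonworks Hive ODBC Driver
Driver      = /usr/lib/hive/lib/native/Linux-amd64-64/libhortonworkshiveodbc64.so
```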
HVR can deliver changes into Hive ACID tables as a target location for its refresh and integrate operations. For Hive versions before 2.3, delivery of changes into Hive ACID tables is supported only with action ColumnProperties /TimeKey.
This section lists and describes the connection details required for creating a Hive ACID location in HVR.
|Hive ODBC Connection|
|Hive Server Type||The type of Hive server. Available options:
|Service Discovery Mode||The mode for connecting to Hive. This field is enabled only if Hive Server Type is Hive Server 2. Available options:
|Host(s)||The hostname or IP address of the Hive server.|
When Service Discovery Mode is ZooKeeper, specify the list of ZooKeeper servers in the following format: [ZK_Host1]:[ZK_Port1],[ZK_Host2]:[ZK_Port2], where [ZK_Host] is the IP address or hostname of the ZooKeeper server and [ZK_Port] is the TCP port that the ZooKeeper server uses to listen for client connections.
|Port||The TCP port that the Hive server uses to listen for client connections. This field is enabled only if Service Discovery Mode is No Service Discovery. |
|Database||The name of the database schema to use when a schema is not explicitly specified in a query. You can still issue queries on other schemas by explicitly specifying the schema in the query. |
|ZooKeeper Namespace||The namespace on ZooKeeper under which Hive Server 2 nodes are added. This field is enabled only if Service Discovery Mode is ZooKeeper.|
|Mechanism||The authentication mechanism for connecting HVR to Hive Server 2. This field is enabled only if Hive Server Type is Hive Server 2. Available options: |
|User||The username to connect HVR to the Hive server. This field is enabled only if Mechanism is User Name or User Name and Password. |
|Password||The password of the User to connect HVR to the Hive server. This field is enabled only if Mechanism is User Name and Password.|
|Service Name||The Kerberos service principal name of the Hive server. This field is enabled only if Mechanism is Kerberos.|
|Host||The Fully Qualified Domain Name (FQDN) of the Hive Server 2 host. The value of Host can be set as _HOST to use the Hive server hostname as the domain name for Kerberos authentication.|
If Service Discovery Mode is disabled, then the driver uses the value specified in the Host connection attribute.
If Service Discovery Mode is enabled, then the driver uses the Hive Server 2 host name returned by ZooKeeper.
This field is enabled only if Mechanism is Kerberos.
|Realm||The realm of the Hive Server 2 host.|
It is not required to specify any value in this field if the realm of the Hive Server 2 host is defined as the default realm in Kerberos configuration. This field is enabled only if Mechanism is Kerberos.
|Linux / Unix|
|Driver Manager Library||The directory path where the Unix ODBC Driver Manager Library is installed. For a default installation, the ODBC Driver Manager Library is available at /usr/lib64 and does not need to be specified. If unixODBC is installed in, for example, /opt/unixodbc-2.3.2, this field should be set to /opt/unixodbc-2.3.2/lib.|
|ODBCSYSINI||The directory path where the odbc.ini and odbcinst.ini files are located. For a default installation, these files are available at /etc and do not need to be specified. If unixODBC is installed in, for example, /opt/unixodbc-2.3.2, this field should be set to /opt/unixodbc-2.3.2/etc.|
|ODBC Driver||The user-defined (installed) ODBC driver used to connect HVR to the Hive server.|
|SSL Options||Displays the SSL options.|
|Enable SSL||Enable/disable (one-way) SSL. If enabled, HVR authenticates the Hive server by validating the SSL certificate shared by the Hive server.|
|Two-way SSL||Enable/disable two-way SSL. If enabled, both HVR and the Hive server authenticate each other by validating each other's SSL certificate. This field is enabled only if Enable SSL is selected.|
|SSL Public Certificate||The directory path where the .pem file containing the client's SSL public certificate is located. This field is enabled only if Two-way SSL is selected.|
|SSL Private Key||The directory path where the .pem file containing the client's SSL private key is located. This field is enabled only if Two-way SSL is selected.|
|Client Private Key Password||The password of the private key file that is specified in SSL Private Key. This field is enabled only if Two-way SSL is selected.|
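To illustrate how the fields above map onto an ODBC connection, the sketch below assembles a DSN-less connection string of the kind the HortonWorks/Cloudera Hive ODBC drivers accept. The attribute names (HiveServerType, AuthMech, UID, PWD) follow those drivers' conventions; the driver name, host, and credentials are hypothetical, and this helper is not part of HVR itself:

```python
def hive_odbc_connstring(host, port, database="default",
                         mechanism="User Name and Password",
                         user=None, password=None,
                         driver="Hortonworks Hive ODBC Driver"):
    """Build a DSN-less ODBC connection string for Hive Server 2."""
    # AuthMech values per the HortonWorks/Cloudera Hive ODBC driver docs:
    # 0 = No Authentication, 1 = Kerberos,
    # 2 = User Name, 3 = User Name and Password
    auth = {"No Authentication": 0, "Kerberos": 1,
            "User Name": 2, "User Name and Password": 3}[mechanism]
    parts = [f"Driver={{{driver}}}",
             "HiveServerType=2",           # Hive Server 2
             f"Host={host}",
             f"Port={port}",
             f"Schema={database}",         # the Database field above
             f"AuthMech={auth}"]
    if user is not None:
        parts.append(f"UID={user}")
    if password is not None:
        parts.append(f"PWD={password}")
    return ";".join(parts)
```

A generic ODBC client such as pyodbc could then open the connection with pyodbc.connect(hive_odbc_connstring("hive.example.com", 10000, user="hvr", password="...")).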
Hive ACID on Amazon Elastic MapReduce (EMR)
To enable Hive ACID on Amazon EMR:
- Add the following configuration details to the hive-site.xml file available in /etc/hive/conf on Amazon EMR:

<!-- Hive ACID support -->
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>10</value>
</property>
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<!-- Hive ACID support end -->

- Save the modified hive-site.xml file.
- Restart Hive on Amazon EMR.
For more information on restarting a service in Amazon EMR, refer to How do I restart a service in Amazon EMR? in AWS documentation.
Integrate and Refresh Target
Burst Integrate and Bulk Refresh
- HVR requires an AWS S3 or HDFS location to store temporary data to be loaded into Hive ACID. If AWS S3 is used to store temporary data, then HVR requires an AWS user with the 'AmazonS3FullAccess' policy to access this location. For more information, refer to the following AWS documentation:
- Amazon S3 and Tools for Windows PowerShell
- Managing Access Keys for IAM Users
- Creating a Role to Delegate Permissions to an AWS Service
- /StagingDirectoryHvr: the location where HVR will create the temporary staging files. The format for AWS S3 is s3://<S3 Bucket>/<Directory>, and for HDFS it is hdfs://<NameNode>:<Port>/<Directory>.
- /StagingDirectoryDb: the location from where Hive ACID will access the temporary staging files.
If /StagingDirectoryHvr is an AWS S3 location, then /StagingDirectoryDb should be the same as /StagingDirectoryHvr.
- /StagingDirectoryCredentials: the AWS security credentials. The supported formats are 'aws_access_key_id=<key>;aws_secret_access_key=<secret_key>' and 'role=<AWS_role>'. For information on obtaining your AWS credentials or Instance Profile Role, refer to the AWS documentation.
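The two credential formats above can be composed mechanically. As a hedged sketch (the helper name and example values are illustrative, not part of HVR):

```python
def staging_credentials(access_key=None, secret_key=None, role=None):
    """Format AWS credentials for /StagingDirectoryCredentials.

    Accepts either an access-key/secret-key pair or an IAM role,
    mirroring the two supported formats described above.
    """
    if role is not None:
        return f"role={role}"
    if access_key is not None and secret_key is not None:
        return f"aws_access_key_id={access_key};aws_secret_access_key={secret_key}"
    raise ValueError("supply either role, or both access_key and secret_key")
```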