Getting Started with AWS
HVR for AWS Quickstart Guide (Oracle to Redshift)
Jump to: Getting Started
This quickstart guide references a licensed HVR for AWS. The guide is written under the assumption the reader is familiar with relational SQL databases (Oracle is used as the source and Redshift is the target in this example) and database replication concepts. The user should also have an awareness of log-based change data capture, Amazon Web Services, as well as managing and connecting to EC2 instances.
HVR is a real-time heterogeneous data replication software that can be deployed across platforms including Amazon AWS. With HVR you get everything you need for data replication: table creation and initial data load, change data capture and delivery, and data validation/repair. For optimum processing and minimal resource utilization, a best practice implementation requires an installation of the HVR software on the source and on the target server. We recommend you use the capture once, deliver multiple capability to support a more highly available deployment across multiple availability zones.
There are three offerings for HVR for AWS:
- HVR licensed for one source, and one AWS-based target.
- HVR licensed for five sources, and one AWS-based target.
- HVR Bring Your Own License (BYOL) – this version requires a separate HVR license that may have been obtained directly from HVR Software, or through one of the licensed AMIs.
HVR for Amazon Web Services (AWS)
HVR for AWS, available from the AWS marketplace, is an AWS AMI containing all necessary components to connect to Amazon Redshift, Oracle and PostgreSQL on RDS, S3 or any HVR supported target database on EC2, enabling replication from/into all supported on-premises platforms. Note HVR for AWS is currently available only on Linux; connectivity to SQL Server requires an HVR installation on Windows (optionally running as an agent). With a few clicks, the AWS platform is ready to receive or send data through HVR.
HVR for AWS takes care of the firewall settings, necessary database drivers and required HVR components on AWS. Connecting from on-premises to AWS normally does not require any additional measures.
HVR Topology and Process Placement
An HVR environment requires one system to be designated the HVR hub. The hub is where the information about the HVR configuration is maintained and where the processes that control the replication are initiated. The GUI always connects to the hub.
HVR also has the concept of remote agents. These lightweight agents are typically run locally to the source and target endpoints to provide the most efficient access to the endpoints and maximize the efficiency of network communications. The hub communicates with these agents and orchestrates data movement from source to target.
HVR supports a variety of topologies, but when replicating from an on-premises database to the cloud, it is generally easiest to place the HVR hub on-premises and use the HVR for AWS AMI as the remote agent for the cloud data store. The hub always establishes the TCP/IP communication sessions with the remote agents. Having the hub on-premises means that the TCP/IP session to the AWS target is always an outgoing connection. i.e. established from the on-premise hub to AWS. In this scenario, no firewall ports have to be opened to allow access into the on-premise network, it just needs to be able to establish a session with the remote AMI. For more information about advanced options, for example to include a proxy installation to mitigate the risk of opening up the firewall to the on-premises data center, please refer to online resources.
For additional high availability across multiple availability zones we recommend you set up multiple agents in different availability zones as separate targets. The setup in HVR to capture once and deliver into multiple availability zones is trivial with hardly any extra setup steps required.
In this quickstart, the hub shares its configuration database with the source database. In a production environment, this hub database could be installed on a separate server, and as needed a separate agent may be installed on the source database server(s).
The example in this quickstart shows replication setup from an existing on-premises Oracle database running on Linux, to an existing AWS Redshift target, moving data into Redshift via an S3 bucket, with an HVR hub running on-premises on Linux using a separate schema in the Oracle source database as its hub database repository.
The following steps are required to set up HVR replication:
- Step 1: Obtain the HVR for AWS AMI
- Step 2: Create the Hub/Repository Database On-premises
- Step 3: Install the HVR software on-premises
- Step 4: Create the target location (Redshift)
- Step 5: Create the source location (Oracle)
- Step 6: Create the replication configuration (HVR channel)
- Step 7: Initialize the channel
- Step 8: Create and populate the target tables
- Step 9: Start change data capture replication
Each step is described in detail below.
Step 1: Obtain the HVR for AWS AMI
Before you can launch a new AMI, you must have an EC2 key pair which will be used to securely access the AMI. If you have an existing EC2 key pair defined, you can use that. You will need the private key file (<key-pair-name>.pem) on you client system in order to connect to the AMI.
If you don’t have an existing key pair defined you can create one using the ‘Key Pairs’ sub-menu within the EC2 Dashboard.
Once created, the associated private key file will be downloaded through your web browser to your local computer.
- To obtain HVR for AWS AMI, sign into the AWS portal, go to the AWS Marketplace, in the search bar search for “HVR” and select “HVR for AWS”. For optimum efficiency select the region you wish the HVR AMI to run in, this should be the same region as your target database in AWS, in this case the Redshift cluster.
- Click “Continue”.
- Review the ‘Region’ setting. Make sure this is the same region in which your target database is running.
- Review the ‘Key Pair’ setting. The ‘Key Pair’ should be the key pair you intend using to access this AMI.
- AWS will show you the approximate cost of running Next, click “Accept Software Terms & Launch with 1-Click”.
- You will be shown the regular costs of the AMI you are going to deploy. A 7-day free trial is included with HVR.
- You will now see the AMI being deployed in the Marketplace. This may take a few minutes. Once it is finished, the details of the created AMI can be accessed in the AWS EC2 Console. Note down the Public DNS name or IP address of the AMI as this will be required when connecting to the AMI.
- Once the AMI has been deployed and is running you can start an ssh session to the new AMI instance with the following command:
ssh -i <key-pair-name>.pem ec2-user@<Public-DNS-name>
Note. If you are using PuTTY as your ssh client, you will have to convert the .pem file to a PuTTY private key file (.ppk) using the puttygen utility.
Once logged in, the banner displayed (see example below) provides you with the file location of the SSL public certificate you need for an on-premises HVR installation to connect to the HVR AMI and the location on the AMI from which you can download the HVR software to install on-premises.
In this quickstart guide, we assume the HVR hub will be run on a Linux system with the HVR hub/repository database residing in a separate schema on the Oracle source database.
Use sftp, either on a command line or through a utility like winscp, to copy the public certificate .pub_cert file to your on-premises hub machine and to download the Linux and Windows HVR packages.
The Linux package should be placed on the on-premises Oracle source database server and the Windows package on the on-premises Windows laptop/desktop to run the GUI. Note the GUI can also run on a Mac or on a Linux-based system.
Step 2: Create the Hub/Repository Database On-premises
The Oracle source database will probably already exist, but the hub database tables need to be created. It is recommended that a dedicated database user is used for HVR to keep all the hub tables owned by one user. This example uses the Oracle user ‘hvr’ to own the HVR repository tables and the user ‘demo’ to own the tables we will be replicating. These users require the following permissions to be granted:
grant create session to hvr;
grant create table to hvr;
grant create trigger to hvr;
grant create procedure to hvr;
grant create session to demo;
grant create table to demo;
grant create trigger to demo;
grant create procedure to demo;
grant select any dictionary to demo;
grant select any transaction to demo;
Refer to Chapter 2 of the HVR User Manual for instructions for other databases supported by HVR. Note a pdf of the User Manual is installed in the doc directory below the location identified by HVR_HOME.
Optional step: Create source tables using demo channel definitions
If you created an empty source database for your first HVR environment, you need to populate it with tables. For a quick start, HVR includes in the distribution a few example replication setups, called “demo channels”. Please find them in the directory %HVR_HOME%demo. Demo01 contains two tables, dm01_product and dm01_order, which can be loaded into your channel. You may do so by using the following commands:
Demo scripts are also provided for a variety of other databases.
Step 3: Install the HVR Software On-premises
The HVR installation itself is identical whether you plan to use the installation as the hub, as an agent, or to run the GUI.
A lightweight process, the HVR remote listener, has to run on the Oracle source database server. The HVR GUI communicates with this process and for an agent installation the agent performs the database access on behalf of the hub. In this quick start we are using the Linux user hvr to run this agent, and installing the hvr software in /home/hvr/hvr_home.
Copy the HVR for Linux package you downloaded in Step 1 to a temporary directory on the Oracle server.
Login as user hvr and create 3 directories for HVR’s use:
Set the required HVR environment variables:
Set the required Oracle environment variables:
export ORACLE_HOME= export LD_LIBRARY_PATH=$HVR_HOME/lib:$ORACLE_HOME/lib
Consult Chapter 2 of the HVR User Manual for information on connecting to source databases other than Oracle. Unzip the software:
tar –xvzf /hvr-5.2.1_2.3-linux_glibc2.5-x64-64bit.tar.gz
Start the hvr agent daemon to listen for connections from the HVR gui on port 4343. Note this command disables password authentication for connections to the hub. Review chapter 3 of the User Manual for instructions on how to secure the connection.
$HVR_HOME/bin/hvrremotelistener -dN 4343
To perform the installation on a Windows-based laptop or desktop, place the HVR Windows installer you downloaded in step 1 on the machine that will be running the GUI. Run the installer, follow the installed wizard accepting the defaults.
Copy the ssl public .pub_cert file you downloaded from the AMI in Step 1 and place it into /home/hvr/hvr_home/lib/cert on your on-premises Linux-based hub machine. This will be used later in step 6.
First Start of HVR
Start the HVR GUI by clicking its icon. On first start HVR will ask for the credentials of the hub database (“register hub”). Select ‘Oracle’ as database class and enter the details for your hub database you just created. ORACLE_HOME points to the local Oracle client installation. You can either connect to the Oracle database by providing an SID or a TNS name. Leave “connect to HVR on a remote machine unchecked”. Once you click the ‘Connect’ button, HVR will create the necessary HUB tables in the Oracle database and be ready for use.
Refer to the Quick Start appendix in the HVR User Manual for connecting to database types other than Oracle.
Note HVR will complain that no license key is found. The license key is part of the AMI you created and will be referenced once we define the connection into AWS.
Step 4: Create the Target Location (Redshift)
This quickstart describes configuring Amazon Redshift as the HVR target database, although many other AWS targets are supported in both the RDS and user managed scenarios.
With AWS Redshift, there needs to be a target Redshift cluster with an existing database and an S3 bucket available. For this quickstart, we’re using an S3 bucket called sw-hvr-bucket and a Redshift cluster called sw-cluster and a database called swdb.
Note this location will be referencing the cloud license that is included with the AMI.
In the HVR hub, create a new Location Configuration, by right clicking ‘Location Configuration’ and selecting ‘New Location’. Fill in the details as follows:
Once all the connection details are complete, click on the ‘Test Connection’ button. A dialog indicating a successful connection will be displayed.
Next, configure the properties describing the S3 bucket staging location. Double-click the LocationProperties entry for ‘rshft’ in the Actions pane.
Add the additional properties show below:
For optimum high availability repeat these steps with another Redshift database and S3 bucket in a different availability zone.
Step 5: Create the Source Location (Oracle)
To create a location for your on-premises source database, right click on ‘Location Configuration’ and select ‘New Location’. The dialog for location properties appears. In this quick start the Oracle source database is on the same machine as your hub, so the Location Configuration should be set up as follows:
Instructions for configuring connections to databases other than Oracle can be found in the Quickstart appendix for each database of the HVR User Manual.
Press the ‘Test Connection’ button. A dialog indicating successful connection will be displayed.
Step 6: Create the Replication Configuration (HVR channel)
The replication configuration can be found under “Channel Definitions”. It should be empty. Create your first channel by right clicking selecting “Channel Definitions” and selecting “New Channel”. Fill in the channel name and description, then click ‘OK’
The quickstart Channel should now be listed with 2 sub-items, “Location Groups” and “Tables”.
We are going to add location groups, actions and tables to the channel to complete the setup:
First create 2 location groups, one for source and one for target, called SOURCE and TARGET respectively, by right-clicking “Location Groups” and selecting “New Group.” For SOURCE, select your Oracle source database location to be part of the group, for TARGET your Redshift database location.
For a high availability installation with multiple Redshift databases in different availability zones simply check the box for multiple physical targets.
Next, we’ll add the actions required to tell HVR what to do on the locations. For the SOURCE location group, the action is Capture, for the TARGET it is Integrate.
Right-click on the SOURCE location group, press “New Action” and select “Capture”. Make sure the top part of the dialog shows Channel quickstart, Group SOURCE, Table *. This is the context for this action.
Next, right click on the TARGET location group and select the Action “Integrate”.
Please check the option /Burst Integrate action catalog. This forces the operations to Redshift to be performed in performance-optimized mode transferring all data through S3 into Redshift before applying it to the target.
You will be presented a list of tables, select the ones to include in the channel (shift + click to select multiple tables)
Click Add, then OK on the next confirmation dialog, then close the table selection window.
Step 7: Initialize the Channel
Now the configuration of the channel is finished and all it left to let HVR create the necessary jobs and structures to replicate. Choose “HVR Initialize” from the menu by right-clicking on the channel name. Leave the radio button on Create or Replace Objects and all locations checked and press “Initialize”.
At this point, the replication jobs are created but not started.
If you set up multiple targets across availability zones then you will see multiple integrate jobs, one for each target.
Step 8: Create and Populate the Target Tables
If your source database already contains data, you will want to make sure this data is on your target as well. HVR can be used to create the necessary tables on the target to hold this data (target DDL creation), and perform the initial data load.
Choose HVR Refresh and select your target (the source will be preselected). Check “create absent tables” “if mismatched” for target DDL creation and then press “Refresh”.
Step 9: Start Change Data Capture Replication
Go to the scheduler and start the jobs. The scheduler must be running for the jobs to start and HVR will prompt you to start the scheduler if it isn’t already running.
Your replication is now running and will replicate change data as it occurs on your Oracle source tables.
With replication running there are a few extra steps that can be explored:
- Compare the changes using HVR Compare. Use the pop-up menu on the channel definition to invoke the compare function.
- Inspect replication statistics. Use the pop-up menu on the Scheduler node and invoke Statistics.
- Set up automated management tasks and alerts: use the pop-up menu on the Scheduler node and invoke Maintenance Tasks.