Quick Start for HVR into Redshift
This appendix shows how to set up an HVR channel (called hvr_demo01) to replicate between a local Oracle database and an Amazon Redshift MPP database residing in the Amazon cloud (AWS). To connect efficiently to AWS, an HVR agent installed on an EC2 Linux VM is used. To use HVR Bulk Refresh and Integrate with Burst, an additional AWS S3 location is created as well; it functions as an intermediate staging area from which data is loaded using Redshift's fast "COPY FROM" feature.
The steps start by creating new databases and tables for HVR to replicate between. In real life these databases would already exist and contain the user tables, but for simplicity everything here is created from scratch. Also for simplicity, we assume the source database resides on the hub machine and that Oracle is already installed there.
Before following this quickstart, please make sure the requirements have been met; see Requirements for Redshift.
- 1 Create Test Databases and Tables
- 2 Install HVR on-premises
- 3 Install HVR remote listener agent using HVR Image for AWS
- 4 Create the Hub Database
- 5 Connect To Hub Database
- 6 Create Oracle and Redshift Locations
- 7 Create Location Groups
- 8 Define Actions
- 9 Perform Initial Loading and Table Creation
- 10 Enable Replication with HVR Initialize
- 11 Start Scheduling of Replication Jobs
- 12 Test Replication
- 13 HVR Compare and Refresh
Create Test Databases and Tables
In the AWS portal, create a Redshift database using AWS -> Redshift Cluster -> Launch with cluster identifier hvrcluster and database name testdb2.
If you don't have a VPC and security group yet, you will create them during the Additional Configuration step. You can use the defaults to create a minimal cluster or choose a multi-node cluster. After creation, you can either create the tables in the cluster by running the .cre and .mod scripts through the SQL Workbench console, or let HVR create them during initial loading (HVR Refresh with Create Absent Tables).
To use Bulk Refresh and Burst Integrate, HVR requires an additional S3 location from which data is loaded into Redshift using the "COPY FROM" feature. In the AWS portal, create an S3 bucket using AWS -> S3 -> Create Bucket with identifier Copylocation_testdb2. Create it in the same region as your Redshift database.
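If you prefer the command line, the bucket can also be created with the AWS CLI. This is a sketch assuming the AWS CLI is installed and configured, and that the cluster runs in eu-west-1; substitute your own region. Note that S3 bucket names must be lowercase and globally unique, so a name like copylocation-testdb2 is used here as an example:

```shell
# Create the staging bucket in the same region as the Redshift cluster.
# Bucket name and region are examples; adjust to your environment.
aws s3api create-bucket \
    --bucket copylocation-testdb2 \
    --region eu-west-1 \
    --create-bucket-configuration LocationConstraint=eu-west-1

# Verify the bucket exists and confirm its region
aws s3api get-bucket-location --bucket copylocation-testdb2
```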
Install HVR on-premises
First read the Introduction section, which explains HVR's terminology and architecture. In particular, it explains the importance of a hub database.
Then install the HVR software on the hub machine by following the installation steps in section Installing HVR on Windows or Installing HVR on Unix or Linux. If the hub machine is a Unix machine then HVR can either be installed on a Windows PC (so the HVR GUI can run on the PC and connect to the Unix hub machine) or the HVR GUI can be run on the Unix hub machine and connect back to an X server running on a PC.
This Quickstart assumes the Oracle database on the hub server is also the source database. Most real-time integration scenarios use log-based capture. To enable log-based capture, configure the following:
- The user name that HVR uses must be in Oracle's dba group. On Unix and Linux this can be done by adding the user name used by HVR to the line in /etc/group that begins with dba. On Windows, right–click My Computer and select Manage ▶ Local Users and Groups ▶ Groups ▶ ora_dba ▶ Add to Group ▶ Add.
- The Oracle instance should have archiving enabled. Archiving can be enabled by running the following statement as sysdba against a mounted but unopened database: alter database archivelog. The current state of archiving can be checked with query select log_mode from v$database.
The current archive destination can be checked with query select destination, status from v$archive_dest. By default, this will return values USE_DB_RECOVERY_FILE_DEST, VALID, which is inside the flashback recovery area. Alternatively, an archive destination can be defined with the following statement: alter system set log_archive_dest_1='location=/disk1/arc' and then restart the instance.
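Putting the archiving steps above together, a typical sysdba session to enable archive logging looks like the following sketch; the shutdown/mount/open sequence is standard Oracle practice for changing the archivelog mode:

```shell
# Run as an OS user in the dba group; the database must be restarted
# into MOUNT state before archivelog mode can be changed.
sqlplus / as sysdba <<'EOF'
shutdown immediate;
startup mount;
alter database archivelog;
alter database open;
-- confirm the new state and the archive destination
select log_mode from v$database;
select destination, status from v$archive_dest where dest_id = 1;
EOF
```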
Install HVR remote listener agent using HVR Image for AWS
If you don't have the HVR Image for AWS available, you can obtain it by following the steps in New Installation of HVR Image for AWS, or install manually: see Install HVR Agent on Amazon AWS manually.
The HVR Image for AWS will be used to install the HVR remote listener agent on a Linux VM.
- Go to the AWS console, open the EC2 section and press Create Instance.
- In Step 1. Choose AMI, select My AMIs and choose the HVR Agent AMI for your desired location. Create the instance in the same region as the Redshift database and S3 bucket created earlier:
- In Step 2. Choose Instance Type select type t2.micro:
- In Step 3. Configure Instance, select a preferred network (VPC) subnet if appropriate or use the default. Name the instance and select a security group:
- In the next two steps, 4. Add Storage and 5. Tag Instance, keep the defaults.
- HVR uses TCP/IP connectivity on port 4343. In step 6. Configure Security Group, configure an incoming rule for the HVR listener daemon on this port and limit the IP range as narrowly as possible. When done, click Review and Launch:
- In the Amazon console retrieve the instance IP address:
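The port-4343 rule from step 6 can also be added later with the AWS CLI. A sketch, where sg-0123456789abcdef0 and 203.0.113.10/32 are placeholder values for your security group ID and the hub machine's address:

```shell
# Allow the hub machine to reach the HVR listener on the EC2 agent.
# Restrict the CIDR to the narrowest range that covers the hub.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 4343 \
    --cidr 203.0.113.10/32
```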
Create the Hub Database
Create the hub database, in which the HVR GUI will store the channel definition. This is actually another user/schema in the Oracle instance.
$ sqlplus system/manager

SQL> create user hvrhub identified by hvr
  2  default tablespace users
  3  temporary tablespace temp
  4  quota unlimited on users;

SQL> grant create session to hvrhub;
SQL> grant create table to hvrhub;
SQL> grant create sequence to hvrhub;
SQL> grant create procedure to hvrhub;
SQL> grant create trigger to hvrhub;
SQL> grant create view to hvrhub;
SQL> grant execute any procedure to hvrhub;
Connect To Hub Database
Start the HVR GUI on a PC by clicking on the HVR GUI icon (this is created by the HVR Installer for Windows) or by running hvrgui on Linux.
First, Register the hub database: right–click on hub machines ▶ Register hub. Enter connection details.
In this example the hub is a machine called guam, where an INET daemon is listening on port 4343. See section Installing HVR on Unix or Linux for how to configure this.
For a new hub database a dialog will prompt Do you wish to create the catalogs?; answer Yes.
Create Oracle and Redshift Locations
Next create the source location using right–click on Location Configuration ▶ New Location.
For the source, enter the credentials of the Oracle database testdb1 created earlier. The corresponding HVR location will be called db1:
In this example there is no need to check Connect to HVR on remote machine because testdb1 is on the same machine as the hub.
Ignore the Group Membership tab for now.
Create another location for the Redshift database testdb2 and call it db2. This time, tick Connect to HVR on remote machine, because HVR needs to connect to the HVR remote listener agent installed earlier. Fill in the details of the VM running the remote listener there. Use operating system user hvr with any password. Choose Redshift as the database and fill in the Redshift connection info.
Create Location Groups
First create a location group called CENTRAL. Add location db1 as a member of this group by checking the box for db1.
Then create a second location group, called DECENTRAL, that has db2 as a member.
The new channel also needs a list of tables to replicate. This can be done as follows: right–click on Tables ▶ Table Explore.
- Choose the first of the three locations ▶ Connect.
- In the Table Explore window, click on both tables and click Add.
- In the new dialog HVR Table Name, click OK.
- Close the Table Explore window.
- Perform Table Explore again on the other location and confirm that all tables to be replicated show value Same in the Match column.
The new channel needs two actions to indicate the direction of replication and a third action for the S3 copy location needed for the "Copy from" feature:
- Right–click on group CENTRAL ▶ New Action ▶ Capture.
- Right–click on Group DECENTRAL ▶ New Action ▶ Integrate. Check /Burst; this enables HVR to use Redshift's "COPY FROM" feature, which significantly increases integrate speed.
- Right–click on Group DECENTRAL ▶ New Action ▶ LocationProperties. Check both /StagingDirectoryHvr and /StagingDirectoryDb with argument s3://Copylocation_testdb2/. Check /StagingDirectoryCredentials with argument 'aws_access_key_id=<key>;aws_secret_access_key=<secret_key>'.
These keys were created along with your AWS account. You can manage keys for specific users at AWS -> Identity & Access Management -> Users -> Security Credentials. The keys should belong to a user with full S3 access, e.g. the user that created the bucket in the first place.
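If you need a fresh key pair for the staging user, one can be generated with the AWS CLI. A sketch assuming an IAM user named hvr-staging (a hypothetical name) that has full access to the bucket:

```shell
# Returns AccessKeyId and SecretAccessKey; store them securely,
# as the secret is shown only once.
aws iam create-access-key --user-name hvr-staging

# The returned values then go into the /StagingDirectoryCredentials
# argument, in the form:
#   'aws_access_key_id=<key>;aws_secret_access_key=<secret_key>'
```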
Note that the Actions pane only displays actions related to the objects selected in the left–hand pane. So click on channel hvr_demo01 to see all three actions.
Perform Initial Loading and Table Creation
HVR Refresh copies the data from one location to another location and optionally creates missing or mismatched tables and keys. In the HVR GUI, right–click on the channel and select HVR Refresh.
For the source select location db1 and for the target check location db2. Check the options Create Absent Tables, With Key, and Recreate Mismatched Tables, then click Refresh.
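HVR also ships command-line equivalents of the GUI dialogs. The exact flags vary between HVR versions, so treat the following as a sketch of the general pattern rather than exact syntax; consult hvrrefresh's built-in help on your installation:

```shell
# General pattern: hvrrefresh [options] hubdb channel
# -r names the source (read) location; hubdb is the hub user/password.
# Flags differ between HVR versions, so verify against 'hvrrefresh -h'.
hvrrefresh -r db1 hvrhub/hvr hvr_demo01
```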
Enable Replication with HVR Initialize
Now that the channel definition is complete, create the runtime replication system.
Right–click on channel hvr_demo01 ▶ HVR Initialize. Choose Create or Replace Objects and click HVR Initialize.
From the moment that HVR Initialize is done, all changes to database testdb1 will be captured by HVR when its capture job scans the Oracle logging.
HVR Initialize also creates the replication jobs, which can be seen under the Scheduler node in the GUI.
Start Scheduling of Replication Jobs
Next, instruct the HVR Scheduler to trigger the replication jobs.
The replication jobs inside the Scheduler each execute a script under $HVR_CONFIG/job/hvrhub/hvr_demo01 that has the same name as the job. So job hvr_demo01–cap–db1 detects changes on database testdb1 and stores these as transactions files on the hub machine.
The other job (hvr_demo01–integ–db2) picks up these transaction files and performs inserts, updates and deletes on the target database.
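The scheduler and jobs can also be driven from the command line. Again the exact options vary by HVR version, so this is only a sketch of the general pattern:

```shell
# General pattern: hvrscheduler hubdb   and   hvrstart [options] hubdb job...
# hubdb is the Oracle hub user/password; check 'hvrstart -h' on your
# installation for the exact options.
hvrscheduler hvrhub/hvr
hvrstart hvrhub/hvr hvr_demo01-cap-db1 hvr_demo01-integ-db2
```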
Test Replication
To test replication, make a change in testdb1:
testdb1/hvr

SQL> insert into dm01_product values (1, 19.99, 'DVD');
SQL> commit;
In the HVR log file you can see the output of the jobs by clicking on View Log. This log file can be found in $HVR_CONFIG/log/hubdb/hvr_demo01–cap–db1.
The job output looks like this:
hvr_demo01–cap–db1: Scanned 1 transaction containing 1 row (1 ins) for 1 table.
hvr_demo01–cap–db1: Routed 215 bytes (compression=40.6%) from 'db1' into 1 location.
hvr_demo01–cap–db1: Capture cycle 1.
hvr_demo01–integ–db2: Integrate cycle 2 for 1 transaction file (215 bytes).
hvr_demo01–integ–db2: Integrated 1 change from 'dm01_product' (1 ins).
hvr_demo01–integ–db2: Integrate used 1 transaction and took 0.004 seconds.
hvr_demo01–integ–db2: Waiting...
This indicates that the jobs replicated the original change to testdb2. A query on testdb2 confirms this:
SQL> select * from dm01_product;
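Redshift is PostgreSQL-compatible, so the target table can also be checked with psql. A sketch with placeholder endpoint and user values; find your cluster's endpoint on its detail page in the AWS console:

```shell
# Default Redshift port is 5439; host and user are placeholders.
psql -h hvrcluster.abc123xyz.eu-west-1.redshift.amazonaws.com \
     -p 5439 -U master -d testdb2 \
     -c "select * from dm01_product;"
```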
HVR Compare and Refresh
HVR Compare checks whether two locations have identical rows, and HVR Refresh copies the content of one location to the second location. In the HVR GUI, right–click on a channel ▶ HVR Compare (or HVR Refresh). Choose two locations by clicking on the Select buttons.