Quick Start for HVR into Redshift


This appendix shows how to set up an HVR channel (called hvr_demo01) to replicate between a local Oracle database and an Amazon Redshift MPP database residing in the Amazon cloud (AWS). To connect efficiently to AWS, an HVR agent installed on an EC2 Linux VM is used. To use HVR Bulk Refresh and Integrate with Burst, an additional AWS S3 location is created as well; it functions as an intermediate staging area from which data is loaded using Redshift's ultra-fast "Copy from" feature.

The steps start by creating new databases and tables for HVR to replicate between. In real life these databases would already exist and contain the user tables, but for simplicity everything is created from scratch. Also for simplicity, we assume the source database resides on the hub machine and that Oracle has already been installed there.

Before following this quickstart, please make sure the requirements have been met; see Requirements for Redshift.

WD-Quickstart-for-Redshift.png

Create Test Databases and Tables

Generally, when getting started with HVR, a source schema with tables and data already exists. If so, this step can be skipped.

This Quickstart uses two empty tables named dm01_product and dm01_order. In an existing Oracle database, create a test schema and create the tables using the following commands.
Make sure the database is set up for log-based capture as described in Grants for Log-Based Capture Database (Oracle).

$ sqlplus system/manager
SQL> create user testdb1 identified by hvr
 2  default tablespace users
 3  temporary tablespace temp
 4  quota unlimited on users;
 
SQL> grant create session to testdb1;
SQL> grant create table to testdb1;
SQL> grant create sequence to testdb1;
SQL> grant create procedure to testdb1;
SQL> grant create trigger to testdb1;
SQL> grant create view to testdb1;
SQL> grant execute any procedure to testdb1;

Create the test tables.

$ cd $HVR_HOME/demo/hvr_demo01/base/oracle
$ sqlplus testdb1/hvr < hvr_demo01.cre
$ sqlplus testdb1/hvr < hvr_demo01.mod

In the AWS portal, create a Redshift database using AWS -> Redshift Cluster -> Launch with cluster identifier hvrcluster and database name testdb2.
SC-AWS-Console Redshift Create.png
If you don't have a VPC and security group yet, you will create them during the Additional Configuration step.
SC-AWS-Console Redshift Config Additional.png

You can use the defaults to create a minimal cluster or choose a multi-node cluster. After creation, you can either create the tables in the cluster using the above .cre and .mod scripts through the SQL Workbench console, or let HVR create them during initial loading (HVR Refresh with Create Absent Tables).
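If you choose to create the tables manually through SQL Workbench, the DDL is a sketch like the following. The column definitions shown here are hypothetical, inferred from the demo data displayed later in this Quickstart; the authoritative definitions are in the hvr_demo01.cre script.

```sql
-- Hypothetical sketch of the demo tables; the real definitions come
-- from $HVR_HOME/demo/hvr_demo01/base/oracle/hvr_demo01.cre.
create table dm01_product (
    prod_id      integer       not null,
    prod_price   decimal(10,2),
    prod_descrip varchar(100),
    primary key (prod_id)
);

-- dm01_order is created analogously, following the same script.
```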


In order to use Bulk Refresh and Burst Integrate, HVR requires an additional S3 location from which to load data into Redshift using the "Copy from" feature. In the AWS portal, create an S3 location using AWS -> S3 -> Create Bucket with identifier Copylocation_testdb2. Create it in the same region as your Redshift database.

SC-AWS-CreateBucket Copylocation testdb2.png

Install HVR on-premises

First read section Introduction, which explains HVR's terminology and architecture. In particular it explains the importance of a hub database.

Then install the HVR software on the hub machine by following the installation steps in section Installing HVR on Windows or Installing HVR on Unix or Linux. If the hub machine is a Unix machine, then the HVR GUI can either be installed on a Windows PC (so it runs on the PC and connects to the Unix hub machine) or run on the Unix hub machine itself and connect back to an X server running on a PC.

This Quickstart assumes the Oracle Database on the hub server is also the source database. Most real-time integration scenarios use log-based capture. To enable log-based capture, configure the following:

  • The user name that HVR uses must be a member of Oracle's dba group. On Unix and Linux this can be done by adding the user name used by HVR to the line in /etc/group that begins with dba. On Windows right–click My Computer and select Manage ▶ Local Users and Groups ▶ Groups ▶ ora_dba ▶ Add to Group ▶ Add.
  • The Oracle instance should have archiving enabled. Archiving can be enabled by running the following statement as sysdba against a mounted but unopened database: alter database archivelog. The current state of archiving can be checked with query select log_mode from v$database.

The current archive destination can be checked with query select destination, status from v$archive_dest. By default, this returns the values USE_DB_RECOVERY_FILE_DEST, VALID, which is inside the flash recovery area. Alternatively, an archive destination can be defined with the following statement: alter system set log_archive_dest_1='location=/disk1/arc' and then restart the instance.
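As a sketch, the checks and the archiving switch described above can be combined in one SQL*Plus session as sysdba (the shutdown/startup sequence is the standard Oracle procedure for switching to archivelog mode; the archive destination path is only an example):

```sql
-- Check the current state of archiving and the archive destination.
select log_mode from v$database;
select destination, status from v$archive_dest;

-- Enable archiving; the database must be mounted but not open.
shutdown immediate
startup mount
alter database archivelog;
alter database open;
```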

Install HVR remote listener agent using HVR Image for AWS

For installing the HVR remote listener agent on a Linux VM, the HVR Image for AWS will be used.
If you don't have the HVR Image for AWS available, you can obtain it by following the steps in New Installation of HVR Image for AWS, or do the install manually: Install HVR Agent on Amazon AWS manually.

  1. Go to the AWS console, open the EC2 section and press Create Instance.
  2. In Step 1. Choose AMI, open the My AMIs tab and select the HVR Agent AMI for your desired region. Create the instance in the same region as the earlier created Redshift database and S3 storage bucket:
     SC-AWS-Console Choose Image.png SC-AWS-Console Select AMI.png

  3. In Step 2. Choose Instance Type, select type t2.micro:
     SC-AWS-Console Choose Instance Type.png

  4. In Step 3. Configure Instance, select a preferred network (VPC) subnet if appropriate or use the default. Name the instance and select a security group:
     SC-AWS-Console Configure Instance.png

  5. In the next two steps, 4. Add Storage and 5. Tag Instance, keep the defaults.
  6. HVR uses TCP/IP connectivity on port 4343. In step 6. Configure Security Group, configure an incoming connection for the HVR listener daemon on this port and limit the IP range as narrowly as possible. When done, click Review and Launch:
     SC-AWS-Console Security Group.png

  7. In the Amazon console, retrieve the instance IP address:
     SC-AWS-Console ShowInstances.png

Create the Hub Database

Create the hub database, in which the HVR GUI will store the channel definition. This is actually another user/schema in the Oracle instance.

$ sqlplus system/manager
SQL> create user hvrhub identified by hvr
 2  default tablespace users
 3  temporary tablespace temp
 4  quota unlimited on users;
 
SQL> grant create session to hvrhub;
SQL> grant create table to hvrhub;
SQL> grant create sequence to hvrhub;
SQL> grant create procedure to hvrhub;
SQL> grant create trigger to hvrhub;
SQL> grant create view to hvrhub;
SQL> grant execute any procedure to hvrhub;


Connect To Hub Database

Start the HVR GUI on a PC by clicking on the HVR GUI icon (this is created by the HVR Installer for Windows) or by running hvrgui on Linux.

First, Register the hub database: right–click on hub machines ▶ Register hub. Enter connection details.

SC-Hvr-RegisterHub Oracle remote 4343.png
In this example the hub is a machine called guam, where an INET daemon is listening on port 4343. See section Installing HVR on Unix or Linux for how to configure this.

For a new hub database a dialog will prompt Do you wish to create the catalogs?; answer Yes.

Create Oracle and Redshift Locations

Next create the source location using right–click on Location Configuration ▶ New Location. For the source, enter the credentials of the Oracle database testdb1 created earlier. The corresponding HVR location will be called db1:
SC-Hvr-Location Oracle db1.png

In this example there is no need to check Connect to HVR on remote machine because testdb1 is on the same machine as the hub.
Ignore the Group Membership tab for now.

Create another location for the Redshift database testdb2 and call it db2. This time, do tick Connect to HVR on remote machine, because HVR needs to connect to the HVR remote listener agent installed earlier. Fill in the details of the VM running the remote listener. Use operating system user hvr with any password. Choose Redshift as the database and fill in the Redshift connection info.


SC-Hvr-Location Redshift decre agentVM.png


Create Location Groups

First create the channel: right–click on Channel Definitions ▶ New Channel and name it hvr_demo01. The channel needs two location groups. Under the new channel: right–click on Location Groups ▶ New Group. Enter a group name (for instance CENTRAL).
SC-Hvr-LocationGroup demo01 CENTRAL generic.png

Add location db1 as a member of this group by checking the box for db1.

Then create a second location group, called DECENTRAL, that has db2 as a member.

The new channel also needs a list of tables to replicate. This can be done as follows: right–click on Tables ▶ Table Explore.

  • Choose the first of the two locations ▶ Connect.
  • In the Table Explore window, click on both tables and click Add.
  • In new dialog HVR Table Name click OK.
  • Close the Table Explore window.
  • Perform Table Explore again on the other location and confirm that all tables to be replicated have value Same in column Match.

Define Actions

The new channel needs two actions to indicate the direction of replication and a third action for the S3 copy location needed for the "Copy from" feature:

  • Right–click on group CENTRAL ▶ New Action ▶ Capture.
  • Right–click on Group DECENTRAL ▶ New Action ▶ Integrate. Check /Burst; this enables HVR to use the "Copy from" feature of Redshift, which significantly increases the integrate speed.
  • Right–click on Group DECENTRAL ▶ New Action ▶ LocationProperties. Check both /StagingDirectoryHvr and /StagingDirectoryDb with argument s3://Copylocation_testdb2/. Check /StagingDirectoryCredentials with argument aws_access_key_id=<key>;aws_secret_access_key=<secret_key>.

These keys were created together with your AWS account. You can manage keys for specific users at AWS -> Identity & Access Management -> Users -> Security Credentials. The keys should belong to a user with full S3 access, e.g. the user that created the bucket in the first place.
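With these actions in place, HVR stages each batch of changes as files in the S3 bucket and loads them into Redshift with the "Copy from" feature. For illustration only, the statement HVR issues is of roughly this shape (the staged file name is a placeholder, and the exact statement HVR generates may differ):

```sql
-- Illustrative sketch of a Redshift bulk load from the staging bucket.
copy dm01_product
from 's3://Copylocation_testdb2/<staged_file>'
credentials 'aws_access_key_id=<key>;aws_secret_access_key=<secret_key>';
```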


SC-Hvr-Gui Channel demo01-LocationProperties.png

Note that the Actions pane only displays actions related to the objects selected in the left–hand pane. So click on channel hvr_demo01 to see all three actions.

Perform Initial Loading and Table Creation

HVR Refresh copies the data from one location to another location and optionally creates missing or mismatched tables and keys. In the HVR GUI, right–click on the channel and select HVR Refresh.

SC-Hvr-Refresh demo01 create absent tables with key recreate.png

For the source, select location db1; for the target, check location db2. Check the options Create Absent Tables, With Key, and Recreate Mismatched Tables, and click Refresh.

Enable Replication with HVR Initialize

Now that the channel definition is complete, create the runtime replication system.

Right–click on channel hvr_demo01 ▶ HVR Initialize. Choose Create or Replace Objects and click HVR Initialize.

SC-Hvr-InitializeFinished demo01.png

From the moment that HVR Initialize is done, all changes to database testdb1 will be captured by HVR when its capture job scans the database logging.

HVR Initialize also creates two replication jobs, which can be seen under the Scheduler node in the GUI.

Start Scheduling of Replication Jobs

Start the Scheduler on the hub machine by clicking in the HVR GUI on the Scheduler node of the hub database.
SC-Hvr-Gui-Scheduler oracle.png

Next, instruct the HVR Scheduler to trigger the replication jobs.

SC-Hvr-Gui demo01 Start 2locs.png

The replication jobs inside the Scheduler each execute a script under $HVR_CONFIG/job/hvrhub/hvr_demo01 that has the same name as the job. So job hvr_demo01–cap–db1 detects changes on database testdb1 and stores these as transactions files on the hub machine.

The other job (hvr_demo01–integ–db2) picks up these transaction files and performs inserts, updates and deletes on the target database.

Test Replication

To test replication, make a change in testdb1:

$ sqlplus testdb1/hvr
SQL> insert into dm01_product values (1, 19.99, 'DVD');
SQL> commit;

You can see the output of the jobs by clicking View Log. The log file can be found in $HVR_CONFIG/log/hvrhub/hvr_demo01–cap–db1.

SC-Hvr-Gui demo01 viewlog.png

The job output looks like this:

hvr_demo01–cap–db1: Scanned 1 transaction containing 1 row (1 ins) for 1 table.
hvr_demo01–cap–db1: Routed 215 bytes (compression=40.6%) from 'db1' into 1 location.
hvr_demo01–cap–db1: Capture cycle 1.
hvr_demo01–integ–db2: Integrate cycle 2 for 1 transaction file (215 bytes).
hvr_demo01–integ–db2: Integrated 1 change from 'dm01_product' (1 ins).
hvr_demo01–integ–db2: Integrate used 1 transaction and took 0.004 seconds.
hvr_demo01–integ–db2: Waiting...

This indicates that the jobs replicated the original change to testdb2. A query on testdb2 confirms this:

SQL> select * from dm01_product;
prod_id  prod_price  prod_descrip
-------  ----------  ------------
      1       19.99  DVD

HVR Compare and Refresh

HVR Compare checks whether two locations have identical rows, and HVR Refresh copies the content of one location to the other. In the HVR GUI, right–click on the channel ▶ HVR Compare (or HVR Refresh). Choose two locations by clicking the Select buttons.

SC-Hvr-Compare bulk generic 2locs.png

The outcome of the comparison is displayed below:
SC-Hvr-CompareResult demo01 identical.png