Requirements for AWS

Last updated on Jul 23, 2020

AWS (Amazon Web Services) is Amazon's cloud platform. It provides the following services relevant to HVR:

  • Amazon EC2 (Elastic Compute Cloud) instances are virtual machines in the AWS cloud. These VMs can be Linux- or Windows-based. This is "Infrastructure as a Service" (IaaS). HVR can run on an EC2 instance provided the OS is supported by HVR (Linux, Windows Server). This scenario is identical to running HVR on-premises in a data center.
  • Amazon Redshift is Amazon's highly scalable clustered data warehouse service. HVR supports Redshift as a target database, both for initial load/refresh and in Change Data Capture mode. For more information, see Requirements for Redshift.
  • Amazon RDS is Amazon's Relational Database Service. HVR supports MariaDB, MySQL, Aurora, Oracle, PostgreSQL, and Microsoft SQL Server running on Amazon RDS. Note that log-based capture is not supported for Microsoft SQL Server on Amazon RDS.
  • Amazon EMR (Elastic MapReduce) is Amazon's managed Hadoop service. It can be accessed using HVR's generic Hadoop connector. For more information, see Requirements for HDFS.
  • Amazon S3 storage buckets can be used as a staging area for loading data into Redshift, as a file location target (optionally with Hive external tables on top), or as a staging area for other databases (Hive ACID, Snowflake).

Architecture

HVR supports several configuration topologies when working with AWS. The most commonly used are:

  • A: Connecting to an AWS resource with the HVR hub installed on-premises. To avoid poor performance caused by low bandwidth and/or high network latency, the HVR Agent should be installed in AWS. Any instance size is sufficient for this use case, even a small type such as t2.micro.
  • B.1: Hosting the HVR hub in AWS to pull data from an on-premises source into AWS. For this use case, the hub database can be a separate RDS database of a type that HVR supports as a hub. The HVR Agent may be installed on an AWS EC2 instance and configured to connect to the hub database. For this topology (B.1), using the HVR Agent on EC2 is optional; however, it may perform better than connecting HVR to RDS remotely over the Internet. When the HVR Agent on EC2 connects to RDS, communication with the HVR hub over the HVR protocol is fast and largely unaffected by network latency.
  • B.2: Alternatively, the hub database can be installed on the EC2 VM itself.
  • C: Performing cloud-based real-time integration. HVR connects only to cloud-based resources, either in AWS or at other cloud providers.
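To make the differences between these topologies concrete, the following Python sketch classifies a deployment by where the hub, source, and target run. The `Place` and `Deployment` types, the `classify` function, and its rules are hypothetical simplifications of the descriptions above, not part of HVR; real deployments may combine these patterns.

```python
from dataclasses import dataclass
from enum import Enum


class Place(Enum):
    """Where a component runs (hypothetical categories for illustration)."""
    ON_PREMISES = "on-premises"
    AWS = "aws"
    OTHER_CLOUD = "other-cloud"


@dataclass(frozen=True)
class Deployment:
    hub: Place
    source: Place
    target: Place
    hub_db_on_rds: bool = False  # only meaningful when the hub runs in AWS


def classify(d: Deployment) -> str:
    """Map a deployment onto topology A, B.1, B.2 or C (simplified rules)."""
    cloud = {Place.AWS, Place.OTHER_CLOUD}
    # A: on-premises hub reaching an AWS resource (HVR Agent in AWS advised).
    if d.hub is Place.ON_PREMISES and Place.AWS in (d.source, d.target):
        return "A"
    # B.1 / B.2: hub in AWS pulling from on-premises; hub DB on RDS or on the VM.
    if d.hub is Place.AWS and d.source is Place.ON_PREMISES:
        return "B.1" if d.hub_db_on_rds else "B.2"
    # C: every component is cloud-based.
    if d.hub in cloud and d.source in cloud and d.target in cloud:
        return "C"
    raise ValueError("deployment not covered by the common topologies")
```

For example, an on-premises hub replicating into Redshift maps to `classify(Deployment(Place.ON_PREMISES, Place.ON_PREMISES, Place.AWS))`, which returns `"A"`.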

AMI Agent Configuration Notes

HVR provides a specially packaged edition, HVR Image for AWS: a pre-configured Linux AMI (Amazon Machine Image) containing HVR's remote listener agent, including Redshift and Oracle drivers. For more information, see Installing HVR on AWS using HVR Image.
Alternatively, an HVR Agent or hub can be set up manually as described in Installing HVR on UNIX or Linux, with the following notes:

  • A t2.micro instance is sufficient to run HVR as an agent. Running HVR as a hub requires at least a t2.medium instance because it needs more memory.
  • Open the firewall to allow remote TCP/IP HVR connections to the HVR installation in AWS (e.g. topology A), by default on port 4343. Restrict port access to the originator's public IP address. If the instance has to connect to an on-premises installation of HVR (topology B), then either (a) allow the HVR TCP/IP port through the on-premises firewall and set up DMZ port forwarding so that AWS can connect to the on-premises installation, or (b) configure a VPN.
  • When HVR runs as a hub, it needs temporary storage for replication data. Add this storage when creating the instance; 10 GB is normally sufficient.
  • Install the appropriate database drivers (e.g. Redshift, Oracle, SQL Server). For Redshift, follow the instructions in Requirements for Redshift. Download the Oracle Client installer from Oracle's website.
  • The hub database can be an RDS database service or a local installation of a supported database in the VM. Install the appropriate database drivers in the HVR hub instance to connect to the database.
  • An HVR hub running on an AWS Linux instance can be managed remotely from a Windows PC by registering the remote hub on that PC.
  • File replication is supported in AWS.
  • By default, network traffic is not encrypted. For production use, we strongly advise using SSL encryption to transport data securely over public infrastructure to the cloud. For more information, see Encrypted Network Connection.
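The firewall note above (open port 4343, restricted to the originator's public IP) can be expressed as an EC2 security group ingress rule. The Python sketch below builds the `IpPermissions` structure accepted by the boto3 EC2 client's `authorize_security_group_ingress` call, limiting access to a single /32 address. The helper name `hvr_ingress_rule`, the example address 203.0.113.7 (a documentation-range IP), and the rule description are assumptions; the code only builds and checks the rule and does not contact AWS.

```python
import ipaddress

HVR_DEFAULT_PORT = 4343  # default HVR listener port, per the notes above


def hvr_ingress_rule(originator_ip: str, port: int = HVR_DEFAULT_PORT) -> list[dict]:
    """Build an IpPermissions list allowing HVR TCP traffic from one host only.

    Hypothetical helper: the returned shape follows the IpPermissions
    parameter of boto3's EC2 `authorize_security_group_ingress`.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    addr = ipaddress.ip_address(originator_ip)  # raises ValueError if malformed
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 originators")
    cidr = f"{addr}/32"  # a /32 range admits exactly this one address
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": "HVR remote listener"}],
        }
    ]
```

With boto3 installed and AWS credentials configured, the rule could then be applied with `boto3.client("ec2").authorize_security_group_ingress(GroupId="sg-0123456789abcdef0", IpPermissions=hvr_ingress_rule("203.0.113.7"))`, where the security group ID is a placeholder. Keeping the rule to a single /32 address follows the advice to restrict port access to the originator rather than opening 4343 to the Internet.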