Mark Van de Wiel

Six Steps Toward Successful Cloud Data Migrations

…And minimize downtime

This post provides a set of best practices to facilitate cloud migrations with minimal downtime and minimal risk, irrespective of the type of cloud service you plan to adopt, focusing on the application data that has to be migrated. Databases generally store the application data and change continuously. Application servers, which interface with the database but do not store the data, can be migrated more easily using a lift-and-shift approach.

Organizations large and small are adopting the cloud. Security in the cloud has come a long way, and as data center servers come up for renewal, organizations choose the cloud to host their applications, databases, and file systems. Recently, large organizations including AT&T[1], GE, and Capital One[2] have publicly announced their intentions to shift very significant percentages of their workloads to the cloud.

Cloud services are available in the form of:

  • Software as a Service (SaaS) in which the application is hosted and managed by the cloud vendor. Salesforce is a perfect example of a SaaS offering.
  • Platform as a Service (PaaS), offering a platform such as a database as a service. Examples include Amazon’s Relational Database Service (RDS) and Microsoft Azure SQL Database. In a PaaS offering the provider manages the platform for its customers, e.g. providing backup services, ensuring sufficient storage is allocated, etc.
  • Infrastructure as a Service (IaaS): hardware/virtual machines in the cloud available to run any compatible software. Amazon’s Elastic Compute Cloud (EC2) servers and Microsoft Azure Virtual Machines are perfect examples of IaaS, and so is Amazon’s S3 (Simple Storage Service).

Planning Your Cloud Migration

The amount of effort to put into a cloud migration depends on the complexity of your IT infrastructure. To achieve minimal downtime, take a phased approach to cloud migration, in which systems or applications are migrated one by one rather than an entire data center at a time. Organizations with hundreds if not thousands of applications, databases, and systems need to prepare for a long-term cloud migration effort.

In preparation for cloud migration with minimal downtime there are multiple considerations to take into account:

1. Network utilization, and related to that, performance.

In an environment where data transfers across a Wide Area Network (WAN), it is important to make optimum use of network resources, especially given that some cloud providers charge for data transfers into the cloud.
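To illustrate why compression matters on a metered WAN link, here is a minimal Python sketch using the standard library’s zlib (the sample data and the resulting ratio are illustrative only; they say nothing about any particular vendor’s compression):

```python
import zlib

# Simulate a batch of repetitive row data, as database exports typically are.
rows = b"".join(b"%d,ACME Corp,2024-01-01,ACTIVE\n" % i for i in range(10_000))

compressed = zlib.compress(rows, level=6)
ratio = len(rows) / len(compressed)

print(f"raw: {len(rows)} bytes, compressed: {len(compressed)} bytes "
      f"(~{ratio:.0f}x smaller on the wire)")
```

Highly structured row data usually compresses very well, so compressing before transfer directly reduces both transfer time and per-byte ingress charges.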

2. Security

To achieve minimal downtime, some if not all of the data will have to be transferred across the wire into the cloud. Do this securely, e.g. using SSL/TLS, to avoid data breaches.
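As a sketch of what “securely” means in practice, Python’s standard library provides sensible TLS defaults; a default context verifies the server certificate and its hostname before any data flows:

```python
import ssl

# A default context enables certificate verification and hostname checking,
# and disables protocol versions known to be insecure.
context = ssl.create_default_context()

print(context.check_hostname)                      # hostname is verified
print(context.verify_mode == ssl.CERT_REQUIRED)    # certificate is required
```

Whatever tool performs the transfer, the same properties should hold: server identity verified, and all traffic encrypted in transit.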

3. Firewall Requirements

Related to security, what are the requirements to open the firewall? Will the cloud have to be able to reach into the corporate data infrastructure, and if so, how can you limit exposure?

4. Scale 

Especially in a large organization, with possibly hundreds of databases, you want to be able to perform multiple migrations at any one time, ensure performance is adequate, and keep the setup manageable.
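Running many migrations at once usually means bounding the parallelism so source systems are not overwhelmed. A minimal sketch, where `migrate_table` is a hypothetical stand-in for a per-table extract/transfer/load:

```python
from concurrent.futures import ThreadPoolExecutor

def migrate_table(name):
    # Hypothetical stand-in for extracting, transferring, and loading a table.
    return f"{name}: done"

tables = ["customers", "orders", "invoices", "shipments"]

# Bound the parallelism so the source system is not overwhelmed.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(migrate_table, tables))

print(results)
```

The same pattern scales from tables within one database to whole databases within a migration wave; only the worker function and the concurrency limit change.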

5. Upgrade, or Technology Migration

As part of the migration to the cloud, you may consider changing the technology you use for your data platform: perhaps a new version of the same technology, or a different underlying technology altogether (e.g. a shift from Oracle to PostgreSQL).

Migrating to the cloud involves the following steps. Throughout the migration it is important to perform tests to ensure it will be successful.

  1. Instantiate the target database

    • Create the target database structures, including user accounts, tables, indexes, constraints, stored procedures and any other database objects.
    • Perform the initial database load. Consider data volume, and means of transferring the data, when planning this step. When using network communication consider available network bandwidth, and whether data can be compressed.
  2. Synchronize the database.

    • To achieve minimum downtime, and assuming the instantiation will take a significant amount of time, there is a need to synchronize the database with changes since the initial load took place.
  3. Switch on-premises use of the application(s) to the cloud.

    • The switch will involve downtime, but if well planned, the downtime associated with the switchover will be minimal. At this point consider continuing to synchronize the databases, with data now flowing out of the cloud back into the on-premises database.

For example: after the initial database load, or while the systems are synchronized, validate the data and run a workload that mimics the production load you expect once the application migration is complete.
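The three numbered steps above can be sketched with an in-memory stand-in for the databases (the table structure and change-log format are illustrative; a real migration tool captures changes from the database transaction log):

```python
# Source "table" keyed by primary key; the target starts empty.
source = {1: "alice", 2: "bob"}
target = {}

# Step 1: initial load -- a point-in-time snapshot copied to the target.
target.update(source)

# Changes keep arriving on the source while the load runs; a CDC process
# records them so they are not lost.
change_log = []

def apply_change(op, key, value=None):
    """Apply a change on the source and record it for later replay."""
    if op == "upsert":
        source[key] = value
    elif op == "delete":
        source.pop(key, None)
    change_log.append((op, key, value))

apply_change("upsert", 2, "bobby")   # update after the snapshot
apply_change("upsert", 3, "carol")   # insert after the snapshot
apply_change("delete", 1)            # delete after the snapshot

# Step 2: synchronize -- replay the captured changes against the target.
for op, key, value in change_log:
    if op == "upsert":
        target[key] = value
    else:
        target.pop(key, None)

# Step 3: with source and target identical, the switchover window is only
# as long as it takes to repoint the application.
print(target == source)
```

The key point: because changes made during the long-running initial load are captured and replayed, the databases converge without ever taking the source offline.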

HVR for Cloud Migrations

The HVR solution, built for continuous, real-time data integration between database technologies in a heterogeneous environment, is ideally suited to manage cloud database migrations with minimal downtime. HVR supports a multitude of commonly used relational database technologies, both as a source for log-based Change Data Capture (CDC) and as a target for continuous integration. The HVR architecture prescribes a distributed setup in which one installation acts as the control center – the hub – and other installations are referred to as agents. This architecture provides a number of benefits:

1. Optimized network communication. When HVR executables communicate over TCP/IP, data transfer is always compressed and uses large block transfers. The proprietary compression algorithm achieves about 3x better compression than gzip on typical database data.

2. Firewall requirements. HVR’s central installation – the hub – always initiates the communication to the agents. With HVR’s flexibility you can choose where to install the hub – on-premises or in the cloud – and only require the firewall into the other environment to be opened. HVR can also be configured to use its own proxy for an additional layer of access control, and to avoid having to open the firewall to many different network addresses. Communication then always flows through the proxy.

3. SSL encryption is optional for HVR’s communication and highly recommended for cloud scenarios.

4. HVR’s architecture scales extremely well with distributed agents doing the hard work of performing CDC, and target databases integrating the changes. With a central point of control, the operator can easily keep track of many data flows, and automatic monitoring can help ensure notifications go out in case of errors.

5. Thanks to HVR’s rich support for database platforms and versions, a technology migration as part of the move to the cloud is trivial.

These benefits apply when HVR is used to perform the initial data load, CDC with continuous integration, and compare/repair. The initial load is integrated with CDC to align it with continuous replication and avoid data loss. With its heritage in database replication, the HVR technology is also extremely resilient to environmental issues during the migration, such as a temporary loss of the network, or database and server restarts.

Six Steps for Cloud Data Migration

Let’s walk through the scenario of migrating an on-premises Oracle Database 11.2 to an Oracle Database 12.2 on Amazon RDS.

Step 1. Prepare the environment for real-time replication, and start CDC

Install HVR on the source database server[3] and install HVR in the availability zone of the target RDS database (HVR is featured in the AWS and Azure marketplaces to simplify this step). Consider which installation of HVR you want to use as the hub, or install HVR on a separate server to act as the hub. An on-premises hub only requires the firewall into the cloud to be opened, which network administrators generally find easier to accept.

With the installations in place, hook up the HVR GUI – possibly using a separate installation on your desktop – to the hub, and define the channel. The HVR website’s resource section has a variety of videos showing how this is done.

Once the channel definition is completed, initialize the channel to be ready for real-time replication, and start capture. With capture running the HVR hub starts accumulating compressed transaction files.

Step 2. Create target database objects

First, ensure all target database objects exist. HVR can help create table definitions – with compatible data types if a technology change is part of the migration – and create primary keys, but often that is insufficient to support the application. Use alternative mechanisms to include secondary indexes, database stored procedures, triggers, etc. For Oracle to Oracle, a metadata-only schema export is ideally suited for this step. Heterogeneous scenarios require alternative technologies such as Amazon’s Schema Conversion Tool.
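In a heterogeneous migration, “compatible data types” means translating each column definition. A deliberately minimal sketch of such a mapping for an Oracle-to-PostgreSQL move (the mapping table is a toy; real tools like AWS SCT cover far more cases and edge conditions):

```python
# Minimal Oracle -> PostgreSQL type mapping; illustrative only.
TYPE_MAP = {
    "VARCHAR2": "VARCHAR",
    "NUMBER": "NUMERIC",
    "DATE": "TIMESTAMP",   # Oracle DATE carries a time component
    "CLOB": "TEXT",
}

def translate_column(name, ora_type, length=None):
    """Translate one Oracle column definition to a PostgreSQL equivalent."""
    pg_type = TYPE_MAP[ora_type]
    if length and pg_type == "VARCHAR":
        pg_type = f"VARCHAR({length})"
    return f"{name} {pg_type}"

ddl = ", ".join([
    translate_column("id", "NUMBER"),
    translate_column("name", "VARCHAR2", 100),
    translate_column("created", "DATE"),
])
print(f"CREATE TABLE customers ({ddl})")
```

Even this toy version shows why tooling helps: subtle semantic differences (such as Oracle’s DATE including a time component) must be decided per type, not just renamed.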

Step 3. Perform the initial data load

Depending on the database size, it is unlikely that all data can be loaded in a single transaction. If the database schema includes referential integrity constraints, and data is loaded in multiple transactions, possibly in parallel across tables, then these constraints must be disabled during the load.

HVR Refresh automatically takes care of the required steps as part of the initial load. Use the On-line Refresh option to skip previous integration, and wait for the refresh to complete. If your data volume is modest and integrity constraints are deferrable, consider using HVR’s ability to refresh as of a moment in time, with integrity constraints only set to deferred during the data load (but not disabled). If constraints were disabled during the initial load – for example because tables were loaded in parallel at different points in time – then HVR will automatically re-enable them when data integration reaches a point of consistency.

Depending on source and target database and version there are other options to perform the initial load, and HVR can work with these as well. Please do note that configuration changes in the setup may have to be implemented if initial load and continuous integration are not both performed by HVR.
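The constraint handling above boils down to: with constraints disabled, rows may arrive in any order, so referential integrity is re-validated once the load reaches a consistent point. A sketch with illustrative in-memory tables:

```python
# Parent/child rows loaded in arbitrary order, as happens with parallel loads.
orders = [{"id": 10, "customer_id": 2},   # a child row may land first
          {"id": 11, "customer_id": 1}]
customers = [{"id": 1}, {"id": 2}]        # parents loaded later / in parallel

def check_referential_integrity(children, parents, fk, pk="id"):
    """Re-validate a foreign key once all tables are fully loaded."""
    parent_keys = {row[pk] for row in parents}
    return [row for row in children if row[fk] not in parent_keys]

orphans = check_referential_integrity(orders, customers, fk="customer_id")
print(f"orphan rows after load: {len(orphans)}")
```

If the check returns orphans, the load order or the snapshot consistency needs attention before constraints are re-enabled; an empty result means the constraints can be switched back on safely.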

Step 4. Start integration

This step is as simple as starting the integration job that was created as part of the initialization in step 1. Depending on the duration of the initial load it will take time for the integration to catch up.

Step 5. Compare the data

For peace of mind you should at least sample the data to confirm that the data on the target is identical to the data on the source. This is even more important in a heterogeneous setup, where source and target data types may not match exactly. HVR Compare provides the ability to compare the data, with Restrict actions available to define filters. Consider using Restrict to at least compare a manageable subset of the data.
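A hedged sketch of what such a compare boils down to: hash each row on both sides, optionally restricted to a filtered subset, and diff the digests (real tools push this work into the databases so the full data never has to move):

```python
import hashlib

def row_digest(row):
    """Stable digest of a row, independent of where it is stored."""
    return hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()

def compare(source_rows, target_rows, restrict=lambda row: True):
    """Compare row digests, optionally restricted to a filtered subset."""
    src = {row_digest(r) for r in source_rows if restrict(r)}
    tgt = {row_digest(r) for r in target_rows if restrict(r)}
    return src == tgt

source_rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
target_rows = [{"id": 2, "name": "bob"}, {"id": 1, "name": "alice"}]

# Restrict the compare to a manageable subset, e.g. recent rows only.
print(compare(source_rows, target_rows, restrict=lambda r: r["id"] >= 1))
```

Note that row order does not matter here, only content; a restrict filter trades completeness for runtime, which is exactly the judgment call the step above describes.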

At this point, also as part of data validation, you may run some reports on the data to get a feel for system performance.

Step 6. Reverse roles, and switch to the cloud

The switch between using the on-premises database and the one in the cloud will involve downtime. With source and target databases already in sync, the downtime can literally be brought down to seconds.

To reduce risk for the migration, consider switching the roles of source and target. Prepare the cloud system as the source with on-premises as the target – in HVR this is as simple as assigning the location (connection) to the correct location group. Review the definitions in case changes have to be made, initialize the channel to create the new jobs that reverse the data flow post-migration, and start the jobs.

At this point you are ready to switch applications. Prior to switching, make sure all source transactions have been applied to the target database.
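That pre-switch check can be sketched as a gate on replication position: do not repoint the application until the last transaction captured at the source has been applied at the target (the position numbers below are illustrative stand-ins for SCNs/LSNs):

```python
def safe_to_switch(last_captured_pos, last_applied_pos):
    """Allow the switchover only once the target has caught up."""
    return last_applied_pos >= last_captured_pos

# Writes on the source have been stopped; capture reports position 1042.
last_captured = 1042

# Poll the target until integration reports it has applied everything.
for applied in (990, 1017, 1042):
    if safe_to_switch(last_captured, applied):
        print(f"caught up at position {applied}; switching applications")
        break
```

Since application writes are paused only while this gate closes, the downtime window is essentially the remaining replication lag plus the time to repoint connections.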

After the switch, stop capture on the on-premises system and integration on the cloud system. With the migration complete, you can now decide how long you want the fallback option to stay in place. At some point a license renewal may come up, or the system may be retired, forcing you to eliminate the fallback option. Otherwise, as soon as you feel comfortable with the new system, stop the replication and retire the on-premises database.

Following are a couple of videos demonstrating how you can migrate from an on-premises database to Azure and AWS.

Migration to Azure

Best Practices for Data Integration to AWS


On to the next migration.



[3] HVR provides multiple options when performing log-based CDC from Oracle, including capture from a physical standby database, and archive log only mode. To achieve optimum efficiency the HVR agent should be installed on the server(s) that perform log-based CDC, whether that is the primary, the standby or an archive log only machine.





About Mark

Mark Van de Wiel is the CTO for HVR. He has a strong background in data replication as well as real-time Business Intelligence and analytics.
