Three Key Requirements for Data Integration in Cloud Environments

In my previous blog post, I discussed hybrid cloud computing. The phrase Hybrid Cloud Computing is used to describe the coexistence of multiple environments, at least one of which is a cloud environment. Hybrid cloud computing introduces new data integration requirements for data- and database-dependent applications.

In this blog post, I will discuss three key requirements for data integration in the cloud: Optimized Data Transfer, Security, and Manageability. For each of these requirements, I provide considerations that will hopefully help you better evaluate cloud data integration solutions.

1. Optimized Data Transfer

A cloud availability zone is essentially a data center managed by the cloud provider. In a hybrid cloud environment, data integration into or out of the availability zone is data integration over a Wide Area Network (WAN) that may carry a small charge per GB of data transferred. Optimizing data transfer is important not only to maximize performance but also to limit the cost. So, how can you optimize data transfer?

  • The first consideration is to transfer only changes between environments. The phrase commonly used to describe such an approach for databases is Change Data Capture (CDC): after an initial data synchronization (as needed), capture and transfer only incremental changes.
  • Data compression can be applied before sending data to further reduce the volume transferred, increase performance, and lower costs.
  • Data transfer across a WAN should involve as little back-and-forth communication as possible to limit sensitivity to network latency. Transferring data in large blocks is one technique to achieve this, combined with an approach that maximizes use of the available bandwidth despite relatively high latency. The sketch after this list illustrates all three techniques.
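
To make these ideas concrete, here is a minimal Python sketch, not any particular product's implementation: the orders table, last_modified column, and raw socket transport are illustrative assumptions. It pulls only incremental changes (CDC-style), compresses them, and streams them in large blocks:

```python
import socket
import zlib

# Large 4 MB blocks limit application-level round trips over a high-latency WAN.
BLOCK_SIZE = 4 * 1024 * 1024


def fetch_changes_since(cursor, last_sync_time):
    """CDC-style incremental pull: fetch only rows changed since the last sync.

    The 'orders' table and 'last_modified' column are hypothetical; a log-based
    CDC tool would read the database transaction log instead of querying.
    """
    cursor.execute(
        "SELECT * FROM orders WHERE last_modified > %s",
        (last_sync_time,),
    )
    return cursor.fetchall()


def send_change_set(sock: socket.socket, serialized_changes: bytes) -> None:
    """Compress the change set, then stream it in large sequential blocks."""
    compressed = zlib.compress(serialized_changes, level=6)  # fewer bytes on the wire
    for offset in range(0, len(compressed), BLOCK_SIZE):
        # sendall() pushes each block without waiting for a per-row reply,
        # so high WAN latency does not multiply across every change.
        sock.sendall(compressed[offset:offset + BLOCK_SIZE])
```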

2. Security

With data transferring between data centers, security has to be top of mind. Securing data has multiple aspects:

  • Exposure: how to limit exposure to security breaches. All data centers and corporate networks use firewalls. Review firewall settings and requirements to enable connectivity. Obviously, data connectivity must be established, but consider whether firewalls have to be opened in both directions, and explore options to limit exposure, e.g. through the use of a proxy to avoid exposing access to a production system directly in the firewall. Finally, lock down the firewall as much as possible, down to an individual server if possible.
  • Authentication: with access to a system exposed, prevent unauthorized access by using strong authentication rules. Password authentication is one approach, but also look for certificate-based authentication or even two-factor authentication options.
  • Secure data transfer: utilize either a VPN or SSL/TLS connectivity when passing data across the wire, as in the sketch below.
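
To illustrate the last point, the following minimal Python sketch (the host, port, and commented-out certificate paths are hypothetical) wraps a plain socket in TLS and verifies the server certificate before any data crosses the wire:

```python
import socket
import ssl

# Hypothetical endpoint; replace with your integration hub's host and port.
HOST, PORT = "integration.example.com", 4343

# Build a client context that verifies the server certificate chain and hostname.
context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older, weaker protocols

# Optionally authenticate the client with its own certificate (mutual TLS):
# context.load_cert_chain(certfile="client.pem", keyfile="client.key")

with socket.create_connection((HOST, PORT)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname=HOST) as tls_sock:
        # Everything sent from here on is encrypted in transit.
        tls_sock.sendall(b"change data goes here")
```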

3. Manageability

Considering that hybrid cloud is a given for the foreseeable future, how do you plan to manage cloud data integration? The following are some questions to consider:

  • Will you be configuring individual data flows, or does a single console provide an overview of all data integration flows?
  • How resilient is the setup against relatively common interruptions (e.g. network glitches or system restarts)?
  • Can you set up automatic alerts for issues that require operator intervention, so that SLAs are not missed? (See the sketch after this list.)
  • Can you easily review the current state of the data flows?
  • How do you gain insight into what happened with a data flow so you can prevent interruptions or anticipate and avoid problems?
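
As a minimal sketch of the alerting question, assuming a hypothetical latency metric, SLA threshold, and notify_operator hook rather than any specific product's API:

```python
import time

SLA_MAX_LATENCY_SECONDS = 300  # hypothetical SLA: changes applied within 5 minutes


def notify_operator(message: str) -> None:
    # Placeholder: hook into email, paging, or chat in a real deployment.
    print(f"ALERT: {message}")


def check_flow_latency(flow_name: str, last_applied_change_time: float) -> None:
    """Alert an operator when a data flow falls behind its SLA."""
    latency = time.time() - last_applied_change_time
    if latency > SLA_MAX_LATENCY_SECONDS:
        notify_operator(
            f"Flow '{flow_name}' is {latency:.0f}s behind; the SLA is "
            f"{SLA_MAX_LATENCY_SECONDS}s. Operator intervention may be required."
        )
```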

Hybrid cloud introduces additional data integration challenges relative to data integration within a single data center. Make sure you prepare well as you start adopting the cloud.

My next post will cover, at a high level, HVR’s architecture. HVR recommends a distributed architecture and the use of agents to address many of the hybrid cloud data integration challenges. However, we will see that despite the use of agents, the HVR software by no means has to run on the production database server(s), though there can be benefits in doing so.

If you are looking for more information about data integration in the cloud, we also welcome you to contact us if you have questions.

About Mark

Mark Van de Wiel is the CTO for HVR. He has a strong background in data replication as well as real-time Business Intelligence and analytics.
