Five Common Cloud Integration Challenges

...and How to Overcome Them

Companies are adopting the cloud at an unprecedented rate. While the benefits of cloud computing are undeniable and well-publicized, data integration presents a unique set of challenges, particularly in hybrid cloud environments where applications and data reside both on-premises and in the cloud.  In this article, we discuss five common cloud integration challenges and the best practices for addressing them.

Data Security

According to a report by data protection company Bitglass, 90% of IT and IT security practitioners are moderately to very concerned about data security in the cloud. In the spectrum of cloud integration problems, failure to ensure adequate data security is potentially the most damaging and costly to an organization. When integrating data between systems within the cloud, and particularly in hybrid cloud environments where data moves between cloud-based and on-premises systems, it is critical to address potential vulnerabilities that fall outside the scope of either the cloud or on-premises security provisions.

Best Practice: Encrypt Data in Transit

Among cloud integration best practices this may seem obvious, but it cannot be overstated: all data in transit must be encrypted. Use SSL/TLS encryption for data transmitted between locations. If you use remote database connections, enable the native encryption provided by most database connection drivers. If your data integration solution includes locally deployed agents acting as proxies to communicate with the database, ensure that communications between agents are encrypted as well.
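
As a concrete illustration, here is a minimal sketch of enabling driver-native encryption on a remote PostgreSQL connection using the psycopg2 driver. The host name, database, and certificate path are placeholders, not values from any particular deployment:

    # Minimal sketch: enabling driver-native encryption on a remote
    # PostgreSQL connection with psycopg2. Host, database, and certificate
    # paths are illustrative placeholders.
    import os
    import psycopg2

    conn = psycopg2.connect(
        host="db.example.com",                # remote database host (placeholder)
        dbname="sales",
        user="integration_user",
        password=os.environ["DB_PASSWORD"],   # supply secrets via the environment
        sslmode="verify-full",                # require TLS and verify the server cert
        sslrootcert="/etc/ssl/certs/ca.pem",  # CA certificate used for verification
    )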

Best Practice: Limit Exposure through the Firewall

In hybrid cloud environments, moving data between the cloud and on-premises systems requires opening the firewall, a necessity that most network administrators are understandably reluctant to accept. Carefully evaluate the requirements for data integration flows that cross the firewall to find the solution that minimizes exposure of enterprise systems and data.

Consider whether the firewall must be opened in both directions. Look for data integration solutions that allow you to open only a single machine and port pair in a single direction.

Where possible, lock down the firewall to a single server, ideally in a DMZ or perimeter network. Deploy an agent on that server to act as a proxy, handling communications with on-premises systems to limit exposure.
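
To make the idea concrete, below is a minimal sketch of a DMZ agent that accepts traffic on a single inbound port and relays it to one internal system, so the firewall needs to allow only one machine and port pair in one direction. The hosts and ports are hypothetical, and a production agent would also authenticate and encrypt its traffic:

    # Minimal sketch of a DMZ agent acting as a proxy: it listens on one
    # port and relays traffic to a single internal host. Hosts and ports
    # are hypothetical.
    import socket
    import threading

    LISTEN_PORT = 5000                        # the one port opened in the firewall
    INTERNAL = ("db.internal.example", 5432)  # the one on-premises system exposed

    def pipe(src, dst):
        # Copy bytes in one direction until the connection closes.
        try:
            while data := src.recv(65536):
                dst.sendall(data)
        except OSError:
            pass
        finally:
            src.close()
            dst.close()

    server = socket.create_server(("0.0.0.0", LISTEN_PORT))
    while True:
        client, _ = server.accept()
        upstream = socket.create_connection(INTERNAL)
        # Relay each direction on its own thread.
        threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()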

Network Efficiency

Data integration in a hybrid cloud environment takes place over a Wide Area Network (WAN). These networks generally provide adequate bandwidth, but their higher latency makes inefficient communication patterns costly. It is therefore critical to optimize your cloud data integration architecture to make efficient use of network resources.

Many data integration architectures rely primarily on remote database connections to transmit data. Remote database connections send frequent, small, and often uncompressed data packets back and forth across the network. This chattiness is not an issue on a high-speed, low-latency LAN but can overburden a slower WAN connection.

Best Practice: Use Agents Instead of Remote Database Connections

Consider instead a data integration solution that uses agents deployed at the source or destination database to manage communications and minimize the volume and number of data transmissions. Agents offer the following benefits (a brief sketch follows the list):

  • Perform data filtering and change data capture at or close to the source to reduce the transfer of redundant data.
  • Compress data before sending to minimize the use of network bandwidth.
  • Use large block data transfer to minimize the number of back and forth transmissions across the network.
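
As a brief sketch of the second and third points, an agent might batch captured changes, compress each batch, and ship it in one large transfer. The endpoint URL and change format below are hypothetical, not any product's actual protocol:

    # Minimal sketch of how an agent can batch, compress, and ship changes
    # in large blocks instead of making many small round trips. The
    # endpoint URL and change format are hypothetical.
    import gzip
    import json
    import urllib.request

    HUB_URL = "https://integration-hub.example.com/changes"  # hypothetical endpoint
    BATCH_SIZE = 10_000                                      # rows per network round trip

    def ship_changes(changes):
        """Send captured changes as one compressed block per batch."""
        for start in range(0, len(changes), BATCH_SIZE):
            batch = changes[start:start + BATCH_SIZE]
            payload = gzip.compress(json.dumps(batch).encode("utf-8"))
            req = urllib.request.Request(
                HUB_URL,
                data=payload,
                headers={"Content-Encoding": "gzip",
                         "Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)  # one large transfer, not many small ones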

Impact on Operational Systems

Depending on the method used, the process of monitoring and capturing data changes on source systems can increase the load on the source system, causing slowdowns in operational systems.

In trigger-based change data capture, each change is recorded in a separate shadow table on the source database. At set intervals the data capture program queries the shadow tables to retrieve the changes. Both the shadow table updates and the data capture queries place an additional processing load on the source system.
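
To illustrate the mechanism (using SQLite purely for a self-contained example), a trigger records each change in a shadow table, and a capture query later drains that table; both steps add work to the source database:

    # Self-contained illustration of trigger-based change data capture:
    # a trigger writes each change to a shadow table, and a capture query
    # periodically drains it. Both steps add load to the source system.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE orders_shadow (order_id INTEGER, new_status TEXT,
                                    changed_at TEXT DEFAULT CURRENT_TIMESTAMP);
        -- The trigger fires on every update, adding a write on the source.
        CREATE TRIGGER orders_cdc AFTER UPDATE ON orders
        BEGIN
            INSERT INTO orders_shadow (order_id, new_status)
            VALUES (NEW.id, NEW.status);
        END;
    """)

    db.execute("INSERT INTO orders VALUES (1, 'new')")
    db.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")

    # The capture program queries the shadow table at intervals - more load.
    changes = db.execute("SELECT * FROM orders_shadow").fetchall()
    db.execute("DELETE FROM orders_shadow")  # drain after propagating
    print(changes)                           # [(1, 'shipped', '...')]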

SQL-based change data capture solutions directly query the source tables at intervals to capture data changes. These queries add to the processing load on the source system.
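
In practice this usually means a watermark query run on a schedule against the source table itself; the table and column names in this sketch are hypothetical:

    # Sketch of SQL-based change data capture: poll the source table with
    # a watermark query. Every poll runs directly against the source
    # database, adding to its processing load. Names are hypothetical.
    def poll_changes(db, last_poll_time):
        """Fetch rows modified since the previous poll, using a timestamp watermark."""
        rows = db.execute(
            "SELECT id, status, updated_at FROM orders WHERE updated_at > ?",
            (last_poll_time,),
        ).fetchall()
        # Advance the watermark to the newest timestamp seen so far.
        latest = max((r[2] for r in rows), default=last_poll_time)
        return rows, latest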

Best Practice: Use Log-Based Change Data Capture

Consider instead a log-based change data capture solution that asynchronously reads the database transaction logs and propagates the changes to the destination system. Because it reads asynchronously from the transaction log and requires no additional table updates or query processing, log-based change data capture places little to no additional load on the source system.
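
No real transaction log API fits in a short example, so the sketch below stands in for one with a simple append-only log file: the capture process tails the log and applies each entry to the target without ever querying the source tables. All names are illustrative:

    # Conceptual sketch of log-based change data capture: the capture
    # process tails an append-only transaction log and propagates entries,
    # touching no source tables. A real implementation reads the database's
    # native log format (e.g. Oracle redo logs, PostgreSQL WAL); this
    # file-based stand-in is purely illustrative.
    import json
    import time

    def tail_transaction_log(path, apply_change):
        """Follow the log file and hand each new change record to the target."""
        with open(path, "r") as log:
            while True:
                line = log.readline()
                if not line:
                    time.sleep(0.5)             # wait for more log writes
                    continue
                apply_change(json.loads(line))  # propagate to the destination

    # Example target: print each change instead of applying it.
    # tail_transaction_log("/var/db/txn.log", print)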

Reliable Data Capture

Accurately and reliably capturing data changes and propagating them from source to target is the core function of any data integration solution.

Trigger-based data capture solutions are reliable while the triggers remain active. However, triggers must sometimes be disabled, for example when performing a direct path load to the database, or when the DBA temporarily disables them for administrative tasks. Any data changes that occur while the triggers are disabled will not be captured and passed to the destination system. Triggers also capture only changes to the data, not changes to database table definitions.
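
The gap is easy to demonstrate. Continuing the SQLite illustration from above, an update made while the trigger is dropped (SQLite's closest equivalent to disabling it) is never captured:

    # Continuation of the trigger-based CDC illustration: while the trigger
    # is dropped, changes slip through uncaptured. Schema matches the
    # earlier sketch.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE orders_shadow (order_id INTEGER, new_status TEXT);
        CREATE TRIGGER orders_cdc AFTER UPDATE ON orders
        BEGIN
            INSERT INTO orders_shadow VALUES (NEW.id, NEW.status);
        END;
    """)
    db.execute("INSERT INTO orders VALUES (1, 'new')")

    db.execute("DROP TRIGGER orders_cdc")  # e.g. for a direct path load
    db.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
    db.executescript("""
        CREATE TRIGGER orders_cdc AFTER UPDATE ON orders
        BEGIN
            INSERT INTO orders_shadow VALUES (NEW.id, NEW.status);
        END;
    """)

    # The update made while the trigger was absent was never captured:
    print(db.execute("SELECT * FROM orders_shadow").fetchall())  # []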


Interested in learning more about how HVR can help you overcome cloud data integration challenges?

Join Us for a Live Demo

About Mark

Mark Van de Wiel is the CTO for HVR. He has a strong background in data replication as well as real-time Business Intelligence and analytics.

