Creating a Real-Time Copy of Your Big Data Database
Big Data is the “shiny new toy” that everyone is playing with at the moment. Companies have embraced technologies such as HDFS, S3, and Hive, which were once fundamental only to the likes of Facebook, Twitter, and Google because of the high volumes of data they generate.
Big Data use cases, however, have evolved to meet the demands of organizations looking to analyze various data types in order to offer better services to their consumers. This means organizations are now adopting and going live with Big Data technologies. As a result of this adoption, an important question to consider is: what happens when these technologies become mission-critical for continuous operations? As with traditional platforms, the data contained within these technologies needs to be protected from both unplanned and planned outages. In fact, a common request we hear from our customers is, “Can you help make our Hadoop environment highly available?”
Backup Plan for Your Big Data
Justifications for this type of use case are either to load balance workload across two environments, or to protect the company’s operations if one environment were to go down. This may sound odd, as the whole purpose of Hadoop is to spread the data multiple times across the clustered nodes, which makes it highly available by default (as well as efficient for Hadoop processing). But this does not protect the environment if all the clustered nodes sit within a single data center. As a data replication software provider, we do not position ourselves as a true disaster recovery solution for Hadoop, as we cannot achieve an RTO (recovery time objective) and RPO (recovery point objective) of zero. Where we are seeing success in helping with our customers’ challenges is with the ability to take real-time data changes and replicate them to multiple environments: capture once, use everywhere.
Point and Click to Your Backup Database
This is a natural extension of one of our existing use cases, whereby customers use HVR to feed their data lake with real-time data, captured non-intrusively from a wide variety of sources. Because of our distributed architecture, the configuration, deployment, and manageability of 100+ sources are handled through centralized control. In this particular scenario, we can add another target to the replication flow with a simple click, without needing to take down an existing target or reconfigure the whole setup, thus providing the same data to two or more Hadoop environments.
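To make the idea concrete, here is a minimal Python sketch of the “capture once, use everywhere” pattern described above: a change event is read from the source once, fanned out to every registered target, and a new target can be registered without interrupting delivery to the existing ones. All names here (ChangeEvent, ReplicationChannel, the target labels) are illustrative assumptions, not HVR’s actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class ChangeEvent:
    table: str
    operation: str  # "insert", "update", or "delete"
    row: dict

@dataclass
class ReplicationChannel:
    # One delivery callback per target environment (e.g. two Hadoop clusters).
    targets: Dict[str, Callable[[ChangeEvent], None]] = field(default_factory=dict)

    def add_target(self, name: str, deliver: Callable[[ChangeEvent], None]) -> None:
        # Adding a target only touches the registry; existing targets keep running.
        self.targets[name] = deliver

    def capture(self, event: ChangeEvent) -> None:
        # The event is captured once from the source, then delivered everywhere.
        for deliver in self.targets.values():
            deliver(event)

# Two target environments modeled as simple lists for illustration.
primary, secondary = [], []
channel = ReplicationChannel()
channel.add_target("hadoop-dc1", primary.append)

channel.capture(ChangeEvent("orders", "insert", {"id": 1}))  # only dc1 exists yet

channel.add_target("hadoop-dc2", secondary.append)  # added without downtime
channel.capture(ChangeEvent("orders", "insert", {"id": 2}))  # delivered to both
```

After the second target is registered, subsequent changes flow to both environments while the first target never stops receiving data, which is the essence of extending an existing replication flow rather than rebuilding it.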
Please contact us if you would like to learn more about how HVR can help with your data lake and big data initiatives.
Zulf is the Senior Solutions Architect at HVR. He has worked in the Data Integration space for over 20 years for companies such as Oracle and Pentaho.