Effortlessly Consolidate Data into your Data Lake
More organizations are adopting data lakes as part of their architecture for their low cost and efficiency in storing large volumes of data. The idea is simple: Instead of storing data in a purpose-built data store, you move it into a data lake in its original format. This eliminates the upfront costs of data ingestion and transformation. Once data is in the lake, it’s available to everyone in the organization for analysis.
But while a data lake is an efficient solution, it does not come without challenges:
- Initial on-boarding of data from multiple sources
- Continual updates in real-time
- Moving high volumes of data to the data lake while mitigating chatter and latency
We sat down with Matt Aslett, Research Director at 451 Research, to discuss best practices for building a data lake.
Get the right features for consolidating and moving data into your Enterprise Data Lake
We enable you to easily move your data into your lake and update your data lake in real-time. HVR’s solution includes:
- Initial data load from multiple sources to the data lake
- Log-based change data capture for real-time updates
- Compare and repair feature to ensure data accuracy
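To make the first two items concrete, here is a minimal sketch of an initial load followed by change data capture updates. It is an illustration only, not HVR's implementation: the `ChangeEvent` shape, the `apply_changes` function, and the sample rows are all hypothetical, and the "lake" is just an in-memory dictionary keyed by primary key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One change read from a source transaction log (hypothetical shape)."""
    sequence: int               # position in the log, used to preserve order
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new column values; None for deletes


def apply_changes(target: dict, events: list) -> dict:
    """Apply change events to a copy of the target in log order."""
    result = dict(target)
    for event in sorted(events, key=lambda e: e.sequence):
        if event.op in ("insert", "update"):
            result[event.key] = event.row
        elif event.op == "delete":
            result.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return result


# Initial load: a full copy of the source table, keyed by primary key.
lake_table = {1: {"name": "Ada"}, 2: {"name": "Grace"}}

# Later changes captured from the log, possibly arriving out of order.
events = [
    ChangeEvent(sequence=3, op="delete", key=2),
    ChangeEvent(sequence=1, op="insert", key=3, row={"name": "Edsger"}),
    ChangeEvent(sequence=2, op="update", key=1, row={"name": "Ada L."}),
]

updated = apply_changes(lake_table, events)
```

Sorting by log sequence before applying is the key design choice in this sketch: the target only matches the source if changes are replayed in the order the source committed them.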
Benefits of using HVR to feed your data lake include:
- Efficiently load and update data
- Move high volumes of data for real-time analysis
- Accelerate data movement with minimal impact on systems
- Scale the solution for multiple projects and systems
We invite you to a free trial of HVR to experience how HVR can help you optimize your Data Lake initiative.
HVR 5.2: Optimized for Data Lake Adoption
HVR 5.2 provides even faster, more secure, and more accurate data movement in a modern environment. The latest enhancements in HVR 5.2 include features that help our customers optimize their adoption of data lake technologies. Enhancements include:
Native Hive Support
The data lake is typically the initial landing zone for the data. Data scientists commonly use the data lake to identify data sets of interest, then slice and dice that data in a different, more suitable, high-performance analytical engine. Native Hive Support facilitates data retrieval on the S3 and HDFS file systems. Tables are created automatically with the correct data types, and changes to the table definitions on the source are automatically propagated. Users can now use SQL access to perform data discovery with “off-the-shelf” BI tools.
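As a rough illustration of what creating a Hive table over files in the lake involves, the sketch below builds a `CREATE EXTERNAL TABLE` statement from a list of source columns. This is not how HVR generates its tables: the `HIVE_TYPES` mapping, the `hive_external_table_ddl` function, the comma-delimited file layout, and the bucket path are all assumptions made for the example.

```python
# Hypothetical mapping from generic source types to Hive types.
HIVE_TYPES = {
    "integer": "INT",
    "bigint": "BIGINT",
    "varchar": "STRING",
    "decimal": "DECIMAL(38,10)",
    "timestamp": "TIMESTAMP",
}


def hive_external_table_ddl(table: str, columns: list, location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for comma-delimited files."""
    column_sql = ",\n".join(
        f"  `{name}` {HIVE_TYPES[source_type]}" for name, source_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n"
        f"{column_sql}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Example source table definition and an assumed S3 location.
ddl = hive_external_table_ddl(
    "orders",
    [("order_id", "bigint"), ("customer", "varchar"), ("created_at", "timestamp")],
    "s3a://example-bucket/lake/orders/",
)
print(ddl)
```

Because the table is external, Hive only records the schema and the location; the files themselves stay where they were delivered, which is what lets SQL tools query data in place.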
Taking large volume scenarios to the next level. The data lake release introduces a number of performance enhancements for delivering data into files. Enhancements specific to S3’s unique file handling led to data loads into S3 that are up to 5x faster.
Amazon Key Management Service Support
Most data stores secure data at rest. However, exposure of data in flight is a common concern with cloud computing. With HVR 5.2’s support for client-side encryption, integrated with Amazon’s Key Management Service (KMS), data is truly encrypted end-to-end.
Traditional relational databases follow commonly accepted transactional rules to enforce data consistency. With most data lakes built on file systems and changes commonly stored per table, a lot of the consistency between data objects is lost, and many use cases cannot handle this challenge well. MetadataManifests enable data publication at a transactionally consistent representation of the source system, even when multiple integrators deliver data in parallel.
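One way to picture a manifest is as a list of files that together represent a single consistent point in the source's history. The sketch below is a simplified illustration, not HVR's manifest format: `build_manifest`, the file names, and the commit sequence numbers are hypothetical, and it assumes every integrator records one file (possibly empty) per table for each commit batch, so the highest sequence delivered for a table means all earlier changes for that table have landed.

```python
def build_manifest(delivered: dict) -> dict:
    """Return the files that form a consistent cut across all tables.

    `delivered` maps table name -> list of (commit_sequence, file_name),
    one entry per file an integrator has finished writing.
    """
    if not delivered:
        return {"consistent_up_to": 0, "files": {}}
    # The slowest table sets the point up to which every table is complete.
    consistent_up_to = min(
        max((seq for seq, _ in entries), default=0)
        for entries in delivered.values()
    )
    files = {
        table: sorted(name for seq, name in entries if seq <= consistent_up_to)
        for table, entries in delivered.items()
    }
    return {"consistent_up_to": consistent_up_to, "files": files}


# Two integrators deliver in parallel; "orders" is one batch ahead.
delivered = {
    "orders": [
        (100, "orders_100.csv"),
        (101, "orders_101.csv"),
        (102, "orders_102.csv"),
    ],
    "order_lines": [
        (100, "order_lines_100.csv"),
        (101, "order_lines_101.csv"),
    ],
}

manifest = build_manifest(delivered)
```

A reader that only opens the files named in `manifest["files"]` never sees an order from batch 102 without its order lines, even though the `orders` file for that batch is already in the lake.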
Big Data Compare
Wide data lake adoption requires users’ trust in the data. HVR 5.2 enables programmatic data validation against the data in your data lake, even on HDFS and S3.
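The general idea behind programmatic validation can be sketched as a row-by-row comparison using hashes. This is an illustration under simplifying assumptions, not HVR's compare algorithm: `row_digest` and `compare_tables` are hypothetical names, and both tables are small in-memory dictionaries keyed by primary key rather than files on HDFS or S3.

```python
import hashlib


def row_digest(row: dict) -> str:
    """Hash a row's columns in a stable order so equal rows hash equally."""
    canonical = "|".join(f"{column}={row[column]!r}" for column in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_tables(source: dict, target: dict) -> dict:
    """Compare two tables keyed by primary key and classify the differences."""
    missing = sorted(k for k in source if k not in target)
    extra = sorted(k for k in target if k not in source)
    different = sorted(
        k for k in source
        if k in target and row_digest(source[k]) != row_digest(target[k])
    )
    return {"missing": missing, "extra": extra, "different": different}


source = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Grace", "city": "New York"},
    3: {"name": "Edsger", "city": "Rotterdam"},
}
target = {
    1: {"city": "London", "name": "Ada"},   # same row, different column order
    2: {"name": "Grace", "city": "Boston"},  # value drifted
    4: {"name": "Alan", "city": "London"},   # not present in the source
}

report = compare_tables(source, target)
```

The report lists keys to insert (`missing`), delete (`extra`), and overwrite (`different`), which is the information a repair step would need to bring the target back in line with the source.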
Download the HVR 5.2 Data Sheet.
Still Researching Data Lake Solutions?
Watch: Five Key Things to Know About Data Lakes
This year we welcomed Philip Russom, Ph.D., of TDWI to share his Data Lake research in a webinar. In this hour-long recording, get the latest on Data Lake myths, best practices, and tips for getting started. Also featured is our CTO, Mark Van de Wiel, who talks about how HVR has helped a global manufacturing company consolidate data across business units into a data lake for one version of the truth as they aim to be a digital organization.
Just Getting Started? Get The Best Practice Guide for Implementing a Data Lake
Most database administrators and data architects want to learn why and how to implement a data lake. A recent survey by Hortonworks found that roughly 50 percent of respondents were actively learning about how to capitalize on the benefits of a data lake while an additional 20 percent of respondents were already involved in data lake initiatives.
WHY ALL THE INTEREST? BIG DATA. Data lakes provide a single repository for storing massive amounts of all types of data—unstructured, semi-structured, and structured—in its native format. They grant access and insight into all this data without time-consuming preparation.
We created this guide for those who are looking to start a new data lake initiative. This guide describes how data lakes developed, benefits for your organization, the technology necessary to build a data lake and best practices for getting started.
Commonly Asked Question: Can the Data Lake Replace My Data Warehouse?
From what we have observed in the marketplace and in talking to customers about their logical reference architecture, there is still a need for data warehousing. For all of Hadoop’s hype, it is still too immature to deliver the performance that complex queries and mixed workloads require. It lacks the kind of features that made data warehousing a must, such as optimal indexing strategies, efficient complex table joins across terabytes of data, and an optimizer that determines the best path for queries.
| Data Warehouse | vs. | Data Lake |
|---|---|---|
| Rigid, structured, and needs to be processed | Data | Structured, semi-structured, and unstructured in its raw format |
| Schema on write | Data Analysis Strategy | Schema on read |
| Enterprise, strategic reporting | Reporting | Discovery, operational reporting |
| Expensive as data volumes grow | Storage | Uses commodity hardware, which is typically cheaper |
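The "schema on write" versus "schema on read" row is easiest to see in a small example. The sketch below is illustrative only: the sample records, `read_orders`, and `validate_for_warehouse` are hypothetical. The lake side stores raw JSON lines untouched and applies types when the data is read; the warehouse side rejects a record that does not match a fixed schema before it is stored.

```python
import json
from datetime import date

# Raw events land in the lake exactly as produced; fields vary between records.
RAW_LINES = [
    '{"order_id": "17", "amount": "19.90", "ordered_on": "2017-03-01"}',
    '{"order_id": "18", "amount": "5", "coupon": "SPRING"}',
]


def read_orders(lines):
    """Schema on read: parse and type the fields an analysis needs, at query time."""
    for line in lines:
        record = json.loads(line)
        yield {
            "order_id": int(record["order_id"]),
            "amount": float(record["amount"]),
            # Missing fields become None instead of rejecting the record.
            "ordered_on": (
                date.fromisoformat(record["ordered_on"])
                if "ordered_on" in record
                else None
            ),
        }


def validate_for_warehouse(record: dict) -> dict:
    """Schema on write: reject records that do not match the fixed schema."""
    required = {"order_id", "amount", "ordered_on"}
    if set(record) != required:
        raise ValueError(f"record does not match schema: {sorted(record)}")
    return record


orders = list(read_orders(RAW_LINES))
total_amount = sum(order["amount"] for order in orders)
```

Both raw records can be analyzed from the lake, while `validate_for_warehouse` accepts only the first. That is the trade-off in the table: the lake defers the cost of structuring data, and the warehouse pays it up front in exchange for predictable, well-typed tables.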
But when you combine these technologies, each offsets the other’s disadvantages and you get the benefits of both. Not every company will need all of these technologies at once; what you deploy depends on your end-user requirements and the data you are pulling in. What is fundamental in these architectures is a data lake and a data warehouse working together in a unified manner. Read more about data lakes vs data warehouses in our blog post.