So far we have talked about what the cloud means, how it has always existed in different forms, and how the term can mean different things to different people. We then covered the different parts of the cloud, the various acronyms, and the different cloud offerings. In this post we begin our discussion of individual cloud vendors, starting with AWS. We will look at a few of their offerings and what they mean for the companies consuming these products.
Depending on your reference, Amazon is either the world's largest river or the world's largest online retailer. Some readers (including the author) are old enough to remember when Amazon just sold books on the internet. Amazon has always been reinventing and reshaping itself, and in 2006 it launched a new division called Amazon Web Services (AWS). Many consider AWS to be the first of the modern cloud vendors. AWS let people 'rent' its many products by the month, the day, or even the hour. DBAs could spin up a database to run a quick test and pay only for the time they used it. The modern cloud had arrived.
AWS started with just a few simple offerings. These days there are over 150 different products, with many variations of each. For the purposes of this blog we will focus on the AWS products that HVR supports; many consider these four products the AWS staples:
- AWS has an object storage offering called S3, short for Simple Storage Service. There are other storage offerings, but S3 was the first and is probably the most famous.
- The next product is EC2, which stands for Elastic Compute Cloud. Think of EC2 instances as virtual servers.
- The third product is RDS, which stands for Relational Database Service. AWS manages the database, and you, the customer, focus on the application that runs on it.
- The last product family to cover is the many target 'databases' that AWS supports, whether the AWS-developed data warehouse Redshift, the AWS-developed NoSQL graph database Neptune, or a streaming system such as Kafka or AWS Kinesis.
Let’s take a closer look at each of these four options.
S3 was one of the very first products AWS offered, and its name says exactly what it is: a simple storage service. You can store almost any digital media there. Think of the files in a folder on your laptop; now picture that folder (called a bucket in S3) in the cloud. From collections of photographs to old Oracle archive logs, consumers can use it as a place to store items they simply don't have space for in their own data centers. (In reality, individual consumers won't store their photos here; they will more than likely use the service that Amazon Prime offers.) Many websites around the world are now built and supported with millions of digital objects hosted in S3. Developers upload the objects to S3 and then point their websites at them; when people view a website, they may not even know they are connected to S3 on the back end. But S3 is a whole lot more than an object store: it has a whole host of APIs (Application Programming Interfaces) that let customers connect their data store to a myriad of applications.
One of the great features of S3 is that you don't have to pre-purchase space; you are charged only for what you use. This is great for businesses that are just scaling up: as your need for space grows, you simply put more data into S3 and pay for what you use. This pay-as-you-go model revolutionized how customers consumed cloud storage. Looking back at the earlier post in this series, S3 is a good example of IaaS (Infrastructure as a Service). Three reasons why S3 may be for you:
- Affordable: Only pay for what you use
- Scalable: Increases in size without any configuration by users
- Available: Amazon backs S3 with an SLA of 99.9% availability (and designs it for 'eleven nines' of durability)
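In practice, the S3 API mentioned above is usually reached through an SDK such as boto3, AWS's Python SDK. Here is a minimal sketch of uploading and listing objects; the bucket, key, and path names are hypothetical, a real call requires AWS credentials, and the client is passed in as a parameter:

```python
def upload_object(s3_client, bucket, key, path):
    """Upload a local file to an S3 bucket as an object."""
    s3_client.upload_file(path, bucket, key)


def list_keys(s3_client, bucket, prefix=""):
    """Return the object keys under a prefix (S3's analogue of a folder)."""
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]


# With real credentials you would pass a genuine client:
#   import boto3
#   s3 = boto3.client("s3")
#   upload_object(s3, "my-archive-bucket", "logs/arch_001.log", "/tmp/arch_001.log")
```

Because the client is injected, the same functions work against any object that offers the S3 client's `upload_file` and `list_objects_v2` calls.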
Another example of AWS' IaaS is EC2, also one of the original AWS offerings. EC2 instances are often called 'compute' instances; an instance is just a virtual server that AWS hosts. AWS allows customers to size these instances to fit their needs: storage size, memory, and number of CPUs, along with a host of other customizable features. This was a huge advancement at the time. It let the everyday DBA or developer pick and choose the options they needed when building servers, without depending on the system administrators at their place of employment. EC2 allowed a great deal of freedom. Having these virtual instances in the cloud meant customers could avoid purchasing hardware up front, start small, and then increase the size and power of the servers as needed. Imagine a company that wants to run 30 tests of a new development feature. Run serially on their on-premises hardware, that might take 30 days. Using EC2 instances, a QA department could run all 30 tests at the same time in 30 different environments overnight, then shut the instances down when the tests were done, paying only for the time used. This was a game changer.
Many customers like to place databases on EC2. Customers can choose between Linux and Windows environments and have control over everything that goes on those servers, with the luxury of never having to buy or maintain the underlying hardware, which AWS handles. Customers still configure and administer the operating systems, the databases, and the applications running on top of them to their liking.
A great feature of EC2 is the AMI, which stands for Amazon Machine Image. There are lots of useful things you can do with an AMI. A customer can set up their database and application just the way they want them, then capture an image of the instance at that stage.
They can run QA tests against that AMI and shut it down when the tests are over; when they are ready for another test, they start from the same baseline. Or maybe you want your customers to have preconfigured databases and applications: these images would be all set up and ready to populate with a customer's data, for the customer to take and make their own. Think of AMIs as cookie-cutter images that can be shaped and molded beyond the original form; you don't have to start from scratch each time.
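That bake-then-clone workflow maps onto two EC2 API calls: `CreateImage` to capture the AMI and `RunInstances` to launch copies from it. A minimal boto3-style sketch, with hypothetical instance IDs and names and the client passed in as a parameter:

```python
def bake_ami(ec2_client, instance_id, name):
    """Capture a reusable image (AMI) from a fully configured instance."""
    resp = ec2_client.create_image(InstanceId=instance_id, Name=name)
    return resp["ImageId"]


def launch_copies(ec2_client, image_id, count, instance_type="t3.micro"):
    """Launch `count` identical instances from the baked image."""
    resp = ec2_client.run_instances(
        ImageId=image_id,
        MinCount=count,
        MaxCount=count,
        InstanceType=instance_type,
    )
    return [inst["InstanceId"] for inst in resp["Instances"]]


# With real credentials:
#   import boto3
#   ec2 = boto3.client("ec2")
#   image = bake_ami(ec2, "i-0123456789abcdef0", "qa-baseline")
#   launch_copies(ec2, image, 30)  # thirty identical QA environments
```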
EC2 was wildly successful for AWS, which then took the whole idea of hosting images one step further, moving from IaaS to PaaS (Platform as a Service). DBAs saw that having their databases on EC2 freed them from maintaining hardware, and AWS extended that to cover much of the DBA drudgery as well. RDS takes over many time-consuming DBA tasks such as patching and backups, letting DBAs focus on their applications and on tuning. AWS offers many different RDBMS (Relational Database Management System) databases in RDS.
Oracle and SQL Server have long been popular commercial on-premises databases, and many customers took advantage of the EC2 model to host them in the AWS cloud. With RDS, customers were able to move one step further. Oracle RDS has two license models: bring your own license (BYOL) and a license-included model. This allows great flexibility for customers who have already bought and paid for Oracle licenses; they can move their databases to the cloud without paying for licenses again. For new projects, customers can choose the Oracle RDS model that includes the Oracle license, paying for it as part of their RDS service (priced hourly but usually billed monthly) rather than making a large cash payment all at once. The SQL Server RDS offering only has the license-included model. If you had told customers 10 years ago that they could have hourly pricing for Oracle, they would have laughed and called you crazy.
Three open source databases are also available in RDS: MySQL, PostgreSQL, and MariaDB. Over the past decade, open source databases have become more and more popular, and with all three available in RDS it is easier than ever to deploy them in the cloud. As these databases are open source, there are no licenses to worry about, and pricing is by the hour.
AWS also offers another RDS option, Amazon Aurora. Aurora was developed by AWS and is compatible with both MySQL and PostgreSQL; AWS claims it is faster than either. This relational database was designed for the cloud from its inception, so if you are looking for an open-source-compatible cloud database, Aurora may be a good choice to explore.
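Provisioning any of these engines comes down to one RDS API call, `CreateDBInstance`, where you name the engine, instance class, and storage. A minimal boto3-style sketch; the identifier and credentials are hypothetical placeholders, and the client is passed in as a parameter:

```python
def provision_postgres(rds_client, name, storage_gb=20,
                       instance_class="db.t3.micro"):
    """Ask RDS for a managed PostgreSQL instance; from here on, AWS
    handles chores like patching and backups."""
    resp = rds_client.create_db_instance(
        DBInstanceIdentifier=name,
        Engine="postgres",
        DBInstanceClass=instance_class,
        AllocatedStorage=storage_gb,
        MasterUsername="dbadmin",         # hypothetical credentials --
        MasterUserPassword="change-me",   # use a secrets manager in practice
    )
    return resp["DBInstance"]["DBInstanceStatus"]


# With real credentials:
#   import boto3
#   provision_postgres(boto3.client("rds"), "hvr-demo-db")
```

Swapping `Engine` to `"mysql"`, `"mariadb"`, `"oracle-se2"`, and so on selects the other RDS engines discussed above.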
In 2013, AWS released Redshift, an analytical database built for data warehousing (as opposed to OLTP, Online Transaction Processing, workloads). Redshift was born in the cloud; there is no on-premises version. Because it is cloud-based, many features come natively: you can scale up or down based on your requirements and, like most cloud products, pay only for your usage. Redshift also has a nice feature, Redshift Spectrum, that lets you query objects stored in S3, enabling a data lake architecture on S3.
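Once an external schema has been mapped to an S3 location, a Spectrum query is just ordinary SQL that joins warehouse tables with S3-backed tables. A sketch with hypothetical schema and table names, sent through any DB-API cursor (for example, psycopg2 connected to Redshift):

```python
# Hypothetical query: "orders" lives in Redshift, while
# "s3_lake.click_events" is an external table backed by files in S3.
SPECTRUM_SQL = """
SELECT o.order_id, e.click_count
FROM orders AS o
JOIN s3_lake.click_events AS e
  ON o.order_id = e.order_id;
"""


def run_query(cursor, sql=SPECTRUM_SQL):
    """Execute SQL through a DB-API cursor and return all rows."""
    cursor.execute(sql)
    return cursor.fetchall()
```

The point of the sketch is that nothing in the SQL changes because the data sits in S3; the external schema makes the S3 files look like one more table.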
Kafka has seen more and more use in recent years, and there are a few different ways to run it on AWS. You could, of course, host your own Kafka on a Linux EC2 instance. AWS also has a managed service called MSK (Managed Streaming for Apache Kafka). Like Amazon's other managed offerings, AWS handles the administrative duties while you focus on your data; and since MSK runs Apache Kafka itself, most Kafka users will not have to make any major changes. AWS also offers AWS Kinesis. While Kinesis is not Kafka, it has many Kafka-like features and is used in a similar fashion for ingesting data durably, reliably, and with scalability in mind. Kafka and Kinesis share common concepts, including replication, sharding/partitioning, and components (producers and consumers).
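The producer side of that similarity is easy to see in code. With Kinesis, publishing a record is a single `PutRecord` call, where the partition key plays the same role as a message key in Kafka: records sharing a key land on the same shard. A boto3-style sketch with a hypothetical stream name and the client passed in as a parameter:

```python
import json


def put_event(kinesis_client, stream, event, partition_key):
    """Publish one JSON event to a Kinesis stream; records with the same
    partition key land on the same shard, much like keyed messages
    landing on the same Kafka partition."""
    resp = kinesis_client.put_record(
        StreamName=stream,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )
    return resp["SequenceNumber"]


# With real credentials:
#   import boto3
#   put_event(boto3.client("kinesis"), "orders-stream",
#             {"order_id": "o1", "total": 19.99}, partition_key="o1")
```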
AWS currently has more than 20 geographical regions with data centers and has announced four more that will come online over the next few years. Having so many data centers spread across the globe gives customers a global footprint: a data center close by, plus the ability to put a disaster recovery site in a different geographical location. One concern many cloud customers raise is: what if the data center goes down? That is why AWS introduced Availability Zones, which allow customers to achieve high availability (HA). Availability Zones are physically separate data centers within the same region, connected by a private AWS network for fault tolerance and low latency. If one data center has an issue, applications can be brought up in another Availability Zone, and the private network allows automatic failover in the case of a disaster.
As we have seen, AWS has many, many different offerings. You could say that if your IT department needs something, it can certainly be found at AWS, just as a household can find any item on Amazon.com. From object storage in S3 to hosted Linux and Windows servers on EC2, Amazon covers the basic data center needs. But AWS offers more: hosted and managed commercial and open source databases through RDS, as well as homegrown databases like Aurora and Redshift. AWS has become a huge force in the cloud in recent years and will remain a dominant player as it keeps innovating to stay at the top of the cloud race.
In my next post, I will discuss Azure.