How to cut DWH and DataLake costs on Amazon Web Services?

AWS currently provides about 175 services. By decomposing its platform into many specialized services, Amazon has achieved two things at once: each individual service solves a customer's problem efficiently at a low unit price, while the total cost of owning an infrastructure built from many services grows high. It is practically impossible to predict what a project will cost when dozens of different services are involved. Based on 10 years of experience with DWH and BI projects, we have put together an architecture that, in our opinion, is the most efficient and at the same time the cheapest.

What functions should the system perform: 

  • collection of data from various sources 
  • data cleansing 
  • data enrichment
  • upload to DataLake 
  • loading and building DWH 
  • data mapping 
  • machine learning and predictive analytics 
  • BI  

All infrastructure must run in AWS.

To build an analytical pipeline, Amazon suggests using about 30 services. Experience shows that you can get by with five. If your task is not to build a spaceship and surprise Elon Musk, the following will be enough:

Amazon side: EC2, ECS, S3, RDS 

open source solutions: Python, PostgreSQL, Hive, Presto, Apache Superset 

To deploy all open source solutions, we use Amazon's EC2 and ECS services. 

ETL: Python, SQL.

ML: Python.

  1. All company data is uploaded to an S3-based DataLake.
  2. From the DataLake, we transform and load data into a DWH (PostgreSQL) based on AWS RDS.
  3. We organize work with the DataLake through Hive.
  4. We unite the DataLake and the DWH through Presto.
  5. BI: Apache Superset, Power BI.
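As a minimal sketch of steps 1 and 2, a Python ETL job pulls a raw CSV extract from the DataLake, cleanses and enriches it, and produces DWH-ready rows. The S3 and PostgreSQL calls are left out so the transform logic stands alone; all column names here are hypothetical:

```python
import csv
import io

def clean_and_enrich(raw_csv: str) -> list[dict]:
    """Transform a raw CSV extract (as pulled from the DataLake) into
    DWH-ready rows: drop incomplete records, cast types, add a derived field."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        if not rec.get("order_id") or not rec.get("amount"):
            continue  # data cleansing: skip incomplete records
        amount = float(rec["amount"])
        rows.append({
            "order_id": int(rec["order_id"]),
            "amount": amount,
            "is_large": amount >= 1000.0,  # enrichment: derived flag
        })
    return rows

# Example: a raw extract with one incomplete record that gets filtered out
raw = "order_id,amount\n1,250.0\n,99.0\n2,1500.0\n"
dwh_rows = clean_and_enrich(raw)
```

In a real job, the cleaned rows would then be bulk-inserted into the RDS PostgreSQL instance.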

The Application

PrestoSQL is a distributed SQL query engine with support for multiple connectors. With Presto, you can combine different data sources, from classic relational databases to modern HDFS repositories. Its query planner automatically optimizes queries to reduce load and processing time. Presto can also be accessed from Python applications, removing the need to connect to the PostgreSQL database directly.  
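A hedged sketch of such a federated query from Python, using the presto-python-client (`prestodb`) package. The host, catalog schemas, and table/column names are assumptions for illustration:

```python
def federated_join_sql(hive_table: str, pg_table: str) -> str:
    """Build one Presto statement that joins a DataLake table (hive catalog)
    with a DWH table (postgresql catalog)."""
    return (
        f"SELECT d.order_id, d.amount, w.customer_name "
        f"FROM hive.default.{hive_table} AS d "
        f"JOIN postgresql.public.{pg_table} AS w ON d.customer_id = w.id"
    )

def run_federated_query(sql: str):
    """Execute via Presto; requires a reachable coordinator and
    `pip install presto-python-client`. Not invoked in this sketch."""
    import prestodb  # imported here so the sketch loads without the package
    conn = prestodb.dbapi.connect(
        host="presto.internal",  # assumption: your coordinator's address
        port=8080,
        user="etl",
        catalog="hive",
        schema="default",
    )
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()

sql = federated_join_sql("orders_raw", "customers")
```

This is what "uniting the DataLake and DWH" means in practice: one query spans both catalogs, and Presto pushes work down to each source.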

Hive Metastore is a technology for creating databases whose tables live on a file system. In particular, it allows you to build a DWH and Data Lake based on S3, which in turn provides practically unlimited disk space for data storage and quick access to it.
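Working with S3 through Hive typically means registering S3 objects as external tables in the metastore. A minimal sketch that only builds the DDL string; the bucket path and columns are made up for illustration:

```python
def s3_external_table_ddl(table: str, columns: dict, s3_path: str) -> str:
    """Generate Hive DDL that registers CSV files under an S3 prefix as an
    external table, so Hive and Presto can query them in place."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"STORED AS TEXTFILE LOCATION '{s3_path}'"
    )

ddl = s3_external_table_ddl(
    "orders_raw",
    {"order_id": "BIGINT", "amount": "DOUBLE"},
    "s3a://company-datalake/orders/",  # assumption: your bucket layout
)
```

Because the table is EXTERNAL, dropping it removes only the metastore entry; the underlying S3 objects stay in place.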

Quality results with us

To ensure security, all communication with the outside world can go through AWS IAM and KMS. Thus, the most cost-intensive operations run on open source solutions, while Amazon services are responsible for speed and security. This architecture covers most of the tasks of an average customer. 
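Concretely, IAM supplies the credentials and KMS encrypts the data at rest. A hedged boto3 sketch: the helper builds `put_object` parameters that force server-side encryption with a customer-managed KMS key (bucket and key names are assumptions), while the actual upload is left as an uncalled function since it needs live AWS credentials:

```python
def kms_upload_params(bucket: str, key: str, kms_key_id: str) -> dict:
    """Build S3 put_object parameters that force server-side encryption
    with a customer-managed KMS key."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": kms_key_id,
    }

def upload_encrypted(body: bytes, bucket: str, key: str, kms_key_id: str):
    """Requires AWS credentials and `pip install boto3`; not run here."""
    import boto3
    s3 = boto3.client("s3")  # credentials come from the IAM role or profile
    s3.put_object(Body=body, **kms_upload_params(bucket, key, kms_key_id))

params = kms_upload_params("company-datalake", "raw/orders.csv", "alias/dwh-key")
```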

So, using no more than five Amazon services plus proven open source solutions, you can cut the cost of an analytical pipeline severalfold without losing performance or security. 

The benefits

According to our average estimates, the cost of owning and using the AWS infrastructure for a mid-size company should not exceed $1,000-3,000 per month, with maintenance and modernization adding $2,000-3,500 per month. Migrating the entire infrastructure takes from 2 to 6 months. 

We can help you pay less for your DWH on Amazon Web Services. Find out more now! 

 Please feel free to contact us via e-mail: sales@3alica.com or fill the contact form below.   

Andrei Shimanskij

Head of DWH&Data Science dep
LinkedIN

  • №1 Qlik Select Partner in Belarus 
  • 10+ years in business 
  • 100+ full-time developers 
  • 450+ projects completed 

Client Testimonials

A2 is trusted by more than 2,000 happy users from all around the world.