To build an analytical pipeline, Amazon suggests using about 30 services. Experience shows that you can get by with five.
Unless your task is to build a spaceship and impress Elon Musk, the following will be enough:
Amazon side: EC2, ECS, S3, RDS
Open source solutions: Python, PostgreSQL, Hive, Presto, Apache Superset
To deploy all open source solutions, we use Amazon's EC2 and ECS services.
ETL - Python, SQL.
ML - Python.
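As a rough illustration of what a Python plus SQL ETL step can look like, here is a minimal sketch. The `orders` table, its columns, and the sample CSV are hypothetical, made up only for this example. It also assumes the raw file has already been read from storage as text and that the load step goes through a DB-API driver such as psycopg2, so that part is left as a comment.

```python
import csv
import io

# Hypothetical target table and columns, used only for illustration.
INSERT_SQL = "INSERT INTO orders (order_id, customer_id, amount) VALUES (%s, %s, %s)"


def transform(raw_csv: str) -> list:
    """Parse raw CSV text and return cleaned rows ready for loading."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Skip records with a missing amount instead of loading bad data.
        if not record["amount"].strip():
            continue
        rows.append(
            (
                int(record["order_id"]),
                int(record["customer_id"]),
                round(float(record["amount"]), 2),
            )
        )
    return rows


# Sample raw data standing in for a file taken from the DataLake.
raw = "order_id,customer_id,amount\n1,10,19.990\n2,11,\n3,10,5.5\n"
rows = transform(raw)

# With a DB-API driver (an assumption, e.g. psycopg2), loading would be roughly:
# cursor.executemany(INSERT_SQL, rows)
```

The transform is kept as a pure function so it can be tested without access to S3 or the database.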
- All company data is uploaded to an S3-based DataLake.
- From the DataLake, we transform and load data into the DWH (PostgreSQL), which runs on AWS RDS.
- We organize work with the DataLake through Hive.
- We unite the DataLake and the DWH through Presto.
- BI - Apache Superset, Power BI.
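To show what uniting the DataLake and the DWH through Presto can mean in practice, here is a small sketch that builds one query joining a Hive table with a PostgreSQL table. Presto addresses tables as `catalog.schema.table`. The catalog names `hive` and `postgresql`, the schemas, the tables, and the `customer_id` key are all assumptions for illustration, and `federated_query` is a hypothetical helper, not part of any library.

```python
def federated_query(lake_table: str, dwh_table: str, key: str) -> str:
    """Build a Presto query that joins a DataLake table with a DWH table."""
    return (
        f"SELECT d.{key}, COUNT(*) AS events\n"
        f"FROM {lake_table} AS l\n"
        f"JOIN {dwh_table} AS d ON l.{key} = d.{key}\n"
        f"GROUP BY d.{key}"
    )


# Hypothetical names: catalog.schema.table for each side of the join.
sql = federated_query(
    "hive.datalake.events",         # table exposed through the Hive catalog
    "postgresql.public.customers",  # table in the PostgreSQL DWH
    "customer_id",
)
print(sql)
```

The resulting string can be run from any Presto client, or used as a dataset in Apache Superset if Superset is connected to Presto.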