Tools
Debezium - an open source distributed platform for change data capture
Hudi - "Hudi is a rich platform to build streaming data lakes with incremental data pipelines on a self-managing database layer, while being optimized for lake engines and regular batch processing."
Upsolver - "Continuous SQL Pipelines for Cloud Data Lakes. No custom coding. No orchestration. No infrastructure maintenance."
dbt - "dbt helps data teams work like software engineers—to ship trusted data, faster. Collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation. Now anyone who knows SQL can build production-grade data pipelines."
Metorikku - A simplified, lightweight ETL Framework based on Apache Spark
BI tools that connect directly to a DB:
Redash - Connect and query your data sources, build dashboards to visualize data and share them with your company.
Metabase - "[Metabase] is an easy-to-use, open source business intelligence tool that lets you analyze data from a variety of data destinations and sources. It also follows a simple and fast setup process. Its data visualization capabilities are exceptional and can be showcased in a user-friendly way, without using SQL. With Metabase, you can easily share live dashboards, automated reports, and questions with the rest of your team." (source: fullstackgrowth.com)
Superset - Apache Superset is a modern data exploration and visualization platform
Stitch - Stitch rapidly moves data from 130+ sources into a data warehouse so you can get to answers faster, no coding required.
Snowplow - Generate complete, accurate and well-structured event data across all platforms and channels in a common format, with the Snowplow Behavioral Data Platform.
Workato - A single platform for integration and workflow automation across your organization
AWS Deequ - Test data quality at scale