Tools

  1. Debezium - an open-source distributed platform for change data capture (CDC)

  2. Hudi - "Hudi is a rich platform to build streaming data lakes with incremental data pipelines on a self-managing database layer, while being optimized for lake engines and regular batch processing."

  3. Upsolver - "Continuous SQL Pipelines for Cloud Data Lakes. No custom coding. No orchestration. No infrastructure maintenance."

  4. dbt - helps data teams work like software engineers and ship trusted data faster. Teams collaboratively deploy analytics code following software engineering best practices such as modularity, portability, CI/CD, and documentation, so anyone who knows SQL can build production-grade data pipelines.

  5. Metorikku - A simplified, lightweight ETL Framework based on Apache Spark

  6. Stitch - Stitch rapidly moves data from 130+ sources into a data warehouse so you can get to answers faster, no coding required.

  7. Snowplow - Generate complete, accurate and well-structured event data across all platforms and channels in a common format, with the Snowplow Behavioral Data Platform.

  8. Workato - A single platform for integration and workflow automation across your organization

  9. Deequ (from AWS Labs) - a library built on top of Apache Spark for testing data quality at scale
