Lakes & Warehouses

Definitions

  1. Data lakehouse: a Firebolt comparison with Snowflake vs Databricks.

    1. Delta Lake is a data lake that can store raw unstructured, semi-structured, and structured data. Combined with Delta Engine, it becomes a data lakehouse.

  2. What is Snowflake, 2: Snowflake decouples storage and compute, so organizations with high storage demands but little need for CPU cycles (or vice versa) don’t have to pay for an integrated bundle of both. Users can scale up or down as needed and pay only for the resources they use.

  3. Data mart

    1. Talend on data marts and their 3 types (dependent, independent, hybrid)

    2. (good) NetSuite on data marts: the three types above, structures (star, snowflake, denormalized), and comparisons

  4. Data Lake

    1. Monitoring data lake health at scale using Great Expectations and Spark
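To make the pay-for-what-you-use point in the Snowflake definition concrete, here is a minimal sketch comparing a decoupled bill with a bundled one for a storage-heavy, compute-light workload. All rates, the node size, the node price, and the function names are hypothetical illustrations, not Snowflake's actual pricing model.

```python
import math

# HYPOTHETICAL rates for illustration only; these are not real Snowflake prices.
STORAGE_RATE_PER_TB_MONTH = 23.0  # $ per TB stored per month
COMPUTE_RATE_PER_HOUR = 3.0       # $ per compute-hour actually used


def decoupled_cost(storage_tb: float, compute_hours: float) -> float:
    """Storage and compute are billed separately, each only for what is used."""
    return storage_tb * STORAGE_RATE_PER_TB_MONTH + compute_hours * COMPUTE_RATE_PER_HOUR


def bundled_cost(storage_tb: float, compute_hours: float,
                 tb_per_node: float = 10.0, hours_per_node: float = 720.0,
                 node_price: float = 2500.0) -> float:
    """HYPOTHETICAL bundled model: each node ships a fixed amount of storage
    and compute, so you must buy enough nodes to cover the larger demand."""
    nodes = max(math.ceil(storage_tb / tb_per_node),
                math.ceil(compute_hours / hours_per_node),
                1)
    return nodes * node_price


# Storage-heavy, compute-light workload: 100 TB stored, 50 compute-hours per month.
print(decoupled_cost(100, 50))  # → 2450.0
print(bundled_cost(100, 50))    # → 25000.0 (10 nodes needed for storage alone)
```

Under the bundled model the workload pays for ten nodes' worth of compute it never uses; under the decoupled model the compute line shrinks to the 50 hours actually consumed.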
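The NetSuite data mart item lists star, snowflake, and denormalized structures. Below is a minimal star-schema sketch using Python's built-in sqlite3 module: one fact table of sales surrounded by product and store dimension tables. The table names, columns, and rows are hypothetical.

```python
import sqlite3

# Hypothetical sales data mart laid out as a star schema:
# one fact table (fact_sales) pointing at two dimension tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_store   VALUES (10, 'east'), (20, 'west');
INSERT INTO fact_sales  VALUES (1, 10, 12.5), (2, 10, 30.0), (1, 20, 7.5), (2, 20, 20.0);
""")

# Typical data-mart query: aggregate the facts, sliced by dimension attributes.
rows = cur.execute("""
SELECT p.category, s.region, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_store   s ON s.store_id   = f.store_id
GROUP BY p.category, s.region
ORDER BY p.category, s.region
""").fetchall()

for row in rows:
    print(row)
```

A snowflake schema would further normalize the dimensions (for example, moving region into its own table referenced by dim_store), while a denormalized mart would fold category and region directly into one wide sales table so no joins are needed.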
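The Data Lake item mentions monitoring data health with Great Expectations and Spark. The sketch below mimics the idea of declarative expectations in plain Python so it runs without either library; it does not use the Great Expectations API, and every function name is hypothetical. At scale, each unexpected count would instead come from a Spark aggregation over the table.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Result = Dict[str, Any]
Check = Callable[[List[Row]], Result]


def expect_not_null(column: str) -> Check:
    """Hypothetical expectation: no row may have a null value in `column`."""
    def check(rows: List[Row]) -> Result:
        bad = sum(1 for r in rows if r.get(column) is None)
        return {"expectation": f"{column} is not null",
                "success": bad == 0, "unexpected_count": bad}
    return check


def expect_between(column: str, low: float, high: float) -> Check:
    """Hypothetical expectation: non-null values of `column` lie in [low, high]."""
    def check(rows: List[Row]) -> Result:
        bad = sum(1 for r in rows
                  if r.get(column) is not None and not low <= r[column] <= high)
        return {"expectation": f"{low} <= {column} <= {high}",
                "success": bad == 0, "unexpected_count": bad}
    return check


def validate(rows: List[Row], checks: List[Check]) -> Result:
    """Run every expectation and report overall health for the batch."""
    results = [check(rows) for check in checks]
    return {"success": all(r["success"] for r in results), "results": results}


batch = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},   # violates the not-null check on id
    {"id": 3, "amount": -2.0},     # violates the amount range check
]
report = validate(batch, [expect_not_null("id"), expect_between("amount", 0, 1000)])
print(report["success"])  # → False
```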

Comparisons

  1. Snowflake vs Delta Lake vs Firebolt - "Databricks Delta Lake and Delta Engine is a lakehouse. You choose it as a data lake, and for data lakehouse-based workloads including ELT for data warehouses, data science and machine learning, even static reporting and dashboards if you don’t mind the performance difference and don’t have a data warehouse.

    Most companies still choose a data warehouse like Snowflake, BigQuery, Redshift or Firebolt for general-purpose analytics over a data lakehouse like Delta Lake and Delta Engine because they need performance.

    But it doesn’t matter. You need more than one engine. Don’t fight it. You will end up with multiple engines for very good reasons. It’s just a matter of when. "

  2. Snowflake intro and demo

Use Cases

  1. Hunters on their architecture: Airflow, Snowflake, Snowpipe, Flink, RocksDB, cluster optimization during ingestion, monitoring metrics, and cost.

Snowflake

  1. Getting started with Snowflake tasks: SQL statements or stored procedures, schedules, and trees of tasks.
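In a tree of tasks, only the root task runs on a schedule and each child task runs after its predecessor finishes (in Snowflake SQL, a SCHEDULE on the root and an AFTER clause on each child). The sketch below models only that dependency ordering, using Python's standard-library graphlib module; the task names are hypothetical and the code does not talk to Snowflake.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional


def execution_order(tree: Dict[str, Optional[str]]) -> List[str]:
    """Return a valid run order for a tree of tasks.

    `tree` maps each task to its single predecessor (None for the root):
    only the root is scheduled, and every child runs after exactly one parent.
    """
    roots = [task for task, parent in tree.items() if parent is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root task, found {roots}")
    for task, parent in tree.items():
        if parent is not None and parent not in tree:
            raise ValueError(f"{task} runs after unknown task {parent}")
    # TopologicalSorter expects node -> iterable of predecessors.
    graph = {task: ([] if parent is None else [parent]) for task, parent in tree.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical pipeline: the root loads raw data on a schedule,
# and the children transform it once their parent has finished.
tree = {
    "load_raw": None,
    "clean_orders": "load_raw",
    "clean_customers": "load_raw",
    "build_sales_mart": "clean_orders",
}
print(execution_order(tree))
```

The single-root, single-parent validation keeps the example a strict tree, matching the "tree of tasks" wording above.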

Chroma

  1. AI-native open-source embedding database (GitHub)
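An embedding database such as Chroma stores vectors alongside documents and returns the stored items closest to a query vector. The sketch below shows that core idea as a brute-force cosine-similarity search in plain Python; it is not Chroma's API, the class and method names and the 2-D toy embeddings are hypothetical, and production embedding databases typically use approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store mapping id -> (embedding, document)."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[List[float], str]] = {}

    def add(self, doc_id: str, embedding: List[float], document: str) -> None:
        self._items[doc_id] = (embedding, document)

    def query(self, embedding: List[float], k: int = 1) -> List[Tuple[str, str, float]]:
        """Brute-force scan: score every stored vector and return the top k."""
        scored = [(doc_id, doc, cosine_similarity(embedding, emb))
                  for doc_id, (emb, doc) in self._items.items()]
        scored.sort(key=lambda item: item[2], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add("a", [1.0, 0.0], "cats")
store.add("b", [0.0, 1.0], "stock prices")
store.add("c", [0.9, 0.1], "kittens")
for doc_id, doc, score in store.query([1.0, 0.05], k=2):
    print(doc_id, doc, round(score, 3))
```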

ClickHouse

Feature engineering

Data lake Table Formats

Apache Iceberg

Databricks Delta Lake
