Senior Data Engineer, Data Lakehouse Infrastructure

About the Role

You will design, implement, and scale a modern data lakehouse that supports complex analytical and real-time workloads, owning data modeling, ingestion, metadata management, and query performance optimization. You will build and orchestrate ETL and streaming pipelines, implement open table formats and governance, and create the automation and observability needed for operational reliability.

Requirements

  • 5+ years of experience in data or software engineering focused on distributed data systems
  • Proven experience building and scaling data platforms on GCP
  • Strong command of query engines such as Trino, Presto, Spark, or Snowflake
  • Experience with table formats such as Apache Hudi, Apache Iceberg, or Delta Lake
  • Proficiency in Python and strong SQL or Spark SQL skills
  • Hands-on experience with Airflow and GCP-native orchestration and streaming services

Responsibilities

  • Architect and scale a high-performance data lakehouse on GCP
  • Design, build, and optimize distributed query engines such as Trino, Spark, or Snowflake
  • Implement metadata management using open table formats like Iceberg or Hudi
  • Develop and orchestrate ETL and ELT pipelines with Airflow, Spark, and GCP-native tools
  • Build streaming and batch data pipelines using Dataflow and Kafka
  • Optimize query performance and data modeling for analytical workloads
  • Automate operational tasks including cluster scaling and self-serve infrastructure
  • Implement observability and data discovery frameworks for governance

Benefits

  • Equity plan participation