Senior Data Engineer, Data Lakehouse Infrastructure

About the Role

You will design, implement, and scale a modern data lakehouse that supports complex analytical and real-time workloads, owning data modeling, ingestion, metadata management, and query performance optimization. You will build and orchestrate ETL and streaming pipelines, implement open table formats and governance, and create the automation and observability needed for operational reliability.

Requirements

  • 5+ years of experience in data or software engineering focused on distributed data systems
  • Proven experience building and scaling data platforms on GCP
  • Strong command of query engines such as Trino, Presto, Spark, or Snowflake
  • Experience with table formats such as Apache Hudi, Apache Iceberg, or Delta Lake
  • Proficiency in Python and strong SQL or Spark SQL skills
  • Hands-on experience with Airflow and GCP-native orchestration and streaming services

Responsibilities

  • Architect and scale a high-performance data lakehouse on GCP
  • Design, build, and optimize distributed query engines such as Trino, Spark, or Snowflake
  • Implement metadata management using open table formats like Iceberg or Hudi
  • Develop and orchestrate ETL and ELT pipelines with Airflow, Spark, and GCP-native tools
  • Build streaming and batch data pipelines using Dataflow and Kafka
  • Optimize query performance and data modeling for analytical workloads
  • Automate operational tasks including cluster scaling and self-serve infrastructure
  • Implement observability and data discovery frameworks for governance

Benefits

  • Equity plan participation