AWS Data Engineer

Remote

Posted on: October 4, 2025

What is the role?

We need a data engineer who can build and operate production data pipelines on AWS. You’ll work with S3, Glue, and Athena daily — ingesting data from various sources, transforming it into usable formats, and making it queryable for analytics and AI teams. This is a hands-on role where you own the data layer end-to-end.

Key Responsibilities

Pipeline Development:

  • Design and build ELT pipelines using AWS Glue (ETL jobs, crawlers, Data Catalog) and S3; a minimal job sketch follows this list
  • Ingest data from relational databases, APIs, event streams, and flat files
  • Implement schema evolution, partitioning strategies, and file format optimization (Parquet, ORC, Iceberg)
  • Build orchestrated workflows using Glue Workflows, Step Functions, or Airflow
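
To give a flavor of the day-to-day, here is a minimal Glue ETL job in PySpark: a catalog read with a job bookmark, a light transform, and a partitioned Parquet write. Every database, table, and bucket name below is a placeholder, not our actual environment — treat it as a sketch of the pattern, not a production job.

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.dynamicframe import DynamicFrame
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read via the Data Catalog; transformation_ctx enables job bookmarks,
    # so reruns pick up only new data. Database/table names are hypothetical.
    orders = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db",
        table_name="orders",
        transformation_ctx="orders_src",
    )

    # Drop rows missing the key and keep the columns consumers actually use.
    cleaned = orders.toDF().dropna(subset=["order_id"]).select(
        "order_id", "customer_id", "amount", "order_date"
    )

    # Write partitioned Parquet into the cleaned layer of the lake.
    glue_context.write_dynamic_frame.from_options(
        frame=DynamicFrame.fromDF(cleaned, glue_context, "cleaned"),
        connection_type="s3",
        connection_options={
            "path": "s3://example-lake/cleaned/orders/",
            "partitionKeys": ["order_date"],
        },
        format="parquet",
    )
    job.commit()

The same shape scales from this toy transform to real multi-source jobs; you'd own that scaling.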

Data Lake & Storage:

  • Design and maintain S3-based data lake architecture with clear layer separation (raw, cleaned, curated)
  • Optimize S3 layout for query performance and cost: partitioning, compaction, and lifecycle policies (see the example rules after this list)
  • Implement data cataloging and metadata management with Glue Data Catalog
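
Cost work here is concrete, not abstract. One example: lifecycle rules that tier down the rarely re-read raw layer and expire scratch data, managed in code with boto3. The bucket, prefixes, and transition windows below are hypothetical and would be tuned to actual access patterns.

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-lake",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-raw-layer",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "raw/"},
                    # Raw data is rarely re-read once curated: tier it down.
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 180, "StorageClass": "GLACIER"},
                    ],
                },
                {
                    "ID": "expire-tmp",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "tmp/"},
                    # Scratch output from jobs should never accumulate.
                    "Expiration": {"Days": 7},
                },
            ]
        },
    )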

Query & Analytics:

  • Optimize Athena queries for performance and cost (a CTAS example follows this list)
  • Build views and tables that analytics and BI teams can self-serve from
  • Support data modeling for analytics use cases (star schema, dimensional modeling)
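
A typical Athena cost lever, sketched below: use CTAS to convert a raw table into partitioned Parquet so downstream queries scan only the partitions they filter on. Database, table, and S3 locations are placeholders.

    import boto3

    athena = boto3.client("athena")

    # CTAS rewrites the raw table as partitioned Parquet; queries that
    # filter on order_date then read only the matching partitions.
    # Note the partition column must come last in the SELECT.
    ctas = """
    CREATE TABLE curated_db.orders_parquet
    WITH (
        format = 'PARQUET',
        external_location = 's3://example-lake/curated/orders/',
        partitioned_by = ARRAY['order_date']
    ) AS
    SELECT order_id, customer_id, amount, order_date
    FROM raw_db.orders
    """

    athena.start_query_execution(
        QueryString=ctas,
        QueryExecutionContext={"Database": "raw_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )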

Quality & Operations:

  • Implement data quality checks and validation at each pipeline stage (a minimal check is sketched after this list)
  • Set up monitoring and alerting for pipeline failures and data anomalies (CloudWatch)
  • Enforce data access controls, IAM policies, encryption, and governance
  • Document data flows, schemas, and pipeline dependencies
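
A minimal version of the quality gate we mean, assuming a PySpark DataFrame and a hypothetical CloudWatch namespace: count and null-check each stage's output, publish the count as a custom metric, and fail loudly.

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def check_stage(df, stage: str, key_column: str) -> None:
        """Fail the pipeline if a stage emits no rows or null keys, and
        publish the row count so anomalies are visible in CloudWatch."""
        row_count = df.count()
        null_keys = df.filter(df[key_column].isNull()).count()

        # Namespace and dimension names are hypothetical conventions.
        cloudwatch.put_metric_data(
            Namespace="DataPipelines",
            MetricData=[{
                "MetricName": "RowCount",
                "Dimensions": [{"Name": "Stage", "Value": stage}],
                "Value": float(row_count),
                "Unit": "Count",
            }],
        )

        if row_count == 0 or null_keys > 0:
            raise ValueError(
                f"{stage}: {row_count} rows, {null_keys} null {key_column} values"
            )

An anomaly-detection alarm on that metric turns silent data loss into an alert instead of a surprise in a dashboard.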

Required Skills

AWS Data Services (Hands-on):

  • S3 — data lake storage, lifecycle policies, access control, and layout optimization
  • AWS Glue — ETL jobs (PySpark), crawlers, Data Catalog, and job bookmarks
  • Athena — writing and optimizing analytical queries over S3 data
  • Step Functions or Glue Workflows — pipeline orchestration
  • CloudWatch — monitoring, logging, and alerting for data pipelines
  • IAM / KMS — data security, encryption, and access management

Data Engineering Fundamentals:

  • 2+ years building data pipelines in production
  • Strong SQL skills — complex joins, window functions, CTEs, and query optimization
  • Experience with columnar formats (Parquet, ORC) and partitioning strategies
  • Understanding of data lake design patterns and layer separation (bronze/silver/gold or raw/cleaned/curated)
  • Data modeling for analytics: star schema, wide tables, and dimensional modeling
  • Python for ETL scripting and transformations

General:

  • Git and CI/CD for data pipeline code (GitHub Actions, CodePipeline)
  • Data quality testing and validation approaches
  • Clear communication — can translate business data needs into technical designs

Preferred Skills

  • Experience with Apache Iceberg or Delta Lake table formats
  • Streaming ingestion with Kinesis or Kafka, and CDC tools (Debezium, DMS)
  • Familiarity with Redshift, EMR, or Lake Formation
  • Experience supporting ML pipelines and feature stores
  • Airflow for pipeline orchestration
  • Scala or PySpark beyond basic Glue jobs
  • Experience at a consulting or product engineering firm

Personal Qualities

  • You care about data quality — bad data downstream bothers you
  • Methodical debugger — can trace a pipeline failure from alert to root cause
  • Thinks about cost from the start, not as an afterthought
  • Documents data flows and schemas without being asked
  • Comfortable working across teams (analytics, ML, product)

What We Offer

  • Opportunity to work on GenAI and cloud-first projects for diverse clients
  • Collaborative engineering culture with mentoring and career growth
  • Competitive salary and benefits (location-adjusted)
  • Flexible work arrangements