AWS Data Engineer

Remote

Posted on: October 4, 2025

What is the role?

We need a data engineer who can build and operate production data pipelines on AWS. You’ll work with S3, Glue, and Athena daily — ingesting data from various sources, transforming it into usable formats, and making it queryable for analytics and AI teams. This is a hands-on role where you own the data layer end-to-end.

Key Responsibilities

Pipeline Development:

  • Design and build ELT pipelines using AWS Glue (ETL jobs, crawlers, Data Catalog) and S3; a minimal job sketch follows this list
  • Ingest data from relational databases, APIs, event streams, and flat files
  • Implement schema evolution, partitioning strategies, and file format optimization (Parquet, ORC, Iceberg)
  • Build orchestrated workflows using Glue Workflows, Step Functions, or Airflow
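
To give a flavor of the day-to-day, here is a minimal Glue ETL job in PySpark: a catalog read with a job bookmark, a light transform, and a partitioned Parquet write. Every database, table, and bucket name below is a placeholder, not our actual environment — treat it as a sketch of the pattern, not a production job.

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.dynamicframe import DynamicFrame
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read via the Data Catalog; transformation_ctx enables job bookmarks,
    # so reruns pick up only new data. Database/table names are hypothetical.
    orders = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db",
        table_name="orders",
        transformation_ctx="orders_src",
    )

    # Drop rows missing the key and keep the columns consumers actually use.
    cleaned = orders.toDF().dropna(subset=["order_id"]).select(
        "order_id", "customer_id", "amount", "order_date"
    )

    # Write partitioned Parquet into the cleaned layer of the lake.
    glue_context.write_dynamic_frame.from_options(
        frame=DynamicFrame.fromDF(cleaned, glue_context, "cleaned"),
        connection_type="s3",
        connection_options={
            "path": "s3://example-lake/cleaned/orders/",
            "partitionKeys": ["order_date"],
        },
        format="parquet",
    )
    job.commit()

The same shape scales from this toy transform to real multi-source jobs; you'd own that scaling.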

Data Lake & Storage:

  • Design and maintain S3-based data lake architecture with clear layer separation (raw, cleaned, curated)
  • Optimize S3 layout for query performance and cost: partitioning, compaction, and lifecycle policies (see the example rules after this list)
  • Implement data cataloging and metadata management with Glue Data Catalog
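
Cost work here is concrete, not abstract. One example: lifecycle rules that tier down the rarely re-read raw layer and expire scratch data, managed in code with boto3. The bucket, prefixes, and transition windows below are hypothetical and would be tuned to actual access patterns.

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-lake",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-raw-layer",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "raw/"},
                    # Raw data is rarely re-read once curated: tier it down.
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 180, "StorageClass": "GLACIER"},
                    ],
                },
                {
                    "ID": "expire-tmp",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "tmp/"},
                    # Scratch output from jobs should never accumulate.
                    "Expiration": {"Days": 7},
                },
            ]
        },
    )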

Query & Analytics:

  • Optimize Athena queries for performance and cost (a CTAS example follows this list)
  • Build views and tables that analytics and BI teams can self-serve from
  • Support data modeling for analytics use cases (star schema, dimensional modeling)
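
A typical Athena cost lever, sketched below: use CTAS to convert a raw table into partitioned Parquet so downstream queries scan only the partitions they filter on. Database, table, and S3 locations are placeholders.

    import boto3

    athena = boto3.client("athena")

    # CTAS rewrites the raw table as partitioned Parquet; queries that
    # filter on order_date then read only the matching partitions.
    # Note the partition column must come last in the SELECT.
    ctas = """
    CREATE TABLE curated_db.orders_parquet
    WITH (
        format = 'PARQUET',
        external_location = 's3://example-lake/curated/orders/',
        partitioned_by = ARRAY['order_date']
    ) AS
    SELECT order_id, customer_id, amount, order_date
    FROM raw_db.orders
    """

    athena.start_query_execution(
        QueryString=ctas,
        QueryExecutionContext={"Database": "raw_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )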

Quality & Operations:

  • Implement data quality checks and validation at each pipeline stage (a minimal check is sketched after this list)
  • Set up monitoring and alerting for pipeline failures and data anomalies (CloudWatch)
  • Enforce data access controls, IAM policies, encryption, and governance
  • Document data flows, schemas, and pipeline dependencies
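
A minimal version of the quality gate we mean, assuming a PySpark DataFrame and a hypothetical CloudWatch namespace: count and null-check each stage's output, publish the count as a custom metric, and fail loudly.

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def check_stage(df, stage: str, key_column: str) -> None:
        """Fail the pipeline if a stage emits no rows or null keys, and
        publish the row count so anomalies are visible in CloudWatch."""
        row_count = df.count()
        null_keys = df.filter(df[key_column].isNull()).count()

        # Namespace and dimension names are hypothetical conventions.
        cloudwatch.put_metric_data(
            Namespace="DataPipelines",
            MetricData=[{
                "MetricName": "RowCount",
                "Dimensions": [{"Name": "Stage", "Value": stage}],
                "Value": float(row_count),
                "Unit": "Count",
            }],
        )

        if row_count == 0 or null_keys > 0:
            raise ValueError(
                f"{stage}: {row_count} rows, {null_keys} null {key_column} values"
            )

An anomaly-detection alarm on that metric turns silent data loss into an alert instead of a surprise in a dashboard.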

Required Skills

AWS Data Services (Hands-on):

  • S3 — data lake storage, lifecycle policies, access control, and layout optimization
  • AWS Glue — ETL jobs (PySpark), crawlers, Data Catalog, and job bookmarks
  • Athena — writing and optimizing analytical queries over S3 data
  • Step Functions or Glue Workflows — pipeline orchestration
  • CloudWatch — monitoring, logging, and alerting for data pipelines
  • IAM / KMS — data security, encryption, and access management

Data Engineering Fundamentals:

  • 2+ years building data pipelines in production
  • Strong SQL skills — complex joins, window functions, CTEs, and query optimization
  • Experience with columnar formats (Parquet, ORC) and partitioning strategies
  • Understanding of data lake design patterns and layer separation (bronze/silver/gold or raw/cleaned/curated)
  • Data modeling for analytics: star schema, wide tables, and dimensional modeling
  • Python for ETL scripting and transformations

General:

  • Git and CI/CD for data pipeline code (GitHub Actions, CodePipeline)
  • Data quality testing and validation approaches
  • Clear communication — can translate business data needs into technical designs

Preferred Skills

  • Experience with Apache Iceberg or Delta Lake table formats
  • Streaming ingestion with Kinesis or Kafka, and CDC tools (Debezium, DMS)
  • Familiarity with Redshift, EMR, or Lake Formation
  • Experience supporting ML pipelines and feature stores
  • Airflow for pipeline orchestration
  • Scala or PySpark beyond basic Glue jobs
  • Experience at a consulting or product engineering firm

Personal Qualities

  • You care about data quality — bad data downstream bothers you
  • Methodical debugger — can trace a pipeline failure from alert to root cause
  • Thinks about cost from the start, not as an afterthought
  • Documents data flows and schemas without being asked
  • Comfortable working across teams (analytics, ML, product)

What We Offer

  • Opportunity to work on GenAI and cloud-first projects for diverse clients
  • Collaborative engineering culture with mentoring and career growth
  • Competitive salary and benefits (location-adjusted)
  • Flexible work arrangements