What Is the Role?
We need a data engineer who can build and operate modern data platforms on Azure. You’ll work primarily with Microsoft Fabric and Azure Data Factory — building lakehouses, data pipelines, and semantic models that analytics and AI teams depend on. This is a hands-on role where you own the data platform end-to-end.
Key Responsibilities
Data Platform & Lakehouse:
- Build and maintain lakehouse architectures on Microsoft Fabric using OneLake and Delta Lake
- Implement medallion architecture (bronze, silver, gold) with clear data contracts between layers
- Design and manage data pipelines using Fabric Data Pipelines, Azure Data Factory, and Fabric Notebooks
- Use zero-ETL patterns (Fabric Mirroring, Synapse Link, Direct Lake) where they reduce complexity
Pipeline Development:
- Ingest data from relational databases, APIs, event streams, and files
- Write transformations in PySpark and Spark SQL within Fabric Notebooks
- Build orchestrated, reliable workflows with proper error handling and retry logic
- Implement schema evolution and handle late-arriving data gracefully
Modeling & Analytics:
- Build semantic models that Power BI report authors and analytics teams can use for self-service
- Design dimensional models (star schema) optimized for query performance
- Optimize Delta Lake tables: Z-ordering, compaction, liquid clustering, and partitioning
Quality & Governance:
- Implement data quality checks and validation at each pipeline stage
- Set up monitoring and alerting using the Fabric Monitoring hub and Azure Monitor
- Enforce data governance with Microsoft Purview, RBAC, and row-level security
- Document data flows, schemas, and pipeline dependencies
Required Skills
Microsoft Fabric & Azure Data (Hands-on):
- Microsoft Fabric — OneLake, Lakehouses, Data Pipelines, and Notebooks in production
- Azure Data Factory — copy activities, dataflows, pipelines, and triggers
- Delta Lake — table format, ACID transactions, time travel, and optimization techniques
- PySpark / Spark SQL — writing and optimizing distributed transformations
- Fabric monitoring — pipeline runs, Spark job metrics, and failure alerting
Data Engineering Fundamentals:
- 2+ years building data platforms in production
- Strong SQL skills — complex joins, window functions, CTEs, and query optimization
- Lakehouse architecture and medallion design patterns (bronze/silver/gold)
- Data modeling for analytics: star schema, dimensional modeling, and semantic models
- Understanding of zero-ETL concepts: Mirroring, Synapse Link, and Direct Lake mode
Security & Governance:
- Azure security basics: Entra ID, RBAC, managed identities, and Key Vault
- Microsoft Purview for data cataloging and lineage
- Row-level security and workspace-level access control in Fabric
General:
- Python for data engineering and automation
- Git and CI/CD for data pipeline code (Azure DevOps or GitHub Actions)
- Data quality testing and validation approaches
- Clear communication — can translate business data needs into technical designs
Preferred Skills
- Real-time streaming with Azure Event Hubs, Fabric Eventstreams, or KQL Database
- Fabric Real-Time Intelligence and Power BI Direct Lake integration
- Azure Databricks or Synapse Spark pools
- Data mesh principles and domain-oriented workspace architecture
- Supporting ML pipelines (Azure ML, MLflow, feature stores)
- Infrastructure as Code (Bicep, Terraform) for data platform resources
- Experience at a consulting or product engineering firm
Personal Qualities
- You care about data quality, and bad data reaching downstream users bothers you
- You debug methodically and can trace a pipeline failure from alert to root cause
- You think about cost and compute optimization from the start
- You document data flows and schemas without being asked
- You are comfortable working across teams (analytics, ML, product, Power BI developers)
What We Offer
- Opportunity to work on GenAI and cloud-first projects for diverse clients
- Collaborative engineering culture with mentoring and career growth
- Competitive salary and benefits (location-adjusted)
- Flexible work arrangements