A.A
Projects

A collection of data engineering projects showcasing scalable pipelines, real-time processing, and cloud infrastructure implementations.

Dynamic Schema-Agnostic Dataflow Pipeline for BigQuery

This Apache Beam pipeline, run on Google Dataflow, takes a metadata-driven approach to automating data ingestion into BigQuery: schemas are handled without code changes. The project demonstrates reusable, scalable ETL workflows for large-scale data processing on Google Cloud. For a full overview of the project's capabilities, see the repository on GitHub.

Java, Apache Beam, Google Dataflow
+2 more
Jun 2025
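
To make the metadata-driven idea concrete, here is a minimal standard-library sketch in Python. It is not the project's code (the project itself uses Java and Apache Beam), and the `TABLE_METADATA` layout, the `orders` table, and the `to_row` helper are hypothetical names chosen for illustration. The point it shows: field names, types, and modes live in data, so supporting a new table means adding a metadata entry rather than editing the transform.

```python
import json
from datetime import datetime

# Hypothetical metadata: one entry per target table, kept outside the
# transform logic. Type and mode names follow BigQuery's schema vocabulary.
TABLE_METADATA = {
    "orders": {
        "fields": [
            {"name": "order_id", "type": "INTEGER", "mode": "REQUIRED"},
            {"name": "amount", "type": "FLOAT", "mode": "NULLABLE"},
            {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"},
        ]
    }
}

# One caster per supported type; extending type support means adding an entry.
CASTERS = {
    "INTEGER": int,
    "FLOAT": float,
    "STRING": str,
    "TIMESTAMP": lambda v: datetime.fromisoformat(v).isoformat(),
}


def to_row(table, raw_json):
    """Convert a raw JSON record into a typed row using only the metadata."""
    record = json.loads(raw_json)
    row = {}
    for field in TABLE_METADATA[table]["fields"]:
        name = field["name"]
        value = record.get(name)
        if value is None:
            if field["mode"] == "REQUIRED":
                raise ValueError(f"missing required field: {name}")
            row[name] = None
        else:
            row[name] = CASTERS[field["type"]](value)
    return row


if __name__ == "__main__":
    raw = '{"order_id": "7", "amount": "19.5", "created_at": "2025-06-01T12:00:00"}'
    print(to_row("orders", raw))
```

In a Beam pipeline, a function like this would typically sit inside a per-element transform, with the metadata loaded at startup from a configuration source.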
Scalable Workflow Orchestration: Advanced Data Pipelines with Apache Airflow

This project is a hands-on lab for complex workflow orchestration, task scheduling, and DAG creation with Apache Airflow. It demonstrates pipeline reliability practices, including custom error handling, automated retries, and dependency management, in a production-style environment.

Apache Airflow, Python, Docker
+3 more
Mar 2026
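
The two reliability ideas named above, automated retries and dependency management, can be sketched in plain Python. This is an illustration of the concepts only, not Airflow's API and not the project's DAG code. The `run_with_retries` and `run_pipeline` helpers are hypothetical. In Airflow itself, the equivalent behavior is normally configured with task arguments such as `retries` and `retry_delay`, and with dependencies declared between tasks.

```python
import time
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_with_retries(task, retries=2, delay_seconds=0.0):
    """Call task(); on failure, retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the original error
            attempt += 1
            time.sleep(delay_seconds)


def run_pipeline(tasks, dependencies, retries=2):
    """Run tasks in dependency order and return the order they ran in.

    `dependencies` maps each task name to the set of names it depends on.
    """
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        run_with_retries(tasks[name], retries=retries)
        executed.append(name)
    return executed


if __name__ == "__main__":
    tasks = {
        "extract": lambda: print("extract"),
        "transform": lambda: print("transform"),
        "load": lambda: print("load"),
    }
    dependencies = {"transform": {"extract"}, "load": {"transform"}}
    run_pipeline(tasks, dependencies)
```

A scheduler adds much more on top of this (scheduling, state tracking, parallel execution), but the core contract is the same: a task runs only after its upstream tasks succeed, and transient failures are retried before the run is marked failed.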