Danushka D. Practical Data Engineering with Apache Projects...2026

Category: Other
Size: 23.19 MB
Added: 1 month ago (2026-02-03 10:25:01)
Health: Good, 13/1
Info Hash: 21AB469FCE05D2E05C75401BDE7CBCA3B441FBAE
Peers Updated: 3 hours ago (2026-03-24 06:49:44)

Description


Textbook in PDF format

This book is a comprehensive guide designed to equip you with the practical skills and knowledge needed to tackle real-world data challenges using open-source solutions. Built around 10 real-world data engineering projects, it caters specifically to data engineers in the early stages of their careers, providing a strong foundation in essential open-source tools such as Apache Spark, Flink, Airflow, Kafka, and many more.
Each chapter is dedicated to a single project, starting with a clear presentation of the problem it addresses. You will then be guided through a step-by-step process to solve the problem, leveraging widely-used open-source data tools. This hands-on approach ensures that you not only understand the theoretical aspects of data engineering but also gain valuable experience in applying these concepts to real-world scenarios.
At the end of each chapter, the book delves into common challenges that may arise during the implementation of the solution, offering practical advice on troubleshooting these issues effectively. Additionally, the book highlights best practices that data engineers should follow to ensure the robustness and efficiency of their solutions. A major focus of the book is using open-source projects and tools to solve problems encountered in data engineering.
In summary, this book is an indispensable resource for data engineers looking to build a strong foundation in the field. By offering practical, real-world projects and emphasizing problem-solving and best practices, it will prepare you to tackle the complex data challenges encountered throughout your career. Whether you are an aspiring data engineer or looking to enhance your existing skills, this book provides the knowledge and tools you need to succeed in the ever-evolving world of data engineering.
The book is organized into three main parts, each focusing on different aspects of data engineering:
Part 1: Data Lakehouses, Iceberg, Batch ETL, and Orchestration – This part focuses on building data storage solutions and implementing batch processing pipelines. You'll learn how to set up a data lakehouse with Apache Iceberg, create ETL pipelines with Spark, visualize data with Superset, and orchestrate workflows with Airflow.
Part 2: Streaming Data and Real-Time Analytics – This part explores real-time data processing using technologies like Kafka, Debezium, and Flink. You'll implement change data capture, streaming ETL, fraud detection, and a low-latency analytics dashboard with ClickHouse.
Part 3: Machine Learning and Feature Engineering – The final part delves into advanced applications, including feature engineering for machine learning and vector similarity search for sentiment analysis.
You Will Learn:
The foundational concepts of data engineering, with practical experience in solving real-world data engineering problems
How to use open-source data tools such as Apache Kafka, Flink, Spark, Airflow, and Trino
How to build 10 hands-on data engineering projects
How to troubleshoot common challenges in data engineering projects
Who This Book Is For:
Early-career and aspiring data engineers who are looking to build a strong foundation in the field; mid-career professionals looking to transition into data engineering roles; and technology enthusiasts interested in gaining insight into data engineering practices and tools.
