

As the world of data engineering evolves, professionals like you are tasked with navigating a growing landscape of tools for orchestrating and scheduling data workflows. We've noticed a surge of questions from the community, such as whether to migrate from Airflow to another tool, or which of these tools is the best fit for specific use cases. With this article, our goal is to offer you a neutral perspective on three of these tools (Airflow, Mage, and Kestra) so that you can make informed decisions for your projects. To address these concerns, the Paris Airflow Community Meetup, organized by Christophe Blefari (you should subscribe to his newsletter), invited representatives from Mage and Kestra to provide insights and demos, allowing you to understand their unique offerings and how they compare to Airflow. By the end of this article, you'll have a clearer understanding of the similarities and differences between these tools, as well as their potential impact on the data engineering field.

But before we delve into the details, let's take a moment to reflect on the current state of data engineering. According to a recent survey conducted by Ben from SeattleDataGuy, Airflow was the primary orchestration tool used by more than 40% of respondents. However, we see an increasing number of alternative solutions emerging, such as Mage and Kestra, which are gaining traction in the industry.

Today, we'll present an in-depth analysis of Airflow, Mage, and Kestra. We'll begin by giving an overview of each tool, followed by demos showcasing their unique capabilities. We'll also address some common questions and concerns raised by the community, and discuss potential future developments in the data orchestration space. By the end, you'll be better equipped to decide which tool is best suited for your data engineering needs. So, without further ado, let's dive into the world of data orchestration and explore what Airflow, Mage, and Kestra have to offer.

Challenges of Airflow

A bit of history about Airflow: it was initially developed at Airbnb in late 2014, open-sourced in the summer of 2015, and to this day there is a dedicated team at Airbnb working solely on Airflow. That team consists of nine engineers, and there are an astounding 14,000 DAGs in use. The following sections summarize the concrete challenges Tommy and Luke encountered in their past work with Airflow, and why both of them decided to build an alternative to it.

Why did Tommy start Mage to overcome Airflow's limitations?

Tommy Dang, a former software engineer at Airbnb, joined the company in early 2015. He worked on various projects, including app development in Ruby and JavaScript, before transitioning to a data engineering role. In 2018, Airbnb eliminated the data engineering role, and data engineers became software engineers. However, the company reintroduced the role in 2019, recognizing its importance.

During his time at Airbnb, Tommy worked on numerous development tools, particularly those related to data manipulation and transformation. He also gained experience in machine learning infrastructure. After leaving Airbnb in 2020, he founded Mage in 2021.

While working with Airflow, Tommy and his colleagues encountered several challenges.

First, running Airflow locally was quite tricky. Although they could load the web server locally, running the DAGs themselves was difficult. To address this issue, they eventually set up a shared environment to run the DAGs: initially a virtual machine managed with Vagrant, later migrated to a cloud-based solution.

Another challenge was reusing code. In the beginning, many users had large DAGs with seemingly shareable steps, but there was no standard convention for sharing them. While this can be mitigated with better engineering practices, such as extracting common logic into modules and adhering to the DRY (Don't Repeat Yourself) principle, not everyone followed these practices.

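To make the DRY point concrete, here is a minimal sketch of the kind of convention that helps: shareable logic lives in a plain Python module that every DAG imports, instead of being copy-pasted into each DAG file. The module path, function, and DAG shown here are invented for illustration (this is not code from Airbnb or Mage) and assume a recent Airflow 2.x release with the TaskFlow API.

```python
# common/transforms.py -- a hypothetical shared module placed on the scheduler's
# PYTHONPATH (or shipped as an installable package) so any DAG can import it.
def normalize_country_code(raw: str) -> str:
    """One canonical implementation of a step many DAGs would otherwise copy."""
    return raw.strip().upper()[:2]


# dags/enrich_users.py -- one of many DAGs reusing the helper; in a real layout
# you would `from common.transforms import normalize_country_code`.
import pendulum
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def enrich_users():
    @task
    def clean_codes() -> list[str]:
        raw_codes = [" us", "Fr ", "de"]  # stand-in for real upstream data
        return [normalize_country_code(c) for c in raw_codes]

    clean_codes()

enrich_users()
```

Nothing in Airflow enforces a layout like this, which is why, as noted above, its adoption depended mostly on team discipline.
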
Passing data between tasks also proved challenging. While XComs can be used for this purpose, they were not widely adopted at the time. As a result, engineers had to be creative, sometimes resorting to Redis or writing data to disk.

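For readers who have not used XComs, here is a minimal sketch of what passing a small value between tasks looks like with the TaskFlow API, again assuming a recent Airflow 2.x release; the DAG id, task names, and values are invented for illustration. Because XComs are stored in Airflow's metadata database, they only suit small payloads, which is part of why teams reached for Redis or the filesystem for anything larger.

```python
import pendulum
from airflow.decorators import dag, task

@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def xcom_handoff_example():
    @task
    def extract() -> int:
        # The return value is pushed to XCom (Airflow's metadata database).
        return 42

    @task
    def report(row_count: int) -> None:
        # TaskFlow pulls the upstream XCom and passes it in as an argument.
        print(f"Upstream produced {row_count} rows")

    report(extract())

xcom_handoff_example()
```
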
Testability of DAGs was another area of concern. With many dependencies and the challenges of running DAGs locally, testing was often done in the development environment by running the DAG end-to-end through the UI, or by using the command line in the shared environment.

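The situation is not hopeless even without running full DAGs locally. As one hedged illustration (a common pattern today, not a description of how the Airbnb team worked), a small pytest can load every DAG file through `DagBag` and fail the build on import errors, catching broken DAGs before they reach the shared environment; the `dags/` path is an assumption about the project layout.

```python
# tests/test_dag_integrity.py -- minimal DAG integrity check, assuming pytest
# and an Airflow installation are available in the test environment.
from airflow.models import DagBag

def test_dags_import_without_errors():
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    # Any DAG file that raises during import shows up in import_errors.
    assert dag_bag.import_errors == {}, f"Broken DAG files: {dag_bag.import_errors}"
    assert len(dag_bag.dags) > 0, "No DAGs were found under dags/"
```

More recent Airflow releases also provide `dag.test()`, which executes an entire DAG in a single local process and narrows the gap that used to push all testing into the shared environment.
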
Finally, managing a large number of DAGs also proved to be challenging. With over 10,000 DAGs in use, weekends and Mondays could be particularly stressful, as data engineers would find their product dashboards lagging behind, with some tasks still in the queue or failing without notifying anyone.