An Honest Review of the Astronomer Certification
A big part of my work as a data engineer consists of designing reliable, efficient and reproducible ETL jobs.
Over the last two years, Apache Airflow has been the main orchestrator I have used for authoring, scheduling and monitoring data pipelines.
For this reason, I recently decided to challenge myself by taking the Astronomer Certification for DAG Authoring, which is meant to assess knowledge of designing and creating data pipelines following best practices.
I cleared the exam more than one month after starting the preparation course provided by Astronomer, because I mainly studied during weekends and I really wanted to absorb what the course had to offer.
In this article, I would like to share my honest review of the course and the strategy I used to clear the exam, and answer some questions, including:
- Who is this certification for?
- What does the exam consist of?
- Is it worth your time?
Also, toward the end, I present 5 questions from the exam that I got wrong and share with you why I made those mistakes and how you can avoid them.
Astronomer is the leading provider of cloud-based data orchestration platforms powered by Apache Airflow.
Their services include deploying and managing one or multiple Airflow instances in the cloud, allowing clients to focus on building, running and monitoring data pipelines, instead of worrying about managing their environments.
The company currently offers two professional certifications:
- Apache Airflow Fundamental Certification | Level: Basic
- Apache Airflow DAG Authoring | Level: Intermediate
In particular, studying for the Apache Airflow DAG Authoring Certification prepares you to design and create reliable data pipelines in Python, following best practices.
The certification is addressed to all data professionals (including data engineers, BI engineers and data scientists) who consistently use Apache Airflow in their job and wish to prove their knowledge.
Because the exam is meant to assess more advanced topics, Astronomer recommends at least 6 months of practical experience with Airflow.
They also mention that “If you have a solid experience with creating DAGs, then you may be ready to apply your skills directly to the certification exam.”
However, I strongly recommend taking advantage of the preparation course offered by Astronomer. Even though I have been using Airflow for more than 2 years, I had never applied a good part of the concepts taught in the preparation course, and those ended up appearing quite often in the exam.
The exam consists of 75 multiple-choice questions and you are given 60 minutes to complete it. The passing score is 70%, pretty generous as you only need 53 correct answers to pass.
However, please do not underestimate the exam: in order to nail it, you will have to show that you master the different features that Airflow offers to create DAGs, the pros and cons of each one as well as their limitations.
You should be confident making design choices for data pipelines according to specific use cases. You should have solid knowledge of the most common operators and some familiarity with less common ones, particularly in the context of defining DAG dependencies, setting up different branches, waiting for events through sensors, and so on.
What Strategy Did I Use?
I purchased the exam and preparation course in a bundle for $150. Usually this gives you access to two exam attempts, meaning that if you fail once, you can retry for FREE.
Then, I watched all the videos in the preparation course once, without taking notes or spending too much time on them, and attempted the exam straight after to get a sense of the type of questions and their difficulty. Funnily enough, I scored 50/75: I failed, but I was only 3 correct answers short of the passing threshold.
However, at this point I knew exactly the type of questions to expect and the topics I struggled with the most, so I watched all the videos a second (and in some cases a third) time. This time around, I took lots of notes and tried to replicate part of the code in my local Airflow environment.
Finally, one morning I decided it was time to re-attempt the exam: I managed to score 62/75, which is 12 correct answers more than the first try, but still a bit below my expectations (given all the additional time investment!).
I have been an “A” student in the past, but that does not necessarily pay the bills and is time consuming, so I am very much satisfied with my almost 83% correct-answer rate, as I can now switch my focus to something else.
Once I passed the exam, I received an official certificate, shared as a digital credential via Credly. The badge looks like this:
When it comes to assessing whether the time I invested in the certification was worth it, my honest answer is somewhere between “YES” and “NO”.
I reckon the course was well structured and nicely delivered by the tutor. It was not the first course I had taken with Marc Lamberti, and I really like his positive attitude (and his accent), so watching the videos was kind of entertaining.
Through the course I got exposed to a number of topics and features I had never used before in Airflow; this made me grow as a professional and will allow me to share the knowledge back at the workplace.
Even though it is impossible to determine how good someone is with Airflow through a certification alone, I would say that investing time and resources to study for the exam shows employers that I am committed to mastering Airflow and passionate about it. If anything, I am one step closer to becoming an expert in the field.
Also, Astronomer should be considered a leader in the market when it comes to providing cloud-based Airflow services, so this was the best (if not the only) choice for getting Airflow certified.
However, the fact that they have no established competitors is also a drawback: offering Airflow certifications is really a secondary business for Astronomer, used to advertise their main services and generate leads indirectly.
For example, I found it odd that the exam is not proctored: if you and a colleague take the exam one after the other and she passes before you attempt it, she will have had access to the complete list of questions and answers, meaning you could learn the exact solutions beforehand. Without proctoring, people can sit the exam without respecting the value of integrity, and I don't like the idea.
On top of that, I have reason to believe the questions do not rotate much (if at all): I attempted the exam twice and, on both occasions, the questions were mostly the same. A larger pool of randomly rotating questions would make the certification even more respected.
As a last point, despite being extremely helpful, having two exam attempts induces students not to prepare thoroughly enough (at least for the first try, and I am guilty of this too), because in case of failure the financial consequences and peer pressure are minimal. I would suggest that Astronomer introduce some sort of challenge after the first attempt, such as a higher passing score.
Whether you are pondering taking an Airflow certification or have already been studying for a while, knowing the type of questions you will face on the day can help you identify topics that require revision.
In this section, I present 5 mock questions that are very similar to ones I found in the DAG Authoring exam and answered incorrectly. I will share the correct answer for each and explain why I got confused at the time.
Your DAG has:
- A start date set to the 1st of January 2022
- A schedule interval set to @daily
- An end date set to the 5th of January 2022

How many DAG Runs will you end up with?

Options:
- 3
- 5 → CORRECT ANSWER
The correct answer is 5 DAG Runs in total, because the DAG will be triggered for the first time on the 2nd of January at midnight, and so on until the 6th of January, according to the formula:
triggered_date = start_date + schedule_interval
So remember that start_date, execution_date and triggered_date are three different concepts in Airflow, and to compute the number of DAG Runs you simply need to count the triggered dates. In the exam I got confused because for some reason I believed the interval was exclusive, meaning that the 5th of January would not be included. Of course this is not the case… Silly me!
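To make the counting concrete, below is a minimal sketch of the DAG from the question (the DAG id and task are placeholders of mine, not from the exam):

# With these settings, Airflow creates runs for the execution dates
# 1-5 January; each run is triggered at the END of its interval, so the
# first trigger fires on the 2nd of January and the last on the 6th,
# for 5 DAG Runs in total.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator  # use DummyOperator before Airflow 2.3

with DAG(
    dag_id="run_count_example",       # placeholder id
    start_date=datetime(2022, 1, 1),
    end_date=datetime(2022, 1, 5),
    schedule_interval="@daily",
    catchup=True,                     # backfill every interval since start_date
) as dag:
    EmptyOperator(task_id="do_nothing")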
What are some different ways of creating DAG dependencies? (Select all that apply)

Options:
- ExternalTaskSensor → CORRECT ANSWER
- TriggerDagRunOperator → CORRECT ANSWER
- SubDAGs (even if you know that it is BAD) → CORRECT ANSWER
This question has multiple correct options (3 to be precise), because ExternalTaskSensor, TriggerDagRunOperator and SubDAGs are all ways to create DAG dependencies, even though SubDAGs are not a best practice.
Funnily enough, I got this right at the first attempt but wrong at the second, because I assumed they were trying to trick me by adding the SubDAGs option, when it was actually also correct.
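If you need a refresher, here is a hedged sketch of the two recommended mechanisms (the DAG ids, task ids and dates are my own assumptions); SubDAGs are deliberately left out since they are discouraged:

from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.sensors.external_task import ExternalTaskSensor

# Push style: a task in one DAG explicitly triggers another DAG.
with DAG("parent_dag", start_date=datetime(2022, 1, 1),
         schedule_interval="@daily", catchup=False):
    TriggerDagRunOperator(
        task_id="trigger_child",
        trigger_dag_id="child_dag",  # id of the DAG to kick off
    )

# Pull style: a sensor waits for a task in another DAG to succeed
# (by default, for the run with the same execution date).
with DAG("consumer_dag", start_date=datetime(2022, 1, 1),
         schedule_interval="@daily", catchup=False):
    ExternalTaskSensor(
        task_id="wait_for_producer",
        external_dag_id="producer_dag",
        external_task_id="final_task",
    )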
Can you run this task twice for the same execution date (to backfill, for example)?

Options:
- YES
- NO → CORRECT ANSWER
This question is a bit tricky if you don’t read the code carefully: the correct answer is NO, because as it is, the SQL code can only be run once. To make the task idempotent, so that it can be run multiple times, the code should be changed to:
CREATE TABLE IF NOT EXISTS planes(…)
I got this one wrong because I forgot to pay attention to the SQL code in the PostgresOperator, and since I knew from experience that it is possible to backfill through the UI or CLI, I naively answered YES.
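For reference, here is a hedged sketch of the fixed task (the DAG id, connection id and table columns are assumptions of mine, not the exam's):

from datetime import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG("planes_dag", start_date=datetime(2022, 1, 1),
         schedule_interval="@daily", catchup=False):
    # IF NOT EXISTS is what makes the statement idempotent, so the task
    # can safely be re-run for the same execution date during a backfill.
    PostgresOperator(
        task_id="create_planes_table",
        postgres_conn_id="postgres_default",  # assumed connection id
        sql="CREATE TABLE IF NOT EXISTS planes (id SERIAL PRIMARY KEY, model TEXT);",
    )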
You want to process your data incrementally, therefore you need to get the current execution date of your DAG Run. What is the best way to get it from the PythonOperator?

Options:
- A → CORRECT ANSWER
- B
Apparently the correct answer is A; however, I don't recall seeing anything like that in the preparation course. I actually selected B, because the **context variable can also be used to access the execution date.
I would suggest that Astronomer revise this question or create a dedicated video that goes more in depth on the topic. As it is now, it seems a bit confusing to me.
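Since the exam options are snippets I cannot reproduce here, below is a hedged sketch of one well-documented way to read the execution date inside a PythonOperator callable (the function and task names are hypothetical):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def process_incrementally(ds, **context):
    # Airflow injects the template context into the callable: "ds" is the
    # execution date as YYYY-MM-DD, and context["execution_date"] holds
    # the full timestamp if you need it.
    print(f"Processing data for {ds}")

with DAG("incremental_dag", start_date=datetime(2022, 1, 1),
         schedule_interval="@daily", catchup=False):
    PythonOperator(task_id="incremental_load",
                   python_callable=process_incrementally)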
With the PythonOperator, what is the most efficient way to push multiple XComs at once?

Options:
- A
- B → CORRECT ANSWER
The correct answer is B, because the most efficient way to push multiple XComs is indeed to specify the type of the value returned by the PythonOperator (in this case a dictionary) when defining the function. This method is clearly covered in the video about XComs with the TaskFlow API. In the exam I wrongly selected A, forgetting that the return type also has to be stated explicitly in the function definition.
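Here is a minimal sketch of that pattern with the TaskFlow API (the values are invented): annotating the return value as a dictionary makes Airflow set multiple_outputs=True automatically, so each key is pushed as its own XCom.

from typing import Dict

from airflow.decorators import task

@task  # equivalent to @task(multiple_outputs=True), thanks to the annotation
def extract() -> Dict[str, str]:
    # Each key ends up as a separate XCom instead of one serialized dict.
    return {"partner_name": "netflix", "partner_path": "/partners/netflix"}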
In this article I shared an honest review of the Apache Airflow DAG Authoring Certification provided by Astronomer.
While studying to pass the exam, I could not find much extra material or feedback out there, so this is my way of giving something back to the community and helping those who are thinking about getting certified.
I hope the suggestions and mock questions I shared will help you nail the exam very soon. Please feel free to contact me if you need additional help.