Jinja/DBT SQL Templating in JupyterLab/VSCode | by Jean-Claude Cote | Jul, 2022

Quickly prototype your SQL templates


SQL does not lend itself well to reuse. To achieve reuse, SQL is often templated using libraries like Jinja. For example, Apache Superset leverages Jinja templating in its dataset definitions, and of course DBT is a tool built entirely around Jinja templates. Jupyterlab-sql-editor natively supports Jinja templating, making it possible to prototype Jinja-SQL templates for Superset and DBT right inside JupyterLab and VSCode.

Create reusable SQL using Jinja templates

The --jinja option enables Jinja templating support, making any variable or function declared in your notebook available for substitution inside Jinja tags {{ }}.

For example, you can use a variable to hold the actual table name used in your SQL statements. This is useful when you have separate database schemas for development and production tables.
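As a minimal sketch (the schema and table names here are hypothetical), declare the variable in a regular Python cell and then substitute it inside a %%sparksql cell:

# Python cell: choose the schema for the current environment
schema = 'dev_db'

%%sparksql --jinja
SELECT *
FROM {{ schema }}.orders
LIMIT 10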

In our previous article we saw that the %%sparksql magic supports many output formats: text, interactive grid, interactive JSON tree, HTML, and skip. The %%sparksql magic has one more trick up its sleeve! Forgive the pun.

The --output sql option renders the resulting SQL statement with syntax highlighting. It is a convenient way to validate your Jinja-templated code. In this example we use a Python list variable to automate the generation of a SELECT statement.
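Such a cell might look like the following sketch (the column and table names are hypothetical):

# Python cell: the columns to project
columns = ['id', 'name', 'country']

%%sparksql --jinja --output sql
SELECT
{% for item in columns -%}
    {{ item }},
{% endfor -%}
    created_at
FROM customers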

As we can see, the Jinja {% for item in list %} loop has produced the desired columns.

Tapping into the power of DBT

Using Jinja templating to automate the generation of SQL statements is very powerful, and in fact an entire project is built around this very idea. DBT is a framework for developing modular SQL models with SELECT statements; it includes a ref() Jinja macro which handles dependency management (which models to build first).

DBT includes a plethora of Jinja macros aimed at simplifying the elaboration of SQL statements. DBT also has a package management system allowing third-party developers to contribute libraries of macros. There are lots of them on the DBT Hub.

The %%sparksql magic enables you to tap into this wealth of macros by simply replacing the --jinja option with the --dbt option and configuring your notebook with the location of your DBT project. We will illustrate the use of DBT using the example project jaffle_shop.

In the jaffle_shop project there is a model called orders.sql. We can place the contents of this file inside a Jupyter cell and use the --dbt option to process the template using DBT. We can use --output sql to render the statement and --output html to execute the query and display the results.
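For reference, here is an abridged sketch of such a cell based on the jaffle_shop orders.sql model (the actual model differs in detail):

%%sparksql --dbt --output sql
{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}

with orders as (
    select * from {{ ref('stg_orders') }}
),

payments as (
    select * from {{ ref('stg_payments') }}
),

order_payments as (
    select
        order_id,
        {% for payment_method in payment_methods -%}
        sum(case when payment_method = '{{ payment_method }}'
            then amount else 0 end) as {{ payment_method }}_amount,
        {% endfor -%}
        sum(amount) as total_amount
    from payments
    group by order_id
)

select
    orders.*,
    order_payments.total_amount as amount
from orders
left join order_payments on orders.order_id = order_payments.order_id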

Notice the use of the DBT ref() macro. This macro refers to existing DBT models within your DBT project. %%sparksql actually uses the DBT framework to render the SQL template, so all the DBT models and macros are made available to the %%sparksql magic.

We can render the SQL produced by DBT. The output is quite long, so we only show part of it. Notice the payment_method loop producing the expected columns. Also notice select * from _dbt__cte__stg_orders, which is the way DBT handles the materialization of ephemeral models; more details are available in the DBT documentation on materializations.
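Conceptually, DBT inlines each ephemeral model as a common table expression in the compiled SQL, along these lines (a simplified sketch, with hypothetical source columns):

with _dbt__cte__stg_orders as (
    select id as order_id, user_id as customer_id, status
    from raw_orders
)
select * from _dbt__cte__stg_orders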

As seen before, we can execute the query and display the results. Notice the DBT log output before the actual results are displayed.

Using %%sparksql is a convenient way to prototype your DBT code. We have illustrated the use of DBT using a pre-existing model. To create a DBT model from scratch, you might want to start by simply displaying the data of existing tables (DBT sources) or pre-existing DBT models. Suppose you want to create a new model based on stg_payments. You can start by showing the data in stg_payments:

%%sparksql --dbt --output html
select * from {{ ref('stg_payments') }}

Then start transforming this dataset using DBT's Jinja macros, all the while viewing the resulting output or rendering the SQL:

%%sparksql --dbt
{% set payment_methods =
    ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
select
    order_id,
    {% for payment_method in payment_methods -%}
    sum(case when payment_method = '{{ payment_method }}'
        then amount else 0 end) as {{ payment_method }}_amount,
    {% endfor -%}
    sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by order_id

Once you have a great templated query, you can move it to production by simply copying the template into a DBT .sql model file, confident that it will actually do what you expect it to do.

Visual Studio Code DBT Power User

DBT Power User is a popular VSCode extension for DBT projects. It supports many DBT autocompletion features for your .sql model files.

Visual Studio Code supports working with Jupyter notebooks natively, and thus the %%sparksql magic works inside VSCode. As shown before, all you need to do to leverage a DBT project in a notebook is to give the location of your DBT project to %%sparksql.

Once a notebook cell's language is set to SQL, the autocompletion of DBT Power User kicks in and you get the same benefits as when editing a .sql file. Note that if you install the sql-language-server in your VSCode, it will automatically change the language to SQL when it detects a cell with a %%sparksql magic.

Remember that VSCode autocompletion is triggered with Ctrl+Space, rather than Tab as in JupyterLab.

For example, if we select the is_incremental suggestion, the corresponding code snippet is inserted into our notebook cell, the same behaviour as DBT Power User in a .sql model file.
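As an illustration (not necessarily the literal snippet DBT Power User inserts), an is_incremental() block typically looks like this, where the event_time column is hypothetical:

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than those already in the model
  where event_time > (select max(event_time) from {{ this }})
{% endif %}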

The bottom line is that the %%sparksql magic works the same way in VSCode and JupyterLab notebooks. You can render DBT models into SQL, execute queries, and view the results. VSCode notebooks are thus a great way to prototype DBT models; once you are satisfied with your model, you can copy it into a .sql model file.

In this article we showed how to leverage %%sparksql to easily prototype templated Spark SQL in JupyterLab and VSCode. We focused on Spark; however, jupyterlab-sql-editor also includes a %%trino magic! More SQL engines might be added in the future. Contributions are welcome! Here's our git repo: CybercentreCanada/jupyterlab-sql-editor.


