Machine Learning News Hubb
Advertisement Banner
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us
Machine Learning News Hubb
No Result
View All Result
Home Artificial Intelligence

DISTINCT is not a SQL Function. How the use of parentheses when using… | by Giorgos Myrianthous | Sep, 2022

admin by admin
September 9, 2022
in Artificial Intelligence


How the use of parentheses when using DISTINCT keyword in SQL could cause confusion

Photo by Scott Webb on Unsplash

One of the most common misinterpretations that many SQL users do — even the more experienced ones — is related to the way DISTINCT is applied to the specified columns. It is quite common to see SQL queries attempting to apply a SELECT DISTINCT clause over a subset of columns that need to be returned by the query.

And the misconception lies in the view that DISTINCT is a function that essentially accepts s column name over which it is going to be applied, whilst the columns specified outside of the “function call” won’t be taken into account when computing the results.

But in reality, when trying to use parentheses in order to try and make DISTINCT effective only over the columns enclosed in parentheses won’t work in the way you’d expect it to work.

Now let’s create an example table that we will be referencing throughout this article to demonstrate a few concepts and help us clarify precisely how DISTINCT keyword works in SQL.

Create example rental table — Source: Author

And now let’s add a few records in the newly created address table.

Add sample rows into our example rental table — Source: Author

Now let’s query the results to see the final example table:

Content of example rental table — Source: Author

The misconception about DISTINCT in SQL

Now let’s assume we want to get the distinct rows from our rental table, using two different fields, namely customer_id and store_id. In other words, we would like to answer the following question:

What are the unique combinations of customer_id and store_id in our rental table?

To answer the above query we can simply query our table and fetch the DISTINCT values for customer_id and store_id columns:

Select the unique combinations of customer_id and store_id fields in rental table — Source: Author

Now if we wanted to retrieve only a unique set of customers such that in the query result above we could only see one row per customer, then we would need to refine our query to do so.

And here’s exactly where the misconception about DISTINCT lies in. A lot of users, have the (wrong!) impression that DISTINCT is a function in which we can specify the columns we want to consider when applying it over our target table.

If you attempt to enclose customer_id in parentheses when “calling” (not the right verb here since this is not a function) DISTINCT you’ll notice that it has no effect at all:

SELECT DISTINCT with a column in parentheses — Source: Author

We can still see “duplicate” customer IDs in our query result. This is because SELECT DISTINCT clause, will always take into account all the column names specified, regardless of whether they were enclosed in parentheses.

In fact all of the expressions shown below are indeed equivalent:

  • SELECT DISTINCT customer_id, store_id FROM rental;
  • SELECT DISTINCT (customer_id), store_id FROM rental;
  • SELECT DISTINCT (customer_id), (store_id) FROM rental;
  • SELECT DISTINCT (store_id), customer_id FROM rental;
  • SELECT DISTINCT ((customer_id)), store_id FROM rental;

As a final note, I would highly recommend to avoid the use of parentheses when using SELECT clause with the DISTINCT qualifier as it might make other people (who may not be aware of what we’ve discussed today) mis-interpret the query and accidentally think that your intention is to apply DISTINCT over a single column, despite the fact that this is not possible as we demonstrated already.

PostgreSQL and DISTINCT ON

If you are working with Postgres and would like to apply DISTINCT only to a subset of the columns you want to retrieve in your results, then you can take advantage of DISTINCT ON.

SELECT DISTINCT eliminates duplicate rows from the result.

SELECT DISTINCT ON eliminates rows that match on all the specified expressions.

— Postgres Documentation

This is an extension built on top of DISTINCT of standard SQL and it returns the first row for each set of rows that match the specified expression.

Example with DISTINCT ON clause in SQL — Source: Author

Note however that when using DISTINCT ON it makes sense to also use the ORDER BY clause. In this way, you will be able to specify the conditions for picking the desired result from conflicting rows. For instance, if two rows match your expression (in the above example, we have two records qualifying for customer with id 100).

Now let’s assume that we want to fetch unique customer IDs with the corresponding store ID, but this time if multiple competing rows exist we would like to get the one with the smallest amount:

SELECT DISTINCT ON with ORDER BY — Source: Author

Note that the store_id that corresponds to customer_id=100 has changed, since the rental row with the smallest amount is different now that we have ordered our results by amount in ascending order.

But in general, if you don’t really care about the order then you can omit it.

Final Thoughts

It is important to understand how the DISTINCT keyword works with SELECT statements in SQL since this is a source of confusion for many users — and I’d say even for experienced ones.

When writing queries with SELECT DISTINCT clauses, many users tend to use DISTINCT in the same way as they would with proper SQL functions. In other words, they enclose one column in the parentheses while at the same time they provide more column names after the clause — for example SELECT DISTINCT(user_id), first_name FROM ....

When reading such queries (and obviously the people writing them) you may end up believing that SELECT DISTINCT is only applicable over the column specified (e.g. user_id) and not over the remaining columns (e.g. first_name). And as we’ve seen in today’s article, this is a misconception and quite dangerous assumption to make when writing queries.

Finally, we discussed a special case in PostgreSQL database that allows users to specify explicitly which column to be taken into account when applying DISTINCT, using the special DISTINCT ON clause.

Become a member and read every story on Medium. Your membership fee directly supports me and other writers you read. You’ll also get full access to every story on Medium.

Related articles you may also like



Source link

Previous Post

Graphs as Scikit-Learn metaestimators | by Big-O | Sep, 2022

Next Post

Introdução ao uso de dados desbalanceados e machine learning para detecção de fraudes | by Camila Waltrick | Camila Waltrick | Sep, 2022

Next Post

Introdução ao uso de dados desbalanceados e machine learning para detecção de fraudes | by Camila Waltrick | Camila Waltrick | Sep, 2022

Bellowing During Coffee Grinding. Cause and effect | by Robert McKeon Aloe | Sep, 2022

Mux + CLIPr: Simulcasting for deeper engagement | by Mack Reed | The Moment | Sep, 2022

Related Post

Artificial Intelligence

Exploring TensorFlow Model Prediction Issues | by Adam Brownell | Feb, 2023

by admin
February 2, 2023
Machine Learning

Different Loss Functions used in Regression | by Iqra Bismi | Feb, 2023

by admin
February 2, 2023
Machine Learning

How to organize bills? – 3 ways to track bills

by admin
February 2, 2023
Artificial Intelligence

How to decide between Amazon Rekognition image and video API for video moderation

by admin
February 2, 2023
Artificial Intelligence

The Future of AI: GPT-3 vs GPT-4: A Comparative Analysis | by Mohd Saqib | Jan, 2023

by admin
February 2, 2023
Deep Learning

6 Ways To Streamline Tech Hiring With A Recruitment Automation Platform

by admin
February 2, 2023

© 2023 Machine Learning News Hubb All rights reserved.

Use of these names, logos, and brands does not imply endorsement unless specified. By using this site, you agree to the Privacy Policy and Terms & Conditions.

Navigate Site

  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us

Newsletter Sign Up.

No Result
View All Result
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us

© 2023 JNews - Premium WordPress news & magazine theme by Jegtheme.