Machine Learning News Hubb
Advertisement Banner
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us
Machine Learning News Hubb
No Result
View All Result
Home Artificial Intelligence

How to Compare Two Tables For Equality in BigQuery | by Giorgos Myrianthous | Jan, 2023

admin by admin
January 26, 2023
in Artificial Intelligence


Compare tables and extract their differences with standard SQL

Photo by Zakaria Ahada on Unsplash

Comparing tables in BigQuery is a crucial task when testing the results of data pipelines and queries prior to productionizing them. The ability to compare tables allows for the detection of any changes or discrepancies in the data, ensuring that the data remains accurate and consistent.

In this article we will demonstrate how to compare two (or more) tables on BigQuery and extract the records that differ (if any). More specifically, we will showcase how to compare tables with identical columns as well as tables with a different amount of columns.

First, let’s start by creating two tables with some dummy values that we will then be referencing throughout this tutorial in order to demonstrate a few different concepts.

-- Create the first table
CREATE TABLE `temp.tableA` (
`first_name` STRING,
`last_name` STRING,
`is_active` BOOL,
`no_of_purchases` INT
)
INSERT `temp.tableA` (first_name, last_name, is_active, no_of_purchases)
VALUES
('Bob', 'Anderson', True, 12),
('Maria', 'Brown', False, 0),
('Andrew', 'White', True, 4)

-- Create the second table
CREATE TABLE `temp.tableB` (
`first_name` STRING,
`last_name` STRING,
`is_active` BOOL,
`no_of_purchases` INT
)
INSERT `temp.tableB` (first_name, last_name, is_active, no_of_purchases)
VALUES
('Bob', 'Anderson', True, 12),
('Maria', 'Brown', False, 0),
('Andrew', 'White', True, 6),
('John', 'Down', False, 0)

Comparing records of tables with the same columns

Now that we have created our two example tables, you should have noticed that there are a couple of differences between them.

SELECT * FROM `temp.tableA`;

+------------+-----------+-----------+-----------------+
| first_name | last_name | is_active | no_of_purchases |
+------------+-----------+-----------+-----------------+
| Bob | Anderson | true | 12 |
| Andrew | White | true | 4 |
| Maria | Brown | false | 0 |
+------------+-----------+-----------+-----------------+

SELECT * FROM `temp.tableB`;

+------------+-----------+-----------+-----------------+
| first_name | last_name | is_active | no_of_purchases |
+------------+-----------+-----------+-----------------+
| Bob | Anderson | true | 12 |
| Andrew | White | true | 6 |
| Maria | Brown | false | 0 |
| John | Down | false | 0 |
+------------+-----------+-----------+-----------------+

Now assuming that table temp.tableB is the latest version of some dataset whereas temp.tableA is an older one and we would like to see the actual differences (in terms of records) between the two tables, all we need is the following query:

WITH
table_a AS (SELECT * FROM `temp.tableA`),
table_b AS (SELECT * FROM `temp.tableB`),
rows_mismatched AS (
SELECT
'tableA' AS table_name,
*
FROM (
SELECT
*
FROM
table_a EXCEPT DISTINCT
SELECT
*
FROM
table_b
)

UNION ALL

SELECT
'tableB' AS table_name,
*
FROM (
SELECT
*
FROM
table_b EXCEPT DISTINCT
SELECT
*
FROM
table_a
)
)

SELECT * FROM rows_mismatched

Now the result will contain all the differences observed between the tables along with a reference to the table name where the records were found.

In our specific examples, tables A and B were having a difference in two records; The first one seems to be the record for Andrew White since this person has a different value for no_of_purchases field. Additionally, table tableB has one additional record that is not even present on table tableA.

+------------+------------+-----------+-----------+-----------------+
| table_name | first_name | last_name | is_active | no_of_purchases |
+------------+------------+-----------+-----------+-----------------+
| tableB | John | Down | false | 0 |
| tableB | Andrew | White | true | 6 |
| tableA | Andrew | White | true | 4 |
+------------+------------+-----------+-----------+-----------------+

Note: If you are not familiar with the WITH clause and Common Table Expressions (CTEs) in SQL, make sure to read the following article:

Comparing records of tables with different columns

Now let’s suppose you would like to compare the records between two tables having a different amount of columns. Obviously, we would have to do an apples-to-apples comparison meaning that we somehow need to extract only the common fields from both tables in order to be able to perform a meaningful comparison.

Let’s re-create our tables in order to generate some mis-matching columns so that we can then demonstrate how to deal with these cases:

-- Create the first table
CREATE TABLE `temp.tableA` (
`first_name` STRING,
`last_name` STRING,
`is_active` BOOL,
`dob` STRING
)
INSERT `temp.tableA` (first_name, last_name, is_active, dob)
VALUES
('Bob', 'Anderson', True, '12/02/1993'),
('Maria', 'Brown', False, '10/05/2000'),
('Andrew', 'White', True, '14/12/1997')

-- Create the second table
CREATE TABLE `temp.tableB` (
`first_name` STRING,
`last_name` STRING,
`is_active` BOOL,
`no_of_purchases` INT
)
INSERT `temp.tableB` (first_name, last_name, is_active, no_of_purchases)
VALUES
('Bob', 'Anderson', True, 12),
('Maria', 'Brown', True, 0),
('Andrew', 'White', True, 6),
('John', 'Down', False, 0)

Now our new tables have only three columns in common, namely first_name, last_name and is_active.

SELECT * FROM `temp.tableA`;

+------------+-----------+-----------+--------------+
| first_name | last_name | is_active | dob |
+------------+-----------+-----------+--------------+
| Bob | Anderson | true | '12/02/1993' |
| Andrew | White | true | '10/05/2000' |
| Maria | Brown | false | '14/12/1997' |
+------------+-----------+-----------+--------------+

SELECT * FROM `temp.tableB`;

+------------+-----------+-----------+-----------------+
| first_name | last_name | is_active | no_of_purchases |
+------------+-----------+-----------+-----------------+
| Bob | Anderson | true | 12 |
| Andrew | White | true | 6 |
| Maria | Brown | false | 0 |
| John | Down | false | 0 |
+------------+-----------+-----------+-----------------+

Now if we attempt to run the query we executed in the previous section where the two tables were having the same columns, we will end up with this error:

Column 4 in EXCEPT DISTINCT has incompatible types: STRING, INT64 at [13:7]

This is absolutely normal given that our tables no longer have matching columns. We need to slightly amend our initial query such that the very first CTEs will only select the mutual columns for every table. Our query will look as below:

WITH
table_a AS (
SELECT
first_name,
last_name,
is_active
FROM
`temp.tableA`
),
table_b AS (
SELECT
first_name,
last_name,
is_active
FROM
`temp.tableB`
),
rows_mismatched AS (
SELECT
'tableA' AS table_name,
*
FROM (
SELECT
*
FROM
table_a EXCEPT DISTINCT
SELECT
*
FROM
table_b
)

UNION ALL

SELECT
'tableB' AS table_name,
*
FROM (
SELECT
*
FROM
table_b EXCEPT DISTINCT
SELECT
*
FROM
table_a
)
)

SELECT * FROM rows_mismatched

The tables created in this section were having the following mismatches (when considering only their mutual columns):

  • The record for Maria Brown has differences in column is_active
  • Table tableB has one additional record (John Down) which is not present in tableA

These differences can be observed in query results shared below:

+------------+------------+-----------+-----------+
| table_name | first_name | last_name | is_active |
+------------+------------+-----------+-----------+
| tableB | Maria | Brown | false |
| tableB | John | Down | false |
| tableA | Maria | Brown | true |
+------------+------------+-----------+-----------+

Final Thoughts

In this article, we provided a comprehensive guide on how to compare tables in BigQuery. We highlighted the importance of this task in ensuring the accuracy and consistency of data and demonstrated multiple techniques for comparing tables with identical columns as well as tables with different amounts of columns. We also walked through the process of extracting records that differ between tables (if any).

Overall, this article aimed to equip readers with the necessary tools and knowledge to effectively and efficiently compare tables in BigQuery. I hope you found it useful!

Become a member and read every story on Medium. Your membership fee directly supports me and other writers you read. You’ll also get full access to every story on Medium.

Related articles you may also like



Source link

Previous Post

A Glimpse into the Future: Where AI is Headed in 2050 | by Ghina Naji | Jan, 2023

Next Post

App Innovation Report: Prioritizing human pain points to build high-impact experiences

Next Post

App Innovation Report: Prioritizing human pain points to build high-impact experiences

Explaining Explainable AI for Conversations

DALL·E 2 Pre-Training Mitigations

Related Post

Artificial Intelligence

10 Most Common Yet Confusing Machine Learning Model Names | by Angela Shi | Mar, 2023

by admin
March 26, 2023
Machine Learning

How Machine Learning Will Shape The Future of the Hiring Industry | by unnanu | Mar, 2023

by admin
March 26, 2023
Machine Learning

The Pros & Cons of Accounts Payable Outsourcing

by admin
March 26, 2023
Artificial Intelligence

Best practices for viewing and querying Amazon SageMaker service quota usage

by admin
March 26, 2023
Edge AI

March 2023 Edge AI and Vision Innovation Forum Presentation Videos

by admin
March 26, 2023
Artificial Intelligence

Hierarchical text-conditional image generation with CLIP latents

by admin
March 26, 2023

© 2023 Machine Learning News Hubb All rights reserved.

Use of these names, logos, and brands does not imply endorsement unless specified. By using this site, you agree to the Privacy Policy and Terms & Conditions.

Navigate Site

  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us

Newsletter Sign Up.

No Result
View All Result
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us

© 2023 JNews - Premium WordPress news & magazine theme by Jegtheme.