Retail: Missing Barcodes

4pts
Walmart, Target, Amazon

Problem Statement

At Walmart, data integrity between the Point of Sale (POS) systems and the Master Product Catalog is vital for financial reporting. Occasionally, a product is sold before it is properly registered in the central system, leading to "orphan" sales.

Write a query to identify these orphan products. An orphan product is any product_id appearing in the sales_records table that does not exist in the master_catalog table.

Rules:

  • You must return the unique product_id of every missing product.
  • For each missing product, calculate the total number of times it was sold (total_orphan_sales) and the total revenue generated (total_lost_revenue).
  • Revenue is calculated as: quantity * unit_price.
  • Output columns: product_id, total_orphan_sales, total_lost_revenue.
  • Sort by total_lost_revenue descending, then product_id ascending.
Tests your understanding of
Joins, Set Theory, Data Validation and Subqueries

Input Tables

master_catalog
product_id(INTEGER)product_name(VARCHAR)category(VARCHAR)
1AppleProduce
2BreadBakery
3MilkDairy
4EggsDairy
5ButterDairy
sales_records
sale_id(INTEGER)product_id(INTEGER)quantity(INTEGER)unit_price(DECIMAL)
101120.5
10299115
103212
10488510
10599215
1067715
10788110

Expected Output

product_id(INTEGER)total_orphan_sales(INTEGER)total_lost_revenue(DECIMAL)
88260
99245
7715

Tags

MediumJoinsSet TheoryData ValidationSubqueries
20-30 min
42%

Hints