Top 5 Practical SQL Interview Questions

These five SQL problems are frequently asked in data engineering and analytics interviews because they test real work: ranking, deduplication, incremental logic, and business-style reporting. Each question includes a clear solution query. (Examples are written in a SQL Server-friendly style, but the logic works in most SQL engines.)


1) Find the Top 3 highest-paid employees in each department

Tests window functions and ranking within groups.

Solution (ROW_NUMBER)


SELECT dept_id, emp_id, emp_name, salary
FROM (
    SELECT
        e.*,
        ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
    FROM Employees e
) x
WHERE x.rn <= 3
ORDER BY dept_id, salary DESC;
      

2) Calculate a running total of spending per customer (by date)

Tests window aggregation and ordering.

Solution (SUM OVER)


SELECT
    customer_id,
    order_id,
    order_date,
    order_total,
    SUM(order_total) OVER (
        PARTITION BY customer_id
        ORDER BY order_date, order_id
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM Orders
ORDER BY customer_id, order_date, order_id;
      

3) De-duplicate records and keep only the latest row per key

Very common in ingestion pipelines. Tests CTE usage + window ranking for dedupe.

Solution (CTE + ROW_NUMBER)


WITH ranked AS (
    SELECT
        t.*,
        ROW_NUMBER() OVER (
            PARTITION BY business_key
            ORDER BY updated_at DESC, id DESC
        ) AS rn
    FROM StagingTable t
)
DELETE FROM ranked
WHERE rn > 1;
      

Replace business_key with your natural key (e.g., customer_id, emp_id) and updated_at with your latest timestamp column.

4) Return customers who have orders above the overall average order value

Tests subqueries and benchmarking logic.

Solution (Subquery)


SELECT DISTINCT o.customer_id
FROM Orders o
WHERE o.order_total > (
    SELECT AVG(order_total) FROM Orders
)
ORDER BY o.customer_id;
      

5) Perform an upsert (MERGE): update existing rows and insert new rows

This is a day-to-day data engineering requirement: incremental loads and idempotent pipelines.

Solution (MERGE)


MERGE INTO DimCustomer AS tgt
USING StgCustomer AS src
ON tgt.customer_id = src.customer_id
WHEN MATCHED AND (
       ISNULL(tgt.customer_name,'') <> ISNULL(src.customer_name,'')
    OR ISNULL(tgt.city,'')          <> ISNULL(src.city,'')
) THEN
    UPDATE SET
        tgt.customer_name = src.customer_name,
        tgt.city          = src.city,
        tgt.updated_at    = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (customer_id, customer_name, city, created_at, updated_at)
    VALUES (src.customer_id, src.customer_name, src.city, GETDATE(), GETDATE());
      

Summary

  • Ranking within groups (Top-N) using ROW_NUMBER
  • Running totals using SUM OVER
  • De-duplication using CTE + ROW_NUMBER
  • Benchmark filtering using subqueries
  • Incremental loading using MERGE (upsert)

Popular posts from this blog

Exploring the Largest UK Employers: A Power BI Visualization

Master Databricks Asset Bundles Through Hands-On Practice

PySpark Important Last-Minute Notes