Top 50 SQL Interview Questions

Top 50 SQL Interview Questions and Answers (2024 Edition)
Structured Query Language (SQL) is the lifeblood of data analysis, backend engineering, and database administration. Whether you are interviewing for a role as a Data Engineer, Data Scientist, Data Analyst, or Backend Software Engineer, you are practically guaranteed to face a SQL interview.
While many candidates focus heavily on Python or algorithms, they often neglect their SQL skills, assuming that basic SELECT, FROM, and WHERE clauses will carry them through. However, modern tech interviews at companies like Meta, Amazon, and Netflix dive deep into complex aggregations, self-joins, CTEs (Common Table Expressions), and Window Functions.
In this massive, comprehensive guide, we have compiled the Top 50 SQL Interview Questions, categorized by difficulty. We have included not just the questions, but the exact optimal queries, explanations of edge cases, and tips on how to talk through your logic with your interviewer.
Part 1: Basic SQL Concepts & Queries (Questions 1 - 15)
These questions test your fundamental understanding of SQL syntax, basic filtering, and simple table operations. You should be able to answer these rapidly and flawlessly.
1. What is the difference between WHERE and HAVING?
This is arguably the most common introductory SQL question.
WHEREis used to filter records before any groupings are made. It operates on individual rows.HAVINGis used to filter values after they have been grouped using aGROUP BYclause. It operates on aggregated records.
Example:
SELECT department, SUM(salary)
FROM employees
WHERE status = 'Active'
GROUP BY department
HAVING SUM(salary) > 500000;
2. What are the different types of JOINs in SQL?
You must clearly articulate the four main types of joins:
- INNER JOIN: Returns records that have matching values in both tables.
- LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table. If there is no match, the result is NULL from the right side.
- RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left.
- FULL (OUTER) JOIN: Returns all records when there is a match in either left or right table. Note: MySQL does not support FULL OUTER JOIN natively; it must be emulated using a UNION of LEFT and RIGHT joins.
3. How do you find the second highest salary from an Employee table?
This is a classic problem. There are multiple ways to solve it, but using LIMIT and OFFSET is the cleanest if supported by the dialect (like PostgreSQL or MySQL).
SELECT DISTINCT salary
FROM Employee
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
If using a subquery (more universal):
SELECT MAX(salary)
FROM Employee
WHERE salary < (SELECT MAX(salary) FROM Employee);
4. What is a Primary Key vs. a Foreign Key?
- Primary Key (PK): A column (or a set of columns) that uniquely identifies each row in a table. It cannot contain NULL values, and it must be strictly unique.
- Foreign Key (FK): A column (or a set of columns) in one table that refers to the Primary Key in another table. It is used to enforce referential integrity and establish links between data in two tables.
5. What is the difference between DELETE, TRUNCATE, and DROP?
Interviewers use this to test your understanding of Data Manipulation Language (DML) vs. Data Definition Language (DDL).
DELETE(DML): Removes specific rows based on aWHEREcondition. It can be rolled back. It is slower because it logs individual row deletions.TRUNCATE(DDL): Removes all rows from a table by deallocating the data pages. It cannot be rolled back in most systems (except SQL Server). It is much faster than DELETE.DROP(DDL): Removes the entire table definition, data, indexes, and constraints from the database entirely.
6-15. Basic Query Challenges
(For the sake of this extensive guide, we will summarize the concepts tested in basic queries.)
- Filtering with
LIKEand wildcards (%,_). - Using
INvs.EXISTS. - Handling NULL values using
IS NULLorCOALESCE(). - Understanding the
ORDER BYexecution order. - Using
UNIONvs.UNION ALL(UNION ALL is faster because it does not incur a sorting penalty to remove duplicates).
Part 2: Intermediate Data Aggregation & Joins (Questions 16 - 35)
These questions represent the bulk of standard Data Analyst interviews. They require you to combine multiple tables, aggregate data, and handle edge cases like duplicate rows or missing keys.
16. Write a query to find employees who earn more than their managers.
Assume you have an Employee table with id, name, salary, and managerId.
Solution: This requires a Self-Join.
SELECT e1.name AS Employee
FROM Employee e1
INNER JOIN Employee e2 ON e1.managerId = e2.id
WHERE e1.salary > e2.salary;
Why this matters: Self-joins are frequently used in hierarchical data, such as organizational charts or category trees.
17. How do you find duplicate records in a table?
Assume a Users table with an email column. How do you find emails that appear more than once?
Solution: Use GROUP BY and HAVING.
SELECT email, COUNT(email) as num_occurrences
FROM Users
GROUP BY email
HAVING COUNT(email) > 1;
18. What is a Common Table Expression (CTE) and when would you use it?
A CTE (defined using the WITH clause) is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement.
- Benefits: They make complex queries much more readable compared to deeply nested subqueries. They can also be recursive, which is vital for traversing tree structures.
Example of readability:
WITH HighEarners AS (
SELECT id, name FROM Employees WHERE salary > 100000
)
SELECT name FROM HighEarners WHERE department = 'Engineering';
19. Find the top 3 departments with the highest average salary.
You need to join an Employee table with a Department table, aggregate, and limit.
SELECT d.name AS Department, AVG(e.salary) as avg_salary
FROM Employee e
JOIN Department d ON e.department_id = d.id
GROUP BY d.name
ORDER BY avg_salary DESC
LIMIT 3;
20-35. Intermediate Aggregation Concepts
Interviewers will test your ability to handle:
CASE WHENstatements: Transforming column outputs conditionally (e.g., categorizing users into 'High', 'Medium', 'Low' tiers based on spend).- Date Functions: Grouping data by month using
DATE_TRUNC()orEXTRACT(). - Handling Zeroes: Calculating click-through rates (CTR) and using
NULLIF()to prevent division-by-zero errors.
Part 3: Advanced SQL & Window Functions (Questions 36 - 50)
If you are interviewing for a Data Engineer or Senior Data Scientist role at a top-tier tech company, Window Functions are guaranteed to appear. These functions allow you to perform calculations across a set of table rows that are related to the current row, without collapsing them into a single aggregate row.
36. What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?
This is the most critical window function question.
ROW_NUMBER(): Assigns a unique, sequential integer to each row. No two rows will have the same number, even if their values are identical.RANK(): Assigns the same rank to identical values, but skips the next numbers. (e.g., 1, 2, 2, 4, 5).DENSE_RANK(): Assigns the same rank to identical values, but does not skip the next numbers. (e.g., 1, 2, 2, 3, 4).
37. Write a query to find the highest-paid employee in each department.
Using Window Functions makes this trivial and highly performant compared to nested subqueries.
Solution:
WITH RankedSalaries AS (
SELECT
name,
department_id,
salary,
DENSE_RANK() OVER(PARTITION BY department_id ORDER BY salary DESC) as rnk
FROM Employee
)
SELECT name, department_id, salary
FROM RankedSalaries
WHERE rnk = 1;
Interviewer Tip: Emphasize that you use DENSE_RANK() instead of ROW_NUMBER() in case two employees are tied for the highest salary.
38. Calculate a 7-day rolling average of daily sales.
Time-series data is incredibly common in tech. You must know how to use the ROWS BETWEEN framing clause.
SELECT
date,
sales,
AVG(sales) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as rolling_7d_avg
FROM daily_sales;
39. How do you calculate Year-Over-Year (YoY) growth using SQL?
You will need the LAG() function, which allows you to access data from a previous row in the same result set without using a self-join.
WITH YearlyRevenue AS (
SELECT
EXTRACT(YEAR FROM order_date) as yr,
SUM(revenue) as total_revenue
FROM Orders
GROUP BY 1
),
LaggedRevenue AS (
SELECT
yr,
total_revenue,
LAG(total_revenue, 1) OVER (ORDER BY yr) as prev_yr_revenue
FROM YearlyRevenue
)
SELECT
yr,
total_revenue,
prev_yr_revenue,
((total_revenue - prev_yr_revenue) / prev_yr_revenue) * 100.0 AS yoy_growth_percentage
FROM LaggedRevenue;
40. What is an Execution Plan (EXPLAIN) and how do you use it to optimize a query?
Senior candidates must understand what happens under the hood.
When you prepend EXPLAIN (or EXPLAIN ANALYZE in Postgres) to a query, the database engine returns the query plan. You look for:
- Sequential Scans (Seq Scan): This means the database is scanning every single row in a table. If the table is large, this is a massive bottleneck. You should consider adding an Index.
- Nested Loops vs Hash Joins: Hash joins are generally faster for large datasets, while nested loops are fine for small ones.
- Cost: An arbitrary unit representing the computational cost of an operation.
41-50. The Hardest SQL Concepts
The final questions in a grueling interview will touch upon:
- Recursive CTEs: Finding all subordinates under a CEO in a self-referencing employee table.
- Pivoting Data: Converting rows to columns using
FILTERclauses orCASE WHENinside aggregates (sincePIVOTisn't supported in all dialects). - Gap and Island Problems: Identifying continuous ranges of dates where an event occurred (e.g., finding the longest winning streak). This usually involves complex combinations of
ROW_NUMBER()and grouping.
Conclusion & Next Steps
Mastering these 50 concepts will put you in the top 5% of candidates taking SQL interviews. However, reading about SQL is vastly different from writing it under pressure.
How to Practice:
- Understand your Dialect: Know the subtle differences between PostgreSQL, MySQL, and SQL Server. Most modern tech companies use PostgreSQL syntaxes for their data warehouses (like Redshift or Snowflake).
- Format your Code: Interviewers deduct points for unreadable, unformatted "spaghetti" SQL. Always capitalize your keywords (
SELECT,FROM), indent your subqueries, and use table aliases (FROM users u). - Use Mock Interviews: The best way to prepare is to practice verbally explaining your query logic before you type it. Use InterviPrep AI to simulate realistic SQL pair-programming interviews. Our AI will grade your query's efficiency, style, and correctness instantly.