
SQL Interview Questions: The 9 Window Function Patterns That Solve 80% of LeetCode SQL

If you can write a clean window function under pressure, you can pass almost every analytics and data-engineer SQL screen.


Most SQL interview questions look unique on the surface — top-K customers per region, second highest salary, longest streak of consecutive logins, daily active user retention — but underneath, almost all of them reduce to nine recurring patterns. Eight of the nine are window functions. The candidates who pass the LeetCode SQL track and the HackerRank SQL screen do not memorize hundreds of queries. They learn to recognize when a question wants a window function, pick the right window frame, and write the query without looking up the syntax. This guide is that catalog.

Why window functions own the SQL interview

Window functions let you answer 'per group' and 'per row in context of group' questions without collapsing the rows into an aggregate. That is the entire shape of the SQL interview. 'Top three salaries per department,' 'first purchase per customer,' 'difference from previous month per region,' 'rank within bucket' — every one of those is a window function. SQL interviewers love them because they distinguish the candidate who memorized GROUP BY from the candidate who actually thinks in sets.

The syntax is always the same: OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN ...). PARTITION BY defines the group, ORDER BY defines the per-row position within the group, and the optional frame clause defines the window of rows the function sees. Memorize that skeleton; every pattern below fills in the blanks.
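
As a concrete instance of that skeleton (the sales table and its columns here are illustrative, not from any example below):

```sql
-- Illustrative only: rank each sale within its store by amount,
-- and compute a running total over the same partition.
SELECT
  store_id,
  sale_id,
  amount,
  RANK() OVER (PARTITION BY store_id ORDER BY amount DESC) AS amount_rank,
  SUM(amount) OVER (
    PARTITION BY store_id
    ORDER BY sale_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM sales;
```

Same table, same rows in the output as the input: the window functions annotate each row with group context instead of collapsing the groups.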

Pattern 1 — Ranking: ROW_NUMBER vs RANK vs DENSE_RANK

Ranking questions are the most common SQL interview shape. 'Find the second highest salary,' 'top three earners per department,' 'most recent order per customer.' Pick the right ranker and the answer is one CTE plus one filter.

  • ROW_NUMBER: arbitrary tiebreak, unique numbers. Use it when you want exactly one row per group (the Nth match).
  • RANK: same rank for ties, gaps after. Use it when you want all top-N including ties and gaps are OK.
  • DENSE_RANK: same rank for ties, no gaps. Use it when you want all top-N including ties with no gaps.
sql
-- Top 3 highest-paid employees per department, ties allowed.
WITH ranked AS (
  SELECT
    employee_id, department_id, salary,
    DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rk
  FROM employees
)
SELECT employee_id, department_id, salary
FROM ranked
WHERE rk <= 3
ORDER BY department_id, salary DESC;
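
To see the tie behavior side by side, run all three rankers over the same window on the employees table from above. With salaries 90, 80, 80, 70 the three columns come out as 1,2,3,4 versus 1,2,2,4 versus 1,2,2,3:

```sql
-- Salaries 90, 80, 80, 70 produce:
-- ROW_NUMBER: 1, 2, 3, 4   (arbitrary tiebreak, unique numbers)
-- RANK:       1, 2, 2, 4   (gap after the tie)
-- DENSE_RANK: 1, 2, 2, 3   (no gap)
SELECT
  employee_id, salary,
  ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
  RANK()       OVER (ORDER BY salary DESC) AS rnk,
  DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;
```
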

Pattern 2 — LAG and LEAD: previous and next row

Any question that asks for a delta, a streak, or a comparison to the previous or next event is a LAG/LEAD problem. 'Month-over-month revenue change,' 'time between consecutive logins per user,' 'consecutive identical readings.' LAG and LEAD give you the value from the previous or next row in the partition order, so you can compute deltas without a self-join.

sql
-- Month-over-month change in revenue per region.
SELECT
  region,
  month,
  revenue,
  revenue - LAG(revenue) OVER (PARTITION BY region ORDER BY month) AS delta,
  100.0 * (revenue - LAG(revenue) OVER (PARTITION BY region ORDER BY month))
         / NULLIF(LAG(revenue) OVER (PARTITION BY region ORDER BY month), 0)
         AS pct_change
FROM monthly_revenue;
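
The same LAG shape answers 'time between consecutive logins per user'; this sketch assumes a logins table with a login_at timestamp column:

```sql
-- Gap between each login and the previous one, per user.
-- The first login per user gets NULL (no previous row).
SELECT
  user_id,
  login_at,
  login_at - LAG(login_at) OVER (
    PARTITION BY user_id ORDER BY login_at
  ) AS gap_since_last_login
FROM logins;
```
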

Pattern 3 — Running totals and moving averages

SUM(), AVG(), and COUNT() take an OVER clause too. With a frame, you get running totals, cumulative counts, and moving averages — the bread and butter of analytics interviews. The frame syntax is the lever: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW gives you a running total, and ROWS BETWEEN 6 PRECEDING AND CURRENT ROW gives you a 7-day moving average.

sql
-- 7-day moving average of daily revenue per product.
SELECT
  product_id, day, revenue,
  AVG(revenue) OVER (
    PARTITION BY product_id
    ORDER BY day
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS ma_7d
FROM daily_revenue;

Note the subtle but critical detail: ROWS BETWEEN counts physical rows, while RANGE BETWEEN measures distance along the ORDER BY value. For calendar problems with exactly one row per day, ROWS is correct. If your time series has gaps, RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW is the calendar-aware version, and Postgres (11+) supports it natively.
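
In Postgres, the calendar-aware version of the moving-average query above would look like this; note that RANGE with an interval offset requires the ORDER BY column to be a date or timestamp:

```sql
-- 7-day moving average that respects gaps in the calendar:
-- the frame covers the last 7 calendar days, not the last 7 rows.
SELECT
  product_id, day, revenue,
  AVG(revenue) OVER (
    PARTITION BY product_id
    ORDER BY day
    RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
  ) AS ma_7d_calendar
FROM daily_revenue;
```
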

Pattern 4 — Gaps and islands: consecutive groupings

Gaps and islands is the SQL interview classic dressed in twelve different costumes: 'longest streak of daily logins per user,' 'consecutive failed payments,' 'merge overlapping time intervals,' 'count uninterrupted runs.' The recipe is always the same: subtract a ROW_NUMBER from the value that increments in step with it (the date, a sequence number), then group by the difference. Rows in the same 'island' share the same difference.

sql
-- Longest streak of consecutive login days per user.
-- Assumes one row per (user_id, login_date); SELECT DISTINCT first if not.
WITH numbered AS (
  SELECT
    user_id, login_date,
    login_date
      - (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date))::int
      AS grp_diff
  FROM logins
)
SELECT user_id, MAX(streak) AS longest_streak
FROM (
  SELECT user_id, COUNT(*) AS streak
  FROM numbered
  GROUP BY user_id, grp_diff
) AS streaks
GROUP BY user_id;

The trick: within a contiguous run, the date and the row number both advance by one per row, so their difference stays constant. Break the run and the date jumps ahead of the row number, the difference changes, and you get a new island. It is the most elegant pattern in the SQL toolkit.

Pattern 5 — Deduplication: keep one row per key

Deduplication is the most common production SQL task and a frequent interview ask. 'Keep only the most recent record per user,' 'one row per order_id with the latest status.' ROW_NUMBER over the dedup key, ordered by the recency column, then filter to rk = 1.

sql
-- Keep the most recent status row per order.
WITH ranked AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) AS rk
  FROM order_status_log
)
SELECT * FROM ranked WHERE rk = 1;
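
In Postgres specifically, DISTINCT ON reaches the same result without a CTE; a dialect-specific shortcut worth naming in an interview:

```sql
-- Postgres-only: latest status row per order in one pass.
-- DISTINCT ON keeps the first row per order_id under the ORDER BY.
SELECT DISTINCT ON (order_id) *
FROM order_status_log
ORDER BY order_id, updated_at DESC;
```
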

Pattern 6 — Cohort and retention queries

Cohort retention is the analytics-interview signature problem. 'Of users who signed up in March, what percentage logged in N days later?' The pattern: compute each user's first event (cohort assignment) with FIRST_VALUE or MIN, then bucket subsequent events by 'days since first event,' then aggregate.

sql
-- Day-N retention by signup cohort week.
WITH first_event AS (
  SELECT user_id, MIN(event_date) AS signup_date
  FROM events WHERE event_type = 'signup' GROUP BY user_id
),
activity AS (
  SELECT
    fe.user_id,
    DATE_TRUNC('week', fe.signup_date) AS cohort_week,
    e.event_date - fe.signup_date AS days_since_signup
  FROM first_event fe
  JOIN events e USING (user_id)
  WHERE e.event_type = 'login'
)
SELECT
  cohort_week,
  days_since_signup,
  COUNT(DISTINCT user_id) AS retained_users
FROM activity
GROUP BY cohort_week, days_since_signup
ORDER BY cohort_week, days_since_signup;
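
The query above stops at raw counts. One way to turn them into the percentage the question actually asks for is to divide by each cohort's size, reusing the first_event and activity CTEs:

```sql
-- Sketch: retention as a percentage of each signup cohort's size.
WITH first_event AS (
  SELECT user_id, MIN(event_date) AS signup_date
  FROM events WHERE event_type = 'signup' GROUP BY user_id
),
activity AS (
  SELECT
    fe.user_id,
    DATE_TRUNC('week', fe.signup_date) AS cohort_week,
    e.event_date - fe.signup_date AS days_since_signup
  FROM first_event fe
  JOIN events e USING (user_id)
  WHERE e.event_type = 'login'
),
cohort_size AS (
  SELECT DATE_TRUNC('week', signup_date) AS cohort_week,
         COUNT(*) AS cohort_users
  FROM first_event
  GROUP BY 1
)
SELECT
  a.cohort_week,
  a.days_since_signup,
  ROUND(100.0 * COUNT(DISTINCT a.user_id) / cs.cohort_users, 1) AS retention_pct
FROM activity a
JOIN cohort_size cs USING (cohort_week)
GROUP BY a.cohort_week, a.days_since_signup, cs.cohort_users
ORDER BY a.cohort_week, a.days_since_signup;
```
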

Patterns 7-9 — Pivot, unpivot, and time-series joins

Pivoting (rows to columns) and unpivoting (columns to rows) are most cleanly done with FILTER clauses and UNION ALL respectively. Postgres has FILTER built in; MySQL needs CASE WHEN, and SQL Server ships a dedicated PIVOT operator. Both questions show up on the LeetCode SQL hard list.
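
A sketch of both directions in Postgres, reusing the monthly_revenue table from Pattern 2; the region values 'NA', 'EMEA', 'APAC' are illustrative:

```sql
-- Rows to columns: one revenue column per region via FILTER.
WITH pivoted AS (
  SELECT
    month,
    SUM(revenue) FILTER (WHERE region = 'NA')   AS revenue_na,
    SUM(revenue) FILTER (WHERE region = 'EMEA') AS revenue_emea,
    SUM(revenue) FILTER (WHERE region = 'APAC') AS revenue_apac
  FROM monthly_revenue
  GROUP BY month
)
-- Columns back to rows: the UNION ALL unpivot.
SELECT month, 'NA' AS region, revenue_na AS revenue FROM pivoted
UNION ALL
SELECT month, 'EMEA', revenue_emea FROM pivoted
UNION ALL
SELECT month, 'APAC', revenue_apac FROM pivoted;
```
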

Time-series joins — 'for each event, find the most recent reference value as of that timestamp' — are LATERAL joins or correlated subqueries. They are the hardest SQL interview shape because they require thinking about temporal alignment. If a question mentions 'as of' or 'at the time of,' you are joining laterally.

sql
-- For each order, find the FX rate as of the order timestamp.
SELECT
  o.order_id, o.amount, o.currency, o.created_at,
  fx.rate
FROM orders o
JOIN LATERAL (
  SELECT rate FROM fx_rates
  WHERE fx_rates.currency = o.currency
    AND fx_rates.effective_at <= o.created_at
  ORDER BY fx_rates.effective_at DESC
  LIMIT 1
) fx ON TRUE;  -- use LEFT JOIN LATERAL instead to keep orders with no matching rate

The last meta-skill is reading the schema. Most interview slip-ups happen because the candidate did not notice that two tables share a non-obvious key, or that a status field is denormalized into the events log. Spend the first ninety seconds reading the schema. Name the keys, name the cardinality, name the time column. The query writes itself after that.

Performance, EXPLAIN, and the questions that go beyond syntax

Senior data interviews go past correctness and into performance. Expect questions like 'this query takes ten seconds on a hundred million row table — what would you do' or 'walk me through the EXPLAIN plan and where the cost is.' These are not gotcha questions; they are the daily reality of the role and the interviewer wants to see whether you have lived in a production query plan.

  • Know how to read an EXPLAIN ANALYZE in your target database. Identify sequential scans, index scans, hash joins versus nested loops, and where the cost concentrates.
  • Name the three things that fix most slow queries: an index on the filtered column, a rewrite to push filters before joins, or denormalization to avoid the join entirely.
  • Understand cardinality estimation failures — when a query optimizer underestimates rows by 100x, the chosen plan becomes catastrophic. ANALYZE statistics fix it more often than candidates expect.
  • Be ready to talk about partitioning by time for large append-only tables, and the tradeoff against cross-partition queries.
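
A minimal sketch of the first two habits in Postgres; the orders table and index name here are illustrative:

```sql
-- Inspect where the cost goes: look for Seq Scan vs Index Scan nodes
-- and compare estimated rows against actual rows.
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE created_at >= NOW() - INTERVAL '1 day';

-- The most common fix: index the filtered column
-- (CONCURRENTLY avoids locking writes), then refresh planner statistics.
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at);
ANALYZE orders;
```
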

And finally: be honest about your dialect. Postgres, MySQL, BigQuery, Snowflake, and SQL Server all share roughly 80% of syntax and disagree on the other 20%. If you only know Postgres, say so. The interviewer would rather you write clean Postgres than guess at MySQL idioms and produce something that does not run.

Stop grinding. Start patterning.

Alpha Code is a patterns-first interview prep platform — coding, system design, behavioral, mocks, and ML/AI engineering all under one $19/mo subscription.