SQL (Structured Query Language) is the most universally required skill across all data-related roles. Whether you are a data analyst, data scientist, business analyst, backend developer, or product manager — you will use SQL every single day. Unlike programming languages that change rapidly, SQL has remained fundamentally stable for decades. What you learn today in SQL is still valid and valuable 10 years from now. This guide takes you from your first SELECT query all the way to advanced window functions and real business analysis examples used at Indian product companies like Swiggy, Razorpay, and Zepto.
Why SQL Is Non-Negotiable for Every Data Role
Data lives in databases. To access data, you write SQL. There is no shortcut. Even tools that abstract SQL — Power BI, Tableau, Looker — all generate SQL queries behind the scenes. When these tools cannot handle your specific analytical need, you drop back to raw SQL. Google BigQuery, Amazon Redshift, Snowflake, and Azure Synapse — every major data warehouse platform uses SQL as its primary query language. The AI/ML hype has not diminished SQL's importance one bit; if anything, SQL skills are more valuable because every ML model needs training data that must be extracted and prepared using SQL.
Core SQL Concepts — SELECT, WHERE, GROUP BY, ORDER BY
The SELECT statement is the foundation of all data retrieval in SQL. SELECT specifies which columns you want; FROM specifies which table; WHERE filters rows based on conditions; GROUP BY aggregates rows sharing the same values; HAVING filters aggregated results; ORDER BY sorts the output; and LIMIT restricts the number of rows returned.
WHERE clause conditions use comparison operators (=, !=, >, <, >=, <=), logical operators (AND, OR, NOT), and special operators: IN (matches any value in a list), BETWEEN (range check), LIKE (pattern matching with % wildcard and _ single character), IS NULL/IS NOT NULL (null checks). Mastering WHERE conditions is essential because data filtering is the most common SQL operation in real analysis work.
GROUP BY groups rows with the same values in specified columns into summary rows. Always use aggregate functions — COUNT(), SUM(), AVG(), MAX(), MIN() — with GROUP BY. HAVING filters the grouped results (WHERE runs before grouping; HAVING runs after). A classic business question: "What is the total revenue by product category, showing only categories with more than ₹10 lakh in revenue?" — this requires SELECT, GROUP BY, and HAVING together.
JOINs Explained — INNER, LEFT, RIGHT, FULL
JOINs combine rows from two or more tables based on a related column. Understanding JOINs is the single most important SQL skill for data analysis, as real-world data is always spread across multiple tables.
INNER JOIN: Returns only rows that have matching values in both tables. Most common JOIN type. If a customer has no orders, they will not appear in an INNER JOIN of customers and orders. Think of it as the intersection of two sets.
LEFT JOIN (LEFT OUTER JOIN): Returns all rows from the left table and matching rows from the right table. Rows from the left table with no match in the right table still appear, with NULL values for right table columns. Extremely useful for "find all customers and their orders, including customers with no orders" queries.
RIGHT JOIN: Opposite of LEFT JOIN — returns all rows from the right table. Less commonly used because you can always rewrite a RIGHT JOIN as a LEFT JOIN by swapping the table order.
FULL OUTER JOIN: Returns all rows from both tables. Rows with no match in the other table appear with NULLs. Useful for finding discrepancies between two datasets — "find all records that are in Table A but not Table B, or in Table B but not Table A."
Aggregate Functions and Subqueries
Aggregate functions summarise data: COUNT() counts rows (use COUNT(*) for all rows, COUNT(column) to exclude NULLs), SUM() totals a column, AVG() calculates the average, MAX() and MIN() find extremes, and STRING_AGG() (PostgreSQL) or GROUP_CONCAT() (MySQL) concatenate string values. These are the building blocks of every business report.
Subqueries (nested SELECT statements within another query) allow you to use the result of one query as input to another. Correlated subqueries reference columns from the outer query — powerful but potentially slow on large datasets. Use subqueries in the WHERE clause ("find customers who spent more than the average order value"), in the FROM clause as derived tables, and in the SELECT clause for calculated values. When subqueries become complex or repetitive, CTEs are the better choice.
CTEs and Window Functions — Advanced SQL That Gets You Hired
CTEs (Common Table Expressions) are temporary named result sets defined using the WITH clause. They make complex queries far more readable by breaking them into named, logical steps. A CTE defined once can be referenced multiple times in the main query. Recursive CTEs (WITH RECURSIVE) handle hierarchical data like organisational charts, category trees, and path-finding problems.
Window Functions are the most powerful advanced SQL feature for data analysis, and they consistently appear in data analyst and data scientist interviews at Indian product companies. Unlike GROUP BY, window functions do not collapse rows — you get one row per input row, plus the calculated window value.
The OVER() clause defines the window: PARTITION BY divides the result set into groups; ORDER BY sorts within the partition; ROWS BETWEEN or RANGE BETWEEN specifies which rows to include in the calculation.
Essential window functions for analysis: ROW_NUMBER() assigns unique sequential integers within a partition (useful for deduplication and "select the latest record per user" queries); RANK() and DENSE_RANK() handle ties differently (RANK leaves gaps, DENSE_RANK does not); LAG(column, n) accesses the value from n rows before the current row (month-over-month comparison); LEAD(column, n) accesses n rows ahead; SUM() OVER with ORDER BY creates running totals; AVG() OVER with ROWS BETWEEN creates moving averages.
SQL for Business Analysis — Real Examples
Sales Analysis Example: A common request at any e-commerce company: "For each salesperson, show their total sales this month, their rank within their region, the difference from the #1 salesperson in their region, and their month-over-month growth." This single query requires JOINs (salesperson + orders + regions), GROUP BY (aggregate by salesperson), window functions (RANK() OVER and LAG()), and CTEs (to keep the query readable). Being able to write this from scratch in an interview is what separates strong SQL candidates from average ones.
Cohort Analysis Example: "For customers who made their first purchase in each month, what percentage were still making purchases 30, 60, and 90 days later?" Cohort analysis is ubiquitous at growth-focused companies like Zepto, Swiggy, and Nykaa. It requires identifying each customer's first purchase date (MIN(order_date) OVER PARTITION BY user_id), calculating days since first purchase, and pivoting results by cohort and retention period.
Want to practice SQL with real business datasets and expert feedback? UnstopGrowth's Data Science course includes SQL-intensive modules with actual company data scenarios. Book your free demo class →
SQL Tools and Career Paths
| Tool | Best For | Cost | Scale |
|---|---|---|---|
| PostgreSQL | Learning, web apps, production | Free | Medium-Large |
| MySQL | Web development, LAMP stack | Free | Medium |
| BigQuery | Analytics at scale, cloud data | Pay-per-query | Massive |
| SQLite | Prototyping, mobile, Python | Free | Small |
| Snowflake | Enterprise data warehouse | Paid | Very Large |
| Redshift | AWS-native data warehouse | Paid | Very Large |