Data Science

SQL for Data Analysis 2026 — Complete Beginner to Advanced Guide

SQL is the most universally required skill in data — analysts, scientists, engineers, and even product managers need it. This complete guide takes you from basic SELECT queries to advanced window functions, CTEs, and real business analysis examples used daily at Indian tech companies.

By Rajan Verma
17 min read

SQL (Structured Query Language) is the most universally required skill across all data-related roles. Whether you are a data analyst, data scientist, business analyst, backend developer, or product manager — you will use SQL every single day. Unlike programming languages that change rapidly, SQL has remained fundamentally stable for decades. What you learn today in SQL is still valid and valuable 10 years from now. This guide takes you from your first SELECT query all the way to advanced window functions and real business analysis examples used at Indian product companies like Swiggy, Razorpay, and Zepto.

Practice Environment: Install PostgreSQL (free, download from postgresql.org) or use SQLite (zero installation, built into Python). For cloud-based practice with real large datasets, DB-Fiddle.com and Mode Analytics both offer free SQL sandboxes. Practice with real data — do not just read SQL; write it and run it.
95%
Data analyst job listings requiring SQL
₹4–35 LPA
Data analyst salary range in India
2–3 wks
To learn basic SQL for real use
700M+
Databases running SQL worldwide

Why SQL Is Non-Negotiable for Every Data Role

Data lives in databases. To access data, you write SQL. There is no shortcut. Even tools that abstract SQL — Power BI, Tableau, Looker — all generate SQL queries behind the scenes. When these tools cannot handle your specific analytical need, you drop back to raw SQL. Google BigQuery, Amazon Redshift, Snowflake, and Azure Synapse — every major data warehouse platform uses SQL as its primary query language. The AI/ML hype has not diminished SQL's importance one bit; if anything, SQL skills are more valuable because every ML model needs training data that must be extracted and prepared using SQL.

Core SQL Concepts — SELECT, WHERE, GROUP BY, ORDER BY

The SELECT statement is the foundation of all data retrieval in SQL. SELECT specifies which columns you want; FROM specifies which table; WHERE filters rows based on conditions; GROUP BY aggregates rows sharing the same values; HAVING filters aggregated results; ORDER BY sorts the output; and LIMIT restricts the number of rows returned.

WHERE clause conditions use comparison operators (=, !=, >, <, >=, <=), logical operators (AND, OR, NOT), and special operators: IN (matches any value in a list), BETWEEN (range check), LIKE (pattern matching with % wildcard and _ single character), IS NULL/IS NOT NULL (null checks). Mastering WHERE conditions is essential because data filtering is the most common SQL operation in real analysis work.

GROUP BY groups rows with the same values in specified columns into summary rows. Always use aggregate functions — COUNT(), SUM(), AVG(), MAX(), MIN() — with GROUP BY. HAVING filters the grouped results (WHERE runs before grouping; HAVING runs after). A classic business question: "What is the total revenue by product category, showing only categories with more than ₹10 lakh in revenue?" — this requires SELECT, GROUP BY, and HAVING together.

JOINs Explained — INNER, LEFT, RIGHT, FULL

JOINs combine rows from two or more tables based on a related column. Understanding JOINs is the single most important SQL skill for data analysis, as real-world data is always spread across multiple tables.

INNER JOIN: Returns only rows that have matching values in both tables. Most common JOIN type. If a customer has no orders, they will not appear in an INNER JOIN of customers and orders. Think of it as the intersection of two sets.

LEFT JOIN (LEFT OUTER JOIN): Returns all rows from the left table and matching rows from the right table. Rows from the left table with no match in the right table still appear, with NULL values for right table columns. Extremely useful for "find all customers and their orders, including customers with no orders" queries.

RIGHT JOIN: Opposite of LEFT JOIN — returns all rows from the right table. Less commonly used because you can always rewrite a RIGHT JOIN as a LEFT JOIN by swapping the table order.

FULL OUTER JOIN: Returns all rows from both tables. Rows with no match in the other table appear with NULLs. Useful for finding discrepancies between two datasets — "find all records that are in Table A but not Table B, or in Table B but not Table A."

Aggregate Functions and Subqueries

Aggregate functions summarise data: COUNT() counts rows (use COUNT(*) for all rows, COUNT(column) to exclude NULLs), SUM() totals a column, AVG() calculates the average, MAX() and MIN() find extremes, and STRING_AGG() (PostgreSQL) or GROUP_CONCAT() (MySQL) concatenate string values. These are the building blocks of every business report.

Subqueries (nested SELECT statements within another query) allow you to use the result of one query as input to another. Correlated subqueries reference columns from the outer query — powerful but potentially slow on large datasets. Use subqueries in the WHERE clause ("find customers who spent more than the average order value"), in the FROM clause as derived tables, and in the SELECT clause for calculated values. When subqueries become complex or repetitive, CTEs are the better choice.

CTEs and Window Functions — Advanced SQL That Gets You Hired

CTEs (Common Table Expressions) are temporary named result sets defined using the WITH clause. They make complex queries far more readable by breaking them into named, logical steps. A CTE defined once can be referenced multiple times in the main query. Recursive CTEs (WITH RECURSIVE) handle hierarchical data like organisational charts, category trees, and path-finding problems.

Window Functions are the most powerful advanced SQL feature for data analysis, and they consistently appear in data analyst and data scientist interviews at Indian product companies. Unlike GROUP BY, window functions do not collapse rows — you get one row per input row, plus the calculated window value.

The OVER() clause defines the window: PARTITION BY divides the result set into groups; ORDER BY sorts within the partition; ROWS BETWEEN or RANGE BETWEEN specifies which rows to include in the calculation.

Essential window functions for analysis: ROW_NUMBER() assigns unique sequential integers within a partition (useful for deduplication and "select the latest record per user" queries); RANK() and DENSE_RANK() handle ties differently (RANK leaves gaps, DENSE_RANK does not); LAG(column, n) accesses the value from n rows before the current row (month-over-month comparison); LEAD(column, n) accesses n rows ahead; SUM() OVER with ORDER BY creates running totals; AVG() OVER with ROWS BETWEEN creates moving averages.

SQL for Business Analysis — Real Examples

Sales Analysis Example: A common request at any e-commerce company: "For each salesperson, show their total sales this month, their rank within their region, the difference from the #1 salesperson in their region, and their month-over-month growth." This single query requires JOINs (salesperson + orders + regions), GROUP BY (aggregate by salesperson), window functions (RANK() OVER and LAG()), and CTEs (to keep the query readable). Being able to write this from scratch in an interview is what separates strong SQL candidates from average ones.

Cohort Analysis Example: "For customers who made their first purchase in each month, what percentage were still making purchases 30, 60, and 90 days later?" Cohort analysis is ubiquitous at growth-focused companies like Zepto, Swiggy, and Nykaa. It requires identifying each customer's first purchase date (MIN(order_date) OVER PARTITION BY user_id), calculating days since first purchase, and pivoting results by cohort and retention period.

Want to practice SQL with real business datasets and expert feedback? UnstopGrowth's Data Science course includes SQL-intensive modules with actual company data scenarios. Book your free demo class →

SQL Tools and Career Paths

ToolBest ForCostScale
PostgreSQLLearning, web apps, productionFreeMedium-Large
MySQLWeb development, LAMP stackFreeMedium
BigQueryAnalytics at scale, cloud dataPay-per-queryMassive
SQLitePrototyping, mobile, PythonFreeSmall
SnowflakeEnterprise data warehousePaidVery Large
RedshiftAWS-native data warehousePaidVery Large
Key Takeaway: SQL is the single most foundational data skill you can have — required in 95% of data analyst roles and a significant component of data engineer and data scientist roles. Learn PostgreSQL first (free, full-featured, standard-compliant). Master basic querying in 2–3 weeks, intermediate JOIN and aggregation in 4–8 weeks, and advanced window functions and CTEs in 3–6 months. Combined with Python (pandas), SQL opens doors to ₹4–35 LPA data careers in India's booming tech sector.
SQL Data Analysis SQL Tutorial 2026 SQL Window Functions SQL JOINs CTEs SQL BigQuery PostgreSQL Data Analyst India SQL Career

Frequently Asked Questions

Basic SQL (SELECT, WHERE, JOIN, GROUP BY) can be learned in 2–3 weeks of daily practice. Intermediate SQL (subqueries, aggregations, multiple JOINs) takes 4–8 weeks. Advanced SQL (window functions, CTEs, query optimisation, query plan analysis) takes 3–6 months of working with real datasets. Most data analyst interviews test intermediate SQL, so 4–8 weeks is the minimum viable preparation time.
PostgreSQL is the best choice for learning SQL. It is free, open-source, has excellent documentation, follows SQL standards closely, and supports advanced features including window functions, CTEs, JSON handling, and full-text search. MySQL is a close second and is more commonly used in web development contexts. Once you know SQL on one database, switching to another takes days, not weeks.
SQL alone is not sufficient for most data analyst roles in India. You also need either Python (with pandas for data wrangling and matplotlib/seaborn for visualisation) or advanced Excel, plus a business intelligence tool like Tableau, Power BI, or Looker Studio. SQL is the foundation — it is non-negotiable — but it is always combined with other tools in actual job requirements.
A window function performs a calculation across a set of rows that are related to the current row, without collapsing those rows the way GROUP BY does. Common window functions include ROW_NUMBER() (assigns a sequential number within a partition), RANK() (same as ROW_NUMBER but ties get the same rank), LAG()/LEAD() (access values from previous/next rows), and SUM/AVG/COUNT OVER (running totals and moving averages). Window functions are essential for cohort analysis, funnel analysis, and time-series calculations.
A data analyst with strong SQL skills in India earns ₹4–8 LPA at entry level (0–2 years). With 2–4 years of experience, SQL + Python + a BI tool commands ₹8–16 LPA. Senior data analysts and analytics managers with 5+ years earn ₹18–35 LPA at product companies. Adding data engineering skills (dbt, Spark, BigQuery) significantly increases earning potential.
Rajan Verma
Data Analytics Coach & Senior SQL Expert | UnstopGrowth

Rajan Verma has 15 years of experience in data analytics, SQL optimisation, and data engineering. He has worked with SQL across MySQL, PostgreSQL, BigQuery, and Redshift, and teaches data science and analytics at UnstopGrowth in Chandigarh, helping students land data analyst and data engineer roles across India.

Master SQL + Python for Data Careers

UnstopGrowth's Data Science course covers SQL, Python (Pandas), and machine learning with real business datasets. Placement support included.