Introduction to SQL

Basic concepts

  • SQL (Structured Query Language): Programming language designed to manage data held in relational databases.
  • Relational Databases: Database which stores information in one (ore more) tables (aka relations).
  • Table: Data organized in columns (set of elements of same type), and rows (each one representing an entry).
  • Types: INTEGER, REAL, TEXT, DATE
  • Primary key: Uniquely identifies a row. Thus, it must be unique, only one column, not NULL.
  • Foreign key: Primary key of a table appearing in a different table.

Table Creation

CREATE TABLE tbl (
   id INTEGER, 
   name TEXT, 
   age INTEGER
);

To those, you can add constrains after the type such as:

  • PRIMARY KEY: Uniquely identifies the row (must be unique)
  • UNIQUE: Each row must have a different value.
  • NOT NULL: Cannot be null.
  • DEFAULT val: Assigns a default value.

Attempts to add elements which don’t satisfy those conditions will result into a constrain violation.

Content Modification

Insertion of an element:

INSERT INTO tbl (col1, col2, col3)
VALUES (1, 'Justin Bieber', 22);

Add/Remove columns:

ALTER TABLE tbl ADD COLUMN col TEXT;
ALTER TABLE tbl DROP COLUMN col;

Edit element in table:

UPDATE tbl 
SET col = '@taylorswift13' 
WHERE id = 4; 

Delete elements certifying condition cond

DELETE
FROM tbl
WHERE col IS NULL;

Querying

Query: Extract information stored in a database.

Select

SELECT * FROM tbl;
SELECT col1, col2 FROM tbl;
SELECT col1 AS 'Alias' FROM tbl; --displays different column name
SELECT DISTINCT col1 FROM tbl; --only display different elements

Where

Usage:

SELECT ...
FROM tbl
WHERE ...

Examples:

WHERE cond;
WHERE col LIKE 'bla_bla'; --'_' can be any character
WHERE col LIKE 'bla%'; --'%' can be any string (also empty)
WHERE col IS (NOT) NULL; 
WHERE col BETWEEN val1 AND val2; --Both inclusive
WHERE col IN (SELECT col2 FROM tbl) -- If element present in col2

Simple conditions use the format:

col =, !=, >, < val

You can combine multiple conditions as:

WHERE cond1 AND cond2
WHERE cond1 OR cond2

Similarly, one can create a new column modifying values as:

SELECT name,
 CASE
  WHEN col > 8 THEN 'Something'
  ELSE 'Something else'
 END AS 'New col'
FROM tbl;

Aggregates

Calculations performed on multiple rows of the table, for instance:

SELECT COUNT(col)
SELECT COUNT(DISTINCT col) --interesting when joining tables
SELECT SUM(col)
SELECT AVG(col)
SELECT MIN(col)
SELECT MAX(col)
SELECT ROUND(col, decimals) --number of decimals as an int

Basic usage:

SELECT FUNC(col)
FROM tbl;

GROUP BY: Allows us to perform an operation on some column and give a result wrt groups of identical data in another one

SELECT col1, AVG(col2)
FROM tbl
GROUP BY col1

HAVING: Allows to apply a filter on the groups. It works exactly the same as WHERE but working on the groups instead of the original data.

SELECT col1, COUNT(*)
FROM tbl
GROUP BY col1
HAVING COUNT(*) > 0 --Only display groups of > 10 elements

Instead of using the column reference in the GROUP BY, we could just use the index used in the SELECT order. In this case,

GROUP BY 1

would be the same as

GROUP BY col1

CASES

SELECT
   CASE
      WHEN cond1 THEN result1
      WHEN cond2 THEN result2
      ELSE result3
   END AS new_col_name
FROM table

CTE

Store a query output as a temporal table for later usage.

WITH CTE AS
(
   --Your query
)

You can combine multiple helper tables as:

WITH CTE_1 AS
(
   --Your query using original data
),
CTE_2 AS
(
   --Your query using CTE_1
)

-- Here you can use CTE_2

Multiple Tables

INNER JOIN: Only keeps values which are in both matched columns.

SELECT t1.col1, t2.col2
FROM tbl1 t1
JOIN tbl2 t2 ON t1.col_a = t2.col_b

I think this is equivalent to:

SELECT t1.col1, t2.col2
FROM tbl1 t1, tbl2 t2
WHERE t1.col_a = t2col_b

LEFT JOIN: Keeps only the exact same values from table 1.

SELECT t1.col1, t2.col2
FROM tbl1 t1
LEFT JOIN tbl2 t2 ON t1.col_a = t2.col_b

CROSS JOIN: Combines all rows of table 1 with all rows of table 2.

UNION: Appends table2 to table 1.

PIVOTING

Groups by a column elements and converts each element aggregation into different columns.

SELECT 
  -- Whatever you want
FROM
(
   -- Insert query that produces pivotable_col
) AS TMP
PIVOT
(
   -- Aggregate function over newly grouped categories
   FOR pivotable_col IN ([category_1], [category_2], ...)
) AS PVT

Window functions

OVER PARTITION: Aggregates data same as GROUP BY but without grouping (leaves multiple rows with the repeated information).

SELECT 
  -- Whatever columns you want
   FUNC(col1) OVER(PARTITION BY col2) AS new_col_name
FROM tbl

Another useful example to index ordered rows:

SELECT  ROW_NUMBER() OVER(ORDER BY col2 ASC) AS new_col_name
FROM tbl

This will apply the FUNC aggregate on elements of col1 grouping by col2 but displaying each row.

Misc.

COALESCE uses first not null:

COALESCE(col_1, col_2, ...)

Can also be used to substitute null for zero in a column:

COALESCE(col_1, 0)