100 Days of Data Engineering Day 2: SQL
20240402- catching up with this one since I started a day behind. I understand all the concepts here, but should probably reinforce some of the syntax so I can do things without looking at the code
Here is a summary of the video SQL Basics Tutorial for Beginners – Part 1
This video is a part 1 of a series on SQL Basics Tutorial for Beginners. In this video, the instructor will teach you how to set up SQL Server Management Studio and create tables.
Here are the actionable steps on how to set up SQL Server Management Studio and create tables:
- Download and install SQL Server Management Studio from Microsoft.
- Download and install a free version of SQL Server Express.
- Open SQL Server Management Studio and connect to the server.
- Create a new database.
- Create a new table by writing a T-SQL script.
- Insert data into the table.
Timeline of the video:
- Introduction (0:00)
- Downloading and installing SQL Server Management Studio (1:00)
- Downloading and installing SQL Server Express (2:00)
- Connecting to the server (3:00)
- Creating a new database (4:00)
- Creating a new table (5:00)
- Inserting data into the table (6:00)
Here is a summary of the video Intermediate SQL Joins
The speaker talks about different types of joins and gives use cases for each type.
The video starts with introducing what joins are and what they are used for. Then the speaker goes through different types of joins including inner joins, full outer joins, left outer joins, and right outer joins. For each type of join, the speaker explains what it is and how it works with an example.
Inner joins are used to combine rows from two tables based on a shared column. The output table will only include rows where the data in the join column matches in both tables.
Full outer joins are used to combine all rows from two tables, regardless of whether there is a match in the join column. If there is no match in one table, the corresponding columns in the output table will be filled with null values.
Left outer joins are used to combine all rows from the left table and matching rows from the right table. If there is no match in the right table, the corresponding columns in the output table will be filled with null values.
Right outer joins are used to combine all rows from the right table and matching rows from the left table. If there is no match in the left table, the corresponding columns in the output table will be filled with null values.
The speaker also talks about use cases for joins. For example, a manager can use a join to find out which employee makes the most money. Or an accountant can use a join to calculate the average salary for a particular job title.
Here are the actionable steps from the video:
- Use an inner join to combine rows from two tables based on a shared column, and you only want to include rows where the data in the join column matches in both tables.
- Use a full outer join to combine all rows from two tables, regardless of whether there is a match in the join column.
- Use a left outer join to combine all rows from the left table and matching rows from the right table.
- Use a right outer join to combine all rows from the right table and matching rows from the left table.
Here is a summary of the video Using Case Statements in SQL
The video is about using case statements in SQL. The speaker, Alex Freiburg, provides two examples of how to use case statements.
In the first example, Alex uses a case statement to categorize employees into different age groups. He creates a new column named “age_group” that labels employees as “old” if their age is greater than 30 and “young” if their age is less than or equal to 30.
In the second example, Alex uses a case statement to calculate salary raises for employees. He creates a new column named “salary_after_raise” that adds a raise amount to the employee’s current salary. The amount of the raise is determined by the employee’s job title. Salesmen get a 10% raise, accountants get a 5% raise, HR gets a 3% raise, and all other employees get a 3% raise.
Here are the actionable steps on how to create a case statement in SQL:
- Specify the columns you want to return in the SELECT clause.
- Use the CASE statement followed by WHEN to specify the condition you want to check.
- Then specify what value you want returned if the condition is met.
- You can add multiple WHEN statements to define different conditions and their corresponding return values.
- Use the ELSE statement to specify the return value if none of the WHEN conditions are met.
- Use END to close the CASE statement.
Timeline of the video:
- Introduction (0:00)
- What are case statements (1:00)
- Example 1: Categorizing employees by age (2:00)
- Example 2: Calculating salary raises (4:00)
- How to create a case statement (6:00)
- Specify columns to return (6:00)
- Use CASE statement with WHEN (7:00)
- Specify return value (8:00)
- Add multiple WHEN statements (9:00)
- Use ELSE statement (10:00)
- Close the CASE statement (11:00)
- Conclusion (12:00)
Self-JOINs and CROSS JOINs
There are two useful kinds of JOINs that we can use to solve queries of medium difficulty. These are called “self JOIN
” and CROSS JOIN
.
SELF JOINs
Self-joins are not really a new kind of JOIN
. They are just a JOIN
from a table to itself. However, self-joins tend to come up in certain medium-level questions.
Consider the following employees
table:
id | name | manager_id |
---|---|---|
1 | Alice | NULL |
2 | Bob | 1 |
3 | Chuck | 1 |
4 | Drake | 2 |
5 | Eve | 2 |
This company’s structure is:
- Alice
- Bob
- Drake
- Eve
- Chuck
- Bob
In this case, if we wanted to know who the manager is for each employee, all the information we need is already in the employees
table.
To show this information, we can join the employees
table to itself:
SELECT e.name as employee, m.name as manager
FROM employees as e LEFT JOIN employees as m
ON e.manager_id = m.id
Note that we used a LEFT JOIN, since the CEO, Alice, has no manager.
Also note that we use different aliases for the “left” employees table (e) and the “right” employees table (m), so that SQL is not confused as to which “version” of the table we are referring to.
CROSS JOIN
CROSS JOIN
is a clause that returns all row combinations within two tables.
For example, if we have the tables cars
and colors
:
cars
model |
---|
Ford |
Chevrolet |
Toyota |
colors
color |
---|
Red |
Blue |
Black |
Then the CROSS JOIN
will return all of their combinations:
model | color |
---|---|
Ford | Red |
Ford | Blue |
Ford | Black |
Chevrolet | Red |
Chevrolet | Blue |
Chevrolet | Black |
Toyota | Red |
Toyota | Blue |
Toyota | Black |
CROSS JOINS
tend to generate very large tables, so they are usually costly. However, they virtually allow us to iterate over two tables directly with SQL.
In general, we should try to avoid them, but it’s good to know they exist in case we need them.