100 Days of Data Engineering Day 1

This is a program by the YouTuber behind [[https://theseattledataguy.com]]. It interests me because it is a set curriculum of bite-sized chunks. If I am already familiar with the material, I can try to get through a chunk quickly within a day. In any case, I am sure that going through this program and blogging my notes will leave me stronger than I would have been without it.

These challenges are a little easier for me to gamify since they are broken into day-sized chunks. When going through a class, we sometimes get bored, lost, or behind, and we have no metrics for getting back on track. Then our brains just give up. My big ruler is the fact that I started this on April 1st and should be finished by the end of June.

At the very least, I will go through one day of training every day. I will start in the morning so I don’t run out of energy or brain juice to complete it. Some days will be tough when I have responsibilities in the evenings, but I will try to stack those days with extra planning to get through what I need to do.

100 Days Note
100 days is just a little over 3 months, and I don’t believe 3 months is truly sufficient to “become a data engineer”, or at the very least it feels a little fast. There is no need to rush. The real purpose of these 100 days is to get you into the habit of practicing. If afterwards you want to dig into specific subjects, do that! Don’t let these 100 days limit you.
Day 1
For day one, what I recommend is taking the time to answer some questions and write out your plan to commit to the next 100 days on social media or somewhere people can help keep you accountable: a Discord group, Slack, etc.
1. What do you hope to accomplish by the end of the 100 days?
2. Are there any topics you’d like to learn that aren’t covered?
Take a moment to write your goals
Day 2
1. Downloading SQL Server And Creating Tables
2. Joins
3. Case Statements
4. Self Joins And Cross Joins
Day 3
1. SQL Interview Tips
2. Solving More Problems With SQL
Day 4
1. Partition By
2. CTE (Common Table Expression)
3. Stored Procedures
Day 5
1. Loops, Strings And Tuples
2. Functions
3. Mutability
4. Error Handling
Day 6
Basic Linux Commands 1/3
This video is quite long, so I’ve spread it over the next three days. That way you can watch a little over 1.5 hours a day.
Linux
Day 7
Basic Linux Commands 2/3
Linux
Day 8
Basic Linux Commands 3/3
Linux
Day 9
1. Data Modeling Basics
2. Normalization Vs Denormalization
Data Model
Day 10
Read Chapters 1, 2, 3 in Kimball’s Data Warehouse Toolkit
Data Model
Day 11
What is Data Pipeline | How to design Data Pipeline ? – ETL vs Data pipeline (2023)
Data Pipeline
Day 12
SQL Project Example
SQL Deeper Dive
Day 13
1. 262. Trips and Users
2. Popularity of Hack
3. Average Salaries
4. 626. Exchange Seats
SQL Deeper Dive
Day 14
Use the bigquery-public-data.stackoverflow.* data set to answer some of the following questions, and come up with some of your own
1. What percentage of Stack Overflow questions that ended with a “?” had accepted answers?
2. Are there certain programming languages that are more likely to have accepted answers?
3. Do certain programming languages have questions that get answered more quickly?
4. Do certain programming languages get more answers on average than others?
1. What questions can you answer using this data set?
2. Are there places you can join the data set?
3. Write out 10 questions you think you can answer
SQL Deeper Dive
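To make the first Day 14 question concrete, here is a minimal sketch of how it might be answered. It does not touch BigQuery: it uses Python's built-in sqlite3 module and a tiny hand-made table. The table name `posts_questions`, the column names, and the sample rows are assumptions for illustration only, so check them against the real data set before reusing the query.

```python
import sqlite3


def pct_accepted_for_question_titles(rows):
    """Return the percentage of titles ending in '?' that have an accepted answer.

    `rows` is a list of (id, title, accepted_answer_id, tags) tuples.
    The schema is a hypothetical stand-in, not the real BigQuery schema.
    Returns None when no title ends with '?'.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts_questions ("
        "id INTEGER, title TEXT, accepted_answer_id INTEGER, tags TEXT)"
    )
    conn.executemany("INSERT INTO posts_questions VALUES (?, ?, ?, ?)", rows)
    (pct,) = conn.execute(
        """
        SELECT 100.0 * SUM(CASE WHEN accepted_answer_id IS NOT NULL THEN 1 ELSE 0 END)
               / COUNT(*)
        FROM posts_questions
        WHERE title LIKE '%?'
        """
    ).fetchone()
    conn.close()
    return pct


# Made-up sample rows; a NULL accepted_answer_id means no accepted answer.
SAMPLE_ROWS = [
    (1, "How do I join two tables?", 11, "sql"),
    (2, "Why is my loop slow?", None, "python"),
    (3, "Pandas merge drops rows", 12, "python|pandas"),
    (4, "What is a CTE?", 13, "sql"),
]
```

The same CASE-inside-an-aggregate pattern should carry over to most SQL dialects, though the string matching functions may differ.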
Day 15
Continue from yesterday with new questions. Come up with some of your own.
SQL Deeper Dive
Day 16
AWS Certificate Prep
This video is 10 hours long, so I’d recommend you break it down into 2 hour segments over the next few days. You should also take notes and share them. Another benefit here is that if you feel confident, you might be able to consider taking a cert once you’re done with this set of videos and some of the projects.
Cloud
Day 17
AWS Certificate Prep
Cloud
Day 18
AWS Certificate Prep
Cloud
Day 19
AWS Certificate Prep
Cloud
Day 20
AWS Certificate Prep
Cloud
Day 21
1. GCP Intro
2. GCP and VPC
Day 22
1. GCP Bigquery
2. GCP Cloud Composer
Day 23
1. Azure Vocab
2. Azure Opex Vs Capex
3. Azure Geographics And Regions
4. Azure Basic Compute Services
Day 24
1. Azure Private Networks And VPCs
2. Azure Storage
3. Azure Big Data Services
4. Azure Serverless Computing
Day 25
1. Data Structures And Algorithms Review Chapters 1-5
2. Introduction to Linked Lists (Data Structures & Algorithms #5)
3. Introduction to Recursion (Data Structures & Algorithms #6)
Day 26
1. Data Structures And Algorithms Review Chapters 8-11
2. Big O Notation
Day 27
1. Web Scraping
2. Reading CSVs, JSON And APIs
Go through this article, and if you have time, see if you can start a project
Programming
Day 28
Keeping time, scheduling, tasks and launching programs
Programming
Day 29
Programming Your Own Thing
Using the prior few days’ readings, try coming up with some small mini projects. Perhaps you can automate a task such as scraping a website or hitting an API. But take your time and enjoy some free time just trying things out for yourself.
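As a starting point for a Day 29 mini project, here is a small sketch of the "Reading CSVs, JSON" topic from Day 27 using only the Python standard library. The function names and sample strings are made up for illustration. A real project would more likely read from files or an API response than from inline strings.

```python
import csv
import io
import json


def read_csv_records(text):
    """Parse CSV text with a header row into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_records(text):
    """Parse JSON text and always return a list of records."""
    data = json.loads(text)
    # A single JSON object is wrapped so callers can always loop over the result.
    return data if isinstance(data, list) else [data]


# Hypothetical sample inputs.
SAMPLE_CSV = "name,stars\nairflow,5\nspark,4\n"
SAMPLE_JSON = '[{"name": "trino", "stars": 3}]'
```

Note that the CSV reader leaves every value as a string, while JSON keeps numbers as numbers, so the two sources usually need a type-cleaning step before they can be combined.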
Day 30
1. Learn Database Normalization – 1NF, 2NF, 3NF, 4NF, 5NF
2. Logical Data Model
Data Model
Day 31
1. Database Denormalization
2. Article TBD (I’ll be writing one shortly)
Data Model
Day 32
Read Chapters 4, 5, 6 in Kimball’s Data Warehouse Toolkit
Data Warehousing
Day 33
Agile Data Warehouse Chapters 1, 2 (and if time 3)
Data Warehousing
Day 34
1. What Is A Data Pipeline
2. ETLs, Data Pipelines, Etc
Data pipelines
Day 35
Basic Data Pipeline Project
Data pipelines
Day 36
Live QA And Pipeline Sign Up
I’ll be running a QA on the 36th day (or so), which should be the 7th of February. We can use it as a time for people to ask questions, and then I’ll attach a link to the live session afterwards.
Progress Review And QA
Day 37
At this point you may need some time to catch up. If that’s the case, then the next three days can be used for that. But if you have the time, here are some articles and videos
1. What Is Query Driven Modeling
2. What Is Change Data Capture
3. Stateful Streaming
Catch Up
Day 38
1. Airflow Is Not An ETL Tool
2. Databricks Vs Snowflake
3. Data Engineering Vocab
Catch Up
Day 39
1. Why Is Data Engineering Important
2. MongoDb Is Not For Analytics
Catch Up
Day 40
At the end of day 40, you should take a moment and review what you have learned overall (otherwise you’ll forget all of your hard work)
Write A Review
Day 41
Read How To Start Your Next Data Engineering Project
Mini Project
Day 42
1. Pick a data source (you can also find some more here and here)
2. Write out 10-15 questions you’d like to answer
3. Select 3 or 4 of those questions as the ones you’ll focus on
4. Design a basic dashboard you can build in 2-3 days based on the questions (pick a solution like Tableau, Power BI, or another easy-to-work-with dashboarding solution)
5. Pick a data storage solution to use, like Snowflake, Postgres, etc
6. Kick off your project
Mini Project
Day 43
1. Load your data into your data storage system
2. Perform a general EDA to understand what your data looks like, either with SQL or Python
3. Answer your questions from day 1 of the mini project (Day 42)
4. Write up your current progress and note down which code or SQL is actually going to be used
Mini Project
Day 44
1. You should hopefully have an idea of the data properties so you can create a basic data model and the queries required to create it
2. Create a process that automates those queries, either using Cron or some other form of scheduler
3. Create a layer that can be used for the analytics (aggregate tables, views, etc)
Mini Project
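The Day 44 steps can be sketched roughly as follows. This assumes a SQLite database with a raw table named `raw_events` (columns `id`, `event_date`, `amount`) and an aggregate table named `daily_summary`. All of these names are hypothetical, and any other database or scheduler would work just as well.

```python
import sqlite3


def refresh_daily_summary(conn):
    """Rebuild the aggregate table that a dashboard would read from.

    Assumes a hypothetical raw_events(id, event_date, amount) table already exists.
    Dropping and recreating keeps the sketch simple; incremental loads are
    a common alternative for larger tables.
    """
    conn.execute("DROP TABLE IF EXISTS daily_summary")
    conn.execute(
        """
        CREATE TABLE daily_summary AS
        SELECT event_date,
               COUNT(*)    AS event_count,
               SUM(amount) AS total_amount
        FROM raw_events
        GROUP BY event_date
        """
    )
    conn.commit()
```

To automate it, a small script that opens the database and calls `refresh_daily_summary` could be run on a schedule by Cron or another scheduler.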
Day 45
Continue with any uncompleted tasks from the past few days
Mini Project
Day 46
Run some basic data quality checks to ensure your data is accurate
Mini Project
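Basic data quality checks like the ones Day 46 asks for can be as simple as a handful of counting queries. The sketch below assumes a hypothetical SQLite table `raw_events(id, event_date, amount)`. The three rules are examples only, since the right checks depend on your own data.

```python
import sqlite3

# Each check is a query that counts offending rows; 0 means the check passed.
# Table and column names are hypothetical.
CHECKS = {
    "null_event_date": "SELECT COUNT(*) FROM raw_events WHERE event_date IS NULL",
    "negative_amount": "SELECT COUNT(*) FROM raw_events WHERE amount < 0",
    "duplicate_id": (
        "SELECT COUNT(*) FROM "
        "(SELECT id FROM raw_events GROUP BY id HAVING COUNT(*) > 1)"
    ),
}


def run_quality_checks(conn):
    """Run every check and return a dict of check name -> number of problems found."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}
```

Writing the result somewhere visible each time the pipeline runs makes it easier to notice when a check starts failing.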
Day 47
Start to create your dashboard and populate it
Mini Project
Day 48
Finish Dashboard
Mini Project
Day 49
Run some final QA and decide how you’d like to display this project (also general catch-up)
Mini Project
Day 50
Write a blog post or create a GitHub repo to share your project
Mini Project
Day 51
Video To Be Filmed By Seattle Data Guy
Tool Intro
Day 52
1. What Is Apache Spark
2. Downloading And Working With Spark
3. Quickstart Spark
Day 53
1. RDD Programming
2. Pyspark Tutorial
Day 54
Long Pyspark Tutorial
Spark
Day 55
1. Docker Intro And Setting Up Airflow
2. Docker In An Hour
Day 56
1. Airflow Intro
2. Airflow Tutorial 2 hour walk through
Day 57
1. Set up Airflow yourself on an EC2 instance
2. Set up a basic DAG that pulls data from one of these data sources (TODO)
Day 58
1. Challenges You Will Face With Airflow
2. Common Mistakes You’ll Make Setting Up Airflow
Airflow
Day 59
Take some time to review what you’ve learned thus far, or take some time off! Here are some other things you could do.
1. Write about what you’ve learned, and what you still don’t understand
2. Find a friend who you can teach some of the concepts you’ve learned (teaching is a great way to learn)
Catch Up and Review
Day 60
Same as the prior day
Catch Up and Review
Day 61
1. Intro To Databricks
2. Setting Up Databricks
3. Load Data Into Databricks
Day 62
1. Databricks Delta Table
2. Databricks Delta Table Video
Day 63
1. What Is Trino
2. Setting Up Trino
Day 64
Continue setting up Trino and working with it
Day 65
Data Governance Book
Data Governance
Day 66
Data Governance Live – Sign Up
Data Governance
Day 67
1. Creating A Data Governance Framework
2. Data Governance for Modern Organizations, Part 1
Data Governance
Day 68
1. What Is A Data Catalog
2. Data Catalog Case Study
3. Datahub Purpose And Architecture
Data Catalogs And Lineage
Day 69
1. 6 Pillars Of Data Quality
2. How And Why We Need To Implement Data Quality Now!
3. Data Quality And Examples
Data Quality
Day 70
1. Data Quality Examples With SQL
2. Data Quality With DBT
Data Quality
Day 71
Start your own project
1. Pick a data set you can pull
2. Plan out what questions you’d like to answer (list out 10-15 questions)
3. Pick 4-5 of those questions to focus on
4. Start to plan out how you’ll serve up the data/insights (dashboard, ML, application, etc)
5. Decide on some tools you’d like to use
Project Planning
Day 72
1. Set up your infrastructure, cloud components, Airflow, etc
2. Set up any database/storage system you will use
Day 73
From here, you’ll likely need to take this project on yourself. Create a project plan for the next 30 or so days. It doesn’t have to take all of the next few days, but really look at this as a time when you can learn and try out lots of ideas. For the most part you’ll take a similar approach: set up your infrastructure, load your data, analyze it, figure out what you’d like to display, etc.
Days 74-99
Run the project as you’ve planned out
Day 100
Write a blog post or create a GitHub repo to share your project
