100 Days of Data Engineering Day 1
This is a program by YouTuber Seattle Data Guy [[https://theseattledataguy.com]]. It interests me because it is a set curriculum broken into bite-sized chunks, and if I am already familiar with a day's material, I can try to get through it quickly. In any case, I am sure that going through this program and blogging my notes will make me stronger than I would have been without it.
These challenges are a little easier for me to gamify since they are broken into day-sized chunks. When going through a class, we sometimes get a bit bored, lost, or behind, and have no metrics for getting back on track. Then our brains just give up. My main yardstick is that I started this on April 1st and should be finished by the end of June.
At the very least, I will go through one day of training every day. I will do it in the morning so I don’t run out of energy or brain juice before finishing. Some days will be tough when I have responsibilities in the evenings, but I will make sure to stack those days with extra planning so I get through what I need to do.
100 Days Note
100 days is just a little over 3 months, and I don’t believe 3 months is truly sufficient to “become a data engineer”; at the very least, it feels a little fast. There is no need to rush. The real purpose of these 100 days is to get you into the habit of practicing. If afterwards you want to dig into specific subjects, do that! Don’t let these 100 days limit you.
Day | Task | Notes | Category |
Day 1 | For day one, I recommend taking the time to answer some questions and writing out your plan to commit to the next 100 days on social media or somewhere people can help keep you accountable (a Discord group, Slack, etc.): 1. What do you hope to accomplish by the end of the 100 days? 2. Are there any topics you’d like to learn that aren’t covered? | Take a moment to write your goals | |
Day 2 | 1. Downloading SQL Server And Creating Tables 2. Joins 3. Case Statements 4. Self Joins And Cross Joins | | SQL |
Day 3 | 1. SQL Interview Tips 2. Solving More Problems With SQL | | SQL |
Day 4 | 1. Partition By 2. CTE (Common Table Expression) 3. Stored Procedures | | SQL |
Day 5 | 1. Loops, Strings And Tuples 2. Functions 3. Mutability 4. Error Handling | | Programming |
Day 6 | Basic Linux Commands 1/3 | This video is quite long, so I’ve split it over the next three days, which works out to a little over 1.5 hours a day | Linux |
Day 7 | Basic Linux Commands 2/3 | | Linux |
Day 8 | Basic Linux Commands 3/3 | | Linux |
Day 9 | 1. Data Modeling Basics 2. Normalization Vs Denormalization | | Data Model |
Day 10 | Read Chapters 1, 2, 3 in Kimball’s The Data Warehouse Toolkit | | Data Model |
Day 11 | What is Data Pipeline / How to design Data Pipeline? – ETL vs Data pipeline (2023) | | Data Pipeline |
Day 12 | SQL Project Example | | SQL Deeper Dive |
Day 13 | 1. 262. Trips and Users 2. Popularity of Hack 3. Average Salaries 4. 626. Exchange Seats | | SQL Deeper Dive |
Day 14 | Use the bigquery-public-data.stackoverflow.* data set to answer some of the following questions, and come up with some of your own: 1. What percentage of Stack Overflow questions that ended with a “?” had accepted answers? 2. Are there certain programming languages that are more likely to have accepted answers? 3. Do certain programming languages have questions that get answered more quickly? 4. Do certain programming languages get more answers on average than others? | 1. What questions can you answer using this data set? 2. Are there places you can join the data set? 3. Write out 10 questions you think you can answer | SQL Deeper Dive |
Day 15 | Continue from yesterday with new questions, and come up with some of your own | | SQL Deeper Dive |
Day 16 | AWS Certificate Prep | This is a 10-hour video, so I’d recommend breaking it into 2-hour segments over the next few days. You should also take notes and share them. Another benefit: if you feel confident once you’re done with this set of videos and some of the projects, you might consider taking a certification exam | Cloud |
Day 17 | AWS Certificate Prep | | Cloud |
Day 18 | AWS Certificate Prep | | Cloud |
Day 19 | AWS Certificate Prep | | Cloud |
Day 20 | AWS Certificate Prep | | Cloud |
Day 21 | 1. GCP Intro 2. GCP and VPC 3. GCP IAM | | Cloud |
Day 22 | 1. GCP BigQuery 2. GCP Cloud Composer | | Cloud |
Day 23 | 1. Azure Vocab 2. Azure Opex Vs Capex 3. Azure Geographies And Regions 4. Azure Basic Compute Services | | Cloud |
Day 24 | 1. Azure Private Networks And VPCs 2. Azure Storage 3. Azure Big Data Services 4. Azure Serverless Computing | | Cloud |
Day 25 | 1. Data Structures And Algorithms Review Chapters 1-5 2. Introduction to Linked Lists (Data Structures & Algorithms #5) 3. Introduction to Recursion (Data Structures & Algorithms #6) | | Programming |
Day 26 | 1. Data Structures And Algorithms Review Chapters 8-11 2. Big O Notation | | Programming |
Day 27 | 1. Web Scraping 2. Reading CSVs, JSON And APIs | Go through this article, and if you have time, see if you can start a project | Programming |
Day 28 | Keeping Time, Scheduling Tasks, and Launching Programs | | Programming |
Day 29 | Programming Your Own Thing: using the prior few days’ readings, try coming up with some small mini projects. Perhaps you can automate a task such as scraping a website or hitting an API. Take your time and enjoy some free time just trying things out for yourself | | Programming |
Day 30 | 1. Learn Database Normalization – 1NF, 2NF, 3NF, 4NF, 5NF 2. Logical Data Model | | Data Model |
Day 31 | 1. Database Denormalization 2. Article TBD (I’ll be writing one shortly) | | Data Model |
Day 32 | Read Chapters 4, 5, 6 in Kimball’s The Data Warehouse Toolkit | | Data Warehousing |
Day 33 | Agile Data Warehouse, Chapters 1, 2 (and 3 if time allows) | | Data Warehousing |
Day 34 | 1. What Is A Data Pipeline 2. ETLs, Data Pipelines, Etc | | Data Pipelines |
Day 35 | Basic Data Pipeline Project | | Data Pipelines |
Day 36 | Live QA And Pipeline Sign Up | I’ll be running a Q&A on the 36th day (or so), which should be the 7th of February. We can use it as a time for people to ask questions, and I’ll attach a link to the live session afterwards | Progress Review And QA |
Day 37 | At this point you may need some time to catch up. If that’s the case, use the next three days for that. If you have the time, here are some articles and videos: 1. What Is Query Driven Modeling 2. What Is Change Data Capture 3. Stateful Streaming | | Catch Up |
Day 38 | 1. Airflow Is Not An ETL Tool 2. Databricks Vs Snowflake 3. Data Engineering Vocab | | Catch Up |
Day 39 | 1. Why Is Data Engineering Important 2. MongoDB Is Not For Analytics | | Catch Up |
Day 40 | At the end of day 40, take a moment to review what you have learned overall (otherwise you’ll forget all of your hard work) | | Write A Review |
Day 41 | Read How To Start Your Next Data Engineering Project | | Mini Project |
Day 42 | 1. Pick a data source (also you can find some more here and here) 2. Write out 10-15 questions you’d like to answer 3. Select 3 or 4 of those questions as the ones you’ll focus on 4. Design a basic dashboard you can build in 2-3 days based on the questions (pick a solution like Tableau, Power BI, or another easy-to-work-with dashboarding solution) 5. Pick a data storage solution to use, like Snowflake, Postgres, etc. 6. Kick off your project | | Mini Project |
Day 43 | 1. Load your data into your data storage system 2. Perform a general EDA to understand what your data looks like, either with SQL or Python 3. Answer your questions from the first day of the project 4. Write up your current progress and note down which code or SQL is actually going to be used | | Mini Project |
Day 44 | 1. You should hopefully have an idea of the data properties, so you can create a basic data model and the queries required to create it 2. Create a process that automates those queries, using cron or some other scheduler 3. Create a layer that can be used for analytics (aggregate tables, views, etc.) | | Mini Project |
Day 45 | Continue with any uncompleted tasks from the past few days | | Mini Project |
Day 46 | Run some basic data quality checks to ensure your data is accurate | | Mini Project |
Day 47 | Start to create your dashboard and populate it | | Mini Project |
Day 48 | Finish Dashboard | | Mini Project |
Day 49 | Run some final QA and decide how you’d like to display this project (also general catch-up) | | Mini Project |
Day 50 | Write a blog post or create a GitHub repo to share your project | | Mini Project |
Day 51 | Video To Be Filmed By Seattle Data Guy | | Tool Intro |
Day 52 | 1. What Is Apache Spark 2. Downloading And Working With Spark 3. Quickstart Spark | | Spark |
Day 53 | 1. RDD Programming 2. PySpark Tutorial | | Spark |
Day 54 | Long PySpark Tutorial | | Spark |
Day 55 | 1. Docker Intro And Setting Up Airflow 2. Docker In An Hour | | Docker |
Day 56 | 1. Airflow Intro 2. Airflow Tutorial (2-hour walk-through) | | Airflow |
Day 57 | 1. Set up Airflow yourself on an EC2 instance 2. Set up a basic DAG that pulls data from one of these data sources (TODO) | | Airflow |
Day 58 | 1. Challenges You Will Face With Airflow 2. Common Mistakes You’ll Make Setting Up Airflow | | Airflow |
Day 59 | Take some time to review what you’ve learned thus far, or take some time off! Here are some other things you could do: 1. Write about what you’ve learned and what you still don’t understand 2. Find a friend you can teach some of the concepts you’ve learned (teaching is a great way to learn) | | Catch Up and Review |
Day 60 | Same as the prior day | | Catch Up and Review |
Day 61 | 1. Intro To Databricks 2. Setting Up Databricks 3. Load Data Into Databricks | | Databricks |
Day 62 | 1. Databricks Delta Table 2. Databricks Delta Table Video | | Databricks |
Day 63 | 1. What Is Trino 2. Setting Up Trino | | Trino/Presto |
Day 64 | Continue setting up Trino and working with it | | Trino/Presto |
Day 65 | Data Governance Book | | Data Governance |
Day 66 | Data Governance Live – Sign Up | | Data Governance |
Day 67 | 1. Creating A Data Governance Framework 2. Data Governance for Modern Organizations, Part 1 | | Data Governance |
Day 68 | 1. What Is A Data Catalog 2. Data Catalog Case Study 3. DataHub Purpose And Architecture | | Data Catalogs And Lineage |
Day 69 | 1. 6 Pillars Of Data Quality 2. How And Why We Need To Implement Data Quality Now! 3. Data Quality And Examples | | Data Quality |
Day 70 | 1. Data Quality Examples With SQL 2. Data Quality With dbt | | Data Quality |
Day 71 | Start your own project: 1. Pick a data set you can pull 2. Plan out what questions you’d like to answer (list out 10-15 questions) 3. Pick 4-5 of those questions to focus on 4. Start to plan out how you’ll serve up the data/insights (dashboard, ML, application, etc.) 5. Decide on some tools you’d like to use | | Project Planning |
Day 72 | 1. Set up your infrastructure: cloud components, Airflow, etc. 2. Set up any database/storage system you will use | | |
Day 73 | From here, you’ll likely need to take this project on yourself. Create a project plan for the next 30 or so days. It doesn’t have to take up all of the next few days, but really look at this as a time when you can learn and try out lots of ideas. For the most part, you’ll take a similar approach: set up your infrastructure, load your data, analyze it, figure out what you’d like to display, etc. | | |
Day 74 | Run the project as you’ve planned out | ||
Day 75 | |||
Day 76 | |||
Day 77 | |||
Day 78 | |||
Day 79 | |||
Day 80 | |||
Day 81 | |||
Day 82 | |||
Day 83 | |||
Day 84 | |||
Day 85 | |||
Day 86 | |||
Day 87 | |||
Day 88 | |||
Day 89 | |||
Day 90 | |||
Day 91 | |||
Day 92 | |||
Day 93 | |||
Day 94 | |||
Day 95 | |||
Day 96 | |||
Day 97 | |||
Day 98 | |||
Day 99 | |||
Day 100 | Write a blog post or create a GitHub repo to share your project | | |
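To get a feel for Day 14 ahead of time, here is a minimal sketch of how I might tackle question 1 (what percentage of questions whose title ends in “?” have an accepted answer). It is not BigQuery: it runs the query against a throwaway in-memory SQLite table filled with made-up rows. The table name `posts_questions` and the columns `title` and `accepted_answer_id` are my assumption about the `bigquery-public-data.stackoverflow` schema, and `pct_accepted_for_question_titles` is a hypothetical helper name, so double-check the real schema before pointing anything at it.

```python
import sqlite3

# Made-up sample rows standing in for the BigQuery table
# bigquery-public-data.stackoverflow.posts_questions.
# ASSUMPTION: that table has `title` and `accepted_answer_id` columns.
SAMPLE_QUESTIONS = [
    ("How do I reverse a list in Python?", 101),
    ("Why is my JOIN returning duplicates?", None),
    ("Segfault when freeing memory", 102),
    ("What does PARTITION BY do?", 103),
]

# CASE and LIKE are plain SQL, so the logic should carry over to BigQuery.
# The LIKE pattern is passed as a parameter (the single `?` placeholder).
QUERY = """
SELECT
  100.0 * SUM(CASE WHEN accepted_answer_id IS NOT NULL THEN 1 ELSE 0 END)
        / COUNT(*) AS pct_accepted
FROM posts_questions
WHERE title LIKE ?
"""


def pct_accepted_for_question_titles(rows):
    """Return the % of '?'-terminated titles with an accepted answer.

    Returns None when no title ends in '?' (SQLite yields NULL when
    dividing by a zero count).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE posts_questions (title TEXT, accepted_answer_id INTEGER)"
        )
        conn.executemany("INSERT INTO posts_questions VALUES (?, ?)", rows)
        (pct,) = conn.execute(QUERY, ("%?",)).fetchone()
        return pct
    finally:
        conn.close()


if __name__ == "__main__":
    print(pct_accepted_for_question_titles(SAMPLE_QUESTIONS))
```

Three of the sample titles end in “?” and two of those have an accepted answer, so this should print roughly 66.7. I stuck to CASE and LIKE rather than BigQuery-only functions so the same query should carry over with the table name fully qualified and the pattern written inline as `'%?'`. One difference to watch: when no titles match, SQLite quietly returns NULL for the division by zero, while BigQuery raises an error unless you use `SAFE_DIVIDE`.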