I closed out last year with something I've been trying to finish ever since Zach Wilson put up his Community Data Engineering Bootcamp over at DataExpert.io. And boy oh boy, was it a roller coaster. The course really made me question why I chose to transition to data engineering as a career, and I'm glad it did. My work mostly revolves around databases and I use SQL every day, so I thought, how hard could it be? I was so wrong.
Here is an overview of the technical things I learned over the course of the Bootcamp:
📊 Dimensional & Fact Data Modeling : Learned how to construct dimensional and fact data models using structs/arrays, how to design data models that handle Slowly Changing Dimensions, and how to compress data with techniques like date lists.
✨Apache Spark Fundamentals : Learned Apache Spark's architecture, types of joins, shuffle partitions and parallelism.
📈Applying Analytical Patterns : Learned about growth accounting, survivorship analysis, advanced SQL techniques (window functions, grouping sets, etc.), and state change tracking (New, Retired, Continued Playing, Returned from Retirement).
⌛Real-Time Pipelines with Flink and Kafka : Learned about streaming, near real-time and real-time pipelines, Flink window types (count, tumbling, sliding, session), and watermarking.
🎑Data Visualization and Impact : Learned how to build effective dashboards and aggregate data properly so that users can extract meaningful business insights.
🧑🏽‍🔧Data Pipeline Maintenance : Learned about common pipeline issues, technical debt management, and data migration strategies. Besides the technical aspects, I also learned how to make effective decisions when other stakeholders are involved, because user requirements are often not straightforward.
🔬KPIs & Experimentation : Learned how to design and measure data initiatives like a product manager — from experimentation frameworks to metrics that drive real business decisions.
📚Data Quality Patterns : Mastered techniques to ensure data is accurate, complete, and consistent — which is crucial for obtaining reliable results.
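To make the state change tracking idea from the analytical patterns item concrete, here is a minimal Python sketch. It is an illustration under my own assumptions, not the bootcamp's code: the function names `classify_state` and `track_states` are hypothetical, the input is simply a set of seasons in which a player was active, and the extra "Stayed Retired" label is an assumption added to cover the case the four listed states leave out.

```python
from typing import Dict, Optional, Set


def classify_state(seen_before: bool, active_last: bool, active_now: bool) -> Optional[str]:
    """Label one season by comparing last season's activity with this season's."""
    if not seen_before:
        # A player with no history only gets a state once they first appear.
        return "New" if active_now else None
    if active_last and active_now:
        return "Continued Playing"
    if active_last and not active_now:
        return "Retired"
    if not active_last and active_now:
        return "Returned from Retirement"
    # Assumption: inactive in both seasons; this label is not in the list above.
    return "Stayed Retired"


def track_states(active_seasons: Set[int], first: int, last: int) -> Dict[int, str]:
    """Walk the seasons in order and record the state for each one."""
    states: Dict[int, str] = {}
    seen_before = False
    active_last = False
    for season in range(first, last + 1):
        active_now = season in active_seasons
        state = classify_state(seen_before, active_last, active_now)
        if state is not None:
            states[season] = state
        seen_before = seen_before or active_now
        active_last = active_now
    return states


if __name__ == "__main__":
    # Hypothetical player who was active in 2001, 2002, and 2005.
    for season, state in track_states({2001, 2002, 2005}, 2000, 2006).items():
        print(season, state)
```

In SQL the same comparison is typically written by joining each period to the previous one (or using a window function such as `LAG`) and applying a `CASE` expression; the loop here just makes the rules easy to read.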
Currently, I'm working on a capstone project that brings all of these concepts together into one end-to-end data engineering solution.