It’s an “A” again! 🎉
I just wrapped up a Delta Live Tables project—building a Bronze → Silver → Gold pipeline for Bitcoin trades.
One big takeaway: the quarantine pattern is a lifesaver. Instead of bad rows silently disappearing, they’re routed to a dedicated table where you can actually inspect and fix them.
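For anyone new to the pattern, here is a rough sketch of the idea in plain Python. It is not the DLT code from the project; the rule names, field names, and sample rows are made up for illustration.

```python
# Hypothetical quality rules: rule name -> check over one trade record (a dict).
RULES = {
    "valid_price": lambda t: isinstance(t.get("price"), (int, float)) and t["price"] > 0,
    "valid_pair": lambda t: bool(t.get("pair")),
}


def route(trades):
    """Split trades into (clean, quarantined) so that no row is dropped."""
    clean, quarantined = [], []
    for trade in trades:
        failed = [name for name, check in RULES.items() if not check(trade)]
        if failed:
            # Keep the bad row and record why it failed, so it can be inspected later.
            quarantined.append({**trade, "failed_rules": failed})
        else:
            clean.append(trade)
    return clean, quarantined


# Made-up sample rows: one good, one with a missing price, one with an empty pair.
sample = [
    {"pair": "BTC-USD", "price": 64000.5},
    {"pair": "BTC-USD", "price": None},
    {"pair": "", "price": 64010.0},
]
clean, quarantined = route(sample)
```

Every input row ends up in exactly one of the two lists, which is the whole point: bad data stays visible instead of vanishing.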
This project was also a great reminder that Streaming + DLT + Unity Catalog don’t always fail where you expect.
A few things I genuinely wrestled with:
Config vs. Reality:
The config defaults pointed to the wrong setup (stocks instead of crypto). Tightening the schema, like enforcing numeric prices and correctly mapping pair vs. symbol, mattered just as much as the overall architecture.
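A minimal sketch of that kind of tightening, in plain Python. The raw field names ("pair", "sym", "p") are assumptions for illustration, not taken from the project.

```python
import math


def normalize_trade(raw: dict) -> dict:
    """Map a raw record to a strict shape: one symbol column, one numeric price."""
    # Assumption: crypto records carry "pair", stock-style records carry "sym".
    symbol = raw.get("pair") or raw.get("sym")
    try:
        price = float(raw["p"])
    except (KeyError, TypeError, ValueError):
        price = None  # leave it empty so the quality rules can catch it
    if price is not None and not math.isfinite(price):
        price = None  # treat NaN and infinity as missing
    return {"symbol": symbol, "price": price}


good = normalize_trade({"pair": "BTC-USD", "p": "64000.5"})
bad = normalize_trade({"sym": "AAPL", "p": "n/a"})
```

Unparseable prices become None rather than raising, so a single malformed record cannot stop the run and still shows up as a failed check later.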
Data Location:
The pipeline reads JSON from cloud storage, not a notebook WebSocket. I had to align the polygon_raw_path with a real Unity Catalog Volume and load sample JSON before anything downstream would even run.
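Unity Catalog Volume paths follow the shape /Volumes/<catalog>/<schema>/<volume>/<path>, so a quick sanity check on the configured value can save a failed run. A small sketch, where the catalog, schema, and volume names are hypothetical:

```python
from pathlib import PurePosixPath


def parse_volume_path(path: str) -> dict:
    """Split a Unity Catalog Volume path into its parts, or raise ValueError."""
    parts = PurePosixPath(path).parts
    # Expected shape: ('/', 'Volumes', catalog, schema, volume, *rest)
    if len(parts) < 5 or parts[0] != "/" or parts[1] != "Volumes":
        raise ValueError(f"not a Unity Catalog Volume path: {path!r}")
    return {
        "catalog": parts[2],
        "schema": parts[3],
        "volume": parts[4],
        "subpath": "/".join(parts[5:]),
    }


# Hypothetical value for polygon_raw_path.
info = parse_volume_path("/Volumes/main/crypto/raw/polygon/trades")
```

This only checks the shape of the string. Whether the volume exists and actually contains JSON files still has to be verified in the workspace.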
DLT Graph Quirks:
Saw “dataset could not be resolved” more than once. Learned to be precise with dlt.read (a complete read of the dataset) vs. dlt.read_stream (an incremental streaming read), keep the enriched layer as a @dlt.view, and apply expectations at the Silver layer so the quarantine logic stays honest.
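One way to keep the quarantine logic honest is to derive its filter from the same rule set the Silver expectations use, so the two can never drift apart. A sketch under assumptions: the rule names and SQL expressions below are hypothetical, and only the string-building is shown, not the pipeline itself.

```python
# Hypothetical Silver-layer rules: expectation name -> SQL boolean expression.
# Each rule is written to be null-safe, so it evaluates to true or false, never NULL.
SILVER_RULES = {
    "valid_price": "price IS NOT NULL AND price > 0",
    "valid_pair": "pair IS NOT NULL",
}


def quarantine_predicate(rules: dict) -> str:
    """Build the SQL filter matching rows that fail at least one rule."""
    passes_all = " AND ".join(f"({expr})" for expr in rules.values())
    return f"NOT ({passes_all})"


predicate = quarantine_predicate(SILVER_RULES)
```

In a DLT pipeline, the dict would typically go to dlt.expect_all_or_drop on the Silver table and the derived predicate to a filter on the quarantine table. Changing a rule in one place then updates both sides.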
Repo Hygiene:
Hard lesson: stick to one source of truth. Duplicate folders meant Databricks wasn’t always running the code I thought it was.
End result: a fully green DLT pipeline ✅ and a much stronger understanding of how Medallion Architecture + quarantine works in practice.
Pro tip: When debugging Databricks streaming, read the error on the first failing dataset. It’s usually upstream—not the table you’re staring at.
Big thanks to Eumar Dias de Assis for the guidance on building end-to-end pipelines and wonderful sessions, and to Zach Wilson for the bootcamp—excited for the upcoming AI bootcamp 🚀
#DataEngineering #Databricks #DeltaLiveTables #MedallionArchitecture #UnityCatalog