Data Engineer

About Me

I am a Data Engineer specialising in Azure, Databricks, SQL Server, and Microsoft Fabric. This blog documents real-world solutions, interview preparation, and production-grade data engineering practices.

Topics include:

Azure Data Factory & Databricks
Microsoft Fabric & Lakehouse
SQL & Data Warehousing
Real-time & Batch Data Pipelines

For collaboration, queries, or professional discussions:

Email: connect@themahesh.org

Exploring the Largest UK Employers: A Power BI Visualization

Understanding employment distribution among top companies can provide valuable insights into industry dominance and workforce trends. In this blog, I analyze the largest employers in the UK using a Power BI table visualization, sourced from CompaniesMarketCap.

Source: CompaniesMarketCap

Key Insights from the Data:

Compass Group leads the ranking with 550,000 employees, dominating the food service industry.
Tesco, the retail giant, follows with 330,000 employees.
HSBC, a major player in banking, employs over 215,000 people.
The total workforce among the top companies surpasses 1.98 million employees.

Visualizing in Power BI:

Using a table visualization, we can clearly compare the number of employees across different companies. Power BI’s sorting, aggregation, and filtering features enhance data readability and analysis. However, incorporating bar charts, conditional formatting, and KPIs could make the insights even more compelling.

What’s Next?

Would you add more interactive eleme...

Master Databricks Asset Bundles Through Hands-On Practice

15 min read | 100% Practical Guide

Forget theory. Forget abstract examples. This is a hands-on, build-as-you-learn guide to mastering YAML through the lens of Databricks Asset Bundles (DABs). By the end of this post, you'll go from never writing YAML to confidently deploying production-grade data pipelines as code.

🎯 What You'll Build: A complete Databricks workspace configuration including jobs, clusters, notebooks, and permissions, all defined in YAML and deployable with a single command.

Level 0: YAML Basics (BEGINNER)

The Golden Rules

Rule #1: YAML uses spaces for indentation, never tabs. Standard is 2 spaces per level.
Rule #2: YAML is case-sensitive. Name ≠ name
Rule #3: Indentation = Structure. It defines parent-child relationships.
...

6 Common Databricks Mistakes Data Engineers Make (with Practical Fixes)

While working with Databricks, I noticed something important: most production issues don’t come from Spark itself; they come from how Databricks is used. The good news is that many of these problems are preventable with a few solid engineering practices.

TL;DR

Use job clusters for scheduled workloads
Set aggressive auto-termination
Use Delta Lake properly (ACID, schema evolution, time travel)
Get file sizing + partitioning right
Build idempotent pipelines with retries
Add monitoring + alerting so systems catch failures early

The 6 mistakes (and practical fixes)

Below are the mistakes I see most often in Databricks projects, especially when teams move from development to production.

Mistake 1: Using all-purpose clusters for scheduled jobs

Why it hurts: All-purpose clusters are great for interactive exploration, but scheduled jobs on them often lead to noisy ...
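To make the TL;DR item "Build idempotent pipelines with retries" concrete, here is a minimal sketch in plain Python. It is not taken from the post: `run_with_retries`, `write_batch`, and `flaky_load` are hypothetical names, and an in-memory dict stands in for a real table. The idea it illustrates is that a write keyed by batch ID can be replayed safely, so a retry after a partial failure does not duplicate data.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=0.0):
    """Call task() until it succeeds or attempts run out.

    Waits base_delay * 2**(attempt - 1) seconds between tries
    (exponential backoff). Hypothetical helper, not a Databricks API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * 2 ** (attempt - 1))


def write_batch(target, batch_id, rows):
    """Idempotent write: keyed by batch_id, so a replay overwrites
    the same slot instead of appending duplicate rows."""
    target[batch_id] = list(rows)


# Demo: the load fails twice *after* writing, then succeeds.
target = {}  # stand-in for a real table (assumption)
calls = {"n": 0}


def flaky_load():
    calls["n"] += 1
    write_batch(target, "2024-01-01", [1, 2, 3])
    if calls["n"] < 3:
        raise RuntimeError("transient failure after write")
    return calls["n"]


attempts_used = run_with_retries(flaky_load, max_attempts=5)
```

Because the write overwrites by key, the three attempts leave exactly one copy of the batch in `target`. An append-style write would have left three.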