
How AI on the Cloud Is Changing Everything | Narendra Mangala | TEDxGaya College of Engineering
Imagine a CFO waking up to an automated report summarizing 400 million rows of sales data, flagging risks, and providing clear bullet points, all generated by an AI-powered system overnight. This isn't science fiction; it's the present reality, and my five-year research (2021-2026) has focused on building this infrastructure. My name is Narendra Mangala, and I'll share the journey of data from dusty servers to decision-making tables, with AI on the cloud as the invisible engine.
Fifteen years ago, data in large enterprises was siloed. Understanding the business holistically was a multi-day task for analysts, hampered by slow databases and brittle ETL (Extract, Transform, Load) pipelines. ETL was the backbone, yet fragile: a single upstream schema change could collapse an entire pipeline.
Then came the cloud. Initially met with skepticism regarding security and control, platforms like Azure and AWS grew rapidly, offering not just storage but a new philosophy: design infrastructure for data first, then scale to match. This was a revolution I was part of.
In 2021, my research addressed the challenge of messy, raw, untrustworthy cloud data. The solution was architecture, specifically the Medallion architecture, a three-layer system like an oil refinery. The Bronze layer receives raw, unfiltered data. The Silver layer cleanses and structures it, correcting data types and applying business rules. The Gold layer houses aggregated, modeled, and semantically enriched business insights, feeding dashboards, reports, and machine learning models. My 2021 research showed that adopting this architecture on Azure Data Lake reduced data processing failures by 40% and accelerated time to insights.
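The three refinery layers can be illustrated with a minimal sketch. This uses plain Python in place of a real lakehouse engine, and the table shape and business rules are hypothetical, not taken from the talk:

```python
# Minimal, illustrative sketch of the Medallion (Bronze/Silver/Gold) pattern.
# Plain Python stands in for a distributed engine; rules are hypothetical.

def bronze_ingest(raw_records):
    """Bronze: land raw data exactly as received, no filtering."""
    return list(raw_records)

def silver_clean(bronze):
    """Silver: correct data types, drop invalid rows, apply business rules."""
    cleaned = []
    for rec in bronze:
        try:
            amount = float(rec["amount"])          # correct the data type
        except (KeyError, TypeError, ValueError):
            continue                               # quarantine corrupt rows
        if amount < 0:                             # business rule: no negatives
            continue
        cleaned.append({"region": rec.get("region", "UNKNOWN").upper(),
                        "amount": amount})
    return cleaned

def gold_aggregate(silver):
    """Gold: aggregated, business-ready insight (sales total per region)."""
    totals = {}
    for rec in silver:
        totals[rec["region"]] = totals.get(rec["region"], 0.0) + rec["amount"]
    return totals

raw = [{"region": "east", "amount": "100.5"},
       {"region": "east", "amount": "bad"},       # stays raw in Bronze only
       {"region": "west", "amount": 20}]
gold = gold_aggregate(silver_clean(bronze_ingest(raw)))
print(gold)  # {'EAST': 100.5, 'WEST': 20.0}
```

The point of the layering is that corrupt rows are retained in Bronze for audit but never reach Gold, which is why downstream dashboards and models can trust it.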
The next question was speed. My 2021 research benchmarked PySpark against Scala for distributed data transformations, revealing that language choice significantly impacts pipeline runtime. These findings now guide enterprises in selecting processing frameworks based on measurable speed for specific workloads.
Trust in data extends beyond accuracy to governance. In 2022, my research shifted to this, leveraging Databricks Unity Catalog as a centralized governance layer. It catalogs, tags, and classifies every data asset, defines granular access controls, and automates PII masking or encryption. My research addressed managing governance across 50 business units, proposing a federated model: local autonomy with centralized oversight.
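The automated-masking idea can be sketched as follows. This is a plain-Python illustration of policy-driven masking in the spirit of what a governance layer automates; the tag names and policies are hypothetical, and this is not the Unity Catalog API:

```python
# Illustrative sketch of tag-driven PII masking. Column tags and policies
# are hypothetical; a real governance layer enforces these centrally.
import hashlib

PII_POLICIES = {"email": "mask_hash", "ssn": "redact"}   # column -> policy

def apply_policies(row):
    masked = {}
    for col, value in row.items():
        policy = PII_POLICIES.get(col)
        if policy == "redact":
            masked[col] = "***"
        elif policy == "mask_hash":
            # deterministic hash keeps joinability without exposing the value
            masked[col] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            masked[col] = value                    # non-PII passes through
    return masked

safe = apply_policies({"email": "a@b.com", "ssn": "123-45-6789", "amount": 10})
print(safe["ssn"], safe["amount"])  # *** 10
```

In a federated model, the policy table would be owned centrally while each of the 50 business units manages its own assets under it.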
By 2023, AI's entry made governance an AI safety requirement. Biased, ungoverned data fed to ML models leads to confidently wrong answers at scale. My 2023 research focused on automating PII compliance using Unity Catalog in AI-driven ecosystems.
Late 2022 saw ChatGPT's release, bringing AI to every boardroom. For data engineers, this meant excitement for AI's capabilities (anomaly detection, SQL generation, data summarization) but also anxiety over the dramatically different infrastructure AI needs. ML models require vast amounts of clean, structured, versioned data. My 2023 research explored ML Ops-ready pipelines, where data pipelines are designed for training ML models at scale. The Gold layer evolved into a feature store, a repository of pre-engineered attributes for ML models. My 2023 paper documented building Gold layer semantic datasets for ML model training on Databricks.
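The feature-store idea described above can be sketched simply: attributes are engineered once in the Gold layer, then served identically to training and inference. The feature names and logic here are illustrative assumptions, not from the paper:

```python
# Hypothetical sketch of a Gold-layer feature store: pre-engineered
# attributes computed once and reused by every ML model.
from statistics import mean

def build_features(orders_by_customer):
    """Turn raw order history into reusable, consistent model features."""
    store = {}
    for customer, amounts in orders_by_customer.items():
        store[customer] = {
            "order_count": len(amounts),
            "avg_order_value": round(mean(amounts), 2),
            "max_order_value": max(amounts),
        }
    return store

features = build_features({"c1": [10.0, 30.0], "c2": [5.0]})
print(features["c1"])
# {'order_count': 2, 'avg_order_value': 20.0, 'max_order_value': 30.0}
```

Because both training and serving read the same store, the model never sees a feature computed two different ways.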
In 2024, we asked whether AI could build pipelines itself. My research on prompt-driven data transformations tested leading language models such as GPT-4 on generating ELT pipeline code from plain English. For straightforward transformations, LLM-generated code was production-ready with minimal editing. For complex logic, human expertise remained crucial; AI acted as a powerful accelerator, not a replacement.
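The shape of a prompt-driven transformation looks roughly like this. The spec, prompt wording, and the sample output are illustrative assumptions; no real model client is called here:

```python
# Illustrative shape of a prompt-driven ELT request. The model call itself is
# omitted; GENERATED_SQL shows the kind of output a straightforward spec
# tends to produce -- near production-ready for simple transforms.
SPEC = "From table sales, keep rows where amount > 0 and sum amount by region."

PROMPT = f"""You are a data engineer. Generate a single SQL query.
Requirement: {SPEC}
Return only SQL."""

GENERATED_SQL = """
SELECT region, SUM(amount) AS total_amount
FROM sales
WHERE amount > 0
GROUP BY region
"""
print(GENERATED_SQL.strip())
```

For a spec this simple the output needs little review; for multi-step logic with edge cases, a human still has to validate and often rewrite it, which is the "accelerator, not replacement" finding.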
Also in 2024, I compared Azure Databricks and Microsoft Fabric for AI-ready enterprise data architectures. Databricks excelled in raw compute, ML flexibility, and engineering depth, while Fabric shone in unified governance, end-to-end integration, and lower TCO for mid-sized enterprises. The key insight is choosing the platform best suited for specific data maturity, team skills, and strategic AI ambitions.
My 2025 research focused on Microsoft Fabric's OneLake, a single logical data lake serving both AI and BI workloads without duplication or synchronization complexity. This architecture removes a major source of data silos.
Also in 2025, my research ventured into agentic data pipelines, where AI agents autonomously make decisions about data flow, transformation, and routing. These agents monitor data quality in real time, investigating anomalies, identifying root causes, and either fixing issues or escalating with detailed diagnoses. My paper documented the first enterprise-scale implementation, showing a 60% drop in mean time to discovery of pipeline issues, thanks to AI's relentless monitoring. Alongside this, I researched real-time feature engineering for streaming AI workloads, enabling AI systems to make decisions on data seconds old, crucial for applications like fraud detection.
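The observe-diagnose-act loop of such an agent can be sketched as below. The metric (batch row counts), thresholds, and remediation rule are hypothetical, not from the paper:

```python
# Simplified sketch of an "agentic" quality monitor: observe a metric,
# flag anomalies, auto-remediate known failure signatures, else escalate
# with a diagnosis. Thresholds and remediations are hypothetical.

def monitor(batch_row_counts, expected=1000, tolerance=0.2):
    """Return an (action, diagnosis) decision for each batch."""
    decisions = []
    for i, count in enumerate(batch_row_counts):
        deviation = abs(count - expected) / expected
        if deviation <= tolerance:
            decisions.append(("ok", None))
        elif count == 0:
            # known failure signature: empty batch -> retry upstream extract
            decisions.append(("auto_fix", f"batch {i}: empty load, retrying extract"))
        else:
            decisions.append(("escalate",
                              f"batch {i}: volume off by {deviation:.0%}, needs review"))
    return decisions

print(monitor([1010, 0, 400]))
```

Running continuously, a loop like this is what shortens time-to-discovery: the anomaly is noticed on the batch where it happens, not when a human opens a dashboard.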
Finally, in 2026, my research addressed responsible AI data architecture, embedding GDPR and PII compliance into ML Ops pipelines. Compliance cannot be an afterthought; it must be an architectural principle from the Bronze layer to model training. PII tagging, differential privacy, audit trails, and consent management are entry requirements for regulated industries.
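One way compliance becomes an architectural principle rather than an afterthought is tag lineage: PII tags attached at Bronze ingestion travel with every derived column. The lineage model and tag names below are illustrative assumptions:

```python
# Sketch of compliance-as-architecture: PII tags set at Bronze ingestion
# are inherited by derived columns, so any training set can be audited.
# Tag names and the lineage model are illustrative.

def derive(new_col, source_cols, tags):
    """Register a derived column, inheriting PII tags from its sources."""
    inherited = set()
    for col in source_cols:
        inherited |= tags.get(col, set())
    tags[new_col] = inherited
    return tags

tags = {"email": {"PII"}, "amount": set()}        # tagged at Bronze ingestion
derive("email_domain", ["email"], tags)           # Silver-layer derivation
print(tags["email_domain"])  # {'PII'} -> masked or excluded before training
```

Because the tag rides the lineage automatically, an ML Ops pipeline can refuse to train on any untagged-but-derived PII, which is the "from Bronze to model training" requirement in practice.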
The CFO's automated report in 2026 is the culmination of this journey: data moving from dusty servers to the decision-making table, carried by AI on the cloud, the invisible engine built layer by layer over these five years.