ANKIT ARORA

Dubai, UAE

Summary

🧠 Innovative and results-driven Data Architect & Engineering Leader with 13+ years of experience building scalable, cloud-native data platforms and intelligent analytics systems.
🚀 Proven track record in designing high-throughput event ingestion architectures, real-time and batch pipelines, and AI-enabled data marts that improve business agility and optimize data pipelines, delivering significant time and cost savings.

👨‍💻 Experienced Data Engineer excelling in transforming complex datasets into actionable insights. 🤝 Known for a collaborative approach and 🧩 problem-solving mindset, driving impactful improvements in data infrastructure.
🔄 Adept at translating business goals into robust data solutions using modern technologies such as AWS ☁️, Spark ⚡, Trino 🔍, Kafka 📨, and Kubernetes 🐳.
💸 Recognized for driving efficiency through metadata governance, and machine learning–ready feature stores 🧬.
📐 Known for leading strategic design initiatives, authoring architecture blueprints, and enabling enterprise-wide data-driven decision-making.


Overview

14 years of professional experience
4 years of post-secondary education

Work history

Lead Data Engineer

Careem
Dubai, UAE
06.2022 - 06.2025


🗃️ Careem Data Platform
🔧 Led end-to-end data warehouse delivery initiatives, working closely with cross-functional teams including product managers, analysts, and engineers.
⚙️ Spearheaded the development of a versatile Spark job orchestration framework that submits SQL jobs via JSON, enabling dynamic configuration, optimization, and reuse.
⏱️ Designed and deployed both real-time and batch ingestion mechanisms, improving data freshness, availability, and agility for analytical teams.
🧠 Built an AI-first Data Mart framework that enables users to run natural language queries in real-time through MCP AI server and other LLM-backed tools—translating business questions into dynamic SQL execution.
🤖 Ingested and operationalized inferred data from ChatGPT and similar AI tools, integrating it seamlessly into Careem’s pipeline ecosystem for enhanced decision intelligence.
📏 Pioneered a comprehensive data quality framework leveraging Great Expectations and OpenMetadata—standardizing both internal and external validation layers.
🪄 Orchestrated master data pipelines using Apache Airflow with common DAG templates, facilitating parallelism and dependency-based data mart builds.
☁️ Provisioned and configured scalable AWS EMR clusters via Terraform, optimizing for compute efficiency, autoscaling, and cost tracking.
🧾 Integrated Hive metadata layers into EMR, Trino, and Presto environments—seamlessly connecting cataloged datasets across platforms.
💸 Achieved major cost savings by refactoring SQL logic, reducing EMR runtimes, tagging resources, and enforcing cloud governance rules.
🚀 Championed CI/CD automation with GitHub Actions, enhancing deployment agility and integrity across production pipelines.
🐳 Migrated Spark workloads to Kubernetes, enabling fine-tuned resource isolation and observability.
📊 Built and launched a Superset-based data reporting interface, drastically improving query performance and data exploration for analysts.
📉 Integrated a low-latency Druid metrics layer for real-time dashboards, supporting high-concurrency access patterns and business-critical KPIs.
🧩 Delivered a Customer 360° Data Mart and a hybrid Feature Store (offline — Hive/Trino; online — Redis/Kafka), enabling sub-10 ms feature retrievals for ML models and experiment pipelines.
🪣 Architected a robust S3-backed data lake, consolidating raw system logs and structured data feeds into a unified schema-driven platform.
🧭 Maintained a centralized metadata and lineage management system, streamlining data discoverability and governance for end users.
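A minimal sketch of the JSON-driven, dependency-aware job orchestration pattern described above. Jobs are declared in JSON with optional dependencies and the planner returns a safe execution order; job names and config keys are illustrative, not the actual framework, and the real runner would hand each step's SQL to Spark.

```python
# Hypothetical sketch: JSON job specs with dependency-ordered execution planning.
# In the real framework each planned step's SQL would be passed to spark.sql();
# here we only show the declarative config and the ordering logic.
import json
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def plan_jobs(config_json: str) -> list[str]:
    """Return job names in an order that respects declared dependencies."""
    config = json.loads(config_json)
    # Map each job to the set of jobs it depends on (its predecessors).
    graph = {job["name"]: set(job.get("depends_on", [])) for job in config["jobs"]}
    return list(TopologicalSorter(graph).static_order())

spec = json.dumps({
    "jobs": [
        {"name": "customer_mart", "depends_on": ["raw_events"], "sql": "SELECT 1"},
        {"name": "raw_events", "sql": "SELECT 1"},
        {"name": "kpi_summary", "depends_on": ["customer_mart"], "sql": "SELECT 1"},
    ]
})
order = plan_jobs(spec)  # raw_events precedes customer_mart, which precedes kpi_summary
```

Declaring jobs as data rather than code is what enables the "dynamic configuration, optimization, and reuse" called out above: the same runner serves every mart build.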


🗃️ Event Platform & Feature Store
⚡ Designed and implemented an enterprise-scale Event Ingestion Platform Architecture, optimized for ultra-low latency (sub-second) and high-volume (60K+ events/sec) analytics using Kafka, Spark Streaming, and S3/Kafka sinks.
🧑‍💻 Developed a Self-Service Event Routing UI empowering downstream teams to seamlessly subscribe to real-time topics with zero engineering dependency.
📄 Authored and drove adoption of high-impact architectural blueprints via technical design documents:
 ✅ Mini-App Session Stitching Design Document (Owner): Unified fragmented session journeys across SuperApp verticals (Food, Pay, Ride) with hybrid SQL attribution and platform metadata enforcement.
 ✅ Decoupled Event Ingestion Architecture Blueprint (Contributor): Redefined ingestion architecture to decouple compute and storage, cutting latency by 40% and reducing infra cost by ~20%.
🧠 Positioned Careem’s data platform for AI-readiness by contributing to online & offline feature store architecture—powering ML models and experimentation with millisecond response times.
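The self-service event routing idea above can be sketched as a small fan-out table: downstream teams register subscriptions to event types, and each incoming event is delivered to every subscriber's sink topic. This is a hypothetical in-memory model; the production system would sit on Kafka, and all names here are illustrative.

```python
# Hypothetical sketch of self-service event routing: subscriptions map an
# event type to one or more sink topics, so new consumers onboard with a
# config change rather than an engineering change.
from collections import defaultdict

class EventRouter:
    def __init__(self) -> None:
        self._routes: dict[str, list[str]] = defaultdict(list)

    def subscribe(self, event_type: str, sink_topic: str) -> None:
        """Register a downstream sink topic for an event type."""
        self._routes[event_type].append(sink_topic)

    def route(self, event: dict) -> list[tuple[str, dict]]:
        """Fan an event out to every subscribed sink topic."""
        return [(topic, event) for topic in self._routes.get(event["type"], [])]

router = EventRouter()
router.subscribe("ride.completed", "analytics.rides")
router.subscribe("ride.completed", "ml.features")
deliveries = router.route({"type": "ride.completed", "ride_id": "r-1"})
```

In a Kafka deployment the `route` step would run inside a streaming job, producing each `(topic, event)` pair to the corresponding output topic.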

Senior Data Engineer

S&P Global
03.2019 - 06.2022

HR Analytics

  • Created the HR Repository from scratch for the People team, providing a platform for business stakeholders, the reporting team, and the data science team.
  • Assisted in setting strategic direction for database, infrastructure, and technology through research and development activities.
  • Served as Data Architect to build the HR Repository on the AWS platform.
  • Designed and implemented data pipeline components covering integration, storage, processing, and analysis of business data.
  • Led the development of project outputs such as business cases, solution vision and design, user requirements, solution mockups, prototypes, technical architecture, test cases, and deployment plans.
  • Managed data assets per enterprise standards, guidelines, and policies.
  • Streamlined data flows and models, improving the consistency, quality, accessibility, and security of data.
  • Performed data analysis on business problems such as attrition analysis, a people-movement dashboard, and NLP analysis of surveys using machine learning and Python.
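As a concrete illustration of the attrition analysis mentioned above, the standard attrition metric is leavers divided by average headcount over the period. This is a generic formula sketch with made-up numbers, not S&P Global figures.

```python
# Illustrative attrition-rate calculation: leavers over average headcount.
# All inputs are hypothetical example values.
def attrition_rate(leavers: int, headcount_start: int, headcount_end: int) -> float:
    """Attrition for a period = leavers / average headcount in that period."""
    avg_headcount = (headcount_start + headcount_end) / 2
    return leavers / avg_headcount

rate = attrition_rate(leavers=12, headcount_start=980, headcount_end=1020)  # 0.012
```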

Senior Data Engineer

UnitedHealth Group
08.2016 - 03.2019

Unified Data Warehouse

  • Designed the interfaces between the UDW and various OLTP systems (mainframe systems, Oracle databases, Db2 databases, a Hadoop-based data lake, network-based MQs, and real-time systems).
  • Designed the data warehouse framework.
  • Architected the metadata repository, including control tables, a UNIX local repository, archiving of old information, and traceable fields in each table.
  • After go-live, handled UDW scalability, component optimization, purging, and ongoing improvements.
  • Developed ETL jobs.
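The control-table pattern mentioned above can be sketched as a small table that records the status of each batch load, so restarts and audits can query what has been loaded. Table and column names here are hypothetical; SQLite stands in for the warehouse database.

```python
# Illustrative sketch of an ETL control table: each batch load is recorded
# with a status so downstream jobs and audits can check load state.
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for the warehouse database
con.execute("""
    CREATE TABLE etl_control (
        batch_id   INTEGER PRIMARY KEY,
        table_name TEXT NOT NULL,
        status     TEXT NOT NULL,
        loaded_at  TEXT NOT NULL
    )
""")
con.execute(
    "INSERT INTO etl_control (table_name, status, loaded_at) VALUES (?, ?, ?)",
    ("claims_fact", "SUCCESS", "2016-09-01"),
)
status = con.execute(
    "SELECT status FROM etl_control WHERE table_name = ?", ("claims_fact",)
).fetchone()[0]
```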

Senior Data Engineer

UnitedHealth Group
09.2013 - 08.2016

Oxford Data Warehouse

  • Served as onsite coordinator and system-level owner, managing the team in India.
  • Performed root-cause analysis for recurring failures and applied fixes where necessary.
  • Identified areas of improvement, addressed them on a priority basis, and provided solutions with low-level designs.
  • Performed impact analysis on enhancements to ensure data quality.
  • Provided daily production support, including monitoring, fixing failures, and working on service calls.
  • Migrated the project from on-premises to the cloud.
  • Recreated the ETL jobs in the AWS cloud.
  • Performed lift-and-shift of data using schema migration.
  • Tested and validated data after loading to the cloud, matching it against the on-premises database.
  • Performed impact analysis of downstream applications and set up new connections for them.
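The post-migration validation step above typically starts with comparing row counts per table between the on-premises source and the cloud target. A minimal sketch with hypothetical table names and counts:

```python
# Illustrative post-migration check: flag tables whose row counts differ
# between on-prem source and cloud target. Names and counts are made up.
def validate_migration(source_counts: dict[str, int],
                       target_counts: dict[str, int]) -> list[str]:
    """Return the tables whose row counts do not match after migration."""
    return [t for t in source_counts if target_counts.get(t) != source_counts[t]]

mismatches = validate_migration(
    {"members": 100, "claims": 250},
    {"members": 100, "claims": 249},
)  # -> ["claims"]
```

Row counts are only a first pass; column-level checksums or sampled record comparisons usually follow.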

Data Engineer

Tata Consultancy Services
05.2013 - 08.2013

Hyper Growth

  • Analyzed existing DataStage jobs' functionality and redesigned them for better performance.
  • Designed component sequencing to optimize performance.
  • Redesigned database tables to fine-tune extraction and loading.
  • Implemented low-level designs, high-level designs, unit test cases, and system test cases for jobs and modules.

Data Engineer

Tata Consultancy Services
11.2012 - 05.2013

BI Clinical Prism

  • Analyzed technical requirements and designed solutions.
  • Implemented automated sequencing of ETL data loads based on logical dependencies across data entities.
  • Designed system test cases for ETL jobs and sequences.
  • Unit-tested ETL components.
  • Implemented complex logic in PL/SQL procedures reusable across DataStage jobs.
  • Optimized performance of DataStage jobs and Oracle queries.
  • Developed UNIX scripts to execute DataStage jobs and sequences.

Data Engineer

Tata Consultancy Services
01.2012 - 11.2012

MDM Reporting

  • Data warehousing project generating reports from master data.
  • Developed ETL pipelines.
  • Performed unit testing.

Web Developer

Concept
05.2011 - 09.2011

Multiple Web Portal

  • Gathered requirements.
  • Developed the front end and back end of web portals using PHP and MySQL.

Education

B.Tech. - Computer Engineering

Jaipur Engineering College and Research Centre
India
06.2007 - 05.2011

Skills

🗂️ Data Technologies & Platforms
🐘 Hadoop
⚡ Spark
🐍 Python
🧱 Data Modeling
🏗️ Data Architecture
🛡️ Data Governance
✔️ Data Quality
🔁 Online and Offline Feature Store
🔧 ETL Pipeline Design
🦴 Feast (Feature Store)
🤖 Generative AI
🧠 AI-Powered Data Mart
📈 Real-time Analytics

☁️ Cloud Platforms & Infrastructure
🖥️ Amazon EC2
📦 Amazon ECR
🌩️ Amazon EMR
🔄 Amazon ECS
🐳 Docker
📨 Kafka
🐙 Kubernetes
🧬 AWS Glue
⚙️ AWS Lambda
💻 UNIX Shell
🔍 Athena
☁️ Google Cloud
🚀 GCP Compute
🧭 Amazon S3
🏢 Amazon Redshift

🔍 Query Engines & Databases
❄️ Snowflake
🧮 Trino
🚇 Presto
🐬 MySQL
🟠 Oracle
🐝 Hive
📁 HDFS
📊 BigQuery

📉 Monitoring & Workflow Orchestration
📈 Amazon CloudWatch
⏰ Apache Airflow
🗓️ TWS (Tivoli Workload Scheduler)

📊 Visualization & Metadata
📉 Tableau
📘 OpenMetadata
🧪 Great Expectations
🔎 Amundsen

🧰 Data Migration & Analysis
🛠️ AWS DMS
🔬 Exploratory Data Analysis
📐 Hypothesis Testing
📊 Inferential Statistics
🧠 Machine Learning

Accomplishments

  • 🚀 Designed and scaled a high-throughput event platform handling 60K+ events/sec with sub-second latency across real-time analytics pipelines.
  • 🧠 Delivered an AI-first Data Mart, enabling natural language business queries through MCP AI server and ChatGPT-inferred prompt integration.
  • ⚙️ Built a Customer 360° Data Mart and Feature Store (offline via Hive/Trino; online via Redis/Kafka) supporting sub-10 ms feature retrievals for ML models and experiment pipelines.
  • 📄 Authored & implemented core design documents that redefined session stitching and decoupled event ingestion for SuperApp verticals.
  • ⛓ Migrated Spark workloads to Kubernetes, automated deployment via GitHub Actions, and drove major cost savings through optimized EMR/Trino workloads.

Timeline

Lead Data Engineer

Careem
06.2022 - 06.2025

Senior Data Engineer

S&P Global
03.2019 - 06.2022

Senior Data Engineer

UnitedHealth Group
08.2016 - 03.2019

Senior Data Engineer

UnitedHealth Group
09.2013 - 08.2016

Data Engineer

Tata Consultancy Services
05.2013 - 08.2013

Data Engineer

Tata Consultancy Services
11.2012 - 05.2013

Data Engineer

Tata Consultancy Services
01.2012 - 11.2012

Web Developer

Concept
05.2011 - 09.2011

B.Tech. - Computer Engineering

Jaipur Engineering College and Research Centre
06.2007 - 05.2011