Praveen Kumar

Sr. Principal Data & AI Architect and Director
Dubai, UAE

Summary

Senior Principal Data Architect and Associate Director of Data Engineering and AI with 15+ years building enterprise data platforms on AWS, Azure, and GCP. Specializes in designing modern data lakes, warehouses, and AI/ML pipelines using Snowflake, Databricks, Spark, and Airflow. Led teams to deliver GDPR- and HIPAA-compliant data solutions for healthcare, finance, and retail. Hands-on expertise in Python, SQL, TensorFlow, and data governance. Directed cloud migrations, cost optimizations, and ML deployments (SageMaker, Vertex AI).

Overview

15 years of professional experience
1 certification

Work History

Cargill Inc

Sr. Principal Architect, Data Engineering
07.2018 - Current


  • Project: Cargill Data Platform, the enterprise platform serving data products across Cargill; developed the data products for logistics and production.
  • Built hybrid cloud (AWS/Azure/GCP) data platforms, integrating IoT, ERP/CRM, and streaming data via Spark, Kafka, and cloud-native tools for real-time analytics and ML.
  • Designed scalable data models (star/snowflake) in Snowflake, BigQuery, Redshift, and Synapse to power ML pipelines and dashboards.
  • Automated ETL/ELT workflows with Airflow, Data Factory, and GCP Composer, enabling efficient data transformation for AI use cases.
  • Enforced data governance using Databricks Unity Catalog and cloud tools (Purview, Lake Formation) for lineage, security, and compliance.
  • Deployed ML models (forecasting, anomaly detection) via SageMaker, Vertex AI, and Azure ML, containerized with Docker/Kubernetes.
  • Applied DevOps (Terraform, CI/CD) to automate infrastructure and pipeline deployments, improving team collaboration.
  • Partnered with data scientists to operationalize models, monitor performance, and scale MLOps workflows.
  • Mentored teams on PySpark/SQL best practices and data quality standards.


Tools: AWS/Azure/GCP, Snowflake, Databricks, Spark, Kafka, Airflow, Terraform, SageMaker, Docker.

Hewlett Packard Enterprise

Manager, Data Analytics
07.2015 - 07.2018


Project: HPE Universal Internet of Things (IoT) Platform

  • Led a team to design and deploy hybrid cloud (AWS, Azure, GCP) data pipelines for processing IoT telemetry data, enabling real-time analytics.
  • Built a unified data lake architecture using Snowflake and Databricks to streamline data access and improve analytics efficiency.
  • Developed scalable ETL/ELT pipelines with Apache Spark, Airflow, and PySpark, integrating diverse IoT device data for predictive maintenance workflows.
  • Implemented data governance frameworks (GDPR, ISO 27001 compliance) to ensure data accuracy and security across platforms.
  • Automated cloud infrastructure and data workflows to optimize resource usage and reduce operational costs.
  • Collaborated on deploying AI/ML models for anomaly detection, enhancing device reliability and operational insights.
  • Mentored junior engineers on best practices, fostering team collaboration and timely project delivery.

Tools: AWS/Azure/GCP, Snowflake, Databricks, Spark, Airflow, PySpark, Terraform, SageMaker, Unity Catalog.

L&T

Lead, Software Engineer
07.2014 - 07.2015


  • Optimized SQL queries across cloud and on-premise databases (Snowflake, BigQuery) to enhance performance, enabling faster analytics and decision-making.
  • Built ETL pipelines (AWS Glue, Azure Data Factory) to ingest and transform large-scale datasets into centralized warehouses (Databricks, Snowflake), ensuring reliable data availability.
  • Automated data workflows using shell scripting and orchestration tools (Airflow), reducing manual effort and improving operational efficiency.
  • Collaborated with analytics and DevOps teams to clean, model (star schema), and interpret data, delivering actionable insights for business stakeholders.
  • Enforced data governance (Unity Catalog, Apache Atlas) and security protocols (logging, audits) to meet compliance standards (GDPR) and ensure data integrity.
  • Supported AI/ML initiatives by structuring datasets for model training (SageMaker, Azure ML) and integrating predictive outputs into business workflows.

Tools: AWS/Azure/GCP, Snowflake, Databricks, SQL, PySpark, Airflow, Unity Catalog, SageMaker.

Wipro Technologies

Project Engineer
10.2011 - 07.2014


  • Contributed to all phases of the development lifecycle, from design through development and testing.
  • Designed, developed and tested Java-based solutions using common standards and frameworks for ease of maintenance.
  • Automated repetitive tasks using Java, saving the team over 100 hours of manual work per month.
  • Designed RESTful APIs for Java applications, enabling seamless integration with front-end technologies and third-party services.
  • Applied Agile methodologies to manage Java development projects, improving team productivity and project delivery timelines.

Education

JNTU, Hyderabad

Bachelor of Science in Computer Science Engineering
2010

Skills

  • Cloud Platforms: AWS, Azure, GCP
  • Big Data & Open Source: Hadoop, Apache Spark, Kafka, Flink, Hive, Airflow, Beam, EMR, Databricks
  • Data Engineering & ETL: AWS Glue, Azure Data Factory, GCP Dataflow, dbt, Step Functions, Cloud Composer, Pub/Sub
  • Data Warehousing: Snowflake, BigQuery, SQL, NoSQL (Cassandra), Delta Lake
  • Data Governance: GDPR, CCPA, RBAC, KMS, Unity Catalog, Apache Atlas
  • AI/ML: TensorFlow, scikit-learn, Azure ML, PySpark, Vertex AI, REST APIs
  • DevOps & Infrastructure: Terraform, CloudFormation, Kubernetes, Docker, ECS/EKS, CI/CD
  • Programming: Python, PySpark, Scala, Java, Shell Scripting
  • Data Modeling & Tools: Erwin Data Modeler, SAP, Trifacta (Cloud Dataprep)
  • Leadership: Strategic Roadmaps, Cross-functional Collaboration

Additional Information

Active UAE Employment Visa

Accomplishments

  • Optimized ETL Pipelines: Reduced processing time and improved efficiency by implementing parallel processing, optimizing SQL queries, and leveraging Apache Spark's distributed computing (illustrated in the first sketch after this list).
  • Automated Data Pipeline Monitoring: Built automated monitoring and alerting with Apache Airflow, Prometheus, and Grafana to surface pipeline issues proactively and minimize downtime (illustrated in the second sketch after this list).
  • Data Security and Compliance: Implemented encryption, access controls, and anonymization to meet data privacy regulations (e.g., GDPR, CCPA), safeguarding sensitive data and mitigating the risk of data breaches.
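
The parallel-processing pattern from the first accomplishment can be illustrated with a short PySpark example. This is a minimal sketch, not a record of the production pipelines: the Spark settings, S3 paths, dataset, and column names are all hypothetical placeholders.

    # Minimal PySpark sketch of the parallel-processing optimization described above.
    # Paths, dataset, and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("etl-optimization-sketch")
        .config("spark.sql.shuffle.partitions", "200")  # tune to cluster size
        .getOrCreate()
    )

    # Spark parallelizes the scan of the source files across executors.
    orders = spark.read.parquet("s3://example-bucket/raw/orders/")

    # Filter and prune columns early so less data reaches the shuffle stage.
    daily_totals = (
        orders
        .where(F.col("order_date") >= "2024-01-01")
        .select("order_date", "region", "amount")
        .groupBy("order_date", "region")
        .agg(F.sum("amount").alias("total_amount"))
    )

    # Partition the output by the query key so downstream reads stay parallel.
    (
        daily_totals
        .repartition("order_date")
        .write.mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-bucket/curated/daily_totals/")
    )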
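
The monitoring-and-alerting pattern from the second accomplishment can be sketched as an Airflow DAG with a failure callback and an SLA. A minimal sketch assuming Airflow 2.4+; the DAG id, task, and alert destination are hypothetical, and a real deployment would route alerts to Slack or PagerDuty and export metrics to Prometheus/Grafana rather than print.

    # Minimal Airflow sketch of the pipeline-monitoring pattern described above.
    # DAG id, task names, and the alert mechanism are hypothetical.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def notify_on_failure(context):
        # Stand-in for a Slack/PagerDuty notification; printing keeps the sketch self-contained.
        ti = context["task_instance"]
        print(f"ALERT: task {ti.task_id} in DAG {ti.dag_id} failed at {context['ts']}")


    def extract_and_load():
        # Placeholder for the ETL step being monitored.
        pass


    with DAG(
        dag_id="monitored_etl_sketch",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={
            "retries": 2,
            "retry_delay": timedelta(minutes=5),
            "on_failure_callback": notify_on_failure,
        },
    ):
        PythonOperator(
            task_id="extract_and_load",
            python_callable=extract_and_load,
            sla=timedelta(hours=1),  # slow runs surface as SLA misses in the Airflow UI
        )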

Certification

Databricks: https://credentials.databricks.com/profile/praveenkumar243261/wallet

Languages

English
Bilingual or Proficient (C2)
Arabic
Upper intermediate (B2)
Hindi
Advanced (C1)

Work Preference

Work Type

Full Time, Contract Work, Gig Work

Work Location

Remote, Hybrid, On-Site

Important To Me

Career advancement, Work-life balance, Company culture, Flexible work hours, Work-from-home option
