Project #5 : Alteryx Connectors Project, [03/2025 - Current]
Client : Alteryx
Description : Working on the Alteryx Connectors project, specializing in the development of scalable connectors using Python and the gRPC framework (see the gRPC sketch after the responsibilities below).
Roles and Responsibilities:
- Developed and maintained integration connectors for Google Drive, SharePoint, and Power BI, designed to be compatible across both Alteryx Designer (Flagship) and Alteryx Cloud platforms.
- Collaborated with teams working in Java and C++ to align backend connector logic and ensure smooth cross-platform functionality.
- Utilized GitLab for source control and CI/CD pipelines, ensuring robust versioning, automated testing, and continuous integration workflows.
- Gained hands-on experience with Alteryx Designer, validating connector behavior, conducting functional testing, and supporting deployment readiness.
- Ensured code reliability and maintainability through strong adherence to software engineering best practices and comprehensive unit/integration testing.
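A minimal sketch of the kind of Python gRPC service scaffolding such a connector could use, assuming stubs generated by protoc from a hypothetical connector.proto; the service, method, and message names are illustrative and are not the actual Alteryx connector SDK:
```python
# Minimal sketch of a Python gRPC connector service. connector_pb2 /
# connector_pb2_grpc are assumed to be generated by protoc from a
# hypothetical connector.proto (not the real Alteryx SDK interfaces).
from concurrent import futures
import grpc

import connector_pb2        # hypothetical generated messages
import connector_pb2_grpc   # hypothetical generated service stubs


class DriveConnectorServicer(connector_pb2_grpc.DriveConnectorServicer):
    """Handles ListFiles requests coming from the Designer/Cloud front end."""

    def ListFiles(self, request, context):
        # In the real connector this would call the Google Drive API;
        # here we just return a canned response.
        return connector_pb2.ListFilesResponse(file_names=["report.xlsx"])


def serve() -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    connector_pb2_grpc.add_DriveConnectorServicer_to_server(
        DriveConnectorServicer(), server
    )
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
```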
Project #4 : QA Performance-Based Sampler, [08/2023 - 01/2024]
Client : HSBC Bank
Description : Built a performance-based sampling approach, deploying appropriate statistical methods and sampling techniques in the QA function to shift the focus from a flat sampling methodology to a performance-based one. The QA sampler is a desktop tool that lets the QA team upload cases as .xls or .csv documents and generates the samples.
Roles and Responsibilities:
- Responsible for creating a user interface using PySimpleGUI that gives users the flexibility to select options as per their requirements (see the sampler sketch after this list).
- Performed file and data validation and formatting, checking for supported file formats, missing information, and inconsistent data formats.
- Responsible for data cleaning, structuring the data, and fitting it into a relational database system.
- Generated samples using statistical procedures based on the client's requirements.
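A minimal sketch of the sampler flow under the assumptions above (PySimpleGUI for the desktop UI, pandas for the sampling step); the "performance_band" column, the sample fraction, and the output file name are illustrative:
```python
# Minimal sketch: pick a case file in a PySimpleGUI window, then draw a
# performance-based (stratified) sample with pandas.
import pandas as pd
import PySimpleGUI as sg

layout = [
    [sg.Text("Case file (.xls/.csv)"), sg.Input(key="-FILE-"), sg.FileBrowse()],
    [sg.Text("Sample fraction"), sg.Input("0.1", key="-FRAC-")],
    [sg.Button("Generate sample"), sg.Button("Exit")],
]
window = sg.Window("QA Performance-Based Sampler", layout)

while True:
    event, values = window.read()
    if event in (sg.WIN_CLOSED, "Exit"):
        break
    if event == "Generate sample":
        path = values["-FILE-"]
        cases = pd.read_csv(path) if path.endswith(".csv") else pd.read_excel(path)
        frac = float(values["-FRAC-"])
        # Sample within each performance band rather than flat across all cases
        # ('performance_band' is a hypothetical column name).
        sample = cases.groupby("performance_band", group_keys=False).apply(
            lambda g: g.sample(frac=frac, random_state=42)
        )
        sample.to_csv("qa_sample.csv", index=False)
        sg.popup(f"Wrote {len(sample)} sampled cases to qa_sample.csv")

window.close()
```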
Project #3 : National Data Analytics Platform (NDAP), [04/2020 - 07/2023]
Client : NITI Aayog (Central Govt. of India)
Description : An initiative of NITI Aayog to promote wider access to and better use of data: a single point for accessing data across all Ministries of the Government of India, combined with intuitive visualization and self-service analytics. The portal provides advanced, natural-language-understanding-based search features to find the appropriate dataset across a huge variety of data spread over different areas. Standard Operating Procedures (SOPs) were developed to keep data updated; further, APIs, web crawling, and other methods are used to keep the data fully up to date.
Roles and Responsibilities :
- Responsible for continuous sourcing and extraction of data from various web portals using techniques such as web scraping and PDF extraction.
- Responsible for data cleaning, structuring the data, and fitting it into a relational database system.
- Developed a REST API using the Flask framework and hosted it on AWS to enable seamless batch data transfer between client servers and the data warehouse, and implemented ETL pipelines on the data (see the Flask sketch after this list).
- Successfully interpreted data to identify key metrics and draw conclusions.
- Defined DAGs in Airflow using different operators and sensors, and automated data pipelines using Airflow's scheduling features (see the Airflow sketch after this list).
- Documented the environment setup, pipeline setup and maintenance procedures for future reference.
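A minimal sketch of a Flask batch-ingestion endpoint of the kind described above; the route name, payload shape, and port are hypothetical:
```python
# Minimal sketch: accept a JSON batch of records over HTTP and hand it off
# to the ETL step (here only acknowledged, not actually loaded).
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/batch", methods=["POST"])
def ingest_batch():
    records = request.get_json(force=True)
    if not isinstance(records, list):
        return jsonify(error="expected a JSON list of records"), 400
    # In the real pipeline the records would be validated, transformed,
    # and loaded into the warehouse.
    return jsonify(received=len(records)), 202


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```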
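A minimal sketch of a scheduled Airflow DAG with a sensor and a PythonOperator, along the lines described above; the DAG id, file path, and schedule are illustrative only:
```python
# Minimal sketch: wait for an extracted file to land, then run a load task
# on a daily schedule.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.filesystem import FileSensor


def load_to_warehouse():
    # Placeholder for the real transform-and-load logic.
    print("loading extracted data into the warehouse")


with DAG(
    dag_id="ndap_daily_ingest",               # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    wait_for_extract = FileSensor(
        task_id="wait_for_extract",
        filepath="/data/incoming/extract.csv",  # hypothetical path
        poke_interval=300,
    )
    load = PythonOperator(
        task_id="load_to_warehouse",
        python_callable=load_to_warehouse,
    )
    wait_for_extract >> load
```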
Project #2 : ADAM (Accelerated Development of Analytics using Machine Learning) Tool, [02/2019 - 03/2020]
Client : OTSI
Description : The aim of the project is to create an accelerator focused on productivity improvements during the Exploratory Data Analysis, Data Processing, and Model Building stages of the Advanced Analytics project lifecycle. In simple terms, ADAM (Accelerated Development of Analytics using Machine Learning) also helps advanced users by providing simple wrapper functions that perform visualization, data preprocessing, and a large number of modelling-related tasks that would typically require many lines of code, freeing up their time to focus on other parts of the data science pipeline such as feature engineering and model deployment.
Roles and Responsibilities:
- Developed an application using Python for comprehensive exploration and profiling of data collected from various types of source systems (see the profiling sketch after this list).
- Compared and evaluated different programming languages, tools, and web frameworks for creating the AutoEDA, AutoDP, and AutoML accelerators.
- Together, these features produce a report of insights generated from descriptive analysis, graphical data exploration, data preprocessing, and model building.
- Performed benchmark testing of the accelerators against diverse data sources.
- Optimized the code for better performance on large datasets.
- Used the tools effectively to expedite data analytics projects for a few clients.
- Developed code for automating Exploratory Data Analysis (the accelerator), worked on REST API calls and integration with the UI, and built wrapper classes for various data pre-processing techniques and for frequently used ML algorithms (classification, regression, clustering); see the wrapper sketch after this list.
- Designed test data and test cases to add more techniques to the developed wrapper classes and attain good accuracy on the provided data.
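A minimal sketch of an AutoEDA-style profiling step, assuming pandas; the function name and report fields are illustrative, not ADAM's actual output:
```python
# Minimal sketch: summarize shape, dtypes, missing values, numeric stats,
# and duplicate rows for a dataset.
import pandas as pd


def profile_dataframe(df: pd.DataFrame) -> dict:
    """Return a small profiling report for a dataset."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing_counts": df.isna().sum().to_dict(),
        "numeric_summary": df.describe().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


if __name__ == "__main__":
    data = pd.DataFrame({"age": [25, 32, None], "city": ["Hyd", "Delhi", "Hyd"]})
    print(profile_dataframe(data))
```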
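A minimal sketch of the kind of ML wrapper class described above, assuming scikit-learn; the class name, candidate models, and selection metric are illustrative, not ADAM's API:
```python
# Minimal sketch: fit a small set of candidate classifiers on a holdout split
# and keep whichever scores best.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


class AutoClassifier:
    """Fits candidate classifiers and keeps the best-performing one."""

    def __init__(self):
        self.candidates = {
            "logistic_regression": LogisticRegression(max_iter=1000),
            "random_forest": RandomForestClassifier(n_estimators=200),
        }
        self.best_name = None
        self.best_model = None

    def fit(self, X, y):
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.2, random_state=42
        )
        best_score = -1.0
        for name, model in self.candidates.items():
            model.fit(X_train, y_train)
            score = accuracy_score(y_val, model.predict(X_val))
            if score > best_score:
                best_score, self.best_name, self.best_model = score, name, model
        return self

    def predict(self, X):
        return self.best_model.predict(X)
```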
Project #1 : PYCKER (WEB SCRAPING), [08/2018 - 11/2018]
Client : Pycker
Description: Pycker provides the best available information and articles on movies and celebrities. The main aim of the Pycker project is to collect all the movie reviews and related content from various websites, process them into a structured format, and store them in a database, as well as to collect data from various Twitter user accounts using Tweepy and build sentiment analysis on top of that data.
Roles and Responsibilities :
- Created Python code for scraping websites and capturing movie-related information.
- Worked on scraping news from different websites using the Selenium and Requests packages, and pulled information from Twitter using the Tweepy package (see the scraping sketch after this list).
- Built an automated version to scrape all the websites at once by creating generalized code that works for every target website and pushes the collected data into the database.
- Pre-processed the collected Twitter data using the NLTK library (see the preprocessing sketch after this list).
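A minimal sketch of the scrape-and-store flow, assuming Requests, BeautifulSoup, and SQLite; the URL, CSS selector, and table schema are hypothetical:
```python
# Minimal sketch: fetch a page, extract review text, and push it into a
# local SQLite database.
import sqlite3

import requests
from bs4 import BeautifulSoup

URL = "https://example.com/movie-reviews"   # placeholder URL


def scrape_reviews(url: str) -> list[str]:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # The selector is site-specific; adjust per target website.
    return [p.get_text(strip=True) for p in soup.select("div.review p")]


def store_reviews(reviews: list[str]) -> None:
    conn = sqlite3.connect("pycker.db")
    conn.execute("CREATE TABLE IF NOT EXISTS reviews (text TEXT)")
    conn.executemany("INSERT INTO reviews VALUES (?)", [(r,) for r in reviews])
    conn.commit()
    conn.close()


if __name__ == "__main__":
    store_reviews(scrape_reviews(URL))
```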
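A minimal sketch of tweet pre-processing with NLTK ahead of sentiment analysis; the cleaning steps shown are typical ones, not necessarily the exact pipeline used:
```python
# Minimal sketch: lowercase, strip URLs/mentions/hashtag marks, tokenize,
# remove stopwords, and lemmatize.
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()


def clean_tweet(text: str) -> list[str]:
    text = re.sub(r"http\S+|@\w+|#", "", text.lower())  # drop URLs, mentions, '#'
    tokens = re.findall(r"[a-z]+", text)
    return [LEMMATIZER.lemmatize(t) for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    print(clean_tweet("Loved the movie! #mustwatch http://t.co/xyz @pycker"))
```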