Michigan Health Information Networks | May 2023 - Present
Worked on backend development for the state-sponsored “Social Determinants of Health” project. Developed and deployed AWS Lambda functions using the Python Boto library to process, validate, and store Electronic Health Records in AWS RDS. Created mock interfaces and implemented unit tests to validate the functionality of the database drivers. Took a test-driven development approach to ensure code quality, raising code coverage from 20% to 80%. Optimized PostgreSQL database performance using indexes and views, resulting in 80% faster processing. Created a proof-of-concept project to evaluate a serverless architecture using AWS Lambda, SNS, and CloudFormation, and presented it to leadership.
Big Data Engineer
HSBC | July 2019 - August 2022
Provisioned Cloudera Hadoop clusters (HDFS, MapReduce, YARN, Hive, Oozie, ZooKeeper, Solr) on-premises for production and disaster recovery in the UK and HK datacenters. Built "BigData360", a real-time alerting and monitoring system, and designed dashboards for 80+ Hadoop dev and prod clusters using Grafana and Prometheus, collecting metrics from Cloudera APIs and Node Exporters. Generated monthly cost reports to support Big Data capacity planning and demand forecasting. Built a self-service data cataloging tool to analyze tenant usage metrics and trends in real time using NiFi, MySQL, and Groovy, saving ~150 work hours. Virtualized control nodes by migrating critical Hadoop master services (NameNodes, YARN, Solr, ZooKeeper, Hive) from on-premises servers to ESXi virtual machines to enhance cluster efficiency. Achieved cost avoidance of $700k on new server purchases while saving 400 work hours in maintenance upgrades. Developed ETL pipelines and scripts in Python to archive cold data. Set up HDFS erasure coding policies to achieve savings of 50% in total storage space. Automated the provisioning of RedHat servers with Cloudera (CDP) Hadoop capabilities using Ansible playbooks. Mentored and onboarded new SWEs. Conducted knowledge-sharing sessions with the team.
Rutgers, The State University of New Jersey
Computer Science, MS | September 2022 - Present
Specializing in Massive Data Analytics
University of Pune, Pune