Work Experience
Data Engineer
Publicis Sapient•  May 2023 - Present
- Creating ETL pipelines using PySpark.
- Optimizing Spark SQL queries.
- Analyzing large volumes of data with SQL to deliver key insights to the client.
- Scheduling and monitoring PySpark jobs using AWS Glue.
- Optimizing Hive table structures with partitioning and bucketing.
- Optimizing PySpark jobs.
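The partitioning-and-bucketing layout mentioned above can be sketched as Hive DDL generated from Python; the table name, columns, and bucket count here are hypothetical examples, not the actual client schema.

```python
# Sketch of Hive DDL for a partitioned, bucketed table, as described above.
# Table and column names are hypothetical placeholders.
def build_orders_ddl(num_buckets: int = 32) -> str:
    """Build DDL for a Hive table partitioned by date and bucketed by customer_id."""
    return (
        "CREATE TABLE IF NOT EXISTS orders (\n"
        "  order_id BIGINT,\n"
        "  customer_id BIGINT,\n"
        "  amount DOUBLE\n"
        ")\n"
        "PARTITIONED BY (order_date STRING)\n"  # lets queries scan only the dates they need
        f"CLUSTERED BY (customer_id) INTO {num_buckets} BUCKETS\n"  # co-locates rows per customer
        "STORED AS ORC"
    )

print(build_orders_ddl())
```

Partitioning by date keeps daily loads and date-filtered scans cheap, while bucketing on a high-cardinality join key helps bucketed joins avoid a full shuffle.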
Data Engineer
Dunnhumby•  April 2022 - May 2023
- Automated an ETL pipeline across billions of rows of transactional data using Apache Airflow and Spark.
- Designed and developed Python scripts to apply extraction logic and load data into their respective Hive tables.
- Played a key role in Spark SQL query optimization, reducing the ETL pipeline's running time by 2 hours each day.
- Analyzed complex data and identified anomalies, trends, and risks to provide insights that improved internal controls.
- Optimized the data warehouse with Hive partitioning and bucketing techniques, which improved query performance for analysts.
- Handled enhancement requests, bug fixes, and production support for 5 different projects.
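The reason Hive partitioning speeds up analyst queries is partition pruning: a date-filtered query scans only the matching partition directories instead of the whole table. A minimal pure-Python sketch of the idea (the partition key `dt` and directory names are assumptions, following Hive's `key=value` layout convention):

```python
# Minimal sketch of partition pruning: a Hive table stored as one directory
# per date partition; a date filter touches only the matching partitions.
# Partition names here are hypothetical examples.
def prune_partitions(partitions, start_date, end_date):
    """Return only the partitions whose dt value falls within [start_date, end_date]."""
    selected = []
    for p in partitions:
        # Partition directories follow Hive's key=value convention, e.g. "dt=2023-01-15".
        dt = p.split("=", 1)[1]
        if start_date <= dt <= end_date:
            selected.append(p)
    return selected

all_parts = ["dt=2023-01-13", "dt=2023-01-14", "dt=2023-01-15", "dt=2023-01-16"]
# A query filtered to two days scans 2 of the 4 partitions.
print(prune_partitions(all_parts, "2023-01-14", "2023-01-15"))
```

On a billion-row table, skipping non-matching partitions this way is typically the single largest win for date-bounded analyst queries.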
Software Engineer
NATIONAL INFORMATICS CENTER•  July 2019 - April 2022
- Imported data from multiple sources into Hive using Sqoop.
- Worked with highly structured and semi-structured data, 20 TB in size (60 TB with a replication factor of 3).
- Wrote Spark queries to analyze large amounts of data.
- Worked on Spark SQL optimizations that increased job performance and reduced running time by 3 hours each day.
- Built Spark scripts based on requirements.
- Automated data ingestion using Sqoop.
- Documented all important information relevant to deployment in the PROD environment.
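Automated Sqoop ingestion of this kind can be sketched as a small Python helper that assembles the import command per source table. The JDBC URL, credentials, and table names below are placeholders; the flags (`--connect`, `--table`, `--hive-import`, `--num-mappers`) are standard Sqoop import options.

```python
# Sketch of automating Sqoop ingestion: build the import command for a
# source table from a few parameters. Connection string and table names
# are placeholder examples; the flags are standard Sqoop import options.
def build_sqoop_import(jdbc_url, username, table, hive_table, mappers=4):
    """Assemble a Sqoop import command that loads a relational table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--hive-import",                # load straight into a Hive table
        "--hive-table", hive_table,
        "--num-mappers", str(mappers),  # parallel map tasks for the import
    ]

cmd = build_sqoop_import(
    "jdbc:mysql://dbhost:3306/sales", "etl_user", "orders", "staging.orders"
)
print(" ".join(cmd))
```

Generating the command per table (rather than hand-writing each invocation) is what makes it practical to schedule ingestion for many sources from one script.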
Education
Rajeev Gandhi Technical University, Bhopal
Computer Science, BE•  August 2010 - May 2014