Solon Kumar Das

Badges

Certifications

Certificate: SQL (Basic)

Certificate: SQL (Intermediate)

Work Experience

Data Engineer 2
AB InBev • September 2022 - Present
I am part of the product team for the inventory planning product. I maintain ETL scripts for logistics and inventory data and build data ingestion pipelines using Azure Cloud services.
Data Engineer 1
Ernst & Young • May 2021 - September 2022
1. Built a complete end-to-end, event-driven data ingestion pipeline on Azure Databricks using Apache Spark. The pipeline ingests data into Microsoft Azure Cosmos DB (Cassandra API) and then produces messages and calculated fields to Kafka topics, where further calculations are done by the Camunda rule engine.
2. Added a data tracing feature to the framework in the form of a Databricks orchestrated job. The job collects the history of data loads and the status of each load (failure/success), ingests that data into a SQL Server table, and produces a data load event to a Kafka-enabled Azure Event Hub.
3. Added a feature that streams each incoming message on the Kafka topics in real time, starting from the earliest offset. This let the team detect message loss during streaming and classify error messages from the error topic. The stats monitoring feature helped the team gather lost messages within seconds and re-run them through the pipeline. It also made it possible to debug errors in a stream of over a million records in seconds.
Functionality: The data loaded into Cosmos DB is used by the client for regulatory reporting through dashboarding tools such as Power BI for various clients (it is a multi-tenant data ingestion system).
Level 1 interview panelist for the EY Data Science team for various engagements.
Tech Stack: Python, PySpark, Microsoft Azure, Databricks, Cosmos DB - Cassandra API, SQL, Kafka, Azure Event Hubs, Azure Blob storage, Databricks Delta Lake storage, Azure secrets vault, etc.
Data Scientist
Kudos Finance • November 2020 - May 2021
Developed an end-to-end credit scoring model that scores the customers of our B2B2C digital lending business for the disbursal of small-ticket consumer durable loans. The model decreased the loan approval turnaround time (TAT) by 90%. It helps credit managers make quick, informed decisions about whom to disburse loans to and which customers to avoid because of bad repayment behavior or a history of defaulting on loan payments.
Created customer demographics and repayment behavior portfolios, which enabled the lending partners to make better business decisions for greater revenue generation.
Challenges: Handled very messy data, parsed unstructured text files provided by Experian Bureau, and pushed them into MySQL databases for further analysis.
Skills: Natural Language Processing (NLP) · Microsoft Excel · Seaborn · Pandas (Software) · Python (Programming Language) · SQL · Machine Learning
Data Engineer
Accenture • September 2018 - November 2020
Developed an Object Management Tool for automatic deployment of views into different environments. The tool is hosted on the Matillion Linux server and communicates with the Redshift database.
Developed an automatic JIL file generator utility in Python to schedule Autosys jobs, reducing the manual effort of writing JIL files for the scheduler by approximately 98%.
Worked on a PySpark development project where I was responsible for developing PySpark scripts for various transformations on data coming from our client for their new non-prescription drug category.
Developed a View Dependency Utility that backs up the DDL definitions of all objects dependent on a given table, through all levels of the hierarchy. This lets database developers drop and recreate views without losing information.
Conducted training sessions for the dev team of 40 people on Python concepts, Big Data architecture, and data cleaning and preparation, helping them upskill and understand flows that are frequently used in the project.

Education

Liverpool John Moores University
Data Science, MS • June 2020 - March 2021
Grade: Distinction
Research Thesis on Employment Scam Prediction. Natural Language Processing | Domain: Societal Benefit and Employment Scam
Challenges: The large text corpus of HTML fragments offered an opportunity for deep text mining through Word2Vec embeddings. AutoML (PyCaret) ran the top 15 single and ensemble classifiers, producing better results than previous research on this specific topic. The thesis is original work, with less than 3% plagiarism.
Thesis report presentation: https://drive.google.com/file/d/1MaKagYS_9nqWN8ghYAWMPVD9LhInJpcT/view
International Institute of Information Technology, Bengaluru (IIIT-B)
Data Science, MS • June 2019 - June 2020
Grade: GPA: 3.62 (on a scale of 4)
Coursework:
Preparatory Course: Python Programming, Python for Data Science, Python Data Structures, Data Analysis in Excel, Visualisation using Tableau, Data Visualisation in Python
Course 1: EDA and Statistics: Exploratory Data Analysis, Inferential Statistics, Hypothesis Testing
Course 2 and 3: Machine Learning: Linear Regression, Logistic Regression, Principal Component Analysis, Clustering, Model Selection, Advanced Linear Regression, Boosting, Decision Tree Models, Random Forest
Course 4: Big Data Framework: Database Design, MySQL (Intermediate to Advanced), Hadoop Framework, Hive-QL, Apache Spark Framework, PySpark
Course 5: Deep Learning (Elective): Artificial Neural Network, Convolutional Neural Network, Recurrent Neural Network, Gesture control application for Smart TV
Final Project: Anomaly detection in Credit Card - Kaggle. Worked on a highly imbalanced dataset and made predictions with various ML algorithms.
Veer Surendra Sai University of Technology
Information Technology, B.Tech • September 2014 - June 2018