“Managing data quality in Machine Learning” at Google Developer Group Cloud Community Day

Bangalore, 2022 [Presentation] [Poster]

In the current scenario, where every ML system requires a ton of data to train, changes in the data during a model refresh or even in production can cause a performance drop, sometimes quite a significant one. Periodically checking the data stream itself for quality issues has therefore become a tremendously important task in the ML system lifecycle. There are existing libraries, open-source tools and full-fledged SaaS platforms for monitoring data quality metrics, but their built-in metrics are often too generic and might not be useful at all. Simple data quality metrics can instead be developed individually and integrated with those tools or platforms so that they can be monitored in production. In this talk, I will go through a couple of such metrics for different types of data and use cases, show how clustering and other unsupervised learning algorithms can be used to build them, and at the end demonstrate how they can be integrated and run in production.
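As a rough illustration of the kind of metric the talk refers to, the sketch below scores a production batch by its distance to KMeans clusters fitted on a reference sample; the function names, the alert threshold and the random data are illustrative assumptions, not material from the talk.

```python
# Minimal sketch: a custom data-quality metric built with KMeans, scoring how far
# a new production batch sits from the clusters of a reference (training) sample.
# The names, threshold and synthetic data are illustrative only.
import numpy as np
from sklearn.cluster import KMeans

def fit_reference(reference_batch: np.ndarray, n_clusters: int = 8) -> KMeans:
    """Cluster a reference sample of the feature matrix once, at training time."""
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(reference_batch)

def drift_score(model: KMeans, new_batch: np.ndarray) -> float:
    """Mean distance of each new row to its nearest reference centroid.
    Larger values suggest the batch no longer resembles the training data."""
    distances = model.transform(new_batch)          # shape: (n_samples, n_clusters)
    return float(distances.min(axis=1).mean())

# Example wiring into a monitoring job: alert when the score rises well above the
# score of the reference data itself (the 2x factor is an arbitrary choice).
reference = np.random.default_rng(0).normal(size=(1000, 5))
production = np.random.default_rng(1).normal(loc=0.5, size=(200, 5))
km = fit_reference(reference)
baseline = drift_score(km, reference)
if drift_score(km, production) > 2 * baseline:
    print("data-quality alert: production batch looks unlike the reference data")
```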

“Things I learned while running neural networks on microcontroller” at PyData Global

2022 [Presentation] [Poster]

Running neural networks on production systems is quite difficult, but running them on microcontrollers is a different challenge altogether. The choice of microcontroller, the presence of a purpose-built processor, data I/O, model training and inference all change when the target deployment moves from a cloud instance to a power-constrained microcontroller. In this talk, I will go through how to approach this as a novice and get a model running.
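For a concrete flavour of the model-side workflow, here is a minimal sketch (not the exact steps from the talk) that converts a small Keras model into a fully int8-quantized TensorFlow Lite flatbuffer, the form typically deployed with TensorFlow Lite Micro; the toy model and random training data are placeholders.

```python
# Minimal sketch: train a tiny Keras model and convert it to an int8-quantized
# TFLite flatbuffer, the usual starting point before flashing it onto a
# microcontroller with TensorFlow Lite Micro. Model and data are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
x, y = np.random.rand(256, 32).astype("float32"), np.random.randint(0, 2, 256)
model.fit(x, y, epochs=1, verbose=0)

def representative_data():
    # The converter calibrates int8 ranges from a handful of real input samples.
    for sample in x[:100]:
        yield [sample.reshape(1, 32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)  # usually then converted to a C array (xxd -i) for the MCU
```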

“Bessel’s Correction: Effects of (n-1) as the denominator in Standard deviation” at PyData Global

2022 [Presentation] [Poster]

When calculating the standard deviation, using (n-1) as the denominator for n observations doesn’t seem to make sense until we dive a little deeper into the theory behind it. But even with the correction factor in place, we can still ask: is it really needed? How much does it influence the end result?
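The effect is easy to see with a small simulation (illustrative, not from the talk): for samples drawn from a distribution of known variance, the n-denominator estimate underestimates the truth by a factor of (n-1)/n on average, while the Bessel-corrected estimate does not. The sample size and repetition count below are arbitrary choices.

```python
# Small simulation to illustrate Bessel's correction: the n-denominator variance
# underestimates the true variance on average, while the (n-1) version does not.
import numpy as np

rng = np.random.default_rng(42)
true_var = 4.0            # samples drawn from N(0, sd=2), so the true variance is 4
n, repeats = 5, 100_000   # a small n makes the bias easy to see

biased, unbiased = [], []
for _ in range(repeats):
    sample = rng.normal(0.0, 2.0, size=n)
    biased.append(np.var(sample, ddof=0))    # divide by n
    unbiased.append(np.var(sample, ddof=1))  # divide by n-1 (Bessel's correction)

print(f"true variance:             {true_var}")
print(f"mean of n-denominator:     {np.mean(biased):.3f}")    # ~3.2, i.e. (n-1)/n * 4
print(f"mean of (n-1)-denominator: {np.mean(unbiased):.3f}")  # ~4.0
```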

“Interpretable ML in production” at Google Developer Group Cloud Community Day

Bangalore, 2023 [Presentation] [Poster]

Validating an ML model with train-test accuracy metrics offers an initial sense of viability, but generating inferences that are consistent with contextual business goals requires understanding how the deployed model behaves under different conditions and how it will respond to soft data drift. In this talk, I will go through different explainability methods, how to employ them, and how the choice of model type affects interpretability during production inferencing.
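As one example of the kind of method discussed, the sketch below uses scikit-learn's permutation importance as a model-agnostic explainability check that can be re-run periodically against a deployed model; the dataset and the model are placeholders, not the ones used in the talk.

```python
# Minimal sketch of one model-agnostic explainability check (permutation
# importance) that can be re-run against a deployed model to see which features
# its predictions actually depend on. Dataset and model are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much the score drops: features
# with large drops are the ones the model relies on, which is what you want to
# watch when the data starts drifting softly in production.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])
for name, importance in ranked[:5]:
    print(f"{name:30s} {importance:.4f}")
```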

“Considerations for LLMOps: Running LLMs in production” at AZConf

Chennai, 2023 [Presentation] [Poster]

With the recent explosion of development and interest in large language, vision and speech models, it has become apparent that running large models in production will be a key driver of enterprise adoption of ML. Traditional MLOps, i.e. running machine learning models in production, already has many variables to address, from data integrity and data drift to model optimization. Running a large model (language or vision) in production while keeping business requirements in mind is a different problem altogether. In this talk, I will try to explain a general framework for LLMOps and certain considerations when designing a system for inferencing with a large model. I will also give a brief overview of the current open-source tool sets so that tool-chain selection becomes a bit easier.
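To make a couple of those considerations concrete, here is a minimal, illustrative sketch using the Hugging Face transformers API; the model name and the settings shown (reduced precision, automatic device placement, a capped generation length) are assumptions for the example, not recommendations from the talk.

```python
# Illustrative sketch of a few inference-time choices that come up when serving a
# large language model: reduced precision, automatic device placement and a hard
# cap on generated tokens. The model name is a stand-in; a production LLM would
# be far larger. Assumes a GPU; drop torch_dtype/device_map to run on CPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,   # halve memory at some cost in numerical range
    device_map="auto",           # let accelerate spread layers across devices
)

inputs = tokenizer("The key considerations for LLMOps are", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```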

“How can a learnt model unlearn something” at PyData Global

2023 [Presentation] [Poster]

With the recent explosion of large language and vision models, it has become inherently very costly to train models on new data. Coupled with that, the various new data privacy laws that have been introduced, or are about to be, make the “right to be forgotten” very costly and time-consuming to implement. In this talk, we will go through the current state of research on “machine unlearning”: how a learnt model can forget something without retraining, along with a general demonstration of a machine unlearning framework.
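As a taste of the topic, the sketch below implements one naive approximate-unlearning baseline in PyTorch, a few gradient-ascent steps on the data to be forgotten; this is only one heuristic from the literature, not the framework demonstrated in the talk, and the toy model and data are placeholders.

```python
# Minimal sketch of a naive approximate-unlearning heuristic: take a trained
# model and run a few *gradient ascent* steps on the forget set, so its loss on
# those points rises. Exact unlearning would instead retrain on the retained
# data only; this is just an illustrative baseline.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def unlearn(model: nn.Module, forget_data: TensorDataset, lr: float = 1e-4, steps: int = 5):
    loader = DataLoader(forget_data, batch_size=32, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(steps):
        for x, y in loader:
            optimizer.zero_grad()
            loss = -loss_fn(model(x), y)  # negate the loss: ascend instead of descend
            loss.backward()
            optimizer.step()
    return model

# Hypothetical usage with a toy classifier and a batch of samples to forget.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
forget = TensorDataset(torch.randn(128, 20), torch.randint(0, 3, (128,)))
unlearn(model, forget)
```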

Publication

Bokde, N.D., Patil, P.K., Sengupta, S. et al. VedicDateTime: An R package to implement Vedic calendar system. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-16553-w