MLOps has become a vital part of the ML lifecycle, enabling organisations to deploy, monitor and update models smoothly and efficiently. Rodrigo Masini, Lead MLOps Engineer at Tredence, has years of expertise in advanced analytics and ML. Drawing on that experience, he has built solid MLOps foundations for companies and created transformation roadmaps for them.
Masini has spent the past two years implementing new end-to-end production-level ML solutions in energy, retail, health, and finance/banking. Analytics India Magazine catches up with him to understand how Tredence helps clients integrate MLOps. He tells us what works and what doesn’t.
AIM: Scaling ML solutions is expensive for a company. How does Tredence intend to tackle the budget constraints while integrating MLOps?
Rodrigo: Tredence’s MLOps perspective combines effective project management, resource utilisation and a deep understanding of the client’s needs. MLOps encompasses domains beyond DevOps and data engineering. It isn’t just one capability that organisations must enable – it’s a living methodology evolving daily.
Our core objective is collaborating with clients and supporting them in their MLOps maturity journey. We work closely with them to identify the most cost-effective ML solutions that meet their requirements, applying MLOps best practices through a unique DS journey blueprint. In addition, we have developed powerful DS/ML accelerators and a set of exclusive implementation and management methodologies that help clients streamline ML use cases. This approach minimises unnecessary costs and ensures the client’s budget doesn’t shoot up.
Additionally, Tredence leverages its partnerships with ML cloud platforms like Databricks, SageMaker, and Vertex AI to reduce infrastructure costs while ensuring high performance. We work with any framework while offering clients the flexibility to scale their ML models.
AIM: MLOps calls for seamless interaction between the development and deployment teams. How does Tredence plan to ensure easy coordination between teams for this?
Rodrigo: Tredence builds a unified and centralised platform for collaboration and communication for its clients, which can take different forms depending on the organisation’s maturity level and use cases. For example, our experience shows that providing a single source of truth for all ML projects, including code, models, and metadata, is challenging, because not all companies have the organisational capability to maintain and manage it. However, ML Observability provides the benefit of tracking resources, data, models, and business metrics, enabling both teams to access the latest information and track the progress of each project in real time.
This has been an effective way for Tredence to ensure quick feedback between teams. For instance, an ML Observability tool gives teams visibility into their ML pipelines, validating them and proactively reducing the risk of downtime and bugs. Clients that adopted this tool also saw a substantial increase in meetings and collaborative sessions, fostering communication and keeping everyone on the same page. The benefit was a reduction of almost 80% in rework and duplicated effort between development and deployment teams.
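To make the pipeline-validation idea concrete, here is a minimal sketch of the kind of batch check an ML observability layer might run before data is promoted to scoring; the column names, thresholds and alerting behaviour are hypothetical, not Tredence’s actual tooling.

```python
# Minimal sketch of an observability-style batch check; the schema,
# thresholds and downstream alerting are assumptions for illustration.
import pandas as pd

def validate_scoring_batch(df: pd.DataFrame,
                           expected_columns: list[str],
                           max_null_rate: float = 0.02) -> list[str]:
    """Return human-readable issues; an empty list means the batch is healthy."""
    issues = []
    missing = [c for c in expected_columns if c not in df.columns]
    if missing:
        issues.append(f"missing columns: {missing}")
    if df.empty:
        issues.append("empty batch")
    else:
        null_rate = df.isna().mean().max()
        if null_rate > max_null_rate:
            issues.append(f"worst null rate {null_rate:.1%} exceeds {max_null_rate:.0%}")
    return issues

batch = pd.DataFrame({"age": [34, None], "income": [52_000, 61_000]})
problems = validate_scoring_batch(batch, expected_columns=["age", "income"])
if problems:
    print("Blocking pipeline run:", problems)  # in practice, raise an alert both teams can see
```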
Tredence also supports development and deployment teams through a Model Registry and experiment tracking. These ML components help the teams understand the processes, tools, environments, and configurations used to develop and deploy models. The Model Registry, for example, ensured those teams worked together seamlessly towards the common goal of delivering high-quality ML solutions.
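As an illustration, here is a minimal sketch of how experiment tracking and a model registry can work together, using MLflow (the open-source tool that also underpins the Databricks registry mentioned earlier); the experiment name, parameters and model name are hypothetical, and registering a model assumes a tracking server with a registry-capable backend.

```python
# Illustrative sketch: track a training run and register the resulting model
# so the deployment team gets a versioned, auditable artefact. Names are made up.
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=42)

mlflow.set_experiment("churn-model")                 # shared experiment both teams can browse
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
    mlflow.log_param("n_estimators", 100)            # configuration used to train
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(                        # requires a registry-capable tracking backend
        model, artifact_path="model",
        registered_model_name="churn-classifier",
    )
```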
AIM: Does the chain of approvals required in MLOps slow down the process? Is there a workaround for this?
Rodrigo: The chain of approvals in MLOps can lead to delays, but the process can be streamlined with the right approach. Approvals are often necessary to ensure the quality and security of the models. Still, a lengthy approval process can lead to decreased efficiency and increased time to market.
Tredence addresses this challenge by implementing an automated MLOps framework that incorporates checks and balances at various stages of the ML development lifecycle. For example, Feature Stores accelerate the model development lifecycle by sharing reusable features across models, and implement access security by attaching an access policy to the account. This helps maintain quality and meet security requirements while shortening the approval process. In addition, automated testing, continuous integration and deployment, and a streamlined review process also smooth the process.
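For illustration, here is a minimal sketch of the feature-reuse pattern using Feast, an open-source feature store; the feature view, feature names and entity key are hypothetical, and in a cloud deployment access to the underlying store would be governed by the account’s access policy rather than by this code.

```python
# Minimal feature-store sketch (Feast); feature views and entities are assumed
# to be defined in the repository at repo_path. Names are illustrative only.
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")           # repo defines the reusable feature views

# Entities and event timestamps for which we want point-in-time correct features.
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2023-01-15", "2023-01-16"]),
})

# Several models can reuse the same curated features instead of re-deriving them.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_stats:avg_basket_value",
        "customer_stats:visits_last_30d",
    ],
).to_df()
```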
Another workaround is to adapt DevOps processes to MLOps so that cross-functional teams can make decisions and take ownership of the entire process, from development to deployment.
AIM: Since security has become a primary concern within companies, how will MLOps protect data from potential breaches?
Rodrigo: In the context of data security, MLOps can help protect against potential breaches by implementing several key strategies:
- Data masking: An ML observability tool checks whether sensitive data is masked before a model consumes it. A read policy can be attached that blocks access if sensitive data is not masked (a minimal sketch follows this list).
- Access controls or ML eligibility usage: Implementing fine-grained access controls to tables limiting who has access to sensitive data and what actions they can perform with it.
- Model monitoring against ground truth: Monitoring ML models for signs of tampering, drift, or bias. This can help identify when a model is being used for malicious purposes or when it is producing inaccurate results.
- Isolated environments: Running experimentation and deployment in controlled, isolated environments restricts access to sensitive data. This can help limit the risk of a data breach.
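To illustrate the masking check in the first point, here is a minimal sketch; the column names, hashing rule and the crude unmasked-data test are hypothetical, and a production setup would enforce this through the platform’s read policies rather than application code.

```python
# Minimal sketch of a masking gate: hash sensitive fields and refuse to hand
# unmasked data to the model. Columns and the masking rule are assumptions.
import hashlib
import pandas as pd

SENSITIVE_COLUMNS = {"email", "phone"}

def mask(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def enforce_masking(df: pd.DataFrame) -> pd.DataFrame:
    """Mask sensitive columns and block access if any remain in the clear."""
    out = df.copy()
    present = SENSITIVE_COLUMNS & set(out.columns)
    for col in present:
        out[col] = out[col].astype(str).map(mask)
    unmasked = [c for c in present if out[c].str.contains("@").any()]  # crude plain-text check
    if unmasked:
        raise PermissionError(f"Blocking model access: unmasked columns {unmasked}")
    return out

scoring_input = enforce_masking(pd.DataFrame({"email": ["a@x.com"], "spend": [120.0]}))
```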
AIM: How important is standardising a process (which is still missing) while developing and deploying AI models in enterprises?
Rodrigo: According to a study by McKinsey, organisations that standardise their ML development process can expect a 40-60% improvement in time to deployment, a 25-40% reduction in model development costs and a 20-50% improvement in model accuracy.
Standardising the framework that companies follow across ML stages brings the following benefits:
- Consistency: Standardisation provides consistent development and deployment across different projects and teams, leading to a more predictable and streamlined experience. A subsequent benefit is improved quality, leading to higher-quality models with fewer errors.
- Increased efficiency: By standardising the ML development lifecycle and the MLOps process, teams can automate many manual steps. This frees up valuable time and resources for other initiatives and increases scalability, allowing you to expand from hundreds to thousands of models in production.
- Better governance: Standardisation can help ensure that ML use cases comply with relevant regulations and guidelines, cutting the risk of legal and reputational issues.
However, ironically, a new challenge arose with the adoption of standardisation. Data scientists want the flexibility to try and experiment with tools and frameworks without constraints; that is the fun part. I keep hearing from junior ML developers about ML projects that have been shut down because, when we set standards, we restrict them.
Companies have different ML team sizes, skill sets, maturity levels, and budgets, and depending on these, the standardisation process can focus first on data and then on code. This is a typical pattern where organisations have robust DataOps and want to apply the same standards to the ML lifecycle. However, some tools, like Airflow, can be used in MLOps despite not being appropriate for it; Airflow can prevent ML teams from achieving the full potential of their ML use cases. The same goes for AutoML platforms. No-code/low-code ML platforms bring several hidden constraints and guardrails, and their lack of transparency blocks standardisation across the ML system.
The need for standardisation is one of the reasons I keep saying that MLOps is not a capability to be implemented; instead, it is a long journey that demands internal efforts and executive buy-in to add value to the business for the long term.