Malicious AI Models Found in Hugging Face Repository

Researchers Uncover Malicious Machine Learning Models

Researchers from JFrog have uncovered malicious machine learning models on Hugging Face that can execute attacker-supplied code and give an attacker control over the user's system. The issue stems from model distribution formats that allow executable code to be embedded. For instance, models in the "pickle" format can carry serialized Python objects whose code runs when the file is loaded. TensorFlow Keras models are also vulnerable, as they can be manipulated through Lambda layers.
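
The pickle risk can be illustrated with a few lines of standard-library Python. This is a minimal sketch, not code from the reported models: the class name and the harmless print payload are illustrative stand-ins for what an attacker would replace with os.system or a network callback.

```python
# Minimal sketch of why pickle-based model formats are dangerous:
# unpickling runs whatever callable __reduce__ returns.
import pickle

class BenignLookingObject:
    def __reduce__(self):
        # The (callable, args) tuple returned here is invoked by
        # pickle.loads() on the victim's machine, before any model
        # weights are ever used. Here the payload only prints a message.
        return (print, ("code executed during unpickling!",))

payload = pickle.dumps(BenignLookingObject())
pickle.loads(payload)  # prints the message -- arbitrary code ran on load
```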

In response to this threat, Hugging Face scans uploaded models for malicious code embedded in serialized objects, but the identified malicious models were able to bypass the existing checks. Approximately 100 potentially harmful models were found, about 95% of them built on the PyTorch framework and 5% on TensorFlow. Common malicious payloads include object hijacking, reverse shell creation, application launches, and file writes.
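
A simplified sketch of how such scanning can work is shown below. This is not Hugging Face's actual scanner; the module safelist and function name are assumptions. The idea is to walk the pickle opcode stream with Python's pickletools, without ever deserializing the file, and flag globals that resolve to dangerous modules.

```python
# Hypothetical static scan of a pickle file using only the standard library.
import pickletools

# Illustrative denylist; a real scanner would be far more thorough.
SUSPICIOUS_MODULES = {"os", "subprocess", "builtins", "posix", "nt", "socket"}

def scan_pickle(path: str) -> list[str]:
    """Return suspicious global references found in the pickle opcode stream."""
    findings = []
    with open(path, "rb") as f:
        data = f.read()
    for opcode, arg, _pos in pickletools.genops(data):
        # GLOBAL pulls in an arbitrary callable by "module name";
        # REDUCE then calls it during unpickling.
        if opcode.name in ("GLOBAL", "INST") and isinstance(arg, str):
            module = arg.split()[0]
            if module in SUSPICIOUS_MODULES:
                findings.append(f"{opcode.name} -> {arg!r}")
        elif opcode.name == "STACK_GLOBAL":
            # Module and name are resolved from the stack at runtime,
            # so flag it for manual review.
            findings.append("STACK_GLOBAL (target resolved at runtime)")
    return findings
```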

Many of the malicious models appear to have been created by security researchers seeking rewards for uncovering vulnerabilities and evading Hugging Face's security measures. Some models perform harmless actions, such as launching a calculator or sending network requests, merely to demonstrate that the attack succeeded. However, there are also models that establish a reverse shell, giving the attacker remote access to the system.

For example, models such as "Baller423/Goober2" and "Star23/Baller13" target systems that download PyTorch model files and load them with the torch.load() function. These models abuse the "__reduce__" method, which the pickle module invokes during deserialization, to inject arbitrary Python code while the model is being loaded.
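
On the defensive side, a minimal sketch of how downstream users can harden this loading path is shown below, assuming a recent PyTorch release (the weights_only flag was added in PyTorch 1.13) and a placeholder file name that is not one of the reported repositories.

```python
# Hedged sketch: restricting torch.load() to a safelist of tensor types.
import torch

# Unsafe pattern: full pickle deserialization, so any embedded
# __reduce__ payload executes during loading.
# state_dict = torch.load("model.pt")

# Safer pattern: arbitrary callables in the pickle stream raise an
# error instead of executing.
state_dict = torch.load("model.pt", weights_only=True)
```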

