Microsoft Unveils New Features to Safeguard AI

Microsoft announced the development of new technologies aimed at combating threats related to hacking of AI systems. The AI Spotlighting and AI WatchDOG functions are designed to protect against two types of attacks: the introduction of malicious instructions and the use of “poisoned” content.

Two new approaches to security

  • AI Spotlighting shares user instructions from harmful content, enabling the system to analyze data without processing hidden threats.
  • AI WatchDOG functions as a watchdog, identifying hostile instructions and thwarting attempts to hack the system.

Furthermore, Microsoft has introduced a new tool for AI researchers and security professionals – Pyrit (Python Risk Identification Toolkit). This toolkit helps in proactively identifying risks and vulnerabilities in AI systems.

Attack scenarios and methods for neutralization

An attacker can employ two main methods to attack AI: manipulating user requests and injecting malware.

In the first scenario, the attacker can provide harmful instructions through a user request. In the second scenario, the attacker can trick the AI into processing seemingly harmless documents that contain hidden instructions. For example, when analyzing a “poisoned” email, AI may expose passwords or leak confidential information unknowingly to the user.

Microsoft warns that attacks using “poisoned content” have a success rate of over 20%. Spotlighting reduces this rate to below the detection threshold, maintaining the overall performance of AI systems.

Multi-level defense and Crescendo attacks

To enhance defense, Microsoft has developed a query filtering system that analyzes the entire history of interactions with AI to detect potential threats.

This filtering system aims to protect against a new type of attack on AI, known as CressCendo. CressCendo deceives the model into generating malicious content by using its own responses. Through carefully crafted questions or cues that gradually lead AI to the desired outcome, attackers can bypass defenses and filters in less than 10 interactions.

The company stresses that safeguarding against sequential requests that may seem harmless individually but collectively compromise protective measures is crucial for ensuring AI system safety. According to Microsoft, the measures implemented significantly reduce the risk of successful attacks, bolstering the security of systems amidst constantly evolving cyber threats.

/Reports, release notes, official announcements.