Study of the impact of AI assistants such as Copilot on code security

A group of researchers from Stanford University studied how the use of AI coding assistants affects the appearance of vulnerabilities in code. The study covered tools built on the OpenAI Codex machine-learning platform, such as GitHub Copilot, which can generate fairly complex blocks of code, up to complete ready-made functions. The concern is that, because the model is trained on real code from public GitHub repositories, including code that contains vulnerabilities, the synthesized code may repeat those mistakes and suggest code containing vulnerabilities.

The study involved 47 volunteers with varying levels of programming experience, from students to professionals with ten years of experience. The participants were divided into two groups: an experimental group (33 people) and a control group (14 people). Both groups had access to any libraries and Internet resources, including ready-made examples on Stack Overflow; the experimental group was additionally allowed to use the AI assistant.

Each participant was given five coding tasks in which it is easy to make mistakes that lead to vulnerabilities. For example, the tasks included writing encryption and decryption functions, using digital signatures, processing data involved in handling file paths or building SQL queries, manipulating large numbers in C code, and handling input that is displayed in web pages. To assess how the programming language influences the security of code produced with AI assistants, the tasks were prepared for Python, C, and JavaScript. A sketch of one such pitfall is shown below.
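As a purely illustrative sketch of the kind of pitfall these tasks probe (not code from the study; the table name and helper functions are hypothetical, and Python is used here only for brevity), the SQL task can be solved either by splicing user input into the query text, which enables SQL injection, or with a parameterized query:

    import sqlite3

    def add_student_unsafe(conn: sqlite3.Connection, name: str, age: int) -> None:
        # Vulnerable: user-controlled 'name' becomes part of the SQL text itself,
        # so a value containing a quote character can rewrite the statement.
        conn.execute(f"INSERT INTO students (name, age) VALUES ('{name}', {age})")

    def add_student_safe(conn: sqlite3.Connection, name: str, age: int) -> None:
        # Parameterized query: the driver passes the values as data, not as SQL code.
        conn.execute("INSERT INTO students (name, age) VALUES (?, ?)", (name, age))

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
    add_student_safe(conn, "Alice", 20)

Both versions behave identically on well-formed input, which is exactly why the unsafe variant is easy to overlook without a careful review.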

In the end, the participants who used the AI assistant based on the codex-davinci-002 model produced noticeably less secure code than the participants who worked without it. Overall, only 67% of the group that used the AI assistant were able to provide correct and secure code, while in the other group this figure was 79%.

At the same time, self-assessment showed the opposite picture: the participants who used the AI assistant believed their code was more secure than the participants from the other group believed theirs to be. It was also noted that the participants who trusted the AI assistant less, spent more time analyzing the suggestions it produced, and made changes to them introduced fewer vulnerabilities into their code.

For example, code copied from cryptographic libraries contained safer default parameter values than the code proposed by the AI assistant. When the AI assistant was used, the researchers also recorded the choice of less reliable encryption algorithms and missing authentication checks on returned values. In the task involving number manipulation in C, the code written with the AI assistant contained more errors leading to integer overflow.
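For illustration only (a minimal sketch using Python's cryptography package, not the participants' code), the contrast between a high-level API with safe defaults and a low-level API invoked with a weak mode might look like this:

    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    # High-level API with safe defaults: authenticated encryption (AES-CBC + HMAC),
    # with a freshly generated key and a random IV chosen by the library.
    key = Fernet.generate_key()
    token = Fernet(key).encrypt(b"secret message")
    assert Fernet(key).decrypt(token) == b"secret message"

    # Low-level API misused: a hard-coded key and ECB mode, which leaks plaintext
    # patterns and provides no integrity protection -- the kind of weaker algorithm
    # and parameter choice described above.
    raw_key = b"0" * 32                      # hard-coded key, for illustration only
    encryptor = Cipher(algorithms.AES(raw_key), modes.ECB()).encryptor()
    ciphertext = encryptor.update(b"16-byte aligned!") + encryptor.finalize()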

Source: media reports.