Ondřej Surý, Director of DNS Engineering at ISC and previously the founder of CZ.NIC Labs, which develops the Knot DNS server, summed up the results of his experiments with using large language models to analyze, fix and modernize the BIND 9 DNS server code base, to prototype new projects, and to prepare teaching materials for students. His impression was that large language models are well suited to building prototypes quickly, making sense of unfamiliar code and automating simple routine tasks. For larger problems, however, it is doubtful that they save any time, since so much of it goes into stating the problem, studying and verifying the result, and then refining it.
In the first experiment, the AI assistant Claude Code was tasked with finding problems in the BIND 9 DNS server code base, with a focus on security issues and code modernization. None of the proposed fixes was accepted into the code base, because Claude generated code that was technically correct but practically useless. For example, the issues it flagged included the use of reserved identifiers and potential integer overflows that were already prevented by the compiler and needed no fix. The experiment was judged a waste of time.
In the second experiment, Ondřej asked Claude to write a telemetry system that integrates with different packages and minimizes metadata leaks. Claude Code produced prototypes of the client and the server, but without a proper understanding of the environment, and with problems that surfaced during testing. Google Gemini and ChatGPT were also used for review, and each model found errors in the output of the others.
The approach worked for getting a prototype quickly, but Ondřej noted that while working this way he felt like the secretary of a robot overlord. The fast initial prototype was encouraging, yet in the end the whole AI-assisted development process seemed to take longer than writing the code by hand from scratch. A lot of time went into analyzing the solutions the AI proposed, checking for meaningless changes, and reworking the result. The prototype had to be redone, because the AI-generated code turned out to be of mediocre quality and contained a large number of repeated structures.