The Finnish company Flow Computing has developed a new approach to improving processor performance, intended to restore the CPU's key role in modern computing devices.
Flow Computing proposes abandoning the usual architectures built from many identical cores and replacing them with a hybrid system that combines standard cores with dedicated Parallel Processing Units (PPUs), which, according to the company, should deliver higher performance than existing solutions.
Instead of the traditional design with several identical cores, the new architecture places 4 standard cores and 64 PPU cores in the same chip area. This approach is meant to raise efficiency on tasks that can be executed in parallel. The Flow Computing team presented the concept of its architecture at the IEEE Hot Chips conference in August.
PPUs accelerate the execution of parallel tasks in cases where a standard CPU cannot keep up and handing the work to a graphics processor (GPU) would be too costly. The company notes that the technology can speed up even small workloads that were previously considered unsuitable for parallelization because of the overhead of distributing and synchronizing them.
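The trade-off described above can be illustrated with a simple cost model: total time is a fixed dispatch and synchronization overhead plus the work divided across the available parallel lanes. This is only a sketch under stated assumptions; the hypothetical targets `cpu_core`, `on_chip_ppu` and `discrete_gpu`, and all of their lane counts and overhead figures, are invented for illustration and do not come from Flow Computing or any real hardware.

```python
# Hypothetical cost model: every figure below is an illustrative assumption,
# not a measurement of Flow Computing's PPU or of any real GPU.

def exec_time(work_units: float, lanes: int, overhead: float,
              unit_cost: float = 1.0) -> float:
    """Fixed dispatch/synchronization overhead plus work split across lanes."""
    return overhead + work_units * unit_cost / lanes


# Assumed parameters per target: (parallel lanes, fixed overhead in time units)
TARGETS = {
    "cpu_core": (1, 0.0),               # run in place: no offload cost
    "on_chip_ppu": (64, 50.0),          # assumed small on-chip dispatch cost
    "discrete_gpu": (4096, 20_000.0),   # assumed large transfer + launch cost
}


def best_target(work_units: float) -> str:
    """Pick the target with the lowest modeled execution time."""
    return min(TARGETS, key=lambda name: exec_time(work_units, *TARGETS[name]))


for n in (10, 1_000, 100_000, 10_000_000):
    print(f"{n:>10} work units -> {best_target(n)}")
```

Under these assumed numbers, a 10-unit task stays on the CPU core, mid-sized tasks (1,000 to 100,000 units) go to the low-overhead on-chip PPU, and only very large tasks justify the GPU's fixed cost.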
Flow Computing says that a single computer architecture is hard to optimize for sequential and parallel tasks at the same time. The company's design therefore splits the work: sequential tasks are handled by standard CPU cores and parallel tasks by PPU cores, so that each type of core plays to its strengths.
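Why this split matters can be quantified with Amdahl's law: the serial part of a program runs on a standard core, and only the parallel part benefits from the PPU cores. The sketch below assumes, hypothetically, that each of the 64 PPU cores runs parallel work at the same speed as one standard core (`ppu_core_speed = 1.0`) and that dispatch overhead is zero; real PPU cores may be faster or slower per thread.

```python
def amdahl_speedup(parallel_fraction: float, ppu_cores: int = 64,
                   ppu_core_speed: float = 1.0) -> float:
    """Speedup over running everything on one standard core.

    The serial fraction runs on one standard core (relative speed 1.0); the
    parallel fraction is spread evenly over `ppu_cores`, each running at the
    assumed relative speed `ppu_core_speed`. Overheads are ignored.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial_time = 1.0 - parallel_fraction
    parallel_time = parallel_fraction / (ppu_cores * ppu_core_speed)
    return 1.0 / (serial_time + parallel_time)


for p in (0.5, 0.9, 0.95, 0.99):
    print(f"parallel fraction {p:.2f}: {amdahl_speedup(p):.1f}x")
```

Even with 64 PPU cores, a program that is 95% parallel reaches only about 15x in this model, which is why the sequential portion still needs a capable standard core.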
To optimize parallel data processing, the PPU architecture targets four key requirements:
- Reducing memory-access latency, that is, finding ways to avoid sitting idle while the next piece of data is loaded from memory;
- Providing enough bandwidth for communication between threads, the streams of processor instructions that execute in parallel;
- Efficient synchronization, that is, ensuring that the parallel parts of the code execute in the correct order;
- Exploiting low-level parallelism, that is, the ability to use several functional units that actually perform arithmetic and logic operations at the same time.
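The first requirement is commonly addressed with hardware multithreading: when one thread stalls on memory, another thread runs. The toy model below illustrates only that effect; it is not Flow Computing's design, and it assumes a single execution unit, zero-cost thread switching, and hypothetical figures of 4 compute cycles between memory requests and 100 cycles of memory latency.

```python
def utilization(threads: int, compute: int = 4, latency: int = 100,
                bursts: int = 20) -> float:
    """Fraction of cycles a single execution unit does useful work.

    Each thread runs `bursts` times: `compute` cycles of work, then a memory
    request that blocks it for `latency` cycles. One instruction issues per
    cycle. The running thread keeps the unit until it blocks; the scheduler
    then switches (at no cost) to the next ready thread in round-robin order.
    """
    if threads < 1:
        raise ValueError("need at least one thread")
    work_left = [compute] * threads    # compute cycles left in current burst
    ready_at = [0] * threads           # cycle at which the thread may run again
    bursts_left = [bursts] * threads
    cycle = busy = current = 0
    while any(bursts_left):
        chosen = None
        for offset in range(threads):  # round-robin search for a ready thread
            t = (current + offset) % threads
            if bursts_left[t] and ready_at[t] <= cycle:
                chosen = t
                break
        if chosen is not None:         # otherwise the unit idles this cycle
            busy += 1
            work_left[chosen] -= 1
            if work_left[chosen] == 0:  # burst done: issue a memory request
                bursts_left[chosen] -= 1
                work_left[chosen] = compute
                ready_at[chosen] = cycle + 1 + latency
            current = chosen
        cycle += 1
    return busy / cycle


for n in (1, 8, 26, 32):
    print(f"{n:2d} threads: {utilization(n):.0%} busy")
```

With these assumed parameters, one thread keeps the unit busy only about 4% of the time; once there are about (4 + 100) / 4 = 26 threads, another thread is always ready when the current one stalls and the unit stays fully busy.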
Flow Computing has developed an architecture designed to meet these requirements.
The PPU uses multithreading to hide memory-access latency. When one thread requests data from memory, another thread can begin executing while the first