Apache Cloudberry 2.0.0 has been released, continuing development of the open-source code base of the Greenplum DBMS, which Broadcom turned into a closed product after acquiring VMware. This is the project's first release since the code was transferred to the Apache community. The project is currently in the Apache Incubator and is expected to become a top-level Apache project once its infrastructure and accompanying elements are ready.
Cloudberry is a distributed variant of the open-source PostgreSQL DBMS, optimized for analytical queries over large data sets (data warehousing). It uses a massively parallel processing (MPP) architecture, scaling to petabytes of data by splitting the data into segments spread across the servers of a cluster.
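The idea of splitting data into segments and processing them in parallel can be illustrated with a minimal Python sketch. It is not Cloudberry code: the segment count, the use of CRC32 as the hash, and all function names are assumptions chosen for illustration. Rows are assigned to segments by hashing a distribution key, each segment computes a partial aggregate over its own rows, and a coordinator step merges the partial results.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key to a segment index using a stable hash (CRC32 here)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Place each row on the segment selected by its distribution key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def local_sum(segment_rows, group_field, value_field):
    """Partial aggregate computed independently on one segment."""
    partial = defaultdict(int)
    for row in segment_rows:
        partial[row[group_field]] += row[value_field]
    return dict(partial)


def merge(partials):
    """Coordinator step: combine the per-segment partial aggregates."""
    total = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)


rows = [
    {"customer": "alice", "region": "eu", "amount": 10},
    {"customer": "bob", "region": "us", "amount": 20},
    {"customer": "carol", "region": "eu", "amount": 5},
    {"customer": "alice", "region": "eu", "amount": 7},
    {"customer": "dave", "region": "us", "amount": 3},
]

segments = distribute(rows, "customer")
partials = [local_sum(seg, "region", "amount") for seg in segments]
result = merge(partials)
print(result)
```

Because the per-segment sums are independent, they could run on separate servers; the merged totals (22 for "eu" and 23 for "us" in this data) do not depend on how the rows were distributed.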
Improvements in Apache Cloudberry 2.0.0 include:
- Transition to the PostgreSQL 14 code base (Greenplum was based on PostgreSQL 12).
- Support for dynamic tables, which automatically refresh stored query results; useful for real-time data analysis, data lakehouse architectures, and ETL processes.
- Optimized planning and execution of distributed queries.
- Improved resource management for memory and CPUs in component assemblies.
- Enhanced data distribution and parallel query processing.
- Expanded backup strategies in distributed environments.
- Licenses and source file formatting brought into compliance with Apache Software Foundation requirements.
- Improved build processes for the project's C++ and Python code.