- Greenplum Database performs best with a denormalized schema design suited for MPP analytical processing. For example, Star or Snowflake schema, with large fact tables and smaller dimension tables.
- Use the same datatype for columns used in joins between tables.
- Set vm.overcommit_memory to 2.
- Do not configure the operating system to use a large number of pages.
- Use gp_vmem_protect_limit to set the maximum memory that the instance can allocate for all work being done in each segment database.
- You can use gp_vmem_protect_limit by calculating:
- gp_vmem – the total memory available to Greenplum Database
gp_vmem = ((SWAP + RAM) – (7.5 GB + 0.05 * RAM)) / 1.7
Where SWAP is the host swap space in GB, and RAM is the host RAM in GB
- max_acting_primary_segments – the maximum number of primary segments that could be running on a host when mirror segments are activated due to a host or segment failure
- gp_vmem_protect_limit
gp_vmem_protect_limit = gp_vmem / acting_primary_segments
Convert to MB to set the value of the configuration parameter.
- In a scenario where many work files are generated calculate the gp_vmem factor with the following formula to account for the work files:
gp_vmem = ((SWAP + RAM) – (7.5GB + 0.05 * RAM - (300KB *
total_#_workfiles))) / 1.7
- Never set gp_vmem_protect_limit too high or larger than the physical RAM on the system.
- Use the calculated gp_vmem value to calculate the setting for the vm.overcommit_ratio operating system parameter:
vm.overcommit_ratio = (RAM - 0.026 * gp_vmem) / RAM
- Use the statement_mem to allocate the memory used for a query per segment database.
- Use the resource queues to set the numbers of active queries (ACTIVE_STATEMENTS) and the amount of memory (MEMORY_LIMIT) that can be used by queries in the queue.
- Associate all the users with a resource queue. Do not use the default queue.
- Set the priority to match the real needs of the queue for the workload and time of the day. Avoid using MAX priority.
- Ensure that the resource queue memory allocations do not exceed the setting for gp_vmem_protect_limit.
- Dynamically update the resource queue settings to match the daily operations flow.
Refer to VMware Greenplum documentation to get the latest recommendations and best practices for Greenplum.