Durable Memory Tuning

Tuning Durable Memory and Operating System for Best Performance and Throughput

❗️

This is legacy Apache Ignite documentation

The new documentation is hosted here: https://ignite.apache.org/docs/latest/

This section comprises performance suggestions and tuning parameters for Durable Memory and Ignite Native Persistence. General configuration parameters are listed in the Memory Configuration chapter of this documentation.

General Tuning

This section encompasses general recommendations for proper Durable Memory tuning, regardless of whether Ignite is used in pure in-memory mode or with persistence enabled.

Adjust Swappiness Settings

An operating system might start swapping pages from RAM to disk when overall RAM usage hits a threshold. Swapping can significantly affect the performance of the Ignite node's process. You can adjust the operating system's settings to prevent this. If you are on Unix, the best option is to either decrease the vm.swappiness parameter to 10, or set it to 0 if Ignite Native Persistence is enabled:

sysctl -w vm.swappiness=0
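To make the change persistent across reboots, you can also add the setting to /etc/sysctl.conf (assuming a typical Linux setup):

vm.swappiness=0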

Share RAM

The operating system and other applications need some portion of RAM to fulfill their tasks. In addition, consider allocating some memory for the Java heap, which is actively used by the Ignite node to process queries and tasks issued by your applications. See the Java heap tuning recommendations for more details.

Overall, if Ignite is used in the pure in-memory mode (no persistence), then you should not give more than 90% of the RAM to Ignite's durable memory.

If Ignite Native Persistence is enabled, then the operating system requires extra RAM for its page cache in order to optimally sync up the data to disk. Assuming this usage scenario, an Ignite node's overall memory usage (durable memory + Java heap) should not exceed 70% of the RAM.

For instance, the configuration below shows how to set 4 GB of RAM for Ignite durable memory requirements:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">

  <!-- Redefining maximum memory size for the cluster node usage. -->
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <property name="defaultDataRegionConfiguration">
        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
          <!-- Setting the size of the default region to 4GB. -->
          <property name="maxSize" value="#{4L * 1024 * 1024 * 1024}"/>
        </bean>
      </property>
    </bean>
  </property>

  <!-- The rest of the parameters. -->
</bean>
IgniteConfiguration cfg = new IgniteConfiguration();

// Changing total RAM size to be used by Ignite Node.
DataStorageConfiguration storageCfg = new DataStorageConfiguration();

// Setting the size of the default memory region to 4GB to achieve this.
storageCfg.getDefaultDataRegionConfiguration().setMaxSize(
    4L * 1024 * 1024 * 1024);

cfg.setDataStorageConfiguration(storageCfg);

// Starting the node.
Ignition.start(cfg);

JVM Tuning

If you use Native Persistence, we recommend that you set the MaxDirectMemorySize JVM parameter to walSegmentSize * 4. With the default WAL settings, this value amounts to 256 MB.
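For example, with the default 64 MB WAL segment size, you could pass the following JVM option when starting the node:

-XX:MaxDirectMemorySize=256m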

Native Persistence Related Tuning

This section comprises recommendations to consider if Ignite Native Persistence is enabled.

Page Size

Ignite's page size (DataStorageConfiguration.pageSize) should be no less than the page size of your storage device (SSD, Flash, etc.) and the cache page size of your operating system.

The operating system's cache page size can be easily checked using system tools and parameters.
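On Linux, for example, the page cache uses the system memory page size, which you can look up with:

getconf PAGESIZE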

The page size of a storage device such as an SSD is usually specified in the device specification. If the manufacturer does not disclose this information, try running SSD benchmarks to figure out the number. If it is still difficult to determine, use 4 KB as Ignite's page size. Many manufacturers have to adapt their drivers for 4 KB random-write workloads because a variety of standard benchmarks use 4 KB by default. This white paper from Intel confirms that 4 KB should be enough.

Once you pick the most optimal page size, apply it to the cluster configuration:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <!-- Set the page size to 4 KB -->
      <property name="pageSize" value="#{4 * 1024}"/>
    </bean>
  </property>
  
  <!-- Additional settings -->
</bean>
// Ignite configuration.
IgniteConfiguration cfg = new IgniteConfiguration();

// Durable memory configuration.
DataStorageConfiguration storageCfg = new DataStorageConfiguration();

// Changing the page size to 4 KB.
storageCfg.setPageSize(4096);

// Applying the new configuration.
cfg.setDataStorageConfiguration(storageCfg);

Separate Disk Device for WAL

Consider using separate drives for the partition and index files of Ignite Native Persistence and for its WAL. Ignite actively writes to both the partition/index files and the WAL; thus, by placing them on separate physical disk devices, you can double the overall write throughput. The example below shows how to achieve this:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
   ...		
  <!-- Enabling Ignite Native Persistence. -->
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <!--
          Sets a path to the root directory where data and indexes are
to be persisted. It's assumed the directory is on a separate SSD.
      -->
      <property name="storagePath" value="/var/lib/ignite/persistence"/>

      <!--
          Sets a path to the directory where WAL is stored.
          It's assumed the directory is on a separate HDD.
      -->
      <property name="walPath" value="/wal"/>

      <!--
          Sets a path to the directory where WAL archive is stored.
          The directory is on the same HDD as the WAL.
      -->
      <property name="walArchivePath" value="/wal/archive"/>
    </bean>
  </property>
    ...
</bean>
IgniteConfiguration cfg = new IgniteConfiguration();

// Configuring Ignite Native Persistence.
DataStorageConfiguration storeCfg = new DataStorageConfiguration();

// Sets a path to the root directory where data and indexes are to be persisted.
// It's assumed the directory is on a separate SSD.
storeCfg.setStoragePath("/var/lib/ignite/persistence");

// Sets a path to the directory where WAL is stored.
// It's assumed the directory is on a separate HDD.
storeCfg.setWalPath("/wal");

// Sets a path to the directory where WAL archive is stored.
// The directory is on the same HDD as the WAL.
storeCfg.setWalArchivePath("/wal/archive");

// Applying the storage configuration.
cfg.setDataStorageConfiguration(storeCfg);

// Starting the node.
Ignition.start(cfg);

Increasing WAL Segment Size

The default WAL segment size (64 MB) may be inefficient in high-load scenarios because it causes the WAL to switch between segments too frequently, and switching is a somewhat costly operation (see how write-ahead logging works in the Write-Ahead Log section). Setting the segment size to a higher value (up to 2 GB) may help reduce the number of switching operations. However, this will increase the overall volume of the write-ahead log.
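A minimal sketch of increasing the segment size is shown below; the 1 GB value is illustrative, and DataStorageConfiguration.setWalSegmentSize accepts the size in bytes:

IgniteConfiguration cfg = new IgniteConfiguration();

DataStorageConfiguration storageCfg = new DataStorageConfiguration();

// Increasing the WAL segment size from the default 64 MB to 1 GB (value in bytes).
storageCfg.setWalSegmentSize(1024 * 1024 * 1024);

cfg.setDataStorageConfiguration(storageCfg);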

Changing WAL Mode

Consider other WAL modes as alternatives to the default mode. Each mode provides a different degree of reliability in case of node failure, and that degree is inversely proportional to speed: the more reliable the WAL mode, the slower it is. Therefore, if your use case does not require high reliability, you can switch to a less reliable mode.
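For example, a minimal sketch of switching to a less strict mode (BACKGROUND is used here only for illustration; WALMode refers to the org.apache.ignite.configuration.WALMode enum):

IgniteConfiguration cfg = new IgniteConfiguration();

DataStorageConfiguration storageCfg = new DataStorageConfiguration();

// Switching the WAL to a less reliable but faster mode.
storageCfg.setWalMode(WALMode.BACKGROUND);

cfg.setDataStorageConfiguration(storageCfg);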

See WAL Modes for more details.

WAL Deactivation

There are situations, such as initial bulk data loading, where temporarily disabling the WAL for a cache can help improve write performance. Keep in mind that while the WAL is disabled, data consistency is not guaranteed if a node fails.
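A minimal sketch of disabling and re-enabling the WAL for a specific cache via the cluster API (the cache name "myCache" is hypothetical):

// Assuming 'ignite' is a started Ignite instance.
// Disabling the WAL for the cache before a bulk data load.
ignite.cluster().disableWal("myCache");

// ... perform the bulk data load ...

// Re-enabling the WAL once the load is complete.
ignite.cluster().enableWal("myCache");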

Pages Write Throttling

Ignite periodically starts the checkpointing process that syncs dirty pages from memory to disk. This process happens in the background without affecting the application's logic.

However, if a dirty page that is scheduled for checkpointing is updated before being written to disk, its previous state is copied to a special region called the checkpointing buffer. If the buffer overflows, Ignite stops processing all updates until the checkpointing is over. As a result, write performance can drop to zero, as shown in this diagram:

[Diagram: update performance dropping to zero when the checkpointing buffer overflows]

The same situation occurs if the dirty pages threshold is reached again while checkpointing is in progress. This forces Ignite to schedule one more checkpointing execution and to halt all update operations until the first checkpointing is over.

Both situations usually arise when either the disk device is slow or the update rate is too intensive. To mitigate and prevent these performance drops, consider enabling the pages write throttling algorithm. The algorithm slows update operations down to the speed of the disk device whenever the checkpointing buffer fills up too fast or the percentage of dirty pages grows rapidly.

👍

Pages Write Throttling in a Nutshell

Refer to the Ignite wiki page maintained by Apache Ignite persistence experts to get more details about the throttling and its causes.

The example below shows how to enable write throttling:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
   ...		
  <!-- Enabling Ignite Native Persistence. -->
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <!-- Enable write throttling. -->
      <property name="writeThrottlingEnabled" value="true"/>
    </bean>
  </property>
    ...
</bean>
IgniteConfiguration cfg = new IgniteConfiguration();

// Configuring Ignite Native Persistence.
DataStorageConfiguration storeCfg = new DataStorageConfiguration();

// Enabling write throttling.
storeCfg.setWriteThrottlingEnabled(true);

// Applying the storage configuration.
cfg.setDataStorageConfiguration(storeCfg);

// Starting the node.
Ignition.start(cfg);

Checkpointing Buffer Size

The size of the checkpointing buffer, explained in the section above, is one of the checkpointing process triggers.

The default buffer size is calculated as a function of the data region size:

Data Region Size           Default Checkpointing Buffer Size
< 1 GB                     MIN(256 MB, Data_Region_Size)
Between 1 GB and 8 GB      Data_Region_Size / 4
> 8 GB                     2 GB

The default buffer size can be suboptimal for some write-intensive workloads because the pages write throttling algorithm will slow down your writes whenever the buffer usage approaches the critical mark. To keep write performance at the desired pace while checkpointing is in progress, consider increasing DataRegionConfiguration.checkpointPageBufferSize and enabling write throttling to prevent performance drops:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
   ...		
  <!-- Enabling Ignite Native Persistence. -->
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <!-- Enable write throttling. -->
      <property name="writeThrottlingEnabled" value="true"/>
      
      <property name="defaultDataRegionConfiguration">
        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
          <!-- Enabling persistence. -->
          <property name="persistenceEnabled" value="true"/>
                
          <!-- Increasing the buffer size to 1 GB. -->
          <property name="checkpointPageBufferSize" 
                    value="#{1024L * 1024 * 1024}"/>
        </bean>
      </property>
    </bean>
  </property>
    ...
</bean>
IgniteConfiguration cfg = new IgniteConfiguration();

// Configuring Ignite Native Persistence.
DataStorageConfiguration storeCfg = new DataStorageConfiguration();

// Enabling write throttling.
storeCfg.setWriteThrottlingEnabled(true);

// Increasing the buffer size to 1 GB.
storeCfg.getDefaultDataRegionConfiguration().setCheckpointPageBufferSize(
  1024L * 1024 * 1024);

// Enabling Ignite Native Persistence for the default data region.
storeCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

// Applying the storage configuration.
cfg.setDataStorageConfiguration(storeCfg);

// Starting the node.
Ignition.start(cfg);

In the example above, the checkpointing buffer size of the default region is set to 1 GB.

📘

When is the Checkpointing Process Triggered?

Checkpointing starts when either the number of dirty pages exceeds the totalPages * 2 / 3 value or the DataRegionConfiguration.checkpointPageBufferSize limit is reached. However, if pages write throttling is enabled, DataRegionConfiguration.checkpointPageBufferSize is never taken into account because the algorithm prevents the buffer from ever being filled up that far.

Enabling Direct I/O

Typically, when an application requests data from disk, the OS fetches the data and stores it in the file buffer cache. Similarly, for every write operation, the OS first writes the data to the cache and then transfers it to disk. To eliminate this intermediate step, you can enable Direct I/O, in which case the data is read and written directly from/to the disk, bypassing the file buffer cache.

The Direct I/O plugin in Ignite is used for the checkpointing process, where dirty pages in RAM are written to disk. It is recommended to use the Direct I/O plugin for write-intensive or mixed workloads.

Note that Direct I/O cannot be enabled specifically for the WAL files. However, enabling the Direct I/O plugin provides a slight benefit regarding the WAL files: the WAL data will not be stored in the OS buffer cache for too long; it will be flushed (depending on the WAL mode) at the next page cache scan and removed from the page cache.

To enable the Direct I/O plugin, add ignite-direct-io-2.4.0.jar and jna-4.5.0.jar to the classpath of your application. These jar files are available in the libs/optional/ignite-direct-io folder of your Ignite distribution. Alternatively, when running a stand-alone node, you can copy the ignite-direct-io folder to the libs folder of your Ignite distribution before running the ignite.{sh|bat} script.

To disable Direct I/O, set the Ignite system property IGNITE_DIRECT_IO_ENABLED to false.
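For example, the property can be passed as a JVM argument when starting the node:

-DIGNITE_DIRECT_IO_ENABLED=false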

See the Ignite Direct I/O Wiki section for more details.

Purchase Production-Level SSDs

Note that the performance of Ignite Native Persistence may drop after several hours of intensive write load due to the nature of how SSDs operate. Consider buying fast production-level SSDs to keep performance at a high level over the long term.

SSD Over-provisioning

Performance of random writes on a 50% filled disk is much better than on a 90% filled disk because of SSD over-provisioning. Consider buying SSDs with a higher over-provisioning rate and make sure the manufacturer provides tools to adjust it.

👍

Intel 3D XPoint

Consider using 3D XPoint drives instead of regular SSDs to avoid the bottlenecks caused by a low over-provisioning setting and constant garbage collection at the SSD level. Read more here.