Java High Performance Read Patterns

Read Into New
Read Into Existing
Read Out Of
Navigator

Jakob Jenkov
Last update: 2016-03-14

How your Java application reads data can have a big impact on its read performance. In this article I will describe a few different read patterns and explain their performance characteristics.

Read Into New

The first Java read pattern is the read-into-new pattern. This is what you would typically learn in university as the "right" way of reading data.

The read-into-new pattern consists of a read method which reads some kind of data and returns a new data structure containing the data that was read. First, here is a simple example data structure:

public class MyData {
    public int val1 = 0;
    public int val2 = 0;
}

Here is an example read method which reads data into a MyData object:

public MyData readMyData(byte[] source) {
    MyData myData = new MyData();

    myData.val1 = source[0];
    myData.val2 = source[1];

    return myData;
}

As you can see, the readMyData() method returns a MyData object. First a MyData object is created. Second, the readMyData() method reads data into the MyData object. Third, the MyData object is returned to the calling code.

What is worth noting about this read pattern is that every time you call the readMyData() method a new MyData object is returned. That is why the pattern is called read-into-new, meaning data is read into a new object.

If the readMyData() method is called frequently that will lead to a lot of MyData objects being created. That puts pressure on the object allocation system and the garbage collector. This results in lower performance, and possibly longer garbage collection pauses from time to time.

Another disadvantage of the read-into-new pattern is that the objects may end up scattered across very different areas of the computer's memory. This lowers the chance that a given object is already in the CPU cache when it is accessed.
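To make the allocation behaviour concrete, here is a minimal, self-contained sketch. The wrapper class ReadIntoNewExample is a hypothetical name used only for illustration; MyData and readMyData() mirror the example above. It shows that two calls with the same input still produce two separate objects:

```java
// Hypothetical demo class; MyData and readMyData() mirror the article's example.
class ReadIntoNewExample {

    static class MyData {
        int val1 = 0;
        int val2 = 0;
    }

    // Read-into-new: a fresh MyData is allocated on every call.
    static MyData readMyData(byte[] source) {
        MyData myData = new MyData();
        myData.val1 = source[0];
        myData.val2 = source[1];
        return myData;
    }

    public static void main(String[] args) {
        byte[] source = {1, 2};

        MyData first  = readMyData(source);
        MyData second = readMyData(source);

        // Same input, but two distinct objects on the heap.
        System.out.println(first == second); // prints false
    }
}
```

Every one of these objects eventually has to be reclaimed by the garbage collector, which is where the cost described above comes from.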

Read Into Existing

The read-into-existing pattern reads data into an existing object instead of creating a new object for every call to the read method. This means that the same object can be reset and reused for multiple calls to the read method. Here is how the earlier readMyData() method would look using the read-into-existing pattern:

public MyData readMyData(byte[] source, MyData myData) {

    myData.val1 = source[0];
    myData.val2 = source[1];

    return myData;
}

The main difference between the previous version and this version is that this version takes the MyData object to read the data into as a parameter. It is now up to the caller of the readMyData() method to decide if an existing MyData instance should be reused, or if a new instance should be created.

Reusing a MyData instance rather than creating a new one will save time and memory compared to always creating a new instance. It will also lower the pressure on the Java garbage collector, so the risk of long garbage collection pauses is reduced.

Reusing an object also means that the chance the object is located in the CPU cache is much higher than when you create a new object for each call to the readMyData() method.
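As a rough illustration of the caller's side, the sketch below reuses a single MyData instance across several byte arrays. The wrapper class ReadIntoExistingExample and the helper sumAll() are hypothetical names introduced only for this example; readMyData() is the method shown above.

```java
// Hypothetical demo class; readMyData() mirrors the article's example.
class ReadIntoExistingExample {

    static class MyData {
        int val1 = 0;
        int val2 = 0;
    }

    // Read-into-existing: the caller supplies the object to fill.
    static MyData readMyData(byte[] source, MyData myData) {
        myData.val1 = source[0];
        myData.val2 = source[1];
        return myData;
    }

    // Hypothetical helper: sums val1 + val2 over all records
    // while allocating only one MyData instance.
    static long sumAll(byte[][] records) {
        MyData reused = new MyData();   // allocated once, outside the loop
        long sum = 0;
        for (byte[] record : records) {
            readMyData(record, reused); // overwrites the previous values
            sum += reused.val1 + reused.val2;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[][] records = { {1, 2}, {3, 4}, {5, 6} };
        System.out.println(sumAll(records)); // prints 21
    }
}
```

One thing to keep in mind with this style: because the instance is overwritten on each call, the caller must finish using the values (or copy them) before the next call to readMyData().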

Read Out Of

The read-out-of pattern does not read data into objects. Instead it reads the needed values directly from the underlying data source.

Reading values directly from the data source can save some time because data does not first need to be copied into an object before it can be used. When needed, the values are copied directly out of the underlying data source.

Reading values directly from the data source also has the advantage that only the data that is actually used will be copied out of the underlying data source. Thus, if the reading code only needs part of the data, only those parts are copied out.

To change the previous example code to read data directly from the underlying source we need to change the implementation of the MyData class:

public class MyData {

    private byte[] source = null;

    public MyData() {
    }

    public void setSource(byte[] source) {
        this.source = source;
    }

    public int getVal1() {
        return this.source[0];
    }

    public int getVal2() {
        return this.source[1];
    }
}

To use the MyData class in its new variation, you will use code like this:

byte[] source = ... //get bytes from somewhere

MyData myData = new MyData();

myData.setSource(source);

int val1 = myData.getVal1();
int val2 = myData.getVal2();

Notice first that you can reuse the MyData instance. Just call setSource() when you need to read data out of a new byte array.

Second, data is only copied out once - from the byte array to the code using the value. It is not first copied from the byte array to the MyData object, and then from there to whatever calculation needs the value.

Third, only if you actually call both getVal1() and getVal2() will the corresponding data be read out of the underlying byte array. If only one of the values is needed by a calculation, only that value needs to be read out of the byte array. This saves time when only part of the data is used.

A read method that reads data into an object usually does not know how much of the data is needed, so it normally copies all of the data into the object. You could create multiple read methods tailored to each calculation, but that adds more work to your plate.
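The partial-read advantage can be sketched as follows. The wrapper class ReadOutOfExample and the helper maxVal1() are hypothetical names used only for illustration; the MyData class is the read-out-of variation shown above. The calculation needs only val1, so the second byte of each record is never touched:

```java
// Hypothetical demo class; MyData mirrors the article's read-out-of variation.
class ReadOutOfExample {

    static class MyData {
        private byte[] source = null;

        void setSource(byte[] source) {
            this.source = source;   // keeps a reference, copies nothing
        }

        int getVal1() {
            return this.source[0];
        }

        int getVal2() {
            return this.source[1];
        }
    }

    // Hypothetical helper: finds the largest val1 across all records.
    // getVal2() is never called, so that byte is never read.
    static int maxVal1(byte[][] records) {
        MyData view = new MyData();          // one instance, reused
        int max = Integer.MIN_VALUE;
        for (byte[] record : records) {
            view.setSource(record);          // point the view at the next array
            max = Math.max(max, view.getVal1());
        }
        return max;
    }

    public static void main(String[] args) {
        byte[][] records = { {3, 9}, {7, 1}, {5, 4} };
        System.out.println(maxVal1(records)); // prints 7
    }
}
```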

Navigator

If the underlying data source contains multiple "records" or "objects", you can change the read-out-of pattern to the navigator pattern. The navigator pattern works like the read-out-of pattern but adds methods for navigating between the records or objects in the underlying source.

Assuming that each MyData object consists of 2 bytes from the underlying source, here is how the MyData class would look with a navigation method added:

public class MyData {

    private byte[] source = null;
    private int    offset = 0;

    public MyData() {
    }

    public void setSource(byte[] source, int offset) {
        this.source = source;
        this.offset = offset;
    }

    public int getVal1() {
        return this.source[this.offset];
    }

    public int getVal2() {
        return this.source[this.offset + 1];
    }

    public void next() {
        this.offset += 2;  //2 bytes per record
    }

    public boolean hasNext() {
        // true as long as the offset still points inside the source array
        return this.offset < this.source.length;
    }
}

The first change is that the setSource() method now takes an extra parameter called offset. This is not strictly necessary, but it enables the MyData navigator to start from an offset into the source byte array instead of the first byte.

The second change is that the getVal1() and getVal2() methods now use the value of the internal offset variable as the index into the source array when reading values out.

The third change is the addition of the next() method. The next() method increments the internal offset variable by 2, so that the offset variable points to the next record in the array.

The fourth change is the addition of the hasNext() method which returns true if the source byte array has more records (bytes) in it.

You use the navigator version of MyData like this:

byte[] source = ... // get byte array from somewhere

MyData myData = new MyData();
myData.setSource(source, 0);

while(myData.hasNext()) {
    int val1 = myData.getVal1();
    int val2 = myData.getVal2();

    myData.next();
}

As you can see, using the MyData class in the navigator pattern implementation is pretty straightforward, and very similar to using a standard Java Iterator.
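Here is a self-contained sketch of the navigator in use, including a start offset. The wrapper class NavigatorExample, the RECORD_SIZE constant and the helper sumRecords() are hypothetical names introduced for illustration. Note one deliberate difference from the class above: this hasNext() checks that a complete 2-byte record remains, which is an assumption about how a trailing partial record should be handled.

```java
// Hypothetical demo class; MyData follows the article's navigator variation.
class NavigatorExample {

    static class MyData {
        private static final int RECORD_SIZE = 2; // 2 bytes per record

        private byte[] source = null;
        private int    offset = 0;

        void setSource(byte[] source, int offset) {
            this.source = source;
            this.offset = offset;
        }

        int getVal1() {
            return this.source[this.offset];
        }

        int getVal2() {
            return this.source[this.offset + 1];
        }

        void next() {
            this.offset += RECORD_SIZE;
        }

        // Assumption: only report a record if all of its bytes are present.
        boolean hasNext() {
            return this.offset + RECORD_SIZE <= this.source.length;
        }
    }

    // Hypothetical helper: sums val1 + val2 of every complete record
    // starting at the given offset.
    static int sumRecords(byte[] source, int offset) {
        MyData myData = new MyData();
        myData.setSource(source, offset);

        int sum = 0;
        while (myData.hasNext()) {
            sum += myData.getVal1() + myData.getVal2();
            myData.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three complete records plus one trailing byte that is skipped.
        byte[] source = {1, 2, 3, 4, 5, 6, 7};

        System.out.println(sumRecords(source, 0)); // prints 21
        System.out.println(sumRecords(source, 2)); // prints 18
    }
}
```

Unlike an Iterator, the navigator does not hand out a new object per record. The same MyData instance is simply moved forward over the byte array.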
