Java High Performance Read Patterns
Jakob Jenkov
How your Java application reads data can have a big impact on its read performance. In this article I will describe a few different read patterns and explain their performance characteristics.
Read Into New
The first Java read pattern is the read-into-new pattern. This is what you would typically learn in university as the "right" way of reading data.
In the read-into-new pattern, a read method reads some kind of data and returns a new data structure containing the data that was read. First, here is a simple example data structure:
    public class MyData {
        public int val1 = 0;
        public int val2 = 0;
    }
Here is an example read method which reads data into a MyData object:
    public MyData readMyData(byte[] source) {
        MyData myData = new MyData();
        myData.val1 = source[0];
        myData.val2 = source[1];
        return myData;
    }
As you can see, the readMyData() method returns a MyData object. First, a MyData object is created. Second, the readMyData() method reads data into the MyData object. Third, the MyData object is returned to the calling code.
What is worth noting about this read pattern is that every time you call the readMyData() method, a new MyData object is returned. That is why the pattern is called read-into-new: data is read into a new object.
If the readMyData() method is called frequently, a lot of MyData objects will be created. That puts pressure on the object allocation system and the garbage collector, which results in lower performance and possibly longer garbage collection pauses from time to time.
Another disadvantage of the read-into-new pattern is that the objects may be located in very different areas of the computer's memory. This means that the chance of a given object being in the CPU cache is low.
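To make the allocation behavior concrete, here is a minimal sketch that calls a read-into-new method twice on the same input and checks whether the same object comes back. The ReadIntoNewDemo wrapper class and its main() method are assumptions added for illustration only.

```java
class ReadIntoNewDemo {

    static class MyData {
        public int val1 = 0;
        public int val2 = 0;
    }

    // Read-into-new: a fresh MyData is allocated on every call.
    static MyData readMyData(byte[] source) {
        MyData myData = new MyData();
        myData.val1 = source[0];
        myData.val2 = source[1];
        return myData;
    }

    public static void main(String[] args) {
        byte[] source = {10, 20};

        MyData first  = readMyData(source);
        MyData second = readMyData(source);

        // Same input, but two separate objects on the heap.
        System.out.println(first == second);                // prints "false"
        System.out.println(first.val1 + ", " + first.val2); // prints "10, 20"
    }
}
```

Each call produces one more object that the garbage collector eventually has to clean up, even when the caller only needs the values for a moment.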
Read Into Existing
The read-into-existing pattern reads data into an existing object instead of creating a new object for every call to the read method. This means that the same object can be reset and reused for multiple calls to the read method. Here is how the earlier readMyData() method would look using the read-into-existing pattern:
    public MyData readMyData(byte[] source, MyData myData) {
        myData.val1 = source[0];
        myData.val2 = source[1];
        return myData;
    }
The main difference between the previous version and this version is that this version takes the MyData object to read the data into as a parameter. It is now up to the caller of the readMyData() method to decide whether an existing MyData instance should be reused or a new instance should be created.
Reusing a MyData instance rather than creating a new one saves time and memory compared to always creating a new instance. It also lowers the pressure on the Java garbage collector, so the risk of long garbage collection pauses is reduced.
Reusing an object also means that the chance that the object is located in the CPU cache is much higher than when you create a new object for each call to the readMyData() method.
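As a sketch of what reuse looks like from the caller's side, the example below allocates a single MyData outside a loop and reads every input into it. The ReadIntoExistingDemo wrapper class and the sumAll() helper are hypothetical names introduced for this illustration.

```java
class ReadIntoExistingDemo {

    static class MyData {
        public int val1 = 0;
        public int val2 = 0;
    }

    // Read-into-existing: the caller supplies the object to fill.
    static MyData readMyData(byte[] source, MyData myData) {
        myData.val1 = source[0];
        myData.val2 = source[1];
        return myData;
    }

    // Hypothetical helper: sums val1 + val2 over all inputs
    // while allocating only one MyData in total.
    static long sumAll(byte[][] sources) {
        MyData reused = new MyData();      // allocated once, outside the loop
        long sum = 0;
        for (byte[] source : sources) {
            readMyData(source, reused);    // overwrites val1 and val2
            sum += reused.val1 + reused.val2;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[][] sources = { {1, 2}, {3, 4}, {5, 6} };
        System.out.println(sumAll(sources)); // prints "21"
    }
}
```

One thing to keep in mind with this pattern: each call overwrites the previous contents, so the caller has to finish using the values (or copy them) before the next read.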
Read Out Of
The read-out-of pattern does not read data into objects. Instead, it reads the needed values directly from the underlying data source.
Reading values directly from the data source can save some time because data does not first need to be copied into an object before it can be used. When needed, the values are copied directly out of the underlying data source.
Reading values directly from the data source also has the advantage that only the data that is actually used will be copied out of the underlying data source. Thus, if the reading code only needs part of the data, only those parts are copied out.
To change the previous example code to read data directly from the underlying source, we need to change the implementation of the MyData class:
    public class MyData {

        private byte[] source = null;

        public MyData() {
        }

        public void setSource(byte[] source) {
            this.source = source;
        }

        public int getVal1() {
            return this.source[0];
        }

        public int getVal2() {
            return this.source[1];
        }
    }
To use the MyData class in its new variation, you would use code like this:
    byte[] source = ... // get bytes from somewhere

    MyData myData = new MyData();
    myData.setSource(source);

    int val1 = myData.getVal1();
    int val2 = myData.getVal2();
Notice first that you can reuse the MyData instance. Just call setSource() when you need to read data out of a new byte array.
Second, data is only copied out once: from the byte array to the code using the value. It is not first copied from the byte array to the MyData object, and then from there to whatever calculation needs the value.
Third, a value is only read out of the underlying byte array when you actually call getVal1() or getVal2(). If only one of the values is needed by a calculation, only that value needs to be read out of the byte array. This saves time when only part of the data is used.
A read method that reads data into an object usually does not know how much of the data is needed, so it is normal to copy all of the data into the object. You could create multiple read methods tailored to each calculation, but that adds more work to your plate.
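The following sketch illustrates the partial-read point: a calculation that only needs val1 never touches the second byte of any input, and one MyData instance is reused for all inputs via setSource(). The ReadOutOfDemo wrapper class and the maxVal1() helper are hypothetical names used only for this example.

```java
class ReadOutOfDemo {

    // Read-out-of: MyData holds a reference to the source, not copies of the values.
    static class MyData {
        private byte[] source = null;

        public void setSource(byte[] source) {
            this.source = source;
        }

        public int getVal1() {
            return this.source[0];
        }

        public int getVal2() {
            return this.source[1];
        }
    }

    // Hypothetical helper: finds the largest val1 across all inputs.
    // getVal2() is never called, so the second byte is never read.
    static int maxVal1(byte[][] sources) {
        MyData view = new MyData();          // one instance, reused for every input
        int max = Integer.MIN_VALUE;
        for (byte[] source : sources) {
            view.setSource(source);          // point the same object at the next array
            max = Math.max(max, view.getVal1());
        }
        return max;
    }

    public static void main(String[] args) {
        byte[][] sources = { {3, 9}, {7, 1}, {5, 5} };
        System.out.println(maxVal1(sources)); // prints "7"
    }
}
```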
Navigator
If the underlying data source contains multiple "records" or "objects", you can change the read-out-of pattern to the navigator pattern. The navigator pattern works like the read-out-of pattern but adds methods for navigating between the records or objects in the underlying source.
Assuming that each MyData object consists of 2 bytes from the underlying source, here is how the MyData class would look with navigation methods added:
    public class MyData {

        private byte[] source = null;
        private int offset = 0;

        public MyData() {
        }

        public void setSource(byte[] source, int offset) {
            this.source = source;
            this.offset = offset;
        }

        public int getVal1() {
            return this.source[this.offset];
        }

        public int getVal2() {
            return this.source[this.offset + 1];
        }

        public void next() {
            this.offset += 2; // 2 bytes per record
        }

        // Assumes the source length is a multiple of the 2-byte record size.
        public boolean hasNext() {
            return this.offset < this.source.length;
        }
    }
The first change is that the setSource() method now takes an extra parameter called offset. This is not strictly necessary, but it enables the MyData navigator to start from an offset into the source byte array instead of from the first byte.
The second change is that the getVal1() and getVal2() methods now use the value of the internal offset variable as an index into the source array when reading values out.
The third change is the addition of the next() method. The next() method increments the internal offset variable by 2, so that the offset variable points to the next record in the array.
The fourth change is the addition of the hasNext() method, which returns true if the source byte array has more records (bytes) in it.
You use the navigator version of MyData like this:
    byte[] source = ... // get byte array from somewhere

    MyData myData = new MyData();
    myData.setSource(source, 0);

    while (myData.hasNext()) {
        int val1 = myData.getVal1();
        int val2 = myData.getVal2();
        myData.next();
    }
As you can see, using the MyData class in the navigator pattern implementation is pretty straightforward. It is very similar to using a standard Java Iterator.
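For a complete, runnable picture, here is a sketch that walks a byte array of 2-byte records with a navigator and accumulates a result without allocating one object per record. The NavigatorDemo wrapper class and the sumOfProducts() helper are hypothetical. Note also one deliberate deviation, which is an assumption of this sketch: hasNext() checks that a full 2-byte record remains, so a trailing odd byte is skipped instead of causing an ArrayIndexOutOfBoundsException in getVal2().

```java
class NavigatorDemo {

    static class MyData {
        private byte[] source = null;
        private int offset = 0;

        public void setSource(byte[] source, int offset) {
            this.source = source;
            this.offset = offset;
        }

        public int getVal1() {
            return this.source[this.offset];
        }

        public int getVal2() {
            return this.source[this.offset + 1];
        }

        public void next() {
            this.offset += 2; // 2 bytes per record
        }

        // Assumption of this sketch: true only if a complete 2-byte record remains.
        public boolean hasNext() {
            return this.offset + 1 < this.source.length;
        }
    }

    // Hypothetical helper: sums val1 * val2 over all complete records.
    static int sumOfProducts(byte[] source) {
        MyData myData = new MyData();   // the only object allocated for the whole scan
        myData.setSource(source, 0);

        int total = 0;
        while (myData.hasNext()) {
            total += myData.getVal1() * myData.getVal2();
            myData.next();
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] source = {1, 2, 3, 4, 5, 6};        // three records: (1,2) (3,4) (5,6)
        System.out.println(sumOfProducts(source)); // prints "44"
    }
}
```

Unlike java.util.Iterator, where next() returns the next element, the navigator here is already positioned on a record after setSource(), and next() only moves the position forward. In other words, hasNext() answers whether the current position holds a record.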