RION Performance Benchmarks
Jakob Jenkov
One of the RION Design Goals is that RION should be fast to read and write. To verify that we have indeed met that design goal we have benchmarked RION against other data formats and toolkits. This page contains the results of these benchmarks. Of course, no data format is best at everything, but as you can see, RION does pretty well across the many situations measured.
RION vs. JSON vs. Protobuf vs. MessagePack vs. CBOR
We have compared RION to JSON, Protobuf (Google Protocol Buffers), MessagePack and CBOR.
First of all we have compared RION to JSON because JSON is a commonly used format for exchanging data over a network. JSON is a natural choice if the client is a web browser because web browsers have built-in support for parsing JSON to JavaScript objects. But JSON is also often used as a data format between backend services despite not being the fastest, most compact or most flexible and expressive data format you could use for that purpose. For JSON we have used Jackson's JSON APIs which are known to be among the fastest JSON APIs out there.
Second, we have benchmarked RION against Protobuf, MessagePack and CBOR, which are all binary data formats. Since RION is a binary data format, it is fairer to benchmark RION against these data formats than against JSON. For Protobuf we have used Google's Protocol Buffers implementation. For MessagePack and CBOR we have used Jackson's implementations.
Toolkits and APIs
We have benchmarked IAP Tools, Jackson (2.5.3 + 2.6.3) and Google Protocol Buffers (3.0.0-alpha-2). Furthermore, both IAP Tools and Jackson have multiple APIs you can use, so we have benchmarked (or will soon benchmark) those too.
Jackson is used for JSON, MessagePack and CBOR.
Both IAP Tools and Jackson have a Java Reflection based API which figures out what fields to serialize via reflection. Benchmarks of reflection based APIs are suffixed with an (R).
Both IAP Tools and Jackson also have an API where you need to "hand code" the reading and writing of objects. These APIs perform better than the reflection based APIs, but require more hand coding by developers using them. Benchmarks measuring hand coded APIs are suffixed with an (H).
Google Protocol Buffer reads and writes are always hand coded.
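To make the difference between the two API styles concrete, here is a minimal sketch using only the JDK. It is not the IAP Tools or Jackson API. The class and method names are hypothetical, and the output is just a fixed 8 bytes per long, not RION. The reflection based writer discovers the fields at runtime, while the hand coded writer names each field explicitly.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.ByteBuffer;

// Illustration only: NOT the IAP Tools or Jackson API. All names are hypothetical.
class ReflectionVsHandCoded {

    // Same shape as the Pojo1Long class used in the benchmarks.
    static class Pojo1Long {
        public long field0 = 1;
    }

    // Reflection based: discover the long fields at runtime and write each one.
    static byte[] writeReflective(Object pojo) {
        Field[] fields = pojo.getClass().getDeclaredFields();
        ByteBuffer buffer = ByteBuffer.allocate(8 * fields.length);
        try {
            for (Field field : fields) {
                boolean isInstanceLong = field.getType() == long.class
                        && !Modifier.isStatic(field.getModifiers());
                if (isInstanceLong) {
                    buffer.putLong(field.getLong(pojo));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        // Copy out only the bytes that were actually written.
        byte[] result = new byte[buffer.position()];
        buffer.flip();
        buffer.get(result);
        return result;
    }

    // Hand coded: the developer writes each field explicitly, with no runtime lookup.
    static byte[] writeHandCoded(Pojo1Long pojo) {
        return ByteBuffer.allocate(8).putLong(pojo.field0).array();
    }
}
```

Both methods produce the same bytes for a Pojo1Long. The hand coded version avoids the field lookup at runtime, at the cost of having to be updated whenever the class changes.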
IAP Tools also has an "optimized" option where property names of objects are left out, so only property values are written. Benchmarks measuring this option have an extra O added to the suffix, for instance (HO) or (RO).
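The effect of leaving out property names can be sketched with a made-up byte layout. This is not the RION wire format. The layout (a 1 byte name length, the UTF-8 name, then a 1 byte value) and the class name are assumptions chosen only to show why a values-only encoding is shorter.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only: a made-up layout, NOT the RION encoding.
class NamesVsValuesOnly {

    // Encodes boolean fields named field0, field1, ... optionally with their names.
    static byte[] encode(boolean[] values, boolean includeNames) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < values.length; i++) {
            if (includeNames) {
                byte[] name = ("field" + i).getBytes(StandardCharsets.UTF_8);
                out.write(name.length);          // 1 byte name length
                out.write(name, 0, name.length); // the name itself
            }
            out.write(values[i] ? 1 : 0);        // 1 byte value
        }
        return out.toByteArray();
    }
}
```

With 10 boolean fields, this hypothetical layout needs 80 bytes when names are included and 10 bytes when only values are written. The reader must then know the field order in advance.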
We have used a red color for JSON (textual format), yellow colors for other binary formats (MessagePack, CBOR, Protobuf) and a green color for RION formats.
Benchmark Information
The benchmarks are all implemented using JMH, the Java Microbenchmark Harness. We have attempted to make the benchmarks as fair as possible (to our knowledge). Of course we may have overlooked something. Therefore the benchmark code is publicly available on GitHub:
https://github.com/jjenkov/iap-tools-java-benchmarks
The benchmarks are executed on an Intel Core i7-4770 Quad-Core Haswell server which has no other workload than these benchmarks. The benchmarks are executed with Java JDK 1.8.0_60, 64 bit edition, with no special JVM flags enabled.
Length vs. Throughput
We have measured both the serialized length of various formats as well as the throughput of read and write operations. For serialized length, a lower number is generally better. For throughput a higher number is generally better.
The length of serialized data matters. More compact data transfers faster over networks - especially over encrypted connections where it is currently recommended to turn off compression because of the CRIME and BREACH attacks.
Benchmark Configurations
We have benchmarked the reading and writing of the supported data types individually, and we have benchmarked the reading and writing of objects with mixed data types to get a picture of the average performance you can expect.
We have measured the individual data types in the following configurations:
- 1 object with 1 field - of the given data type
- 10 objects with 1 field - of the given data type
- 100 objects with 1 field - of the given data type
- 1000 objects with 1 field - of the given data type
- 1 object with 10 fields - of the given data type
- 10 objects with 10 fields - of the given data type
- 100 objects with 10 fields - of the given data type
- 1000 objects with 10 fields - of the given data type
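The eight configurations listed above are simply every combination of two field counts and four object counts. A small sketch of that enumeration follows. The class and method names are hypothetical and are not taken from the benchmark code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: enumerates the configurations listed above.
class BenchmarkConfigurations {

    static final int[] FIELD_COUNTS  = {1, 10};
    static final int[] OBJECT_COUNTS = {1, 10, 100, 1000};

    // Returns one label per configuration, e.g. "10 x Pojo1Long".
    static List<String> describeAll(String dataType) {
        List<String> labels = new ArrayList<>();
        for (int fields : FIELD_COUNTS) {
            for (int objects : OBJECT_COUNTS) {
                labels.add(objects + " x Pojo" + fields + dataType);
            }
        }
        return labels;
    }
}
```

Calling describeAll("Long") yields the eight labels in the same order as the list above, starting with a single Pojo1Long and ending with 1000 Pojo10Long objects.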
A single object with a single field gives an impression of how RION performs with small objects containing a few fields of the given data type.
The single object with 10 fields gives an impression of how RION performs with objects with more fields of the given data type. 10 fields means that the overhead of writing the object (the object overhead) is spread out over more fields. The performance of reading and writing an object with 10 fields of a given data type thus gives you a more precise impression of the read / write performance of that data type.
Reading and writing arrays of objects with fields of the given data type is done to show the performance of reading and writing bigger data structures. We have measured these configurations with 10, 100 and 1000 objects in the array. 10 because 10 is a common number of objects to send back from e.g. a web service (e.g. 10 search results, or browsing through a larger number of results 10 at a time). 100 and 1000 to give a picture of the performance of reading and writing larger numbers of objects.
Read Throughput
This section contains read throughput benchmarks for a variety of objects with different numbers of properties and data types. The objects are the same as used during write throughput and serialized length benchmarks later.
By throughput we mean the number of times per second a given API can read an object from serialized form. The higher the throughput, the better.
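To illustrate the unit only, operations per second can be computed with a naive timing loop like the sketch below. This is not how the benchmarks on this page are implemented. They use JMH, which takes care of warm-up, forking and dead code elimination far more carefully. The class and method names are hypothetical.

```java
import java.util.function.Supplier;

// Illustration only: a naive timing loop, NOT a replacement for JMH.
class NaiveThroughput {

    // Results are stored here so the JIT cannot simply discard the work.
    static volatile Object sink;

    // Returns how many times per second the operation completed.
    static double opsPerSecond(Supplier<?> operation, int iterations) {
        // Warm-up pass so the measured pass runs mostly compiled code.
        for (int i = 0; i < iterations; i++) {
            sink = operation.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = operation.get();
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return iterations * 1_000_000_000.0 / elapsedNanos;
    }
}
```

The operation passed in would be a single read (or write) of one serialized object, so the result is directly comparable to an operations per second figure.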
Mixed Types
The mixed type throughput benchmark uses an object with a boolean, long, float, double and string field (5 fields).
The object read looks like this:
public class Pojo1Mixed {
    public boolean field0 = true;
    public long    field1 = 1234;
    public float   field2 = 123.12F;
    public double  field3 = 123456.1234D;
    public String  field4 = "abcdefg";
}
Boolean
By boolean we mean a value of true or false.
The object with 1 boolean field looks like this:
public class Pojo1Boolean {
    public boolean field0 = true;
}
The object with 10 boolean fields looks like this:
public class Pojo10Boolean {
    public boolean field0 = true;
    public boolean field1 = false;
    public boolean field2 = true;
    public boolean field3 = false;
    public boolean field4 = true;
    public boolean field5 = false;
    public boolean field6 = true;
    public boolean field7 = false;
    public boolean field8 = true;
    public boolean field9 = false;
}
Float
By float we mean 32 bit floating point numbers.
The object with 1 float field looks like this:
public class Pojo1Float {
    public float field0 = 1.1F;
}
The object with 10 float fields looks like this:
public class Pojo10Float {
    public float field0 = 1.1F;
    public float field1 = 12.12F;
    public float field2 = 123.132F;
    public float field3 = 1234.1234F;
    public float field4 = 12345.12345F;
    public float field5 = -1.1F;
    public float field6 = -12.12F;
    public float field7 = -123.123F;
    public float field8 = -1234.1234F;
    public float field9 = -12345.12345F;
}
Double
By double we mean 64 bit floating point numbers.
The object with 1 double field looks like this:
public class Pojo1Double {
    public double field0 = 1.1D;
}
The object with 10 double fields looks like this:
public class Pojo10Double {
    public double field0 = 1.1D;
    public double field1 = 12.12D;
    public double field2 = 123.132D;
    public double field3 = 1234.1234D;
    public double field4 = 12345.12345D;
    public double field5 = -1.1D;
    public double field6 = -12.12D;
    public double field7 = -123.123D;
    public double field8 = -1234.1234D;
    public double field9 = -12345.12345D;
}
Long
By long we mean up to 64 bit integers.
The object with 1 long field looks like this:
public class Pojo1Long {
    public long field0 = 1;
}
The object with 10 long fields looks like this:
public class Pojo10Long {
    public long field0 = 1;
    public long field1 = 255;
    public long field2 = 65535;
    public long field3 = 1_000_000;
    public long field4 = 1_000_000_000;
    public long field5 = -1;
    public long field6 = -255;
    public long field7 = -65535;
    public long field8 = -1_000_000;
    public long field9 = -1_000_000_000;
}
String
By string we mean Java String which is read from a sequence of UTF-8 encoded characters.
The object with 1 String field looks like this:
public class Pojo1String {
    public String field0 = "a";
}
The object with 10 String fields looks like this:
public class Pojo10String {
    public String field0 = "a";
    public String field1 = "ab";
    public String field2 = "abc";
    public String field3 = "abcd";
    public String field4 = "abcde";
    public String field5 = "abcdef";
    public String field6 = "abcdefg";
    public String field7 = "abcdefgh";
    public String field8 = "abcdefghi";
    public String field9 = "abcdefghij";
}
Read and Use Throughput
If you want maximum performance you should not parse RION data into Java objects. You should use the RION data directly in its binary form. RION was designed with this use case in mind, so it is possible to do. To give you an idea of the performance difference we have made a simple benchmark that compares reading RION data into Java objects before using it with reading the RION data directly in its binary form.
The benchmark reads a single object with a nested array of objects. The nested array of objects contains 10 objects, each with a single field of type long (64 bit integer). The benchmark sums the values of the 10 long fields.
The first benchmark first parses the RION data into a Java object graph before summing the long fields. This benchmark uses Java reflection to build the object graph - which is the slowest way to work with RION data.
The second benchmark sums the fields by reading them directly from the raw RION data.
As you can see, the performance difference is massive.
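The two approaches can be sketched as follows. The byte layout used here is a stand-in (an int count followed by fixed 8 byte longs), not the RION encoding, and the class and method names are hypothetical. The point is only the shape of the two code paths: one allocates an object per element before summing, the other sums straight from the bytes.

```java
import java.nio.ByteBuffer;

// Illustration only: a stand-in byte layout, NOT the RION encoding.
class ReadAndUse {

    static class Item {
        long value;
    }

    // Stand-in encoding: an int count followed by fixed 8 byte longs.
    static byte[] encode(long[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(4 + 8 * values.length);
        buffer.putInt(values.length);
        for (long value : values) {
            buffer.putLong(value);
        }
        return buffer.array();
    }

    // Approach 1: build an object per element first, then sum the fields.
    static long sumViaObjects(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        Item[] items = new Item[buffer.getInt()];
        for (int i = 0; i < items.length; i++) {
            items[i] = new Item();
            items[i].value = buffer.getLong();
        }
        long sum = 0;
        for (Item item : items) {
            sum += item.value;
        }
        return sum;
    }

    // Approach 2: sum the values directly from the raw bytes, with no allocation.
    static long sumDirect(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int count = buffer.getInt();
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buffer.getLong();
        }
        return sum;
    }
}
```

Both methods return the same sum. The direct version skips the intermediate objects entirely, which is the kind of saving the second benchmark measures.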
Write Throughput
This section contains write throughput benchmarks of a variety of objects with different numbers of properties and data types. The objects are the same as are used for the serialized length benchmarks later.
By throughput we mean the number of times per second a given API can write an object to serialized form. The higher the throughput, the better.
Mixed Type
The mixed type throughput benchmark uses an object with a boolean, long, float, double and string field (5 fields).
The object written looks like this:
public class Pojo1Mixed {
    public boolean field0 = true;
    public long    field1 = 1234;
    public float   field2 = 123.12F;
    public double  field3 = 123456.1234D;
    public String  field4 = "abcdefg";
}
Boolean
The pojos used are the same as for the boolean read benchmarks.
Float
The pojos used are the same as for the float read benchmarks.
Double
The pojos used are the same as for the double read benchmarks.
Long
The pojos used are the same as for the long read benchmarks.
String
The pojos used are the same as for the String read benchmarks.
Serialized Length
This section contains serialized length measurements of a variety of objects with different numbers of properties and different data types. The smaller the serialized length the better. The objects are the same as used for the throughput benchmarks shown earlier.
We have not included Google Protocol Buffers in all serialized length comparisons. Google Protocol Buffers requires generating code for each data type to serialize. Generating classes for all the different configurations we test would require a lot of work.
An important note about MessagePack and CBOR: Our serialized length measurements are made without advanced features like string back references enabled. With such advanced features enabled their serialized lengths might be shorter (but never shorter than RION tables).
Boolean
Here is what the object with 1 boolean field looks like:
public class Pojo1Boolean {
    public boolean field0 = true;
}
Here is what the object with 10 boolean fields looks like:
public class Pojo10Boolean {
    public boolean field0 = true;
    public boolean field1 = false;
    public boolean field2 = true;
    public boolean field3 = false;
    public boolean field4 = true;
    public boolean field5 = false;
    public boolean field6 = true;
    public boolean field7 = false;
    public boolean field8 = true;
    public boolean field9 = false;
}
Long
A "Long" is an up to 64 bit integer.
Here is what the object with 1 long field looks like:
public class Pojo1Long {
    public long field0 = 1;
}
Here is what the object with 10 long fields looks like:
public class Pojo10Long {
    public long field0 = 1;
    public long field1 = 255;
    public long field2 = 65535;
    public long field3 = 1_000_000;
    public long field4 = 1_000_000_000;
    public long field5 = -1;
    public long field6 = -255;
    public long field7 = -65535;
    public long field8 = -1_000_000;
    public long field9 = -1_000_000_000;
}
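The phrase "up to 64 bit" matters for serialized length: a format that stores each integer in only as many bytes as its value needs will use far fewer bytes for small values than one that always writes 8 bytes. The sketch below computes that minimal byte count for a given long. It is a generic illustration, not a description of the exact RION encoding. The class name and the handling of negative values (measuring the magnitude of -(value + 1)) are assumptions.

```java
// Illustration only: generic minimal byte count, NOT the exact RION encoding.
class MinimalLongBytes {

    // Returns the number of bytes (1 to 8) needed to hold the magnitude of the value.
    static int bytesNeeded(long value) {
        // Assumed convention: negative values are measured as -(value + 1),
        // which also keeps Long.MIN_VALUE from overflowing.
        long magnitude = value < 0 ? -(value + 1) : value;
        int significantBits = 64 - Long.numberOfLeadingZeros(magnitude);
        return Math.max(1, (significantBits + 7) / 8);
    }

    // Sums the minimal byte counts for a set of values.
    static int totalBytes(long[] values) {
        int total = 0;
        for (long value : values) {
            total += bytesNeeded(value);
        }
        return total;
    }
}
```

For the five positive values in Pojo10Long (1, 255, 65535, 1_000_000 and 1_000_000_000) this gives 1, 1, 2, 3 and 4 bytes, 11 bytes of value data in total, compared to 40 bytes if every long were written as a fixed 8 bytes. Any type or length markers a real format adds come on top of that.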
Float
By "float" we mean a 32 bit floating point number.
The object with the 1 float property looks like this:
public class Pojo1Float {
    public float field0 = 1.1F;
}
The object with the 10 float properties looks like this:
public class Pojo10Float {
    public float field0 = 1.1F;
    public float field1 = 12.12F;
    public float field2 = 123.132F;
    public float field3 = 1234.1234F;
    public float field4 = 12345.12345F;
    public float field5 = -1.1F;
    public float field6 = -12.12F;
    public float field7 = -123.123F;
    public float field8 = -1234.1234F;
    public float field9 = -12345.12345F;
}
Double
By "double" we mean a 64 bit floating point number.
The object with the 1 double property looks like this:
public class Pojo1Double {
    public double field0 = 1.1D;
}
The object with the 10 double properties looks like this:
public class Pojo10Double {
    public double field0 = 1.1D;
    public double field1 = 12.12D;
    public double field2 = 123.132D;
    public double field3 = 1234.1234D;
    public double field4 = 12345.12345D;
    public double field5 = -1.1D;
    public double field6 = -12.12D;
    public double field7 = -123.123D;
    public double field8 = -1234.1234D;
    public double field9 = -12345.12345D;
}
String
By "String" is meant sequences of UTF-8 characters.
The object with the 1 String property looks like this:
public class Pojo1String {
    public String field0 = "a";
}
The object with the 10 String properties looks like this:
public class Pojo10String {
    public String field0 = "a";
    public String field1 = "ab";
    public String field2 = "abc";
    public String field3 = "abcd";
    public String field4 = "abcde";
    public String field5 = "abcdef";
    public String field6 = "abcdefg";
    public String field7 = "abcdefgh";
    public String field8 = "abcdefghi";
    public String field9 = "abcdefghij";
}