Polymorph Data Language

Jakob Jenkov
Last update: 2022-07-23

Polymorph Data Language (PDL) is a textual language that can be translated into the binary Polymorph Data Encoding (PDE). This enables you to write data in a text editor and convert it into binary PDE. You can also convert PDE back into PDL. Thus, PDL makes it easier to work with PDE.

Polymorph Data Language Syntax

The core Polymorph Data Language syntax looks like this for atomic fields:

fieldtype("field value")

As you can see, an atomic PDL field consists of:

  • Field type
  • (
  • Field value
  • )
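The four parts listed above are easy to separate programmatically. The following Python sketch shows one possible way to split an atomic PDL field into its field type and its raw value text. The function name parse_atomic_field is hypothetical, and so is the assumption that field type names consist only of letters, digits and underscores. The sketch does not validate the value text itself.

```python
from __future__ import annotations

import re

# Hypothetical pattern: field type name, "(", raw value text, ")".
# ASSUMPTION: type names consist of letters, digits and underscores.
ATOMIC_FIELD = re.compile(r"\s*(\w+)\((.*)\)\s*", re.DOTALL)


def parse_atomic_field(text: str) -> tuple[str, str]:
    """Split an atomic PDL field into (field type, raw value text)."""
    match = ATOMIC_FIELD.fullmatch(text)
    if match is None:
        raise ValueError(f"not an atomic PDL field: {text!r}")
    return match.group(1), match.group(2)


print(parse_atomic_field('utf8("Hello PDL")'))         # ('utf8', '"Hello PDL"')
print(parse_atomic_field("bytes(:48 65 6C 6C 6F:)"))  # ('bytes', ':48 65 6C 6C 6F:')
```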

For composite fields the PDL syntax looks like this:

fieldtype("field value") {

   nestedfieldtype("nested field value")

}

As you can see, a composite PDL field consists of:

  • Field type
  • (
  • Field value (if any - composite fields often only have nested fields - no field value itself)
  • )
  • {
  • Nested fields (0 to N)
  • }
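To make the nesting concrete, here is a hypothetical Python model of a PDL field that writes itself out in the syntax shown above. The class PdlField and its attributes are illustrative assumptions, not part of any official PDL library. Because a composite field may have 0 nested fields, the sketch uses an explicit composite flag rather than inferring it from the list of nested fields. It also assumes that indentation is purely cosmetic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PdlField:
    """Hypothetical in-memory model of a PDL field (not an official API)."""
    field_type: str
    value: str = ""          # raw value text, e.g. '"Hello PDL"' or ':48 65:'
    composite: bool = False  # True if the field is written with { ... }
    children: list[PdlField] = field(default_factory=list)

    def to_pdl(self, indent: int = 0) -> str:
        """Render this field, and any nested fields, as PDL text."""
        pad = "    " * indent  # ASSUMPTION: indentation is cosmetic only
        head = f"{pad}{self.field_type}({self.value})"
        if not self.composite:
            return head
        lines = [head + " {"]
        lines += [child.to_pdl(indent + 1) for child in self.children]
        lines.append(pad + "}")
        return "\n".join(lines)


doc = PdlField(
    "fieldtype", '"field value"', composite=True,
    children=[PdlField("nestedfieldtype", '"nested field value"')],
)
print(doc.to_pdl())
# fieldtype("field value") {
#     nestedfieldtype("nested field value")
# }
```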

Comments

You can embed comments inside a Polymorph Data Language document. PDL supports both single line and multi-line comments.

A single line comment starts with a # and lasts until the end of the line.

A multi-line comment starts with a two-character combination and ends with a two-character combination. PDL supports 3 different two-character combinations for each. These 3 combinations were chosen to be as easy as possible to type for as many people around the world as possible, given the many different keyboard layouts in use.

The PDL multi-line comment begin two-character combinations are:

#"
#'
#@

The PDL multi-line comment end two-character combinations are:

"#
'#
@#

Here are a few PDL comment examples:

# This is a single line comment

utf8("This is a real field")


#"
 This is a multi-line comment
   That ends on the next line
"#


#' This is also a
   multi-line comment '#


#@ This is also a
   multi-line comment @#


#" This is a nested multi-line comment
    #' with mixed start and end character combinations
       #@
           and it is allowed
       "#
    @#
'#

As you can see, multi-line comments may be nested within each other. This can be handy when commenting out a larger section that already contains multi-line comments.

As you can also see, PDL does not require strict matching between the character combinations used to mark the beginning and the end of a multi-line comment section. They are just considered "multi-line comment begin" and "multi-line comment end" markers - regardless of the exact character combination used.
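The nesting rules described above can be implemented with a simple depth counter. The following Python sketch (strip_comments is a hypothetical helper, not an official PDL tool) removes single-line and multi-line comments, treating every begin marker and every end marker as interchangeable. It assumes that a # inside a quoted UTF-8 value is not a comment, and that quoted values contain no escaped quote characters. Neither assumption is specified in the text above.

```python
BEGIN_MARKERS = {'#"', "#'", "#@"}
END_MARKERS = {'"#', "'#", "@#"}


def strip_comments(source: str) -> str:
    """Remove PDL comments; any begin marker may be closed by any end marker."""
    out = []
    depth = 0  # current multi-line comment nesting depth
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if depth > 0:
            # Inside a multi-line comment: only track nesting, drop the text.
            if pair in BEGIN_MARKERS:
                depth += 1
                i += 2
            elif pair in END_MARKERS:
                depth -= 1
                i += 2
            else:
                i += 1
        elif pair in BEGIN_MARKERS:
            depth = 1
            i += 2
        elif source[i] == "#":
            # Single-line comment: skip to the end of the line, keep the newline.
            end = source.find("\n", i)
            i = len(source) if end == -1 else end
        elif source[i] == '"':
            # ASSUMPTION: copy a quoted value verbatim so a '#' inside it
            # is not treated as a comment; no escaped quotes are handled.
            end = source.find('"', i + 1)
            end = len(source) if end == -1 else end + 1
            out.append(source[i:end])
            i = end
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


print(strip_comments('utf8("a # b") #" outer #@ inner "# still outer \'#x'))
```

Here the inner #@ is closed by "# and the outer #" is closed by '#, which mirrors the non-strict matching described above.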

Raw Field Value Syntax

All PDE fields that have a value (boolean and null fields have no field value) have a byte sequence as their field value. To represent a byte sequence in textual form, PDL uses a set of textual syntaxes which can be translated into bytes. This means that it is up to you to choose how you write the value of a field in PDL. The value is simply translated into bytes, and that byte sequence is what is considered the value of the field.

To better understand this, let us look at a simple example. In the following PDL UTF-8 field example, you can see that the value of the UTF-8 field is written as a string enclosed in double quotes ("..."):

utf8("Hello PDL")

Notice that the value is specified as the string "Hello PDL". Because the value is enclosed in quotes, it is interpreted as text and encoded as UTF-8 bytes.

However, you don't have to specify the value of a UTF-8 field using the UTF-8 syntax. You could instead specify the field value of a utf8 field using the hexadecimal syntax, like this:

utf8(:48 65 6C 6C 6F 20 50 44 4C:)

Notice how the value of the utf8 field is now specified as the string :48 65 6C 6C 6F 20 50 44 4C: instead.

That string consists of bytes represented via their hexadecimal syntax - enclosed in colons. The byte sequence is the exact same as would result from the UTF-8 encoded "Hello PDL" string.
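You can verify that the two value syntaxes produce the same bytes using nothing but Python's built-in str.encode() and bytes.fromhex(). This only compares the byte sequences; it is not a PDL parser.

```python
# Bytes from the UTF-8 value syntax "Hello PDL".
utf8_bytes = "Hello PDL".encode("utf-8")

# Bytes from the hex value syntax :48 65 6C 6C 6F 20 50 44 4C:
# (bytes.fromhex accepts spaces between the byte pairs).
hex_bytes = bytes.fromhex("48 65 6C 6C 6F 20 50 44 4C")

print(utf8_bytes == hex_bytes)  # True
print(hex_bytes)                # b'Hello PDL'
```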

As you can see - the field type does not mandate what syntax you specify its field value in. In the end, all value strings are translated into raw bytes - and it is those bytes that count.

This means that it is easy to pack raw bytes into any type of field using the hexadecimal value syntax, or to pack UTF-8 bytes into a bytes field using the UTF-8 value syntax. Here is an example of specifying the value of a PDL bytes field, first using the UTF-8 syntax and then using the hexadecimal syntax:

bytes("Hello PDL")

bytes(:48 65 6C 6C 6F 20 50 44 4C:)

Value Syntaxes

Polymorph Data Language supports the following textual syntaxes which can be translated into raw byte sequences:

  • Hex syntax
  • Base 64 syntax
  • UTF-8 syntax
  • Integer syntax
  • Floating point syntax

Below is an example of each of these syntaxes. The comments after each # are not part of the syntaxes. They are only there to help you see which syntax is which.

:48656C6C6F 20 50444C:       # Hex

|SGVsbG8gUERM|               # Base64

"Hello PDL"                  # UTF-8

+12347                       # Positive integer in decimal notation
-56744                       # Negative integer in decimal notation

%123.45                      # 4 byte floating point in decimal notation
/123.4567890                 # 8 byte floating point in decimal notation
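The following Python sketch translates each of the value syntaxes above into a byte sequence. The hex, Base 64 and UTF-8 branches follow directly from the descriptions in this text. The byte layouts for integers and floating point numbers are not specified here, so those branches are assumptions: an integer is written as its magnitude in as few big-endian bytes as possible (with the sign presumably carried by the positive or negative integer field type), and floats are packed big-endian with Python's struct module. The function name decode_value is hypothetical.

```python
import base64
import struct


def decode_value(text: str) -> bytes:
    """Translate one PDL value syntax into raw bytes (illustrative sketch only)."""
    text = text.strip()
    if not text:
        raise ValueError("empty PDL value")
    first = text[0]
    if first == ":" and text.endswith(":"):    # hex, spaces between bytes optional
        return bytes.fromhex(text[1:-1])
    if first == "|" and text.endswith("|"):    # Base 64
        return base64.b64decode(text[1:-1])
    if first == '"' and text.endswith('"'):    # UTF-8 text
        return text[1:-1].encode("utf-8")
    if first in "+-":                          # integer in decimal notation
        magnitude = int(text[1:])
        # ASSUMPTION: minimal big-endian unsigned bytes; the sign is assumed
        # to be carried by the field type, not by the value bytes.
        length = max(1, (magnitude.bit_length() + 7) // 8)
        return magnitude.to_bytes(length, "big")
    if first == "%":                           # 4 byte floating point
        return struct.pack(">f", float(text[1:]))  # ASSUMPTION: big-endian
    if first == "/":                           # 8 byte floating point
        return struct.pack(">d", float(text[1:]))  # ASSUMPTION: big-endian
    raise ValueError(f"unknown PDL value syntax: {text!r}")


for value in [":48656C6C6F 20 50444C:", "|SGVsbG8gUERM|", '"Hello PDL"']:
    print(decode_value(value))       # b'Hello PDL' for all three
print(decode_value("+12347").hex())  # 303b
```

Note that the hex, Base 64 and UTF-8 examples above all encode the same 9 bytes, which again shows that the value syntax is only a way of writing bytes.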

Polymorph Data Language Field Types

Polymorph Data Language supports the same field types that Polymorph Data Encoding supports. That means:

  • boolean
  • positive integer
  • negative integer
  • 4 byte floating point
  • 8 byte floating point
  • bytes
  • UTF-8
  • UTC
  • copy
  • reference
  • key
  • object
  • table

Each of these field types will be explained in more detail in the following sections.

Boolean

boolean(+0)
boolean(+1)

Table


        
