2022-07-08: Kaitai Struct v0.10 released

The Kaitai project is happy to announce the release of a new major version of Kaitai Struct, a declarative markup language for describing binary data structures such as binary file formats and network stream packets.

The basic idea of Kaitai Struct is that a particular format can be described in the Kaitai Struct language (in a .ksy file), which can then be compiled with kaitai-struct-compiler into source files in one of the supported programming languages. These modules include generated code for a parser that can read the described data structure from a file or stream and provide access to its contents through a nice, easy-to-comprehend API.
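
To make the idea concrete, here is a minimal sketch in Python. The format, the class name DemoPacket and its field names are hypothetical, and the class is written by hand using only the standard library; it is not actual compiler output, only an illustration of the kind of API a generated parser exposes. The docstring shows what a .ksy description of the same layout might look like.

```python
import io
import struct


class DemoPacket:
    """Hand-written stand-in for a generated parser (hypothetical format).

    A .ksy description of the same layout might look like:

        meta:
          id: demo_packet
          endian: le
        seq:
          - id: magic
            contents: "KS"
          - id: version
            type: u1
          - id: len_body
            type: u2
          - id: body
            size: len_body
    """

    def __init__(self, stream):
        # Fixed two-byte signature at the start of the packet
        self.magic = self._read_exact(stream, 2)
        if self.magic != b"KS":
            raise ValueError("unexpected magic: %r" % self.magic)
        # One unsigned byte, then a little-endian unsigned 16-bit length
        self.version, self.len_body = struct.unpack(
            "<BH", self._read_exact(stream, 3)
        )
        # The body is exactly len_body bytes long
        self.body = self._read_exact(stream, self.len_body)

    @staticmethod
    def _read_exact(stream, n):
        data = stream.read(n)
        if len(data) != n:
            raise EOFError("requested %d bytes, got %d" % (n, len(data)))
        return data

    @classmethod
    def from_bytes(cls, buf):
        return cls(io.BytesIO(buf))


packet = DemoPacket.from_bytes(b"KS\x01\x05\x00hello")
print(packet.version, packet.len_body, packet.body)
```

The point of the declarative approach is that the class above would not have to be written or maintained by hand: the same description can be compiled for each supported target language.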

This release celebrates reaching 3000 stars on GitHub. It doesn’t bring new language features, but improves a number of existing aspects to provide a better experience when using Kaitai Struct in your projects.

Release highlights

  • General compilation improvements:
    • Prevent referring to non-existent enum members as my_enum::unknown_member (8dcd1be)
    • Prevent duplicate member names in enum definition (1cbaff9) — they’re incompatible with the concept of enum in all target languages
    • Ensure that IDs of params are unique and don’t collide with seq fields or instances within a type (#923)
    • Allow whitespace in type invocation: even type: ' nested :: type ( 1 + 2 , data ) ' now works (#792)
    • Add style warnings reporting non-standard names for size fields (should use len_ + subject) and repeat count fields (should use num_ + subject) — see style guide
      • they are only recommendations and don’t prevent compilation
      • only available in the command-line kaitai-struct-compiler on the JVM platform (not in the Web IDE or in the JavaScript build at npm)
    • Add the ability to report multiple problems at once instead of stopping after the first error — used for “type validation” errors and style warnings for now (only on JVM compiler builds, not JS builds)
    • Improve readability of problems listed in the compiler output
    • Force UTF-8 as output encoding in generated files (don’t rely on system defaults)
    • --ksc-json-output: add warnings at the same level as errors, don’t use octal escapes (e.g. “\274” ⟶ “\u00bc”) in string values (invalid in JSON)
    • Update SnakeYAML (the YAML parser used by JVM compiler builds) from 1.25 to 1.28, which no longer contains the DoS vulnerability allowing a “billion laughs” attack (50f80d7)
  • Runtime API changes:
    • C++: kstream::to_string now works for all integer types up to 64 bits (not just int as before), has better performance and portability (cpp_stl#50)
    • Go: ReadBitsInt{Be,Le} now accept the number of bits as int instead of uint8 (go@a5c5c1e)
    • Java: readBytesTerm, processXor now accept a single byte value as byte instead of int
    • JavaScript: update UMD envelopes to support Web Workers and modules (in the runtime library, generated parsers and JS compiler builds)
    • JavaScript: readBitsInt{Be,Le} now throw RangeError instead of Error when trying to read more than 32 bits
    • Lua: add zzlib as a submodule to support process: zlib
    • Python: validation errors now extend Exception instead of BaseException for easier catching (python#53)
    • Python: add API_VERSION tuple used by generated modules to check their compatibility with the runtime library (python#49)
  • Notable improvements:
    • Make methods read_bits_int_{be,le} for reading bit integers reliable (fix all bugs) and faster (#949)
    • No longer preallocate arrays to the capacity of repeat-expr entries, which could cause excessive memory allocations when parsing invalid files (f5fe28e)
    • Fix valid (and contents) on unnamed seq fields (for contents, this was a 0.9 regression: #825)
    • Construct: add support for enums
    • Go: implement encoding: UTF-16{BE,LE}
    • Go, Lua: implement valid/expr (#435)
    • Java: fix broken parse instances on Java 7 and 8 when using prebuilt io.kaitai:kaitai-struct-runtime:0.9 from Maven Central (java#34)
    • Java: fix terminator values from 0x80 to 0xff (java#35)
    • Lua: map 1-bit type: b1 to boolean to match Kaitai Struct design (see docs)
    • Lua: fix undecided calculated endianness incorrectly treated as big-endian
    • Lua: implement process: zlib (see Installation section of Lua runtime for how to enable zlib support)
    • Nim: fix encoding: ASCII on Windows (#960)
    • Perl: fix array literals, implement all byte array operations, substring and str.to_i(2) methods
    • PHP: support PHP 8 (php#8)
    • Python: generated parsers no longer import pkg_resources, which caused performance and usability issues (#804) — the runtime library API version check now compares tuples instead
    • Python: read_bytes checks if a large read request (8 MiB or more) can be satisfied, even before any bytes are read (python#61)
    • Ruby: validation error messages now display byte arrays as hex dumps, similar to Java (ruby#4)
    • (Java — already in 0.9), Lua, PHP: fix translation of unsigned 64-bit integer literals — i.e. from 2**63 = 0x8000_0000_0000_0000 to 2**64 - 1 = 0xffff_ffff_ffff_ffff (fd7f308, Lua: #837)
      • these languages don’t have actual 64-bit unsigned integers, but they do have 64-bit signed integers, so the result will be negative, but all 64 bits of precision will be preserved
    • Fix translation of integer -2**63 = -0x8000_0000_0000_0000 (e33828a)
  • Generated code style improvements:
    • Go: change header comment to match Go conventions for generated sources (#847)
    • Lua: fix broken indentation after a repeat: until field
    • Python: simpler return statements in instance getters
  • Infrastructure updates:
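
The --ksc-json-output item under the compilation improvements mentions that octal escapes are invalid in JSON. The sketch below illustrates that point with Python's standard json module; the "message" key and both sample strings are made up for the demonstration and are not actual compiler output.

```python
import json

# Backslash followed by octal digits: not one of the escapes JSON defines
octal_form = '{"message": "\\274"}'
# \uXXXX is a valid JSON escape for the same character (U+00BC)
unicode_form = '{"message": "\\u00bc"}'


def parses_as_json(text):
    """Return True if the text is accepted by a strict JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json(octal_form))    # False
print(parses_as_json(unicode_form))  # True
# Both notations refer to the same code point: octal 274 == hex bc
print(json.loads(unicode_form)["message"] == chr(0o274))  # True
```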

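The note above on unsigned 64-bit integer literals says that in languages with only signed 64-bit integers the value comes out negative while all 64 bits are preserved. The arithmetic can be sketched in Python as follows; the helper names as_signed_64 and bit_pattern are hypothetical, and this only illustrates the two's-complement reinterpretation, not how the compiler itself emits literals.

```python
import struct


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    if not 0 <= value <= 0xFFFF_FFFF_FFFF_FFFF:
        raise ValueError("value does not fit in 64 bits")
    return value - (1 << 64) if value >= (1 << 63) else value


def bit_pattern(value, signed):
    """Pack the value into its 8-byte big-endian representation."""
    return struct.pack(">q" if signed else ">Q", value)


for literal in (2**63, 2**64 - 1):
    signed = as_signed_64(literal)
    # The signed result is negative, but the underlying 64 bits are identical
    assert bit_pattern(literal, signed=False) == bit_pattern(signed, signed=True)
    print(hex(literal), "->", signed)
```

Running it shows 2**63 mapping to -2**63 and 2**64 - 1 mapping to -1, with byte-for-byte identical bit patterns in both cases.
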