2020-10-16: Kaitai Struct v0.9 released

After a lot of time and effort, the Kaitai project is happy to announce the release of a new major version of Kaitai Struct, a declarative markup language for describing binary data structures such as binary file formats and network stream packets.

The basic idea of Kaitai Struct is that a particular format can be described in the Kaitai Struct language (in a .ksy file), which can then be compiled with kaitai-struct-compiler into source files in one of the supported programming languages. These modules include generated code for a parser that can read the described data structure from a file or stream and provide access to its contents through a nice, easy-to-comprehend API.
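As a rough illustration of that workflow, here is a minimal sketch of a .ksy description. The format name tiny_header, its fields and its magic bytes are hypothetical and made up for this example; only the overall layout (meta, seq) follows the Kaitai Struct language.

```yaml
# tiny_header.ksy: hypothetical format, for illustration only
meta:
  id: tiny_header
  endian: le                 # default byte order for multi-byte integers
seq:
  - id: magic
    contents: [0x54, 0x48]   # fixed signature bytes "TH"
  - id: version
    type: u1
  - id: len_payload
    type: u2                 # little-endian because of meta/endian
  - id: payload
    size: len_payload        # raw byte array of len_payload bytes
```

Assuming the compiler is installed, running kaitai-struct-compiler -t python tiny_header.ksy should produce a Python module whose parser class exposes version, len_payload and payload as attributes.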

With the previous 0.8 release, the Kaitai project celebrated 1000 stars on GitHub; by the time of the 0.9 release, it has collected more than 2000. Thank you all for your support!

This version introduces a C++11 target (which uses smart pointers) and several handy features (like validations and little-endian bit integers), fixes a lot of bugs, and includes quite a few infrastructure improvements.

Release highlights

  • New targets support:
    • Python with Construct library
    • HTML - intended for documentation, preliminary support
    • Nim - entry-level support (51% of tests pass)
  • New KSY language features:
    • doc-ref supports list of references (#269)
    • meta/tags allows specifying multiple tags for better navigation in the format gallery (#572)
    • Allow accessing nested types using :: syntax: foo::bar (#275)
    • Implement parsed data validations using valid key (#435)
    • Implement compile-time sizeof and bitsizeof operators (#84)
      • Type-based: sizeof<u4>, bitsizeof<b13>, sizeof<user_type>
      • Value-based: file_header._sizeof, flags._bitsizeof (file_header, flags are fields defined in the current type)
    • Implement little-endian bit-sized integers (docs)
      • Support choosing endianness using le / be suffix: type: b12le, type: b1be
      • Add meta/bit-endian key for selecting default bit endianness (le / be)
  • Expression language:
    • Forced byte array and true array literals (#371) and empty typed array literals (#372)
    • New methods:
      • byte arrays: length
    • Allow pure types for type casting: .as<u2>, .as<str> (#463)
  • General compilation improvements:
    • Support Maven-like directory trees by no longer adding a src subdirectory for Go and Java outputs (#287). This will most likely break existing builds, but it brings those languages in line with all the others, and adding a subdirectory is easier for the user than removing one that Kaitai added automatically.
    • Better error messages (#488)
    • Support for .ksy files with UTF-8 BOM (#499)
    • Error messages are routed to stderr rather than stdout (#509)
    • --debug mode split into --no-auto-read and --read-pos (#332)
    • C++: add C++11 mode
      • Add --cpp-standard CLI option: pass --cpp-standard 11 to enable C++11 mode (98 is default)
      • C++11 target:
        • uses #pragma once (instead of #ifndef FOO_H_ header guards)
        • uses std::unique_ptr<foo> for owning pointers, raw pointers foo* for non-owning
        • supports array literals
    • --no-auto-read implemented for C++
    • C++: official Windows and Visual C++ support
    • Fix case conversions to be locale-independent (#708)
  • Runtime API changes:
    • Add exceptions Validation{Not{Equal,AnyOf},{Less,Greater}Than,Expr}Error inheriting from common ancestor ValidationFailedError - thrown on failed validations defined with valid or contents key (#435)
    • Add method read_bits_int_le for parsing little-endian bit-sized integers (docs)
    • Deprecated classes and methods:
      • ensure_fixed_contents ⟶ an explicit if that checks that readBytes(n) equals the expected n-byte array (throwing ValidationNotEqualError if it does not)
      • UnexpectedDataError ⟶ ValidationNotEqualError
      • read_bits_int ⟶ read_bits_int_be
  • Major bugfixes:
    • params/type - add support for:
      • specific user types
      • enum types (#413)
      • byte arrays (bytes)
      • arrays (u2[], struct[], etc.)
    • enum with undefined values in enum list never crashes a parser (#523 for Python, #300 for Java)
    • Fix coercing different string/bytearray/enum/boolean types (e.g. parsed from stream and created from literal value) in conditional op (? :) or array literal
    • The substring "not" could not be used in expressions (#556)
    • Bit-sized integers were not accounted for properly in repeat: eos (#548)
    • Fix switching with else case (_: foo) only (#595)
    • C++: fix all known memory leaks
    • C++: fix absolute imports (#794)
    • Java: more consistent closure of underlying IO streams on forced close() (#497)
    • Java: fix reading user types in type-switching in --no-auto-read mode (#204)
    • Python: work around circular dependencies generation
    • PHP: fix invalid namespace declarations when no --php-namespace specified (#637)
  • Tooling around the compiler updates:
  • Infrastructure updates:
    • Unstable binary builds are available for all platforms after every CI build at Bintray (#63)
    • KSY language reference replaced with documentation generated from JSON schema
    • https://formats.kaitai.io/ is rebuilt automatically with CI/CD
    • Brand new modular CI/CD system for the compiler, independent of the underlying CI service, running on multiple OSes in parallel (Linux, Windows, macOS) and showing status at https://ci.kaitai.io/
    • Generate test assertion specs from language-agnostic KST specs
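
To make a few of the KSY language features from the highlights above more concrete (doc-ref lists, meta/bit-endian, the valid key, sizeof operators and the :: syntax), here is a small sketch of how they might be combined in one spec. The format feature_demo, all field names, the allowed values and the example.com URLs are hypothetical placeholders, not part of any real format.

```yaml
# feature_demo.ksy: hypothetical spec, for illustration only
meta:
  id: feature_demo
  endian: le
  bit-endian: le            # default bit order for the b* types below
doc-ref:                    # a list of references
  - https://example.com/spec/part1   # placeholder URL
  - https://example.com/spec/part2   # placeholder URL
seq:
  - id: header
    type: header
  - id: kind
    type: b4                # little-endian bits, from meta/bit-endian
  - id: flags
    type: b12               # completes the two bytes started by kind
  - id: first_entry
    type: header::entry     # nested type addressed with ::
instances:
  header_size:
    value: header._sizeof   # value-based, known at compile time
  word_size:
    value: sizeof<u4>       # type-based
types:
  header:
    seq:
      - id: version
        type: u1
        valid:
          min: 1
          max: 9
      - id: codec
        type: u1
        valid:
          any-of: [0, 1, 4]
    types:
      entry:
        seq:
          - id: value
            type: u4
            valid:
              expr: _ % 2 == 0   # custom check on the parsed value
```

With a spec like this, data that fails one of the valid checks should make the generated parser throw the corresponding Validation...Error exception listed under the runtime API changes, rather than silently returning the value.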
