2025-09-07: Kaitai Struct v0.11 released

The Kaitai project is happy to announce the release of a new major version of Kaitai Struct, a declarative markup language for describing binary data structures such as binary file formats and network stream packets.

The basic idea of Kaitai Struct is that a particular format is described in the Kaitai Struct language (in a .ksy file), which kaitai-struct-compiler then compiles into source files in one of the supported programming languages. The generated modules contain a parser that can read the described data structure from a file or stream and expose its contents through a simple, easy-to-comprehend API.

This release finally brings serialization support for Java and Python! It adds decent support for Rust, thanks to Oleh Dolhov and Vitaly Reshetyuk. Many fixes to the import functionality were added, so if something related to imports didn’t work before, try it now. It also brings numerous improvements to the Web IDE, in particular the ability to show a partial object tree up to the point where a parsing error occurred, which greatly facilitates reverse engineering and debugging (see the previous blog post for more details).

Many of the improvements in this version were supported by the NLnet Foundation.

This is the last version of Kaitai Struct to support Python 2.7 and Ruby 1.9.3 - 2.3. Future versions will require at least Python 3.4 (or possibly even higher, see #821) and Ruby 2.4.

Release highlights

  • New target languages:
    • Rust
  • New compilation options:
    • -w/--read-write: serialization support, currently only for Java and Python (see Serialization guide)
      • implies --no-auto-read, so _read() must always be called manually to parse from a stream
      • new method _check() performs consistency checks - must be called on each object after the last change to its seq fields or instances, otherwise _write() will throw a ConsistencyNotCheckedError
      • new method _write()
      • new methods _invalidate{Inst}() (Java) / _invalidate_{inst}() (Python) for each value instance inst allow invalidating (forgetting) the cached value so that the instance can obtain a new value
    • --zero-copy-substream {true|false} (default is true): zero-copy substreams, currently only for Java and Ruby (#44)
      • this removes _raw_* fields from the generated code - if you need them, use --zero-copy-substream false
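The _read() / _check() / _write() contract described for --read-write can be illustrated with a toy model. The class below is a hand-written stand-in, not compiler output: ToyRecord, its len_body/body layout, and the use of io.BytesIO in place of a Kaitai stream are all assumptions made for illustration. Only the method names and the ConsistencyNotCheckedError name come from the notes above.

```python
import io


class ConsistencyNotCheckedError(Exception):
    """Toy counterpart of the error named in the release notes."""


class ToyRecord:
    """Hypothetical layout: a 1-byte length `len_body` followed by `body`."""

    def __init__(self):
        self.len_body = 0
        self.body = b""

    def __setattr__(self, name, value):
        # Any change to a public field means _check() must be called again.
        if not name.startswith("_"):
            object.__setattr__(self, "_dirty", True)
        object.__setattr__(self, name, value)

    def _read(self, stream):
        # Nothing is parsed automatically; the caller invokes _read() itself.
        self.len_body = stream.read(1)[0]
        self.body = stream.read(self.len_body)
        self._dirty = False

    def _check(self):
        # Consistency check: the length field must agree with the payload.
        if len(self.body) != self.len_body:
            raise ValueError("len_body does not match len(body)")
        self._dirty = False

    def _write(self, stream):
        if self._dirty:
            raise ConsistencyNotCheckedError("_check() not called since last change")
        stream.write(bytes([self.len_body]) + self.body)


rec = ToyRecord()
rec._read(io.BytesIO(b"\x03abc"))   # parse manually
rec.body = b"hello"                 # modify fields ...
rec.len_body = 5
rec._check()                        # ... then re-check before writing
out = io.BytesIO()
rec._write(out)
```

Skipping the rec._check() call in this sketch makes _write() raise, which mirrors the ordering requirement stated above; the real generated classes may differ in detail.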
  • New KSY language features:
    • valid/in-enum: true validates that the parsed value is defined in the enum specified by the enum key
    • type: strz in combination with encoding: UTF-16{BE,LE} or encoding: UTF-32{BE,LE} now properly terminates the string on a 2-byte or 4-byte null character (#187)
    • to-string in a type definition can be used to provide a concise human-readable string representation of the object (#732)
      • it overrides the target language's standard object-to-string conversion: typically toString() (or similar), __str__() in Python, to_s in Ruby, and the Display trait in Rust
      • displayed in the console visualizer (ksv), but not yet in the Web IDE, which still uses the -webide-representation key for this purpose
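To see why a 2-byte terminator matters for type: strz with UTF-16, consider the following sketch. It is not the runtime library's code: terminate_multi is a hypothetical helper that only accepts a terminator starting on a multiple of its own length, which is one plausible way to implement the behavior described above.

```python
def terminate_multi(data: bytes, term: bytes, include_term: bool = False) -> bytes:
    """Cut `data` at the first `term` that starts on a multiple of len(term)."""
    unit = len(term)
    for pos in range(0, len(data) - unit + 1, unit):
        if data[pos:pos + unit] == term:
            return data[:pos + unit] if include_term else data[:pos]
    return data  # no terminator found: keep everything


# "AB" in UTF-16LE is 41 00 42 00, followed by the 2-byte terminator 00 00.
raw = "AB".encode("utf-16-le") + b"\x00\x00" + b"junk"

# A plain byte search hits the unaligned pair at offset 3 (the high byte of
# "B" plus the first terminator byte), which would cut the string mid-character.
naive_offset = raw.find(b"\x00\x00")

text = terminate_multi(raw, b"\x00\x00").decode("utf-16-le")
```

Here naive_offset is 3, while the aligned search stops at offset 4 and text decodes cleanly to "AB".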
  • KSY language changes:
    • valid now applies to each individual element, not to the whole array as before; this also fixes a 0.9 regression, which prevented the use of contents with repetition (#1117)
    • bytes.to_s(encoding) now requires the encoding argument to be a string literal (#1051)
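The per-element meaning of valid on a repeated field can be pictured with a small sketch. The function and exception names here (read_validated_array, ToyValidationError) are hypothetical and the field is assumed to be a repeated 1-byte integer; the point is only that each element is compared on its own, rather than the array as a whole.

```python
import io


class ToyValidationError(Exception):
    """Illustrative stand-in for a validation failure."""


def read_validated_array(stream, count, expected):
    """Read `count` one-byte integers, validating each element individually."""
    items = []
    for index in range(count):
        value = stream.read(1)[0]
        if value != expected:
            raise ToyValidationError(
                f"element {index}: expected {expected:#x}, got {value:#x}"
            )
        items.append(value)
    return items


ok = read_validated_array(io.BytesIO(b"\x10\x10\x10"), 3, 0x10)
```

With input b"\x10\x11" the sketch fails at element 1, so the error points at the offending item instead of the whole array.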
  • Expression language:
    • Add initial support for f-strings f"foo={foo}": only strings and integers can be interpolated, formatting options are not yet supported (#1073)
    • Improve error messages when the number or types of method arguments don’t match (compiler#269)
  • General compilation improvements:
    • Add warnings about the use of aliases and non-canonical spellings of popular encodings in the encoding key, warn against using unknown encodings (#393)
      • a known issue is that reported YAML paths are incorrect in some situations, see #1227
    • Sort instances, types, enums, enum entries and switch cases in the generated code (5f561e1)
    • Pass _root and _parent in recursive invocations of the top-level type in the same .ksy spec (#1089)
    • Fix _root and _parent incorrectly passed to imported nested types (compiler#283)
    • Fix that unused nested types (i.e. unreachable from the top-level type) were not taken into account when deriving the _parent type (#961)
    • Fix missing runtime validation of parse instances with contents (#1011)
    • Fix missing compile-time checks of top-level parameters (#1086)
    • Fix sporadic import failures caused by race conditions in the compiler, which typically manifested as error: unable to find type ... for one of the imported types (#951)
    • Fix meta/ks-opaque-types: true when using imports (#295)
    • Fix duplicate warnings when using imports (compiler#267)
    • --ksc-json-output: preserve input .ksy paths in output JSON keys exactly without slash normalization (#507)
  • Runtime API changes:
    • Add ValidationNotInEnumError exception, which is thrown if the valid/in-enum: true validation fails
    • Add bytesTerminateMulti and readBytesTermMulti methods needed for type: strz + encoding: UTF-16/UTF-32 support to all runtime libraries (#187)
    • C++ runtime library: add Win32 API-based encoding option (cpp_stl#61)
    • C++ runtime library: fix syntax error in C++20 mode (cpp_stl#68)
    • C++ runtime library: fix violations of strict aliasing rules (cpp_stl#73)
    • C#: target netstandard2.0 (csharp@7b1ac6d) - fixes the issue "KaitaiStruct.Runtime.CSharp v0.10.0 contains indirect vulnerable references" (csharp#20)
    • Go: require Go 1.23 or higher (61b70ac)
    • Java, Python: new method _fetchInstances() (Java) / _fetch_instances() (Python) can be used to recursively fetch all parse instances so that the input stream can be closed; this is especially useful with serialization when reading from one file and writing to another
    • Java, Python: all runtime library methods that deal with byte-aligned types now include a call to align_to_byte() / alignToByte(), which ensures proper alignment to a byte boundary after using bit-sized integers (type: bX), instead of the compiler often inserting them incorrectly (#1070)
    • Java: declare all arrays as List instead of ArrayList - this is a potentially breaking change (#1116)
    • JavaScript: all generated modules now export an object that encapsulates the class constructor function instead of the constructor function itself - this is a breaking change! (#1074)
      • this enables support for circular/out-of-order imports
    • JavaScript: port the runtime library to TypeScript (javascript#25)
    • JavaScript: make byte array literals use Uint8Array, not number[] (ec064e3)
    • Lua: fix Lua 5.4 compatibility of encoding: UTF-8 (lua#12)
    • Python: make all parsing exceptions inherit from KaitaiStructError instead of raising generic exceptions (python#80)
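The alignment change for Java and Python (#1070) moves the align-to-byte step into the byte-aligned read methods themselves. The toy stream below is an assumption-laden sketch, not the actual runtime: ToyBitStream and its internals are invented for illustration, and only the idea that a byte-aligned read first discards leftover bits reflects the note above.

```python
class ToyBitStream:
    """Minimal big-endian bit reader over an in-memory byte string."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0        # index of the next unread byte
        self._bits = 0       # leftover bits of a partially consumed byte
        self._bits_left = 0  # how many leftover bits are still valid

    def read_bits_int_be(self, n: int) -> int:
        # Pull in whole bytes until enough bits are buffered.
        while self._bits_left < n:
            self._bits = (self._bits << 8) | self._data[self._pos]
            self._pos += 1
            self._bits_left += 8
        self._bits_left -= n
        value = self._bits >> self._bits_left
        self._bits &= (1 << self._bits_left) - 1  # keep only the unread bits
        return value

    def align_to_byte(self) -> None:
        # Drop any unread bits so the next read starts on a byte boundary.
        self._bits = 0
        self._bits_left = 0

    def read_u1(self) -> int:
        self.align_to_byte()  # the byte-aligned read aligns by itself
        value = self._data[self._pos]
        self._pos += 1
        return value


s = ToyBitStream(b"\xA5\xFF")
three_bits = s.read_bits_int_be(3)  # top three bits of 0xA5 (binary 101)
next_byte = s.read_u1()             # skips the remaining 5 bits, reads 0xFF
```

Because read_u1() aligns internally, the caller (or the compiler) no longer has to decide where an explicit alignment call belongs.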
  • Notable improvements:
    • Generate import statements also for imported nested types, not just for imported top-level types as before (#703)
    • Generate import statements also for imported enums (#651)
    • Fix all known cases of missing parentheses when translating user expressions (compiler#277)
    • Go: implement type casting .as<> (f65fd5b)
    • Go: prevent runtime library methods from returning successfully on partial reads (92f8048)
    • C++11: fix array subscript (indexing) translated into invalid C++ code (#1038)
    • Graphviz: implement all missing features that prevented the compilation of many specifications in the format gallery (#698)
    • Graphviz: display valid and contents (bfdd54a)
    • C#: fix translation of enum_val.to_i (#802)
    • Python: enum_val.to_i now works even if enum_val represents a value not defined in the enum (#815)
    • Ruby: fix bytes.to_s(encoding) so that it always returns UTF-8 strings (be695f5)
    • Java: fix bytes subscript (indexing) operator so that it produces unsigned byte values (43d044a)
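The Python enum_val.to_i fix (#815) concerns values that have no matching enum member. The sketch below assumes such values are kept as plain integers after parsing; Color, resolve_enum and to_i are hypothetical names used only to show why a conversion via int() works in both cases, whereas reading a member-only attribute such as .value would not.

```python
from enum import IntEnum


class Color(IntEnum):  # hypothetical enum for illustration
    RED = 1
    GREEN = 2


def resolve_enum(enum_cls, value):
    """Return the enum member if one exists, otherwise the raw integer."""
    try:
        return enum_cls(value)
    except ValueError:
        return value


def to_i(enum_val):
    # int() accepts both an IntEnum member and a plain integer.
    return int(enum_val)


known = resolve_enum(Color, 2)    # Color.GREEN
unknown = resolve_enum(Color, 7)  # stays a plain int
```

In this sketch to_i(known) and to_i(unknown) both succeed, while unknown.value would raise AttributeError because a plain int has no such attribute.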
  • Web IDE improvements:
    • Display a partial object tree up to the field where the parsing error occurred, mark incomplete and invalid fields with icons (see blog post)
    • Fix the error TypeError: {ImportedType} is not a constructor when loading a .ksy specification with imports for the first time since loading the Web IDE, support circular imports (webide#169)
    • Replace the existing YAML parser used for parsing .ksy specifications with js-yaml - this fixes a number of problems in YAML parsing with the old parser, for example:
      • an expression starting with a hex literal 0x.. is no longer incorrectly parsed as a constant (e.g. pos: 0x1 + offset is no longer interpreted as pos: 0x1ffe)
      • binary notation 0b... is no longer parsed as 0
      • duplicate keys are rejected instead of silently overwriting each other (see webide#165 for more details)
    • Fix a number of issues (open “Errors” pane doesn’t disappear when the error has already been fixed, hex dump interval is not highlighted when an object tree node is selected, changes to the set of opened nodes are not persisted) that occurred in a certain combination of saved open object tree nodes, .ksy spec and input file (webide#162)
    • Fix -webide-representation on imported types (webide#163)
    • Improve error message when importing non-existent/unavailable .ksy specs (webide#161)
    • Fix accessibility issues (webide#184)
    • Show _unnamed* fields created by omitting id in seq fields (#1064)
  • Packaging / infrastructure improvements:
    • Update compiler dependencies (compiler#230)
    • npm package kaitai-struct-compiler now returns the compiler object itself instead of a constructor function (called KaitaiStructCompiler). This is a breaking change, so make sure to adapt your code: replace (new KaitaiStructCompiler()).compile(...) with KaitaiStructCompiler.compile(...) (compiler#222)
    • ksy_schema (official JSON Schema for .ksy files): add all missing keys, allow only canonical encoding names in the encoding key
    • Console visualizer - commands ksv, ksdump
      • fix Ruby 3 compatibility on Windows (visualizer#48)
      • support forward slashes in input .ksy paths on Windows (visualizer#52)
      • ksdump: include _unnamed* fields created by omitting id in seq fields (#1064)
      • publish a new version 0.11 to RubyGems (last published version was 0.7)
