lea-improvement-proposals

LIP: 1
Layer: Core Encoding
Title: Extend Tag 10 for Sub-Typed Data (IxData Field)
Author: Allwin Ketnawang
Created: 2025-04-27
Superseded-By: 6
Status: Replaced

Abstract

This LIP proposes replacing the original CTE v1.0 Tag 10 "Index Reference" field with a more versatile structure named the "Index and Extended Data Field" (or "IxData Field" for short). This new structure utilizes the previously reserved bits 1-0 of the header byte to introduce four sub-types, enabling the efficient encoding of: legacy 4-bit indices, standard variable-length integers (ULEB128/SLEB128), common fixed-size data types (integers, floats), and single-byte constants (including booleans). This enhances the expressiveness and efficiency of the CTE format.

Motivation

The original CTE v1.0 specification defined the Tag 10 field solely as a 1-byte reference containing a 4-bit index (0-15) into preceding key or signature lists. While useful, this is limiting. Applications built on CTE often require encoding other fundamental data types compactly, such as:

Integers larger than 15 or needing variable-length encoding for efficiency.
Signed integers.
Standard fixed-size types like int32, uint64, float, and double without the overhead of the generic Command Data field.
Atomic boolean values (true/false) or other single-byte markers (like null) in a highly compact, unambiguous way.

This proposal addresses these needs by repurposing the Tag 10 field. By using the two previously reserved bits (1-0) as a sub-type selector, we can introduce multiple data formats under a single tag, significantly increasing the utility and efficiency of the CTE format while maintaining partial backwards compatibility for the original index reference use case.

Specification

This LIP replaces Section 4.3 of the CTE v1.0 specification document entirely with the following:

4.3. Index and Extended Data Field (IxData Field) (Tag `10`)

Tag: 10
Purpose: Encodes different types of data including simple list indices, variable-length integers using standard encodings, standard fixed-size data types, and single-byte constants or markers.
Identification: Fields of this type are identified by the Tag 10 in bits 7-6 of the header byte. The specific interpretation and format are determined by the Sub-Type field located in bits 1-0 (SS) of the header byte.

General Header Byte Structure:

Bits	Field	Description
7-6	Tag (`10`)	Identifies this as the IxData Field family.
5-2	Sub-Data (`XXXX`)	4-bit value whose meaning depends on the Sub-Type (`SS`) field.
1-0	Sub-Type (`SS`)	Determines the format and interpretation (`00`, `01`, `10`, `11`).

Endianness Note: Unless otherwise specified, all multi-byte numerical data within CTE v1.1 fields (including sub-types defined below and Command Data) MUST be encoded using Little-Endian byte order. LEB128 encodings follow their own standard byte order.

4.3.1. Sub-Type `00`: Legacy Index Reference

Sub-Type Code (SS): 00
Purpose: Provides a zero-based index referencing an item within the preceding Public Key List (Section 4.1) or Signature List (Section 4.2). The context determines which list is being referenced. This maintains compatibility with CTE v1.0.
Format:
- Header Byte: 10 IIII 00
  
  Bits Field Description
  
  7-6 Tag (10) IxData Field family.
  
  5-2 Index (IIII) The 4-bit index value (0-15).
  
  1-0 Sub-Type (00) Specifies Legacy Index Reference format.
- Data: No following data bytes.
Constraints: The Index value (IIII) MUST correspond to a valid position within the relevant list.
Total Size: 1 byte.
Reserved Values: None within this sub-type.
Example (Reference Index 5): Header Byte: 10 0101 00 = 0x94

4.3.2. Sub-Type `01`: Variable-Length Encoded Integer (Varint)

Sub-Type Code (SS): 01
Purpose: Encodes a signed or unsigned integer value using standard variable-length encoding schemes (LEB128) or represents the value 0 directly.

Format:

Header Byte: 10 EEEE 01

Bits	Field	Description
7-6	Tag (`10`)	IxData Field family.
5-2	Encoding Scheme (`EEEE`)	Specifies the encoding method or direct value (see table below).
1-0	Sub-Type (`01`)	Specifies Varint format.

Data: For LEB128 schemes, followed by the bytes constituting the variable-length encoded integer according to the relevant standard. The number of data bytes is determined by the LEB128 encoding itself (via continuation bits).

Encoding Schemes (EEEE):

`EEEE` (Bin)	`EEEE` (Dec)	Header Byte (Hex)	Encoding Scheme / Value	Notes	Size (Bytes)
`0000`	0	`0x81`	Value `0`	Represents the integer value 0 directly.	1 (Header only)
`0001`	1	`0x85`	ULEB128	Unsigned LEB128 encoded integer data follows.	1 + Data Length
`0010`	2	`0x89`	SLEB128	Signed LEB128 encoded integer data follows.	1 + Data Length
`0011`	3	`0x8D`	Reserved	For future variable-length encoding schemes.	-
...	...	...	...	(Codes 3-15 are Reserved)	...
`1111`	15	`0xBD`	Reserved	For future variable-length encoding schemes.	-

Constraints:
- Decoders MUST correctly implement ULEB128 and SLEB128 decoding.
- While LEB128 can encode arbitrarily large integers, implementations MAY impose practical limits based on transaction size constraints or application needs (e.g., limiting to 64-bit or 128-bit range).
Total Size: 1 byte for value 0; 1 + Length(LEB128 Data) bytes otherwise.
Reserved Values: Encoding scheme codes 3 through 15 are reserved.
Example 1 (Value 300 / 0x12C using ULEB128):
- ULEB128 requires 2 data bytes: 0xAC, 0x02. EEEE = 0001.
- Header Byte: 10 0001 01 = 0x85
- Following Data: 0xAC, 0x02
Example 2 (Value -100 using SLEB128):
- SLEB128 requires 2 data bytes: 0x9C, 0x7F. EEEE = 0010.
- Header Byte: 10 0010 01 = 0x89
- Following Data: 0x9C, 0x7F

4.3.3. Sub-Type `10`: Fixed Data Type

Sub-Type Code (SS): 10
Purpose: Encodes a value belonging to a standard, fixed-size data type identified by a type code.
Format:
- Header Byte: 10 TTTT 10
  
  Bits Field Description
  
  7-6 Tag (10) IxData Field family.
  
  5-2 Type Code (TTTT) Specifies the fixed data type (see table below).
  
  1-0 Sub-Type (10) Specifies Fixed Data Type format.
- Data: Followed by the number of bytes corresponding to the specified Type Code. Data is encoded in Little-Endian byte order where applicable (multi-byte integers, floats).

Type Codes (TTTT):

`TTTT` (Bin)	`TTTT` (Dec)	Data Type	Size (Bytes)	Notes
`0000`	0	`int8_t`	1	Signed 8-bit integer
`0001`	1	`int16_t`	2	Signed 16-bit integer
`0010`	2	`int32_t`	4	Signed 32-bit integer
`0011`	3	`int64_t`	8	Signed 64-bit integer
`0100`	4	`uint8_t`	1	Unsigned 8-bit integer
`0101`	5	`uint16_t`	2	Unsigned 16-bit integer
`0110`	6	`uint32_t`	4	Unsigned 32-bit integer
`0111`	7	`uint64_t`	8	Unsigned 64-bit integer
`1000`	8	`float`	4	IEEE 754 Single-Precision
`1001`	9	`double`	8	IEEE 754 Double-Precision
`1010`	10	Reserved	-	For future use
...	...	...	...	(Codes 10-15 are Reserved)
`1111`	15	Reserved	-	For future use

Constraints: The data following the header MUST match the size and format required by the specified Type Code.
Total Size: 1 + Size(Type[TTTT]) bytes.
Reserved Values: Type codes 10 through 15 are reserved.
Example (Value -100 as int16_t):
- -100 is 0xFF9C (Little-Endian: 9C FF). TTTT = 0001.
- Header Byte: 10 0001 10 = 0x86
- Following Data: 0x9C, 0xFF

4.3.4. Sub-Type `11`: Single-Byte Constant/Marker

Sub-Type Code (SS): 11
Purpose: Encodes specific predefined constant values (like boolean true/false) or semantic markers using only a single byte. The meaning is determined directly by the Value Code.

Format:

Header Byte: 10 XXXX 11 (The header byte is the entire field)

Bits	Field	Description
7-6	Tag (`10`)	IxData Field family.
5-2	Value Code (`XXXX`)	Specifies the constant or marker value (0-15).
1-0	Sub-Type (`11`)	Specifies Single-Byte Constant/Marker format.

Data: No following data bytes.

Value Codes (XXXX):

`XXXX` (Bin)	`XXXX` (Dec)	Header Byte (Hex)	Defined Meaning	Notes
`0000`	0	`0x83`	`false`	Boolean false value
`0001`	1	`0x87`	`true`	Boolean true value
`0010`	2	`0x8B`	Reserved	For future use
...	...	...	...	(Codes 2-15 are Reserved)
`1111`	15	`0xBF`	Reserved	e.g., For Null, Separator

Constraints: Implementations should recognize the defined constant values.
Total Size: 1 byte.
Reserved Values: Value codes 2 through 15 are reserved for future standard constants or markers.
Example (Representing false): The single byte 10 0000 11 = 0x83 is used.

Rationale

Sub-Typing: Utilizing the two previously reserved bits (1-0) of the Tag 10 header byte allows extending functionality significantly without consuming a new top-level tag, preserving the core 2-bit tag structure of CTE.
Varint (SS=01): Adopting ULEB128 and SLEB128 provides standard, efficient variable-length encodings for unsigned and signed integers, respectively. Including a direct representation for '0' (EEEE=0000) optimizes the most common default integer value.
Fixed Types (SS=10): Directly encoding common fixed-width types (integers up to 64 bits, float, double) is often more convenient and potentially faster to decode than using variable-length encodings or the generic Command Data field for these types. Little-Endian is chosen as a common standard.
Single-Byte Constants (SS=11): Allocating a dedicated sub-type for single-byte constants like true and false provides an unambiguous and highly efficient (1 byte) representation. This keeps the SS=10 (Fixed Type) logic consistent (header always implies following data) and leaves room in SS=11 for other markers (e.g., null, separators) if needed later.

Backwards Compatibility

This proposal introduces new interpretations for the Tag 10 field based on the value of bits 1-0.

Partially Compatible: CTE processors compliant only with v1.0 will correctly interpret the Legacy Index Reference (SS=00) sub-type, as its format (10 IIII 00) remains unchanged.
Incompatible: V1.0 processors will not recognize or correctly decode data encoded using the new sub-types (SS=01, SS=10, SS=11). Encountering these formats will likely lead to parsing errors or data misinterpretation.

Adoption of this LIP requires updating CTE processors to recognize and handle all four defined sub-types of the Tag 10 IxData Field. This change effectively defines CTE v1.1.

Security Considerations

LEB128 Decoding (SS=01): Implementations decoding ULEB128 and SLEB128 must protect against potential resource exhaustion attacks. Maliciously crafted long sequences (many bytes with continuation bits set) could cause excessive CPU usage or memory allocation during decoding. Robust decoders should enforce limits on the number of bytes read for a single LEB128 value or check the magnitude of the partially decoded value against reasonable bounds (e.g., max 128 bits, or application-specific limits) related to the overall transaction size limit.
Fixed Type Decoding (SS=10): Decoders must ensure sufficient bytes are available in the input buffer before attempting to read the data associated with a fixed type to prevent buffer over-reads. The required size is known directly from the TTTT code.

Copyright

This LIP is licensed under the MIT License, in alignment with the main LEA Project License.

Bits	Field	Description
7-6	Tag (`10`)	IxData Field family.
5-2	Index (`IIII`)	The 4-bit index value (0-15).
1-0	Sub-Type (`00`)	Specifies Legacy Index Reference format.

Bits	Field	Description
7-6	Tag (`10`)	IxData Field family.
5-2	Type Code (`TTTT`)	Specifies the fixed data type (see table below).
1-0	Sub-Type (`10`)	Specifies Fixed Data Type format.