- LIP: 1
- Layer: Core Encoding
- Title: Extend Tag 10 for Sub-Typed Data (IxData Field)
- Author: Allwin Ketnawang
- Created: 2025-04-27
- Superseded-By: 6
- Status: Replaced
Abstract
This LIP proposes replacing the original CTE v1.0 Tag 10 "Index Reference" field with a more versatile structure named the "Index and Extended Data Field" (or "IxData Field" for short). This new structure utilizes the previously reserved bits 1-0 of the header byte to introduce four sub-types, enabling the efficient encoding of: legacy 4-bit indices, standard variable-length integers (ULEB128/SLEB128), common fixed-size data types (integers, floats), and single-byte constants (including booleans). This enhances the expressiveness and efficiency of the CTE format.
Motivation
The original CTE v1.0 specification defined the Tag 10 field solely as a 1-byte reference containing a 4-bit index (0-15) into preceding key or signature lists. While useful, this is limiting. Applications built on CTE often require encoding other fundamental data types compactly, such as:
- Integers larger than 15 or needing variable-length encoding for efficiency.
- Signed integers.
- Standard fixed-size types like
int32, uint64, float, and double without the overhead of the generic Command Data field.
- Atomic boolean values (
true/false) or other single-byte markers (like null) in a highly compact, unambiguous way.
This proposal addresses these needs by repurposing the Tag 10 field. By using the two previously reserved bits (1-0) as a sub-type selector, we can introduce multiple data formats under a single tag, significantly increasing the utility and efficiency of the CTE format while maintaining partial backwards compatibility for the original index reference use case.
Specification
This LIP replaces Section 4.3 of the CTE v1.0 specification document entirely with the following:
4.3. Index and Extended Data Field (IxData Field) (Tag 10)
- Tag:
10
- Purpose: Encodes different types of data including simple list indices, variable-length integers using standard encodings, standard fixed-size data types, and single-byte constants or markers.
- Identification: Fields of this type are identified by the Tag
10 in bits 7-6 of the header byte. The specific interpretation and format are determined by the Sub-Type field located in bits 1-0 (SS) of the header byte.
- General Header Byte Structure:
| Bits |
Field |
Description |
| 7-6 |
Tag (10) |
Identifies this as the IxData Field family. |
| 5-2 |
Sub-Data (XXXX) |
4-bit value whose meaning depends on the Sub-Type (SS) field. |
| 1-0 |
Sub-Type (SS) |
Determines the format and interpretation (00, 01, 10, 11). |
- Endianness Note: Unless otherwise specified, all multi-byte numerical data within CTE v1.1 fields (including sub-types defined below and Command Data) MUST be encoded using Little-Endian byte order. LEB128 encodings follow their own standard byte order.
4.3.1. Sub-Type 00: Legacy Index Reference
- Sub-Type Code (
SS): 00
- Purpose: Provides a zero-based index referencing an item within the preceding
Public Key List (Section 4.1) or Signature List (Section 4.2). The context determines which list is being referenced. This maintains compatibility with CTE v1.0.
- Format:
- Header Byte:
10 IIII 00
| Bits |
Field |
Description |
| 7-6 |
Tag (10) |
IxData Field family. |
| 5-2 |
Index (IIII) |
The 4-bit index value (0-15). |
| 1-0 |
Sub-Type (00) |
Specifies Legacy Index Reference format. |
- Data: No following data bytes.
- Constraints: The Index value (
IIII) MUST correspond to a valid position within the relevant list.
- Total Size: 1 byte.
- Reserved Values: None within this sub-type.
- Example (Reference Index 5): Header Byte:
10 0101 00 = 0x94
4.3.2. Sub-Type 01: Variable-Length Encoded Integer (Varint)
- Sub-Type Code (
SS): 01
- Purpose: Encodes a signed or unsigned integer value using standard variable-length encoding schemes (LEB128) or represents the value 0 directly.
- Format:
- Header Byte:
10 EEEE 01
| Bits |
Field |
Description |
| 7-6 |
Tag (10) |
IxData Field family. |
| 5-2 |
Encoding Scheme (EEEE) |
Specifies the encoding method or direct value (see table below). |
| 1-0 |
Sub-Type (01) |
Specifies Varint format. |
- Data: For LEB128 schemes, followed by the bytes constituting the variable-length encoded integer according to the relevant standard. The number of data bytes is determined by the LEB128 encoding itself (via continuation bits).
- Encoding Schemes (
EEEE):
EEEE (Bin) |
EEEE (Dec) |
Header Byte (Hex) |
Encoding Scheme / Value |
Notes |
Size (Bytes) |
0000 |
0 |
0x81 |
Value 0 |
Represents the integer value 0 directly. |
1 (Header only) |
0001 |
1 |
0x85 |
ULEB128 |
Unsigned LEB128 encoded integer data follows. |
1 + Data Length |
0010 |
2 |
0x89 |
SLEB128 |
Signed LEB128 encoded integer data follows. |
1 + Data Length |
0011 |
3 |
0x8D |
Reserved |
For future variable-length encoding schemes. |
- |
| ... |
... |
... |
... |
(Codes 3-15 are Reserved) |
... |
1111 |
15 |
0xBD |
Reserved |
For future variable-length encoding schemes. |
- |
- Constraints:
- Decoders MUST correctly implement ULEB128 and SLEB128 decoding.
- While LEB128 can encode arbitrarily large integers, implementations MAY impose practical limits based on transaction size constraints or application needs (e.g., limiting to 64-bit or 128-bit range).
- Total Size: 1 byte for value 0;
1 + Length(LEB128 Data) bytes otherwise.
- Reserved Values: Encoding scheme codes 3 through 15 are reserved.
- Example 1 (Value 300 /
0x12C using ULEB128):
- ULEB128 requires 2 data bytes:
0xAC, 0x02. EEEE = 0001.
- Header Byte:
10 0001 01 = 0x85
- Following Data:
0xAC, 0x02
- Example 2 (Value -100 using SLEB128):
- SLEB128 requires 2 data bytes:
0x9C, 0x7F. EEEE = 0010.
- Header Byte:
10 0010 01 = 0x89
- Following Data:
0x9C, 0x7F
4.3.3. Sub-Type 10: Fixed Data Type
- Sub-Type Code (
SS): 10
- Purpose: Encodes a value belonging to a standard, fixed-size data type identified by a type code.
- Format:
- Header Byte:
10 TTTT 10
| Bits |
Field |
Description |
| 7-6 |
Tag (10) |
IxData Field family. |
| 5-2 |
Type Code (TTTT) |
Specifies the fixed data type (see table below). |
| 1-0 |
Sub-Type (10) |
Specifies Fixed Data Type format. |
- Data: Followed by the number of bytes corresponding to the specified Type Code. Data is encoded in Little-Endian byte order where applicable (multi-byte integers, floats).
- Type Codes (
TTTT):
TTTT (Bin) |
TTTT (Dec) |
Data Type |
Size (Bytes) |
Notes |
0000 |
0 |
int8_t |
1 |
Signed 8-bit integer |
0001 |
1 |
int16_t |
2 |
Signed 16-bit integer |
0010 |
2 |
int32_t |
4 |
Signed 32-bit integer |
0011 |
3 |
int64_t |
8 |
Signed 64-bit integer |
0100 |
4 |
uint8_t |
1 |
Unsigned 8-bit integer |
0101 |
5 |
uint16_t |
2 |
Unsigned 16-bit integer |
0110 |
6 |
uint32_t |
4 |
Unsigned 32-bit integer |
0111 |
7 |
uint64_t |
8 |
Unsigned 64-bit integer |
1000 |
8 |
float |
4 |
IEEE 754 Single-Precision |
1001 |
9 |
double |
8 |
IEEE 754 Double-Precision |
1010 |
10 |
Reserved |
- |
For future use |
| ... |
... |
... |
... |
(Codes 10-15 are Reserved) |
1111 |
15 |
Reserved |
- |
For future use |
- Constraints: The data following the header MUST match the size and format required by the specified Type Code.
- Total Size:
1 + Size(Type[TTTT]) bytes.
- Reserved Values: Type codes 10 through 15 are reserved.
- Example (Value -100 as
int16_t):
- -100 is
0xFF9C (Little-Endian: 9C FF). TTTT = 0001.
- Header Byte:
10 0001 10 = 0x86
- Following Data:
0x9C, 0xFF
4.3.4. Sub-Type 11: Single-Byte Constant/Marker
- Sub-Type Code (
SS): 11
- Purpose: Encodes specific predefined constant values (like boolean
true/false) or semantic markers using only a single byte. The meaning is determined directly by the Value Code.
- Format:
- Header Byte:
10 XXXX 11 (The header byte is the entire field)
| Bits |
Field |
Description |
| 7-6 |
Tag (10) |
IxData Field family. |
| 5-2 |
Value Code (XXXX) |
Specifies the constant or marker value (0-15). |
| 1-0 |
Sub-Type (11) |
Specifies Single-Byte Constant/Marker format. |
- Data: No following data bytes.
- Value Codes (
XXXX):
XXXX (Bin) |
XXXX (Dec) |
Header Byte (Hex) |
Defined Meaning |
Notes |
0000 |
0 |
0x83 |
false |
Boolean false value |
0001 |
1 |
0x87 |
true |
Boolean true value |
0010 |
2 |
0x8B |
Reserved |
For future use |
| ... |
... |
... |
... |
(Codes 2-15 are Reserved) |
1111 |
15 |
0xBF |
Reserved |
e.g., For Null, Separator |
- Constraints: Implementations should recognize the defined constant values.
- Total Size: 1 byte.
- Reserved Values: Value codes 2 through 15 are reserved for future standard constants or markers.
- Example (Representing
false): The single byte 10 0000 11 = 0x83 is used.
Rationale
- Sub-Typing: Utilizing the two previously reserved bits (1-0) of the Tag
10 header byte allows extending functionality significantly without consuming a new top-level tag, preserving the core 2-bit tag structure of CTE.
- Varint (
SS=01): Adopting ULEB128 and SLEB128 provides standard, efficient variable-length encodings for unsigned and signed integers, respectively. Including a direct representation for '0' (EEEE=0000) optimizes the most common default integer value.
- Fixed Types (
SS=10): Directly encoding common fixed-width types (integers up to 64 bits, float, double) is often more convenient and potentially faster to decode than using variable-length encodings or the generic Command Data field for these types. Little-Endian is chosen as a common standard.
- Single-Byte Constants (
SS=11): Allocating a dedicated sub-type for single-byte constants like true and false provides an unambiguous and highly efficient (1 byte) representation. This keeps the SS=10 (Fixed Type) logic consistent (header always implies following data) and leaves room in SS=11 for other markers (e.g., null, separators) if needed later.
Backwards Compatibility
This proposal introduces new interpretations for the Tag 10 field based on the value of bits 1-0.
- Partially Compatible: CTE processors compliant only with v1.0 will correctly interpret the
Legacy Index Reference (SS=00) sub-type, as its format (10 IIII 00) remains unchanged.
- Incompatible: V1.0 processors will not recognize or correctly decode data encoded using the new sub-types (
SS=01, SS=10, SS=11). Encountering these formats will likely lead to parsing errors or data misinterpretation.
Adoption of this LIP requires updating CTE processors to recognize and handle all four defined sub-types of the Tag 10 IxData Field. This change effectively defines CTE v1.1.
Security Considerations
- LEB128 Decoding (
SS=01): Implementations decoding ULEB128 and SLEB128 must protect against potential resource exhaustion attacks. Maliciously crafted long sequences (many bytes with continuation bits set) could cause excessive CPU usage or memory allocation during decoding. Robust decoders should enforce limits on the number of bytes read for a single LEB128 value or check the magnitude of the partially decoded value against reasonable bounds (e.g., max 128 bits, or application-specific limits) related to the overall transaction size limit.
- Fixed Type Decoding (
SS=10): Decoders must ensure sufficient bytes are available in the input buffer before attempting to read the data associated with a fixed type to prevent buffer over-reads. The required size is known directly from the TTTT code.
Copyright
This LIP is licensed under the MIT License, in alignment with the main LEA Project License.