String type

Type Definition

String

  • UTF-8 encoded text

  • Length-prefixed format

  • Variable size encoding

  • Zero-copy decoding

Encoding Format

Structure:

[Length_Data: Int][UTF-8 bytes]
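
The layout above can be sketched as follows. This is a minimal illustration rather than the library's implementation: `encode_string` is a hypothetical helper, and it assumes that lengths below 128 fit in a single prefix byte, as in the `hello` example in this document. Larger lengths need the full general Int encoding, which is not reproduced here.

```python
def encode_string(text: str) -> bytes:
    """Hypothetical helper: length prefix followed by UTF-8 bytes."""
    utf8_bytes = text.encode("utf-8")
    length = len(utf8_bytes)  # assumption: the prefix counts bytes, not characters
    if length >= 128:
        # Assumption: only the single-byte prefix case is sketched here.
        raise NotImplementedError("multi-byte length prefix not sketched")
    return bytes([length]) + utf8_bytes


print(encode_string("hello").hex(" "))  # 05 68 65 6c 6c 6f
```

Because the prefix counts encoded bytes, a string with non-ASCII characters has a prefix larger than its character count.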

Implementation Details

Memory Layout

  • Length prefix: Variable size (general Int encoding)

  • UTF-8 bytes: Contiguous

  • No padding or alignment

  • Direct buffer access
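
As a rough illustration of direct buffer access, the sketch below reads the payload through a `memoryview` slice, which shares memory with the underlying buffer instead of copying it. It assumes a single-byte length prefix and uses only Python built-ins. Whether the library decodes this way internally is an assumption.

```python
buffer = bytearray(b"\x05hello")   # one-byte length prefix, then UTF-8 bytes
view = memoryview(buffer)

length = view[0]                   # assumption: single-byte prefix (length < 128)
payload = view[1:1 + length]       # slicing a memoryview does not copy the bytes

text = str(payload, "utf-8")       # decode straight from the view
print(text)                        # hello

buffer[1] = ord("j")               # mutate the underlying buffer...
print(bytes(payload))              # b'jello' (...and the view reflects the change)
```

Producing a Python `str` always allocates a new object, so "zero-copy" here applies to locating and slicing the payload, not to the final decode step.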

UTF-8 Handling

  • Strict UTF-8 validation

  • Proper code point handling

  • Supplementary-plane characters supported (4-byte UTF-8 sequences)

  • Invalid sequence detection
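
The sketch below shows what strict validation accepts and rejects, using Python's built-in UTF-8 codec as a stand-in. `is_valid_utf8` is a hypothetical helper, not part of the library. Note that UTF-8 itself has no surrogate pairs: characters outside the Basic Multilingual Plane are 4-byte sequences, and an encoded surrogate is rejected as invalid.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8("h\u00e9llo".encode("utf-8")))  # True
print(is_valid_utf8(b"\xf0\x9f\x98\x80"))           # True: U+1F600 as a 4-byte sequence
print(is_valid_utf8(b"\xff"))                       # False: 0xFF never appears in UTF-8
print(is_valid_utf8(b"\xc0\x80"))                   # False: overlong encoding
print(is_valid_utf8(b"\xed\xa0\x80"))               # False: encoded surrogate U+D800
print(is_valid_utf8(b"\xe2\x82"))                   # False: truncated sequence
```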

Error Handling

Common error cases:

  1. Invalid UTF-8:

    try:
        return utf8_bytes.decode('utf-8')
    except UnicodeDecodeError as e:
        raise DecodeError(f"Invalid UTF-8: {e}") from e
    
  2. Buffer too small:

    if len(buffer) - offset < needed_size:
        raise BufferError(f"Buffer too small: need {needed_size} bytes")
    
  3. Length mismatch:

    if len(utf8_bytes) != length:
        raise ValueError(f"Length mismatch: expected {length}, got {len(utf8_bytes)}")
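
Put together, the checks above might look like the following sketch. It is not the library's decoder: `decode_string` and the `DecodeError` class defined here are hypothetical stand-ins, and only a single-byte length prefix is handled. The function returns the decoded text together with the number of bytes read, so the caller can advance the offset.

```python
from typing import Tuple


class DecodeError(Exception):
    """Hypothetical stand-in for the library's decode error."""


def decode_string(buffer: bytes, offset: int = 0) -> Tuple[str, int]:
    """Hypothetical sketch: returns (text, bytes read)."""
    if len(buffer) - offset < 1:
        raise BufferError("Buffer too small: need 1 byte for the length prefix")
    length = buffer[offset]
    if length >= 128:
        # Assumption: only single-byte prefixes are sketched here.
        raise NotImplementedError("multi-byte length prefix not sketched")
    needed_size = 1 + length
    if len(buffer) - offset < needed_size:
        raise BufferError(f"Buffer too small: need {needed_size} bytes")
    utf8_bytes = buffer[offset + 1:offset + needed_size]
    try:
        return utf8_bytes.decode("utf-8"), needed_size
    except UnicodeDecodeError as e:
        raise DecodeError(f"Invalid UTF-8: {e}") from e


# Two strings back to back: advance the offset by the bytes read.
data = b"\x05hello\x02hi"
first, used = decode_string(data)
second, _ = decode_string(data, used)
print(first, second)  # hello hi
```

Checking the buffer size before slicing matters because a Python slice past the end silently returns fewer bytes instead of raising.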
    

Examples

Basic Usage

from jam.types.base.string import String

# Create and encode
text = String("hello")
encoded = text.encode()
# -> [05 68 65 6C 6C 6F]
#    len=5, "hello"

# Decode
decoded = String.decode(encoded)
assert decoded == "hello"

API Reference

Classes

class jam.types.base.string.String(value: str)

Bases: Codable, JsonSerde

UTF-8 encoded string type that implements the Codable interface.

Examples

>>> s = String("Hello")
>>> str(s)
'Hello'
>>> len(s)
5
>>> s.encode()
b'\x05Hello'  # Length prefix followed by UTF-8 bytes

Note

String length is measured in UTF-16 code units, which means characters outside the Basic Multilingual Plane (such as most emoji) count as 2 units. Python’s built-in len() on a str counts Unicode code points instead, so the two values differ for such characters.
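
To make the units concrete, the sketch below compares three ways of measuring the same text using only Python built-ins: code points (what `len()` returns for a `str`), UTF-16 code units, and UTF-8 bytes. `utf16_units` is a hypothetical helper, not part of the library.

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units needed to represent text."""
    return len(text.encode("utf-16-le")) // 2


samples = {"ASCII": "hello", "e-acute": "\u00e9", "emoji": "\U0001F600"}
for name, text in samples.items():
    # name, code points, UTF-16 code units, UTF-8 bytes
    print(name, len(text), utf16_units(text), len(text.encode("utf-8")))
# ASCII 5 5 5
# e-acute 1 1 2
# emoji 1 2 4
```

The three counts agree only for ASCII text, so it is worth checking which one a given length refers to. The length check shown under Error Handling, for instance, compares against the number of UTF-8 bytes.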

__init__(value: str)

Initialize a string.

Parameters:

value – Python string value

Raises:

TypeError – If value is not a str

__str__() -> str

Convert to str.

__len__() -> int

Get string length in UTF-16 code units.

__getitem__(index: int | slice) -> str

Get character(s) at index or slice.

__contains__(item: str) -> bool

Check if string contains substring.

__eq__(other: Any) -> bool

Compare for equality.

__hash__() -> int

Make hashable.

__add__(other: String | str) -> String

Concatenate strings.

__repr__() -> str

Get string representation.

static decode_from(buffer: bytes | bytearray | memoryview, offset: int = 0) -> Tuple[String, int]

Decode a String from a buffer.

Parameters:
  • buffer – Bytes to decode from

  • offset – Starting position in buffer

Returns:

Tuple of (String instance, bytes read)

Raises:

Decorators