String type
Type Definition
String
UTF-8 encoded text
Length-prefixed format
Variable size encoding
Zero-copy decoding
Encoding Format
Structure:
[Length_Data: Int][UTF-8 bytes]
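The structure above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `encode_string` is a hypothetical helper, and it assumes that lengths below 128 are written as a single prefix byte (consistent with the `b'\x05Hello'` example on this page). The full variable-size Int encoding for longer strings is not reproduced here.

```python
def encode_string(text: str) -> bytes:
    """Encode text as [length][UTF-8 bytes], single-byte prefix only."""
    payload = text.encode("utf-8")
    # Assumption: a length below 128 fits in one prefix byte. Longer
    # payloads need the full variable-size Int encoding, which this
    # sketch deliberately does not attempt.
    if len(payload) >= 128:
        raise ValueError("sketch only supports payloads shorter than 128 bytes")
    return bytes([len(payload)]) + payload


print(encode_string("hello").hex(" "))  # 05 68 65 6c 6c 6f
```

Note that the prefix counts UTF-8 bytes, not characters, so `encode_string("é")` yields a prefix of 2.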
Implementation Details
Memory Layout
Length prefix: Variable size (general Int encoding)
UTF-8 bytes: Contiguous
No padding or alignment
Direct buffer access
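To illustrate the layout, the sketch below walks a buffer that holds two strings back to back. It is an assumption-laden example rather than the library's code: the buffer is built by hand, and every length prefix is assumed to be a single byte (valid only for lengths below 128).

```python
# Two strings stored back to back: "hi", then "héllo" (é takes 2 bytes).
buffer = b"\x02hi" + b"\x06h\xc3\xa9llo"
view = memoryview(buffer)  # slicing a memoryview does not copy the bytes

offset = 0
strings = []
while offset < len(view):
    length = view[offset]      # assumption: one-byte length prefix
    start = offset + 1
    end = start + length
    # The UTF-8 bytes are contiguous, so one slice covers the whole string.
    strings.append(str(view[start:end], "utf-8"))
    offset = end               # the next string starts immediately: no padding

print(strings)  # ['hi', 'héllo']
```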
UTF-8 Handling
Strict UTF-8 validation
Proper code point handling
Support for supplementary characters (code points above U+FFFF, which UTF-16 represents as surrogate pairs)
Invalid sequence detection
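What strict validation accepts and rejects can be shown with Python's built-in UTF-8 decoder. This sketch assumes the library's validation behaves like `bytes.decode("utf-8")` in its default strict mode; `is_valid_utf8` is a hypothetical helper, not part of the documented API.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 rules."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8("😀".encode("utf-8")))  # True: valid 4-byte sequence
print(is_valid_utf8(b"\xed\xa0\x80"))       # False: encoded surrogate U+D800
print(is_valid_utf8(b"\xc0\x80"))           # False: overlong encoding
print(is_valid_utf8(b"\xe2\x82"))           # False: truncated sequence
```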
Error Handling
Common error cases:
Invalid UTF-8:
try:
    return data.decode('utf-8')
except UnicodeError as e:
    raise DecodeError(f"Invalid UTF-8: {e}")
Buffer overflow:
if len(buffer) - offset < needed_size:
    raise BufferError(f"Buffer too small: need {needed_size} bytes")
Length mismatch:
if len(utf8_bytes) != length:
    raise ValueError(f"Length mismatch: expected {length}, got {len(utf8_bytes)}")
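The checks above could be combined into one decoding routine, roughly as follows. This is a sketch under stated assumptions: `DecodeError` is defined locally as a stand-in for whatever exception the library raises, `decode_checked` is a hypothetical name, and the length prefix is assumed to be a single byte (lengths below 128 only).

```python
from typing import Tuple


class DecodeError(Exception):
    """Hypothetical stand-in for the library's decode error."""


def decode_checked(buffer: bytes, offset: int = 0) -> Tuple[str, int]:
    """Decode one length-prefixed string; return (text, bytes read)."""
    if len(buffer) - offset < 1:
        raise BufferError("Buffer too small: need 1 byte")
    length = buffer[offset]  # assumption: one-byte length prefix
    if length >= 128:
        raise DecodeError("multi-byte length prefixes are not handled here")
    needed_size = 1 + length
    if len(buffer) - offset < needed_size:
        raise BufferError(f"Buffer too small: need {needed_size} bytes")
    # After the bounds check, this slice holds exactly `length` bytes,
    # so a separate length-mismatch check would be redundant here.
    utf8_bytes = buffer[offset + 1 : offset + needed_size]
    try:
        text = utf8_bytes.decode("utf-8")
    except UnicodeError as e:
        raise DecodeError(f"Invalid UTF-8: {e}") from e
    return text, needed_size


print(decode_checked(b"\x05hello"))  # ('hello', 6)
```

Checking the buffer size before slicing matters because Python slices silently truncate instead of raising, so a short buffer would otherwise yield a short string rather than an error.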
Examples
Basic Usage
from jam.types.base.string import String
# Create and encode
text = String("hello")
encoded = text.encode()
# -> [05 68 65 6C 6C 6F]
# len=5, "hello"
# Decode
decoded = String.decode(encoded)
assert decoded == "hello"
API Reference
Classes
- class jam.types.base.string.String(value: str)
UTF-8 encoded string type that implements the Codable interface.
Examples
>>> s = String("Hello")
>>> str(s)
'Hello'
>>> len(s)
5
>>> s.encode()  # Length prefix followed by UTF-8 bytes
b'\x05Hello'
Note
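String length matches Python's `len()` behavior, which counts Unicode code points rather than UTF-16 code units, so a character outside the Basic Multilingual Plane (such as most emoji) counts as 1. The encoded size can be larger than the length, because such characters take 4 bytes in UTF-8.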
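The different ways of measuring a string's length can be compared in plain Python. This is a minimal sketch that uses only built-in `str` methods and does not call the `String` class; the variable names are illustrative.

```python
text = "hi😀"

code_points = len(text)                           # what Python's len() counts
utf16_units = len(text.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 unit
utf8_bytes = len(text.encode("utf-8"))            # bytes in the encoded payload

print(code_points, utf16_units, utf8_bytes)  # 3 4 6
```

The emoji is one code point, two UTF-16 code units, and four UTF-8 bytes, which is why the three counts differ.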
- __init__(value: str)
Initialize a string.
- Parameters:
value – Python string value
- Raises:
TypeError – If value is not a str
- static decode_from(buffer: bytes | bytearray | memoryview, offset: int = 0) -> Tuple[String, int]
Decode a String from a buffer.
- Parameters:
buffer – Bytes to decode from
offset – Starting position in buffer
- Returns:
Tuple of (String instance, bytes read)
- Raises:
ValueError – If buffer is too short
UnicodeDecodeError – If buffer contains invalid UTF-8