API Reference

Image

class kani.ext.multimodal_core.ImagePart(*, extra: dict = {}, image: Image)

A part representing image data.

Image data is stored in memory as a Pillow Image object. When serialized, image data is represented as a data URI.

To get image data in a suitable format for downstream applications, use as_b64(), as_bytes(), as_ndarray(), or as_tensor().

image: Image

The PIL Image object containing the referenced image.

classmethod from_file(fp: str | bytes | PathLike | IO, **kwargs)

Create an ImagePart from a local image file. The file format will be automatically detected.

classmethod from_bytes(data: bytes, **kwargs)

Create an ImagePart from raw binary data.

classmethod from_b64(data: str, **kwargs)

Create an ImagePart from Base64-encoded binary data.

async classmethod from_url(url: str, **kwargs)

Download an image from the Internet and create an ImagePart.

Attention

Note that this classmethod is asynchronous, as it downloads data from the web!

Keyword arguments are passed to from_file().

as_bytes(format: str = 'png') → bytes

Return the raw image data in the given format.

as_b64(format: str = 'png') → str

Return the binary image data in the given format, encoded as a Base64 string.

Note that this is not a web-suitable data:image/... URI; it is just the raw binary of the image, Base64-encoded. Use as_b64_uri() for a web-suitable string.

as_b64_uri(format: str = 'png') → str

Get the binary image data encoded as a web-suitable Base64 data URI (data:image/...).
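The difference between as_b64() and as_b64_uri() can be sketched with Python's standard base64 module. This is a minimal illustration, not kani's implementation: the helpers to_b64 and to_b64_uri are hypothetical, and the sketch assumes the URI follows the standard data:<mime>;base64,<payload> layout.

```python
import base64

def to_b64(data: bytes) -> str:
    # Hypothetical helper: plain Base64 of the binary payload (like as_b64())
    return base64.b64encode(data).decode("ascii")

def to_b64_uri(data: bytes, mime: str) -> str:
    # Hypothetical helper: the same payload wrapped in a data URI (like as_b64_uri())
    return f"data:{mime};base64,{to_b64(data)}"

# The 8-byte PNG file signature stands in for real image data here
png_signature = b"\x89PNG\r\n\x1a\n"
print(to_b64(png_signature))                    # iVBORw0KGgo=
print(to_b64_uri(png_signature, "image/png"))   # data:image/png;base64,iVBORw0KGgo=
```

Only the URI form can be dropped directly into contexts that expect a URL, such as an HTML img src attribute.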

as_ndarray() → ndarray

Get the pixel-wise image data as a NumPy array (h*w*c).

Warning

Note that this array is in (height, width, channels) dimensionality, unlike as_tensor(), which returns a tensor in (channels, height, width) dimensionality.

as_tensor() → torch.Tensor

Get the pixel-wise image data as a PyTorch tensor (c*h*w).

Warning

Note that this tensor is in (channels, height, width) dimensionality, unlike as_ndarray(), which returns an array in (height, width, channels) dimensionality.
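The two layouts hold the same values in a different axis order. The sketch below uses plain nested lists so it runs without NumPy or PyTorch; hwc_to_chw is a hypothetical helper. With real arrays, the equivalent reordering is arr.transpose(2, 0, 1) in NumPy or, going the other way, tensor.permute(1, 2, 0) in PyTorch. Note also that size reports (width, height), the reverse of the first two array dimensions.

```python
def hwc_to_chw(pixels):
    """Reorder a nested-list image from (height, width, channels) to (channels, height, width)."""
    h, w, c = len(pixels), len(pixels[0]), len(pixels[0][0])
    return [[[pixels[y][x][ch] for x in range(w)] for y in range(h)] for ch in range(c)]

# A tiny 2x3 RGB image (2 rows, 3 columns) in (h, w, c) layout
hwc = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 20, 30), (40, 50, 60), (70, 80, 90)],
]
chw = hwc_to_chw(hwc)
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 3
# Same pixel value, different index order: [y][x][c] becomes [c][y][x]
print(chw[0][1][2] == hwc[1][2][0])  # True
```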

sha256() → bytes

Return the SHA-256 hash of the PIL image. Note that this is not necessarily equivalent to the hash of the image file, as it is computed over Pillow's internal representation. This should generally only be used to check whether an image has been modified.

property size: tuple[int, int]

The size of the image, in pixels (width, height).

property mime: str

The MIME type of the image.

Audio

class kani.ext.multimodal_core.AudioPart(*, extra: dict = {}, raw: bytes, sample_rate: int)

A part representing audio data.

Audio data is stored in memory in the raw attribute as signed 16-bit little-endian mono PCM, at the sample rate given by sample_rate. When serialized, audio data is represented as a data URI.

To get audio data in a suitable format for downstream applications, use as_b64(), as_bytes(), as_ndarray(), or as_tensor().

raw: bytes

The raw binary data in signed 16-bit little-endian mono PCM format.
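As a sketch of what valid raw data looks like, the snippet below packs floats into signed 16-bit little-endian mono PCM with the standard struct module. floats_to_s16le is a hypothetical helper, and scaling by 32767 is one common convention rather than anything kani prescribes.

```python
import math
import struct

def floats_to_s16le(samples):
    """Pack floats in [-1.0, 1.0] into signed 16-bit little-endian mono PCM bytes."""
    # Clamp, scale to the int16 range, then pack with "<h" (little-endian signed short)
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)

sr = 16000
# 10 ms of a 440 Hz sine wave at half amplitude, sampled at 16 kHz
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 100)]
raw = floats_to_s16le(tone)
print(len(raw))  # 320: 160 samples * 2 bytes each
```

Given the constructor signature above, bytes produced this way should be usable as AudioPart(raw=raw, sample_rate=sr).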

sample_rate: int

The sample rate of the binary data, in Hz.

classmethod from_b64(data: str, sr: int, **kwargs)

Create an AudioPart from Base64-encoded signed 16-bit little-endian mono PCM data.

classmethod from_file(
fp: str | bytes | PathLike | IO,
*,
format: str | None = None,
codec: str | None = None,
converter_parameters: str | None = None,
sr: int | None = None,
sample_width: int | None = None,
channels: int | None = None,
**kwargs,
)

Create an AudioPart from a local file.

Parameters:
  • fp – The path to the file or an open file to read.

  • format – The format (e.g. ‘mp3’) of the audio file. If this is not set, an attempt will be made to determine it automatically from the given filename.

  • codec – An explicit audio codec to use to decode the audio file, if conversion is needed (see FFmpeg’s -acodec option for valid inputs).

  • converter_parameters – Any additional CLI arguments to pass to the audio converter, if conversion is needed.

  • sr – The sample rate of the audio (raw PCM audio only).

  • sample_width – The sample width, in bytes, of the audio (raw PCM audio only).

  • channels – The number of channels of the audio (raw PCM audio only).

async classmethod from_url(url: str, **kwargs)

Download audio from the Internet and create an AudioPart.

Attention

Note that this classmethod is asynchronous, as it downloads data from the web!

Keyword arguments are passed to from_file().

as_bytes(sr: int | None = None) → bytes

Return the audio data as signed 16-bit little-endian mono PCM at the given sample rate.

as_b64(sr: int | None = None) → str

Return the audio data as Base64-encoded signed 16-bit little-endian mono PCM at the given sample rate.

as_ndarray(sr: int | None = None) → ndarray

Return the audio data as a 1-dimensional NumPy array of floats at the given sample rate.
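Going the other way, a minimal sketch of decoding raw PCM into floats is shown below. It ignores the sr argument (no resampling), and dividing by 32768 is an assumption based on a common int16 convention; the exact scaling as_ndarray() applies is not documented here. s16le_to_floats is a hypothetical helper.

```python
import struct

def s16le_to_floats(raw: bytes):
    """Decode signed 16-bit little-endian mono PCM into floats in [-1.0, 1.0)."""
    count = len(raw) // 2  # 2 bytes per sample; a trailing odd byte is ignored
    ints = struct.unpack(f"<{count}h", raw[: count * 2])
    # Assumption: normalize by 32768, a common convention for int16 audio
    return [i / 32768 for i in ints]

print(s16le_to_floats(b"\x00\x00\x00\x40\x00\xc0"))  # [0.0, 0.5, -0.5]
```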

as_tensor(sr: int = None) → torch.Tensor

Return the audio data as a 2-dimensional [channel, time] PyTorch Tensor of floats at the given sample rate.

Since this library only uses mono audio, the first dimension will always be 1.

as_wav_bytes() → bytes

Return the audio data as WAV data (including header).
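A WAV file is essentially the same PCM payload behind a RIFF header. This sketch builds one with the standard wave module; pcm_to_wav_bytes is a hypothetical helper and may not match as_wav_bytes() byte-for-byte.

```python
import io
import wave

def pcm_to_wav_bytes(raw: bytes, sample_rate: int) -> bytes:
    """Wrap signed 16-bit little-endian mono PCM in a WAV (RIFF) container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)  # mono
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(raw)
    return buf.getvalue()

raw = b"\x00\x00" * 1600  # 0.1 s of silence at 16 kHz
wav = pcm_to_wav_bytes(raw, 16000)
print(wav[:4], wav[8:12])   # b'RIFF' b'WAVE'
print(len(wav) - len(raw))  # 44: the size of a canonical PCM WAV header
```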

as_wav_b64_uri() → str

Return the WAV audio data encoded as a web-suitable Base64 data URI.

resample(sample_rate: int) → AudioPart

Return a new AudioPart resampled to the given sample rate.
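Resampling changes the number of samples while preserving the signal's duration. The sketch below uses simple linear interpolation on a list of samples. This is a swapped-in technique for illustration only, since the algorithm resample() actually uses is not documented here; resample_linear is a hypothetical helper. Linear interpolation applies no anti-aliasing filter, so downsampling with it can introduce aliasing.

```python
def resample_linear(samples, src_sr: int, dst_sr: int):
    """Resample a list of samples from src_sr to dst_sr by linear interpolation."""
    if src_sr == dst_sr or not samples:
        return list(samples)
    n_out = round(len(samples) * dst_sr / src_sr)
    out = []
    for i in range(n_out):
        pos = i * src_sr / dst_sr  # fractional position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out

# Doubling the sample rate roughly doubles the number of samples
print(resample_linear([0, 10, 20, 30], 8000, 16000))
# [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 30.0]
```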

sha256() → bytes

Return the SHA-256 hash of the raw audio.

property duration: float

The duration of this audio clip, in seconds.
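Because the audio is 16-bit (2 bytes per sample) mono PCM, the duration follows directly from the length of raw and the sample rate. pcm_duration is a hypothetical helper, and it is an assumption that duration is computed exactly this way.

```python
BYTES_PER_SAMPLE = 2  # signed 16-bit = 2 bytes per sample; mono = 1 channel

def pcm_duration(raw: bytes, sample_rate: int) -> float:
    """Duration in seconds of signed 16-bit mono PCM data."""
    return len(raw) / (BYTES_PER_SAMPLE * sample_rate)

print(pcm_duration(b"\x00" * 32000, 16000))  # 1.0
```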

property sr

An alias for sample_rate.

Video

class kani.ext.multimodal_core.VideoPart(*, extra: dict = {}, file: BinaryFileLike, mime: str)

A part representing video data.

Video data is stored as a file-like object and a MIME type. This allows applications to persist large files on disk (using a FileIO) or in memory (using a BytesIO).

When serialized, video data is represented as a data URI. This can lead to some really big files!

To get video data in a suitable format for downstream applications, use as_b64(), as_bytes(), or as_tensor().

async classmethod from_url(url: str, *, allowed_mime=('video/*',), **kwargs)

Download a video from the Internet and create a VideoPart. This saves the data to a temporary file.

Attention

Note that this classmethod is asynchronous, as it downloads data from the web!

Keyword arguments are passed to from_file().

as_tensor(fps: float = 1, start: float = None, end: float = None) → torch.Tensor

Get the time-pixel-wise video data as a PyTorch tensor (t*c*h*w).

Important

Note that this tensor is in (time, channels, height, width) dimensionality.

Parameters:
  • fps – The number of frames per second (default 1).

  • start – The time, in seconds, to start at.

  • end – The time, in seconds, to end at.
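To estimate the size of the time dimension, the sketch below lists the timestamps a uniform sampler would pick at a given fps. Several things here are assumptions not confirmed above: that start defaults to 0, that end defaults to the video's duration, and that frames are taken at evenly spaced intervals of 1/fps. sample_timestamps is a hypothetical helper.

```python
import math

def sample_timestamps(duration, fps=1.0, start=None, end=None):
    """Evenly spaced frame timestamps (in seconds) from start up to end."""
    # Assumed defaults: start at the beginning, end at the end of the video
    start = 0.0 if start is None else start
    end = duration if end is None else min(end, duration)
    # Roughly (end - start) * fps frames, one every 1/fps seconds
    n_frames = max(0, math.ceil((end - start) * fps))
    return [start + i / fps for i in range(n_frames)]

print(sample_timestamps(10.0, fps=2, start=1, end=3))  # [1.0, 1.5, 2.0, 2.5]
```

Under these assumptions, a 10-second video at the default fps of 1 would yield a tensor with t = 10.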

property duration: float

The duration of this video, in seconds.

property resolution: tuple[int, int]

The resolution of the video’s first frame, in pixels (width, height).

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

Binary File

class kani.ext.multimodal_core.BinaryFilePart(*, extra: dict = {}, file: BinaryFileLike, mime: str)

A MessagePart containing arbitrary binary data.

The raw data is saved as a file-like object and a MIME type. This allows applications to persist large files on disk (using a FileIO) or in memory (using a BytesIO).

When serialized, the binary data is represented as a data URI. This can lead to some really big files!

file: BinaryFileLike

The readable binary file-like object containing the data.

mime: str

The MIME type of the file.

classmethod from_file(
fp: str | bytes | PathLike | BinaryFileLike,
mime: str | None = None,
**kwargs,
)

Create a BinaryFilePart from a local file.

Parameters:
  • fp – The path to the file or an open binary file to read.

  • mime – The MIME type of the file.

classmethod from_bytes(data: bytes, mime: str, **kwargs)

Create a BinaryFilePart from raw bytes.

Parameters:
  • data – The raw binary data.

  • mime – The MIME type of the data.

classmethod from_b64(data: str, mime: str, **kwargs)

Create a BinaryFilePart from Base64-encoded binary data.

async classmethod from_url(url: str, *, allowed_mime=('*',), **kwargs)

Download a file from the Internet and create a BinaryFilePart. This saves the data to a temporary file.

Attention

Note that this classmethod is asynchronous, as it downloads data from the web!

Tip

Certain sites serve all binary data with the application/octet-stream MIME type. To set the MIME type more precisely, pass mime="...".

Keyword arguments are passed to from_file().

as_bytes() → bytes

Return the full raw data. This could consume a lot of memory!

as_b64() → str

Return the binary data encoded as a Base64 string. This could consume a lot of memory!

Note that this is not a web-suitable data:<mime>;base64,... URI; it is just the raw binary of the file, Base64-encoded. Use as_b64_uri() for a web-suitable string.

as_b64_uri() → str

Get the binary data encoded as a web-suitable Base64 data URI. This could consume a lot of memory!

property filesize

The size of the file, in bytes.
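For any seekable file-like object, the size can be found without reading the contents by seeking to the end. This is a sketch of that general technique using a hypothetical file_size helper, shown with both an in-memory BytesIO and an on-disk temporary file; whether filesize is implemented this way is an assumption.

```python
import io
import os
import tempfile

def file_size(f) -> int:
    """Size in bytes of a seekable binary file-like object, without reading it."""
    pos = f.tell()
    size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
    f.seek(pos)                    # restore the caller's position
    return size

# In memory ...
print(file_size(io.BytesIO(b"hello world")))  # 11

# ... or on disk
with tempfile.TemporaryFile() as tmp:
    tmp.write(b"\x00" * 1024)
    print(file_size(tmp))  # 1024
```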

sha256() → bytes

Return the SHA-256 hash of the file contents.

This method is preferred over manually using hashlib.sha256(part.as_bytes()), as it is optimized for speed and memory usage.
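The memory advantage comes from hashing the file in chunks rather than loading it all at once. A minimal sketch of that technique follows; sha256_file is a hypothetical helper, and kani's implementation may differ. On Python 3.11+, hashlib.file_digest(f, "sha256").digest() is a built-in alternative.

```python
import hashlib
import io

def sha256_file(f, chunk_size: int = 1 << 16) -> bytes:
    """SHA-256 digest of a binary file-like object, read in fixed-size chunks."""
    f.seek(0)  # hash the whole file, regardless of the current position
    h = hashlib.sha256()
    # Only one chunk is held in memory at a time, unlike hashing f.read() all at once
    for chunk in iter(lambda: f.read(chunk_size), b""):
        h.update(chunk)
    return h.digest()

data = b"multimodal" * 100_000  # about 1 MB
assert sha256_file(io.BytesIO(data)) == hashlib.sha256(data).digest()
```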

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

Base

class kani.ext.multimodal_core.BaseMultimodalPart(*, extra: dict = {})