API Reference#
Image#
- class kani.ext.multimodal_core.ImagePart(*, extra: dict = {}, image: Image)[source]#
A part representing image data.
Image data is stored in memory as a Pillow Image object. When serialized, image data is represented as a data URI.
To get image data in a suitable format for downstream applications, use
as_b64(), as_bytes(), as_ndarray(), or as_tensor().
- classmethod from_file(fp: str | bytes | PathLike | IO, **kwargs)[source]#
Create an ImagePart from a local image file. The file format will be automatically detected.
- classmethod from_b64(data: str, **kwargs)[source]#
Create an ImagePart from Base64-encoded binary data.
- async classmethod from_url(url: str, **kwargs)[source]#
Download an image from the Internet and create an ImagePart.
Attention
Note that this classmethod is asynchronous, as it downloads data from the web!
Keyword arguments are passed to
from_file().
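Because from_url() is a coroutine, it has to be awaited inside an event loop. The following is a minimal usage sketch, assuming the package is importable as kani.ext.multimodal_core (taken from the class path above). The helper name fetch_image_uri and the URL are hypothetical.

```python
import asyncio


async def fetch_image_uri(url: str) -> str:
    """Download an image and return it as a Base64 data URI (hypothetical helper)."""
    # Imported lazily so this sketch can be defined without the package installed.
    from kani.ext.multimodal_core import ImagePart

    part = await ImagePart.from_url(url)  # must be awaited: from_url is async
    return part.as_b64_uri()


# From synchronous code, run the coroutine with asyncio.run (placeholder URL):
# uri = asyncio.run(fetch_image_uri("https://example.com/cat.png"))
```

Calling ImagePart.from_url(...) without await only creates a coroutine object; no download happens until it is awaited.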
- as_b64(format: str = 'png') str[source]#
Return the binary image data in the given format encoded in a base64 string.
Note that this is not a web-suitable
data:image/... string; it is just the Base64-encoded binary of the image. Use as_b64_uri() for a web-suitable string.
- as_b64_uri(format: str = 'png') str[source]#
Get the binary image data encoded in a web-suitable base64 string.
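The difference between the two return values can be illustrated with the standard library alone. This sketch assumes the web-suitable form follows the usual data:<mime>;base64,<payload> layout. The helper names to_b64 and to_data_uri are illustrative and not part of the library.

```python
import base64


def to_b64(data: bytes) -> str:
    """Plain Base64 text, with no scheme or MIME prefix."""
    return base64.b64encode(data).decode("ascii")


def to_data_uri(data: bytes, mime: str = "image/png") -> str:
    """Base64 text wrapped in a data URI that a browser can load directly."""
    return f"data:{mime};base64,{to_b64(data)}"


payload = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature, used as stand-in data
raw = to_b64(payload)       # "iVBORw0KGgo="
uri = to_data_uri(payload)  # "data:image/png;base64,iVBORw0KGgo="
```

The plain form suits APIs that take a separate media-type field; the URI form suits HTML attributes and APIs that expect a single self-describing string.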
- as_ndarray() ndarray[source]#
Get the pixel-wise image data as a NumPy array (h*w*c).
Warning
Note that this array is in (height, width, channels) dimensionality, unlike
as_tensor(), which returns a tensor in (channels, height, width) dimensionality.
- as_tensor() torch.Tensor[source]#
Get the pixel-wise image data as a PyTorch tensor (c*h*w).
Warning
Note that this tensor is in (channels, height, width) dimensionality, unlike
as_ndarray(), which returns an array in (height, width, channels) dimensionality.
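The two layouts hold the same values in a different axis order. As a dependency-free sketch of the relationship, the hypothetical helper below rearranges a nested list indexed [row][col][channel] into [channel][row][col]. With NumPy the same reordering is arr.transpose(2, 0, 1).

```python
def hwc_to_chw(pixels):
    """Rearrange [row][col][channel] nested lists into [channel][row][col]."""
    height = len(pixels)
    width = len(pixels[0])
    channels = len(pixels[0][0])
    return [
        [[pixels[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A tiny RGB image: 2 rows, 3 columns, 3 channels per pixel.
hwc = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
]
chw = hwc_to_chw(hwc)
shape = (len(chw), len(chw[0]), len(chw[0][0]))  # (channels, height, width)
```

Here shape is (3, 2, 3), and chw[0] is the full red plane of the image.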
Audio#
- class kani.ext.multimodal_core.AudioPart(*, extra: dict = {}, raw: bytes, sample_rate: int)[source]#
A part representing audio data.
Audio data is stored in memory as raw signed 16-bit little-endian mono PCM in
raw, at a variable sample_rate. When serialized, audio data is represented as a data URI.
To get audio data in a suitable format for downstream applications, use
as_b64(), as_bytes(), as_ndarray(), or as_tensor().
- classmethod from_b64(data: str, sr: int, **kwargs)[source]#
Create an AudioPart from Base64-encoded signed 16-bit little-endian mono PCM data.
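To make the expected input concrete, the sketch below builds Base64-encoded signed 16-bit little-endian mono PCM with the standard library. The helper sine_pcm16, the 16 kHz sample rate, and the 440 Hz tone are illustrative choices, not library defaults.

```python
import base64
import math
import struct

SAMPLE_RATE = 16000  # Hz; illustrative value


def sine_pcm16(freq_hz: float, seconds: float, sr: int = SAMPLE_RATE) -> bytes:
    """Generate a half-amplitude sine tone as signed 16-bit little-endian mono PCM."""
    n = int(sr * seconds)
    samples = [
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sr)) for i in range(n)
    ]
    # "<" selects little-endian, "h" a signed 16-bit integer.
    return struct.pack(f"<{n}h", *samples)


pcm = sine_pcm16(440.0, 0.25)  # 4000 samples, 2 bytes each
pcm_b64 = base64.b64encode(pcm).decode("ascii")
# pcm_b64 and SAMPLE_RATE correspond to the data and sr arguments of from_b64().
```

The sample rate is not recoverable from the PCM bytes themselves, which is why from_b64() takes it as a separate argument.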
- classmethod from_file(fp: str | bytes | PathLike | IO, *, format: str | None = None, codec: str | None = None, converter_parameters: str | None = None, sr: int | None = None, sample_width: int | None = None, channels: int | None = None, **kwargs)
Create an AudioPart from a local file.
- Parameters:
fp – The path to the file or an open file to read.
format – The format (e.g. ‘mp3’) of the audio file. Will attempt to automatically determine based on the given filename if this is not set.
codec – An explicit audio codec to use to decode the audio file, if conversion is needed. (See FFmpeg’s
-acodec option for valid inputs.)
converter_parameters – Any additional CLI arguments to pass to the audio converter, if conversion is needed.
sr – The sample rate of the audio (raw PCM audio only).
sample_width – The sample width, in bytes, of the audio (raw PCM audio only).
channels – The number of channels of the audio (raw PCM audio only).
- async classmethod from_url(url: str, **kwargs)[source]#
Download audio from the Internet and create an AudioPart.
Attention
Note that this classmethod is asynchronous, as it downloads data from the web!
Keyword arguments are passed to
from_file().
- as_bytes(sr: int | None = None) bytes[source]#
Return the audio data as signed 16-bit little-endian mono PCM at the given sample rate.
- as_b64(sr: int | None = None) str[source]#
Return the audio data as Base64-encoded signed 16-bit little-endian mono PCM at the given sample rate.
- as_ndarray(sr: int | None = None) ndarray[source]#
Return the audio data as a 1-dimensional NumPy array of floats at the given sample rate.
- as_tensor(sr: int = None) torch.Tensor[source]#
Return the audio data as a 2-dimensional [channel, time] PyTorch Tensor of floats at the given sample rate.
Note that since this library only uses mono audio, the first dimension will always be 1.
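The relationship between the stored PCM bytes and the float outputs can be sketched without NumPy or PyTorch. Dividing by 32768 is a common convention for mapping 16-bit samples into [-1.0, 1.0); whether the library scales in exactly this way is an assumption here, and pcm16_to_floats is a hypothetical helper.

```python
import struct


def pcm16_to_floats(raw: bytes) -> list:
    """Decode signed 16-bit little-endian PCM into floats in [-1.0, 1.0)."""
    count = len(raw) // 2  # two bytes per sample
    ints = struct.unpack(f"<{count}h", raw[: count * 2])
    return [s / 32768.0 for s in ints]  # assumed scaling convention


raw = struct.pack("<4h", 0, 16384, -32768, 32767)
mono = pcm16_to_floats(raw)  # 1-dimensional: [time]
batched = [mono]             # 2-dimensional: [channel, time], with one channel
```

The 1-dimensional list mirrors the shape described for as_ndarray(), and the nested list mirrors the [1, time] shape described for as_tensor().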
- property sr#
An alias to
sample_rate.
Video#
- class kani.ext.multimodal_core.VideoPart(*, extra: dict = {}, file: BinaryFileLike, mime: str)[source]#
A part representing video data.
Video data is stored as a file-like object and a MIME type. This allows applications to persist large files on disk (using a FileIO) or in memory (using a BytesIO).
When serialized, video data is represented as a data URI. This can lead to some really big files!
To get video data in a suitable format for downstream applications, use
as_b64(), as_bytes(), or as_tensor().
- async classmethod from_url(url: str, *, allowed_mime=('video/*',), **kwargs)[source]#
Download a video from the Internet and create a VideoPart. This saves the data to a temporary file.
Attention
Note that this classmethod is asynchronous, as it downloads data from the web!
Keyword arguments are passed to
from_file().
- as_tensor(fps: float = 1, start: float = None, end: float = None) torch.Tensor[source]#
Get the time-pixel-wise video data as a PyTorch tensor (t*c*h*w).
Important
Note that this tensor is in (time, channels, height, width) dimensionality.
- Parameters:
fps – The number of frames per second (default 1).
start – The time, in seconds, to start at.
end – The time, in seconds, to end at.
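To reason about the size of the time dimension, it can help to list the timestamps that a given fps, start, and end would select. The sketch below assumes frames are sampled at evenly spaced times over the half-open interval [start, end); the library's exact sampling and rounding behaviour is not specified here, and sample_times is a hypothetical helper.

```python
from typing import List, Optional


def sample_times(
    duration: float,
    fps: float = 1,
    start: Optional[float] = None,
    end: Optional[float] = None,
) -> List[float]:
    """List evenly spaced timestamps in [start, end), clamped to the clip length."""
    t0 = 0.0 if start is None else max(0.0, start)
    t1 = duration if end is None else min(duration, end)
    step = 1.0 / fps
    times = []
    i = 0
    while t0 + i * step < t1:  # assumed: end is exclusive
        times.append(t0 + i * step)
        i += 1
    return times


whole_clip = sample_times(10.0)                    # one frame per second
excerpt = sample_times(10.0, fps=2, start=1, end=3)
```

Under this assumption a 10-second clip at the default fps yields 10 frames, and the excerpt yields 4 frames at 1.0, 1.5, 2.0, and 2.5 seconds. Memory use grows linearly with fps, so a low fps is the safer starting point for long videos.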
Binary File#
- class kani.ext.multimodal_core.BinaryFilePart(*, extra: dict = {}, file: BinaryFileLike, mime: str)[source]#
A MessagePart containing arbitrary binary data.
The raw data is saved as a file-like object and a MIME type. This allows applications to persist large files on disk (using a FileIO) or in memory (using a BytesIO).
When serialized, the binary is represented as a data URI. This can lead to some really big files!
- file: BinaryFileLike#
The readable binary file-like object containing the data.
- classmethod from_file( )[source]#
Create a BinaryFilePart from a local file.
- Parameters:
fp – The path to the file, or a file-like object.
mime – The MIME file type (https://www.iana.org/assignments/media-types/media-types.xhtml) of the file. If not passed, will attempt to guess the filetype from the file name.
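Guessing a MIME type from a file name can be sketched with Python's standard mimetypes module. Whether the library uses this module internally is an assumption; guess_mime and its fallback value are illustrative.

```python
import mimetypes


def guess_mime(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file extension, falling back to a generic type."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or default


png_mime = guess_mime("photo.png")
pdf_mime = guess_mime("report.pdf")
unknown_mime = guess_mime("blob.zzunknownext")  # no known extension
```

Extension-based guessing never inspects file contents, so passing mime explicitly is the more reliable option when the type is already known.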
- classmethod from_bytes(data: bytes, mime: str, **kwargs)[source]#
Create a BinaryFilePart from raw bytes.
- Parameters:
data – The bytes.
mime – The MIME file type (https://www.iana.org/assignments/media-types/media-types.xhtml) of the file.
- classmethod from_b64(data: str, mime: str, **kwargs)[source]#
Create a BinaryFilePart from Base64-encoded binary data.
- async classmethod from_url(url: str, *, allowed_mime=('*',), **kwargs)[source]#
Download a file from the Internet and create a BinaryFilePart. This saves the data to a temporary file.
Attention
Note that this classmethod is asynchronous, as it downloads data from the web!
Tip
Certain sites may serve all binary data with the
application/octet-stream MIME type. To set the MIME type more precisely, use mime="...".
Keyword arguments are passed to
from_file().
- as_b64() str[source]#
Return the binary data encoded in a base64 string. This could consume a lot of memory!
Note that this is not a web-suitable
data:mime/... string; it is just the Base64-encoded binary of the file. Use as_b64_uri() for a web-suitable string.
- as_b64_uri() str[source]#
Get the binary data encoded in a web-suitable base64 string. This could consume a lot of memory!
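The memory cost can be estimated before encoding: padded Base64 emits 4 characters for every 3 input bytes, rounded up. The helpers below are illustrative, and the data URI prefix assumes the usual data:<mime>;base64, layout.

```python
import base64


def b64_length(n_bytes: int) -> int:
    """Length in characters of padded Base64 text for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


def data_uri_length(n_bytes: int, mime: str) -> int:
    """Length of a data URI, assuming a 'data:<mime>;base64,' prefix."""
    return len(f"data:{mime};base64,") + b64_length(n_bytes)


estimate = b64_length(300_000_000)  # a 300 MB file
```

A 300 MB file therefore becomes a 400-million-character string, held in memory in addition to the original bytes. Comparing filesize against a threshold before calling as_b64() or as_b64_uri() is one way to avoid surprises.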
- property filesize#
The size of the file, in bytes.