Crate kalosm_vision
source ·Expand description
§Kalosm Vision
Kalosm Vision is a collection of image models and utilities for the Kalosm framework. It includes utilities for generating images from text and segmenting images into objects.
§Image Generation
You can use the Wuerstchen
model to generate images from text:
use futures_util::StreamExt;
use kalosm_vision::{Wuerstchen, WuerstchenInferenceSettings};
#[tokio::main]
async fn main() {
let model = Wuerstchen::builder().build().await.unwrap();
let settings = WuerstchenInferenceSettings::new(
"a cute cat with a hat in a room covered with fur with incredible detail",
);
if let Ok(mut images) = model.run(settings) {
while let Some(image) = images.next().await {
if let Some(buf) = image.generated_image() {
buf.save(&format!("{}.png", image.sample_num())).unwrap();
}
}
}
}
§Image Segmentation
Kalosm supports image segmentation with the SegmentAnything
model. You can use the SegmentAnything::segment_everything
method to segment an image into objects or the SegmentAnything::segment_from_points
method to segment an image into objects at specific points:
use kalosm::vision::*;
let model = SegmentAnything::builder().build().unwrap();
let image = image::open("examples/landscape.jpg").unwrap();
let x = image.width() / 2;
let y = image.height() / 4;
let images = model
.segment_from_points(
SegmentAnythingInferenceSettings::new(image)
.unwrap()
.add_goal_point(x, y),
)
.unwrap();
images.save("out.png").unwrap();
Structs§
- An image generated by the model
- The trocs optical character recognition model.
- A builder for
Ocr
. - Settings for running inference on
Ocr
. - The source of the model.
- The segment anything model.
- A builder for
SegmentAnything
. - Settings for running inference on
SegmentAnything
. - The source of the model.
- A quantized wuerstchen image diffusion model
- A builder for the Wuerstchen model.
- Settings for running inference with the Wuerstchen model.
Enums§
- The progress starting a model
Traits§
- An extension trait for transcribing audio streams.