What's new with Samsar Omni agent

Hello there, it's been a while since the last update. We've been iterating and improving: the generic video agent is now better across the board, with better prompt adherence, higher cinematic quality, and more options.

The pipeline now uses OpenAI GPT5.4 as its inference model to create ultra-professional narrative and meta prompts with no manual effort.
Choose from 5 SoTA image-to-video models and 3 SoTA image models for your pipeline rendition. Additionally, you can now choose from over 26 speakers spanning OpenAI TTS and ElevenLabs TTS, and pick between ElevenLabs and Lyria2 for the backing-track model.

Create your own speaker sets from the user settings page in Samsar Studio to use in your videos, instead of being assigned random speakers.

Toggle subtitles on or off; if you enable them, choose from 10 font options, with support for 10 languages.

SoTA performance can be achieved using VEO3.1 I2V or VEO3.1 Fast I2V.
T2V rendition using VEO3.1 Fast costs 45 credits per second of video; VEO3.1 costs 90 credits per second.
For 16.2k credits (~$17) you can 1-shot a full 3-minute cinematic, educational, or marketing video using the best SoTA models on the market.
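As a sanity check on the pricing above, here is a minimal sketch of the credit arithmetic using the per-second rates quoted in this post. The helper function is purely illustrative, not part of any Samsar API:

```python
# Credit rates quoted in this post (credits per second of rendered video).
RATES = {
    "VEO3.1 Fast": 45,
    "VEO3.1": 90,
}

def render_cost(model: str, seconds: int) -> int:
    # Hypothetical helper: total credits = per-second rate * duration.
    return RATES[model] * seconds

print(render_cost("VEO3.1", 180))       # full 3-minute render: 16200 credits (the ~16.2k figure above)
print(render_cost("VEO3.1 Fast", 180))  # 8100 credits
```

A full 3-minute render at 90 credits/second lands exactly on the 16.2k-credit figure quoted above; the Fast variant halves that.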

1-shot T2V render VEO 3.1 Fast + NanoBanana2 Prompt - "A dark cyberpunk anime rescue mission inside an underground reactor facility: a lone bounty hunter enters through smoke-filled tunnels, dodges sweeping laser grids, outmaneuvers security drones, races past towering neon machinery, runs along vertical walls, disables armored robots with glowing energy strikes, then faces a giant guardian mech as the reactor overloads and the chamber floods with light, with kinetic movement, dramatic close-ups, intense sci-fi action, heavy atmosphere, neon machinery, and fast cinematic camera motion."

The text-to-video pipeline is now more precise and more adherent to styling keywords. Using the GPT5.4 inference model, you can create rich narratives exactly tuned to your prompt. Choose from 3 image models, 5 SoTA image-to-video models, and 2 SoTA TTS models for your render. Just prompt and forget: you will get an email once your render is complete, and you can optionally post-process in Studio afterwards.

Image List to Video

Use Image List to Video to create marketing videos for products, activities, and experiences from listing images. Pull metadata directly from your activity or product listing to create an accurate, rich narrative. Add an outro image with your branding, CTA, or scannable QR codes to drive traffic to your app or website.
Image List to Video uses VEO3.1 I2V and NanoBanana2 internally and costs 75 credits per second of video.
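For a quick sense of scale at the 75 credits/second rate above, here is a small illustrative calculation (the function name is hypothetical, not a Samsar API):

```python
IMAGE_LIST_RATE = 75  # credits per second of video, per this post

def image_list_video_cost(seconds: int) -> int:
    # Hypothetical helper for estimating Image List to Video cost.
    return IMAGE_LIST_RATE * seconds

print(image_list_video_cost(30))  # a 30-second listing video: 2250 credits
```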

See this demo created for a client project: marketing videos using activity listing images with a CTA and partner referral links.

Video link

Here's another demo using activity listing images animated via VEO3.1 I2V with GPT5.4 inference.


1-shot I2V using an older inference model and a previous-generation image model

Retranslate video

Retranslate any video session into any of the supported languages for localized product/activity listing pages in 1-shot, no hassle. It is very cost and resource efficient: for each operation, you are only charged for the part of the pipeline you consume. For example, a retranslate only charges for translation (using SoTA OpenAI models) and transcription (only if enabled).

Join multiple video sessions to create longer "Reel"-style videos for playing on partner establishment TV screens and at live locations!


In other news, we now support Seedance2.0 as well. :)
It still has a high refusal rate and often refuses to render scenes depicting characters, but it is a very high quality model. It can be used in Studio for animating product intros or images with non-human characters.

See some example prompts below.

Closing aerial shot: slowly pull back from the rainy quay and ferries with a gentle tilt up, revealing more of the moonlit harbor and city grid, then let the motion resolve naturally and end cleanly. Keep camera drift minimal and the surroundings stable; only subtle, realistic commuter flow, believable train movement, slight ferry drift, wet reflections, and cold mist. People stay generic city dwellers, unchanged and not duplicated; all motion must be natural, world-aware, and physics-based, with no distortions or extra elements. Preserve any existing text, graphs, and illustrations exactly; add no new text, labels, symbols, non-English characters, brands, or visualizations, and do not alter any existing text. Maintain text and visual accuracy. Do not distort any text or add non-English text.


Continuation shot: slow circular dolly at waist height around the veiled plinth, minimal camera motion, smooth natural perspective shift. A cold harbor gust lifts the silver-gray cloth with believable weight, tension, hem detail, and gravity, briefly showing only the statue base. Partial technician hands and forearms at the frame edges assist naturally, faces never visible; wet stone, glass, spotlights, puddles, mist, and skyline stay stable aside from subtle wind and fog. Preserve any existing text or graphics exactly, add no new text or non-English characters, no logos, no extra objects, no duplication or distortion; all action remains grounded and physically correct. Maintain text and visual accuracy. Do not distort any text or add non-English text.


Thanks for reading!
Use coupon code DISCOUNTAGENTONE for a 20% discount when signing up for a Creators plan. Further credits can be purchased via pay-as-you-go plans; once registered, agentic or machine-to-machine payments are also supported.
See https://docs.samsar.one for reference. Need enterprise integration? We'll automate all of your workflows and build custom versions of enterprise pipelines tailored to your business requirements using our APIs.
Reach out at contact@samsar.one.