Describe the issue
This request is specific to AutoGen Studio and follows my discussions with folks working on AutoGen :-)
Create a fully accessible interface for AutoGen Studio
It's very important to me that the tools I use are fully accessible. That often means supporting multimodal user input.
- In that sense, multimodal on-device models are very useful.
User Input
- Image, audio, and text inputs using on-device models
- Audio and text outputs for AutoGen Studio responses
Multimodal agents
The current multimodal agent examples do not yet take advantage of LLaVA-Plus. This is a great opportunity to review and update the multimodal agents and demonstrate them in context.
Requirements
AutoGen Studio
- Audio input/output
- Image input
Blog: AutoGen Studio with on-device multimodal agents
Multimodal Agent Notebook: Image Agent
- Simple image agent that can parse image inputs in two-way chats
- Complex image agent demo using an on-device model and tools
Multimodal Agent Notebook: Audio Agent(s)
- Simple audio agent that can convert audio to text
- Complex audio demo that can text to Studio :-)
Linked Issues:
My Linked Repo:
AutoGen Community Contributors!
Hey, we're all just doing our best to push our cool demos and ideas upstream. For me, the best part is meeting like-minded contributors so we can co-create the accessible interface we want to use ;-) and also organize the work a bit more cleanly with "my linked repo". That said:
- Don't be shy about contributing to this issue in your own branch :-)
Steps to reproduce
- Open AutoGen Studio when you cannot type: audio input is needed.
- Open AutoGen Studio when you're 4.5 years old: image-to-text input is needed.
- Open AutoGen Studio while driving, when you cannot look at the laptop to read the output: text-to-speech output is needed.
Screenshots and logs
No response
Additional Information
No response