[Issue]: Accessible Interface for Autogen Studio & updated multimodal agents

### Describe the issue

**This is specific to Autogen Studio** following my discussion with folks working with autogen :-)

# Create a fully accessible interface for Autogen Studio

It's very important to me that the tools I use are fully accessible. That often means multimodality for user inputs.
- In that sense, multimodal on-device models are very useful.

# User Input

- Image, audio, and text inputs using on-device models
- Audio and text outputs for Autogen Studio responses
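
To make the input side of this request concrete, here is a minimal sketch of how incoming user input could be routed by modality before it reaches an agent. It is not Autogen Studio code: `detect_modality`, `to_text`, and the `converters` mapping are hypothetical names, and the on-device models (speech-to-text, image captioning) are assumed to be supplied by the caller as plain callables.

```python
import mimetypes
from typing import Callable, Dict, Optional, Union

# Hypothetical type: a converter turns raw bytes (audio, image) into text.
Converter = Callable[[bytes], str]


def detect_modality(filename: Optional[str]) -> str:
    """Guess the input modality from a file name; None means typed text."""
    if filename is None:
        return "text"
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "unknown"
    top_level = mime.split("/")[0]
    return top_level if top_level in ("text", "image", "audio") else "unknown"


def to_text(
    filename: Optional[str],
    payload: Union[str, bytes],
    converters: Dict[str, Converter],
) -> str:
    """Normalise any supported input to text using caller-supplied converters."""
    modality = detect_modality(filename)
    if modality == "text":
        return payload if isinstance(payload, str) else payload.decode("utf-8")
    if modality not in converters:
        raise ValueError(f"no converter registered for modality: {modality}")
    if isinstance(payload, str):
        raise TypeError("audio and image payloads must be bytes")
    return converters[modality](payload)
```

In this sketch the actual models stay outside the routing logic, so an on-device speech or vision model could be swapped in by registering a different callable under `"audio"` or `"image"`.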

# Multimodal agents

The current examples of multimodal agents have not taken advantage of LLaVA-Plus yet. It's a great opportunity to review and update the multimodal agents and demonstrate them in context.

# Requirements

### Autogen Studio
- Audio input / output
- Image input

### Blog : Autogen Studio with on device multimodal agents

### Multimodal Agent Notebook: Image Agent

- Simple image agent that can parse image inputs in 2-way chats
- Complex image agent: on-device model & tools demo

### Multimodal Agent Notebook: Audio Agent(s)

- Simple audio agent that can convert audio to text
- Complex audio demo that can do text to studio :-)


### Linked Issues : 

- https://github.com/microsoft/autogen/issues/290
- https://github.com/microsoft/autogen/issues/751
- https://github.com/microsoft/autogen/issues/1239 

### My Linked Repo :

- https://github.com/Josephrp/autogen/tree/main

### Autogen Community Contributors !

Hey, we're all just doing our best to push our cool demos and ideas upstream. The best outcome for me is to meet like-minded contributors so we can co-create the accessible interface we want to use ;-) and also organise it a bit more cleanly with "my linked repo", but:

- that said, don't be shy to just contribute to this issue in your own branch :-)

### Steps to reproduce

- Open Autogen Studio, cannot type: need audio input
- Open Autogen Studio, you're 4.5 years of age: need image to text
- Open Autogen Studio, you're driving, cannot take the laptop to read the output: need text to speech

### Screenshots and logs

_No response_

### Additional Information

_No response_
