Have you ever wondered how it is that you can simultaneously listen to music, read a book and recognize the smell of freshly brewed coffee? It’s all thanks to the human ability to process multiple types of data at the same time, i.e. the fact that we are multimodal beings. Bard, the intelligent chatbot from Google, has been multimodal since July 2023. Since October 2023, ChatGPT has also been able to understand multiple types of information. Both can not only understand text but also read and visualize data, hold voice conversations and recognize images. Multimodal AI is thus gaining even more potential to revolutionize the business world. Let’s take a closer look to understand the vast possibilities it holds.

What is multimodal AI?

Multimodal AI is a highly advanced form of AI that mimics the human ability to interpret the world using content and data from different senses. Just as humans understand text, images and sounds, multimodal AI integrates these different types of data to understand the context and complex meaning contained in information. In business, for example, it can enable a better understanding of customer opinions by analyzing both what they say and how they express it through tone of voice or facial expression.

Traditional AI systems are typically unimodal, meaning they specialize in one type of data, such as text or images. They can process large amounts of data quickly and spot patterns that human intelligence cannot pick up. However, they have serious limitations. They are insensitive to context and less adept at dealing with unusual and ambiguous situations.

Multimodal AI goes a step further by integrating several modalities. This allows for deeper understanding and much richer interactions between humans and AI.
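One simple way to integrate modalities is late fusion: each modality is scored separately and the scores are then combined. The following is only a minimal sketch in Python, assuming hypothetical sentiment scores between -1 and 1 that some upstream text and voice models have already produced. The names `ModalityScore` and `fuse_scores` and the weights are illustrative and do not come from any particular library.

```python
from dataclasses import dataclass


@dataclass
class ModalityScore:
    """Sentiment estimate from a single modality (hypothetical structure)."""
    modality: str   # e.g. "text", "voice", "face"
    score: float    # -1.0 (very negative) to 1.0 (very positive)
    weight: float   # how much we trust this modality (illustrative value)


def fuse_scores(scores: list[ModalityScore]) -> float:
    """Combine per-modality scores into one weighted average (late fusion)."""
    total_weight = sum(s.weight for s in scores)
    if total_weight == 0:
        raise ValueError("at least one modality must have a non-zero weight")
    return sum(s.score * s.weight for s in scores) / total_weight


# The words sound polite, but the tone of voice suggests frustration.
customer_call = [
    ModalityScore("text", score=0.5, weight=1.0),
    ModalityScore("voice", score=-0.5, weight=3.0),
]
overall = fuse_scores(customer_call)
```

In this example the voice signal is trusted more than the text, so the combined estimate ends up negative even though the wording alone looks positive. A unimodal, text-only system would miss that.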

What can multimodal AI do?

Artificial intelligence models developed today employ the following pairs of modalities:

  • from text to image – such multimodal AI can create images based on textual prompts; this is a core capability of the famous Midjourney, of the OpenAI-developed DALL-E 3 (available in the browser as Bing Image Creator), of the advanced Stable Diffusion and of the youngest tool in the family, Ideogram, which not only understands textual prompts but can also place text on an image:

    [Image: Multimodal AI]

    Source: Ideogram (https://ideogram.ai)

    Multimodal AI models can also follow a textual prompt and a reference image that “inspires” them at the same time. This gives more interesting, more precisely defined results and variations of the created images. It is very helpful if you just want to get a slightly different graphic or banner, or add or remove a single element, such as a coffee mug:

    [Image: Multimodal AI]

    Source: Ideogram (https://ideogram.ai)

  • from image to text – artificial intelligence can do much more than recognize and translate text seen in an image or find a similar product. It can also describe an image in words, as Midjourney does when you type the /describe command, and as Google Bard and the Salesforce model do (the latter is used mainly to create automated product and image descriptions on e-commerce sites),

    [Image: Multimodal AI]

    Source: HuggingFace.co (https://huggingface.co/tasks/image-to-text)

  • from voice to text – multimodal AI also powers voice commands in Google Bard, but Bing Chat and ChatGPT handle them best. ChatGPT does so thanks to the excellent Whisper API, which recognizes speech in multiple languages and transcribes it with punctuation. Among other things, this can greatly facilitate the work of international customer service centers, and it allows quick transcription of meetings and real-time translation of business conversations into other languages,
  • from text to voice – ElevenLabs’ tool converts any text we choose into realistic-sounding speech and even offers “voice cloning”: we can teach the AI the sound and expression of a voice and then create a recording of any text in a foreign language, for example for marketing or for presentations to foreign investors,
  • from text to video – converting text to video with a talking avatar is possible in D-ID, Colossyan and Synthesia tools, among others,
  • from image to video – generating videos, including music videos, from images and textual cues is already possible with Kaiber, and Meta has announced that its Make-A-Video tool will be released soon,
  • from image to 3D model – this is a particularly promising area of multimodal AI, targeted by Meta and Nvidia. It enables the creation of realistic avatars from photos, as well as the building of 3D models of objects and products with Masterpiece Studio (https://masterpiecestudio.com/masterpiece-studio-pro), NeROIC (https://zfkuang.github.io/NeROIC/) and 3DFY (https://3dfy.ai/). With these tools, for example, a product prototyped in two dimensions can be turned to show the camera a different side, and a quick 3D visualization can be created from a sketch of a piece of furniture, or even from a textual description:

    [Image: Multimodal AI]

    Source: NeROIC (https://zfkuang.github.io/NeROIC/resources/material.png)

  • from image to movement in space – this modality takes multimodal AI beyond screens into the Internet of Things (IoT), autonomous vehicles and robotics, where devices can perform precise actions thanks to advanced image recognition and the ability to respond to changes in the environment.

There are also experiments with multimodal AI that, for example, translates music into images (https://huggingface.co/spaces/fffiloni/Music-To-Image), but let’s take a closer look at the business applications of multimodal AI. So how does multimodality play out in the most popular AI-based chatbots, ChatGPT and Google Bard?

Multimodality in Google Bard, Bing Chat and ChatGPT

Google Bard can describe simple images and has been equipped with voice communication since July 2023, when it appeared in Europe. Despite the variable quality of the image recognition results, this has so far been one of the strengths that differentiates Google’s solution from ChatGPT.

Bing Chat, thanks to its use of DALL-E 3, can generate images based on text or voice prompts. While it cannot describe in words the images attached by the user, it can modify them or use them as inspiration to create new images.

In October 2023, OpenAI also began introducing new voice and image features to ChatGPT Plus, the paid version of the tool. They make it possible to have a voice conversation or show ChatGPT an image, so it will know what you’re asking without having to describe it in exact words.

For example, you can take a photo of a monument while traveling and have a live conversation about what’s interesting about it. Or take a picture of the inside of your refrigerator to find out what you can prepare for dinner with the available ingredients and ask for a step-by-step recipe.

3 applications of multimodal AI in business

Describing images can help, for example, to prepare goods inventory based on CCTV camera data or identify missing products on store shelves. Object manipulation can be used to replenish the missing goods identified in the previous step. But how can multimodal chatbots be used in business? Here are three examples:

  1. Customer service: A multimodal chat implemented in an online store can serve as an advanced customer service assistant that not only answers text questions but also understands images and questions asked by voice. For example, a customer can take a picture of a damaged product and send it to the chatbot, which will help identify the problem and offer an appropriate solution.
  2. Social media analysis: Multimodal artificial intelligence can analyze social media posts that include text, images and even videos to understand what customers are saying about a company and its products. This can help a company better understand customer feedback and respond more quickly to customers’ needs.
  3. Training and development: ChatGPT can be used to train employees. For example, it can conduct interactive training sessions that include both text and images to help employees better understand complex concepts.
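To make the customer service scenario from the first example more concrete, here is a rough sketch in Python of how an assistant might decide what to do with an incoming message that mixes text, a photo and a voice recording. Everything here is an assumption made for illustration: the `CustomerMessage` structure, the function names and the step descriptions are hypothetical, and no real speech or image model is called.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomerMessage:
    """One incoming message; every field is optional (hypothetical structure)."""
    text: Optional[str] = None
    image_path: Optional[str] = None
    audio_path: Optional[str] = None


def detect_modalities(message: CustomerMessage) -> list[str]:
    """List which modalities are present so each can go to the right model."""
    present = []
    if message.text:
        present.append("text")
    if message.image_path:
        present.append("image")
    if message.audio_path:
        present.append("audio")
    return present


def plan_processing(message: CustomerMessage) -> list[str]:
    """Return the processing steps a multimodal assistant might run, in order."""
    steps = {
        "audio": "transcribe the voice question",
        "image": "describe the photo of the product",
        "text": "read the written question",
    }
    modalities = detect_modalities(message)
    if not modalities:
        return ["ask the customer for more details"]
    plan = [steps[m] for m in ("audio", "image", "text") if m in modalities]
    plan.append("combine the results and propose a solution")
    return plan


# A customer writes a short complaint and attaches a photo of the damage.
complaint = CustomerMessage(text="My mug arrived cracked.", image_path="mug.jpg")
plan = plan_processing(complaint)
```

In a real deployment each step would be handled by an actual model or service; the sketch only shows the routing idea, namely that one conversation can carry several kinds of input that are processed separately and then combined.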

The future of multimodal AI in business

A great example of forward-looking multimodal AI is the optimization of a company’s business processes. For example, an AI system could analyze data from various sources, such as sales data, customer data and social media data, to identify areas that need improvement and suggest possible solutions.

Another example is employing multimodal AI to organize logistics. Combining GPS data, warehouse status read from a camera and delivery data can optimize logistics processes and reduce business costs.

Many of these functionalities are already applied today in complex systems such as autonomous cars and smart cities. However, they have not yet been applied at this scale in smaller business contexts.
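As a simplified illustration of the logistics idea, the Python sketch below combines three inputs: pending deliveries, a distance taken from GPS data, and stock counts that a camera-based system is assumed to have already extracted. The `Delivery` structure, the `plan_dispatch` function and the sample data are all hypothetical; a real system would use far richer data and a proper optimization method.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    """A pending delivery (hypothetical structure)."""
    order_id: str
    product: str
    distance_km: float  # distance to the customer, derived from GPS data


def plan_dispatch(
    deliveries: list[Delivery],
    stock_seen_by_camera: dict[str, int],
) -> tuple[list[str], list[str]]:
    """Split orders into those that can ship now (nearest first) and those waiting for stock."""
    remaining = dict(stock_seen_by_camera)  # copy so the input is not modified
    ready, waiting = [], []
    for delivery in sorted(deliveries, key=lambda d: d.distance_km):
        if remaining.get(delivery.product, 0) > 0:
            remaining[delivery.product] -= 1
            ready.append(delivery.order_id)
        else:
            waiting.append(delivery.order_id)
    return ready, waiting


deliveries = [
    Delivery("A1", "mug", 12.0),
    Delivery("A2", "mug", 3.5),
    Delivery("A3", "lamp", 7.0),
]
stock = {"mug": 1, "lamp": 0}  # counts assumed to come from warehouse cameras
ready, waiting = plan_dispatch(deliveries, stock)
```

Here only one mug is visible on the shelf, so the nearest mug order is dispatched first and the remaining orders wait until the stock is replenished.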

Summary

Multimodality, or the ability to process multiple types of data, such as text, images and audio, promotes deeper contextual understanding and better interaction between humans and AI systems.

An open question remains: what new combinations of modalities might appear soon? For example, will it be possible to combine text analysis with body language, so that AI can anticipate customer needs by analyzing their facial expressions and gestures? This type of innovation opens up new horizons for business, helping to meet ever-changing customer expectations.


Author: Robert Whitney

JavaScript expert and instructor who coaches IT departments. His main goal is to up-level team productivity by teaching others how to effectively cooperate while coding.
