Reliable AI Software Development Company

Get secure, custom-designed AI software that intelligently adapts to your data
  • Increase sales conversions with AI recommendations
  • Enhance accuracy in business decisions with our ML models
  • Process complex data quickly for valuable insights into customers and employees

What is AI Software Development?

What It Is

AI software development is the practice of creating AI-powered software products (including machine learning, deep learning, and NLP-based solutions). AI engineers and engineering teams select, fine-tune, or integrate AI models, then build, test, deploy, and monitor applications that use those AI capabilities to solve business problems. They also configure and customize prompts, design retrieval-augmented generation (RAG) pipelines to keep AI current and domain-specific, and continuously optimize cost and performance.
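
For illustration, here is a minimal sketch of the RAG pattern in Python: embed a handful of domain documents, retrieve the ones closest to a query, and prepend them to the prompt sent to the model. The embedding model, the tiny in-memory index, and the sample documents are assumptions for this example, not a description of any specific production setup.

```python
# A minimal sketch of the RAG pattern: embed domain documents, retrieve the
# closest ones for a query, and prepend them to the model prompt.
# The embedding model and the tiny in-memory "index" are illustrative choices.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

documents = [
    "Refunds are processed within 14 days of purchase.",
    "Premium support is available on the Enterprise plan.",
    "Our API rate limit is 100 requests per minute.",
]
doc_vectors = encoder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q  # dot product = cosine on normalized vectors
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "How long do refunds take?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` is then sent to the chosen LLM (e.g., via the OpenAI API).
```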

What It’s Not

AI software development does not apply when AI is just a productivity tool used to build, test, or deploy traditional software faster. AI agents can write and explain code, assist with UX/UI design, scan code for bugs or security vulnerabilities, translate business ideas into technical requirements, automatically deploy or roll back a release, and monitor performance in real time to optimize server usage. However, if the resulting software makes no AI-driven decisions, no AI software is being developed.

The Role of Data

AI software development is based on gathering raw, relevant data (business, patient, sensor, etc.) and transforming it into a usable form for the AI to achieve its commercial goal, whether it's clustering (discovering groups), classifying (assigning to groups), detecting (presence or absence), retrieving (fetching relevant items), regressing (predicting or explaining), ranking, recommending (personalization), generating (new content), transcribing (converting to text), or planning (agentic AI).
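
As a toy illustration of that data-to-model path, the sketch below cleans a few raw records, fills a missing value, scales the features, and clusters customers into groups with scikit-learn. All column names and values are invented for the example.

```python
# A toy sketch of the data-to-model path described above: clean raw records,
# engineer numeric features, and cluster customers into groups.
# The columns and values are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

raw = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "monthly_spend": [120.0, 15.5, None, 480.0, 95.0, 22.0],
    "support_tickets": [0, 4, 1, 2, 0, 7],
})

features = raw[["monthly_spend", "support_tickets"]].copy()
features["monthly_spend"] = features["monthly_spend"].fillna(
    features["monthly_spend"].median())  # fill the missing value

X = StandardScaler().fit_transform(features)  # put features on one scale
raw["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(raw[["customer_id", "segment"]])
```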

Hire AI Developer for Custom Solution

Our AI and machine learning specialists and data scientists know how to design, train, and fine-tune AI models. Software engineers from Belitsoft integrate those AI models into software products for your customers and configure access to them. You can also expect roles such as UX designers to create intuitive interfaces for AI applications and project managers to keep development on track.

For Startups

AI Developers for Startups

For startups and small companies, we provide full-stack AI engineers who can select, customize, and deploy models to production, as well as do frontend coding.

  • Full-stack AI development
  • Model customization & deployment
  • Frontend integration
  • Rapid prototyping

AI Software Development Services

Computing power has now grown enough to make AI practical for commercial use.

Get customized AI solutions quickly

Integrate AI Into Your Business Ecosystem with our API Module

  • Get a module for code-free integration with your ERP, financial application, or any other data source within your business infrastructure
  • Use AI trained specifically on your business information, such as real-time financial data, transaction history, inventory levels, and customer records, to enhance accuracy and relevance
  • Access real-time insights and analytics for financial forecasting, inventory management, customer service, and more
  • Ensure data security and regulatory compliance through strong security measures and strict adherence to standards

Analytical AI Solutions

  • Increase revenue with real-time insights into data trends from cutting-edge ML models, such as stream processing for handling large real-time data flows and data clustering for fast, precise data categorization
  • Mitigate business risks using an AI early warning system that employs ensemble and time series ML models to accurately predict trends and detect patterns in company data and financial reports
  • Grow per-customer revenue and reduce churn with AI-driven marketing, leveraging ensemble, time series, and computer vision models to analyze customer interactions and preferences, guiding tailored and timely offers
  • Discover and anticipate customer preferences with an AI analytical system that sifts through large datasets for complex patterns
  • Free up your team for strategic initiatives by using ML models to automate tasks such as email segmentation, prioritization, and automatic responses to standard inquiries without human oversight (see the sketch after this list)
  • Cut operational costs by optimizing resource allocation through real-time employee monitoring with machine learning: classification models to organize activities and NLP to analyze real-time communication with customers and colleagues
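
As referenced in the email-automation item above, here is a minimal sketch of a text-classification pipeline that routes incoming messages to queues. The training examples and labels are invented; a production system would train on real labeled history.

```python
# A minimal sketch of email segmentation: a TF-IDF + logistic-regression
# pipeline that routes messages to queues. Training data is made up.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

emails = [
    "Invoice attached, payment due Friday",
    "Cannot log in to my account, please help",
    "Interested in your enterprise pricing",
    "Password reset link does not work",
]
labels = ["billing", "support", "sales", "support"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(emails, labels)

print(clf.predict(["My payment failed twice"]))  # e.g. ['billing']
```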

AI Chatbots

  • Enhance response speed and accuracy, and reduce reliance on live agents with a conversational AI assistant that understands user context and processes complex, multi-intent queries
  • Stay ahead of the competition with a cutting-edge voice bot built on advanced speech recognition and NLP models for effortless voice-based customer interaction
  • Increase customer retention and long-term spending with an AI chatbot acting as a personal advisor, using NLP to understand context and sentiment, and a customer segmentation ML model to send customized follow-up messages
  • Boost customer engagement with an ML-powered AI chatbot that facilitates smooth cross-channel communication, allowing users to switch channels effortlessly while maintaining context and reducing redundancy (see the sketch after this list)
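
The sketch below shows, in the simplest possible form, how such an assistant keeps context: the full message history is resent on every turn. The OpenAI client and model name are assumptions for illustration; any chat-completion LLM API would work the same way.

```python
# A minimal sketch of a context-aware assistant: resend the whole message
# history on each turn so the model can resolve follow-ups against it.
# The model name is an assumption; any chat LLM API works similarly.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a helpful support assistant."}]

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("I ordered a blue lamp and a desk. When will they ship?"))
print(ask("Actually, cancel the desk only."))  # resolved against prior context
```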

Customize with Extra ML Features

Our off-the-shelf solutions are easily tailored to your specific needs by incorporating any of the Machine Learning models we offer.

Predictive ML Models to Maximize Business Outcomes

Use the potential of machine learning to predict customer churn, make personalized recommendations for the best product or service (Next Best Offer), and identify early signs of significant future events for risk management.

  • Ensemble ML Model aggregates predictions from diverse models to improve accuracy (see the sketch after this list)
  • Time Series ML Model analyzes data points collected in chronological order to understand underlying patterns, trends, and seasonalities in time-stamped data
  • Graph-Based Model recommends products, content, or services by exploring the connections between users and items within a network
  • Context-Based Model provides personalized suggestions based on the user's specific context, such as location or time
  • NLP (Natural Language Processing) enables computers to understand and respond to human language, including sentiment analysis for emotion detection, personalized text recommendations, concise summarization of extensive texts, and creating relevant content
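
As referenced in the first item above, here is a small scikit-learn sketch of the ensemble idea: aggregate three diverse models with soft voting. The dataset is synthetic.

```python
# A small sketch of ensembling: average predicted probabilities across
# several diverse models ("soft" voting). Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average class probabilities across the three models
)
ensemble.fit(X_tr, y_tr)
print(f"held-out accuracy: {ensemble.score(X_te, y_te):.2f}")
```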

Segmentation ML Models to Personalize Customer Service

Drive customer loyalty and sales with robust ML models that classify customers and deliver targeted content, services, or products.

  • RFM Analysis segments customers using Recency, Frequency, and Monetary value to identify nuanced patterns and predict purchasing behavior (see the sketch after this list)
  • Unsupervised Clustering automatically groups customers with similar RFM characteristics without predefined labels, revealing complex hidden patterns in customer data for better segmentation
  • Time Series Clustering categorizes customers according to temporal behavior patterns, such as purchase frequency over time, enabling more tailored marketing strategies
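
As referenced in the first item, this condensed sketch combines the first two techniques in the list: compute Recency, Frequency, and Monetary features from an orders table, then cluster them without predefined labels. The orders data and column names are hypothetical.

```python
# A condensed sketch: derive RFM features from an orders table, then run
# unsupervised clustering on them. The orders data is invented.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

orders = pd.DataFrame({
    "customer": ["a", "a", "b", "c", "c", "c"],
    "order_date": pd.to_datetime(["2024-01-05", "2024-03-01", "2023-06-10",
                                  "2024-02-20", "2024-03-02", "2024-03-10"]),
    "amount": [50, 80, 20, 200, 150, 90],
})
now = orders["order_date"].max()

rfm = orders.groupby("customer").agg(
    recency=("order_date", lambda d: (now - d.max()).days),  # days since last order
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)
rfm["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(rfm))
print(rfm)
```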

Optimal Control ML Models to Enhance Marketing Efficiency

Make data-driven decisions to control and optimize marketing campaigns, delivering the right content to the right users at the right time, resulting in boosted campaign performance and ROI.

  • Contextual Bandit Model personalizes the customer experience by dynamically selecting the most effective option (an ad, a recommendation, etc.) to achieve positive outcomes like clicks or purchases, based on user data and behavior, driving engagement and campaign success (see the sketch after this list)
  • Optimization-Based Model maximizes marketing objectives, such as the click-through rate or conversion rate, by efficiently allocating resources while considering limitations like budget or reach
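
As referenced above, here is a toy epsilon-greedy version of the contextual-bandit idea. Real systems use richer context features and algorithms such as LinUCB; the contexts, arms, and click-through rates here are invented.

```python
# A toy epsilon-greedy contextual bandit: pick the banner variant with the
# best estimated reward for a context, explore occasionally, and update
# estimates from observed clicks. All data here is simulated.
import random
from collections import defaultdict

arms = ["discount_banner", "new_arrivals", "loyalty_offer"]
counts = defaultdict(int)    # (context, arm) -> times shown
values = defaultdict(float)  # (context, arm) -> running mean reward
EPSILON = 0.1                # fraction of traffic used for exploration

def choose(context: str) -> str:
    if random.random() < EPSILON:
        return random.choice(arms)                         # explore
    return max(arms, key=lambda a: values[(context, a)])   # exploit

def update(context: str, arm: str, reward: float) -> None:
    key = (context, arm)
    counts[key] += 1
    values[key] += (reward - values[key]) / counts[key]    # incremental mean

for _ in range(10_000):  # simulated traffic
    ctx = random.choice(["mobile", "desktop"])
    arm = choose(ctx)
    clicked = random.random() < (0.12 if arm == "loyalty_offer" else 0.05)
    update(ctx, arm, float(clicked))

print({a: round(values[("mobile", a)], 3) for a in arms})
```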

Statistical ML Models to Enhance Financial Decision-Making

Predict and analyze consumer spending habits, strategically manage bill payments, and forecast future expenses with sophisticated ML models that handle large datasets and complex relationships within financial data.

  • Linear Regression Model predicts a value by finding a straight-line relationship in the input data, for example predicting next month's sales from the number of customer requests this month (see the sketch after this list)
  • Probabilistic ML Model aids decision-making by evaluating future conditions with probabilities in uncertain situations, like financial risk assessment
  • Non-linear ML algorithms capture complex, non-linear relationships, as in financial modeling, where market behaviors and consumer trends rarely follow linear patterns
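
As referenced in the first item, here is a minimal scikit-learn version of that sales example; the numbers are invented.

```python
# A minimal linear-regression sketch: predict next month's sales from this
# month's customer-request count. The numbers are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

requests = np.array([[120], [150], [90], [200], [170]])  # requests this month
sales = np.array([60, 75, 44, 101, 86])                  # sales next month

model = LinearRegression().fit(requests, sales)
print(model.predict([[180]]))  # forecast for a month with 180 requests
```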

How Our Custom AI Software Works with Your Data

Our AI system is built to grow with your needs, operate fast, and manage complex data and tasks. It integrates the latest tech advancements, combines open-source and enterprise technologies, and is flexible enough to be deployed either in the cloud or on-premises.

1. Collecting data from your sources

We carefully collect data from different sources into our Staging Database. It arrives via batch processing tools, continuous streams, or direct API connections.
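
A simplified sketch of this step, assuming a hypothetical source API, connection string, and staging schema: pull records and land them unchanged in a staging table for later processing.

```python
# An illustrative sketch of step 1: pull records from a source API and land
# them unchanged in a staging table. Endpoint and schema are hypothetical.
import json

import requests
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@staging-db/analytics")

rows = requests.get("https://erp.example.com/api/orders", timeout=30).json()

with engine.begin() as conn:  # one transaction per batch
    conn.execute(
        text("INSERT INTO staging_orders (id, payload) VALUES (:id, :payload)"),
        [{"id": r["id"], "payload": json.dumps(r)} for r in rows],
    )
```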

2. Separate storage for AI analysis data

A separate analytics database keeps your main databases fast while handling large data sets. It is equipped with SQL databases, Redis, and RabbitMQ to avoid slowdowns and crashes.

3. Converting data into AI insights in the AI Core

We use Python, ML frameworks, and NLP techniques in the AI Core to refine data and train ML models. Deep learning uncovers complex patterns and optimizes decisions.

4. Implementing AI-driven insights and data

For seamless model-to-app communication, we use FastAPI and integrate AI insights into your software via APIs, including the OpenAI API or dedicated AI tools.
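
As a minimal sketch of this step: a FastAPI endpoint that loads a trained model and returns predictions over HTTP. The feature names and pickled model file are assumptions for illustration.

```python
# A minimal sketch of step 4: expose a trained model behind a FastAPI
# endpoint so applications can request predictions over HTTP.
# The feature names and model path are hypothetical.
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = pickle.load(open("churn_model.pkl", "rb"))  # trained in the AI Core

class Features(BaseModel):
    monthly_spend: float
    support_tickets: int

@app.post("/predict")
def predict(f: Features) -> dict:
    score = model.predict_proba([[f.monthly_spend, f.support_tickets]])[0][1]
    return {"churn_probability": round(float(score), 3)}

# Run with: uvicorn main:app --reload
```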

Technologies and tools we use

AI Development

Data Processing
  • Batch: Apache Airflow, T-SQL, Argo
  • Real-time: Kafka, RabbitMQ, REST API

Databases
  • PostgreSQL, SQL Server, Redis

ML Development
  • Core: Python, NumPy, SQLAlchemy, Pandas
  • ML: scikit-learn
  • Deep learning: PyTorch, TensorFlow, PyTorch Geometric
  • NLP: spaCy, NLTK, Hugging Face
  • Other: Gensim, OpenCV

Deployment & Integration
  • FastAPI, Apache Airflow, Argo, Docker, Celery, TensorFlow Lite, ONNX, TensorRT

Frequently Asked Questions

How much does AI software development cost?

In 2024, companies can pay anywhere from $0 to more than $300,000 for AI software. The cost is driven mostly by complexity, functionality, and exclusivity. Pre-built AI solutions, like those offered by Belitsoft, can be a cost-effective alternative to building from scratch, with significantly faster time to market and lower cost.

How long does AI software development take?

AI software development often takes longer than expected. A Proof of Concept (PoC) typically takes around two weeks or more, and full integration three to six months or longer. Contact us for a personalized assessment of the timeline for your AI solution!

Portfolio

AI Agent Development: Chrome Extension as In-App Guidance Tool For ERP and CRM
For an ecommerce client, we developed an AI Chrome extension that provides in-app guidance: a digital adoption tool to help the founder train staff on best practices and reduce high turnover. While one of their systems, Microsoft Dynamics 365 Business Central, has built-in Copilot agents that automate tasks, these agents don't teach employees how to navigate the interface.
Bespoke Conversation Intelligence Software with Speech Analytics for Call Center Quality Assurance (Automotive Industry)
Belitsoft implemented a custom AI-powered conversation intelligence and speech analytics system to automatically analyze call center conversations for a major auto parts distributor with 8,000 employees, 500 branches, and over 400,000 calls monthly. In three months, the implementation delivered €75,000 in additional revenue and eliminated two staff positions.
Custom AI Voice-Based Coach Development (Assessment Automation)
Our client is a company involved in software development, IT services, and technology innovation. Over six weeks, we developed an MVP that provides efficient knowledge assessment for employees by automating test creation.
Custom Training Software based on Chatbot with Coaching/Mentoring Functionality
Our Client, Jeff Otis, a US entrepreneur, turned to Belitsoft to build a unique personal leadership development program. Now, we have launched an MVP of this game-changing personalized interactive web platform with coaching/mentoring functionality.
Custom Chat-Bot and SAAS Web Platform For Lead Generation
For our client, the CEO of a German startup, we developed a chatbot that converts website visitors into leads and a database application to store them.

Recommended posts

Belitsoft Blog for Entrepreneurs
Claude vs ChatGPT
The Pentagon Situation

Anthropic doesn't allow use of its models for fully autonomous weapons or mass domestic surveillance. When the US military used Claude during a January raid to capture Venezuelan President Nicolás Maduro, Anthropic objected. The Trump administration then labeled Anthropic a supply chain risk and told federal agencies to stop using it. However, reports showed the US military continued using Claude for intelligence work, target selection, and battlefield simulations during the US-Israel strikes on Iran, despite the ban. Defense Secretary Pete Hegseth said the military would keep access to Anthropic services for up to six months while switching providers.

OpenAI reached a new agreement to run its models on classified Defense Department networks. CEO Sam Altman said the deal still follows OpenAI's own rules, which ban mass domestic surveillance and require human oversight over use of force.

Shift in the Market

Many users disagreed with the OpenAI agreement. Around 700,000 users reportedly cancelled ChatGPT. Claude's iOS app hit the number one free app spot on the App Store. ChatGPT dropped to second, Gemini to fourth. Anthropic says free users are up over 60% since January, daily signups tripled since November, and paid subscriptions more than doubled this year.

Claude Outage

Following the surge in users, Anthropic's Claude went down on March 2. Users faced complete outages. Reports peaked at nearly 2,000 on Downdetector within a short period; in total, around 10,000 reports were submitted. Anthropic said the Claude API was working fine and that the failures were limited to the claude.ai website.

Switching from ChatGPT to Claude

Most users avoid switching AI tools because they don't want to lose their conversation history. Anthropic built a few ways to fix this. The fastest option is the Import Memory feature: you paste a prompt from Anthropic into ChatGPT, copy the output, and paste it into Claude's memory settings. Done in under a minute. For a full data transfer, you can export your entire ChatGPT history through settings. It takes 24-48 hours to receive the zip file. You then upload it into a Claude Project, giving Claude access to your full history. You can also download individual ChatGPT conversations as markdown files and upload them directly into Claude.

Claude vs ChatGPT Comparison

Both ChatGPT Plus and Claude Pro cost $20 per month for the web version. In terms of input tokens, GPT-5 is cheaper than Claude Sonnet 4.6. Claude Sonnet 4.6 is priced at $3 per million input tokens and $15 per million output tokens. GPT-5 starts at $1.25 per million input tokens and $10.00 per million output tokens, while GPT-5.2 is priced at $1.75 per million input tokens and $14 per million output tokens.

However, Claude is better at coding, generates more cautious and structured answers, and works well with large documents and code. Claude Pro is a better choice for coding quality, and its output is often more natural and context-aware. Claude Pro is also well suited for large documents, long-form summarization, and prototypes, while ChatGPT Plus is better for image generation, video with Sora, and web research. In one comparison, Claude Sonnet 4.6 outperformed ChatGPT 5.2 on most tests: Claude did better at step-by-step reasoning, financial analysis, and tone control, while ChatGPT does better at explaining complex topics in simple language and at storytelling. For day-to-day professional work, reviewers currently rate Claude as the better tool.
Dmitry Baraishuk • 2 min read
Top AI Developers 2026: How to Choose the Best AI App Development Company
AI app creation is the process of developing software applications that use AI models and algorithms to analyze data, learn from it, make predictions, and respond intelligently to user interactions. Machine learning (ML) technology drives service and innovation in various fields, from cybersecurity and legal research to healthcare automation. Many artificial intelligence software development companies around the world provide AI app development services, and it can be difficult to select the most appropriate one from the list of popular AI platform software brands. Let's look at what some of the major players in the AI app development field have to offer.

Top AI Software Development Companies in the USA

Belitsoft

Outsourcing AI development company Belitsoft provides SaaS startups with generative AI development services to create Gen AI apps. When creating an application, developers work in stages: first the application is designed and experimented with, then built, and finally deployed. During the development stage, programmers research and evaluate models from open-source communities or popular repositories (e.g., Hugging Face). Given the model's performance and size, developers use benchmark tools and different prompting techniques (chain-of-thought prompting, zero-shot prompting, etc.). When creating genAI apps, Belitsoft developers cut the cost of AI development by using the right frameworks, such as LangChain, and tools. When deploying, a hybrid "Swiss Army knife" setup is preferred: different models are used for different use cases, and cloud infrastructure is combined with on-premises infrastructure to optimize budget and resources. Once AI-powered apps are launched into production, Belitsoft specialists conduct benchmarking, monitoring, and handling of exceptions thrown by the app.

According to surveys, many companies launch up to several dozen genAI experiments and expect to scale up about a third of their proof-of-concept AI within three to six months. These organizations are from key industries such as technology, financial services, telecommunications and media, healthcare, etc., and/or are advanced in the use of AI. The market demands that business leaders quickly realize the benefits of AI, so initial prototypes are needed within a few weeks. Belitsoft specialists create prototypes very quickly to support short product cycles and focus on projects that promise quick wins (such as automating a specific task).

The multidisciplinary team consists of data scientists, machine learning (ML) and AI specialists, and ML engineers. Belitsoft software engineers integrate AI into products and set up the deployment pipeline. UX designers develop intuitive experiences based on AI. For small companies and SaaS startups, full-stack AI engineers handle multiple tasks, from developing models to writing front-end code. In the early stages, one or several ML engineers quickly build a prototype using public APIs and emerging AI coding practices, which allows startups to stay within their budget constraints. For AI projects, enterprise customers like Fortune 1000 companies get larger cross-functional teams: Belitsoft brings in data engineers for pipelines and data preparation, MLOps engineers for model deployment and monitoring, and security experts.

OpenAI

This company designed ChatGPT, an AI tool that utilizes large language models (LLMs). AI technologies developed by OpenAI optimize business processes and empower interactions in real time. The company partners with Microsoft, which in turn provides advanced automation systems, virtual assistants, and other secure genAI solutions for different industries.

Microsoft

Microsoft and OpenAI have a long-standing partnership backed by billions of dollars in investment. Thanks to this, Microsoft Azure became the sole cloud provider for OpenAI in 2019. The company uses machine learning models and AI-powered tools to improve efficiency and productivity in various industries. OpenAI is integrated into Microsoft's Prometheus model. The company also aims to rebuild its Bing search engine, also known as Copilot, to compete with Google in the search market. In enterprise AI use cases, Microsoft Bing provides real-time automation solutions and advanced AI assistants to optimize workflows.

IBM Watson

IBM clients can make better decisions with the Watson AI product portfolio. IBM offers Watson Studio services for designing and developing AI applications for enterprise clients. The company's solutions include AI apps that improve customer service, simplify workflows, predict outcomes, and reduce costs. Among the case studies IBM has presented is the creation of models for predicting and preventing mortality from sepsis based on clinical data of inpatients. These models have shown high efficiency in situations where time matters, and rapid analysis of insurance claims data allows for faster decision-making, for example on urgent medical interventions.

AWS AI

One of the AI and ML services provided by AWS AI (headquartered in Seattle, USA) is Amazon SageMaker. This solution makes it faster and easier for engineers to build, train, and deploy ML models. Customers across industries use AI and ML tools from AWS AI to personalize, automate, and optimize their business workflows. Using AI tools, AWS AI customers can improve response rates by creating messages and emails based on the behavior and profile of the prospect. By analyzing service, product, industry, and customer segment, they can create talking points or sales scripts.

Google AI

This branch of Google is engaged in developments in the fields of natural language processing, machine learning, and computer vision. Google AI research and development have led to the creation of Google Cloud AI, Google Translate, and Google Assistant. Google Cloud's LLM technologies and GenAI capabilities transform the fast-food restaurant industry's customer experience when ordering food in drive-thru mode: a voice-controlled AI assistant replaces an employee, processes customers' voice requests for orders, and generates answers to popular questions. Integration with the POS system allows the AI assistant to quickly create an order and send it to the kitchen.

Salesforce Einstein

With AI-powered customer relationship management (CRM) tools, companies can provide personalized customer experiences using machine learning, automation, and predictive analytics. Einstein AI's functionality enables workflow automation, lead scoring, and sales forecasting. In particular, sellers can use Salesforce Einstein AI solutions to automatically generate sales pitches for each lead individually. The AI tools use CRM data for phone and email introductory messages: the assistant bot studies the customer's latest CRM data to prepare or correct an email that matches the lead's needs in context and tone.

Deloitte AI

This professional services firm offers companies from various industries (government, healthcare, finance, etc.) comprehensive AI strategic planning and development services. Deloitte AI clients use AI solutions to increase efficiency, automate processes, and improve decision-making. Generative AI models continuously and simultaneously find discrepancies, patterns, and anomalies and perform root cause analysis in real time, which is important for risk management processes.

Intel AI

This AI hardware market player offers services and products ranging from advanced AI processors and chips to AI software, helping companies in industries such as financial services, cybersecurity, automotive, and healthcare develop and scale AI apps. Its AI services and products drive progress in real-time data processing and automation, enabling the creation of advanced AI models and effective machine learning.

Which AI Use Cases Show the Most Promise for Companies?

Open access to AI tools has inspired companies to change how they operate and start using GenAI technology in various areas. According to Deloitte research, the IT function occupies a leading position at 28%, operations account for 11%, marketing 10%, and cybersecurity and customer service 8% each. In the consumer industry, GenAI apps are used for IT and marketing functions at 20% of GenAI initiatives each, while customer service initiatives account for 12%. In the financial services industry, the most scaled GenAI initiatives are IT (21%), cybersecurity (14%), and finance (13%). In the government industry, IT initiatives occupy 96% and operations only 3%.

Also, Gartner notes that by 2029, the customer service and support industry will be transformed by advanced agentic AI tools, which will autonomously resolve 80% of tasks without human intervention. Generative AI is one of their key components.

According to the Harvard Business Review survey, 89% of respondents expect AI to become the most transformational technology in a generation. This drives overall investment in corporate data and AI initiatives. Nearly 99% of companies surveyed say they have increased their investment in AI, with nearly 91% citing it as their top priority. Respondents see the value of their investment in measurable, quantifiable business results that can be tracked through metrics such as increased productivity and revenue, improved customer acquisition and retention, and increased customer satisfaction. The percentage of companies allocating 20 to 39 percent of their overall AI budget to GenAI increased twelve points in 2024. AI has the potential to shape major areas such as finance, healthcare, cybersecurity, and education.

Fintech Industry

In 2025, the financial sector is shaped by the following trends: AI chatbots for customer service, algorithmic trading, customized financial services, risk assessment, customer authentication, regulatory compliance, and transaction optimization.

Cybersecurity

AI tools enable companies to monitor security in real time, detecting malicious digital footprints, intrusions, and fraud. AI-based software performs predictive and simulation modeling so that the company is prepared for possible attacks from hackers and cybercriminals.

Healthcare

The development of AI is important for various areas of healthcare: chatbots provide initial consultations for patients with mental health problems, and AI-equipped robots ensure precision in complex surgeries. AI tools conduct large-scale data analysis, optimize the management of clinical patient data, identify healthcare trends, and predict possible disease outbreaks. Moreover, the biomedical and healthcare fields advance with the contribution of AI technologies in medical image analysis, patient diagnosis, personalized drug prescription, follow-up monitoring of treatment progress, drug development, and predictive analytics.

Education

AI technologies enable teachers to create advanced learning materials, make the education process more adaptive, and save time. AI tools can analyze students' progress to identify gaps in their knowledge and adjust their learning, as well as provide students with individual assignments and rewards based on their performance.

How to Choose a Company that Provides GenAI Services?

There are several important criteria for choosing the best AI company: data protection and privacy measures, domain knowledge, experience and expertise, reputation, portfolio and client reviews, and budgetary considerations.

Make sure the company possesses strong data protection controls when you deal with sensitive data. Specific industry or domain knowledge can be a difference-maker in how effective and relevant the artificial intelligence solutions will be: a domain-expert company is more apt to have insights into the exact business problems. The team's level of experience, as well as credentials and training in AI technologies, ensures effective and comprehensive collaboration. In addition, it is important to evaluate criteria such as the team's working methodologies, adaptability, and availability.

When choosing an AI product development team, pay attention to its client base and list of completed projects. You can also talk to past or current clients who have used the AI company's services. It is important to conduct an in-depth study of the supplier's payment policies, pricing model, warranty policies, quality control guidelines, and delivery timelines.

The agency must stay in touch throughout the work process, ensuring transparent communication, regular feedback sessions, and joint discussions of adjustments. One key indicator of transparent communication is an agile project management methodology with set sprint deadlines, within which the team performs a certain amount of work, reports on the results, and discusses possible improvements with the customer.

An AI solution must grow with the customer's business and meet increasing demands, so it is important that the chosen artificial intelligence development company is able to scale your project. After development, an AI product needs maintenance and continuous support, so the customer should make sure the AI team provides appropriate post-development services.

Benefits of Collaborating with Companies that Develop AI Software

One of the key benefits of collaborating with top AI development companies is their experience and deep knowledge of frameworks and advanced AI algorithms. The client doesn't need to spend time on internal training of their in-house engineers. There are other benefits as well.

AI-Powered Solutions Are Tailored to the Client's Business Goals and Objectives

The development company offers customized services and solutions, from designing intelligent recommendation systems for personalized customer experience to developing AI-powered chatbots for communication with users.

Cost-Effectiveness without Additional Budget for Hiring and Training Staff

A company that needs an AI solution built can spend a lot on recruiting and training an internal team. Relying on the experience of a third-party AI development company is a more profitable alternative for the client.

AI-Powered Technologies Improve the App Development Process

Engineers save time and can focus on more important responsibilities when routine tasks are automated with AI tools. Collaboration with an AI software development company also allows flexible engagement models: the client can order a small project to prove their concept and, if desired, incorporate AI incrementally at the enterprise level.

What Are the Current Trends in AI Development?

Looking at the statistical cross-section of broad AI development trends such as virtual assistants, natural language, machine learning, robotics and process automation, and computer vision, the McKinsey Global Institute projects that by 2030, approximately 70% of all companies will likely have adopted at least one AI tech category. No more than half of companies will adopt all five categories, with many companies in between at different stages of AI adoption. At the average rate of AI adoption, AI is expected to add approximately 13 trillion dollars to the global economy by 2030, equivalent to a 16 percent increase in cumulative GDP compared to today's level. AI models are expected to continually adapt and learn from changing conditions, accumulating new skills and knowledge on top of what they have already learned. Specific future trends include:

More Complex AI Algorithms

AI systems are predicted to use generative models, reinforcement learning, and deep learning to improve efficiency and performance across different tasks.

AI for Edge Computing

With the development and spread of edge computing, AI models and algorithms allow users to lower latency, depend less on cloud infrastructure, and process data in real time.

Ethical AI Systems and Addressing Bias

AI systems are designed to be fair and ethical and to include mechanisms to detect and mitigate bias, ensuring equitable and responsible deployment.

Responsible Use of AI

Transparency, accountability, and ethical behavior of AI systems are made possible by best practices for the development, deployment, and utilization of these systems, as well as standards and regulatory policies.

Multimodal AI

In the future, greater interaction and understanding of different contexts will be possible through the integration of data from text, video, audio, and other sources.

Providing a Personalized Experience

AI algorithms will enable personalization across a variety of industries and applications, including content curation, bespoke recommendations, targeted marketing, and adaptive learning.

Explainable AI (XAI)

Developers increasingly create AI models and functions that operate in a more transparent way, explaining their decisions. This increases interpretability and user trust, which is especially important for financiers, healthcare professionals, and lawyers.

ML and NLP

Advances in NLP mean that AI could respond to cultural subtleties, idioms, and emotions, understanding nuance and context when communicating with a person, rather than just the meaning of words. Popular NLP apps include AI search, which leverages the power of large language models (LLMs) to improve the way people search for information on the internet. LLMs can answer questions as if they were humans, create and sort text, recognize text in different languages, and translate from one language to another.

How Does the Outsourcing AI Software Development Company Belitsoft Help?

Deloitte says that 68% of executives rate their skills gap as extreme or moderate, while 27% report an extreme or significant gap. These companies are looking for experts to help expand their capabilities. The professionals in greatest demand are the ones who build AI-based solutions: the researchers who create new types of systems and AI algorithms, the software engineers, the data scientists, and the project managers who make sure AI projects are carried out according to plan. But searching for such specialists, or retraining in-house ones, can be expensive for an organization. The profitable option is to rely on a credible AI custom software development firm.

Clients from multiple industries, from healthcare to finance, choose Belitsoft due to our 20+ years of deep expertise in building SaaS solutions, including AI-powered B2B SaaS products. The Belitsoft team also provides customers with in-depth tech knowledge in machine learning, including specialized frameworks for LLMs and NLP. By outsourcing the development of an AI product, a client can cut costs in half while still maintaining the necessary oversight and control. Both AI startups and enterprises can rely on custom API development, MVP development, cloud migration, and other comprehensive software development services adapted to any business specifics. Belitsoft provides customers with new app version releases once or twice a month. It also offers:

  • Collecting client data from various sources into a staging database through direct API connections, batch processing tools, and continuous streams
  • Storing AI analysis data separately in the warehouse
  • Implementing TLS and SSL encryption and multi-factor authentication to better secure clients' sensitive data
  • Handling complex projects with trusted methodologies like Agile
  • Using specific protection protocols for HIPAA, PCI, and GDPR compliance (and other standards, depending on your industry-specific project)

We utilize Continuous Integration (CI) and Continuous Deployment (CD) and develop AI systems that can grow with the needs of your business. Partner with Belitsoft to get secure, custom-designed AI software and integrate analytical AI systems, AI chatbots, and machine learning models. We take a consultative approach, understand the client's unique challenges, and craft a solution accordingly. Contact us and we will promptly discuss your project requirements.
Dmitry Baraishuk • 11 min read
Agentic AI Coding: What Still Remains Expensive Amid a 90% Drop in Costs
Benefits of Agentic Coding

Engineering Cost Reduction

Agentic coding has changed the software development market by cutting the labor cost of implementation for simple and internal tools. The cost of writing code has dropped by up to 90% compared with similar work a decade ago. Agentic coding excels at CRUD applications, simple web forms, standard workflows, small internal tools, simple test suites, and basic API glue code. For these types of projects, development costs have dropped approximately 90%. AI tools let companies delay hiring managers and larger engineering teams, because a small senior team can do more. Tools like Cursor plus Claude allow a single experienced engineer to generate output that used to require a small team. With these tools plus smaller teams, a handful of people can now achieve roughly an order of magnitude more than before.

Engineering Time Savings

For a typical internal tool where data modeling is already complete, work that previously required a small team can now be finished in a few hours with an agentic coding command line interface. AI coding agents like Claude Code can generate a full unit and integration test suite for a fairly complex internal tool in a few hours. One AI-generated test suite, which contained more than 300 tests, would have taken several software engineers several days to write by hand. A project that previously took roughly a month from start to release can now be completed within about a week when using agentic coding tools.

New Market Opportunities

For niche platforms, AI makes it economically viable to build a product where the total market size is no more than 10 million USD in annual revenue and hiring a full team would not have been worthwhile.

Faster Product Idea Exploration

Agentic coding tools are extremely good at turning business logic specifications into well-written application programming interfaces and services. AI agents allow faster exploration of product ideas, which shortens the loop between product and engineering. Pairing a business domain expert with a motivated software engineer and these tools produces an extremely powerful combination. Instead of a larger squad that pairs a business specialist with a group of software engineers, we will see much tighter two-person pairings. Such pairings allow extremely rapid iteration on software products. If the chosen direction is poor, the team can discard the current software and quickly start again, using what they have learned. In this new mode, the hard work is the conceptual thinking rather than the typing.

Faster B2B SaaS Customization

There is a lot of customization per client in B2B SaaS, so the fast conversion of unclear requirements into working prototypes that reveal misunderstandings is the biggest gain. AI agents help most when software requirements are fuzzy, polish and long-term support are less critical right now, and you only need to iterate quickly by creating simple prototypes.

Excel Spreadsheet to Web App Conversion

Every organization has hundreds, possibly thousands, of Excel sheets that track important business processes. Those Excel-based processes would be much better expressed as applications. In some cases, a professional development agency can turn these spreadsheets into an application for around 5,000 dollars by combining a competent software engineer with AI tools.

Cutting Recurring SaaS Costs with Self-Hosted Internal Solutions, Built Cost-Effectively with AI Agents

As AI lowers the cost of custom development, SaaS tools that mostly wrap simple workflows are no longer viable as monthly subscription products. High SaaS subscription prices make it easier for companies to justify replacing them with AI-coded internal solutions. Some multi-billion dollar corporations are already replacing SaaS tools with custom internal solutions built with AI assistance. They:

  • replicate Salesforce-like features and embed them directly into internal systems to reduce costs
  • replace tools like Fivetran with internal ETL solutions based on open-source platforms and custom code, saving 40,000 dollars per month, reducing maintenance costs, and making customization inexpensive
  • rebuild key features of expensive back office SaaS in several weeks

For many internal workloads or custom mobile apps, it now makes sense to build rather than buy. Low-code platforms combined with AI agents enable companies to build business applications quickly, replacing subscription-based alternatives.

Optimal Use Cases for Agentic Coding

AI is especially powerful in small companies or teams that already have an engineering culture and testing practices.

Production of Small Applications

Many developers now produce far more small internal or personal applications, even if those applications never become public products. Even if AI produces boring, ugly code, it is still good enough for personal tools such as IDE plugins that save developer time.

Component Development

Large language models are very good at writing small components from the bottom up and stitching them together. A software engineer can feed a REST API specification to an AI coding tool and receive a module that largely works. The tools can also write documentation and explanations for protocols or complex code paths.

Composing Small Parts

With guidance, AI is very good at composing small parts, such as several API calls plus error processing, into a coherent routine. Senior developers are comfortable letting AI generate entire small components such as simple dialogs or small modules.

Predictable Tasks, Known Patterns and Libraries

AI tools work best when they can compose known patterns and libraries rather than invent new designs. AI coding tools work excellently when the underlying task is predictable, such as generating wrappers or predictable user interface patterns. They are strong at identifying libraries that solve a problem when given freedom to choose the approach.

The Main Use Case: Understanding Legacy Code

The main value of AI agents is not code volume, but having a second brain that thinks faster than a human developer. Senior developers use AI agents to understand large legacy codebases, not necessarily to write big changes autonomously. Legacy projects are dangerous for agentic coding, because AI agents may generate large diffs that are hard to review, and a lack of tests makes correctness very hard to assert. However, even if they are not good at generating new code in that environment, AI tools are excellent at parsing existing legacy code and using it to explore scenarios and hypotheticals. AI can extract complex business logic even from obscure implementations such as templates and plugins. AI is also good at explaining low-level code, including assembly for retro platforms.

What Makes Code Legacy

When models are fed well-structured code, even from legacy systems, they can help deliver changes faster without a large team. AI agents can easily read a specific function, understand what it does, and propose changes with high accuracy, because well-structured code has clear boundaries. However, most legacy projects are those that are not maintained, have little or no test coverage, and where engineers fear changing anything. Legacy is a business decision: a system becomes legacy when the business declares it obsolete and stops investing in it. Poor engineering and mismanagement can create legacy code even on a brand new project, so age is not the key factor. Most vibe-coded apps become legacy almost immediately, because no one wants to invest in cleaning them up.

10x Productivity Gains

A 10x personal productivity increase may occur when working on a large legacy codebase if your code agent can scan and understand big old codebases, answer questions about them, propose targeted changes, and assist with testing and debugging. AI is excellent at understanding existing code, summarising it, and answering questions, and this is the main productivity gain for work on legacy systems. AI is a crew of excavators compared to a single shovel when exploring large garbage-heap codebases. Developers have spent a lot of time trying to understand codebases that are several years old. Agents make understanding these older codebases easier: they explain what the code is doing and locate bugs in it.

New Automation Opportunities

Many use cases senior developers previously would not have bothered to script or automate are now easy, and they have cranked out small scripts and small web services in several hours using AI.

AI-Assisted Code vs. Contractor Code

Developers would rather inherit a repository written with the help of an agent and a good engineer in the loop than one written by a cheap contractor of questionable quality who left several years earlier. The kind of repository left by such a contractor typically has no tests, and the code is often a mess of classes and methods.

What Remains Expensive in Custom Software Development Even with AI Agent Coding

AI hype around code is comparable to the hype around self-driving cars: solvable in theory, but much harder than many assumed. In fact, coding time is often only a small fraction of the total time spent on a complex enterprise project.

Hidden Costs

The main costs are not in the initial building but in future maintenance, feature additions, operations, and organizational coordination, which AI reduces far less. Debugging and supporting software in production is still difficult and has not been automated by AI. Production software also has many hidden costs, including security, upgrades and patches, hosting and uptime at scale, customer interactions, regulatory and compliance aspects, product management and design, and data migration and integration. The cost of having software today also includes dealing with cloud platform complexity, such as Kubernetes, distributed databases, queues, and multiple user interfaces.

Distribution Still Costs More Than Building

AI agents can reduce building costs, but large companies still have advantages in brand, distribution, and customer trust. Marketplaces and discovery are controlled by algorithms that favour big players and established brands. Many excellent small products will remain invisible until their features are copied by large companies.

You May Still Require a Team

Decision makers in larger organisations are unlikely to trust a one-person shop for core systems, regardless of how cheaply that person can code. Coding time is often only a small fraction of the total time spent on a complex enterprise project, where you may still require a team of human engineers. Yes, smaller than even a year ago, but still a team. Before building the app from the ground up, a team would set up continuous integration and continuous delivery and define and implement the data access patterns and core services. After that, the team would usually build a backend, plus dashboards and graphs for users. Near the end, the team would ideally add automated unit tests, automated integration tests, and automated end-to-end tests to make the product fairly solid. The release of such a product, depending on the complexity, may happen about a month after the work started. This description only covers direct labour. Every additional person on the project adds coordination overhead: daily or regular standups, ticket management, code reviews, handoffs between frontend and backend contributors, and waiting on other team members to unblock your work.

Senior Engineering Effort Is Still Required to Solve Complex Production Issues

There is currently enormous value in having a human supervise the agent and check its work. AI makes it very easy to create huge volumes of code, but this can be dangerous because code for critical systems is considered a liability, and less code is usually better. When AI writes code, it is tempting to accept it quickly, but this can damage long-term quality and maintainability. With a senior engineer in the loop, AI agents can create very high-quality software very quickly. AI is best used as a partner, not as an autonomous agent. Senior engineers still need to design, review, and direct the work. Using AI for code generation requires the same effort in design, coding, and review, except the code under review is not their own.

Many engineers use AI primarily for exploring codebases and libraries, generating first drafts of code or tests, refactoring proposals, explaining errors, and drafting documentation. They may ask AI to implement something and use the output only as inspiration. Experienced software engineers never just copy AI-generated code as is; they always review and adjust it. They may ask the model to write some code, then ask it to list the top ten problems, and then ask it to fix the most important ones. AI agents are very good at checking code and then critiquing or fixing it, so engineers actively use this workflow.

To get good results from AI, you need to learn how to write good prompts, plan the work, and supervise what the model does. Experienced software engineers often start with a planning step before any code is written: they ask the agent to propose a high-level plan so that the person and the model are aligned. A simple CLAUDE.md or instruction file can teach Claude Code how a specific project works and help it stop repeating the same mistakes over time.

Agents are claimed to be strong at writing large volumes of unit or integration tests quickly. However, there may be challenges with the quality of these tests, and they can create a false sense of security. While a human writes a test based on requirements, the AI may write a test based only on the code it sees. If the original code is wrong, the test will also use the wrong logic to verify it: the test passes, but the bug remains. That's why AI agents work best when you already have good automated tests, because those tests keep the model in check and make its output more trustworthy. AI can replicate the behavior of people who put together StackOverflow snippets without fully understanding them. With AI, average software engineers lose the educational value of spending hours reading documentation. That educational loss is a reason why AI is more useful for senior software engineers than for junior engineers.

Will AI Replace Software Engineers?

Current models may be pretty bad at programming anything non-trivial and may "fix" issues by removing functionality. AI coding assistants may hallucinate functions or APIs that do not exist, use deprecated interfaces, reintroduce bugs during refactors, and remove tests silently. They may "fix" failing tests by changing the test instead of the code. However, many human programmers also do not manage complexity well, and for senior engineers there is no fundamental difference between improving large language models and training junior developers. Moreover, AI agents and models are still improving at a rapid pace. Existing benchmarks do not really capture how fast these agents and models are improving. Newer models will soon make the current ones look completely obsolete. Model providers are now actively filtering code training data for quality, so the new generation will be trained on higher quality code than older models.

Even without AI, developers were always at risk of field disruption from new technologies. There will still be strong demand for engineers who can supervise AI, manage complexity, and align software systems with business goals. Knowing the business and industry specifics gives senior programmers an advantage over others in the era of AI-assisted coding. When you add business domain understanding on top of technical and agent skills (which architectural decisions fit a project, which framework to use, and which libraries perform best), it feels as if the 10x engineer has arrived. Such engineers move toward full-stack and product engineering or roles with more business responsibility, propose solutions rather than just implement specifications, and engage with customers, sales, and marketing. Modern developers use AI to learn faster about business processes, regulations, and industry history (verifying the information). AI is a partner that amplifies us. We plus AI are much more valuable than we or AI alone.
Alexander Kom • 10 min read
Top-10 Conversation Intelligence Software Products
Below is a list of the top companies for conversation intelligence and keyword spotting across calls. 1. Belitsoft Conversation Intelligence Belitsoft Conversation Intelligence is a speech analytics system to assess communication quality in phone calls and dialogues. It can be deployed as a cloud SaaS product or on-premises within a company's network. The system focuses on accurate speech recognition with accuracy of up to 95% on stereo recordings. It also provides in-depth analysis of each conversation, covering more than 50 parameters, including diction, speaking rate, pauses, emotional tone, script compliance, and call outcome. Each call generates a transcript. The software automatically recognizes who is speaking and then scores how well they follow the sales process using checklists. Each checklist step is scored 0 (missed it), 0.5 (did it but not well), or 1 (did it successfully). It marks important things that happen during the call. For example, when a customer objects or mentions a competitor. It also looks for signals that indicate buying interest. You can customize checklists to match your sales process. The system connects to your existing CRM and phone system. It shows dashboards for managers, sales people, and trainers to see how their team is doing. The software can also send automated coaching advice to agents through messaging apps such as Telegram. Companies can start using Belitsoft Conversation Intelligence with AI services like ASR and language models (GPT, Gemini) from outside. Later, they can switch to comparable open-source models hosted on their own on-premise GPU infrastructure. This keeps the recognition and analytics work exactly how they want it. It also makes sure that regulations and internal security standards are followed. Deployment and data control Belitsoft Conversation Intelligence is available for on premise deployment. This is important for customers that prioritize data security and strict compliance with local regulations. Global market leaders can also be deployed on-premise, but their projects are complex and expensive, with long implementation timelines and enterprise level licenses. Belitsoft Conversation Intelligence is ideal for faster on-premise rollout with shorter deployment times and individual pricing that is acceptable for mid-sized companies. It fits organizations where a pure cloud setup is not allowed or where a hybrid model is required, with some services kept locally and others in the cloud. This is relevant for banking and government in countries where the use of foreign cloud providers is restricted. Price and scale Belitsoft Conversation Intelligence costs less than many bigger competitors. Prices start around $420 a month for 10,000 minutes. That makes conversation analytics affordable for most mid-sized companies. The focus segment is mid-sized and growing companies, where Belitsoft Conversation Intelligence can compete by delivering about 80 to 90% of the functionality of market leaders at a lower cost. A regional bank or a network of clinics with a call center of about fifty agents can cover its main needs such as transcription, script based search, emotion detection and quality scoring with the standard product and pay significantly less than with the largest providers in this market. Functionality and focus Usually, businesses assume they must buy very expensive software from industry giants to get deep insights. Belitsoft Conversation Intelligence is designed to match top analytics systems. 
Despite being a smaller, more custom provider, its technical capabilities (AI analysis of phone calls) are on par with the famous market leaders. It tracks more than 50 indicators covering speech technique (Is the agent talking too much and not letting the customer speak? Are there awkward pauses suggesting the agent doesn't know the answer? Is the agent cutting the customer off?), emotions (Is the customer frustrated, angry, or happy? Does the agent sound supportive? When does a conversation turn hostile?), script adherence (Did they say the mandatory legal disclaimer? Did they ask for the sale? Did they handle objections correctly? Did they use the standard brand greeting?), and call outcome (Was a sale made? Was a follow-up booked? Was the technical issue solved?) - comparable to what large platforms monitor.

One of the differences is the focus on the mechanics of speech. It detects things like filler words (um, ah, like), how agents fill awkward silences, and whether their grammar and language are accurate. This is specifically designed for training and coaching: it helps managers correct bad habits in individual agents so they sound more professional. Global solutions focus more on sentiment analysis, Net Promoter Score, churn drivers, and related metrics to improve customer experience. Still, both groups of products address similar areas such as service quality, customer satisfaction, and compliance.

Belitsoft Conversation Intelligence stands out through flexible configuration for a specific business. Large products also support configuration, but this often requires vendor or integrator teams. Moreover, if you ask a giant software company to build a brand-new feature just for your company, they will likely say no - their product is standardized for thousands of users. As an engineering-focused company, Belitsoft can offer a more personalized setup. If you need a metric that doesn't exist yet, Belitsoft will write the code to create it for you. For example, if you are a niche medical supply company, your implementation of Belitsoft Conversation Intelligence will know your specific medical jargon.

Integrations and implementation

Check out the case study on implementing Belitsoft Conversation Intelligence for a client in the automotive industry. Large platforms usually offer analytics as one element inside a suite that covers call recording, routing, workforce management, and other functions. Belitsoft Conversation Intelligence is a specialized analytics service that connects to existing telephony or customer relationship management systems through APIs. For many customers this independence is an advantage: they can keep their current telephony stack and add Belitsoft Conversation Intelligence on top as an analytics module without changing the rest of the system. Competing vendors often sell a full package that includes telephony, virtual agents, and analytics together, and not every company wants or is able to replace all components at once. Belitsoft Conversation Intelligence is attractive for customers that already have a stable call infrastructure but lack analytics - they can buy only the analytics module. This is relevant for outsourced call centers that operate many projects and do not want to redesign their platform around a single vendor. They need a flexible call analytics tool that works with different systems.
Belitsoft Conversation Intelligence is most beneficial for mid-sized and upper-mid-sized customers for whom leader products are either too complex or too expensive. Typical examples are regional banks and insurance companies, networks of medical clinics, retailers with call centers of 50 to 200 agents, online businesses that sell actively by phone, and public sector entities that must store data locally. These customers value fast payback and configuration flexibility. Belitsoft Conversation Intelligence can be deployed for them within a few weeks, while large corporate solutions are usually implemented over several months.

2. CallMiner (Eureka)

CallMiner Eureka is not for small businesses. It is designed for large enterprises and large-scale quality control. The software is built to process millions of hours of calls and terabytes of text from the giant contact centers of banks, hospitals, and telecom carriers (AT&T, Verizon, UnitedHealthcare). CallMiner analyzes every way a customer interacts with a company: a customer can complain on X/Twitter, then email support, and finally call the hotline. It ingests audio (calls), text (email/chat), and social media, giving the company a unified view of the customer journey. The AI goes beyond transcribing words and looks for intent (why are they calling?), emotion and tone (how are they feeling?), and signs of fraud (theft patterns or suspicious behavior). The system automatically redacts credit card numbers and social security numbers from recordings so the company does not violate privacy laws (GDPR, PCI, HIPAA). It does not just analyze calls after the fact but assists agents during them: if a customer says "I want to close my account," the software can display a retention offer on the agent's screen. CallMiner analyzes interactions to help you understand why performance has dropped (for example, thousands of customers complained about the new billing interface update).

3. NICE (Enlighten AI, Nexidia)

NICE Enlighten AI, previously Nexidia, is a large-scale contact center platform with a comprehensive set of interaction analytics tools. It extracts insights from all communication channels and provides a view of customer satisfaction, complaints, sales effectiveness, and other key performance indicators. The company is known for phonetic indexing and strong artificial intelligence that combines real-time and historical analytics for high-volume tasks requiring high accuracy and performance. Agentic AI features allow the system to make decisions during a call: it can check script and policy compliance in the moment or automatically redirect a call when customer sentiment drops. The platform scales to very large datasets, supports many languages, and has a version hosted inside the European Union to meet General Data Protection Regulation requirements. Typical customers are large enterprise contact centers in banking, airlines, retail, and other sectors that need speech analytics together with a full workforce optimization and workforce engagement management stack covering recording, scheduling, and training with integrated analytics. These companies expect deep reporting configuration and strong guarantees of regulatory compliance.

4. Verint Speech Analytics

Verint is historically known for call recording and workforce engagement systems. It has been the industry standard for recording calls and managing employee schedules (Workforce Management/WFM) for decades.
Companies that already use Verint to schedule their 5,000 agents find it easy to simply switch on the analytics module rather than buy a new tool from a different vendor. The system acts like an automated auditor: if a debt collector threatens a customer (illegal) or a banker promises a specific return on investment (illegal), Verint flags it immediately. For banks and insurance companies, a bad customer experience is annoying, but a regulatory violation can cost millions in fines - so Verint prioritizes preventing fines. Verint offers an all-in-one suite: the analytics module detects that Agent Steve is struggling with closing sales, and the WFM module automatically schedules a training session for Steve during his next low-volume shift. Most competitors focus only on analytics, whereas Verint connects analytics directly to HR and scheduling tools. Financial services, insurance, healthcare, and government are industries that are terrified of data leaks and instability. Because Verint has a long track record and specializes in data security, IT directors at major banks trust it. They view it as a low-risk purchase because the company is a large, publicly traded entity with a reputation for reliability.

5. Observe.AI

Observe.AI is a relatively new vendor, founded in 2017, that focuses on AI support for quality control in contact centers. It is a platform for automating quality assurance and agent coaching. The system automatically transcribes calls, scores them against predefined criteria, and helps scale quality checks that were previously done manually. Its strengths include sentiment analysis, keyword and phrase search, intelligent scoring that surfaces the most important deviations, and built-in workflows for reviewing mistakes and training agents. Observe.AI integrates with popular CRM and telephony systems, which simplifies deployment, and is often chosen by contact centers that want to improve on traditional call listening and training. Typical customers are mid-sized and large commercial contact centers in e-commerce, technical support, and retail that want to improve service quality and agent productivity with a cloud-based AI product. These companies value fast software-as-a-service rollout and ready-made metrics for call evaluation.

6. Balto

Balto is a virtual prompter. As soon as the customer complains about the price, Balto's AI hears it and immediately pops up a script on the agent's screen with the correct discount offer. It fixes the mistake while the call is still happening, potentially saving the sale. Where other solutions detect compliance violations after they happen (so you can punish the agent), Balto tries to prevent them: it tracks a checklist, and if the agent is about to hang up but hasn't read the mandatory legal rights script, Balto flashes an alert before the call ends. Balto is usually an overlay - you don't replace your current phone system (like RingCentral, Five9, or Zoom) but simply plug Balto into it. Typical customers are companies that need to improve conversations while they are happening. This includes mid-sized sales and contact centers where every call has high value, such as outbound sales teams, debt collection departments, and service-focused call centers with strong conversion targets. Balto is often chosen by teams that want to quickly raise the performance of new or weaker agents through on-screen guidance and reduce errors during the call.

7. Gong.io

Gong focuses on B2B sales and client meetings.
The platform automatically records sales calls and video conferences, transcribes them, and analyzes how each deal is progressing. Gong detects mentions of competitors and customer objections, compares how top performers speak versus weaker performers, and estimates the probability of closing a deal based on the content of the conversations. It gives the sales team visibility into negotiations, such as which topics come up, where customer interest drops, and whether the sales method is followed. Key features include insight discovery to grow revenue, deal forecasting based on conversations, and speech pattern analysis such as the talk-time ratio between manager and client and the use of required phrases. Typical customers are commercial teams in technology, software as a service, and financial services, from startups with small sales teams to large corporations that want to scale their best sales practices. Gong has become a standard for companies that manage sales quality using data from conversations. In this area, Belitsoft Conversation Intelligence competes with Gong in the analysis of sales communications.

8. Genesys Cloud CX (Speech & Text Analytics)

Genesys Speech Analytics is the native analytics module from Genesys, a major contact center vendor, designed mainly for users of the Genesys Cloud platform. Genesys provides the core call center infrastructure, such as automatic call distribution and interactive voice response. Within this stack, Speech Analytics lets customers automatically analyze all calls and written interactions, perform speech recognition, run topic analysis on customer requests, assess sentiment, and measure agent empathy. Because it is built into Genesys Cloud, integration is seamless and analytics data is immediately available to supervisors inside the same interface. The module tracks the share of talk time between customer and agent, key topics in the dialogue, and script adherence. A recent addition is Agent Empathy Analysis, which scores empathy based on vocabulary and tone. Genesys also offers real-time functions such as Agent Assist, also called Copilot, which suggests responses and knowledge during the call and alerts supervisors to issues such as dropping customer satisfaction. Typical customers are large contact centers that already run on Genesys Cloud and prefer an integrated solution from one vendor. These are usually companies with tens or hundreds of agents, such as major retailers, telecommunications providers, and banks that use Genesys for routing and extend their setup with the built-in analytics module.

9. Uniphore

Uniphore offers a conversational AI platform for enterprises, using generative AI and multimodal analysis to extract insights from conversations. Its Conversation Insights Agent analyzes all interactions and answers business questions such as "What are the top three reasons for customer dissatisfaction this week?". Uniphore has a proprietary AI infrastructure: its Business AI Cloud includes in-house LLMs and Emotion AI engines, avoiding reliance on external services. The platform tracks emotions and tone during conversations, giving real-time feedback (for example, suggesting that an agent show empathy when detecting customer irritation). Additional modules include U-Assist for real-time guidance and post-call automation (such as ticket creation and conversation summaries) and U-Analyze for reviewing all recordings with quality, compliance, and CX scoring.
Typical clients are large enterprises and global corporations ready to adopt AI technologies to enhance customer service. Uniphore is used in banking, telecommunications, and aviation, where companies look to apply technologies such as voice biometrics, facial expression analysis (computer vision) during video calls, and automatic CRM population based on conversation data.

10. Cresta

Cresta is a conversation intelligence platform that combines AI-based agent coaching with process automation - it does more than analyze conversations. The Cresta Omnichannel AI Agent module reports on the performance of both human and virtual agents. Cresta focuses on low latency and fast guidance: it uses ultra-fast transcription to show prompts during conversations and can be configured for specific business needs without programming skills. Its models are trained on the company's own data with full transparency for the customer. The goal is to move from simple reporting on what was good or bad in a call to direct intervention during the interaction, improving the outcome with artificial intelligence. Typical customers are large technology-driven companies and contact centers that want humans and AI to work together. Cresta is often deployed where there is already a high volume of interactions and established processes, and the management team wants to grow revenue and service quality through AI-based coaching - for example, in large e-commerce businesses, software-as-a-service vendors with big phone sales teams, and premium customer service operations.
Alexander Suhov • 10 min read
Amazon Cut Tens of Thousands of Corporate Jobs to Invest More in AI Automation
Amazon staff reduction due to AI automation

Andy Jassy, Amazon's Chief Executive Officer, says the reason for the cuts is to reduce the "excess of bureaucracy". The vision is to operate like the world's biggest startup and to make the company leaner, with fewer layers and more ownership, so Amazon can move more quickly. However, the key reason may be the increased use of AI, which cut jobs by automating repetitive and routine tasks. It seems that AI-driven productivity gains within corporate teams were sufficient for a substantial reduction in force. Amazon has long-term investments in building out its AI infrastructure and in the short term must offset costs: the company is expected to spend $118 billion in capital expenditures for the year, with much of it going towards building AI and cloud infrastructure. Beth Galetti, Senior Vice President of People Experience and Technology, says the reduction is necessary because this generation of artificial intelligence is the most transformative technology since the Internet, and it accelerates the pace of innovation across existing and new market segments. Amazon had more than 1,000 generative artificial intelligence services and applications in progress or built, but that figure was a "small fraction" of what it plans to build. Amazon shares rose 1.2 percent to $226.97 on Monday following the report. The company appears to be expecting another big holiday selling season and plans to offer 250,000 seasonal jobs to help staff warehouses, among other needs - the same seasonal hiring level as in the prior two years. This contrasts large seasonal warehouse hiring with corporate reductions.

Amazon Web Services

Amazon Web Services, the company's cloud computing unit, is among those affected. AWS reported second-quarter sales of $30.9 billion (a 17.5 percent increase year over year), and that growth was well below the gains recorded for Microsoft Azure (39 percent) and Alphabet's Google Cloud (32 percent) in the same period. This competitive pressure may be driving Amazon to restructure AWS. The division has been making headlines recently for a fifteen-hour internet outage last week that disrupted many widely used online services.

Amazon Robots

Amazon executives believe the company is on the cusp of a major workplace shift that will replace more than 500,000 jobs with robots. Robotic automation could allow the company to avoid hiring more than 600,000 people by 2033, even while selling twice as many products. Amazon's automation team expects the company can avoid hiring more than 160,000 people in the United States by 2027. To mitigate fallout in communities that may lose jobs, Amazon's policy avoids words such as "automation" and "artificial intelligence" when discussing robotics, substituting phrases such as "advanced technology" or the word "cobot" to imply collaboration with humans. Daron Acemoglu, a professor at the Massachusetts Institute of Technology who studies automation and who won the Nobel Prize in Economic Sciences last year, said that once companies work out how to automate profitably, the practice will spread to other firms.
Dmitry Baraishuk • 2 min read
.NET Machine Learning & AI Integration
Benefits of using AI with .NET

Access to Large & Small Language Models
Large and small language models from OpenAI, Mistral, Cohere, and Meta are available through Azure, GitHub Models, or Hugging Face and can be invoked directly from .NET code or via official SDKs.

Native Vector Databases for High-Dimensional Search
Vector databases such as Milvus, Qdrant, and Azure AI Search store and query embeddings so high-dimensional data can be searched efficiently at production scale.

Rich .NET AI Libraries & SDKs
Libraries - including Semantic Kernel, Azure AI Foundry, Azure AI Inference SDK, ML.NET, and Microsoft.Extensions.AI - provide components for prompt handling, model orchestration, and streaming responses.

Generative & Analytical Use Cases
Developers can build chat interfaces, summarize large text collections, generate text, code, images, or audio, and run semantic search or analytics over document repositories.

Multimodal Vision, Speech & Workflow Automation
The same approach extends to computer vision pipelines that detect objects in images or video, speech synthesis services that produce natural voices, classification systems that label incoming issues, and workflow automation that triggers downstream tasks.

Enterprise-Grade Deployment on Azure
Azure supports enterprise deployment with identity integration, private networking, role-based access control, audit logging, and other compliance mechanisms, enabling applications to run at global scale while meeting security and privacy requirements.

.NET AI Stack

The .NET ecosystem for AI can be understood through four informal categories that clarify when to choose a particular library or approach.

Microsoft AI Extensions
Much like other Microsoft.Extensions packages, these expose common interfaces such as IChatClient, so developers can swap providers - for example, replacing an OpenAI back end with a local Ollama instance - without changing application code (a minimal sketch follows this list). The central package, Microsoft.Extensions.AI.Abstractions, defines the shared types for chat, embeddings, and function calls, while concrete packages such as Microsoft.Extensions.AI.OpenAI or Microsoft.Extensions.AI.Ollama supply the actual implementations. Helpers like AsChatClient adapt service clients to the interface; future releases of those clients may implement it directly, making the helper unnecessary.

Orchestration frameworks to coordinate multiple models, agents, or data sources
In .NET, the main choices are Semantic Kernel and AutoGen. Semantic Kernel has connectors for many systems; AutoGen focuses on multi-agent workflows. They are useful when an application needs complex prompt routing, chaining of calls, or integration of several AI services, and they can be combined with or substituted for the Microsoft AI Extensions.

Azure AI Services official SDKs for each Azure offering
The most widely used is Azure.AI.OpenAI, which adds Azure-specific features and identity-based authentication on top of the standard OpenAI client. Other libraries target vision, speech, translation, search, and more. These purpose-built services are typically cheaper, more capable in their niche, and expose richer, task-specific APIs than relying on GPT models for everything. Direct use of Azure OpenAI, rather than via the abstraction layer, also provides features that the Extensions do not yet surface, such as image or audio generation.

Third-party and self-hosted options
.NET can call any REST-based AI service - the OpenAI client for non-Azure endpoints, Amazon Bedrock, the OllamaSharp package for a local Ollama server, and vector database libraries such as Qdrant. Connectors in Semantic Kernel and similar frameworks further simplify using these external or on-premises resources, proving that effective AI development in .NET is not limited to Azure.
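To make the provider-swap idea from the first category concrete, here is a minimal sketch using the Microsoft.Extensions.AI abstractions. It assumes the preview Microsoft.Extensions.AI.Ollama package and a locally running Ollama instance; exact method names vary between preview releases, and the model name and prompt are purely illustrative.

```csharp
// Minimal sketch: swapping chat providers behind IChatClient.
// Assumes the Microsoft.Extensions.AI and Microsoft.Extensions.AI.Ollama
// preview packages; APIs may differ slightly between preview versions.
using Microsoft.Extensions.AI;

// A local Ollama model behind the shared interface...
IChatClient client = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3");

// ...or a hosted OpenAI model via the AsChatClient adapter mentioned above -
// the application code below stays the same either way:
// IChatClient client =
//     new OpenAI.Chat.ChatClient("gpt-4o-mini", apiKey).AsChatClient();

var response = await client.GetResponseAsync(
    new ChatMessage(ChatRole.User, "Summarize this support ticket in one line."));

Console.WriteLine(response.Text);
```

Because the application depends only on IChatClient, switching from the local model to a hosted one (or back) is a one-line change at the composition root.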
.NET AI Example Projects

The barriers to experimenting with generative AI in enterprise .NET stacks have dropped. A small proof of concept can be online within a day, letting you validate both impact and risks before making larger investments. One video demo shows a language model running locally in Docker using a tool called Ollama. Running on-premises lowers cloud costs and, more importantly, ensures that no customer data leaves your network - an immediate benefit for privacy, compliance, and latency. In another example, using the Semantic Kernel library, the presenters started from an empty console program and, in minutes, turned it into a conversational application backed by large language models. Coding the feature is quick. What takes time is evaluation, prompt refinement, and building dashboards that grade usefulness, accuracy, and cost on every answer. A company may also need model fine-tuning or more complex retraining.

How to build Generative AI applications in .NET

.NET developers can now build AI apps fully in .NET. If they already use C#, they do not have to jump to Python or JavaScript to use modern AI. At the Build conference a few weeks ago, Microsoft showed hands-on labs and live demos proving this point. Almost everything was recorded and posted online for free. Microsoft's GitHub holds small, copy-and-paste-ready projects (search, chatbots, agents, etc.), plus one-command Azure deploy scripts. Here is one more starter kit for adding artificial intelligence features to .NET software. Everything is open source and licensed under the MIT license. The samples show how to plug services such as Azure OpenAI, the public OpenAI API, or locally hosted models into any .NET application. Microsoft.Extensions.AI gives developers one interface for all major AI providers. That design means your teams can experiment, switch vendors, or run models on-premises without rewriting code, and it keeps individual features neatly componentized and easy to test. A companion library - Microsoft.Extensions.AI.Evaluation - lets teams measure how well a large language model's answers meet business quality standards, so you can track accuracy and risk before deploying new AI features. Ready-made quickstarts cover common use cases such as summarizing text, building a chatbot, or calling custom business functions from natural language, and a recorded Build 2024 session walks through the whole process step by step.

Chat with your Data

"Chat with your Data" is a showcase built in Microsoft's .NET ecosystem that lets employees question their own documents and receive precise, well-sourced answers instead of generic chatbot replies. The solution pairs OpenAI's latest large language model with a search index in Azure, so each answer is grounded in your organization's content rather than the public internet. The system comes with two lightweight web apps. The first is the chat interface your staff will use every day. The second is a document manager that business teams can open to upload or revise source files.
Behind the scenes, a dashboard called Aspire tracks every running microservice, while Azure Application Insights maps calls between components and flags performance issues. All cloud resources live in a single Azure resource group, making them easy to locate, govern, and, if necessary, de-provision in one step. A prepared script signs in to Azure, spins up the required services, and returns two URLs: one for chat and one for document management. You choose the subscription and region up front - everything else is automated. The package can be pointed at your existing Azure estate if you prefer to reuse models or search instances already in place.

Uploaded documents are automatically broken into small fragments, embedded as vectors, and stored in Azure AI Search. When a user asks a question, the system looks up the most relevant fragments, feeds them into OpenAI's GPT-4o model, and delivers a response that cites the source material. This "retrieval-augmented" flow improves accuracy and reduces the risk of hallucination.

The Aspire dashboard gives a snapshot of health and throughput, and Application Insights captures request rates, latencies, and failures for deeper analysis. Together, they offer the end-to-end monitoring enterprise support teams expect. OpenAI usage is metered by tokens, Search and Application Insights follow pay-as-you-go rules, and Container Apps remain on a low-cost consumption tier by default. Because everything is contained in one resource group, a single delete action - or the supplied "azd down" command - shuts the deployment off and stops charges. Security is handled through Azure Managed Identity wherever possible, avoiding hard-coded keys. A GitHub Action scans the infrastructure scripts for misconfigurations, and secret scanning is recommended for any downstream forks. If needed, you can also place the container apps behind an internal firewall or virtual network. For leaders evaluating generative AI pilots, this reference implementation offers a clear view of the architecture, operating model, and cost profile required to make private-data chat a reality, while letting technical teams dig into the code, infrastructure, and observability tooling at their own pace.

eShopLite

Since February 2025, the .NET advocacy group has maintained eShopLite as an e-commerce codebase that demonstrates all currently relevant generative AI patterns. The repository now contains six fully working variants: vector search on ChromaDB, Azure AI Search, real-time audio inference with DeepSeek R1, agent orchestration over the Model Context Protocol, a pure multi-agent example, and a SQL Server implementation hosted with .NET Aspire. Every variant includes complete source, infrastructure code, service-graph metadata, and OpenTelemetry tracing. A clone followed by "azd up" deploys the whole stack to Azure, giving teams an out-of-the-box reference they can copy into their own pipelines. All agent interactions use MCP, the standard published by Anthropic in late 2024. Visual Studio Code and a first-party C# SDK allow engineers to host or consume MCP services without custom code, while Azure AI Foundry extends the same protocol to cross-vendor agent workers. A dotnet new ai-chat template adds a Blazor front end, a Microsoft.Extensions.AI back end, and Aspire health checks in minutes, and can target GitHub Models, Azure OpenAI, plain OpenAI, or a local Ollama endpoint without code changes. Local execution is handled through two routes.
Docker Desktop's Model Runner now hosts GGUF or Ollama models on Windows and macOS, keeping development and production identical when containers move to the cloud. The VS Code AI Toolkit can also download and expose a model locally with the same client, so the application code stays unchanged whether it calls GPT-4o in Azure or a laptop-hosted model. The result is a repeatable, standards-aligned path from prototype to production that lets teams decide at any point whether to run models locally or in Azure, while keeping security, tracing, and compliance artifacts in place.

.NET AI and Machine Learning Case Studies

Below are several examples across different domains, demonstrating tangible outcomes from combining .NET and AI.

Image Recognition and Anomaly Detection

Scancam Industries is a small Australian security and investigations firm that focuses on end-to-end anti-fuel-theft systems for service station operators. To address fuel theft, Scancam equips pumps with cameras whose motion sensors raise events whenever a vehicle arrives. An ASP.NET Core endpoint running in Docker receives each event. ML.NET models running in the same process first confirm vehicle presence and then locate any visible license plate region. A specialized recognition engine reads the characters. An Azure Function completes the cloud pipeline by checking the plate against a database of outstanding debts and broadcasting results to iPad and TV displays via SignalR. The attendant's iPad application shows every detected plate at each pump and flags known offenders so staff can require prepayment or withhold activation of the nozzle. Scancam adopted ML.NET after exporting its Custom Vision object detection model to ONNX, replacing a separate Python container and unifying all machine learning code with the existing .NET codebase. ML.NET models are also being deployed for anomaly detection to spot spikes from misconfigured motion zones and dips caused by blocked or misaligned cameras, giving the team proactive insight into hundreds of installations.

Industrial IoT Process Monitoring and Predictive Analytics

Evolution Software Design, Inc., a small United States consulting firm, has extended its work into hazelnut processing by collaborating with several processors to improve nut quality from harvest through distribution. In commercial practice, hazelnuts must leave the dryer with 8.5 to 11.5 percent moisture: under-drying leads to mould and spoilage, over-drying causes excessive shrinkage. Evolution Software addressed these issues with the Hazelnut Monitor application. During the day, sensors send temperature and humidity numbers to the server, while workers type in the missing facts: when the batch started and finished, its weight, the nut variety, which dryer was used, where the dryer sits, and an occasional hand-held moisture reading. At night the system converts every typed word into simple 0-or-1 columns, lines these columns up with the day's sensor numbers, and feeds the whole table into a learning routine that figures out how the numbers map to the true moisture. The freshly trained model is saved as a small zip file in cloud storage. When the web app starts each morning, it loads that zip into memory so every new sensor reading immediately gets a moisture prediction. This cycle - collect, label, retrain, reload - runs every 24 hours.
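That nightly routine maps naturally onto an ML.NET pipeline: one-hot encode the typed-in categorical facts, concatenate them with the sensor numbers, fit a regression against the handheld moisture readings, and save the model as a zip for the web app to reload. The sketch below is illustrative only - column names, trainer choice, and sample values are assumptions, not details of the actual Hazelnut Monitor system.

```csharp
// Illustrative sketch of the collect-label-retrain-reload cycle in ML.NET.
using Microsoft.ML;

var ml = new MLContext();

// Stand-in for yesterday's sensor readings plus worker-entered facts.
var rows = new[]
{
    new DryerReading { Temperature = 41f, Humidity = 18f, Variety = "Barcelona", DryerId = "D1", Moisture = 10.2f },
    new DryerReading { Temperature = 43f, Humidity = 15f, Variety = "Jefferson", DryerId = "D2", Moisture = 9.1f },
};
IDataView data = ml.Data.LoadFromEnumerable(rows);

// Typed-in words become 0-or-1 columns, lined up with the sensor numbers,
// and a regression learns how they map to the measured moisture.
var pipeline = ml.Transforms.Categorical.OneHotEncoding("VarietyEncoded", "Variety")
    .Append(ml.Transforms.Categorical.OneHotEncoding("DryerEncoded", "DryerId"))
    .Append(ml.Transforms.Concatenate("Features",
        "Temperature", "Humidity", "VarietyEncoded", "DryerEncoded"))
    .Append(ml.Regression.Trainers.Sdca(labelColumnName: "Moisture"));

ITransformer model = pipeline.Fit(data);
ml.Model.Save(model, data.Schema, "moisture-model.zip");   // nightly save

// Next morning, the web app reloads the zip and predicts on live readings.
ITransformer loaded = ml.Model.Load("moisture-model.zip", out _);

public class DryerReading
{
    public float Temperature;
    public float Humidity;
    public string Variety;
    public string DryerId;
    public float Moisture;   // label: occasional handheld reading
}
```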
A web portal built with .NET Core and Aurelia presents real-time predictions alongside raw measurements, and a rules engine hosted in Azure Functions triggers SMS and email notifications when target moisture is reached, when temperatures drift, or when sensors fail. Operators can therefore monitor dryers from mobile devices, reduce physical sampling, and make timely adjustments. The model's job is to guess the nut moisture percentage in real time so workers don't have to keep pulling hot samples. A few handheld moisture samples are still needed, but far fewer than before, and they are samples the crew already had to take for quality control anyway.

Email Classification Software

SigParser is a United States software company with fewer than one hundred employees. Its API and service automate the labor-intensive and often expensive job of populating and maintaining customer relationship management systems by extracting names, email addresses, and phone numbers from email signatures and loading that information into CRMs or other databases. A significant operational issue is that many messages entering a customer's mailbox are automated items such as newsletters, payment notifications, and password reset messages. If these non-human messages were treated as real correspondence, their senders would pollute CRM contact lists. To prevent this, SigParser built a machine learning model that predicts whether a message is "spammy looking," meaning it originates from an automated source. For example, a notification from a forum's noreply address is flagged as spammy, so the sender is excluded from the contact database. Chief executive officer Paul Mendoza moved all training and inference to ML.NET, where models can be trained and tested directly in the production codebase. Since adopting ML.NET, the company has operated six models covering different aspects of email parsing. The team labeled several thousand of their own messages, classifying each as human or non-human while remaining compliant with the General Data Protection Regulation. The resulting dataset feeds a binary classification pipeline that uses two features: a boolean flag indicating whether the body contains "unsubscribe" or "opt out," and a cleaned HTML body string that is language agnostic and stripped of personally identifiable information. These features are supplied to a decision tree algorithm. The data is split twenty percent for testing, and the trained model is saved as a zip file. The classifier runs in production against millions of emails each month. Its predictions prevent non-human senders from entering customer contact lists and allow SigParser to export accurate contact data automatically, eliminating manual entry errors and delays.

AI-Based Customer Support

Visma Spcs, a Nordic software provider of accounting, HR, payroll, and related services, serves several hundred thousand customers. The company integrated an "AI Assistant" based on Microsoft Semantic Kernel to improve customer support. Customer research had shown that users needed to locate information quickly, ask questions and receive correct answers, and obtain links to the relevant documents. To meet those needs, Visma Spcs implemented a retrieval augmented generation pipeline that queries existing product documentation through Azure AI Search and uses GPT-4 hosted on Azure OpenAI for response generation, with Semantic Kernel handling orchestration inside the company's .NET stack.
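A minimal Semantic Kernel sketch of that retrieve-then-generate flow might look like the following. The endpoint, deployment name, retrieval stub, and prompt are all placeholders, not Visma Spcs internals.

```csharp
// Hypothetical sketch of a Semantic Kernel RAG call; all names are placeholders.
using Microsoft.SemanticKernel;

var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(
        deploymentName: "gpt-4",
        endpoint: "https://example.openai.azure.com/",
        apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!)
    .Build();

string question = "How do I correct a posted invoice?";

// In production the context would come from Azure AI Search;
// a stub stands in for the retrieval step here.
string docs = await SearchProductDocsAsync(question);

var answer = await kernel.InvokePromptAsync(
    "Answer using only this documentation and cite the article used.\n" +
    "Documentation: {{$docs}}\nQuestion: {{$question}}",
    new KernelArguments { ["docs"] = docs, ["question"] = question });

Console.WriteLine(answer);

static Task<string> SearchProductDocsAsync(string query) =>
    Task.FromResult("Article 42: To correct a posted invoice, issue a credit note...");
```

Because the abstraction sits between the app and the model, the GPT-4 deployment could later be swapped for a specialist LLM without touching the orchestration code, which is one of the reasons given for choosing Semantic Kernel.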
Semantic Kernel was selected because it aligns with the organization's predominant use of .NET, offers the built-in orchestration, agent, and automation capabilities required for deep product integration, and provides an abstraction that allows model substitution as new or specialist LLMs emerge. The AI Assistant has been deployed to the full customer base. A few percent of daily active users engage with the chat each day, many repeatedly, and roughly 40 percent of all messages arrive outside normal support hours. Internal telemetry shows that about 90 percent of requests receive a satisfactory answer, latency to first reply is consistently a few seconds (with variations tied mainly to global API load), and usage levels and quality metrics are trending upward. The assistant also supports newly hired customer success staff when they handle unfamiliar questions.

Health-Related and Clinical Text Classification

Hunter Medical Research Institute (HMRI) is a large Australian healthcare sector organisation. HMRI created a Human-in-the-Loop (HITL) machine learning development framework for clinical research. The framework, built entirely with ML.NET, Visual Studio, SQL Server, ASP.NET Core, and Office 365, enables clinicians to label data, train models, and perform inference without prior programming or machine learning expertise. Its first use case focused on classifying causes of mortality and hospitalisation arising from extreme heat, and the approach is documented in the publication "A method for rapid machine learning development for data mining with Doctor-in-the-Loop." The project addressed the persistent difficulty healthcare institutions face in extracting insights from large volumes of mostly unstructured text despite digitisation. Traditional solutions such as regular expressions, SQL queries, and general purpose NLP tools provided only limited value, while conventional machine learning workflows demanded skills beyond most medical professionals and often produced models that, when left unsupervised, performed poorly in operational settings. HMRI therefore required a system that could incorporate clinicians' domain expertise directly into the modeling process and yield high-quality results from comparatively small annotated datasets. ML.NET was selected because it allowed the team to remain entirely within the existing .NET ecosystem, avoiding the technical overhead of integrating non-.NET components and enabling staff to apply their existing knowledge. Using Model Builder, researchers rapidly confirmed that machine learning could solve their classification problem, after which the ML.NET AutoML API automated algorithm selection, pipeline construction, and hyperparameter tuning inside the custom HITL framework (a sketch of this step follows below). Clinicians interact with the framework through a web application backed by SQL Server. Initial datasets comprised a 40-year mortality database of roughly 30,000 records and an aeromedical retrieval set of around 13,000 records. Experts first label a test set through the web interface, then trigger server-side ML.NET code to train a model on a small, randomly chosen subset. The model assigns predictions and confidence scores to the remaining records, and SQL Server stored procedures immediately compute recall, specificity, and other accuracy metrics against the labeled test set.
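As a rough illustration of that AutoML step, the sketch below runs a time-boxed multiclass experiment; file names, column names, and the time budget are assumptions, not HMRI's actual configuration.

```csharp
// Illustrative ML.NET AutoML experiment for multiclass text classification.
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

var ml = new MLContext();
IDataView labeled = ml.Data.LoadFromTextFile<ClinicalRecord>(
    "labeled-test-set.tsv", hasHeader: true);

// AutoML tries algorithms, pipelines, and hyperparameters within a time budget.
var experiment = ml.Auto()
    .CreateMulticlassClassificationExperiment(maxExperimentTimeInSeconds: 60);
var result = experiment.Execute(labeled, labelColumnName: "Category");

Console.WriteLine($"Best macro accuracy: {result.BestRun.ValidationMetrics.MacroAccuracy:P1}");

// The best model then scores the remaining unlabeled records; its
// low-confidence predictions go back to the clinicians for labeling.
ITransformer best = result.BestRun.Model;

public class ClinicalRecord
{
    [LoadColumn(0)] public string Text;
    [LoadColumn(1)] public string Category;
}
```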
Results appear in seconds, enabling clinicians to identify additional cases for labeling by reviewing both low-confidence predictions and high-confidence errors - an active learning strategy that accelerates performance gains. The resulting models achieved mid- to high-ninety percent accuracy, giving researchers confidence to apply the automated categorizations in ongoing studies. Because training completes quickly and the feedback loop is easy to repeat, the same workflow can be applied efficiently and cost-effectively to new classification tasks or additional datasets without compromising accuracy.

Automated Classification of Survey Responses

Brenmor Technologies is a small U.S. healthcare sector firm that supplies patient satisfaction solutions to medical groups and health plan providers. Its core service is a customizable survey system designed to deliver statistically reliable insight into the strengths and weaknesses of clinical encounters. Each survey includes at least one free-text question. Historically, Brenmor staff spent about 16 hours every month manually classifying these comments into topical categories - such as Provider, Staff, Appointments, and Telephone - and assessing sentiment so relevant teams could plan quality improvement measures. The manual workflow was slow, subject to inconsistency, limited to monthly cycles, and produced no confidence scores. To eliminate these constraints, Brenmor replaced manual labeling with a multiclass text classification model built in ML.NET, Microsoft's machine learning framework for .NET developers. According to Brenmor's CTO, successive ML.NET releases have increased both classification speed and accuracy. The initial training set comprised roughly 3,000 HIPAA-cleansed survey comments. In operation, the application now classifies responses in real time. Low-confidence predictions are reviewed by staff and added to the training set, and new model versions are stored in source control and deployed automatically through Azure DevOps. Classification accuracy is about 76 percent, and every prediction carries a confidence score. Clients receive topic-segmented feedback immediately and can allocate issues to the appropriate clinical or administrative teams without delay. Developers no longer spend time experimenting with algorithms - instead, they focus on curating higher-quality data. The company concludes that automating text classification meets the healthcare market's growing need for near real-time, high-precision analysis of patient comments. This enables medical groups and health plans to act on survey findings more quickly and reliably than before.

Automated Legal Text Classification

Williams Mullen is a medium-sized U.S. law firm that focuses on corporate law, litigation, finance, and real estate matters. Attorneys at the firm produce large volumes of unstructured content such as Word files, PDFs, and emails. These documents are stored in a document management system that covers decades of material. Attorneys typically search this data using document metadata. However, the firm found that the metadata for roughly one-fifth of all documents - amounting to millions of files - was missing, inaccurate, or outdated. This deficiency made many documents difficult to retrieve, took up attorney time, and reduced billable work. The cost to correct the metadata manually was estimated in the hundreds of thousands of dollars. Williams Mullen adopted ML.NET.
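Both the Brenmor and Williams Mullen classifiers follow the same train-save, load-predict pattern. A minimal ML.NET sketch of that pattern (file names, columns, and the sample comment are illustrative, not taken from either project) might look like this:

```csharp
// Illustrative two-stage pattern: one process trains and saves the model,
// another loads it and classifies new text with a confidence score.
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var ml = new MLContext();

// Stage 1: train on labeled text and save the model as a zip.
IDataView train = ml.Data.LoadFromTextFile<Doc>("training.tsv", hasHeader: true);

var pipeline = ml.Transforms.Conversion.MapValueToKey("Label")
    .Append(ml.Transforms.Text.FeaturizeText("Features", nameof(Doc.Text)))
    .Append(ml.MulticlassClassification.Trainers.SdcaMaximumEntropy())
    .Append(ml.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

ITransformer model = pipeline.Fit(train);
ml.Model.Save(model, train.Schema, "classifier.zip");

// Stage 2: load the zip and classify incoming records.
ITransformer loaded = ml.Model.Load("classifier.zip", out _);
var engine = ml.Model.CreatePredictionEngine<Doc, DocPrediction>(loaded);

var p = engine.Predict(new Doc { Text = "The front desk was very helpful." });
Console.WriteLine($"{p.PredictedLabel} (confidence {p.Score.Max():P0})");

public class Doc
{
    [LoadColumn(0)] public string Text;
    [LoadColumn(1)] public string Label;
}

public class DocPrediction
{
    public string PredictedLabel;
    public float[] Score;
}
```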
The implemented solution consists of two .NET Core console applications and a database. The first application downloads about two million training documents from the document management system, prepares the data, and trains the model. The second application retrieves production data, loads the trained model, classifies each record, and writes updated metadata back to the database. By deploying this approach, Williams Mullen corrected metadata issues across millions of documents, restored the ability to search, and improved attorney productivity.

How Belitsoft Can Help

Belitsoft is a .NET development company that brings together GenAI architects, machine learning engineers, and cloud DevOps experts in a single team. We transform your C# stack into an AI-driven solution that is secure, fast, and flexible enough to run on Azure, on-premises, or with any large language model provider you select.
Denis Perevalov • 14 min read
Enterprise Vibe Coding vs SaaS
Internal Vibe Coding: Will Enterprises Buy Fewer SaaS Applications?

Artificial intelligence coding platforms (Lovable, Bolt, Replit, Cursor, etc.) are rapidly transforming enterprise software strategy. By converting simple, natural language prompts directly into working code, these tools shrink the difference between adopting a SaaS subscription and coding your own solution to almost nothing. As the time, cost, and complexity of DIY development drop, the value proposition of traditional subscription-based SaaS comes into question. Companies that master AI development can control their software, reduce spending, and accelerate innovation.

Low Barriers

In many cases, it is already faster to build a custom tool with AI than to learn a vendor's complex user interface. Non-programming AI builders - often business or operations specialists - can now create working software after only weeks of training. This growing talent pool, combined with the productivity boost from AI for every developer, makes projects that once seemed uneconomical suddenly possible. The first areas to move from buying to building are lightweight but highly customized. These include HR and training portals, Q&A and knowledge bases, revenue operations dashboards, CPQ calculators, AI-driven healthcare tools (like AI medical coding software), and custom marketing tools. A security team can replace a SaaS-based survey with an internally built alternative using Bolt. A revenue operations staff member can code a pricing calculator that would once have required commercial software. A recruiter can use the Lovable platform to create an interview training course. Managers can even write their own AI-based personal CRM, sidestepping the Salesforce interface completely. Incumbent vendors feel the pressure. While the core Salesforce customer record database may stay popular, the profitable add-ons built on top of it are now vulnerable. Salesforce's response - its new Agentforce suite - shows early promise. Many smaller and mid-tier SaaS providers do not have similar options.

Challenges

Enterprise-level reliability, security, and ongoing maintenance do not go away just because code is machine-generated. When an AI-written application fails, it is not always clear who owns the fix. To reduce that risk, AI-building platform vendors offer standard stacks that include authentication, staging, security controls, and data access patterns. This lets companies move prototypes into production without hiring large teams of senior engineers.

Bright New Future?

Firms that use AI tools early will gain speed, flexibility, and cost advantages. Those that rely only on SaaS could end up paying more for less. On the other hand, software vendors must build unique AI features or face the risk of being replaced. Overall SaaS spending still rises. Companies keep buying cloud software, just a different mix. Heavyweight systems stay SaaS - ERP, finance, payroll, and global CRM databases remain too complex or regulated to rebuild quickly - so those contracts keep renewing. However, some CEOs already predict a shake-out in which only the largest or most AI-advanced SaaS vendors survive, while the rest are folded into broader hubs.

Some SaaS Was Never Hard to Build

Before AI-assisted vibe coding tools appeared, most business-to-business software development was slow rather than technically difficult.
Many engineering teams spent weeks or months recreating features every SaaS product needs, such as user-permission matrices, audit logs, and email notification systems, even though thousands of developers had solved these problems before. Eighty percent of an engineer's time went into this repetitive work, while the truly difficult challenges - understanding customers, designing the right workflow, and scaling the architecture - competed for the remaining bandwidth. Generative AI coding platforms such as Cursor, Windsurf, Lovable, and Replit change that equation. By recycling proven open-source patterns and boilerplate, they reduce build times for standard features by at least half and often by as much as five times. A user-permission service that once took three weeks now appears in three days. An audit log drops from two weeks to half a day. Email scaffolding is ready in hours. These tools do not yet create a fully hardened, enterprise-grade product overnight, but they eliminate the "artificial slowness" that used to dominate business-to-business development. Since well over ninety percent of SaaS functionality is routine, the impact is broad. Product teams can test several workflow designs in the time it once took to build one, refining decisions with real feedback. Engineering is no longer the bottleneck. Requests that used to trigger three-month roadmap debates now become three-week sprints, and internal panels or admin consoles are ready by Friday. For founders and engineering leaders, the question is no longer whether AI will replace developers - it will not. The question is whether their teams will use AI to remove busywork and focus their talent on the problems that matter, such as deeply understanding users, creating scalable systems, and delivering experiences that competitors find hard to copy. Teams that adopt this new approach will reach product-market fit faster and set prices based on differentiated value. Those that do not will still be discussing three-month roadmaps while their rivals are already shipping. Vibe coding tools mark a fundamental shift, not because they solve the hardest technical problems, but because they remove the slow, repetitive ones that never conferred any advantage. Companies that move now will build better products, faster. Those that delay risk watching the market move past them.

More and More Enterprise Software Will Be Assembled with Vibe Coding Techniques

"Vibe coding", the term computer scientist Andrej Karpathy introduced in February 2025 for using large language model tools to generate production code, is quickly rising on the CIO agenda. Gartner estimates that by 2028, about 40 percent of all new enterprise software will be assembled with vibe coding techniques. Yet most large organizations remain cautious. The current generation of vibe coding platforms excels at small, temporary projects - for example, a user interface prototype during a hackathon or a celebratory web page built in minutes. Such experiments succeed in sandboxes, proofs of concept, and disposable utilities. In these cases, the code does not need to be highly robust, scalable, or built to last. Those qualities are exactly what enterprises need for customer-facing or critical systems, and analysts agree the tools are not there yet. Security controls, audit trails, and large-scale deployment patterns are still being developed.
CIOs say they welcome AI-driven productivity, especially during multiyear cloud and ERP migrations, but insist they will not compromise on enterprise-grade reliability. A March 2025 HackerRank survey found that more than two-thirds of engineers feel extra pressure to deliver faster since AI assistants became part of the tool chain. Gartner expects about 80 percent of developers to reskill by 2027 as generative AI changes their roles, shifting work from writing boilerplate code to reviewing, securing, and integrating AI-generated output. Analysts urge CIOs to keep vibe coding projects in controlled, well-governed environments, set clear security, compliance, and testing standards, and maintain close communication with engineering teams to decide where this approach fits best. Large language models are improving quickly, and Omdia predicts noticeable quality improvements within six to twelve months, so the readiness gap may close sooner than expected. Until then, organizations that pair strong governance with targeted pilots can gain early productivity benefits without taking on unknown risk.

Scaling Vibe Coding in Enterprise IT

Enterprises are experimenting with vibe coding - the practice of using large language model tools to generate working software with minimal hand coding - because it promises rapid prototyping and shorter release cycles. Thanks to readily available LLM APIs, GitHub Copilot, and similar assistants, projects that once required a full-stack team can now be kicked off by analysts or subject matter experts. The difficulty emerges when organizations try to scale those prototypes for everyday, nontechnical users. Choices that feel harmless in a sandbox can turn into long-term liabilities. Extending a Python Flask demo into a full web product, for instance, collides with the reality that most modern front-end tooling, hosting frameworks, and pretrained AI agents gravitate toward React and TypeScript stacks. Are you ready for this?

Two distinct modes of vibe coding

Senior architects use AI as a force multiplier: they prompt multiple agents, explore design forks, and evaluate trade-offs quickly. Casual enthusiasts, by contrast, may generate code on demand and ship it unchecked. The two practices produce very different outcomes and risk profiles. Without oversight, the second mode can scatter inconsistent applications and unforeseen technical debt across the enterprise.

Architecture and economics

Hosted models such as OpenAI's deliver sub-three-second responses that local open-source models struggle to match without substantial GPU investment. Firebase accelerates back-end build-out by more than 60 percent compared with Kubernetes, but its usage-based billing can become volatile once user numbers rise. Each platform or data-store decision ripples through cost forecasts, latency budgets, and support models, demanding active monitoring and a clear exit strategy. For CTOs, vibe coding shifts the bottleneck from writing code to deciding what should be built, how it should be hosted, and whether the result is operationally sustainable. Deep technical judgment becomes more valuable, because the keyboard is no longer the scarce resource - architectural clarity is. Scaling safely requires rigorous product management: centralized governance committees, reference architectures for would-be vibe coders, scoped feature requests, scheduled technical-debt reviews, and explicit security and compliance checkpoints.
Cultural enablers matter too - structured upskilling in prompt engineering, psychological safety for experimentation, and guardrails that prevent burnout amid rapid iteration. Handled well, vibe coding lets enterprises capture innovation speed without the chaos of uncontrolled cloud sprawl. Handled poorly, it simply turbocharges mistakes.
Dmitry Baraishuk • 6 min read
AI Software Development Trends in 2025
Launching a generative-AI startup in the cloud

Early-stage generative-AI startups should build on a major public-cloud platform (think AWS, Google Cloud, Azure). Cloud is already the default home for Gen-AI unicorns. Founders can ride the same stack of specialized GPUs/TPUs, AI tooling, and enterprise-grade security that today's big players use. Massive free-credit programs lower the burn: eligible startups can tap as much as $350K in usage credits, giving them breathing room during the heavy-experiment phase when compute costs would otherwise spike. Out of the box you get MLOps best practices, pretrained foundation models, and a partner marketplace - so teams can move straight to product-level problems. With the hard parts handled, startups can focus on delivering real-time, AI-driven personalization across marketing, onboarding, in-product experience, and retention - capabilities that were practically impossible just a few years ago.

Full-stack AI Trend

Margins shift from hardware manufacturers to integrated cloud providers. AI builders rent bare-metal accelerators or, more likely, just buy a fully managed service to focus on model fine-tuning or domain expertise. The next phase of the AI race is all about vertical integration - "full-stack AI". The biggest firms (like Google) are building and operating the entire pipeline themselves. Hardware ownership - custom silicon (Google TPUs) and specialized, liquid-cooled data centers tuned for giant-model training and inference. Tightly coupled software - foundation models such as Gemini that can handle huge context windows and call Google Search for facts, removing reliance on external APIs. Google is promising 10- to 100-fold drops in compute cost over the next few years, telling startups to expect GPU/TPU cycles that keep getting faster, cheaper, and more predictable.

Shift from "one-size-fits-all" models

The next wave of AI progress will be more about smarter orchestration, long-term personalization, and nimble business models that exploit the lull before the next breakthrough. Interfaces are about to flip: voice-first interaction - and models that read your emotions - will make today's tapping, swiping, and staring at screens feel old-fashioned. Memory and search will blur together: next-gen language models will decide on the fly whether to keep facts in working memory or go fetch them, squeezing more usefulness out of limited context windows through tricks like sparse attention and selective retrieval. Agents will actually do things: early demos (like Google's Project Mariner) show LLM-driven agents completing multi-step tasks in a live browser with real-time audio/video streams. After 2025, the killer feature becomes a durable, personal memory that still remembers your quirks a year later. "One big brain" is out; swarms of small, specialised models - plus plug-ins such as code runners, SQL tools, and calculators - are in. Hybrids like Jamba hint at cheaper, more reliable AI by routing tasks to the right miniature expert instead of a single giant model. Analysts predict 18 months of slower foundation-model breakthroughs. That gives focused startups time to grab niches, set new price baselines, and build moats with high-quality domain models that beat generic LLMs on ROI. Falling compute costs could let a 100-person company plausibly reach a $100 billion valuation.

From Search to Workflow Automation Models

Roughly 80% of a company's PDFs, emails, chats, and slide decks are never reused.
Turning that text into instant, trustworthy answers is now a higher priority for executives than fully autonomous agents. Today's systems pair a language model with a vector database so that every answer is anchored to the source paragraph, delivered in under a second, and far less prone to hallucination. Source quality matters: peer-reviewed papers outrank quarterly e-mails, which outrank emoji-filled chats.

"RAG 2.0" is on the horizon. The next iteration will jointly train the "retrieve" and "generate" components and insert small active agents that can ask clarifying questions (e.g., "Which quarter's revenue?") before composing a reply. Once all corporate knowledge is exposed through a single retrieval API, agents can begin doing things - updating CRM records, drafting contract clauses, filing support tickets. That demands fine-grained, revocable permissions and audit-ready observability. Lightweight specialist models will handle niche tasks, while explicit tool calls and state snapshots make failures traceable. Continuous "living" evaluation suites are wired in from day one, because customers expect concrete reliability metrics before any pilot goes live.

24/7 Autonomous AI Agents

A new generation of AI agents runs continuously in the background - monitoring calendars, dashboards, sensor feeds, etc. - and only nudges you when something looks odd. Think of them as proactive co-workers rather than apps you open and close.

Consumer life gets rewired. Personal assistants morph into shopping teams: ask for "better insurance" and a swarm of agents could hunt quotes, negotiate terms, generate Web3 smart contracts, move stablecoins, then show you the deals. Point your phone at a leaking pipe and an agent walks you through the fix. Search evolves from "10 blue links" to "I bought the part, here's the receipt, expense filed."

Big firms will adopt more cautiously. Procurement wants scheduling controls, audit trails, pause-and-review checkpoints, and clear notification rules before cutting purchase orders. To keep risk (and cost) low, the first commercial bots will each tackle a narrow, high-ROI job - reconciling invoices, chasing late sales leads, updating a dashboard - before companies trust them with broader autonomy. As specialist bots multiply, they will begin to coordinate, forming a kind of operating system that orchestrates tasks across the whole tech stack.

One-person startups are already chaining micro-agents together. Expect a top-level "manager" agent that farms out subtasks to tool-specific workers (code generators, SQL runners, calculators) and stitches the outputs together - tool calling and routing as standard practice. Licensing shifts from per-seat SaaS fees to outcome-based pricing. Many current SaaS workflows may be rebuilt on cheaper GPUs and open-source stacks, eroding incumbents' margins. This mirrors the shift inside large enterprises, where AI-assisted "vibe coding" lets internal teams replace narrow SaaS tools with custom-built solutions.

The Entire Economy Is Reshaping

AI isn't just automating isolated tasks - it is starting to reshape the entire economy, from media and marketing to biotech, construction, and everyday services, by replacing many middle-layer functions with software that can design, decide, and personalize in real time. AI-generated movies and games will upend Hollywood economics. Traditional "publisher" value propositions will erode. Personalized, on-demand content becomes the norm, threatening incumbent studios and ad-driven media.
Internal "software factories" and one-person AI companies shrink the trillion-dollar SaaS market; SMEs jump straight to AI-native vertical apps. Much of today's generic CRM/ERP stack gets replaced by in-house or domain-specific AI tooling. The lowest-skill knowledge work is first in line for agentic automation, delivering ultra-tailored user journeys. Automation carves out middle-office costs and may structurally lift corporate profitability. As foundation models migrate into robots, construction, logistics, and field work digitize much faster. By merging generative foundation models with automated wet-lab and fab-lab infrastructure, bio- and materials research is starting to look like rapid software prototyping. The result is a compressed innovation cycle - months instead of years - to deliver climate-relevant materials, new crops, and bespoke therapeutics.

AI Agents Will Replace Large Developer Teams... Someday

Foundation models will be abundant and inexpensive, so businesses won't keep armies of developers building bespoke apps. Head count drops but margins rise, because the expensive part (labor) gets automated. Multi-agent clusters turn a plain-English spec into runnable software overnight. Coding becomes a utility service, compressing development cycles from months to hours.

Winning products will either (1) boost a universal productivity task - writing, meetings, planning - or (2) own a single specialist role end-to-end (paralegal, claims adjuster, chem-lab tech). Because customers will judge these tools strictly by measurable business impact, pricing will migrate from per-seat SaaS licenses to usage-, value-, or outcome-based fees - even tiny crypto payments as agents pay one another for micro-tasks. Inside large firms, the same cheap reasoning stacks will spawn an internal "business OS" whose first mandate is to automate revenue-touching jobs like renewals, upsells, and flawless billing. Sectors drenched in data and logic (law, healthcare) may shift to pay-per-successful-brief or pay-per-accurate-diagnosis models.

Big Enterprises Will Adopt Agent-Based AI More Slowly Than the Hype Suggests

Large-scale roll-outs inside big companies will lag the hype cycle. Integrating safety guardrails, revamping data governance, and simply training staff all take longer than drafting a proof-of-concept demo. Agentic workflows, in particular, require both deeper reasoning reliability and sturdier orchestration infrastructure than is common today, so mainstream agent adoption will trail even further behind.

Because features built atop large models commoditize quickly, each new capability triggers a flurry of copycats, revenue spikes, and inevitable consolidation. Application-layer teams must plan for that cycle: flashy daily-active-user growth in 2025 matters less than durable stickiness and clear ROI after the novelty wears off. Investors continue to reward product teams that iterate quickly and win users. In a market where raw model breakthroughs pause but infrastructure costs keep falling, execution velocity becomes the decisive competitive edge.

AI Regulation

No single global rulebook for AI exists yet. Countries that go easy on regulation will attract more money and talent, forcing stricter jurisdictions to reconsider or risk falling behind economically. Even without new AI-specific statutes, regulators can lean on product-liability, consumer-protection, antitrust, data-protection, and cyber-crime laws to punish companies that deploy harmful or negligent AI systems.
Secure, traceable pipelines are mandatory. Before a model is trained or an agent is let loose, the workflow should strip out personally identifiable information, guard against prompt-injection and jailbreak attacks, and keep detailed logs of every step for later audits. Because large language models still hallucinate, most companies will roll them out in tightly scoped pilots (customer support chatbots, coding assistants, etc.). Each deployment must pass security, privacy, and governance audits before legal or procurement teams sign off.

Two unresolved gaps could slow everything down:

Explainability & provenance. Tooling to show why a model produced a given answer - or whether it used data it shouldn't have - remains immature.

Training-data rights & compensation. Copyright, publicity rights, and licensing terms for the data used to train models are fuzzy, inviting future lawsuits.

Advice to AI Startup Founders

Prioritize growth over cost-cutting - savings from AI agents are immediately reinvested in new, revenue-generating features.

Attack high-value, repetitive workflows first - target tasks where falling GPU prices turn compute into outsized financial returns.

Ship fast, narrow solutions - release a single, pain-killing feature quickly, obsessively measure usage, and prove value with usage- or outcome-based pricing.

Embed "invisibly" in existing tools - give users a prompt-less experience, with human-in-the-loop safety checkpoints and transparent dashboards.

Build a defensive data moat - secure diverse, proprietary datasets and evaluation benchmarks, and stay model-agnostic so you can swap in better LLMs at will.

Exploit the 18-month capability plateau - move fast while models commoditize, doubling down on speed, product taste, and customer obsession - advantages large incumbents struggle to match.

How Belitsoft Can Help

Belitsoft supplies battle-tested AI developers, architects, and MLOps engineers who turn AI trends into working, reliable software. Whether you're a seed-stage Gen-AI startup, a scale-up, or an enterprise that needs bullet-proof governance, we build AI that helps you run your business.
Dmitry Baraishuk • 7 min read
BloombergGPT is Live. A Custom Large Language Model for Finance
BloombergGPT: In Production or Not?

The last official updates about the custom financial LLM BloombergGPT came in 2023. All attempts to find out whether BloombergGPT is open source, to locate it on Hugging Face or GitHub, to download and try it, or simply to understand how to access a BloombergGPT demo and what it costs, end in silence.

The internet is full of rumors. Some say BloombergGPT is already obsolete, frustrated by the lack of updates. Others laugh at the money spent, saying GPT-4 does the same thing better, faster, and cheaper for the end user, and argue Bloomberg should have waited for stronger models. The team keeps quiet - maybe because money doesn't like noise.

What matters: the model is already in production, built into Bloomberg's stack. Doug Levin, a successful startup founder now at Harvard, wrote a review after testing BloombergGPT inside the Terminal. He called it a disruptive layer in Bloomberg's legacy architecture - not a research demo, but a system already shaping workflows from the inside.

Use cases mentioned directly in Doug Levin's article:

Financial report generation
Market summaries
Trading ideas or analysis
Financial data analysis
Market trend predictions
Sentiment analysis
Automated report generation
Language translation
Financial document text generation
Risk assessment
Real-time market updates
Support in client communication
Support in regulatory compliance
S-1 analysis and modeling
Search functionality via Bloomberg Search (SEAR)

BloombergGPT-like LLM: Train from Scratch or Fine-Tune?

A lot has been written about BloombergGPT in broad terms: the impressive results achieved by the team behind this custom financial LLM and how it outperformed many other models. But very little has been said about what happened backstage: what it's actually like to develop a specialized model for the financial industry, how resource-intensive that process is, and what kinds of challenges these teams run into. Time to change that.

Given that, we decided it was worth doing some reverse engineering: cutting through the PR noise in the available information to uncover insights that will likely stay relevant for a long time for any company that decides to build its own financial LLM.

David Rosenberg, who leads the BloombergGPT development team, is still in his position (LinkedIn says so), and according to his social media, he insists that the information from mid-2023 about the model is still relevant. In this context, the information shared on The TWIML AI Podcast with Sam Charrington genuinely deserves close attention.

"Using an API like OpenAI's is not suitable for us: we have data we don't want to send out. So for internal and sensitive use, in-house models are preferable." - David Rosenberg, The TWIML AI Podcast

Let's take a closer look at what else they discussed.

Financial LLM Use Cases

What the BloombergGPT development team actually spent the most time on was thinking about financial LLM use cases from a variety of angles. For example, could a BloombergGPT LLM help them solve problems they already had solutions for, but in a better way, or with less investment in training data? They explored use cases like natural language to BQL (Bloomberg Query Language, used inside the Bloomberg Terminal to pull structured financial data; the idea was to build a kind of financial code assistant that translates human language into Bloomberg-specific queries). They also wanted an internal code assistant that understood their libraries.
Another direction was the ability to input a large document and interact with it: ask what information it contains, that sort of thing. In that sense, they were exploring many directions to see where the model could have the most impact.

"As for production use, we need to be very cautious. No one has really solved the hallucination problem yet. These language models can say wrong things, do strange things, so there needs to be a process around making them safe to use, either internally or, eventually, with clients." - David Rosenberg, The TWIML AI Podcast

They started with internal use, and in that context it wasn't so much about safety or reputation as about function: was it useful, did it do the job? That was their focus at the time. They were also aware that if people started relying on an LLM for internal tasks, they would become less critical of its output. So teams building custom LLMs needed a dedicated system for checking its work. But then came the obvious question: if someone was always checking it anyway, should they just do the task themselves? In short, they kept things internal and focused on finance, code completion, and basic summarization tasks.

Backstage of BloombergGPT Financial LLM Development

The Decision Behind Creating a Custom Financial LLM

BloombergGPT is an example of a project where a team inside an enterprise trained and built a custom large language model specializing in financial language. The enterprise made a strategic decision to invest money, time, and human resources into this machine learning effort when GPT-3 was released.

"The question was, is this a direction we pursue, we invest in? Because it was clearly a big investment. We didn't know how much GPT-3 actually cost to make, but it was clear that it was a huge investment. We decided it was worth making the move. Maybe there's some risk there, but it seemed like the possibilities were pretty great. That was kind of a decision made back in late 2020 - to start building towards this goal of our own GPT-3-style model. I'm not sure we knew exactly at that time what it would be used for. We're still experimenting to figure out how best to use it." - David Rosenberg, The TWIML AI Podcast

Training Dataset for BloombergGPT

In some ways, it was a general-purpose model - but also purpose-built for financial applications. The training dataset was a mix of standard general-purpose data used for GPT-style models and Bloomberg's proprietary financial data. About half of the dataset came from Bloomberg's curated collection called FinPile, built over many years starting in 2007. It included financial reports, news articles, filings, press releases, earnings call transcripts, and other structured content.

Some documents included tables and charts. The team didn't do any special processing for this training run, but where that information had already been extracted, they used it. Structured data wasn't treated differently; it was tokenized like any other content.

One area of concern, however, was numerical data. Finance involves a heavy amount of numbers, and the GPT-2 tokenizer doesn't treat numbers in any special way: a number like 5,234 could be split unpredictably - not digit by digit, not as a single unit - making it harder for the model to reason about numeric values. So they used character-level tokenization for numbers, allowing the model to learn digit structure and positional order. They followed an approach similar to Google's PaLM model, where numbers are split into individual digits.
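For illustration only, here is a minimal sketch of this kind of digit-level preprocessing (the regex and function name are ours, not Bloomberg's actual code):

```python
import re

def split_digits(text: str) -> str:
    # Put whitespace around every digit so a subword tokenizer sees
    # "5,234" as the tokens 5 , 2 3 4 rather than an arbitrary chunk.
    spaced = re.sub(r"(\d)", r" \1 ", text)
    return re.sub(r"\s+", " ", spaced).strip()

print(split_digits("Q3 revenue was 5,234 million"))
# -> "Q 3 revenue was 5 , 2 3 4 million"
```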
Splitting digits helped the model understand that the first digit carries the highest value, and so on - one way they adapted the model for financial data.

BloombergGPT Training Process

At the time, Meta's OPT model had just been released, and the team used its training logs as a roadmap. Hugging Face's BLOOM also published detailed logs, which helped guide their process. To reduce risk, they made their architecture as close as possible to something that had already worked.

"We copied the BLOOM model architecture fairly closely, with some small tweaks. Tokenization and number handling were two key pieces. We called it v0." - David Rosenberg, The TWIML AI Podcast

One thing they tried was sorting the FinPile data chronologically - thinking newer data might be more accurate - while the rest of the data was randomly shuffled. For validation, they used the month immediately following the training set. They trained for 4-5 days and saw the loss curve level off. After 8-10 days, they stopped training, suspecting that curriculum learning (via time sorting) wasn't helping.

They restarted with fully shuffled data. That became version one of training, and it started off stronger. Around day 8, though, the gradient norm spiked and validation performance dropped. They turned to the OPT paper's troubleshooting guide, rolled back to a checkpoint, reshuffled the data, and lowered the learning rate - but saw no major improvement. Investigating further, they noticed something strange: out of 70 layers, the first layer's layer-norm scale weights dropped, then suddenly increased. This pointed to a bug: they were applying weight decay to weights that should have remained centered around one. They fixed this in version two, did a full code review, improved mixed-precision handling, and added an extra layer norm at the beginning (a standard version of this fix is sketched at the end of this section).

Then they restarted training. This time, it worked. They trained for 42 days with a steady loss decrease, hit some challenges around 75% of the dataset, and eventually stopped training - performance had already exceeded expectations.

Resources Used to Train BloombergGPT

The core team included about nine people:

Four focused on implementation
Three on ML and data
One on optimization and compute
The rest handled evaluation, literature review, and support

They trained on Amazon SageMaker using SMP (Sharded Model Parallelism) with 40GB A100 GPUs - 512 in total - and pre-purchased around 1.3 million GPU hours at a negotiated rate.

Validation and Performance Evaluation

During training, they used the last month of training data for validation; later, they added a random validation set. They also monitored downstream tasks like MMLU and BBH (Big Bench Hard). After training, they did a full evaluation and compared BloombergGPT to OPT-66B, BLOOM, and GPT-NeoX. On general-purpose benchmarks, it was competitive. On financial tasks, it significantly outperformed open models. For example, on ConvFinQA (a benchmark requiring numerical reasoning in financial documents), it performed extremely well.

They also had internal benchmarks:

Sentiment analysis on financial news and social media
Named entity disambiguation (linking "Apple" to its stock ticker, etc.)
Natural language to BQL, which is like SQL for the Bloomberg Terminal

Even though the model wasn't trained on BQL directly, it performed well in few-shot scenarios. They also experimented with headline generation and other generative tasks.
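As promised above, here is a minimal sketch of the standard fix for the weight-decay bug they describe: excluding layer-norm scales and biases from decay so those weights are free to stay centered around one. The grouping heuristic is a common convention, not Bloomberg's actual training code:

```python
import torch

def param_groups(model: torch.nn.Module, weight_decay: float = 0.1):
    # One-dimensional parameters (layer-norm scales, biases) get no
    # weight decay; all other weights decay as usual.
    decay, no_decay = [], []
    for p in model.parameters():
        (no_decay if p.ndim <= 1 else decay).append(p)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# Toy stand-in for a transformer block, just to show the wiring.
model = torch.nn.TransformerEncoderLayer(d_model=64, nhead=4)
optimizer = torch.optim.AdamW(param_groups(model), lr=6e-5)
```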
What's Next for the BloombergGPT Team?

They had skipped a lot of early experimentation due to time constraints. Now that they have a working model, they're going back to small-scale experiments - testing tokenization strategies, data mixtures, and architecture choices in a more disciplined way. They're continuing instruction tuning using public data (like FLAN) and internal labeled datasets. They have rich internal data for tasks like entity recognition, which they're formatting into query-response pairs for tuning.

"We're more interested now in smaller models. They're easier to use: you can run inference on a single GPU. Our 50B model requires multi-GPU infrastructure. Inspired by the LLaMA paper, we're exploring what we can achieve with smaller models, longer training, and careful design. We want both small and large models for practical use." - David Rosenberg, The TWIML AI Podcast

Financial LLM: A Mirage?

Financial LLMs are not a fantasy. There are already startups in this space that have gone to production, raised funding, and landed their first clients - the best known are those backed by Y Combinator, for example, Truewind. So in fact, the question is simple: does the team have product thinking or not?

LLMs have made serious progress over the past two years, and it's entirely possible that Bloomberg's team is gradually shifting to a new technological foundation, replacing what they'd built before. Or maybe they're just continuing to improve their system quietly - and we'll hear updates soon enough.

The key point: you can't expect a financial LLM to do what it simply wasn't built to do. LLMs have core limitations by design. This is clearly shown in this Y Combinator discussion and explored in detail in this 2025 review of financial LLM capabilities. So, is a financial LLM a mirage? Yes, if used the wrong way. No, if used right.

How Belitsoft Can Help

Building a financial LLM is not a technical challenge. It's a product challenge. Belitsoft helps financial firms build custom language models that are production-ready from the start. A good financial LLM needs to fit the firm's data, language, workflows, and risk boundaries. This means:

Training smaller models on tightly scoped tasks
Fine-tuning existing open models with in-house financial content
Building the validation systems
Designing instruction datasets from internal annotations instead of starting from scratch
Embedding the model behind interfaces users already trust, like dashboards and query layers

The result is not just a model. It's a usable system: a decision tool, an internal assistant, a document engine - whatever your workflow needs. Belitsoft turns that strategy into a service: all deployments stay in-house. No data leaves the firm. No prompts go to external APIs. The entire model lifecycle is private, owned, and secure.
Dzmitry Garbar • 8 min read
Financial LLM: Use Cases and Examples
What Is a "Financial LLM"?

A financial LLM is a large language model trained or fine-tuned on financial data and tailored for the finance domain, able to answer questions or generate content with an understanding of financial context, instruments, and regulations. Such models grasp industry jargon (tickers, regulations, accounting terms), handle numeric and tabular context, and comply with financial regulations in their outputs.

Organizations seek to apply the power of GPT-style models to banking, markets, insurance, and financial analytics while incorporating domain expertise and control. Because general-purpose LLMs (like GPT-4) lack certain finance-specific knowledge or precision, companies have begun developing specialized "FinLLMs". BloombergGPT was one of the first large models trained specifically on a wide range of financial data (in addition to general text).

Core Features and Capabilities of Financial LLMs

Financial LLMs can answer questions, analyze and summarize text, classify sentiment or intent, check compliance, and produce financial writing. They are used to generate content tailored to finance needs: drafting research reports, writing personalized portfolio explanations for clients, composing client emails, or generating financial news articles. Such a model writes in the style and context that financial professionals and customers expect.

Question Answering on Financial Knowledge

Think of it as a chat assistant that understands your world and speaks your financial language, backed by actual data. Financial LLMs help answer questions like "What happened in Company X's Q3 results?" or "What does Basel III actually require?" - not by guessing, but by pulling answers from internal docs or research. They're built to understand finance and talk like a human, whether you're a banker checking policy or an investor tracking the market. Most financial LLMs now prioritize auditability, because in this space you need to show where the answer came from. No black box - just traceable output linked to a source.

Document Summarization and Report Generation

A financial LLM summarizes lengthy financial documents (research reports, earnings call transcripts, 10-K filings, insurance policies) into concise, clear narratives. It can produce an executive summary of a 100-page annual report or distill the key points of an earnings call into a few sentences - a highly valued feature given the volume of texts. JPMorgan's internally developed DocLLM is designed to process visually complex documents and extract key information, providing summaries and answering questions about the content. Automating report generation (writing first drafts of market commentary or credit memos) is another LLM capability.

Sentiment Analysis and Market Insights

LLMs are getting better at pulling signals from fast-moving sources like news, Twitter, and analyst notes. They can tag headlines or posts as positive, negative, or neutral for a stock - basic functionality for fintech LLMs.

Regulatory Compliance and Risk Assessment

Finance is heavily regulated, so LLMs in this space need to support compliance and risk, not just generate text. Most real deployments use retrieval augmentation or guardrails to keep answers accurate and policy-aligned. FinLLMs are used to cross-check text against rules - for example, scanning loan docs for compliance issues, flagging SEC or FINRA violations, and pulling policy red flags from internal communications. Financial LLMs are also used for risk checks.
They parse financial statements, credit history, and reports to surface red flags or consolidate exposure data. Domain-tuned models are safer because they stay within boundaries: no leaks, no speculation, no policy violations.

Financial Data Extraction and Synthesis

Extracting structured data from unstructured financial text is another core capability. An LLM ingests a pile of earnings reports or claim forms and pulls out key fields (revenues, dates, loss amounts, etc.), performing data entry and aggregation. These models can then synthesize data across sources, aggregating and comparing figures from multiple quarterly reports to answer "How did revenue grow quarter over quarter?". They can also fill out templates or spreadsheets with information gathered from documents. This capability supports use cases like automating due diligence (consolidating data on a company from various filings) and feeding downstream analytics or models.

FinBERT (Financial Sentiment Analysis)

FinBERT is a specialized open-source BERT-based model trained on financial text (news, filings, social media) for sentiment analysis. It was released years ago and hasn't been actively updated, but a 2024 paper shows it's still useful, especially when fine-tuned and combined with a time-series model like LSTM. FinBERT is a sentiment classifier built on BERT and trained on financial text - not a full LLM by current standards - yet the study shows it still holds up as a reliable component inside a larger pipeline.

If you work with financial news and need sentiment signals, you can fine-tune FinBERT on your own data and feed its output into whatever model you already use (forecasting, scoring, classification). Load the model, run inference on news or filings, and map the output to positive, neutral, or negative (a minimal sketch follows at the end of this article). The output can then be used as a feature in trading logic: entry/exit signals, risk filters, portfolio weighting. Typical use cases: news sentiment on equities, regulatory sentiment for risk exposure, and general signal extraction from contracts or disclosures. For example, FinBERT can be used with QuantConnect, a cloud platform for developing, testing, and deploying algorithmic trading strategies across equities, FX, futures, options, derivatives, and crypto.

FinGPT (Financial Sentiment Analysis)

FinGPT is an open-source financial large language model developed by the SecureFinAI Lab at Columbia University for sentiment analysis, market trend prediction, and financial report summarization. It is built on a transformer architecture. The model itself hasn't been updated since 2023 due to lack of funding, but it is still actively used: in 2025, for example, there was news about fine-tuning it for extra tasks like financial risk prediction via audio analysis or end-to-end trading. FinGPT v3.3 shows that a fine-tuned open-source model can outperform GPT-4 and earlier domain-specific models like FinBERT on narrow financial tasks, without needing GPT-4 scale or cost.
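As referenced above, a minimal sketch of the FinBERT inference step, using the public ProsusAI/finbert checkpoint from the Hugging Face Hub (the transformers package is assumed; swap in your own fine-tuned weights for production):

```python
from transformers import pipeline

# ProsusAI/finbert outputs positive / negative / neutral labels.
classifier = pipeline("text-classification", model="ProsusAI/finbert")

headlines = [
    "Company X beats Q3 earnings expectations",
    "Regulator opens probe into Company Y's accounting practices",
]
for headline in headlines:
    result = classifier(headline)[0]  # e.g. {'label': 'positive', 'score': 0.95}
    print(f"{headline} -> {result['label']} ({result['score']:.2f})")
```

The label and score can then be mapped to a numeric signal and fed into a downstream forecasting or scoring model.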
Dzmitry Garbar • 4 min read
Hugging Face: The Guide for AI Startup Founders
Role of Hugging Face in the AI Development Process

Pre-trained Models and Code Libraries

Expect access to a huge collection of pre-trained models covering a wide range of AI tasks - from NLP (text classification, chatbots, translation) to computer vision (image classification, object detection), audio processing (speech recognition, audio classification), and multimodal tasks. Thousands of state-of-the-art pretrained models (over 1 million in total) are available on the HF Hub. Developers can pull these into their projects with minimal code, using HF's high-level libraries like Transformers for model inference and training. This accelerates prototyping by reusing existing models instead of training from scratch.

Hugging Face is more than just a model repository: it maintains several widely used libraries that simplify ML development.

Transformers. A high-level library to load and use pre-trained models (primarily transformer-architecture models for NLP, vision, etc.) with a unified API. It includes model classes, tokenizer implementations, training pipelines, and the convenient pipeline() interface for quick inference. This is the core tool for integrating models into your code.

Datasets. A library for downloading, processing, and streaming datasets for training and evaluation.

Tokenizers. Efficient text tokenization (important for NLP tasks).

Diffusers. A specialized library for generative image models (like Stable Diffusion) and other diffusion models.

Accelerate. Easier multi-GPU or mixed-precision training.

Evaluate. Tools for computing evaluation metrics.

These tools are openly available (installable via pip) and come with documentation and examples. A startup team can expect that using these libraries cuts down the code needed to implement advanced ML functionality. For example, the Transformers library "abstracts much of the complexity involved in working with deep learning models, making the technology accessible even to those without extensive experience" (see the sketch below).

Datasets and Evaluation

Hugging Face hosts over 200,000 datasets (text, images, audio, etc.) that teams can use for training or benchmarking. It also provides tools (like the datasets and evaluate libraries) for accessing data and computing evaluation metrics, fitting into the data collection and model validation stages. Founders can search by domain or task to find data to train or test their models. The datasets library in Python allows easy downloading and streaming of these datasets. You should expect that common public datasets you've heard of (and many you haven't) are one command away on HF - useful for experimentation or for bootstrapping a model when you lack your own data.

Inference API & Endpoints

For any public model on the Hub, HF offers a free inference API (with rate limits) that you can call to get predictions. They also offer managed Inference Endpoints - a paid service where HF deploys the model on their infrastructure and provides a scalable API endpoint. A founder can expect that small-scale usage is trivially easy (you could literally call curl on a model's inference API to test it).

Model Hosting and Collaboration

The HF Hub is a cloud-hosted, Git-based repository system for models, datasets, and AI demos ("Spaces") - a public marketplace of AI demos and prototypes. It enables versioning, issue discussions, and community contributions to ML artifacts, much like code collaboration on GitHub. This helps in the later development stages, allowing teams to store and manage models, share them across the team or publicly, and collaborate with the open-source community.
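As a taste of how little code the Transformers and Datasets libraries mentioned above require, here is a minimal example (the model and dataset choices are illustrative defaults, not recommendations):

```python
from transformers import pipeline
from datasets import load_dataset

# pipeline() downloads a default pre-trained model on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("We shipped the MVP two weeks early!"))
# [{'label': 'POSITIVE', 'score': 0.99...}]

# Stream a public dataset from the Hub without downloading all of it.
imdb = load_dataset("imdb", split="train", streaming=True)
first = next(iter(imdb))
print(first["text"][:80], first["label"])
```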
Demo and Deployment Prototyping

With HF Spaces (which support Gradio and Streamlit apps - Hugging Face automatically installs them in the cloud environment when building your Space), a team can quickly deploy an interactive demo or a minimal API for their model directly on the HF platform. This is valuable in the prototyping/validation phase of product development. A founder can spin up a web demo of a chatbot or image generator to showcase to users or investors without setting up their own servers.

HF Spaces let you host a web application (usually a demo UI around a model). The community has contributed over 500k Spaces, ranging from fun demos to useful AI tools. A founder can create a Space for their own model - a simple Gradio app that takes text input and returns the model's response - and HF handles the deployment (a sketch of such an app appears at the end of this section). Expect that HF gives you a quick way to share a working prototype with the world without needing your own web servers. The free tier gives you a CPU demo, and you can pay to add GPU power if needed.

Required Team Skills and Capabilities to Use Hugging Face

A startup looking to leverage Hugging Face should have at least one technically proficient ML engineer who can navigate the HF ecosystem, and a supportive structure to handle data, evaluation, and iteration. The good news is that HF's learning curve is gentle compared to building everything from scratch - for instance, one can perform sentiment analysis in a few lines of code using pipeline() - but the team still needs core programming and ML understanding to adapt those examples into a real product.

Strong Python and ML Framework Skills

Hugging Face's libraries are Python-based. Teams should be comfortable coding in Python and using deep learning frameworks like PyTorch or TensorFlow (both of which HF supports). While HF abstracts a lot of complexity, understanding the basics of model training, inference, and data preprocessing is necessary to adapt pre-trained models to your use case. In practice, a developer should at least know how to load models via the transformers API, handle tensors, and debug model outputs.

Foundational ML Knowledge

The team should have a grasp of machine learning concepts and ideally have completed an introductory deep learning course or project. Knowing how transformer models work, what fine-tuning is, and how to evaluate model performance will enable effective use of HF's tools (for example, deciding when to fine-tune a model on your own data versus using it out of the box). HF makes many things easier, but it doesn't eliminate the need for machine learning understanding. A team lulled by the ease of pipeline() may deploy a model without fully understanding its failure modes, which can backfire with bad product outcomes (the model gives wrong answers the team didn't know how to catch). HF can create a false sense of security for non-experts: it's easy to get something working, but hard to get it working well. You still need experts to tune hyperparameters, curate training data, and interpret why the model behaves a certain way. If a startup entirely lacks ML depth, HF alone won't guarantee a successful AI product.

Ability to Prepare and Manage Data

Hugging Face provides datasets and tools, but a startup often needs to prepare its own data for training or fine-tuning a model to fit its product niche. The team should be capable of collecting, cleaning, and formatting data (using the Datasets library) and understand data versioning.
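Returning to the Spaces demo mentioned above: a minimal app.py for such a Space might look like this (distilgpt2 is just a small placeholder model; a Space needs only this file plus a requirements.txt listing gradio and transformers):

```python
import gradio as gr
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

def respond(prompt: str) -> str:
    # Generate a short continuation of the user's prompt.
    return generator(prompt, max_new_tokens=50)[0]["generated_text"]

demo = gr.Interface(fn=respond, inputs="text", outputs="text",
                    title="My Model Demo")
demo.launch()  # Spaces runs this automatically when the app starts
```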
Version Control and Collaboration Workflow

Since the HF Hub uses a Git-based system for model repositories, familiarity with version control (git) is helpful. Teams should be comfortable with concepts like pushing changes and using branches or pull requests, especially if they plan to open-source some components or collaborate with external contributors via Hugging Face.

Infrastructure/DevOps Awareness

While one can prototype on a local machine or in HF Spaces, training large models or running continuous experiments requires access to GPUs or cloud instances. Knowing how to set up and use cloud ML environments (AWS, GCP, etc., possibly with HF integrations like the AWS SageMaker partnership) is important once you go beyond toy examples. Being prepared to allocate computing resources (and budget for them) is part of this capability.

Workflow for Evaluation and Compliance

Since HF gives access to many ready-made models, the team must have the discipline to evaluate any model before putting it into their product. This means being able to measure accuracy, assess biases, and verify the model on your specific domain data. Organizationally, you should be prepared to handle ethical and legal considerations - for example, checking the license of a model or dataset (many on HF are Apache/MIT, but some have non-commercial licenses) and ensuring it's permissible for your use. The team should also take responsibility for model outputs (HF won't prevent problematic outputs automatically), which means establishing internal review processes for model behavior (safety checks, human-in-the-loop testing, etc.).

Cons of Using Hugging Face for AI Startups

Dependency on External Ecosystem

Relying heavily on HF means your workflow depends on third-party infrastructure and community contributions. If the HF Hub is down, or a critical model is updated or removed, it could disrupt your product. Startups need contingency plans (such as caching models). If you integrate deeply with HF's proprietary features (like their specific APIs), migrating away will also require effort.

Quality Variability

The open nature of the Hub is a double-edged sword - not all models or datasets are of high quality, and there's a long tail of poorly documented or underperforming models. A startup may pick a model that seems good based on downloads but later find it wasn't thoroughly evaluated. Unlike a paid API from a big company, which comes with certain quality guarantees, HF models are use-at-your-own-risk. Your team must ensure that the model you choose is reliable, which can slow development if you have to test many options.

Scalability and Performance Challenges

Models on the HF Hub vary widely in quality, and HF does not guarantee their performance - it's up to your team to evaluate them. HF simplifies prototyping, but not production-level performance. Many top models on HF are large and resource-intensive: a 20-billion-parameter model with great accuracy is too slow for a real-time app without optimization. Out of the box, the Transformers library doesn't meet strict latency/memory requirements for production environments. You need optimization techniques like distillation (training a smaller model to mimic a larger one) or quantization (reducing model precision to save memory and speed up computation), or specialized serving engines (e.g., TensorFlow Serving). Taking a model from HF and making it fast and cheap to run at scale requires additional tools (TensorRT, ONNX Runtime, etc.) and engineering work.
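As one deliberately simple example of the quantization route, PyTorch's post-training dynamic quantization converts linear layers to int8 for faster CPU inference; the accuracy impact must be checked on your own eval set (the model name is just a small public checkpoint used for illustration):

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english")
model.eval()

# Weights of nn.Linear layers become int8; activations are quantized
# dynamically at runtime. No retraining or calibration data needed.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

torch.save(quantized.state_dict(), "distilbert_sst2_int8.pt")
```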
Using a model locally or in an HF Space is one thing; running it in production with high uptime and scalability is another. Hugging Face's free offerings won't magically scale your model to millions of users. You'll have to deploy the model on servers or cloud instances yourself (or opt for HF's paid Inference Endpoints service). Don't expect HF to handle production deployment for free. Have a plan for DevOps/MLOps work: containerize the model, choose inference hardware, configure load balancing and monitoring, etc. HF can assist with tools (like Optimum for optimization, or documentation on deploying to ONNX/Triton), but it doesn't provide a full production pipeline out of the box. The HF Hub may have downtime, and a model may be deleted, so the best practice is to host critical models yourself.

Limited Scope for Highly Custom Needs

Hugging Face excels at standard architectures and tasks. If your problem is very unusual, you'll hit the limits of what HF provides. If you need a model architecture that isn't supported by Transformers, you'll have to implement it yourself without HF's help. If you require a feature-engineering step that isn't covered by HF Datasets, you'll have to build it. Transitioning from the "HF way" to a custom implementation can be a challenge. Don't expect a popular model on HF to work with high accuracy on your specific dataset without validation or fine-tuning, and for some niche tasks no pre-trained model exists at all. Some models (especially proprietary ones like OpenAI's GPT-4) won't be on HF due to licensing.

You shouldn't expect to simply download a model and instantly have a polished product. Significant engineering is still required to integrate a model into your application's workflow, UI, and back-end systems. For example, using an HF language model to build a customer support chatbot will still require designing conversation logic, handling queries the model can't answer, and integrating with your database or APIs - HF won't provide those parts.

Security and Compliance Considerations

Because HF encourages sharing and pulling in community content, there's a risk of introducing something insecure or non-compliant. A model repository may contain malicious code in its files (this is mostly theoretical, as HF does some scanning, but possible). Managing API tokens (for using the HF Hub or endpoints) needs care: there have been reports of users accidentally leaking HF API tokens and causing security issues. For a startup dealing with sensitive domains, these are real concerns. You need to run the model in a controlled environment (a sandbox where it can't cause damage, leak data, or affect production systems), or review the code in model repos - extra overhead either way. HF doesn't currently offer extensive enterprise compliance certifications for the free platform (though the Enterprise Hub provides more control).

Constraints on Private Data and Models

By default, anything you upload to the HF Hub as a free user is public. If you need to keep a model private (because it contains proprietary data or IP), you shouldn't expect the free tier to accommodate that. And if you use HF's hosted inference for a model, you send data to HF's servers - a potential privacy concern, because HF will not be liable for any data you expose.

Overhead in Managing Updates

HF is a constantly evolving platform (new library versions, new models, etc.). Your team needs to keep up with updates and deprecations to ensure you're using the best tools and that your code stays compatible.
In a small startup, chasing the latest version can be distracting, but staying too far behind means missing important fixes. It's a trade-off to manage.

Limited Protection from Bias or Harmful Content

Many AI models (especially large language models) can produce biased, inappropriate, or factually incorrect outputs because they learn from internet data. Hugging Face hosts model cards and encourages responsible AI practices, but it does not filter or moderate the outputs of models. HF provides warnings about such issues (including an ethical tagging system and model cards detailing biases), but it does not fix them for you. So you should not expect any model from HF to be bias-free or safe without your own review.

Licensing and Commercial Use Constraints

Not all "open" models on HF are free to use commercially. Some models are clearly marked non-commercial or research-only, but it's easy to overlook that in the rush of prototyping. If you build around one of those models and only notice the license later, you're stuck: either rebuild or try to negotiate licensing - both can kill your timeline. Even models listed as "open" may have unclear redistribution terms. Some require attribution, others forbid modification, and some have vague "fair use" clauses that won't hold up if you scale.

Competition and Lack of Proprietary Edge

Using publicly available models from HF means that other companies (including competitors) have access to the same technology. If your product merely wraps an HF model with minimal changes, another team could replicate it quickly, since they can obtain the same model. Startups often need to build proprietary advantages on top (like unique data for fine-tuning or superior UX) to maintain a lead. So while HF accelerates development, it doesn't automatically give you a competitive advantage. Investors may question what's unique if "anyone can use that model from HF" - be prepared to answer (typically: your secret sauce is in the data or the integration, not the model itself).

From Prototype to Production Engineering

Building an MVP with HF is relatively fast: fine-tune a model and deploy a quick demo on Spaces. The challenges come in deploying the model to a scalable, secure environment, integrating it with your full application stack, setting up monitoring, and ensuring reliability under real-world usage. These tasks require software engineering and DevOps skills. A development firm can help containerize your model, set up cloud infrastructure (Kubernetes clusters, CI/CD pipelines for models), and incorporate best practices (logging, fallback systems if the model fails, etc.). In short, if your team has little experience taking models to production, outsourcing this phase can save time and costly trial and error.

Performance Optimization and Customization

The off-the-shelf model that works in a prototype may need optimization for production (to meet latency requirements, reduce memory footprint, etc.). Technical consultants with expertise in model optimization (quantization, distillation, compiling models to ONNX, using GPUs/TPUs effectively) can be valuable here. For example, a vendor experienced in Hugging Face's Optimum library or hardware acceleration could help you serve an NLP model 5x faster. If your startup doesn't have a dedicated ML optimization engineer, outsourcing this to a specialist can ensure your product is both fast and cost-efficient in production.
Extending Beyond HF's Scope

It's possible your product needs capabilities that go beyond what HF's out-of-the-box models provide: a novel model architecture, or a custom data pipeline integrated with proprietary enterprise systems. After initial validation with HF components, you may decide to build a more bespoke solution. In such cases, hiring an external ML engineering team to develop a custom model or additional software around the model can make sense. They could, for instance, develop a custom training pipeline on your proprietary data, or integrate your AI component deeply into a mobile app or edge device - tasks that require software development expertise beyond just using HF libraries.

Focus on Core Product vs. ML Infrastructure

Founders should consider where their team's time is best spent. If your core product value is an AI-driven insight or service, you want your team focused on improving that insight, not on plumbing (like server scaling or rewriting model-serving code). Outsourcing the non-differentiating infrastructure work to an experienced team lets your staff concentrate on the core logic. For example, you can outsource the creation of a robust API around the model, or the design of a frontend for the AI feature, while you fine-tune the model's outputs for quality.

Also published here.
Dmitry Baraishuk • 11 min read
System Prompt Engineering in Gen AI Applications
Large enterprises seek full-spectrum customization and need LLMs trained on their internal knowledge datasets, branding, advanced functionality, etc. These companies can choose to develop a gen AI model from scratch, but that option requires massive investment. Another way is to use ready-made models or customize existing ones by training them with proprietary data. According to market researchers, the global prompt engineering market is projected to grow from USD 280.08 million in 2024 to USD 2,515.79 million by 2032, a CAGR of 31.6% over the forecast period.

Why Prompts Are Your Secret Weapon

Interactions between a human and a machine currently happen in natural language (NL). That is why it is important to formulate prompts that direct artificial intelligence (AI) toward relevant and reliable responses. The skill of prompt engineering includes formulating correct requests and anticipating how the AI will interpret and execute commands. Competent prompt engineers build prompts with linguistic precision and knowledge of how the algorithms perform their functions.

No matter what framework you use (LlamaIndex, LangChain, or your own code), a retrieval-augmented generation (RAG) system needs clear, well-structured prompts for every LLM interaction. A RAG-based application behaves like an ordinary user interacting with an LLM through chat: for every task, such as indexing, information retrieval, metadata extraction, or response generation, the RAG system produces prompts, adds context to them, and sends them to the LLM.

Ready-made frameworks like LlamaIndex provide templates, storage, injection, and inspection tools, but it is still necessary to understand and fine-tune them. For instance, in LlamaIndex, every kind of interaction with an LLM uses a default prompt as a template - the TitleExtractor, for example, extracts metadata into the RAG workflow. Prompt libraries speed up content creation and give a more predictable result, since all requests have already been pre-checked. However, models are regularly updated, so it is useful to test existing prompts on new versions.

Customizing Prompts

The RAG workflow creates prompts programmatically. When LlamaIndex or another framework is used, it builds prompts based on the company's documents: the documents are divided into nodes, indexed, and selected with retrievers. Prompt customization is sometimes necessary or desirable. Developers do it to achieve better interaction between the RAG components and the LLM, which improves the accuracy and effectiveness of the app. Prompt customization is used in the following situations:

To integrate domain-specific knowledge and terms
To adjust prompts to a certain writing style or tone
To modify prompts to prioritize certain types of information or outputs
To use different prompt structures in order to optimize performance or quality

The LlamaIndex framework offers the following advanced prompting techniques (sketched below):

Partial formatting means formatting a prompt partially, leaving some variables to be filled in later. It is convenient for multi-step processes where the required data is not available all at once.

Prompt template variable mappings let you reuse existing templates instead of rewriting them.

Prompt function mappings allow dynamic injection of values that depend on specific conditions.
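A rough sketch of what prompt customization and partial formatting look like in LlamaIndex (APIs evolve quickly, so treat the exact import path and template key as indicative of recent llama-index-core versions rather than guaranteed):

```python
from llama_index.core import PromptTemplate

# A custom QA template; {context_str} and {query_str} are the variable
# names LlamaIndex's text_qa_template convention expects.
qa_template = PromptTemplate(
    "You are a financial analyst. Answer using only the context below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Question: {query_str}\n"
    "Answer: "
)

# Partial formatting: pin one variable now, fill the rest later.
summarize_template = qa_template.partial_format(
    query_str="Summarize the quarterly revenue figures.")

# Swapping the default prompt on an already-built query engine:
# query_engine.update_prompts(
#     {"response_synthesizer:text_qa_template": qa_template})
```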
The Golden Rules of Prompt Engineering

The following golden rules cover the prompt's characteristics, the differences between LLMs, and methods of creating prompts. By following these recommendations, you can develop effective and reliable RAG applications using LlamaIndex or other frameworks.

Accuracy. The prompt is precise and does not allow for ambiguity. You will receive a relevant response only if you clearly state what you need.

Directiveness. The directiveness of the prompt shapes the response. A prompt can be either open-ended or specific: the first type leaves some space for creativity, the second demands a particular answer. As mentioned earlier, prompts combine a static part with dynamically retrieved content. Prompts should contain verbs like "summarize", "analyze", or "explain" - clear instructions that tell the AI what is needed.

Context quality. An effective RAG system depends on the proprietary knowledge base. Prompt engineers remove data duplicates, inconsistencies, and grammar mistakes from the database, as they degrade the retrieval process and response generation.

Context quantity. A prompt should be brief and detailed at the same time: it should give enough context to understand the request and its specific requirements. Providing a RAG system with more details can give a broader understanding of the task, but a lengthy prompt can also confuse the system. Long, unstructured prompts may lead to hallucinations or irrelevant answers; structured and relevant long prompts can improve accuracy. Cognitive load is the amount of resources the LLM needs to examine, understand, and respond - with RAG systems, it is the amount and difficulty of the prompt context. Apart from context quality and quantity, context ordering is also critical: if you provide a long context, place the key information at the beginning or at the end. This helps LLMs extract the main problem from the context and generate a relevant output.

Required output format. Specify the output format, size, or language.

Inference costs. Make cost estimations and consider token usage. Tools like LongLLMLinguaPostprocessor help compress prompts. Prompt compression can also improve the quality of the final response by removing unnecessary data from the context.

System latency. Latency is related to prompt quality: a long, overly detailed request takes more time to process, and long processing times decrease user satisfaction. Prompt engineers regularly evaluate prompt performance and optimize prompts based on the results. It is a continuous process, because the rules change rapidly.

Selecting the Right LLM

Not all LLMs are equal - the wrong LLM can negate the effort devoted to crafting prompts. The following characteristics are useful when choosing a model:

Model architecture defines which tasks the model is suited for. Encoder-only models (BERT) categorize texts and predict relations between sentences. Encoder-decoder models (BART) not only understand the input but also generate new text: they can translate, summarize, and provide responses. Decoder-only models (GPT, LLaMA, Claude, Mistral) predict the next words in a sequence and can perform creative tasks, writing texts of all kinds and answering questions.
Mixture-of-experts (MoE) models (Mixtral 8x7B) can cope with complex math, multilingual tasks, and code generation.

Model size determines computational costs and the model's capabilities. The more parameters an LLM has, the more resources it needs and the higher the operational expenses.

Inference speed is how fast the system processes input and generates output. Model pruning, quantization, and specialized hardware can improve LLM speed.

Besides the characteristics above, LLMs can be divided by task or domain, showing better performance in certain scenarios:

Chat models are used for building AI chatbots and virtual assistants.
Instruct models are found in educational tools and productivity applications, where users want a detailed explanation rather than a natural conversation.
Codex models are integrated into development environments and coding-automation tools. They help with coding tasks, debugging, explaining code snippets, and even generating programs from a description.
Summarization models transform long texts into short summaries. They are used in news aggregation services, content creation, and research.
Translation models suit global communication platforms, educational platforms for language learners, and localization tools.
Question-answering models power intelligent search engines and interactive knowledge bases.

Need a custom-built LLM for your business? As a reliable AI development company, Belitsoft creates generative AI models trained on your proprietary data. Our expert LLM developers ensure the right model architecture, fine-tuning approach, and infrastructure to fit your workflows and industry needs.

Methods for Creating Prompts

The following advanced techniques are used for complex and multi-step RAG applications. They structure the input to better guide the model's internal reasoning (a sketch of the first two follows this list).

Few-shot prompting, or k-shot prompting, means showing a couple of examples of the task. Those examples demonstrate to the model what kind of response is expected, helping adapt the system to tasks specific to a certain niche.

Chain-of-Thought (CoT) prompting breaks the problem into several steps. Instead of asking the system for the final result, prompts encourage the model to explain the process step by step. For example: "Children have five apples. John has eaten two apples, and Mary has eaten one apple. How many apples are left? Explain the solution step-by-step." The system shows each calculation one by one, as a school student would. Answer generation becomes transparent and reliable with this method.

The self-consistency method improves on CoT prompting. It generates several reasoning paths and selects the answer that appears in most or all of them. For example, the apples task above can be solved in three ways:

5 - 2 - 1 = 2 apples left
5 - (2 + 1) = 2 apples left
5 - 1 - 2 = 2 apples left

The answer is 2 in all approaches, so 2 is the final result. This method is used to solve logic puzzles, math problems, and real-world reasoning ("Should I buy the shares of this company?").

Tree of Thoughts (ToT) prompting builds on CoT but goes further: it generates several ways of dealing with each step and evaluates the results of each. It may backtrack if a result is incorrect and examine another solution, so each solution is like a new branch of a tree.
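To make the first two techniques concrete, here is a sketch of how a few-shot prompt with a chain-of-thought instruction might be assembled in code (the examples and wording are ours, for illustration):

```python
FEW_SHOT_EXAMPLES = """\
Q: A basket holds 3 red and 4 green apples. How many apples in total?
A: Count red (3), count green (4), add them: 3 + 4 = 7. Answer: 7.

Q: Tom had 10 coins and spent 6. How many are left?
A: Start with 10, subtract 6: 10 - 6 = 4. Answer: 4.
"""

def build_prompt(question: str) -> str:
    # The few-shot examples set the expected format; the trailing
    # instruction triggers step-by-step (chain-of-thought) reasoning.
    return (FEW_SHOT_EXAMPLES
            + f"\nQ: {question}\n"
            + "A: Explain the solution step-by-step, then give the answer.")

print(build_prompt(
    "Children have 5 apples. John ate 2 and Mary ate 1. How many are left?"))
```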
API-Based and Tool-Augmented Prompting

The following methods are used when the model interacts with external systems, tools, or APIs to retrieve or process data.

Function calling lets the model invoke an external function according to a declared schema (via the OpenAI API, for example). The model emits structured output (e.g., JSON arguments), the application executes the integrated API call, and the model then uses the result. For example, in response to "Weather forecast in Paris?" the model calls getWeather("Paris") and generates the answer (see the sketch at the end of this article).

Tool use allows the model to dynamically choose tools (search engines, calculators, APIs, etc.) while generating the answer. For example, to provide the latest news on a certain topic, it uses a connected search tool. Models retrieve live data and verify facts.

ReAct (Reason + Act) combines natural language reasoning and tool execution. Users provide prompts such as "I need to find out the dollar exchange rate. First tell me what you're going to do, and then do it." The model gives a step-by-step plan, performs actions (tool calls), observes the results, and continues its reasoning. ReAct serves as the foundation for AI agents and retrieval-augmented decision-making.

How Belitsoft Can Help

Belitsoft is a software development company that offers outsourced LLM training services. We use our customers' internal data (policies, documents, workflows) to tailor their LLMs while taking good care of data security. We use different prompt engineering techniques (few-shot prompting, chain-of-thought, etc.) to align the responses with appropriate use cases. As a result, the LLM interprets complex queries with high contextual accuracy. By combining the tools of popular frameworks (LlamaIndex, LangChain, Haystack, Mistral, etc.) with best practices, Belitsoft developers optimize prompts for high-performance LLM applications. Whether you need advanced prompt tuning, RAG-enhanced AI models, or OpenAI API integration, we provide tailored AI solutions to fit your requirements. Contact us to discuss your project.
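As promised above, here is a minimal sketch of the function-calling loop, assuming the OpenAI Python SDK. get_weather is a stand-in for a real weather integration, and the model name is a placeholder:

```python
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> dict:
    # Stand-in for a real weather API call.
    return {"city": city, "forecast": "sunny", "temp_c": 21}

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "Weather forecast in Paris?"}]
response = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, tools=tools)

call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
result = get_weather(**args)  # the app, not the model, runs the function

# Feed the tool result back so the model can phrase the final answer.
messages += [response.choices[0].message,
             {"role": "tool", "tool_call_id": call.id,
              "content": json.dumps(result)}]
final = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, tools=tools)
print(final.choices[0].message.content)
```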
Dmitry Baraishuk • 7 min read
AI Document Classification: Cost of Building the Model
The cost depends on the task, specifically on:

• the business scenarios the client wants to cover
• the complexity of the evaluation criteria (just relevance, or also structure, readability, and value scoring)
• the volume of training data (is labeled data already available, or do we need to create it?)
• the choice between commercial APIs (faster start, but higher long-term usage costs) and building and tuning an open-source model (more flexible, but longer and more expensive to develop)
• the expected processing speed and cost per document
• whether manual score-correction tools and continuous retraining are needed

Let's take a hypothetical example of a marketing company that wants a text classification model for AI-based content curation. They have 1 million pre-labeled documents, target 90% accuracy, and require fast processing at a low per-document cost (5–10 seconds of processing time and a target processing cost of $0.005 per document). They need the system to evaluate each document against several well-defined criteria, not just label documents as relevant or not: it should assign scores for Value (how important and useful the document is compared to others), Relevance (how well the document fits the needed topic), and Readability (how clear and easy the text is to read).

Cost estimation

The estimation includes initial development, setup, fine-tuning, testing on a limited dataset, calculating processing speed, accuracy, and costs, and providing the client with a working prototype and performance results. Two approaches are possible; below we provide detailed estimates for each.

Generative AI for Text Classification

Development time (393–510 hours)

Why this time? Even if you use OpenAI's pretrained model, you still need custom code to connect your system to the OpenAI API, logic for parsing and preparing those 1M PDFs, preprocessing pipelines (tokenization, chunking, embeddings), scoring logic, API integration for inference, testing, monitoring setup, and fallback logic. All of this takes serious development time: roughly two developers working full-time for 1.5 months, or one developer for about three months.

1 million+ documents is huge. You need data ingestion logic for massive PDF parsing, extracting text, tables, and images (possibly using OCR for some parts), storing and managing intermediate results, and logging and error handling (there will be broken files and encoding issues).

Scoring logic development (Value, Relevance, Structure, Readability). You can't just throw a PDF into GPT and get four perfect scores. Developers need to design rules, add system prompts, and build a scoring framework.

System integration and testing. Before running on 1M docs, we test multiple times, adjust parameters, and fine-tune batch sizes and score weights. Delivery format plus UI/reporting also takes time. For a project with 1M documents, complex evaluation, custom scoring, and reliable infrastructure, this is a normal, reasonable estimate. A promise of "2 weeks" should make you suspicious.

Fine-tuning cost (OpenAI)

Clients don't just ask for a fixed model: they want a system that learns and improves over time with manual corrections. That's exactly what fine-tuning provides. If OpenAI offers fine-tuning at $3 per million tokens, and we assume 10-page PDF documents of around 2,500–2,600 words each, that is approximately 3,300–3,400 tokens per document. If we fine-tune on 100k documents, we pay OpenAI around $1,000.
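A quick sanity check on that arithmetic, as a sketch; the per-token price and token counts are the assumptions stated above, not quoted pricing:

```python
# Back-of-the-envelope fine-tuning cost, using the assumptions above:
# ~2,500-2,600 words per 10-page PDF, ~3,300-3,400 tokens per document,
# $3 per 1M tokens (assumed OpenAI fine-tuning rate).
PRICE_PER_MILLION_TOKENS = 3.00
TOKENS_PER_DOC = 3_350  # midpoint of the 3,300-3,400 estimate

def fine_tuning_cost(num_docs: int) -> float:
    total_tokens = num_docs * TOKENS_PER_DOC
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

print(f"100k docs: ${fine_tuning_cost(100_000):,.0f}")    # ~ $1,005
print(f"1M docs:   ${fine_tuning_cost(1_000_000):,.0f}")  # ~ $10,050
```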
If we fine-tune on the full 1 million documents, we pay $10,000+. There are two variants within the OpenAI option: fine-tuning on 100k documents and fine-tuning on 1M documents. 100k documents is partial fine-tuning: faster and cheaper, but less precise. 1M documents is full fine-tuning: higher cost and effort, but maximum alignment with the client's data.

Why do we offer both partial and full tuning? Because 1M documents is huge, and processing and fine-tuning on all of them is expensive. Clients may not want to spend $10k+ on fine-tuning right away without first seeing value. So we provide a smaller "entry" scenario: fine-tune on 100k documents for $1k. If that works well, they can scale up to 1M. This helps de-risk: start small, validate quality, then invest more.

The client requests 90% accuracy based on their labeled data. To meet that accuracy goal with confidence, full fine-tuning (1M documents) is the best-aligned option. But offering partial tuning is still reasonable as a pilot step or fallback if the client wants to test results before scaling. However, if the client demands "production-ready" 90% accuracy from day one, partial tuning is not an option.

Ongoing Usage Costs after Fine-tuning (OpenAI)

After the model is fine-tuned, each time you use it to classify new documents through the OpenAI API, you pay based on the number of tokens processed. The client says: "We have 1 million documents with known classification." That means they already have labeled data (relevant/irrelevant), and this labeled data is used for fine-tuning the model. What happens after fine-tuning? The model has learned what makes documents relevant or not. But the client still needs to run it on new, incoming documents that are not yet classified. The client wants to continuously process new batches of documents (potentially millions more PDFs in the future) and automatically score, classify, and filter them.

How much would this cost at scale? To process (recognize/classify) 1 million documents, the estimated cost starts at $600+. To process 5 million documents, the estimated cost starts at $3,000+.

If you are looking to build a generative AI product, our engineering team deploys models that scale with your infrastructure. We connect APIs directly to your CRM, apps, websites, or data pipelines, pulling context from your existing databases to ensure compliance with your software architecture.

Discriminative AI for Text Classification

Learning from 1 million pre-classified documents (relevant vs. not relevant) is a classic supervised machine learning classification task. Document categorization is the dominant task for this project (40%); the remaining 60% covers scoring and related logic, so it is more than a simple classifier. The system uses machine learning for classification, and while it may leverage a generative model like OpenAI's GPT for embeddings or fine-tuning, its core function is not generative. This is a discriminative AI system, not a generative one: an AI application powered by discriminative machine learning models. Generative models like GPT can be used in a discriminative way through fine-tuning or prompting. However, there are also specialized discriminative models such as SBERT and CatBoost, which are open source. These should also be included in the cost estimation, especially because they offer long-term cost savings.

Development time (615–799 hours)

Why more hours for the open-source option?
Because with open source, you're not just writing simple code to call someone else's API. You build and run the entire machine yourself. What exactly takes time:

• Set up servers and cloud GPUs manually. Not just click-and-use: install drivers and libraries, and handle networking.
• Load models locally and troubleshoot compatibility (Hugging Face versions, CUDA errors, etc.).
• Write custom training scripts, not just call one OpenAI endpoint. Manage checkpoints, tune hyperparameters, monitor loss curves.
• Build your own inference service. That means writing API code around the model and handling batching, queuing, and timeouts.
• Deploy on your own servers. Set up Docker, CI/CD, security layers, and scaling logic.

Renting a GPU server for fine-tuning

Let's assume the fine-tuning process takes about 3.3 months in total, with 22 working days per month (a standard estimate, excluding weekends) and 24 hours of continuous GPU usage per day (the tuning job runs non-stop). Let's take the upper price estimate of $0.4 per hour for a decent GPU instance (a realistic price for renting a mid-range GPU on platforms like vast.ai or other low-cost providers). The tuning job then runs for around 1,742 hours in total: 3.3 months × 22 days × 24 hours ≈ 1,742 GPU-hours, and 1,742 hours × $0.4/hour ≈ $700 in server rental costs.

Why this approach? You can't fine-tune huge models instantly. It's slow and runs for weeks or months. This cost estimate reflects the real compute time needed for large-scale tuning. You pay here not for developer work but for compute time.

Ongoing Costs for Using the Model to Classify New Documents in Production

After fine-tuning, the client has a trained model. But to actually use that model to process new incoming documents, they need to run inference (classification jobs) somewhere. They have two hosting options.

Rent servers and run inference jobs there. You pay per hour of usage, so you have to estimate the workload: how many documents you'll process, how long each takes, and how many hours the server will run. More documents = more hours = more cost. It scales linearly, and the final rental cost depends directly on model performance (speed per document). Faster models (like CatBoost) process documents quicker, so fewer server hours are needed (5M docs ≈ 4,166 hours × $0.45/hour ≈ $1,875, but with less accuracy). Slower but smarter models (like SBERT) process documents more carefully, which takes more time, so you rent the server for more hours (5M docs ≈ 5,500 hours × $0.45/hour ≈ $2,475, with better-quality results).

Buy your own server. You pay a fixed one-time cost (around $3,000). After that, you don't pay for hours: the server is yours. Processing more documents just means it takes more time, with no extra rental payments. The "cost" then is just electricity and maintenance, not per-document fees. So the price is fixed upfront, but the real question becomes: how much capacity and time do you have to process big volumes? If you need results fast (say, classifying 5M documents in a few days), you'll need either multiple rented servers running in parallel (more cost) or a very powerful owned server (expensive upfront, but fast). If the client has a large volume of new documents coming in regularly, they can decide whether to optimize for cost or quality.
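The linear scaling of rented inference is easy to model. A sketch using the assumed hourly rate; the throughputs (~1,200 and ~909 docs/hour) are back-solved from the hour totals above, not measured benchmarks:

```python
# Rented-server inference cost scales linearly with document volume.
RATE_PER_HOUR = 0.45  # assumed mid-range GPU rental, $/hour

def inference_cost(num_docs: int, docs_per_hour: float) -> float:
    hours = num_docs / docs_per_hour
    return hours * RATE_PER_HOUR

# CatBoost: faster (~1,200 docs/hour), lower accuracy.
print(f"CatBoost, 5M docs: ${inference_cost(5_000_000, 1200):,.0f}")  # ~$1,875
# SBERT: slower (~909 docs/hour), better-quality results.
print(f"SBERT,    5M docs: ${inference_cost(5_000_000, 909):,.0f}")   # ~$2,475
```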
AI automates finance and accounting tasks, from financial text processing to workflow automation. See how Truewind AI integrates AI-powered automation into accounting processes, or explore BloombergGPT's use of AI for advanced financial data analysis.

How Belitsoft Can Help

Product Strategy Consulting

We help companies build smart AI systems that classify, score, and filter massive amounts of content, and advise on the right technology, infrastructure, and cost strategy. We make complex ML processes simple to understand: we show you where your money goes, why it matters, and what results you can expect. We explain what's possible, what's practical, and what's cost-effective: a quick start with commercial APIs (like OpenAI) or custom solutions with open-source models. We calculate fine-tuning costs (based on data volume and pricing per token or compute) and inference costs at scale (depending on document flow and model choice), and explain the server rental vs. hardware purchase trade-offs.

Full-Cycle Development

We build small-scale working prototypes to demonstrate value before you invest big. We cover all activities, including building data pipelines, tokenization, embeddings, chunking, sliding-window processing, custom business logic, fine-tuning, testing, deployment, and integration into your business systems. Partner with Belitsoft to get secure, custom-designed AI software and integrate analytical AI systems, AI chatbots, and machine learning models. We take a consultative approach: we understand the client's unique challenges and craft a solution accordingly. Get an expert consultation and a cost estimate. Contact us today.
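As a small illustration of the sliding-window chunking mentioned above, a self-contained sketch; the window and overlap sizes are illustrative defaults, and the file name is a placeholder:

```python
# Fixed-size windows with overlap, so text near a chunk boundary still
# appears with surrounding context in at least one chunk.
def sliding_window_chunks(tokens: list[str], size: int = 512, overlap: int = 64):
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

words = open("document.txt").read().split()  # placeholder input file
chunks = sliding_window_chunks(words, size=512, overlap=64)
print(f"{len(words)} words -> {len(chunks)} overlapping chunks")
```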
Alexander Kom • 7 min read
Augment Your LLM With RAG Using LlamaIndex
Exploring Challenges with LLMs

LLMs have become the basis of natural language generation (NLG) technology. They are used in chatbots, medical coding software, search engines, text summarization assistants, and more. They analyze written texts and generate new content.

A trend towards "personalized" generative models comes from the demand for trustworthy content and compliance with security regulations. The recent launch of ChatGPT Gov for U.S. government agencies illustrates the move towards secure environments for handling sensitive information across industries. Another trend is the appearance of agents that can hold a natural conversation with a customer using up-to-date, personalized data. For example, Amazon Bedrock Agents help design customer service chatbots that can process orders. Agents extend the functionality of the LLM by calling third-party APIs and databases. As a result, they understand user requests, divide complex tasks into several steps, continue a conversation to learn more details, and so on.

Experts mention several challenges associated with using LLMs. These challenges demonstrate the necessity of customizing language models to enterprise data.

LLMs lack access to real-time information. They use the data that was current at the time they were trained, so it does not include recent research, news, or trends. As a result, users may receive incorrect responses when they ask about the latest information. After the launch of ChatGPT in 2022, it became clear that the LLM underneath it was trained on data from 2021 and could not give up-to-date answers to users' queries. That is why Microsoft introduced Bing Chat, which supplements the model with current information from the web. However, it was not yet focused on enterprise data.

LLMs do not distinguish between facts and fiction. They may sound convincing, but the responses they generate can be misleading, especially in specialized domains. One cause lies in the training data, which may lack quality. In health sciences, for example, incorrect data might even be dangerous for users.

LLMs can exhibit faulty reasoning and lack transparency. It is difficult to understand how the system came to a particular conclusion and which documented fact influenced the final response.

Lack of information limits LLMs in solving logical problems. For example, research shows that puzzles such as riddles, programming puzzles, and logic games with missing information demand richer datasets to enhance an LLM's puzzle-solving capabilities.

It is difficult for LLMs to discuss and memorize long documents. Although context windows have grown and LLMs can now process documents of up to 3,000 pages, there are drawbacks: LLMs fail to provide a grounded explanation of their output, they need more time to respond, and they cost more because of bigger input volumes.

Augmenting LLMs with RAG

RAG enables AI solutions to access and use the information of the enterprise. As the name suggests, this technology combines two components: a retriever and a generator. The system retrieves information from indexed data sources (proprietary knowledge) and uses it to generate the response. As a result, the LLM integrates the organization's knowledge with its generative capability.

The image shows how a user query is processed with a RAG model.

RAG allows LLMs to address the challenges mentioned above.
RAG pulls information from sources such as vector databases, relational databases, document-oriented databases, and file-based repositories. It allows LLMs to rely not only on their built-in knowledge but also on external, up-to-date documents when providing responses. Thus, the information is current and the output is accurate.

Because it retrieves the organization's proprietary knowledge, RAG produces more truthful answers. Users are less likely to receive misleading information, as the system processes the context and facts related to the query. The retrieval component helps the RAG model find the data closely related to the query, which leads to better-grounded reasoning for the response. It is even possible for users to get the actual quote from the source that the system referred to in its answer. RAG also allows users to receive responses tailored to certain industries. For instance, a healthcare LLM can perform document analysis and support customer service and clinical decisions.

Is RAG the Only Possible Solution?

No, it is not. Optimizing an LLM is also possible with prompting and fine-tuning.

Prompt engineering means rewording requests to receive a more accurate response. This may be achieved by rephrasing queries, limiting ambiguity, giving more precise instructions, and so on. It also demands that an engineer understand how the AI will interpret and run certain commands. This method is cheap, easy to implement, and does not require additional datasets. However, prompt engineering is less effective at handling domain-specific requests. Besides, creating prompts takes a lot of time, and long, complex prompts may increase response times.

Fine-tuning AI models means training them further on proprietary data. After pre-training on generic data, the model receives domain-specific data tailored to the set of tasks the model should perform. For example, an LLM can be trained on internal branding guidelines, historical advertising campaigns, and instructions on marketing tone of voice. After fine-tuning, this LLM will generate ad banners and marketing posts that adhere to the brand identity and tone.

The image demonstrates LLM fine-tuning.

Among the disadvantages of fine-tuning are its high costs, the need for large datasets, and difficulties with updating the training dataset with fresh data. Another drawback is that fine-tuning may limit the LLM to a certain domain and worsen its ability to handle requests outside that domain. For example, a model trained for medical coding won't cope well with creative writing, as it is focused on medical terminology. A fine-tuned model may also invent facts: asked "Who is the founder of the Belitsoft company?", one model named a fake person.

Low-Rank Adaptation (LoRA) may be seen as a solution to the drawback of a changed base model after fine-tuning. LoRA does not affect all layers of the neural network: it fine-tunes only smaller matrices while leaving the larger pre-trained LLM unaltered. However, this approach still requires significant computational resources and expertise.

RAG is often used in combination with fine-tuning. LlamaIndex is one of the ways to improve LLM-based applications using RAG. It is a framework that developers use to build custom RAG solutions. Let's look closer at this framework.
Caidera.ai, a company dealing with marketing automation for the life sciences industry, faced high production costs and resource shortages. Integrating the company's AI pipeline with LlamaIndex resulted in a 70% reduction in campaign creation time, doubled conversion rates, 40% fewer resources spent, and three times faster compliance processing. Belitsoft experts utilize LlamaIndex to train LLMs on internal data (docs, FAQs, knowledge bases, etc.).

What LlamaIndex Does

LlamaIndex is one of the tools for RAG, focused on indexing, connecting data sources, and feeding the right chunks of data into the LLM at query time. With LlamaIndex, companies' LLMs provide targeted and relevant responses based on the business's internal data. LlamaIndex organizes both structured and unstructured proprietary data into a retrievable format for RAG workflows. The framework connects external datasets to popular LLMs like GPT-4, Claude, and Llama. As a result, businesses leverage the computational power of LLMs while focusing their responses on specific, reliable datasets, which maximizes the value of AI investments.

LlamaIndex enables the following possibilities:

• Build a search engine: LlamaIndex indexes many different formats, including Word files, PDFs, Notion documents, PowerPoint presentations, GitHub repositories, etc.
• Create a customized chatbot: organizations "teach" their chatbots specific jargon and terminology, internal policies, and expertise to make them speak the language of their customers.
• Summarize large volumes of documents: companies like market research agencies can feed their LLMs reports on market trends, consumer behavior, and competitive analyses. Summaries of the key facts from those documents save analysts hours of manual work.
• Design an assistant: KPMG's internal AI agents can retrieve, synthesize, and analyze enterprise data. LlamaIndex's collaboration with Nvidia enables a multi-agent system for conducting research and creating blog posts.

The image compares the costs of updating data with RAG and fine-tuning.

"In part to help fund LlamaCloud's development, LlamaIndex recently raised $19 million in a Series A funding round that was led by Norwest Venture Partners, and saw participation from Greylock as well. The new cash brings LlamaIndex's total funding raised to $27.5 million, and Liu says that it'll be used for expanding LlamaIndex's 20-person team and product development." - TechCrunch

Progressive Disclosure of Complexity

This is the design principle that LlamaIndex uses. It means the framework is accessible to developers of different levels of expertise. It is easy in the beginning, as you collect the data with a few lines of code (see the sketch below), and more advanced features can be added later if necessary. LlamaIndex immediately starts converting the data into indexes that are digestible to the LLM. For example, to develop a company's knowledge assistant, advanced indexing technologies are used. First, developers load the necessary documents from a dataset. Then they use the documents to create an index, which enables efficient querying and retrieval of information. Developers can set many parameters, such as selecting specialized index structures, optimizing indexes for different use cases, adapting prompt strategies, and customizing query algorithms.
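Here is that "few lines of code" starting point, assuming a recent llama_index release (import paths vary between versions; the data folder and query are placeholders):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1. Load proprietary documents (PDF, Word, Notion exports, etc.).
documents = SimpleDirectoryReader("data").load_data()

# 2. Convert them into an index the LLM can query efficiently.
index = VectorStoreIndex.from_documents(documents)

# 3. Ask questions grounded in your own data.
query_engine = index.as_query_engine()
print(query_engine.query("What does our refund policy say about digital goods?"))
```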
Important Aspects to Consider

Relying on third-party experts for custom LLMs helps ensure the RAG workflows are optimized correctly. The first step in a RAG workflow is to organize and process proprietary data. It must be clean, without errors or duplicates, since redundant documents may confuse the RAG system. It is important to review and update the information, remove irrelevant data, and add new content. LlamaParse helps with automatic processing of data from different APIs, document types, and databases.

Indexes may incur unexpected costs, as they call LLMs to process large volumes of text. This may happen while building a TreeIndex or KeywordTableIndex. The best approach is to stay informed of the potential expenses and to use indexes with no LLM calls (SummaryIndex, SimpleKeywordTableIndex) and cheaper LLMs, though the latter may result in quality trade-offs. Caching and reusing indexes instead of rebuilding them also saves costs. To maintain data security, developers use local LLMs instead of hosted services.

How Belitsoft Can Help: Customizing and Deploying Your LlamaIndex Project

Belitsoft is a custom software development company offering full-cycle generative AI implementation services, including selecting the AI model architecture (LLM vs. RAG), configuring infrastructure (on-premises vs. cloud), fine-tuning models with domain-specific data, integrating AI systems with organizational software, testing, and deployment. By outsourcing development to Belitsoft, companies can create customized RAG applications tailored to their unique business needs.

Belitsoft can set up RAG workflows on OpenAI models, as well as on commercial and open-source models hosted locally to provide private options and reduce costs for large-scale projects. Running an open-source LLM on customer hardware requires significant memory and computational resources. To address this, Belitsoft applies quantization, a post-training optimization technique that reduces memory usage and speeds up processing.

Belitsoft ensures the correct operation of RAG pipelines through careful evaluation. This process involves assessing retrieved nodes, output quality, scalability, and the ability to handle diverse queries, adversarial inputs, and edge cases. If you're looking for expertise in LLM development services, custom LLM training, or AI chatbot development, we are ready to serve your needs. Contact us today and we will discuss your project requirements.
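For reference, the quantization step mentioned above might look like this with the Hugging Face transformers and bitsandbytes stack; this is a sketch, and the model name is a placeholder for whichever open model fits the project:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",   # placeholder open-source model
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
```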
Dmitry Baraishuk • 7 min read
Truewind AI ($17M): AI Use Case in Finance and Accounting
What is Truewind?

Truewind is a generative AI company based in San Francisco that has raised $17M in total funding over two rounds from 15 investors (Rho Capital, Thomson Reuters Ventures, Pathlight Ventures, Fin Capital, and Y Combinator). It has over 100 customers, including accounting firms like EisnerAmper and Frank Rimerman.

How Truewind Uses AI for Accounting

Transaction Classification

Transaction classification (coding each transaction to the proper account, department, etc.) is a core bookkeeping step and a standard part of every accounting close process. All financial records depend on correctly classifying transactions in the general ledger. This includes identifying the payee, determining the account category (expense vs. asset), and matching payments to invoices. Virtually 100% of accounting workflows involve transaction coding in some form. Every industry and company size (beyond perhaps the smallest cash-only operations) requires sorting transactions into accounts. While the complexity may vary (a tech startup vs. a manufacturing enterprise), the act of categorizing transactions (whether manually or via software rules) is universal in accounting.

Bookkeepers and accounting staff are the primary people performing this workflow. Coding transactions is a repetitive, high-volume task that can consume a large portion of an accountant's time. Accountants often spend hours identifying payees, choosing accounts, and adding tracking classes for thousands of transactions. Traditional accounting software offers limited automation, often requiring manual rule creation and constant oversight. This means the process is prone to errors (misclassified expenses) and drudgery, making it one of the more burdensome parts of bookkeeping. For transaction-heavy businesses, this pain is acute. In fact, automating even 90% of high-volume items (like credit card charges) saves substantial time and reduces errors, underscoring how much effort manual classification normally takes.

Truewind AI classifies each transaction before it is synced (posted) into the client's accounting system. It assigns the correct category (expense type or asset type), payee (vendor or merchant name), class (department, location, project), etc. Truewind also gives explanations and a confidence score, flags exceptions or low-confidence transactions for manual review, and allows bulk approval or review. The accountant opens their dashboard and finds that most transactions are classified and posted, exceptions are neatly flagged with explanations, and the ledger is nearly closed. All they need to do is review a small handful of items and approve.

Transaction auto-classification using historical data is a classic LLM classification task, and AI can achieve high accuracy (over 90%) in context-aware classification for this workflow. The manual coding step takes up to 60–70% of the total time spent on the transaction classification workflow. After automation, this drops to 5–10%, leaving only the review of flagged items. The AI classifies, flags exceptions, shows transactions in a structured format, and allows quick syncing to the ledger. The Truewind demo screenshot shows automated classification of bank and credit card transactions by category, payee, and class, with confidence indicators and review options before posting to the accounting system.
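A hedged sketch of this pattern (category, payee, class, confidence, and a review flag), assuming the OpenAI Python SDK; it illustrates the general technique, not Truewind's actual implementation, and the model name and threshold are placeholders:

```python
import json
from openai import OpenAI

client = OpenAI()
CONFIDENCE_THRESHOLD = 0.85  # below this, route to manual review

def classify_transaction(description: str, amount: float) -> dict:
    prompt = (
        "Classify this bank transaction for the general ledger. "
        'Respond as JSON: {"category": ..., "payee": ..., '
        '"class": ..., "confidence": 0-1}.\n'
        f"Transaction: {description}, amount ${amount:.2f}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # force parseable output
    )
    result = json.loads(resp.choices[0].message.content)
    # Low-confidence items are flagged rather than auto-posted.
    result["needs_review"] = result["confidence"] < CONFIDENCE_THRESHOLD
    return result

print(classify_transaction("AWS EMEA monthly invoice", 1284.50))
```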
Accrual Automation

Accrual workpapers are a fundamental component of the month-end close in accrual accounting. They are used to record expenses and revenues in the correct period (prepaid expenses, accrued liabilities), ensuring compliance with GAAP/IFRS. Any company using accrual-basis accounting (virtually all beyond the very small cash-basis businesses) must prepare such workpapers each period, especially subscription-based businesses requiring precise monthly accruals.

Corporate accountants and controllers on accounting teams prepare and use accrual workpapers. This includes staff accountants compiling entries for accrued expenses or deferred revenue, and accounting managers or controllers reviewing them. Audit teams also examine these workpapers during financial audits, though the creation is done by the accounting department.

Maintaining accrual schedules manually is time-consuming and error-prone. Accountants often juggle disjointed Excel spreadsheets for items like prepaid expenses and fixed asset depreciation. It's considered a tedious workload that adds to close deadlines.

Truewind automates accrual workpapers. AI prepares a clean, structured, audit-ready accrual package that's 95% done, where the accountant only needs to double-check flagged items and click "approve". The system uses AI to read the transaction description and decide whether it's a prepaid expense that should be spread over future periods or a regular expense to book right away. This helps make accrual entries accurate and automatic. Preparation of accrual workpapers takes up to 50% of the entire accrual workflow, but after automation this time is reduced up to tenfold, and the entire accrual workflow becomes approximately 30–40% faster. Truewind AI automatically suggests accrual entries based on historical data and transaction patterns; the accountant simply reviews, selects, and confirms entries.

AI Reconciliation Tool

Reconciliation is a cornerstone of accounting and financial reporting. Financial records (like the general ledger) must match external statements or subledgers. Common examples are bank reconciliations (matching the books to bank statements) and intercompany or subledger reconciliations. This process is fundamental for verifying that all transactions are recorded completely and accurately. In practice, every period-end close includes multiple reconciliations to catch discrepancies.

Any organization that keeps books performs reconciliations regularly. Mid-size and large companies typically reconcile every bank account, credit card, and key balance sheet account each month. Even smaller companies perform at least bank and cash reconciliations. Accountants and controllers are responsible for reconciliations. Staff accountants or accounting analysts usually do the detailed matching, for example, ticking and tying transactions in bank recs or reconciling subledger reports (AR, AP, inventory) to the general ledger. Accounting managers or controllers then review and sign off on these reconciliations. External auditors also heavily scrutinize reconciliations during audits to verify that balances are supported.

Reconciliation is a major bottleneck in the month-end close, often requiring hours or even days of manual work to match transactions line by line in Excel or other tools. High-volume accounts (such as cash or intercompany clearing accounts) make the process even more time-consuming, forcing teams to work late nights just to complete reconciliations.
This error-prone, stressful "ticking and tying" process is a key reason why 82% of accountants view the close negatively, according to surveys. Truewind is currently working on AI-powered matching and anomaly detection, but it's not in production yet.

AI Contract Management Software

Contract management is about aligning financial records (revenue recognition, lease liabilities, etc.) with contract terms (start/end dates, payment terms, performance obligations, etc.). For example, under revenue recognition standards (ASC 606 / IFRS 15), accountants must identify all key contract terms to allocate and time the revenue properly. This workflow is essential for software, telecom, construction, leasing, and other industries. Many large companies have material contracts (SaaS subscriptions, multi-year sales agreements, vendor contracts, leases), so they require this process.

Often a technical accounting or revenue recognition team reviews customer contracts to determine how to record revenue or expenses. They coordinate with legal or sales operations to obtain contract details. Auditors also pay close attention to contract accounting (for example, checking that revenue is recognized according to the contract terms). Extracting key data from contracts is typically manual: accountants must read lengthy agreements and enter key details into spreadsheets or an ERP.

AI streamlines contract management by employing optical character recognition (OCR) and natural language processing to automatically extract key terms from complex sales contracts. For example, the system pulls out payment schedules, renewal dates, deliverables, cancellation clauses, and other critical data directly from the contract documents. Rather than an accountant manually poring over each PDF, AI scans the text and populates those details into the accounting system or workpapers.

Complex Contracts

Complex contract management is an extension of contract management for businesses with highly customized or multi-element contracts. It covers scenarios like contracts with multiple performance obligations, custom pricing and billing terms, contract amendments, and revenue that must be recognized over time or under various conditions. Managing these manually is difficult, as revenue often needs to be broken into multiple components over time. For example, a SaaS company with custom contracts splits a contract into software license revenue, service revenue, and subscription revenue over different periods. Enterprise software companies, telecommunications firms, defense contractors, and any industry offering configurable bundles and multi-year deals have complex contracts.

Complex contracts are one of the most challenging areas of accounting. Many ERPs don't fully support intricate arrangements out of the box, so companies resort to manual processes or expensive add-on systems. The workload involves reading through unique contract terms, manually configuring billing or revenue schedules, and constantly updating them for contract changes (amendments, renewals).

The AI platform ingests a complex contract and then automatically generates the necessary sales orders or accounting entries directly in the ERP. For example, if a contract has multiple components (subscription fee, one-time setup fee, future rate increases, etc.), the system parses those details and creates accurate billing schedules. Truewind is currently working on AI-powered contract management, but it's not in production yet.
AI automates complex processes and improves decisions in finance and accounting. If you're looking to build an AI system, our guide on AI classification model costs breaks down the key factors that influence development budgets and strategic AI investments. For a real-world example, see how BloombergGPT leverages AI for financial document classification, sentiment analysis, and more.

Automation of Flux Analysis in Accounting

Flux analysis (fluctuation analysis) is a common process during financial close and review. It compares account balances or financial line items between periods (month-over-month or year-over-year, or against budget) and analyzes significant variances. In many organizations, flux analysis is part of internal controls or audit preparation, as management and auditors ask, "Why did this number change so much from last period?" It is therefore a standard analytical step in the accounting/FP&A workflow for medium and large companies. Public companies, for example, must explain period-to-period changes in financial statements (Management Discussion & Analysis), so flux analysis is mandatory for them.

Flux analysis is used by those who need insight into financial changes: corporate accounting teams produce the analysis, and CFOs, controllers, audit committees, and auditors consume the results. Flux analysis can be quite tedious and is often cited as a pain point, especially because it's traditionally done with a lot of manual work in spreadsheets. Accountants export trial balances to Excel, compute variances, and then manually write explanations or investigate transaction details for the differences. This process is time-consuming, and it becomes frustrating when late adjustments require redoing the analysis. Truewind is currently working on automating flux analysis using advanced algorithms and AI, but it's not in production yet.

Pro-forma Forecasting

Pro-forma and forecasting tools are used to create projected or "as if" financial statements (for example, forecasting the next quarter's income statement, or combining entities for a what-if scenario). While not part of ledger bookkeeping itself, they rely on accounting data and are often maintained by finance teams in conjunction with accounting. Many accounting departments (especially in firms that offer client advisory, or in family office contexts) include forecasting as a service to provide insight to management.

Nearly all public companies and many mid-size to large firms use these tools for budgeting, cash flow forecasting, and strategic planning, such as annual budgets and fundraising. In accounting firms, offering projections and advisory is a value-add, especially for larger clients. So while not a required step for compliance, in practice most companies perform this workflow to guide decision-making. In a corporate setting, the FP&A (Financial Planning & Analysis) department typically takes actual accounting data and projects it forward; these are the primary users of forecasting software.

Financial forecasting and pro-forma budgeting are quite challenging. The pain comes from dealing with uncertain variables, large datasets, and the limitations of manual tools. Many teams still use spreadsheets for budgeting. According to a CFO study, 56% of finance leaders said the growing complexity of forecasting and budgeting is their greatest internal challenge.
Common pain points include consolidating data from multiple sources, updating forecasts for actual results, and running multiple scenarios without a good system. If done manually, producing a rolling forecast or pro-forma statements can consume a lot of time and still yield inaccurate results, making it a notable pain area in finance. Truewind is currently working on a cash flow projection feature that uses real-time data from the books to predict future cash positions. The system employs predictive analytics and machine learning to analyze historical cash patterns, upcoming payables/receivables, and other inputs, and then produces a forecast. However, it's not in production yet.

Intercompany Reconciliation Automation

For companies with multiple legal entities, managing intercompany transactions is a critical part of the accounting and consolidation workflow. These financial dealings between subsidiaries or business units under the same parent, such as sales of goods from one subsidiary to another, or shared expenses, must be recorded in each entity's books and then eliminated or adjusted upon consolidation so that they don't inflate the group's overall financials. Intercompany accounting also involves transfer pricing adjustments, foreign exchange for cross-border intercompany transactions, and reconciliation of intercompany balances. It's considered one of the most complex routine tasks in finance because of the volume and the need for precision. For any sizable or global company with subsidiaries, this process is absolutely required.

In fact, many large enterprises have dozens or hundreds of entities and thus heavy intercompany activity (e.g., cross-charges, intercompany sales, loans between entities). In those environments, intercompany accounting is unavoidable and happens every month. Nearly all multinational corporations must deal with it, often at a large scale, since intercompany entries are multiples of external transactions in volume.

Accounting teams at multinational or multi-entity companies, including consolidation accountants, corporate controllers, and treasury or tax specialists, use this workflow. The process starts with local entity accountants recording intercompany invoices or transfers, but the critical part is handled by corporate accounting during consolidation (to reconcile and eliminate those intercompany balances). Sometimes a company has a dedicated "intercompany accountant" or a team focused on intercompany reconciliations. The tax department also gets involved, because transfer pricing (the pricing of intercompany sales) has tax implications.

Intercompany accounting is widely recognized as a headache for finance teams. Surveys have found that 96% of finance stakeholders report challenges with intercompany processes, and an astounding 90% said their staff pull all-nighters at least once a year solely due to intercompany issues. The pain comes from difficulties in matching transactions between entities (discrepancies in timing or amounts), dealing with different currencies and exchange rates, and ensuring all parties record the transactions consistently. Overdue intercompany reconciliations can linger for years, causing uncertainty and even risk of audit issues. Nearly half of companies report that unresolved intercompany imbalances create business uncertainty and risk of compliance problems.

AI and intelligent rules automatically identify intercompany pairs. For example, if one entity records an intercompany sale, the system finds the corresponding purchase in the other entity.
During consolidation, AI automatically eliminates those intercompany entries so they are not double-counted in the consolidated financials. This ensures compliance with accounting standards (GAAP/IFRS require that intercompany revenue and expenses be removed) and saves a huge amount of manual work. The system also manages currency conversion and exchange rate adjustments in real time, which is a big part of intercompany accounting for global companies. Truewind is currently working on intercompany reconciliation automation with AI, but it's not in production yet.

How Belitsoft Can Help

Belitsoft offers standout software engineers to build AI and machine learning applications, as well as data-intensive distributed systems, for finance and accounting. We also provide exceptional full-stack engineering services to architect and develop software products that use AI technology to reimagine traditional bookkeeping and finance processes. Our developers create web applications and LLM applications using both custom LLMs and the OpenAI API integrated with vector databases. They also build and manage the technical infrastructure supporting databases, servers, and APIs. Partner with Belitsoft to automate your accounting workflows with generative AI. By outsourcing to our software developers, you eliminate manual entry, receive audit-ready outputs, and generate accrual workpapers, financial statements, and variance reports in minutes, ready for an accountant to review and flag items rather than rebuild them. Contact us to discuss your project.
Dmitry Baraishuk • 10 min read
Generative AI App Development Company
Stages of Generative AI Application Development

While AI is new, it's just another tool. How do you get started building an application that uses generative AI? There are three main stages: ideation and experimentation, building, and deployment.

Ideating around exploration and proof of concepts

As each use case is unique, our clients need a specialized model that does the job well. After understanding exactly what you're planning to use generative AI for, we start by researching and evaluating models from popular repositories like Hugging Face or the open-source community. Our generative AI developers consider the model size and its performance, benchmarking through popular benchmark tools. We run tests, checking options against your previously identified use case and deployment needs.

We consider three factors. First, accuracy: how close the generated output is to the desired result. It can be measured objectively and repeatably with benchmarks. The second factor is the reliability of the model, determined by consistency, explainability, trustworthiness, and how well the model avoids toxicity like hate speech. The third factor is speed: how quickly a user gets a response to a submitted prompt. Finally, you choose the option that provides the most value. Generally, self-hosting a large language model is cheaper than a cloud-based service, plus you get the added benefit of knowing your data is secure and private on-prem. And small language models (SLMs), versus large language models (LLMs), generally perform better with lower latency and are specialized for specific tasks.

When working with models and experimenting with your data, our generative AI developers rely on various prompting techniques:

• Zero-shot prompting: asking the model a question without any examples of how to respond.
• Few-shot prompting: giving a few examples of the behavior we want the LLM to exhibit.
• Chain-of-thought prompting: asking the model to explain its thinking step by step.

We explain early on the capabilities and limitations of the models we're going to work with, so that you understand any potential challenges that might come up along the way.

Time to Build the Gen AI Application

Our clients often want to use their own data with the large language model of their choice. There are a few methods to do this. Retrieval-augmented generation (RAG) takes a large pre-trained foundation model and supplements it with relevant, accurate data, which produces better and more accurate responses. Fine-tuning takes the large language model and trains it on your data, so you're actually baking the information, behavior, and style you want into the model itself. Then you can run inference against it and have that domain-specific data every time you interact with the AI model. BloombergGPT exemplifies how domain-specific AI improves results in finance.

Having the right tools and frameworks, like LangChain, helps us cut AI development costs. They let us focus on building new features (chatbots, IT process automation, data management, and much more) by simplifying the different calls your app makes to the model. Through sequences of prompts and model calls, your app can accomplish more complex tasks: breaking down problems into smaller, more manageable steps and evaluating the flows during these model calls. A short chaining sketch follows below.
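As referenced above, a minimal sketch of chaining two model calls with LangChain's expression language, assuming the langchain-openai package is installed; the model name, prompts, and sample ticket are illustrative:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder model
parser = StrOutputParser()

# Step 1: extract the key facts from a support ticket.
extract = ChatPromptTemplate.from_template(
    "List the key facts in this support ticket:\n{ticket}") | llm | parser

# Step 2: draft a reply from those facts. The output of the first
# call becomes the input of the second.
reply = ChatPromptTemplate.from_template(
    "Write a short, polite reply addressing these facts:\n{facts}") | llm | parser

chain = {"facts": extract} | reply
print(chain.invoke({"ticket": "My invoice was charged twice this month."}))
```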
Deploying Gen AI-Powered Applications

So finally, you've got an application powered by AI or a large language model, and we deploy it to production to scale things up. This falls under the umbrella of machine learning operations, or MLOps. The infrastructure needs to handle efficient model deployment and scaling. Technologies like containers and orchestrators like Kubernetes help us do this: auto-scaling and balancing traffic for your application. We also use production-ready runtimes like vLLM for model serving. What we're seeing now is that organizations are taking a hybrid approach with both their models and their infrastructure. That means a "Swiss Army knife" setup: different models for different use cases, as well as a combination of on-prem and cloud infrastructure to get the most out of resources and budget. With AI-powered applications running in production, the job isn't done. We still need to benchmark, monitor, and handle exceptions coming from your application. Just as DevOps gets software into production smoothly, MLOps does the same for models.

What To Expect From Belitsoft Generative AI Development Services

Fast Development and Quick Time-to-Market

Business leaders feel market pressure to realize value quickly with AI, so it's natural for them to push for initial prototypes within weeks and full solutions in a few months. Organizations run numerous pilot projects before scaling. Surveys have found most companies running up to 20 genAI experiments, expecting that up to 30% of their AI proofs-of-concept would be fully scaled within 3–6 months. While generative AI enables faster development than traditional software, meaningful deployment (with proper testing and governance) may take 6–12+ months for complex enterprise use cases. Our generative AI product engineering team prototypes fast to support short product cycles, focusing on projects that promise quick wins (like automating a specific task).

Competitive Pricing

Generative AI projects don't have a fixed price tag; the range is huge. On the low end (a couple of thousand dollars), you're looking at basic projects. On the high end (several hundred thousand dollars), it's a completely different story: large, enterprise-level systems. Several factors drive this cost range: project scope, level of customization, data requirements, infrastructure complexity, team salaries, and whether existing AI services can be used or a custom solution needs to be built from scratch. For example, fine-tuning a large language model on proprietary data is far more resource-intensive than using an off-the-shelf API, and comes with a price tag to match. Outsourcing to specialized or offshore AI development firms like Belitsoft cuts development costs by half while maintaining appropriate control and oversight. Our development team, depending on the client's request, suggests optimizing spend by reusing pre-trained models (avoiding full model training costs), controlling cloud compute time, and focusing on high-value use cases. We also recommend strategies like hybrid cloud architectures to manage costs (ensuring workloads run in the most cost-effective environments) and monitor usage to prevent budget overruns.

Customization

Customizing a genAI model means adjusting it for specific tasks and using the company's unique data. Our customers expect a high degree of adaptability in their generative AI solutions. They want one tuned to their business needs.
Our engineers fine-tune or extend models so that outputs are domain-specific and workflows align with their operations. To embed generative AI apps into customers' products and processes, we develop API integrations, including the OpenAI API, into the CRM systems, websites, mobile apps, or data pipelines they already use, often replacing traditional SaaS modules with vibe-coded alternatives. A generative model will pull context from a client's database and comply with the existing software architecture.

The Right Team & Proven Expertise

Successful AI projects should involve people who understand the business context, to make sure the AI solves the right problem aligned with industry-specific requirements (healthcare regulations, financial data formats, etc.). Building generative AI applications requires a multidisciplinary team, and our company has the right expertise. Our AI/ML specialists, data scientists, and machine learning engineers know how to design, train, and fine-tune models. Our software engineers integrate AI into products and configure the deployment pipeline. You can also expect roles like UX designers (to create intuitive AI-driven experiences) and project managers who keep development on track.

The required team profile that Belitsoft provides differs depending on the client. For startups and small companies, we provide "full-stack" AI engineers who can wear multiple hats, from model development to frontend coding. It's now possible for even a very small team to achieve a lot with generative AI (the concept of a "one-person unicorn" is emerging, where a solo founder uses genAI tools to build a product). So, in the early stages, a startup relies on one or two of our ML engineers to rapidly prototype using public APIs, given budget constraints. For enterprise clients like Fortune 500 companies, we also provide larger, cross-functional teams for AI projects. We act as a development partner, bringing in data engineers (for pipelines and data prep), MLOps engineers (for model deployment and monitoring), and security experts. Belitsoft's AI development team has deep expertise in AI techniques, experience in delivering AI projects, and maturity in security, scale, and cross-department coordination. Our AI experts implement best practices in data pipelines, AI cost optimization, and iterative model improvements to ensure your AI solution delivers maximum value.

Challenges & Best Practices in Generative AI Projects

Common Development Roadblocks

Generative AI development is not plug-and-play. It's hard work, and organizations have learned they often need more time than expected to overcome adoption challenges.

Data issues. "Garbage in, garbage out" applies strongly to AI. Many firms find their data is siloed, unclean, or simply not ready to train high-quality models. Integrating relevant data into AI workflows, and doing so without violating privacy, is not a trivial task.

Cost and ROI concerns. At least 30% of generative AI projects are abandoned at the pilot stage due to unclear value or unforeseen complexity. Early hype leads to unrealistic expectations, and when a proof-of-concept doesn't deliver a quick win, stakeholders lose confidence.

Integration challenges. Hooking a genAI system into legacy systems or workflows often reveals incompatibilities or performance issues: latency problems when an AI service calls an external API, or trouble containerizing a model that wasn't designed for production deployment.

Best Practices for Overcoming Challenges

Take a phased approach.
Begin with a pilot on a well-scoped problem (a "ground game" of small wins) and use those results to fund and inform more ambitious AI projects. Establish procedures to monitor genAI tools, and set up AI usage policies and controls (such as guardrails against biased or unsafe outputs) as part of the project plan. This proactive governance helps prevent problems and reassures stakeholders that risks are under control.

Focus on data readiness and engineering. Since data quality is a common roadblock, successful teams often start with a data audit and improvement phase. Create data engineering pipelines and build new data infrastructure (like data lakes or vector databases) if needed to support the genAI model. Techniques like capturing metadata, building knowledge graphs, and using efficient fine-tuning methods maximize the value drawn from enterprise data. Privacy-by-design is key: teams anonymize or segregate sensitive data from the outset so it never inadvertently leaks through a generative model.

Implement FinOps (financial operations) for AI, tracking and optimizing AI compute usage to keep cloud costs sustainable.

Treat genAI initiatives with a product mindset. Rather than running one-off experiments, teams should plan for continuous improvement, as AI development trends now emphasize sustained iteration and long-term value: iterative model updates, regular user feedback, and new feature rollouts over time. This "product approach" means that even after initial deployment, the AI application is monitored and refined, just like any software product. Best practices include setting up A/B testing for model versions, maintaining a feedback loop from end users (to catch issues like irrelevant or incorrect outputs), and scheduling periodic retraining or prompt tuning as data or business needs evolve.

Looking to develop a high-performing generative AI app? We go beyond model selection and provide clean data, cost control, robust integration, and continuous optimization. Belitsoft helps with AI adoption challenges from day one through data readiness, secure governance, and a product-focused approach. Contact our team.
Dmitry Baraishuk • 7 min read
Custom AI Chatbot Development Services
Custom AI Chatbot Using Your Own Proprietary Data

A key feature distinguishing custom chatbots from generic ones is the data they're built on. A "custom" chatbot knows your business inside out, often by being trained on the company's own data. Off-the-shelf bots rely on broad, general training data, whereas a custom chatbot is typically powered by data specific to the company's domain (e.g., product info, internal documents, FAQs). This means the bot can answer questions and respond to requests with context that's unique to that business, delivering more relevant, accurate responses. Belitsoft developers train NLP models on a company's knowledge base so the bot understands your industry jargon and detailed product terminology.

Custom Enterprise AI Chatbots

Larger enterprises tend to pursue full-spectrum customization. They often need the chatbot to tick all the boxes (proprietary knowledge, integrations, branding, advanced features) because they have diverse use cases and high volumes. Each industry emphasizes the elements of customization that align with its business model and customer expectations (compliance in banking, rich functionality in retail, domain expertise in healthcare, etc.).

For regulated industries like finance or healthcare, where companies place a huge emphasis on security, compliance, and control, you can expect your custom chatbot to run in a secure environment (sometimes on-premises) and follow all data privacy rules. The solution keeps data in-house and abides by industry standards. For example, a hospital might require that the AI chatbot be HIPAA-compliant and trained only on its vetted medical content. For industries like e-commerce, which are extremely customer-facing, competitive on experience, and focused on personalization and integration, businesses need custom chatbots that can recommend products, apply promo codes, or check inventory with tight integration into e-commerce platforms. For travel agencies, an AI bot may be integrated with booking systems and remember user preferences (window seat, hotel ratings) to personalize recommendations. In manufacturing or tech support, custom chatbots can interface with IoT devices or databases to provide real-time status updates and troubleshoot equipment issues.

Belitsoft provides custom AI chatbots for large enterprises as deeply integrated, scalable solutions. We plug the AI bot into your existing systems (CRM, ERP, databases). These bespoke enterprise AI chatbots align with your workflows, manage complex, multi-step tasks across departments, and cover diverse use cases and multiple purposes (customer support, HR self-service, IT helpdesk, etc.).

Custom AI Chatbots for Startups

What counts as "custom" can shift as a company scales: a feature that a small business finds uniquely tailored (say, a bot that answers its top 50 FAQs) is considered basic by an enterprise. Conversely, an enterprise-grade custom chatbot (with millions of interactions, multiple integrations, and AI fine-tuning) would be overkill for a small business. Small companies typically focus on solving a specific problem quickly. Many startups begin with off-the-shelf or no-code chatbot tools to quickly set up a basic Q&A or FAQ bot, calling it "custom" once they load in their own content and branding. These non-enterprise buyers value quick deployment and affordability, so they often do not initially require heavy integration or advanced AI features.
For them, we create custom chatbots with minimal complexity, tailored to their niche use case or brand. As the business grows and expectations evolve, we provide more advanced customizations (such as adding sentiment analysis or connecting to proprietary systems) once basic solutions fall short.

Custom AI Chatbots from Scratch

Our AI engineers also build or fine-tune AI chatbots based on open-source frameworks for technical buyers. They use programming frameworks and AI models to craft chatbots from the ground up to meet exact specifications. We train a custom language model on in-house data, implement advanced algorithms for intent detection or sentiment analysis, and deploy the solution on the client's own infrastructure. The result is a highly configurable solution that can be adjusted at a granular level, ideal for those who want full control over the chatbot's behavior and tech stack: from the NLP engine to integrations, along with extra capabilities like API access and on-premise deployment for data control. Rely on our expert support to set those features up.

AI Chatbot Customization Beyond Proprietary Data

Customization isn't only about data. While many buyers equate a "custom AI chatbot" with "a chatbot trained on our proprietary data", others seek different types of customization. We also work with businesses that prioritize custom conversation flows or logic (even without extensive data training), or the configuration of how the chatbot interacts with users to fit a specific use case. Such custom AI chatbot projects usually involve tailoring multiple elements.

1. Executing specific workflows or decisions unique to companies' operations

We work with businesses that have niche requirements beyond basic chat capabilities. For example, healthcare providers that need the bot to securely collect patient symptoms and offer preliminary guidance within regulatory boundaries, or SaaS companies that need the chatbot to perform technical troubleshooting steps interactively. Custom chatbots have capabilities uniquely suited to each business, whether it's a complex calculator embedded in the chat for finance or an AI that remembers a user's past interactions to personalize the next response.

Custom bots are designed to follow complex, industry-specific workflows that generic bots can't. This includes guiding users through multi-step processes (like an insurance bot walking a customer through a claims submission) or automating tasks like ticket creation or appointment booking. We define these custom workflows so the bot carries out the exact tasks the business requires, following the same sequence or logic a human agent would.

A "custom" chatbot handles specialized functionality tailored to the business domain. Features like advanced analytics dashboards, A/B testing capabilities, or machine learning enhancements (e.g., sentiment analysis, context awareness) fall under this umbrella. Other advanced features include multilingual support (serving customers in different languages), omnichannel deployment (website, mobile app, WhatsApp, etc.), and interactive elements like forms and buttons within the chat.

2. Integration capability is a top request beyond Q&A knowledge

Many businesses define "custom" in terms of how well the chatbot integrates with their existing software ecosystem. We connect chatbots with your internal systems, databases, and third-party services to pull or push information in real time.
API connectors to CRMs, ERPs, ticketing systems, and more allow the bot to transact and retrieve data on the user's behalf, making it far more powerful than a standalone FAQ bot (a minimal sketch of this pattern appears at the end of this article). For example, we integrate e-commerce bots with inventory and order management systems so they can check product availability or order status during a conversation. Similarly, we integrate a bank's chatbot with account databases to provide balances or transaction history with proper authentication.

Where to Start? Primary Objectives First

When seeking a custom AI chatbot, prioritize what to customize based on your industry and use case. Accuracy and task competence tend to be the top concerns: businesses want the chatbot to actually solve the problem it's intended for, whether that means answering correctly or completing an action. Generally, companies start by focusing on whatever will deliver the most value quickly. For many, that means using proprietary knowledge. The bot must accurately answer questions about their products, policies, or services, so feeding the bot company data (or integrating a knowledge base) is often the first priority. On the other hand, if the goal is transactional (automating a process), then workflow customization and system integrations become equally important from the start. A retailer deploying a chatbot for shopping assistance often prioritizes integration with product databases, while a consulting firm focuses on ingesting its document library.

How Belitsoft Can Help

Belitsoft is an agency that tailors projects to client needs and builds truly custom AI chatbot solutions: we do not rely solely on templates. Our custom AI chatbot development services are designed for clients who have a clear vision of what they need and want a partner to tailor a chatbot for them. As a custom AI chatbot provider, we take a consultative approach, understand the client's unique challenges, and craft a solution accordingly. You can expect a consultative sales process, flexibility in scoping the project, and evidence of past tailored solutions.

This bespoke approach suits businesses looking for an expert partner rather than just any chatbot provider. They weigh value against cost, and for them, a custom chatbot delivers exactly what they need, including better ROI. We develop AI chatbots for companies that don't want to pay extra fees each month based on the number of dialogues. We work with businesses that see their project as complex or strategically important; such clients often equate custom work with higher quality and better alignment with their business. If you assume that opting for a custom-built AI chatbot will require a larger budget, perhaps for extensive development hours, AI training, and ongoing adjustments, just inquire about our custom services. We also offer affordable, fixed-cost solutions using existing tools or templates for clients who know they want a chatbot but are not yet ready for a fully custom project.

Belitsoft builds custom AI chatbots for enterprises and startups, delivering full-cycle generative AI implementation services that include choosing the right AI model architecture (LLM, RAG), configuring infrastructure (on-premises vs. cloud server), and integrating the OpenAI API for advanced conversational capabilities. Our team fine-tunes models with proprietary data, integrates them with your systems, and tests them. Contact us today and we will discuss your project requirements.
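As referenced above, here is a minimal sketch of how a chatbot can call a business system mid-conversation using the OpenAI tool-calling API. The get_order_status function is a hypothetical connector into a client's order-management backend, and the model name is an illustrative assumption.

```python
# Minimal sketch: a chatbot calling a business system during a conversation.
# Assumptions: OPENAI_API_KEY is set; get_order_status() is a hypothetical
# stand-in for a real e-commerce/ERP connector.
import json
from openai import OpenAI

client = OpenAI()

def get_order_status(order_id: str) -> str:
    # Stand-in for a real call to an order-management API.
    return json.dumps({"order_id": order_id, "status": "shipped", "eta": "2 days"})

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

messages = [{"role": "user", "content": "Where is my order 12345?"}]
resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)

call = resp.choices[0].message.tool_calls[0]            # the model requests the lookup
result = get_order_status(**json.loads(call.function.arguments))

# Feed the tool result back so the model can phrase the final answer.
messages += [resp.choices[0].message,
             {"role": "tool", "tool_call_id": call.id, "content": result}]
final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(final.choices[0].message.content)
```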
Dmitry Baraishuk • 6 min read
Custom LLM Case Study: Innovaccer (Healthcare)
What is Innovaccer?

Founded in 2014, Innovaccer is a population health data and analytics platform deployed across more than 1,600 hospitals and clinics in the US. It integrates healthcare data (clinical, claims, etc.) across electronic health records and other software systems. To date, San Francisco-based Innovaccer has raised a total of $675M from leading venture capital firms and strategic investors. Its revenue has increased by 50% every year for the past five years, and the company is on track to hit $250 million in annual recurring revenue in 2025.

Generative AI "is, and will continue to be", a key tool in Innovaccer's AI toolkit, according to their team. Since 2023, they have been developing a suite of models trained in the healthcare context (semantics, security, privacy, and regulatory requirements) to create chatbot assistants for executives, clinicians, care managers, and care marketers.

Why does Innovaccer Use Custom LLM models?

Innovaccer relies on its own LLMs, trained on proprietary data and delivering results through a custom chatbot, to:

• enhance the accuracy of AI model responses to questions
• reduce issues common with generative AI, such as hallucinations
• ensure that their AI models are secure and compliant

"The proprietary models at Innovaccer are trained to understand thousands of healthcare entities, concepts, formulas, and disease conditions. Our AI models are 15% more accurate than commercially trained models, including ChatGPT-4, when answering frequently asked questions about healthcare data. Healthcare is far more complex than any other industry, encompassing a vast array of knowledge, from biomedicine to medtech to primary care to specialties to reimbursement models and beyond. It is crucial to train models that understand both the science and business of healthcare." — Innovaccer

Is Innovaccer's Focus on Custom LLMs Relevant to Market Trends?

Everybody in the software market knows there has been a slowdown in venture capital investment in recent times. GenAI investments, however, seem immune to it: well-funded GenAI startups continue to emerge, mature, and rise across many sectors. Moreover, according to Gartner's GenAI prediction No. 1, "demand will increase for domain-specific GenAI models" (typically custom LLMs trained on proprietary or industry-specific data).

"Combined with the increased availability of high-performing and commercially usable open-source large language models, there is an appetite for domain-specific models. By 2027, more than 50% of the GenAI models that enterprises use will be specific to either an industry or business function—up from approximately 1% in 2023. Before you build your own, look for off-the-shelf, domain-specific models you can train or tune to accommodate your enterprise needs." — Gartner

As Gartner notes, "general-purpose models perform well across a broad set of applications." However, "domain models can be smaller, less computationally intensive, and lower the hallucination risks".

Custom LLM Use Cases in Healthcare by Innovaccer

Custom LLM use cases in healthcare by Innovaccer include LLM-based extensions to traditional applications already used in health systems, such as healthcare business intelligence, healthcare document management, clinical decision support, and customer support tools.
Such extensions provide greater benefits than traditional systems alone and cannot be built on generic systems like ChatGPT, due to the requirement for deep domain expertise, security considerations, and long-term business vision (Innovaccer sells enterprise-wide products rather than just testing hypotheses).

Custom LLM for Healthcare Business Intelligence (BI)

One of Innovaccer's chatbot assistants shows how they trained an LLM to create a generative AI-powered decision support tool that combines retrieval-based AI, BI, and predictive analytics to give executives interactive answers to complex questions without involving their data teams or writing database queries.

Executives struggle to obtain timely, relevant insights. Even with BI software, they depend on data analysts equipped with specialized tools, and the back-and-forth with analysts leads to delays: executives wait for final results instead of getting real-time insights. With this specially trained chatbot assistant, health system leaders can ask questions in plain English, then drill down and refine their questions to pinpoint exactly what they need in minutes.

Innovaccer's Business Intelligence (BI) Chatbot

For example, they can ask "why are readmissions high?" and the LLM will:

• suggest root causes based on performance data
• list patients who require support to prevent readmissions
• even refine triage strategies for interventions

The visibility of information in reports is restricted based on user access rules and the company's security and compliance standards, so data is shared only with the right people. In a previous article reviewing the capabilities of tools adopted for healthcare business intelligence, we already showed how such tools work.

Custom LLM for Healthcare Documents Analysis, Extraction, Summarization, and Classification

Another example of what a custom LLM implementation looks like in practice is the integration of a generative AI assistant into the interface used by the care management team. Care coordinators listen to patients and develop care plans. However, documentation alone consumes around 25 hours per week per care coordinator.

After speech is converted into text by a healthcare automatic speech recognition model (which listens to conversations and generates transcriptions), a custom healthcare LLM comes into play. It creates smart summaries of calls, generates care plans, and fills out structured forms (with symptoms, medications, and diagnoses), allowing coordinators to move on to the next patient faster. These outputs require some edits, but they are not generated from scratch each time. Innovaccer estimates that this custom AI solution saves care coordinators 10+ hours per week, improves efficiency by 50%, and enables engagement with 35% more patients.

Innovaccer's Custom Generative AI is integrated into the Practice Management Software/EHR interface

A custom LLM can generate a detailed summary from a full transcript of a clinician-patient dialogue (previously transcribed with a speech-to-text model) and then provide it for review by clinicians in the same interface they are already using, so they can incorporate it into patient chart notes without changing screens.
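As a generic illustration of this kind of pipeline (not Innovaccer's proprietary system), the sketch below sends a transcribed call to a hosted general-purpose model and asks for a structured draft summary. The model name is an assumption; in a real healthcare deployment, protected health information would require a HIPAA-compliant, domain-tuned setup rather than a public API endpoint.

```python
# Generic transcript-to-summary sketch (illustrative, not Innovaccer's system).
# Assumes OPENAI_API_KEY is set; real PHI would need a HIPAA-compliant deployment.
from openai import OpenAI

client = OpenAI()

def draft_call_summary(transcript: str) -> str:
    prompt = (
        "From this care-coordination call transcript, draft a summary with "
        "three sections: Symptoms, Medications, Follow-up steps.\n\n" + transcript
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; a domain-tuned model in practice
        messages=[{"role": "user", "content": prompt}],
    )
    # The output is a draft for clinician review, not an auto-filed record.
    return resp.choices[0].message.content
```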
Innovaccer's Custom Generative AI is integrated into the Practice Management Software/EHR interface Innovaccer's Custom Generative AI is integrated into the Practice Management Software/EHR interface Custom LLM for Clinical Decision Support One example of how training a custom LLM  improves the workflows of healthcare professionals is a point-of-care assistance.  The LLM gets the ability to identify potential care gaps even during a patient encounter.  It analyzes the patient’s current medical record in real time, understands evidence-based guidelines and integrates them with an analysis of the patient’s longitudinal record.  The resulting recommendations ultimately improve population health management and value-based care delivery. Custom LLM for Clinical Decision Support Custom LLM for Clinical Decision Support LLM for Customer Support (Call Center LLM) Using LLMs to automate the work of contact center agents is hardly surprising. But do you really need to train your own model for that? It all depends on the task. LLM for Customer Support (Call Center LLM) Innovaccer trained its model on an organization’s knowledge base so it could: Analyze agent-patient conversations Automatically retrieve data from multiple scheduling, ticketing, and documentation systems Quickly consolidate and deliver relevant caller information in real time to help customer care representatives (i.e., call center agents) provide personalized service Summarize key points from transcribed conversations after each call As a result, key contact center metrics are improving—agents complete calls faster, reduce handling times, close tickets more quickly, and move on to the next caller without delays. Innovaccer reports saving 10 hours per week per call center agent on documentation, increasing call volume by 25%. How Belitsoft Can Help Belitsoft is a healthcare software development company. We offer a full-cycle generative AI implementation services that include choosing the right AI model architecture (LLM vs. RAG, etc.), configuring infrastructure (on-premises vs. cloud server), fine-tuning the model with domain-specific data, integrating with the organization’s software systems, testing, and deploying AI systems. The client receives a fully customized AI assistant, trained on their proprietary data, optimized for their workflows, and integrated into their ecosystem.
Dmitry Baraishuk • 5 min read
LLM Training
Types of LLM Training

LLM Fine-Tuning

Fine-tuning is the further training of a general-purpose LLM that has already been pre-trained. Such an LLM has generic knowledge but does not perform well on domain-specific tasks without additional training. Fine-tuning feeds an LLM domain-specific labeled examples so it can complete domain-specific tasks without errors and hallucinations. It is the customization of a general-purpose LLM for an area of expertise. Such an LLM can better recognize specific nuances, like legal jargon, medical terms, or individual user preferences.

The labeled dataset is usually split into training, validation, and test sets. It should be relevant to the specific task the LLM must learn to perform, be of high quality (without inconsistent data and duplicates), be sufficient in quantity, and take the form of input-output pairs.

A pre-trained LLM usually consists of many layers (each one processes the input in its own way); GPT-4, for example, is reported to have around 120 layers. When fine-tuning such an LLM, the layers representing general knowledge are kept unchanged, while only the top, later layers are adjusted to the task-specific data. The goal is to make the model's predictions as close as possible to the desired output (the validation dataset is used to measure this). Both automated metrics (BLEU and ROUGE scores) and human evaluations are used to get a 360° view of a model's performance.

What do machine learning engineers do during LLM fine-tuning? They focus on finding:

• the optimal learning rate (the speed at which the algorithm adjusts its parameters), avoiding rates that are too high or too low
• the right batch size (the number of training examples the algorithm processes before updating the model), which mitigates overgeneralization (ignoring exceptions or variations) and overfitting (memorizing without understanding the underlying principles)
• the right number of epochs (training iterations), so the model does not train for so long that it begins to overfit

They also use regularization techniques to discourage the model from giving too much importance to one or a few features (characteristics) of the data and to encourage it to weigh all features more evenly, improving performance on new, unseen data.

Retrieval-Augmented Training of Large Language Models

Fine-tuned LLMs may become outdated if there is a lot of dynamic data in the domain, as they are tied to the facts in their initial training datasets. They cannot acquire new information and thus may respond inaccurately. Fine-tuning also requires a lot of labeled data, and its overall cost may be relatively high compared to Retrieval-Augmented Generation (RAG).

RAG is an approach that improves existing language models by integrating retrieval capabilities directly into the generation process. From a user's perspective, RAG works like a search engine with a built-in content writer. The RAG architecture increases the performance of an LLM by merging standard generative capabilities with retrieval mechanisms. It works as follows: search vast external knowledge bases, find information relevant to a given prompt, and generate new text based on that information.

Illustration of the RAG process

Machine learning engineers often implement RAG with a vector database. The knowledge base is converted into vectors that are stored in this database.
When a user submits a query to the LLM, the query is also converted into a vector. The retrieval component searches the vector database for similar vectors, and the most similar information is combined with the user query. This forms the augmented query that is fed into the LLM so it can generate an up-to-date response (a minimal sketch of this retrieval flow appears at the end of this article).

RAG reduces the problem of 'best-guess' assumptions and supports factually correct, unbiased responses because it adapts to situations where information has changed over time. Since answers are generated from retrieved data, the model is far less likely to produce fabricated responses, and the source of an LLM's answer can easily be identified from the references, which is essential for quality assurance.

Chatbots with RAG capabilities efficiently retrieve relevant information from an organization's instruction manuals and technical documents (for customer support), from up-to-date medical documents and clinical guidelines (for medics), from an institution's study materials according to specific curriculum requirements (for educational institutions), and from repositories of former depositions and legal decisions (for legal professionals). RAG also improves language translation in specialized fields.

LLM Training Stages

Data Sources Preparation

The goal here is to find and prepare data that is sufficient in volume, relevant and focused directly on the target use cases, and relatively high in quality (ready to clean).

Example of a labeled dataset with descriptive features and a target feature

Data Cleaning

At this stage, machine learning engineers remove corrupted data from training datasets, reduce duplicate copies of the same data to a single one, and complete (when feasible) incomplete data by adding missing information. OpenRefine and a variety of proprietary data quality and cleaning tools are available for this purpose.

Data Formatting

Models recognize patterns and input-output relationships better if the training data is structured according to specified guidelines. Example inputs are customer questions; outputs are the support team's responses. Machine learning engineers may reformat the source data using JSON. They use custom scripts to expedite the process and manually tweak and clean up where necessary.

Adjusting Parameters

Transformer-based deep learning frameworks are used to train models, and parameters are customized for these models. Tweaking the parameters that govern how the LLM interprets data is a way to guide it toward desired behavior. The AI team knows exactly which parameters to customize and which to leave alone, using methods like LoRA, as well as the best way to customize them. They adjust model weights to indicate the strength of relationships within a training set (a minimal fine-tuning sketch follows below).

LLM Training Process

Machine learning engineers run code that learns from the custom data using the previously set parameters. The process may finish after hours or weeks, depending on the size of the data. They train the LLM in a three-stage process that includes self-supervised learning, supervised learning, and reinforcement learning. First, the model reads a lot of text in your domain on its own; it learns how language works and starts to predict what words or sentences might come next. Then, the model is given examples by our data scientists to learn from, after which it can follow instructions and do well on new tasks it hasn't seen before. Finally, the model's answers are graded by our staff to teach the model which answers are preferred.
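As referenced above, here is a minimal parameter-efficient fine-tuning sketch using the Hugging Face transformers and peft libraries. The model name and hyperparameter values are illustrative, and tokenized_dataset is a placeholder for a prepared dataset of tokenized input/output pairs, not something this snippet builds.

```python
# Minimal LoRA fine-tuning sketch. Assumptions: `tokenized_dataset` is a
# prepared Hugging Face dataset of tokenized input/output pairs; the model
# name and hyperparameter values are illustrative, not prescriptive.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
# LoRA trains small adapter matrices while the base layers stay frozen,
# mirroring the "keep general-knowledge layers unchanged" idea above.
model = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

args = TrainingArguments(
    output_dir="finetuned-model",
    learning_rate=2e-4,             # the learning-rate knob discussed above
    per_device_train_batch_size=4,  # the batch-size knob
    num_train_epochs=3,             # the epochs knob
    weight_decay=0.01,              # regularization against overfitting
)
Trainer(model=model, args=args, train_dataset=tokenized_dataset).train()
```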
Then the trained model is tested. The goal is an LLM that is accurate across your domain, consistent, uses natural language, performs well in real-world tasks like problem-solving, and answers factual questions without hallucinations. Finally, the LLM is integrated into the appropriate real-life application.

How Belitsoft Can Help

Data Collection and Preprocessing

We help aggregate a diverse dataset of anonymized data from various sources and manage the process to comply with privacy regulations.

Annotation and Labeling

Experienced subject matter experts annotate the dataset to ensure that the model learns from expert interpretations.

Model Architecture

We employ modern deep learning architectures, designed for specific tasks (image analysis, etc.) and tailored to the characteristics of the dataset (X-ray images, among others), to process and interpret data as efficiently and accurately as possible.

Training Process and Performance Metrics

The model is then trained using supervised learning techniques, with hyperparameter tuning, to achieve the best performance metrics (accuracy, sensitivity, specificity, area under the ROC curve, etc.).

Bias Mitigation

We verify whether the training dataset is diverse (e.g., it represents a range of groups, characteristics, and conditions) and adjust training based on the results to minimize bias.

Risk Assessment

We perform risk analysis to identify failure modes and implement appropriate safeguards.

Testing and Validation Studies

First, the model undergoes validation using separate datasets that were not seen during training. Then, primary users test the tool in real-world scenarios, within controlled settings, to provide feedback.

Integration with Existing Systems, User Training and Support

Finally, we integrate the AI tool into business workflows, bringing it live. Using APIs, including OpenAI API integration, we connect AI models with your existing software ecosystem. Our expertise covers integrating AI-powered systems with CRMs, ERPs, data lakes, and proprietary platforms to optimize performance. Training materials and support services are also provided to help users utilize the tool effectively.

Periodic Re-training, Change Management and Version Control

To retrain the model, we gather new data (either from user feedback or the latest findings). Any modification to the model follows a structured process that includes re-validation and documentation. We also maintain strict version control, recording changes between versions to trace updates.

Regulatory Compliance and Data Security

All model updates are evaluated for regulatory impact to maintain compliance, for example, adherence to Good Machine Learning Practice guidelines. Naturally, we implement security protocols to safeguard sensitive data during transmission and storage.

Belitsoft provides full-cycle LLM training services to power generative AI applications, from custom AI chatbots to domain-specific assistants. We fine-tune large language models with proprietary datasets, apply prompt engineering techniques for optimal responses, and optimize training pipelines for efficiency. Contact us today to discuss your needs.
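To make the retrieval flow from the RAG section above concrete, here is a minimal sketch using OpenAI embeddings and cosine similarity in place of a full vector database. The documents and model name are illustrative assumptions; a production system would store the vectors in a vector database rather than in-memory arrays.

```python
# Minimal RAG retrieval sketch: embed a knowledge base, embed the query,
# pick the most similar chunk, and build an augmented prompt.
# Assumes OPENAI_API_KEY is set; documents are illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

docs = ["Refunds are issued within 14 days.",      # toy knowledge base
        "Support is available 9:00-18:00 CET."]
doc_vecs = embed(docs)                              # knowledge base as vectors

query = "How long do refunds take?"
q_vec = embed([query])[0]                           # the query as a vector
sims = doc_vecs @ q_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
)
context = docs[int(np.argmax(sims))]                # most similar chunk

# The augmented query combines retrieved context with the user question.
augmented_query = f"Answer using this context:\n{context}\n\nQuestion: {query}"
```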
Dmitry Baraishuk • 6 min read
AWS Document Management System
Cloud Document Management Benefits

Everybody knows the benefits that a digital document management system offers compared to traditional paper storage in file cabinets. Going digital lowers office costs: documents suddenly require less office space, furniture, and paper, and workers spend less time accessing them, so passing paper from employee to employee no longer makes sense. Since all of this is widely known, let's stay focused on the advantages of moving assets as important as documents to the cloud.

Security

If the U.S. Department of Defense selected the cloud, specifically Amazon, there was a reason. The main one is to offload security from internal IT teams in the face of the growing number and complexity of cyber threats. By default, up-to-date security means automatic control that assigns and revokes appropriate document access, encryption to protect stored and shared data, disaster-resistant digital storage, backup, recovery and restore, and, of course, software updates. It's worth noting how cloud content is backed up: after each edit and in multiple data centers. This supports business continuity and keeps data and services available whatever happens.

Remote access

At first sight, remote access does not appear to be anything new. Maybe your workers already connect to a corporate network using a VPN. The key word here is VPN. Your IT team has to set it up for each new team member and help with management and troubleshooting (there may be a lot of it). The major drawback of a VPN is that it dramatically slows down and lengthens an employee's journey: traffic takes longer routes, transferring large files means long loading times, and encrypting/decrypting slows the connection further. Direct access to cloud resources eliminates the extra twists that VPNs introduce. Cloud-based document management systems are accessible on any device, regardless of the user's location. Users just need an internet connection to log in through a web browser.

Budget Optimization

No capital expenditures: monthly subscription costs for cloud services are operational expenses. IT staff is not needed to manage servers or disk space, or to buy new computers. The number of servers, processor speed, and the amount of storage automatically scale with the growth of files and traffic. For heavily regulated industries like finance or healthcare, compliant cloud solutions are always at hand. Extensive in-house tech resources are no longer required.

Features of a Document Management System

Some features are already set up in the described system. Others can be customized on request, which is fairly easy to do since the system is deployed on AWS, which provides many options for integrating ready-made solutions.

Convert Scanned Documents to Texts and Process them With AI

Optical Character Recognition (OCR) technology, implemented in modern document management systems, automatically, quickly, and accurately extracts large amounts of text from scanned documents, images, or PDFs. After such preprocessing, users can quickly find and edit information from scanned documents. OCR is not a new thing in document management.
It has established itself as a tool to process banking documents (checks, loan applications, and account statements), automate the extraction of information from invoices (vendor details, amounts, and dates) and retail documents (labels, receipts), digitize healthcare documents (patient records, prescriptions), and assist in the analysis of legal documents (contracts and case files).

But AI makes a difference. It not only improves text recognition accuracy through machine learning, but, thanks to natural language processing (NLP), it can also handle a document the way a human would in a business process (it knows what to do with it). It may read content and compare it with other documents to check accuracy (data validation), then forward it to the head of the department for further action (decision-making). For example, it's reported that AI-based invoice processing times can be reduced by 90 percent, equating to a 400 percent increase in employee productivity and cutting invoice turnaround from days to minutes.

Each industry has specific forms to process, and this can also be automated with AI-enabled recognition: insurance claim forms, logistics driver logs and delivery receipts, banking credit card applications, loan and mortgage forms. When addresses need to be verified at scale, AI can analyze driving licenses, passports, bank statements, and utility bills in bulk. A modern low-code/no-code document management solution can be easily integrated with AI-based OCR software, with pre-trained, ready-to-go models or custom extraction models built for specific business requirements (a minimal example using a ready-made AWS OCR service appears later in this article).

Document Classification

ML, computer vision, and NLP are the technologies used to categorize documents. They automatically add predefined categories, or tags, to documents. Computer vision is the fastest method: it can understand the type of document without reading its text, just by seeing the visual structure during the scanning phase. As for text-based classification, documents can be segmented based on the complete document, specific paragraphs, particular sentences, or even phrases. In general, the business case dictates how documents are segmented. Document classes can be user-defined, and documents can be sorted by type, level of confidentiality, vendor, project, and more.

AI can understand the document types of each industry: legal documents, notarial deeds, and contracts for law firms; medical records, patient files, and clinical research documents for healthcare organizations; financial statements, loan applications, and insurance claims for banks, insurance companies, debt collection agencies, and other financial institutions. AI-based document classification works with structured documents (tax return forms and mortgage applications), semi-structured documents (invoices), and unstructured documents (contracts).

Machine learning models can determine whether an uploaded document is complete, flag missing or incomplete inputs and pages, and mark documents with errors. They identify fraudulent documents through anomalies, helping to reduce document fraud. Once classification is finished, documents can be automatically routed to the appropriate department and the respective team members.

Document Summarization

AI creates text summaries of lengthy documents, scanned files, and images for those who are short on time. It can even transcribe video and audio files so one can search within that content.
It's not about shrinking text but rather extracting the main points. For example, AI can highlight key points like pricing and terms from a lengthy contract. It rephrases complex sentences and technical jargon in plain language and can explain specific clauses so simply that a reader grasps the legal implications much faster and more precisely.

Interactive Q&A To Replace Traditional Search

Traditional document repositories are text-searchable by keywords. But this can overwhelm: you have to comb through too many search results to find something valuable, and relevant records often stay hidden even after a long search. AI-powered search makes a difference. There is no more manual searching through each document to find specific information. Searching becomes like an interview: the chat understands the context, asks you questions, and narrows down the results. It works best with complex requests like "What is the total amount spent with Company X last month?" or "List documents from 2023 that mention topic X." Responses are accurate because they are based on your documents' content.

Integration

Integrated document management systems allow employees to view, edit, and save documents directly in their daily-use software. Users from industries like retail, banking, finance, insurance, or manufacturing can work with their documents without switching applications. For example, when a bank employee is processing a client's account in a core banking system, they can also access account opening forms and loan or mortgage agreements stored externally. They can edit such documents right there, and their line-of-business app will automatically save the modified document back to the document management tool. A document management system integrated with the other vital systems in your organization helps your employees avoid unnecessary friction.

Document status visibility

A document management system provides an interface that shows the status of each document in the workflow: draft, in progress, awaiting review, under review, feedback provided, revised, awaiting approval, approved, rejected, finalized, published, archived, completed, or expired.

Document versioning

Since documents can be shared with several teams, departments, or even external stakeholders, the system includes change tracking. The log saves all changes made to a document, including who made them and when.

Nested Folders

Nesting (organizing items within a hierarchy) helps establish relationships between items. Using nesting, end users can define whatever order they like for grouping related documents.

Document Security

The owner can see who accessed each document, what changes were made, who uploaded or downloaded it, and what comments were left. They can also restore a document to an earlier version if something goes wrong. Admins can apply access rights (view, edit, or comment) to different folders, subfolders, or individual documents, which is a security best practice. An enterprise-level document management system provides powerful tools for sharing and collaborating on files within your company: multiple team members can work on the same document simultaneously, making changes and leaving comments for each other. A good document management system also lets you create rules for how your employees can share documents.
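Because the system runs on AWS, ready-made services can back several of the features above. For the OCR step referenced earlier in this article, here is a minimal sketch using Amazon Textract via boto3; the file name is an illustrative assumption, and AWS credentials are assumed to be configured.

```python
# Minimal OCR sketch with Amazon Textract via boto3.
# Assumptions: AWS credentials are configured; the file name is illustrative.
import boto3

textract = boto3.client("textract")

with open("scanned-invoice.png", "rb") as f:
    resp = textract.detect_document_text(Document={"Bytes": f.read()})

# Textract returns PAGE/LINE/WORD blocks; keep the recognized lines of text.
lines = [b["Text"] for b in resp["Blocks"] if b["BlockType"] == "LINE"]
print("\n".join(lines))
```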
Automated alerts and actions on certain conditions

This feature lets a document management system keep an eye on your documents and notify you when something needs your attention or when a specific action needs to be taken. For example, if a contract is about to expire, the system can send you a reminder email so you don't forget to renew it. If a document containing sensitive data is accessed by an unauthorized user, the system can immediately alert the security team.

The system can also be set up to perform certain actions automatically based on predefined conditions or schedules. For instance, it can archive old documents or delete them after a retention period mandated by legal regulations or company policy, freeing up storage space (see the sketch after this section).

Metadata extraction

Metadata is extra information about a document that describes what's inside without your having to open it and read through the whole thing. It may include the date the document was added and the identity of the user who uploaded or edited it. For example, claim documents with digital photographs may record the date the photograph was taken and even its geolocation. Metadata makes it simple for users to find what they are looking for, and it allows documents to be located easily. It is automatically extracted and stored for each document, and the system may also let users add metadata manually.
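One way to implement the retention behavior described above on AWS is an S3 lifecycle rule. A hedged sketch: the bucket name, prefix, and periods below are placeholders, and actual values would come from legal or company retention policy.

```python
# Sketch: archive-then-expire retention via an S3 lifecycle rule.
# Assumptions: AWS credentials configured; bucket, prefix, and day counts
# are placeholders to be set by the organization's retention policy.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="company-documents",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "contracts/"},
            "Transitions": [{"Days": 365, "StorageClass": "GLACIER"}],  # archive after a year
            "Expiration": {"Days": 2555},  # delete after ~7 years, if policy allows
        }]
    },
)
```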
Dmitry Baraishuk • 7 min read
Predictive Data Analytics
What is Predictive Data Analytics

Predictive data analytics involves the creation and application of models that make predictions based on patterns identified in historical data. These models are typically trained using machine learning techniques. In daily life, we often guess what will happen next. In data analytics, though, 'predicting' can mean different things: sometimes it's about forecasting future prices, other times it's about determining what category something belongs to, like what kind of document we have.

Predictive Data Analytics for Price Prediction

Hotel chains, airlines, and online retailers must continually adjust their pricing strategies to optimize revenue. This adjustment is influenced by various factors, including seasonal variations, changes in consumer demand, and special events. Businesses can use predictive analytics models trained on historical sales data to forecast the most effective prices, and these predicted prices can then guide pricing strategy decisions.

Predictive Data Analytics for Propensity Modeling

Propensity modeling involves predicting the probability that individual customers will engage in specific behaviors, such as purchasing particular products, reacting to certain marketing initiatives, or switching from one mobile phone operator to another.

Predictive Data Analytics for Dosage Prediction

Doctors and scientists often have to determine the appropriate amount of medication or chemicals to use in treatments. Predictive analytics models can help predict optimal dosages by analyzing historical data on past dosages and their corresponding outcomes.

Predictive Data Analytics for Diagnosis

Doctors, engineers, and scientists typically rely on their extensive training, expertise, and experience to make diagnoses. Predictive analytics models, however, draw on vast datasets of historical examples, on a scale far greater than an individual might encounter throughout a career. The insights derived from predictive analytics can help these professionals make more accurate and informed diagnoses.

Predictive Data Analytics for Risk Assessment

Risk plays a crucial role in decisions like loan issuance or insurance policy underwriting. Predictive analytics models, once trained on historical data, can identify key risk indicators, and the resulting insights can be used to make more informed and accurate risk assessments.

Predictive Data Analytics for Document Classification

Predictive data analytics can automatically categorize various types of documents, including images, sounds, and videos, into distinct categories. This is useful in applications such as supporting medical decision-making, directing customer complaints to the appropriate channels, or filtering email spam.

Predictive Data Analytics Project Lifecycle

The likelihood of success in a predictive data analytics project depends heavily on the process used to manage the project, so it is advisable to follow a well-defined project management process for these initiatives. In predictive data analytics projects, the majority of the work, approximately 80%, is concentrated in the Business Understanding, Data Understanding, and Data Preparation phases. Only about 20% of the effort goes into the Modeling, Evaluation, and Deployment phases.
In predictive data analytics projects, some phases are more closely interconnected than others. For instance, the Business Understanding and Data Understanding phases are tightly coupled, and projects often oscillate between these two stages. Likewise, the Data Preparation and Modeling phases are closely connected, with projects frequently alternating between them.

Business Understanding

Predictive data analytics projects typically begin with objectives such as acquiring new customers, increasing product sales, or enhancing process efficiencies. Developing a predictive model should start with a deep understanding of the business problem, ensuring that the model not only predicts accurately but also provides actionable, relevant insights for the business. In this initial phase, the primary responsibility of the data analyst is to comprehensively understand the business or organizational problem at hand, and then to design a data analytics solution that tackles it.

Data Understanding

At this stage, the data analyst gains a thorough understanding of the available data sources within the organization and the types of data these sources contain. Building predictive data analytics models requires specific types of data organized in a particular structure known as an Analytics Base Table (ABT). This structured approach is essential for effective model development.

Data Preparation

This phase encompasses all the activities necessary to transform the various data sources within an organization into a well-structured ABT. The ABT is the key element from which machine learning models can be effectively trained, ensuring that the data is in a suitable format.

Modeling

During the Modeling phase, the focus shifts to the actual machine learning work. Different machine learning algorithms are employed to construct a variety of predictive models, and the most effective model is identified and selected for deployment based on performance and applicability to the specific problem.

Evaluation (Testing)

Prior to deploying models within an organization, it is vital to evaluate them thoroughly to ensure they are fit for purpose. The evaluation phase encompasses all the tasks necessary to demonstrate that a prediction model is capable of making accurate predictions once deployed, including verifying that the model does not suffer from overfitting or underfitting, which are critical to its effectiveness and reliability in practice.

Deployment

The final phase of a machine learning project involves all the work necessary to successfully integrate a machine learning model into an organization's existing processes. This phase is critical, as it ensures that the developed model effectively serves its intended purpose. It covers deploying the model into a production environment, integrating it with existing systems, and ensuring it operates seamlessly within the organization's processes.

Predictive Data Analytics Tools

The initial decision in selecting a machine learning platform is the choice between an application-based solution and a programming language-based approach.
Application-based Solutions for Building Predictive Data Analytics Models

Application-based, or point-and-click, tools are designed to facilitate the rapid, straightforward development and evaluation of models and the execution of associated data manipulation tasks. With such tools, one can train, evaluate, and deploy a predictive data analytics model in a remarkably short time, potentially less than an hour.

Enterprise-wide solutions

Key application-based solutions for building predictive data analytics models include platforms like IBM SPSS, KNIME Analytics Platform, RapidMiner Studio, SAS Enterprise Miner, and Weka. These tools offer user-friendly interfaces and a range of functionalities that streamline model development, making them especially valuable for users without extensive programming expertise. The tools offered by IBM and SAS are designed as enterprise-wide solutions that integrate seamlessly with other products and services from these companies, facilitating a cohesive, comprehensive approach to data analytics in larger organizations.

Open-source solutions

In contrast, KNIME, RapidMiner, and Weka stand out for being open source and freely available. These tools are a significant advantage for individuals or organizations looking to explore predictive data analytics without an initial financial commitment. Their open-source nature also encourages a community-driven approach to development and problem-solving, offering a wealth of resources and support for users at all levels of expertise.

Programming languages for Building Predictive Data Analytics Models

R and Python are two of the most widely used programming languages in predictive data analytics. Building predictive models in R or Python is not overly challenging, particularly for those with some background in programming or data science (see the toy example below).

Advantages

One significant advantage of using a programming language is the immense flexibility it gives data analysts: virtually anything the analyst can conceptualize can be implemented. In contrast, application-based solutions are limited in flexibility; analysts can typically only achieve what the tool's developers had in mind when designing it. Additionally, the most recent advanced analytics techniques become accessible in programming languages well before they are incorporated into application-based solutions.

Disadvantages

Using programming languages does come with drawbacks. The primary one is that programming is a skill that requires a significant investment of time and effort to learn, so advanced analytics in a programming language has a notably steeper learning curve than an application-based solution. The second drawback is that programming languages generally offer limited infrastructural support, such as data management, which application-based solutions provide out of the box. This places the responsibility for implementing these essential components on the developers themselves.

Supervised machine learning

To build models for predictive data analytics applications, supervised machine learning is often used. It starts with a collection of data that has already been labeled with the correct answer. A dataset is referred to as a labeled dataset if it includes values for the target feature.
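To illustrate the point about Python's accessibility, here is a toy supervised-learning example on synthetic data. The dataset is generated by scikit-learn, not real pricing history, and the model choice is illustrative.

```python
# Toy supervised-learning example on synthetic data, showing how little code
# a basic predictive model takes in Python. The data is generated, not real.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor().fit(X_train, y_train)  # learn from labeled history
print(model.score(X_test, y_test))                         # R^2 on held-out data
```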
Other types of machine learning include unsupervised learning, semi-supervised learning, and reinforcement learning.

Historical Dataset to Train a Model

A machine learning algorithm analyzes the training dataset and develops a model by finding patterns between the descriptive features and the target feature across a set of historical examples (the training dataset), or historical instances.

The two steps in supervised machine learning: Learning and Predicting

The model's goal is to capture these relationships in such a way that it can predict the target feature for new, unseen instances.

Descriptive features and a Target feature

In supervised learning, the target feature is known from the training (historical) dataset. It is used to train a machine learning model to predict, for example, the probability that a mortgage applicant will fail to repay a loan as agreed (credit default risk). In this dataset, the descriptive features are occupation, age, and the loan-salary ratio (the ratio of the loan amount to the applicant's salary). The "Outcome" field (the target feature) indicates whether the mortgage applicant has failed to make the loan payments according to the agreed terms, an event recorded as "default".

Model consistency with the dataset

A model that is consistent with the dataset is one that accurately reflects the relationships between the features and the target outcome in the historical data. Consistency means that for every instance where the model makes a prediction, the prediction matches the actual outcome recorded in the historical dataset. For instance, if the model predicts that a person with a certain age, occupation, and loan-salary ratio will default on a loan, and the dataset shows that the person did indeed default, then the model's prediction for that instance is consistent with the dataset. A consistent model not only fits the training data but also generalizes well to unseen data; such a model's predictions remain stable even when there are small variations in the input.

Machine learning is not for simple datasets

For simple datasets with three descriptive features and dozens of instances, we can manually create a prediction model. A decision rule model used to predict loan repayment outcomes in this case:

If the ratio of the loan amount to the borrower's salary is greater than 3.1, then predict that the borrower will default on the loan.
If the loan-salary ratio is not greater than 3.1, then predict that the borrower will repay the loan.

However, manually learning a model by examining large datasets containing thousands or even millions of instances, each with multiple features, is almost impossible. The simple prediction model using only the loan-salary ratio feature is no longer consistent with more complex datasets.

A training historical credit scoring dataset with 25 historical instances, 7 descriptive features, and 1 target feature (outcome). FYI: "ftb" are first-time buyers, "stb" are second-time buyers.

A decision rule model used to predict loan repayment outcomes in this case:

If the borrower's loan amount is less than 1.4 times their salary, then predict that the borrower will repay the loan.
If the loan amount is more than 4.1 times the borrower's salary, then predict that the borrower will default on the loan.
If none of the above conditions are met, but the borrower is younger than 39 years old and works in the industrial sector, then predict that the borrower will default on the loan.
If none of the above conditions are met, predict that the borrower will repay the loan.

When we want to build consistent prediction models from large datasets with multiple features, machine learning is the solution. It can detect relationships that are not immediately obvious and could be missed in a manual examination of the data.

Machine learning algorithms notice "unnoticed" patterns

Detecting such relationships manually is very difficult, especially when there are many features. As you add more features, the number of possible combinations grows exponentially, making it virtually impossible to manually explore all potential rules. A simple observation might suggest that a high loan-salary ratio leads to a higher likelihood of default. However, there might be an interaction between loan-salary ratio and occupation: professionals with a high loan-salary ratio may default less often than industrial workers with the same ratio because they have more stable incomes or better prospects for salary increases.

Subtle patterns may only emerge when looking at the data in aggregate, often through statistical methods. A statistical analysis may reveal, for example, that defaults are more common among industrial workers who are younger than 40 and have a loan-salary ratio greater than 3. This pattern might not be obvious when looking at individual records because it is a combination of three separate features. There could also be a threshold effect, where defaults spike once the loan-salary ratio exceeds a certain value, but below that threshold the ratio has little impact on the likelihood of default. Without statistical analysis, such threshold effects could go unnoticed.
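As a toy illustration of how an algorithm learns threshold rules like those listed above, the sketch below fits a decision tree to a tiny synthetic stand-in for the credit-scoring dataset. The feature values and labels are invented for the example, not drawn from the dataset described in the article.

```python
# Toy sketch: a decision tree learning threshold rules from a tiny synthetic
# stand-in for the credit-scoring dataset described above (values invented).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: age, loan-salary ratio, works in industrial sector (1 = yes)
X = np.array([
    [34, 4.5, 1], [41, 1.1, 0], [29, 3.4, 1], [52, 2.0, 0],
    [38, 4.2, 1], [45, 1.3, 0], [27, 3.8, 1], [60, 0.9, 0],
])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = default, 0 = repay

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
# Print the learned if/then rules, analogous to the hand-written ones above.
print(export_text(tree, feature_names=["age", "loan_salary_ratio", "industrial"]))
```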
Dmitry Baraishuk • 8 min read
What is Artificial Intelligence? AI vs Traditional Software
Is AI Considered Software?
"AI" (artificial intelligence) is the body of knowledge and techniques that enable machines to mimic human cognitive functions (learning, problem-solving, perception, and decision-making). When you implement AI concepts in a program or application, that program becomes AI software. For example, a chatbot that uses NLP is AI software you can run and interact with. AI is the theory; AI software is the practical application of that theory, just as mathematics isn't software, but a calculator app (which uses mathematical algorithms) is. Even AI models are part of the software, not standalone software by themselves. A trained AI model is essentially a file containing structured data and weights in a format like .pt (PyTorch), which a program (the AI software) loads and executes. The training process that produces the model happens before it becomes part of AI software; once trained, a model is just a structured dataset that doesn't "run" by itself without AI software to load and execute it. AI can also exist as a service (AIaaS), which is still software delivered via the cloud. But again, that's a software implementation of AI.

Comparison Between Traditional Programming and Machine Learning (ML)
Machine Learning (ML) contrasts with traditional programming. While traditional programming relies on programmers to define explicit, scenario-specific logic and instructions, ML enables machines to learn autonomously and make decisions without detailed instructions for each task.

Comparison Using the Example of Marketing Automation in E-commerce

Traditional Programming
Standard programming techniques involve creating precise instructions. For example, for a common application, SQL queries are used to target specific demographic groups based on predefined criteria, such as age, purchase history, and gender. In this case, a marketer specifies the target audience, and the programmer manually crafts the necessary query.

A diagram showing the input, logic, and output of a traditional software program

ML-Based Approach
ML changes the traditional approach by employing past data to train models and identify patterns. Once trained, these models can predict outcomes on new, unseen data. For instance, ML models autonomously determine target customers for marketing campaigns based on insights from data patterns. This simplifies the marketing process and reduces the need for manual targeting and segmentation programming. A sketch contrasting the two approaches follows below.

A diagram illustrating a high probability of customers purchasing sports equipment after buying sportswear

Advanced Segmentation with ML
Instead of manually segmenting audiences based on predetermined criteria, an ML model can analyze complex data patterns to identify new target groups. This process reveals insights that traditional analysis may overlook, potentially making the marketer's strategy more effective while requiring less manual effort.

Necessity of Technical Expertise in ML
However, ML models still require initial programming, setup, and ongoing maintenance, often by data scientists or ML engineers. Collaboration between marketers and technical professionals remains necessary: marketers provide input on campaign goals and parameters, while data professionals develop, train, and maintain the ML models.
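Here is a minimal sketch of the contrast described above: a hard-coded targeting rule next to a model that learns the rule from labeled campaign history. All names, features, and numbers are illustrative, not a production recipe:

```python
from sklearn.tree import DecisionTreeClassifier

# Traditional approach: the targeting rule is written by hand.
def target_customer_rule(age: int, purchases_last_year: int) -> bool:
    return 25 <= age <= 40 and purchases_last_year >= 3   # criteria fixed in advance

# ML-based approach: the rule is learned from labeled campaign history.
X = [[28, 5], [55, 1], [33, 4], [61, 0], [24, 2]]   # [age, purchases_last_year]
y = [1, 0, 1, 0, 0]                                  # 1 = responded to a past campaign
model = DecisionTreeClassifier(random_state=0).fit(X, y)

print(target_customer_rule(30, 6))      # decision coded by a programmer
print(model.predict([[30, 6]]))         # decision learned from data
```

The hand-written rule never changes unless a programmer edits it; the learned model changes whenever it is retrained on fresh campaign data.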
ML's Superiority
Using machine learning in marketing has significant benefits, particularly in targeting strategies. ML algorithms can analyze large datasets more efficiently and effectively than traditional methods. They can uncover complex patterns, trends, and customer behaviors that human analysts might miss, leading to more precise and sophisticated audience segmentation.

Long-Term Benefits of ML
While ML models require initial resources, they can ultimately reduce the workload for marketers and programmers. Automated processes for data analysis and audience segmentation free up human resources for more strategic tasks. ML enables personalized marketing at a scale that is challenging to achieve manually: it tailors marketing messages and offers to specific segments based on customer preferences and behaviors, increasing the effectiveness of marketing campaigns. In the long run, using ML can be more cost-effective. Refined targeting minimizes waste in marketing spend, as campaigns more accurately engage the intended audience. ML also provides a competitive edge: companies that leverage advanced analytics and predictive modeling can often outperform competitors in customer engagement and conversion rates.

Comparison Using the Example of Sales Predictions

Limitations of Traditional Predictive Software
While traditional software can make basic predictions based on historical data, its predictive capabilities are not as advanced as those of machine learning. For instance, traditional software might predict that if January sales have been around $100,000 for the past few years, then next January's sales will likely be similar. This is a basic extrapolation, assuming that past patterns will repeat under similar conditions.

Advanced Pattern Recognition in ML
Machine learning, however, employs algorithms to identify complex patterns and relationships in data that are not immediately obvious or predictable through simple extrapolation. ML models adapt their predictions based on new data, continually refining their accuracy. An ML model predicting sales, for instance, might consider historical sales figures, changing customer preferences, market trends, and economic conditions, and adjust its predictions as new data arrives. Traditional rule-based systems lack this adaptability.

Data Processing: Traditional vs. ML Approaches
Both ML models and traditional software can access the same data, such as past sales figures, customer preferences, market trends, and economic conditions. The key difference lies in how they process and use this data. Traditional software operates on explicit rules set by programmers. For example, a programmer might establish a rule like, "If past sales were X and the economic conditions are Y, predict sales to be Z." This approach is limited by the programmer's ability to anticipate and code for every potential scenario. In contrast, ML models learn directly from the data. They autonomously detect complex patterns and relationships, with no need for explicit programming for each scenario, and can discover subtle correlations and trends that may not be evident or predictable in advance.

Adaptability of ML vs. Static Traditional Software
A significant difference is that traditional software does not adapt or learn independently. If market dynamics change or new trends emerge, the software will continue to operate on its original programming until a programmer updates the rules. ML models, by contrast, continuously update their understanding and predictions as new data arrives, which allows them to adapt to changes in patterns and trends without human intervention.
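A minimal sketch of the sales-prediction contrast above: naive extrapolation versus a regression that learns from several signals at once. All figures and feature columns are toy values we made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

january_sales = [100_000, 101_500, 99_800]          # past Januaries (toy numbers)

# Traditional extrapolation: next January is assumed to look like the average one.
naive_forecast = sum(january_sales) / len(january_sales)

# ML approach: learn from several signals at once.
# Columns: last year's sales, an economic index, marketing spend (hypothetical features).
X = np.array([[98_500, 1.02, 0.7],
              [100_000, 0.98, 0.9],
              [101_500, 1.05, 0.6]])
y = np.array([100_000, 101_500, 99_800])            # the sales that actually followed
model = LinearRegression().fit(X, y)

print(naive_forecast)
print(model.predict([[99_800, 1.01, 0.8]]))         # forecast that uses all signals
```

When new monthly data arrives, refitting the model updates its "rules" automatically; the extrapolation formula stays frozen until a programmer rewrites it.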
Probabilistic Nature of ML Models
There is, however, a nuance of machine learning models not found in traditional programming. Unlike the deterministic outputs of traditional software, ML models provide probabilistic estimates: they predict the likelihood of various outcomes rather than offering absolute certainties. As such, continuous evaluation and periodic retraining of these models are essential to maintain their accuracy.

Two Types of ML: Supervised Versus Unsupervised Learning Models

Supervised Learning Model (Labeled Data)

Supervised Learning Applications
In supervised learning, models learn from data that has predefined labels, so the algorithms find structure and patterns in the data with guidance on what outcomes to predict. Supervised learning is used in a wide range of applications, including voice recognition (learning to understand and transcribe speech) and medical diagnosis (learning to identify diseases from symptoms and test results).

A diagram showing a typical supervised learning prediction workflow

Let's use teaching a supervised machine learning model to identify dog breeds as an example.

Preparing a Labeled Dataset
The first step is to collect a large dataset of dog images, labeling each image with the correct breed: Golden Retrievers are labeled "Golden Retriever", Poodles "Poodle", and so on for each breed you want the model to recognize.

Feature Analysis and Training the Model
The model examines the features in each image, such as color patterns, ear shape, size, fur texture, and other physical characteristics distinctive to each breed. During training, the model is fed these images and their corresponding breed labels. Its task is to learn the patterns and characteristics indicative of each breed. For example, it may learn that Beagles often have a certain ear shape or that Huskies commonly have a specific fur texture.

Improving ML Accuracy
As the model processes more images, it gets better at recognizing the subtle differences between breeds. It adjusts its internal parameters to reduce prediction errors, like mistaking a Labrador for a Golden Retriever. Recognizing dog breeds is more complex than simply identifying whether an image contains a dog: breeds can differ subtly, and there is significant variation within each breed. Hence, the model needs to learn to focus on breed-specific characteristics while ignoring individual variances.

Testing the Trained Model on Unseen Data
After training, the model is tested with a set of images it hasn't seen before. If it never saw a Dalmatian during training, for instance, it may struggle to identify one. The more diverse and comprehensive the training data (different breeds, colors, sizes, backgrounds), the better the model becomes at correctly identifying dog breeds. During the training phase, the model iteratively adjusts its parameters to minimize the difference between its predictions and the actual labels in the training data. This difference is typically quantified by a loss function, which the model aims to minimize. Some breeds may be harder to distinguish, requiring a larger or more diverse set of training images. Over time, adding more images and examples of difficult-to-distinguish breeds improves the model's accuracy.
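A minimal PyTorch sketch of the parameter-adjustment loop just described: compute a loss, then nudge the parameters to reduce it. The 512 features, 10 breeds, and random tensors are placeholders standing in for a real labeled image batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A classifier head over precomputed image features (placeholder sizes).
model = nn.Linear(in_features=512, out_features=10)   # 10 breeds
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

features = torch.randn(32, 512)          # a batch of 32 "images"
labels = torch.randint(0, 10, (32,))     # the correct breed index for each image

optimizer.zero_grad()
logits = model(features)
loss = loss_fn(logits, labels)           # gap between predictions and true labels
loss.backward()                          # compute gradients of the loss
optimizer.step()                         # adjust parameters to reduce the loss
print(loss.item())
```

Repeating this step over many labeled batches is what "training" means: the loss shrinks, and with it the rate of errors like mistaking a Labrador for a Golden Retriever.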
The quality of a supervised learning model depends heavily on the training data. Poor-quality data can lead to issues like overfitting, underfitting, or biased predictions.

Unsupervised Learning Model (Unlabeled Data)

Applications and Advantages of Unsupervised Learning
Unsupervised learning is helpful when we want to discover new patterns in data that were not previously considered. Common applications include market segmentation, anomaly detection, and organizing large datasets. Unlike supervised learning, where models learn from data with predefined labels, unsupervised learning algorithms work with data that has no labels. The algorithms must find structure and patterns without guidance on what outcomes to predict; the focus is on understanding the structure and distribution of the data.

Example of K-means Clustering
K-means clustering is a classic example of an unsupervised learning algorithm. It partitions the data into 'k' distinct clusters, assigning each data point to the nearest cluster. The goal is to minimize variance within each cluster while keeping different clusters well separated.

Feature-Based Clustering in Unsupervised Learning
Imagine you possess a large collection of dog photos without breed labels. Your task is to organize these photos meaningfully without knowing the breed of each dog. The unsupervised learning model analyzes photo features like dog size, fur length, ear shape, and color patterns without prior knowledge of dog breeds.

Pattern Recognition and Grouping in Unsupervised Learning
The model then tries to find patterns among these features. It might notice that some dogs have long fur and floppy ears, while others have short fur and pointy ears. Based on these observed patterns, it starts grouping the photos: dogs with short fur and pointy ears might be grouped together, while those with long fur and floppy ears are placed in a different group. The model measures the similarity of features to create these clusters, and each cluster represents a set of dogs that look similar. In an ideal scenario, these clusters end up representing actual dog breeds, like one cluster containing mostly Labrador Retrievers and another mostly German Shepherds. The model, however, doesn't know the specific breeds; it only recognizes similar groups.

Dimensionality and Visualization in Data Clustering
Clustering results are often represented in two or three dimensions for visualization and easier comprehension. For instance, with dog images, one axis might represent fur color, another tail length, and so on. Data points close together are considered more similar and may belong to the same group or cluster.

High-Dimensional Data and Dimensionality Reduction Techniques
While data can have many dimensions (each dimension representing a feature or characteristic of the data points), humans visualize best in two or three dimensions. By plotting the data points based on these features, we can visually inspect the data: similar points sit closer together in this space, while different ones sit further apart. For example, images of a specific dog breed cluster together based on shared features like ear size and fur length. In practice, data often has far more than 3 dimensions (it is high-dimensional), making direct visualization impossible. Dimensionality reduction techniques are used to reduce the number of dimensions while preserving as much of the significant structure in the data as possible.
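Returning to the K-means example above, here is a minimal sketch with toy feature values; each row plays the role of one dog photo described by numbers:

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row describes one dog photo with numeric features (toy values):
# [size, fur_length, ear_floppiness, color_pattern]
X = np.array([
    [0.8, 0.9, 0.9, 0.5],
    [0.7, 0.8, 0.8, 0.4],
    [0.3, 0.2, 0.1, 0.7],
    [0.2, 0.1, 0.2, 0.6],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # e.g. [1 1 0 0]: two look-alike groups, found without breed labels
```

Note that the algorithm never sees a breed name; it only outputs cluster numbers, which a human observer then interprets.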
As the observer, you can look at these clusters and recognize the breeds based on your own knowledge. You can also adjust the criteria or the number of clusters (for example, the desired number of groups) to better match what you know about dog breeds.

Looking for AI solutions customized to meet the unique requirements of your business? Get in touch with us to find out how we can support your project.

You can't write the code for a car to navigate a roundabout, but AI can
You can't write the code, with if-then-else statements, case statements, or switch statements, for a car to navigate a roundabout: conventional software cannot drive a car. But software can train a car to drive. That's what AI does. While conventional software is programmed to perform a task, AI is programmed to learn to perform the task. Code is the primary artifact of "traditional" software. In AI software, code is not the primary artifact. Yes, when we build AI software we have to write code. But the primary artifact of AI is the data (data collection, data labeling, data analytics using algorithms to spot patterns).

Raw Data Collection
The software cannot drive a car, but software can collect data. Cars have near-field sensors, microphones, cameras, lidar, and radar. And then the AI starts learning. It learns how to make a right turn, how to make a left turn, how to go straight, how to recognize a stop sign, how to recognize a traffic light. It learns from patterns, because it's got all this data. Once it has seen a thousand stop signs, it can recognize a stop sign by itself. That stop sign could be straight on, cocked a bit, or bent, and the AI still recognizes it, because it has seen it enough times.

"The reason that cars can drive now and they couldn't drive themselves 10 years ago or 20 years ago is because of the cloud. The cloud changed the game with AI" James A. Whittaker

The cloud changed the game because all the data that AI needs to learn takes more than a bit of storage. Google had a proprietary cloud that did nothing but search. Amazon had a private proprietary cloud that did nothing but e-commerce. Facebook had a proprietary cloud that did nothing but store social data. This is where modern AI was born: in these clouds. Alongside these technological giants, cloud migration companies have emerged to help businesses and organizations leverage cloud capabilities. All that camera data from all those cars going through all those roundabouts takes a lot of storage, and the algorithm has to have access to it all. Before the cloud, it didn't.

Data Labeling
Data labeling is the process of identifying raw data (images, text files, videos, etc.) and adding meaningful labels to provide context so that a machine learning model can learn from it. For example, you prepare 10,000 pictures of cats and label them "these pictures have cats". Then you prepare a bunch of pictures without cats and label them "not a cat". And over time, the AI figures it out itself. If you've ever taught a child to read letters, you show them the flashcards over and over. When they guess right, you say, "Hey, good job!" When they guess wrong, you say, "No, that's wrong." We do the same thing with AI: we show it a bunch of examples, and when it gets one wrong, we correct how the data is labeled so it knows it got it wrong. This reward-and-correction cycle is the intuition behind reinforcement learning.
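A minimal sketch of what a labeled dataset actually looks like, and where the "correct the label" step fits in. All paths and labels here are illustrative:

```python
# A labeled dataset: each image path is paired with its label.
labeled_data = [
    ("images/cat_0001.jpg", "cat"),
    ("images/cat_0002.jpg", "cat"),
    ("images/cat_0003.jpg", "not_a_cat"),   # mislabeled by mistake
    ("images/street_0001.jpg", "not_a_cat"),
    # ...in practice, thousands of examples per class
]

def relabel(dataset, path, correct_label):
    """When the model 'guesses wrong', the fix is often in the data: correct the label."""
    return [(p, correct_label if p == path else label) for p, label in dataset]

labeled_data = relabel(labeled_data, "images/cat_0003.jpg", "cat")
```

Much of practical AI work looks like this: finding the mislabeled flashcards and fixing them before the next round of training.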
"Let me make a prediction… Whereas programmers, developers in modern times are the most central to a team developing software, my prediction is data scientists are going to take over as the most important part of an AI project. Not coding. Because you have to recognize good data from bad data. You have to be able to label it correctly. Those labels help the algorithms to understand what's going on" James A. Whittaker

Machine Learning Algorithms
You take your data, you label it, and you organize it as well as a human can. And then you apply the algorithms to it. Algorithms are used to analyze data, to gain insight, and to subsequently make a prediction or determination. For example, look at the reinforcement learning algorithm that provides recommendations for you on YouTube. After you watch a video, the platform shows you similar titles it believes you will like. But if you watch the recommendation and do not finish it, the machine understands that the recommendation was not a good one and tries another approach next time. Machine learning is a set of algorithms that enable the software to update itself and "learn" from previous outcomes without programmer intervention. In summary, a traditional algorithm takes some input and some logic, in the form of code, and produces the output. A machine learning algorithm, by contrast, takes an input and an output and produces the logic, which can then be applied to new inputs to produce new outputs (see the sketch after this section).

"I think instead of universities studying the nuances of programming languages, we're going to be studying the nuances of algorithms… The nuances of data structures, how control structures work, whether to use an if-statement or a switch-statement or a lookup table are not going to matter. The skill that is going to matter is your understanding of probability statistics" James A. Whittaker

Finding Patterns
A good data scientist can look at data and say, "That's probably the algorithm we should start with." But that's the process: get the data, start running the algorithms through it, and hope that those algorithms start finding patterns. AI use cases fall into one or more of seven common patterns: hyper-personalization, autonomous systems, predictive analytics and decision support, conversational/human interactions, patterns and anomalies, recognition systems, and goal-driven systems. For example, an algorithm can find a "fraud/risk" pattern, demonstrating that things are out of the ordinary or unexpected.

"And this is a key skill that distinguishes a good Data Scientist from a mediocre Data Scientist. It's picking the right algorithms, understanding those patterns, and then iterating, combining algorithms to generate patterns" James A. Whittaker
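As promised above, here is a minimal sketch of the "input + output produces logic" contrast, using spam detection as an illustrative task (all texts and labels are made up):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Traditional algorithm: input + hand-written logic -> output.
def is_spam_rule(text: str) -> bool:
    return "free money" in text.lower()          # the logic is written by a programmer

# Machine learning: inputs + known outputs -> learned logic.
texts = ["free money now", "meeting at noon", "claim free money", "lunch tomorrow"]
labels = [1, 0, 1, 0]                            # known outputs for the known inputs
learned_logic = make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)

print(is_spam_rule("free money inside"))              # rule applied to new input
print(learned_logic.predict(["free money inside"]))   # learned logic applied to new input
```

Both paths classify the new message; the difference is where the logic came from: a programmer's head, or the data itself.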
Feedback Loop
Another fundamental difference between AI and conventional software is that conventional software never changes by itself. We build software, release it to the field, and it just does the same thing repeatedly. Once it gets out in the wild, it doesn't change unless humans update it. AI changes. An artificially intelligent car going through a roundabout might discover something new, like a car driving the wrong way around it. And once it figures out what to do, that's new data. That's the thing about AI: it keeps learning even after it's released.

"The conventional software didn't wake up one day and say, 'You know what? Fuck that shit. I'm tired of processing those inputs. I'm gonna do something else'" James A. Whittaker

That's not the way conventional software works. AI software does work that way: it gets better by itself. The feedback loop is a cycle without an end. AI observes user actions and system events and captures data for analysis. AI analyzes this data against historical trends and, if necessary, data from other sources. AI predicts outcomes and recommends specific actions. Then the loop starts over, and the system continues to refine its recommendations based on the latest feedback (whether the user accepted the recommendation and what happened after).

Rule-Based Chatbots vs AI Chatbots
One illustrative example of the difference between traditional software and AI-driven software is the contrast between rule-based chatbots and AI chatbots, a distinction we're well-versed in thanks to our extensive experience in custom chatbot development, both rule-based and AI-driven. Rule-based chatbots work with simple instructions. They follow a script like, "If the user says 'A', then reply with 'B'." You'll find them handy for frequently asked questions or basic customer service tasks; think of them as those voice-operated phone menus that guide you through a list of options. However, these bots have a big limitation: they don't learn or adapt. If a user asks something outside the script, the bot won't have an answer. This can make user interactions feel robotic and often frustrating, and it requires ongoing tweaks from developers. AI chatbots, on the flip side, are a lot smarter. They use technologies like machine learning and natural language understanding to figure out what users really want. Over time, they actually get better at helping people thanks to the data they collect. They can notice patterns in questions from different users and refine their answers, handle multiple languages, and tailor their responses to individual users. Plus, they know when a problem is too complex and a human needs to step in. That's why businesses that want more natural, intelligent interactions are going for AI bots. A sketch of how rigid the rule-based side is appears after this section.

Our Example of Artificial Intelligence Software in Use Today
As you can see, one of the most important applications of AI is the recommendation engine. With solid experience in custom eLearning development, our offshore software development company naturally looks for ways to implement AI in eLearning projects. The core idea of AI in eLearning is implementing a recommendation engine on the eLearning platform. This tool recommends micro-learning content to the user based on their learning experience and other data the user might provide (including search history or specific requests). From the UI/UX perspective, it looks like an AI-powered chatbot or an AI-powered dashboard like the ones YouTube or LinkedIn have. Such chatbot assistants can imitate tutors, understand a learner's level of expertise, and pick information well fitted to that particular level. For example, the engine can recommend skills for each learner to acquire and then match them with the corresponding courses. That is just a small fraction of what we do. A recommendation engine is developed to address a specific business need; it's hard to find a one-size-fits-all solution. Enhance efficiency and customize operations with our AI software development services, designed for your specific data and business needs. Contact our experts for project support
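As promised in the chatbot comparison above, here is a minimal sketch of a rule-based script and its dead end. All keywords and replies are illustrative:

```python
# Rule-based chatbot: a fixed script with no learning.
SCRIPT = {
    "opening hours": "We are open 9:00-18:00, Monday to Friday.",
    "refund": "Refunds are processed within 5 business days.",
}

def rule_based_reply(user_message: str) -> str:
    for keyword, answer in SCRIPT.items():
        if keyword in user_message.lower():
            return answer
    # Anything outside the script is a dead end, the limitation described above.
    return "Sorry, I don't understand. Let me connect you to a human."

print(rule_based_reply("What are your opening hours?"))
print(rule_based_reply("My parcel arrived damaged"))   # falls through to a human
```

An AI chatbot replaces the keyword lookup with a trained language model, so unseen phrasings of the same question still get routed to the right answer.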
Dmitry Baraishuk • 14 min read
LLM Pretraining
Pre-training Is the First Step in Training an LLM
Training a large model from scratch is computationally expensive, requiring multiple state-of-the-art GPUs. For this reason, most developers won't pre-train a model from scratch and will instead take an existing model and use fine-tuning to adapt it to their own tasks. However, there are still situations where pre-training a model may be required or preferred. Some teams want to build models for tasks in specific domains like legal, healthcare, or e-commerce; others need models with stronger abilities in specific languages. Further, new training methods are making more efficient pre-training possible, like depth upscaling, which builds larger models from one or more existing smaller models. Because of this improvement, there is growing interest in pre-training. Depth upscaling creates a new, larger model by duplicating layers of a smaller pre-trained model; the new model is then further pre-trained, resulting in a better, larger model than the original. Models created this way can be pre-trained with substantially less compute than traditional pre-training, representing a large cost saving.

Whether pre-training is the right solution for your work depends on several factors: whether an existing model might already work for your task without pre-training, what data you have available, what compute resources you have access to (both for training and serving), and, lastly, your privacy requirements, which may also implicate regulatory compliance requirements. Pre-training large models on large datasets is an expensive activity, with a minimum cost of perhaps $1,000 for the smallest models, up to tens or hundreds of thousands of dollars for a billion-parameter-scale model. So do be careful if you choose to try this yourself. There are calculators, like one from Hugging Face, that can help you estimate the cost of your pre-training scenario before you get started; these can help you avoid unexpectedly large bills from your cloud provider.

Best Use Case for Pre-training
Pre-training is the first phase of training an LLM, where the model learns to generate text by using a very large amount of unstructured text data. Each text sample is turned into many input-output pairs. Over time, the model learns to correctly predict the next word, and in doing so it acquires knowledge about the world. These base models are capable of generating text, but not always good at following instructions or behaving in a safe way. The LLMs you encounter in consumer applications like ChatGPT, Bing Search, and others have had their initial pre-training extended with a phase of fine-tuning to make them better at following instructions, and with alignment to human preferences to make them safe and helpful. The model only has knowledge of the content that was in the training data, so if you want it to learn new knowledge, you have to do more training on more data. Additional fine-tuning or alignment training is useful for teaching the model new behavior, say writing a summary in a specific style or avoiding a particular topic. However, if you want the model to develop a deep understanding of a new domain, additional pre-training on text from that specific domain is necessary.
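As mentioned above, each text sample is turned into many input-output pairs for next-word prediction. A minimal sketch of that transformation (word-level here for readability; real LLMs operate on sub-word tokens):

```python
sample = "the model learns to predict the next word"
words = sample.split()

# One short text yields many (context -> next word) training pairs.
pairs = [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]
for context, target in pairs[:3]:
    print(f"{context!r} -> {target!r}")
# 'the' -> 'model'
# 'the model' -> 'learns'
# 'the model learns' -> 'to'
```

This is why raw, unstructured text is enough for pre-training: the supervision signal (the next word) is already inside every document.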
People often try to add new knowledge without pre-training, focusing on fine-tuning the model with smaller datasets. However, this doesn't work in every situation, especially if the new knowledge is not well represented in the base model. In those cases, additional pre-training is required to get good performance. Let's take a specific example: say you want to create an LLM that is good at a specific language. A base model that wasn't trained on much text in this language (for example, Llama 7B) cannot write text in it. If you ask the model about some native term, it gets the answer completely wrong. A model fine-tuned on a small amount of data can answer only partially in the language, although the answer will at least make sense. A model created by further pre-training an LLM on a huge amount of unstructured text in the language of interest can speak it fluently. So, as you can see, pre-training is critical here to getting a good language model. How can we make the results better? Some people will think of fine-tuning. Fine-tuning involves training your model on a small amount of task-specific data. It is important to note that, in contrast to fine-tuning, which can sometimes be done with a few hundred thousand tokens and can be quite cheap, pre-training requires lots of data and is therefore expensive. Training a 248-million-parameter model on 16 H100 GPUs, for example, may take seven hours and cost around $1,500 on AWS.

LLM Data Cleaning
When pre-training a model, it is important to start with a high-quality training dataset. The datasets used for pre-training LLMs are made up of vast amounts of unstructured text. Each text sample is used to train the LLM to repeatedly predict the next word, known as autoregressive text generation. During the training phase, the model's weights are updated as it processes each example in the training data, until, over time, the model becomes good at predicting the next word. You can think of this phase as being like reading: the input texts are used in their original form without any additional structuring of the training samples. Huge amounts of training text, equivalent to millions of complete books, are required for language models to get really good at next-word prediction and to encode reliable knowledge about the world. In contrast, the data used for fine-tuning is highly structured: question-answer pairs, instruction-response pairs, and so on. The form of a fine-tuning sample is quite different. The goal of fine-tuning is to get the model to behave in a certain way or to get good at completing a particular task. If pre-training is like reading many, many books, you can think of fine-tuning as taking a practice exam. You aren't really learning new knowledge; you learned everything from your reading during pre-training. Fine-tuning is just learning how to answer questions in a specific way.
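Here is what the two kinds of samples look like side by side; the concrete field values are illustrative, and the fine-tuning fields follow the Alpaca-style layout discussed later in this section:

```python
# Pre-training sample: plain, unstructured text ("reading").
pretraining_sample = {
    "text": "Machine learning is a set of algorithms that enable software to learn..."
}

# Fine-tuning sample: highly structured ("a practice exam"), here Alpaca-style.
finetuning_sample = {
    "instruction": "Summarize the paragraph below in one sentence.",
    "input": "Machine learning is a set of algorithms that ...",
    "output": "ML lets software learn patterns from data instead of explicit rules.",
}
```

The pre-training sample carries knowledge; the fine-tuning sample teaches a format of behavior.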
If you want to read a lot of text, you have to find a lot of books, code examples, articles, Wikipedia pages, webpages, and so on. Pre-training datasets are built from large collections of text documents, many of them sourced from the internet; the world is filled with text, so it's quite easy to find lots of it for pre-training. Fine-tuning datasets, on the other hand, require precise questions and high-quality corresponding answers. Traditionally, this work has been done by humans, which takes time and can be expensive. More recently, teams have been using LLMs to generate fine-tuning data, but you need a very capable model for this to work well. In fact, creating good-quality fine-tuning datasets takes a bit more work. Below, we compare and contrast some sample pre-training and fine-tuning datasets.

Data quality is very important for pre-training LLMs. If there are issues with your training data (lots of duplicate examples, spelling errors, factual inconsistencies or inaccuracies, or toxic language), your resulting LLM will not perform well. Taking steps to address these issues and ensure your training data is of high quality will result in a better LLM and more return on your training investment. Here are the major tasks to complete when cleaning text data for training. The first is deduplication. Duplicated data can bias your model towards particular patterns and examples, and it increases training time without necessarily increasing model performance. Removing duplicate text, both within individual documents and across all documents, is therefore a crucial cleaning step. Second, you want the intrinsic quality of your training data to be high: the text should be in the language you are interested in, be relevant to the topics you want the LLM to build knowledge of, and meet any other quality metrics you have. You can design quality filters to clean up this aspect of your training data. A related step is applying content filters to remove potentially toxic or biased content, since safety is an important concern. And to avoid potential data leakage, you should always remove personally identifiable information (PII) from your examples; one common strategy is to redact it in the training text. Lastly, you can define rules to fix common quality issues like all-caps text, extra punctuation, and poorly formatted text. As you can see, data cleaning can be complicated and time-consuming. Luckily, more and more tools are available to help with this important step. One example is Dataverse, an open-source project: a ready-to-use data cleaning pipeline that takes your raw training data, applies these cleaning steps (and others), and packages your data so it is ready for training. You can take a look at the GitHub page to learn more about how to use Dataverse.

Data Cleaning Steps
We start with data collection. Since the objective of pre-training is next-token prediction, you need a gigantic corpus of unlabeled data. You can acquire this data by scraping the web, gathering documents within your organization, or simply downloading open datasets from data hubs. The content itself is not the point; what matters is that each example consists of plain text. For pre-training, this is what we want: plain text that is not structured in any instruction-like form, such as a question-answer pair. Feel free to change the index number if you want to explore other examples within the dataset. Now let's download another dataset called Alpaca. Alpaca is a fine-tuning dataset containing 52,000 instruction-following examples generated by GPT-4. Here you can see the dataset consists of an instruction, an input, and an output. Let's look at the first example and print its instruction, input, and output: it's three tips for staying healthy.
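A minimal sketch of inspecting that fine-tuning dataset with the Hugging Face datasets library; "tatsu-lab/alpaca" is the hub ID commonly used for Alpaca, but verify it (and the column names) before running:

```python
from datasets import load_dataset

alpaca = load_dataset("tatsu-lab/alpaca", split="train")
print(alpaca.column_names)        # expected: instruction, input, output (plus a text field)

example = alpaca[0]
print(example["instruction"])     # e.g. "Give three tips for staying healthy."
print(example["output"])
```

Swapping the index in `alpaca[0]` lets you browse other examples, as suggested above.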
Note that in contrast to the pre-training dataset, which comprises only text, this instruction dataset, Alpaca, includes instruction, input, and output columns. Since we are interested in pre-training, we will only use the pre-training dataset from now on. Now let's try scraping the web to form a custom dataset. To do this, we will download nine random Python scripts; note that in practice you would have far more samples, up to billions. This is a very practical step when pre-training your own model: you download some data, add some custom data, and combine them. We now have a total of 60,009 rows. Let's go through some typical data cleaning steps and watch the number of rows decrease as we progress.

First, we filter out samples that are too short. A common practice for pre-training data is to keep text that has at least three lines or sentences, where each line contains at least three words. We do this because the pre-training objective is next-token prediction, and short examples are not very useful for that task. Note that the datasets library has a filter method which applies a function to each example in the dataset. After running this filter, over 7,000 rows are eliminated.

Second, we remove repetitions. This is a function that, given an input of paragraphs, finds duplicates within each paragraph; if a paragraph contains too many duplicates relative to its length, we return False to discard it. Running this function across the dataset brings us down to roughly 52,000 examples, a decrease of only about 30 rows. That is a tiny decrease, but it illustrates one advantage of downloading datasets from Hugging Face: datasets on the hub already have much of the pre-processing done.

Third, we move on to deduplication across documents. This function removes duplicate entries by storing unique text segments and comparing each text against them. As a result, 8,000 rows are removed, a big decrease. In reality, there is a lot of duplication across documents, so make sure you cover this step.

The last step is language filtering, one of the quality filters mentioned earlier. If you want to focus on a particular language or domain, it is good to filter out other languages or domains so the model is trained on relevant text. Here we use the fastText language classifier to keep only English samples for training our model. You may see a warning, but don't worry about it too much. Also note that this run is slower than the filters above, because this time a real machine learning model is doing the work. Checking the number of rows, we are down to about 40,000 after removing several thousand more. Starting from a large dataset in the first place is very important, because each cleaning step throws rows away.
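A hedged sketch of two of the filters described above, written against the datasets library's filter method; the thresholds match the "three lines, three words" rule from the walkthrough, and the function names are ours:

```python
def long_enough(example):
    """Keep samples with at least 3 non-empty lines, each with at least 3 words."""
    lines = [l for l in example["text"].splitlines() if l.strip()]
    return len(lines) >= 3 and all(len(l.split()) >= 3 for l in lines)

def make_dedup_filter():
    """Drop exact duplicates by remembering which texts have already been seen."""
    seen = set()
    def is_new(example):
        if example["text"] in seen:
            return False
        seen.add(example["text"])
        return True
    return is_new

# dataset = dataset.filter(long_enough)
# dataset = dataset.filter(make_dedup_filter())
```

Production pipelines like Dataverse use more robust near-duplicate detection than exact string matching, but the shape of the step is the same.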
Finally, we save the data to the local file system in Parquet format. Note that in reality you would save the data at each stage of cleaning, because you are handling a large amount of data that cannot all be held in memory. Parquet is a columnar storage file format widely used in big data and data analytics scenarios. You're free to use any other format, like CSV or JSON, but since Parquet is very fast, we're choosing it here. The next step in the process is to prepare your saved dataset for training, which involves some additional manipulation of the data.

Data Tokenizing and Packing
Now that you have your clean dataset, you need to prepare it for training. There is a bit more manipulation of the data to do before you can use it in a training run. The two main steps are tokenizing the data and then packing it. LLMs don't actually work directly with text; their internal calculations require numbers. Tokenization is the step that transforms your text data into numbers. The exact details of how text is mapped to tokens depend on the vocabulary and the tokenization algorithm of your model. Each model has a specific tokenizer, and it is important to choose the right one, or your model won't work. Packing structures the data into continuous sequences of tokens, each of the maximum length the model supports. This reshaping makes training efficient. Let's start with tokenizing. You can choose a tokenizer from any existing model hosted on Hugging Face or create your own. Models in the same family often use the same tokenizer; in this case, we will use TinySolar's tokenizer, which is in the same family as the SOLAR models. Now we calculate the total number of tokens in our dataset. When training LLMs, we are often interested in the total number of tokens, and we can easily check this with NumPy. With this small dataset, which started out with approximately 4,000 text samples, we end up with about 5 million tokens. Finally, we pack the dataset, so the clean data is tokenized and shaped correctly for training.
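A minimal sketch of the tokenize-and-pack step just described. The tokenizer ID is an assumption (a public SOLAR checkpoint on the Hugging Face hub); substitute the tokenizer of the model you are actually training:

```python
import numpy as np
from transformers import AutoTokenizer

# Assumed hub ID; use your own model's tokenizer in practice.
tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-v1.0")

texts = ["First cleaned document...", "Second cleaned document..."]

# Tokenize: text becomes a stream of integer token IDs.
ids = np.concatenate([tokenizer.encode(t) for t in texts])
print("total tokens:", len(ids))

# Pack: reshape the stream into fixed-length sequences for efficient training.
max_seq_len = 8                       # tiny here; real runs use the model's context length
n_seqs = len(ids) // max_seq_len
packed = ids[: n_seqs * max_seq_len].reshape(n_seqs, max_seq_len)
print(packed.shape)
```

Packing wastes no compute on padding: every position in every sequence is a real token the model can learn from.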
Model Training

Decoder-Only (Autoregressive) Models
Now you need a model to train. There are several ways to configure and initialize a model for training, and your choice will impact how quickly pre-training proceeds. Although there are several variations of the transformer architecture used in large language models, we're focusing on decoder-only, or autoregressive, models. The decoder-only architecture simplifies the model and is more efficient for next-token prediction. OpenAI's GPT models and most other popular LLMs, such as Llama and Mistral, have adopted a decoder-only architecture. A decoder-only model is made of an embedding layer that turns text into vector representations, then several decoder layers, each containing several components based on neural networks, and lastly a classifier layer that predicts the most probable next token from the vocabulary.

Initializing the Weights
Once we decide on the architecture, the next step is to initialize the weights. These weights get updated during training as the model learns to predict the next token from the examples in the training data. There are a few ways to initialize the weights. The simplest choice is to initialize them with random values. This works, but it means that training takes a very long time and requires a huge amount of data. A better way is to reuse existing weights: for example, you can start from Llama 7B or Mistral 7B weights. This means your model has already been trained and has some basic knowledge, so it can generate text quite well from the start. This is the best way to start if you want to continue pre-training a model on new domain data. Training in this scenario generally takes much less data and time than starting from random weights, though still much more data than fine-tuning. With all the open models out in the world right now, this can be a great option for creating your own custom LLM. In one of our runs, we kept exactly the same model size but used more data: 200 billion tokens, with hyperparameters very different from those used for fine-tuning. The total price was about $0.2 million: still expensive, but much, much cheaper than training from scratch. In another run, we used 1 trillion tokens, so the approach was more expensive, costing about $1 million. However, this is still much less data than needed to train a model of this size from scratch, which would be around 3 trillion tokens.

Model Scaling
You might notice that our model has 10 billion parameters, which is not the same size as the pre-trained weights we initialized the model with. We found that the available 7-billion-parameter model was not quite good enough for our purposes, but our hardware limited us to training a model smaller than 13 billion parameters. So we took advantage of a technique called model scaling to create a new model with a different size. Model scaling removes or adds layers to an existing model and then carries out more training to create a new model of a different size. What if you want a smaller model? One option is downscaling: removing layers to produce a smaller model than the one you started with. This approach can work well for large models, but it doesn't work well for small ones. In general, layers near the middle of the model are removed, and the resulting smaller model is pre-trained on a large body of text to bring its weights back into coherence. The better method is upscaling. Here you start with a smaller model, then duplicate some of its layers to make a larger model. Let's take an example. To make a 10-billion-parameter model with upscaling, you can start with a 7-billion-parameter model. For illustration, assume the 7B model has 4 layers (in reality, Llama 7B, for example, has 32 layers). You make two copies of the model, take some top layers from one copy and some bottom layers from the other, and put them together to create a new model with 6 layers. At this point, the model is no longer coherent, and inference would not work well. Continued pre-training is required to bring the model back into coherence and enable text generation. However, because the weights of the copied layers have already encoded some language understanding and knowledge, it takes less data and time to create a good model. In fact, upscaling can allow you to train a larger, well-performing model with 70% less data than training the equivalent model from scratch. So depth upscaling can be a more cost-effective way to pre-train a model, although it's still expensive. Let's take a look at how you can create models using each of these methods. We begin, as before, by setting a configuration to minimize warnings and by setting a seed for reproducibility. The models we create here will be based on the Meta Llama 2 architecture, a decoder-only model that is one of the most frequently used architectures by LLM developers.
You can set configuration options using the LlamaConfig class of the Transformers library. We reuse most of the parameters of the original Llama 2 model, but since we want to run our model with limited computation, we adjust some parameters to reduce the model size: we set the number of hidden layers to 12 and shrink the hidden size, intermediate size, and number of key-value heads. Experimenting with these settings is hard, because pre-training takes so much time and money; the best place to look for advice on designing a model's architecture is the academic literature, so look for papers on arXiv and in conference proceedings. Now that we have determined our model configuration, let's initialize the model. The first and most naive way to initialize a model is with random weights. This is very easy with the Transformers library: all you need to do is pass the config we've just defined to create a Llama model instance. Before we move on, let's check the size of the model. When training an LLM, we always want to verify the model size, because size directly impacts compute and cost. Our current model is sized at 248 million parameters. When a model is randomly initialized, the weights are given random values. Let's take a look at a small sample of weights from one of the layers in the self-attention head. The model is randomly initialized and not trained on any data. Do you want to try it for inference? Can you guess what it will output? We first load a tokenizer, and then we see random outputs, because our model is not trained yet. Before we move on, let's release the memory: the models we create take up several hundred megabytes, and we need to free the memory to avoid crashing the kernel. Now, instead of random weight initialization, let's try using a pre-existing pre-trained model. All we need to do is load the model using AutoModelForCausalLM, and we are ready to keep training. Taking an existing model and continuing to train it on new data is called continued pre-training, and it is a much faster way to train a model on new data than starting from scratch. Before we move on, let's empty the memory once more. Earlier, we showed how you can remove layers from a large model to create a smaller one in a process called downscaling. Here's how you can do that. We will shrink a 12-layer, 248-million-parameter model by removing the mid-layers. To start, check how many layers the model currently has: 12 layers and 248 million parameters. Now create a smaller model from the initial one by deleting two of the mid-layers: select the first five layers and the last five layers and concatenate them to form a total of 10 layers. Now you have 10 layers left, which is what we wanted, and this model configuration is ready for pre-training. As noted earlier, downscaling works best with larger models; this small model would not be a good choice and is only being used to demonstrate the method. Let's go ahead and empty our memory once more.
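A minimal sketch of the configuration, random initialization, and downscaling steps above, using the Transformers library. The hidden-size and head values are illustrative stand-ins, not the exact course configuration:

```python
import torch.nn as nn
from transformers import LlamaConfig, LlamaForCausalLM

# Reduced sizes for limited compute; all values here are illustrative.
config = LlamaConfig(num_hidden_layers=12, hidden_size=1024,
                     intermediate_size=4096, num_key_value_heads=8)
model = LlamaForCausalLM(config)          # randomly initialized weights
print(model.num_parameters())             # check the size before training

# Downscaling: keep the first 5 and the last 5 decoder layers, dropping the middle 2.
layers = model.model.layers
model.model.layers = nn.ModuleList(list(layers[:5]) + list(layers[-5:]))
model.config.num_hidden_layers = 10
```

After surgery like this, the model must be pre-trained further before its layers cooperate coherently again.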
Now you are going to try upscaling a pre-existing pre-trained model. By upscaling, we mean starting from a small pre-trained model and ending up with a larger one. Here we will upscale a 12-layer model to a 16-layer model. The first step is to create a model instance for the large, final model we are going to train. These are the basic configurations for the larger model: as above, we start with the Llama 2 model architecture, and all numbers other than the number of hidden layers are the same as in the smaller pre-trained model we are going to upscale. We finish this part by initializing the larger model with random weights. Next, you overwrite these randomly assigned weights with the weights from a pre-trained model. Load the smaller pre-trained model, which has 12 layers, into memory so you can copy layers from it. First, take the bottom-most 8 layers and the top-most 8 layers and concatenate them to form a total of 16 layers, then overwrite the weights of the randomly initialized model with these new values. Lastly, copy over the components that make up the embedding and classification layers of the model, so those can be reused as well. Check the number of parameters to confirm that it hasn't changed, and then try inferencing the model. Now this is interesting: the model has been initialized with another model's weights, so it has some ability to generate text. But the layers are not yet coherent, so the generation isn't good. This is why it's necessary to continue pre-training this model on more data. As you can see, though, you are much further along than when you started with random weights, which is why upscaling can help you train models much faster. During training, you will then update all the weights of this model so that all of the layers work together as expected. Let's save this model and then train it.
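A minimal sketch of the upscaling steps just described: a randomly initialized 16-layer target, overwritten with the bottom-most 8 and top-most 8 layers of a 12-layer source. The checkpoint path is hypothetical, and the sizes are illustrative stand-ins that must match your actual source model:

```python
from copy import deepcopy

import torch.nn as nn
from transformers import AutoModelForCausalLM, LlamaConfig, LlamaForCausalLM

# Randomly initialized 16-layer target; sizes must match the source model.
big = LlamaForCausalLM(LlamaConfig(num_hidden_layers=16, hidden_size=1024,
                                   intermediate_size=4096, num_key_value_heads=8))

# Hypothetical path to the 12-layer pre-trained source model.
small = AutoModelForCausalLM.from_pretrained("path/to/12-layer-model")

# Bottom-most 8 + top-most 8 layers -> 16 layers (the middle 4 appear twice).
bottom = deepcopy(list(small.model.layers[:8]))
top = deepcopy(list(small.model.layers[-8:]))
big.model.layers = nn.ModuleList(bottom + top)

# Carry over the embedding and classification layers as well.
big.model.embed_tokens = deepcopy(small.model.embed_tokens)
big.lm_head = deepcopy(small.lm_head)

print(big.num_parameters())   # confirm the target size before continued pre-training
```

The duplicated middle layers are what make the fresh model incoherent at first; continued pre-training is what blends them back into a working whole.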
• 17 min read

Our Clients' Feedback

zensai
technicolor
crismon
berkeley
hathway
howcast
fraunhofer
apollomatrix
key2know
regenmed
moblers
showcast
ticken
Let's Talk Business
Do you have a software development project to implement? We have people to work on it. We will be glad to answer all your questions and provide an estimate for your project. Use the form below to describe it, and we will get in touch with you within 1 business day.
Contact form
We will process your personal data as described in the privacy notice
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply
Contact us

USA +1 (917) 410-57-57
700 N Fairfax St Ste 614, Alexandria, VA, 22314 - 2040, United States

UK +44 (20) 3318-18-53
26/28 Hammersmith Grove, London W6 7HA

Poland +48 222 922 436
Warsaw, Poland, st. Elektoralna 13/103

Email us

[email protected]
