<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[learn data science - Medium]]></title>
        <description><![CDATA[Unpacking Data Science One Step At A Time - Medium]]></description>
        <link>https://blog.exploratory.io?source=rss----6ea408ec434d---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>learn data science - Medium</title>
            <link>https://blog.exploratory.io?source=rss----6ea408ec434d---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 06 Apr 2026 23:19:26 GMT</lastBuildDate>
        <atom:link href="https://blog.exploratory.io/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Why Deep Learning Didn’t Replace Tree Models for Tabular Data]]></title>
            <link>https://blog.exploratory.io/why-deep-learning-didnt-replace-tree-models-for-tabular-data-d80b796d652f?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/d80b796d652f</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Mon, 06 Apr 2026 02:38:19 GMT</pubDate>
            <atom:updated>2026-04-06T02:38:19.186Z</atom:updated>
            <content:encoded><![CDATA[<h4><em>Why models like XGBoost and LightGBM still dominate structured data problems.</em></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*57av6R0oTPPTCwFdClUrag.png" /></figure><p>Over the past decade, the world of machine learning and AI has been dominated by one idea: deep learning.</p><p>Neural Networks — the algorithms behind deep learning — have transformed fields like:</p><ul><li>computer vision</li><li>speech recognition</li><li>natural language processing</li></ul><p>Models such as transformers and convolutional neural networks now power everything from ChatGPT to self-driving cars.</p><p>Given that success, it seemed inevitable that deep learning would replace traditional machine learning everywhere.</p><p>But something interesting happened.</p><p>In the world of tabular data, the structured datasets used by most businesses, deep learning never completely took over.</p><p>Instead, algorithms like XGBoost and LightGBM continue to dominate many real-world machine learning applications.</p><p>This sounds surprising to some. 
But once you understand the nature of tabular data and what deep learning is good at (and not good at), it starts to make sense.</p><h3>Most Business Data Is Tabular Data</h3><p>Most organizations are not training AI models on billions of images or internet-scale text corpora.</p><p>Instead, they work with data that looks something like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WB7DjS9qZLqoWyYvh8enBA.png" /></figure><p>Columns represent different concepts:</p><ul><li>age</li><li>income</li><li>purchase behavior</li><li>geographic region</li></ul><p>Unlike images or language, there is no inherent structure connecting these variables.</p><p>Images have spatial structure</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rCGXp8Ef61wutW0I0eZ-bQ.png" /></figure><p>Language has sequential structure</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*u9M53dXZTKxdowwXaOFQBA.png" /></figure><p>But tabular data is different. It’s simply a collection of variables describing some phenomenon. And it doesn’t have the kind of structure that images and text have.</p><p>Tabular data</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fvvO4WciUlAy9g8Wep1V-Q.png" /></figure><p>Deep learning thrives when data has hierarchical structure (images, language).</p><p>Think of the modern Transformer, which is one of the most important algorithms in deep learning and AI today. When it builds a model on given text data, it takes the sequence of the text and the relations between words, sentences, etc. into account, and predicts the next word.</p><p>But tabular data typically does not have such sequence or structure. Each value in a given variable is typically independent of other values.</p><p>And it turned out that this difference matters a lot.</p><h3>Why Tree-Based ML Models Work So Well</h3><p>Machine Learning models like XGBoost, LightGBM, Random Forest, etc. 
are still the most common algorithms among practitioners who are building prediction models with business data, and they all share a common fundamental architecture.</p><p>That architecture is the Decision Tree.</p><p>Decision trees approach the problem in a way that feels very natural for this type of data.</p><p>Instead of learning abstract representations, they learn rules.</p><p>For example:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QJOru7Oaroj2hPERkLvwVw.png" /></figure><p>Or:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vi7udNnaiFT94bw11EWK0w.png" /></figure><p>These kinds of conditional rules often show up in real-world datasets, and decision trees are extremely good at discovering them.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*L3Ks5NLUisgpEho2OkDc6g.png" /></figure><h3>The Rise of XGBoost</h3><p>Around the mid-2010s, one algorithm in the tree family became very popular.</p><p>That algorithm was XGBoost.</p><p>XGBoost implemented gradient boosting in a highly optimized way and quickly became the default choice for many machine learning practitioners working with tabular data.</p><p>It builds trees sequentially, each one correcting the mistakes of the previous model.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*GrFtE50PoiEA8TD-.png" /></figure><p>For several years, it was the dominant choice among data science practitioners building prediction models for business data.</p><p>But as datasets grew larger, people began to encounter a new problem.</p><p>Training these models could take a long time.</p><h3>Enter LightGBM</h3><p>In 2017, researchers at Microsoft introduced a new boosting framework called LightGBM.</p><p>The goal wasn’t to reinvent gradient boosting.</p><p>Instead, the idea was to make boosting lighter.</p><p>The word <em>Light</em> in LightGBM refers to being lightweight in computation and memory usage.</p><p>Several clever design decisions helped 
achieve this:</p><ul><li>trees grow leaf-wise, focusing computation on the most useful splits</li><li>features are converted into histogram bins to reduce split evaluations</li><li>rows with large gradients are prioritized using GOSS</li><li>sparse features are compressed using exclusive feature bundling (EFB)</li></ul><h3>Level-wise vs Leaf-wise growth</h3><p>Level-wise (XGBoost)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zRsoL1AXHBVfumTg7jhLgw.png" /></figure><p>Leaf-wise (LightGBM)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*a2xmvGDdJqcJLz8lX7wzag.png" /></figure><p>LightGBM grows trees where the loss decreases most, focusing computation on the most informative parts of the model.</p><p>Together, these ideas dramatically reduce the amount of computation required to train models.</p><p>For people experimenting with machine learning models by trying many features and hyper-parameters, this speed improvement made a huge difference.</p><h3>Why Does Deep Learning Often Struggle?</h3><p>It’s not that people stopped trying. In fact, many researchers have tried applying neural networks to tabular datasets.</p><p>But very often, decision-tree-based boosting models such as XGBoost and LightGBM still performed better.</p><p>The reason is surprisingly simple.</p><p>Deep learning excels when the data contains rich internal structure.</p><p>Images contain spatial patterns. 
Language contains grammatical patterns.</p><p>Tabular data usually does not.</p><p>Instead, tabular datasets often contain:</p><ul><li>a diverse mix of variables</li><li>engineered features (artificially created extra variables based on original variables)</li><li>sparse categorical encodings (one-hot encoding)</li><li>nonlinear feature interactions</li></ul><p>Decision trees are well suited to discovering these kinds of patterns.</p><h3>What This Means in Practice</h3><p>In many real-world machine learning projects, the workflow often looks like this:</p><ol><li>Build a baseline model.</li><li>Try a tree boosting algorithm.</li><li>Improve the model through feature engineering and parameter tuning.</li></ol><p>And very often, the models that end up performing best are based on boosting algorithms such as XGBoost, LightGBM, etc.</p><p>These algorithms have become reliable workhorses for tabular data problems.</p><h3>Trying These Models Yourself</h3><p>One of the motivations behind building Exploratory was to make data science tools easier to use, and building prediction models with machine learning is one of them.</p><p>In Exploratory, you can train tree-based models such as:</p><ul><li>Random Forest</li><li>XGBoost</li><li>LightGBM</li></ul><p>directly from an interactive interface.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*4xO-j6CqRxB-6RH1.png" /></figure><p>You can take a look at <a href="https://exploratory.io/note/exploratory/Introduction-to-LightGBM-eAK7zbZ9">this how-to note</a> for more details on how to use LightGBM.</p><p>Instead of writing large amounts of code, you can focus on exploring your data, building features, and comparing models.</p><p>If you work with tabular data such as customer behavior, financial data, operational metrics, etc., it’s definitely worth trying them.</p><h3>A Final Thought</h3><p>As it turned out, the deep learning revolution didn’t eliminate traditional machine learning 
algorithms.</p><p>Instead, it clarified something important.</p><p>Different types of data require different tools.</p><p>For images and language, as you know, deep learning dominates.</p><p>But, for tabular data, tree based algorithms like XGBoost and LightGBM remain some of the most powerful methods available.</p><p>One of the things I appreciate about boosting algorithms like XGBoost and LightGBM is how practical they are.</p><p>You don’t need massive infrastructure or complicated neural network architectures. You start with your data, build some features (if required), and let the model discover useful patterns.</p><p>In many cases, the results are surprisingly good!</p><p>In Exploratory, you can train models such as Random Forest, XGBoost, and LightGBM directly from an interactive interface and compare their performance on your dataset.</p><p>Sometimes the easiest way to understand the strengths of these models is simply to see how they perform on your own data.</p><h3>Download Exploratory</h3><p>You can start using XGBoost, LightGBM, Random Forest, and other models today in the latest version of Exploratory.</p><p>👉 Download Exploratory</p><p><a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an account yet, sign up here to start your 30-day free trial.</p><p><a href="https://exploratory.io/">https://exploratory.io/</a></p><p>If your trial has expired, simply launch the latest version and use the Extend Trial option.</p><p>If you have questions or feedback, feel free to contact me at <a href="mailto:kan@exploratory.io">kan@exploratory.io .</a></p><p>We’d love to hear how you’re using Exploratory to uncover insights in your data.</p><p>Kan Nishida</p><p>CEO, Exploratory</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d80b796d652f" width="1" height="1" alt=""><hr><p><a 
href="https://blog.exploratory.io/why-deep-learning-didnt-replace-tree-models-for-tabular-data-d80b796d652f">Why Deep Learning Didn’t Replace Tree Models for Tabular Data</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[LightGBM Explained: How It Differs from Random Forest and XGBoost]]></title>
            <link>https://blog.exploratory.io/lightgbm-explained-how-it-differs-from-random-forest-and-xgboost-286836838fe7?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/286836838fe7</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[analytics]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Sun, 22 Mar 2026 17:03:57 GMT</pubDate>
            <atom:updated>2026-03-22T17:41:54.095Z</atom:updated>
            <content:encoded><![CDATA[<p><em>The evolution of tree-based models — from robustness to optimization to scalability</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*dhq6nuirJ7cliKbQ.png" /></figure><p>If you work with tabular data, the kind of structured data found in business analytics, finance, marketing, or operations, you’ve probably encountered three popular machine learning algorithms:</p><ul><li>Random Forest</li><li>XGBoost</li><li>LightGBM</li></ul><p>All three rely on decision trees, and they can all produce very strong predictive models. But they were designed with different priorities in mind.</p><p>Random Forest emphasizes simplicity and robustness. XGBoost focuses on highly optimized gradient boosting. LightGBM was created to make boosting faster and more scalable.</p><p>So, which one to choose?</p><p>It depends…</p><p>And this is why I wrote this blog post.</p><p>Understanding why LightGBM was created and how it works makes it much easier to decide whether it’s the right tool for your problem.</p><h3>The Problem LightGBM Was Designed to Solve</h3><p>By the mid-2010s, gradient boosting had already proven to be one of the most powerful techniques for predictive modeling on tabular data, or structured data if you will.</p><p>In particular, XGBoost had become extremely popular after dominating many machine learning competitions.</p><p>However, as datasets continued to grow, practitioners began to encounter a new challenge.</p><p>Performance.</p><p>Datasets with millions of rows became normal, and the number of variables (or features) grew into the thousands, often with sparse features produced by one-hot encoding.</p><p>Gradient boosting was powerful, but it could also be computationally heavy. 
This means it takes a long time to build models with boosting algorithms when the data size is big.</p><p>Researchers at Microsoft set out to redesign parts of the algorithm so it could handle large datasets more efficiently.</p><p>The result was LightGBM, and they released it as open source in 2017.</p><h3>Why Is It Called “LightGBM”?</h3><p>LightGBM stands for ‘Light Gradient Boosting Machine’.</p><p>The word “Light” does not refer to the speed of light.</p><p>Instead, it refers to the algorithm being lightweight in computation and memory usage.</p><p>The goal was to create a gradient boosting system that could:</p><ul><li>train faster</li><li>use less memory</li><li>scale to larger datasets</li></ul><p>while still maintaining strong predictive performance.</p><p>Before diving into LightGBM’s innovations, it helps to understand how the three algorithms differ conceptually.</p><h3>Random Forest: Many Independent Trees</h3><p>Random Forest builds many trees independently.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*89Fmf8FFrhdNfVbpMlQgOw.png" /></figure><p>Each tree:</p><ol><li>samples the dataset randomly</li><li>selects variables (or features) randomly when splitting</li><li>produces its own prediction</li></ol><p>The final prediction is simply the average (regression) or majority vote (classification).</p><p>The key idea is that many independent trees reduce variance and improve stability compared to a single tree (Decision Tree).</p><p>But the trees do not learn from each other.</p><h3>XGBoost: Trees That Correct Mistakes</h3><p>XGBoost uses a technique called gradient boosting. 
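</p><p>The boosting update this section walks through can be sketched in a few lines of plain Python. This is a minimal illustration with made-up numbers, not XGBoost’s actual implementation; the “trees” here are stubbed out as fixed predictions:</p>

```python
# One gradient-boosting step for squared-error loss, using made-up numbers.
actual = [10.0, 7.0, 5.0]

# Tree 1 makes the initial predictions.
pred_tree1 = [8.0, 8.0, 6.0]

# The gradients (residuals, for squared error) are what tree 2 is trained to predict.
residuals = [a - p for a, p in zip(actual, pred_tree1)]  # [2.0, -1.0, -1.0]

# Suppose tree 2 predicts the residuals approximately (hypothetical values).
pred_tree2 = [1.5, -1.0, -0.5]

# prediction_new = prediction_from_first_tree + learning_rate * prediction_from_second_tree
learning_rate = 0.5
pred_new = [p1 + learning_rate * p2 for p1, p2 in zip(pred_tree1, pred_tree2)]
print(pred_new)  # [8.75, 7.5, 5.75], each prediction moves toward the actual value
```

<p>Each additional tree repeats the same step against the remaining errors.</p><p>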
Instead of building independent trees, it builds trees sequentially to correct the errors made by the previous trees.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fQR2leC8wtZ44U0AoAtbeQ.png" /></figure><p>Boosting algorithms are really performing a form of gradient descent in function space.</p><p>The model builds the first tree, makes predictions, and calculates the errors (or loss). In a regression problem, the error can be the residual between the actual and predicted values. This is called the ‘gradient’.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JNaiZyOh6rK_7ipcddk_7w.png" /></figure><p>The next tree will be built to predict the gradient values, not the actual values, because the gradient tells the model how to move predictions to reduce loss.</p><p>Combining the predictions from the first tree with the predictions from the second tree produces the model’s new predictions.</p><blockquote>prediction_new = prediction_from_first_tree + learning_rate × prediction_from_second_tree</blockquote><p>Conceptually, the model evolves like this:</p><ul><li>Tree1 → initial prediction</li><li>Tree2 → fix errors from Tree1</li><li>Tree3 → fix remaining errors</li><li>Tree4 → continue improving</li></ul><p>This process usually leads to more accurate models than Random Forest.</p><p>However, as datasets grow larger, training boosting models can become computationally expensive.</p><p>That is where LightGBM comes in.</p><h3>LightGBM: Designed to Scale</h3><p>LightGBM does not change the basic idea of gradient boosting.</p><p>Instead of optimizing only the boosting algorithm, LightGBM also optimizes:</p><ul><li>how trees grow</li><li>how rows are sampled</li><li>how splits are evaluated</li><li>how features are represented</li></ul><p>to scale to very large datasets while maintaining strong accuracy.</p><p>These improvements come from four key innovations:</p><ul><li>Leaf-wise tree 
growth</li><li>Histogram-based splitting</li><li>GOSS (Gradient-based One-Side Sampling)</li><li>EFB (Exclusive Feature Bundling)</li></ul><p>Let’s walk through them one by one.</p><h3>1. Leaf-Wise Tree Growth</h3><p>One of the most distinctive features of LightGBM is leaf-wise tree growth.</p><p>Traditional tree algorithms such as Random Forest and XGBoost grow trees level-wise.</p><p>At each depth of the tree, all nodes are expanded.</p><p>Example:</p><pre>        Root<br>       /    \<br>      A      B<br>     / \    / \<br>    C   D  E   F</pre><p>Every level of the tree expands evenly, and it produces balanced trees, which are stable and predictable.</p><p>But they may waste computation expanding branches that do not significantly improve predictions.</p><p>LightGBM takes a different approach. Instead of expanding all nodes at the same depth, it expands the leaf that produces the greatest reduction in loss.</p><p>Example:</p><pre>        Root<br>       /    \<br>      A      B<br>     / \<br>    C   D<br>   /<br>  E</pre><p>The tree grows where the model improves most.</p><p>This approach allows LightGBM to reach strong predictive performance with fewer splits.</p><p>The trade-off is that trees can become deeper in certain branches, so LightGBM provides parameters such as max_depth and num_leaves to control model complexity.</p><h3>2. 
Histogram-Based Splitting</h3><p>Another important optimization in LightGBM is histogram-based splitting.</p><p>Standard tree algorithms may evaluate many possible split thresholds for continuous features.</p><p>Example:</p><pre>Age ≤ 21<br>Age ≤ 22<br>Age ≤ 23<br>Age ≤ 24</pre><p>LightGBM speeds this up using histogram binning.</p><p>Instead of evaluating every unique value, continuous features are grouped into bins.</p><p>Example:</p><p>Original values:</p><pre>23, 25, 27, 29, 35</pre><p>Converted to bins:</p><pre>20–25<br>25–30<br>30–40</pre><p>Now the algorithm evaluates splits only on bin boundaries.</p><p>This dramatically reduces the number of candidate splits and speeds up training.</p><h3>3. GOSS (Gradient-Based One-Side Sampling)</h3><p>Training boosting models on large datasets normally requires processing all rows.</p><p>LightGBM introduces GOSS (Gradient-based One-Side Sampling) to reduce the number of rows used during training.</p><p>The key idea is based on how boosting works.</p><p>In boosting algorithms, each data point has a gradient value indicating how much the model needs to adjust its prediction.</p><ul><li>Rows with large gradients represent predictions where the model is making large errors.</li><li>Rows with small gradients are already well predicted.</li></ul><p>While typical boosting algorithms, including XGBoost, sample the data randomly, LightGBM uses this gradient information to decide which rows to sample.</p><ul><li>Keeps all rows with large gradients</li><li>Keeps only a subset of rows with small gradients</li></ul><p>Example:</p><p>Dataset: 100,000 rows</p><ul><li>Top 20% largest gradients → keep all (20,000 rows)</li><li>Remaining 80% → sample 10% (8,000 rows)</li></ul><p>This way, it needs to use only 28,000 rows instead of 100,000.</p><p>This significantly reduces computation while preserving important learning signals.</p><h3>4. 
EFB (Exclusive Feature Bundling)</h3><p>Many modern datasets contain high-dimensional sparse features, especially when categorical variables are one-hot encoded.</p><p>Example:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_l3nJ6rkcW-4OmX2q6lMIw.png" /></figure><p>These features are <strong>mutually exclusive</strong>: only one can be active in each row.</p><p>Instead of treating them separately, LightGBM bundles them into one feature.</p><p>Original:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MROexJoGbDLdL38-bUYclw.png" /></figure><p>Bundled:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2f1vmdn0k3vqy7WSUG5N0w.png" /></figure><p>Now the algorithm evaluates splits on <strong>one feature instead of three</strong>.</p><p>This reduces feature dimensionality and speeds up training.</p><h3>How to Choose Among These Models</h3><p>Each algorithm has strengths.</p><h4>Random Forest</h4><p>Good when:</p><ul><li>you want a <strong>simple baseline</strong></li><li>datasets are <strong>relatively small</strong></li><li>minimal tuning is preferred</li></ul><h4>XGBoost</h4><p>Good when:</p><ul><li>datasets are <strong>moderate</strong> in size</li><li>you want strong predictive performance</li><li>stability and extensive tuning options are important</li></ul><h4>LightGBM</h4><p>LightGBM works particularly well when:</p><ul><li>datasets are <strong>large</strong></li><li>feature dimension is <strong>high</strong></li><li>features are <strong>sparse</strong></li><li>training time matters</li></ul><h4>Practical Recommendation</h4><p>A common workflow in machine learning projects is:</p><ol><li>Start with <strong>Random Forest</strong> as a baseline.</li><li>Try <strong>XGBoost</strong> or <strong>LightGBM</strong> to improve performance.</li><li>Prefer <strong>LightGBM</strong> when datasets become large or training time becomes a bottleneck.</li></ol><p>Because of its efficiency and scalability, LightGBM has 
become a popular choice for many machine learning tasks with tabular data.</p><h3>Final Thought</h3><p>Random Forest, XGBoost, and LightGBM all rely on decision trees, but they represent different philosophies:</p><ul><li>Random Forest focuses on <strong>robust ensembles</strong></li><li>XGBoost focuses on <strong>optimized gradient boosting</strong></li><li>LightGBM focuses on <strong>efficient and scalable boosting</strong></li></ul><p>Understanding these differences helps you choose the right tool, and explains why LightGBM has become an important algorithm in modern machine learning.</p><h3>Try LightGBM with Exploratory!</h3><p>You can try LightGBM with <a href="https://exploratory.io/">Exploratory</a> v14.5 or later versions.</p><ul><li>Go to Analytics view.</li><li>Select LightGBM.</li><li>Select a Target Variable.</li><li>Select Explanatory Variables (Features)</li><li>Click Run button.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*NTjiAbBsm8qzy0if.png" /></figure><p>You can take a look at <a href="https://exploratory.io/note/exploratory/Introduction-to-LightGBM-eAK7zbZ9">this how-to note</a> for more details on how to use LightGBM.</p><h4>Download Exploratory</h4><p>You can start using LightGBM today in the latest version of Exploratory.</p><p>👉 Download Exploratory v14</p><p><a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an account yet, sign up here to start your 30-day free trial.</p><p><a href="https://exploratory.io/">https://exploratory.io/</a></p><p>If your trial has expired but you’d like to try the new features, simply launch the latest version and use the Extend Trial option.</p><p>If you have questions or feedback, feel free to contact me at <a href="mailto:kan@exploratory.io">kan@exploratory.io .</a></p><p>We’d love to hear how you’re using Exploratory to uncover insights in your data.</p><p>Kan Nishida<br>CEO, Exploratory</p><img 
src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=286836838fe7" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/lightgbm-explained-how-it-differs-from-random-forest-and-xgboost-286836838fe7">LightGBM Explained: How It Differs from Random Forest and XGBoost</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Auto-Positioning Labels: Keep Your Charts Readable Automatically]]></title>
            <link>https://blog.exploratory.io/auto-positioning-labels-keep-your-charts-readable-automatically-3bbd1f1a0ed8?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/3bbd1f1a0ed8</guid>
            <category><![CDATA[exploratory-data-analysis]]></category>
            <category><![CDATA[data-visualization]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Thu, 19 Mar 2026 11:18:40 GMT</pubDate>
            <atom:updated>2026-03-19T11:18:39.961Z</atom:updated>
            <content:encoded><![CDATA[<h4>Introducing auto-positioning for chart values in Exploratory</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*0ZaCJYxsHlguMW4K.png" /></figure><p>Showing values directly on a chart is incredibly powerful.</p><p>Your audience first sees the pattern in the data. Then they see the exact labels and numbers behind the pattern.</p><p>That combination is often what makes a chart insightful.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*N5WjChl4RnFRC8I7.png" /></figure><p>Most charting systems place labels exactly at the coordinates of the data point.</p><p>For small datasets, this works perfectly.</p><p>But when you have many data points that are close to each other, your chart can end up looking something like this.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*HICPwfUgv-_vBEvD.png" /></figure><p>Instead of helping the reader, the labels start fighting each other.</p><p>Typical problems appear immediately:</p><ul><li>Labels overlap each other</li><li>Labels cover the data points</li><li>Values become unreadable</li><li>Analysts start manually adjusting labels</li></ul><p>What was supposed to clarify the chart ends up making it harder to read.</p><p>This problem is known as label overlap or label collision.</p><p>And it turns out to be surprisingly difficult to solve.</p><h3>The Idea: Let Labels Move (Just a Little)</h3><p>In Exploratory v14.3 we introduced Auto-Positioning for Labels.</p><p>The idea sounds simple:</p><blockquote><em>Instead of fixing labels exactly on the data points (or coordinates), allow them to move slightly until they no longer overlap.</em></blockquote><p>But there are a few important constraints.</p><p>The labels must:</p><ul><li>Avoid overlapping each other</li><li>Avoid being overlapped by markers (e.g. 
line, bar, etc.)</li><li>Stay visually close to the original point</li><li>Clearly indicate which data point they represent</li><li>Preserve readability</li></ul><p>When the system finds a better layout, the labels reposition automatically.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*yU6XUaOz67Ageq-5.png" /></figure><p>If a label moves away from its original point, a leader line (arrow) connects the label back to the data point.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*MbZujWD2Zy--FoNo.png" /></figure><p>The result is a chart that remains readable even with many labels.</p><h3>Our First Attempt: Physics Simulation</h3><p>This is a solved problem in R thanks to a package called ‘ggrepel’. But while Exploratory is built on top of R, we use Plotly.js (JavaScript) for chart rendering, so we needed to build an auto-positioning system in the JS layer.</p><p>Our first approach was to use d3-force, a physics simulation library in JS.</p><p>The idea was appealing.</p><p>Labels repel each other like particles with repulsive force.</p><p>Eventually they should settle into positions where nothing overlaps.</p><p>In theory.</p><p>In practice, it didn’t work.</p><p>When many values are densely packed, the labels tend to push each other away and scatter outside the chart area or across the entire screen.</p><p>So we gave up on that approach.</p><h3>The Algorithm That Worked: Simulated Annealing</h3><p>Instead, we adopted an algorithm called Simulated Annealing, which was used in D3-Labeler.</p><p>This is a classic optimization technique inspired by metallurgy.</p><p>When metal cools slowly, its atoms settle into a stable structure.</p><p>Simulated Annealing follows a similar idea:</p><ol><li>Start with the current label layout</li><li>Move labels slightly in random directions</li><li>Evaluate whether the layout improves</li><li>Gradually reduce the randomness over time</li></ol><p>After many small adjustments, the system 
converges toward a layout with minimal overlaps and good readability.</p><p>The key advantage is that labels stay close to their original positions, rather than flying across the chart.</p><h3>Engineering the System</h3><p>To integrate this into Exploratory, we created a new module that wraps the chart rendering process.</p><p>The workflow looks like this:</p><ol><li>Exploratory renders the chart normally</li><li>The wrapper module analyzes the label positions</li><li>The auto-positioning algorithm runs</li><li>Labels are repositioned if collisions are detected</li></ol><p>This adjustment happens automatically after the chart rendering.</p><h3>Keeping It Fast (Even with Hundreds of Labels)</h3><p>Optimization algorithms can become slow when the number of labels increases.</p><p>To keep performance smooth, we introduced spatial grid partitioning.</p><p>Instead of checking collisions between every pair of labels, the system divides the chart into small grid cells.</p><p>Labels only need to check nearby neighbors inside the same grid.</p><p>This dramatically reduces the number of collision checks.</p><p>As a result, the system remains responsive even with 500+ labels.</p><h3>Additional Improvements</h3><p>We added a few additional features to make the system more practical.</p><h3>Leader Lines</h3><p>If a label moves far from its original point, a leader line automatically appears.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*-HHUEcq1u6kqMrq7.png" /></figure><p>This maintains the visual connection between label and data point.</p><h3>Error Bar Awareness</h3><p>Bar charts with error bars introduce another challenge.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*904GVU9JWcGHAHDV.png" /></figure><p>Labels must avoid overlapping the confidence interval lines.</p><p>The algorithm takes error bar ranges into account when calculating positions.</p><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/0*EjnGqJTRXkuIctIc.png" /></figure><h3>How to Enable Auto-Positioning</h3><p>Using the Auto-Positioning feature is simple.</p><ol><li>Open the chart property dialog.</li><li>Click the gear icon at the top of the chart.</li><li>Then go to the Values tab.</li><li>Enable Show Values.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*_LZ8KK1ot46MRK3_.png" /></figure><p>When the Position is set to Automatic, Exploratory automatically adjusts label positions to avoid overlaps.</p><p>If labels move far from their original point, arrows appear to indicate the connection.</p><p>You can control the color and the arrow visibility using the Arrow Display Threshold setting.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*OTNOi2WxtvIggLtI.png" /></figure><p>Increasing the threshold hides shorter arrows and reduces clutter.</p><p>For newly created charts, when you enable showing the values (labels) on the chart, auto-positioning is set by default. 
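For developers who are curious, the Simulated Annealing loop described earlier can be sketched in a few lines of JavaScript. This is a simplified illustration, not Exploratory’s actual implementation: the cost function (pairwise overlap area plus distance from each label’s anchor point) and the cooling schedule are assumptions made for the sketch.

```javascript
// Minimal Simulated Annealing sketch for label placement (illustrative only).
// Each label starts at its data point; we nudge labels randomly and keep
// layouts that reduce total cost, occasionally accepting worse layouts early
// on (high "temperature") to escape local minima.

function overlapArea(a, b) {
  const w = Math.min(a.x + a.w, b.x + b.w) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.h, b.y + b.h) - Math.max(a.y, b.y);
  return Math.max(0, w) * Math.max(0, h);
}

function energy(labels) {
  // Cost = distance from each label's anchor point + heavily penalized overlaps.
  let e = 0;
  for (let i = 0; i < labels.length; i++) {
    const li = labels[i];
    e += Math.hypot(li.x - li.ax, li.y - li.ay); // stay near the data point
    for (let j = i + 1; j < labels.length; j++) {
      e += 10 * overlapArea(li, labels[j]);
    }
  }
  return e;
}

function anneal(labels, tries = 2000) {
  let temp = 1.0;
  const cooling = 0.995;
  let current = energy(labels);
  let best = current;
  let bestLayout = labels.map(l => ({ ...l }));
  for (let t = 0; t < tries; t++) {
    // Move one random label slightly; moves shrink as the system cools.
    const l = labels[Math.floor(Math.random() * labels.length)];
    const ox = l.x, oy = l.y;
    l.x += (Math.random() - 0.5) * 10 * temp;
    l.y += (Math.random() - 0.5) * 10 * temp;
    const next = energy(labels);
    // Accept improvements always; accept worse layouts with shrinking probability.
    if (next < current || Math.random() < Math.exp((current - next) / temp)) {
      current = next;
      if (current < best) {
        best = current;
        bestLayout = labels.map(m => ({ ...m }));
      }
    } else {
      l.x = ox; l.y = oy; // undo the rejected move
    }
    temp *= cooling;
  }
  return { layout: bestLayout, energy: best };
}
```

Tracking the best layout seen so far (rather than only the final state) guarantees the result is never worse than the starting layout, and because the cost function penalizes distance from the anchor, labels tend to stay close to their original positions instead of flying across the chart.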
For existing charts created before v14.3, you need to manually switch the Position to Automatic.</p><h3>Improving Placement Accuracy</h3><p>You can also control the optimization effort.</p><p>The setting ‘# Tries to Improve Accuracy’ determines how many optimization trials are performed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3EDsvFapEzU0ecidjiN-9Q.png" /></figure><ul><li>Increase the value → more accurate placement (slower)</li><li>Decrease the value → faster calculation (less precise)</li></ul><h3>A Small Feature That Makes Data Exploration Powerful</h3><p>In Exploratory v14.3 we introduced this auto-positioning system using Simulated Annealing and spatial optimization.</p><p>At first glance, this might look like a small feature.</p><p>But in practice, it changes something fundamental about how you explore data.</p><p>Exploratory Data Analysis is not just about creating charts.</p><p>It is about discovering things you did not expect to see.</p><p>That kind of discovery often happens when you start looking closely at individual data points.</p><p>For example, imagine a scatterplot showing 190 countries.</p><p>At first, you might look for a particular country you are interested in.</p><p>But once the labels become readable, something else begins to happen.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*PDFz0VpltJYt2OQN.png" /></figure><p>You start to notice patterns you did not plan to look for.</p><ul><li>Which countries are close to each other?</li><li>Which countries behave similarly?</li><li>Which countries stand apart from the rest?</li></ul><p>These moments of discovery are the essence of exploratory data analysis.</p><p>However, if the labels overlap and become unreadable, those discoveries become much harder.</p><p>Analysts end up spending time manually adjusting labels instead of exploring the data itself.</p><p>Our goal with Exploratory has always been to build a tool that helps people think 
better with data.</p><p>Not by automating the thinking process, but by removing the friction that gets in the way of exploration.</p><p>Auto-positioning labels is one of those small features that quietly makes exploration easier.</p><p>And when exploration becomes easier, new insights often follow.</p><p>That is why we believe this feature helps make Exploratory a better environment for true Exploratory Data Analysis.</p><h3>Try Auto-Positioning Today</h3><p>You can start using the Auto-Positioning feature today in the latest version of Exploratory.</p><p>👉 Download Exploratory v14</p><p><a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an account yet, sign up here to start your 30-day free trial.</p><p><a href="https://exploratory.io/">https://exploratory.io/</a></p><p>If your trial has expired but you’d like to try the new features, simply launch the latest version and use the Extend Trial option.</p><p>If you have questions or feedback, feel free to contact me at <a href="mailto:kan@exploratory.io">kan@exploratory.io</a>.</p><p>We’d love to hear how you’re using Exploratory to uncover insights in your data.</p><p>Kan Nishida</p><p>CEO, Exploratory</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=3bbd1f1a0ed8" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/auto-positioning-labels-keep-your-charts-readable-automatically-3bbd1f1a0ed8">Auto-Positioning Labels: Keep Your Charts Readable Automatically</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AI Note Editor: Create Reports 10x Faster, 10x Better]]></title>
            <link>https://blog.exploratory.io/ai-note-editor-create-reports-10x-faster-10x-better-8249816ffe48?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/8249816ffe48</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[data-scien]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Mon, 16 Mar 2026 11:56:24 GMT</pubDate>
            <atom:updated>2026-03-16T11:56:26.211Z</atom:updated>
            <content:encoded><![CDATA[<p>We’re thrilled to introduce <strong>AI Note Editor</strong> in Exploratory v14! 🎉</p><p>This new feature works like having a professional editor — and a data analyst — right beside you as you write. Except it never gets tired, always responds fast, and can analyze your charts with deep statistical knowledge.</p><p>It’s designed to <strong>help you create high-quality analysis reports quickly, clearly, and effortlessly.</strong></p><h3>Why Reporting Is the Most Underrated (and Most Painful) Step in Data Analysis</h3><p>In any real-world data analysis workflow, running the analysis is only half the job. The <em>other</em> half — often the harder half — is explaining what you discovered.</p><p>But let’s be honest: most of us <em>don’t</em> enjoy writing reports. Sometimes, we aren’t sure <strong>how to describe what the charts are showing</strong>. So, we end up sending a Slack message with “here’s the chart” or copying &amp; pasting charts into PowerPoint/Slides and hoping the audience magically understands them.</p><p>As a result:</p><ul><li>Screenshots pile up in Slack threads.</li><li>PowerPoint slides become chart image dumps.</li><li>Insights get lost because they’re never clearly communicated.</li></ul><p>This is the communication problem in the data science workflow that we wanted to solve with the new AI Note Editor.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5kGLUn6-k_cub-SILa-o5Q.png" /></figure><h3>Meet AI Note Editor: Turn Comments &amp; Charts Into a Clear, Polished Report</h3><p>With AI Note Editor, your analysis report writing workflow becomes something like this:</p><ol><li>Add your charts.</li><li>Write a few comments, if you like.</li><li>Get a complete data analysis report generated.</li><li>Edit it to fit your needs.</li><li>Share your report with others!</li></ol><p>Based on your comments and charts, AI Note Editor generates a <strong>polished, structured, ready-to-share 
report</strong> for you, right inside Exploratory.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WUoySat756_1AibQWIaobA.png" /></figure><p>No switching tools.</p><p>No copy &amp; paste to PowerPoint.</p><p>No asking somebody else to create reports.</p><p>No staring at a blinking cursor trying to think of the right words.</p><p>Your data analysis stays where it belongs — inside the Exploratory workflow — and AI handles the writing.</p><h3>Automatic Chart Interpretation</h3><p>AI Note Editor doesn’t just improve your writing; it can also <strong>interpret your charts</strong> and explain what’s happening in the data. This is something that no general-purpose writing tool can do.</p><p>Give it a chart and the AI will:</p><ul><li>Identify key patterns</li><li>Explain strengths and weaknesses</li><li>Describe trends, anomalies, or outliers</li><li>Summarize relationships between variables</li><li>Highlight important signals</li><li>Put everything into intuitive, natural language</li></ul><p>For example, given a radar chart like the one below, AI will describe the key patterns, strengths, weaknesses, and overall story behind the data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZUDUoBlwjresAk7ziyV8-w.png" /></figure><h4>Detect Trends &amp; Signals</h4><p>Chart interpretation goes beyond reading numerical values; it also provides context-aware interpretations, such as trends and signals within the data.</p><p>For example, with XmR charts (control charts), it can detect whether a signal is present and explain what it indicates accordingly.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OhfeHGHZoHT1fSdS3e2-1w.png" /></figure><h4>AI Note Editor Comes with Deep Statistical Knowledge</h4><p>Moreover, for charts containing statistical information, the AI Note Editor provides interpretations based on those statistics.</p><p>For example, for a scatter plot showing the relationship between two 
variables, it can explain how strong (or weak) the correlation is, what the relationship implies, and whether it is statistically significant or not.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Ph3oxT0qeSRo3eKwOHgMBQ.png" /></figure><h4>AI Note Editor Can Analyze Data and Explain What’s There</h4><p>You’ve probably experienced something like this before:</p><ul><li>“I know what I see in the chart… but how do I put it into words?”</li><li>“I think this trend is important, but I’m not confident how to explain it.”</li><li>“I looked at the chart, but I didn’t notice the pattern until someone pointed it out.”</li></ul><p>AI Note Editor can address such concerns by:</p><ul><li>Helping you verbalize what the data shows</li><li>Pointing out things you might have missed</li><li>Ensuring your analysis is complete</li><li>Reducing the risk of overlooking important signals</li><li>Improving the clarity and quality of your reports — instantly</li></ul><p>AI Note Editor is not just about writing faster. 
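To make the scatter-plot case concrete, the statistic behind such an interpretation can be sketched in JavaScript: the Pearson correlation between the two variables, plus the t-statistic used to judge significance. This is an illustrative calculation, not Exploratory’s internal code, and the strength thresholds (0.3, 0.7) are common rules of thumb, not Exploratory’s settings.

```javascript
// Pearson correlation and its t-statistic for a scatter plot (illustrative).

function pearson(xs, ys) {
  const n = xs.length;
  const mean = a => a.reduce((s, v) => s + v, 0) / a.length;
  const mx = mean(xs), my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - mx) * (ys[i] - my);
    sxx += (xs[i] - mx) ** 2;
    syy += (ys[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

function correlationReport(xs, ys) {
  const n = xs.length;
  const r = pearson(xs, ys);
  // t = r * sqrt((n - 2) / (1 - r^2)); compare |t| against the t-distribution
  // with n - 2 degrees of freedom to obtain a p-value.
  const t = r * Math.sqrt((n - 2) / (1 - r * r));
  const strength = Math.abs(r) > 0.7 ? "strong" : Math.abs(r) > 0.3 ? "moderate" : "weak";
  return { r, t, summary: `${strength} ${r >= 0 ? "positive" : "negative"} correlation` };
}
```

For example, `correlationReport([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])` reports r ≈ 0.775 and t ≈ 2.12, a strong positive correlation whose significance would then be read off the t-distribution with n - 2 degrees of freedom.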
It’s about <strong>analyzing data better</strong> and <strong>communicating more clearly</strong>.</p><h3>It Can Help You Write Better</h3><p>AI Note Editor also includes a set of tools to improve your writing:</p><ul><li>Summarize long text</li><li>Fix grammar and spelling</li><li>Improve clarity and tone</li><li>Refine wording and expression</li><li>Translate your writing</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CbF-TnbIT3vk9iNp2a0S_A.png" /></figure><p>Whether you’re writing an internal update, a weekly KPI brief, or a full analysis report, AI Note Editor makes the report writing process faster and better.</p><h3>Create Your Own Report Format with Custom Prompts</h3><p>Exploratory provides an “Analysis Report” style out of the box — but let’s be honest, <strong>not everyone writes reports the same way.</strong></p><p>You have your own reporting needs and might need to write in different formats or styles to fit your clients’ or audience’s needs from project to project.</p><p>That’s why AI Note Editor includes <strong>Custom Prompts</strong> support.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZhxkebrAIf3BXzrml-eh-w.png" /></figure><p>Just use ‘<strong>Run Custom Prompt</strong>’ and describe the format you want in Markdown, specifying a structure with:</p><ul><li>Headings and subheadings</li><li>Bullet points</li><li>Summary sections</li><li>Business-style executive summaries</li><li>Step-by-step analysis</li><li>Narrative storytelling</li><li>Even templates that match your corporate writing guidelines</li></ul><p>Just tell the AI the structure you want — and it will generate the full report in that format.</p><h3>Save &amp; Reuse Prompt Templates</h3><p>Once you create a prompt you like, you can save it as a <strong>template</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mvJid8LtKTTDu43hyH1pMQ.png" /></figure><p>Your templates appear in the Template list, 
ready to use anytime.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QZdBW9EtvGYSA6xAjE_IKg.png" /></figure><p>This is incredibly useful for reports you produce regularly — weekly reports, monthly summaries, recurring analyses, etc. Just pick a template and generate a polished report in seconds — consistent, clean, and always in the right style.</p><h3>A Gallery of Prompt Examples to Get You Started</h3><p>To help you get started, we have prepared a collection of prompt examples in the <a href="https://exploratory.io/tag/?sort=&amp;language=ja&amp;q=tag%3A%22Ai%20Note%20Editor%22&amp;searchType=keyword">AI Note Editor Gallery</a>. You can browse through the prompts, copy &amp; paste them, and tweak them however you like.</p><p>You can create sophisticated report formats immediately — even if you’ve never written a prompt before.</p><h3>Share Templates with Your Team</h3><p>AI Note Editor templates can also be <strong>shared across teams</strong>.</p><p>Your team members can directly import them by clicking the “Import” button in the “AI Prompt Template” dialog mentioned above.</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*MDVv4nYpEzl8fOlolhcvMA.png" /></figure><p>This means:</p><ul><li>Standardized reporting</li><li>Consistent communication style</li><li>Faster onboarding for new members</li><li>Reduced back-and-forth editing</li><li>Higher-quality reports across the organization</li></ul><h3>A New Standard for Data Communication</h3><p>AI Note Editor isn’t just a writing tool — it’s a communication tool with deep statistical knowledge. 
It helps you:</p><ul><li>Turn analysis into narrative</li><li>Turn charts into discoveries and insights</li><li>Turn discoveries into stories</li><li>Turn insights into action</li></ul><p>No more copy &amp; paste dumps in PowerPoint slides.</p><p>No more staring at charts wondering how to describe what you see.</p><p>No more wasting time wondering what to write and how to explain it.</p><p>AI Note Editor gives you clarity.</p><p>Your audience gets a deeper understanding of your discoveries.</p><p>And your analysis becomes dramatically more impactful.</p><p>This is what the future of data communication looks like — and it lives directly inside Exploratory.</p><h3>Try AI Note Editor Today!</h3><p>You can start using AI Note Editor right now in the latest version of Exploratory.</p><p>👉 <strong>Download Exploratory v14</strong><br> <a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an Exploratory account yet, please <a href="https://exploratory.io/">sign up here</a> to try it out. 
The first 30 days are a free trial period!</p><p>If your trial has already expired but you want to try the new AI features, simply launch the latest version and use the “Extend Trial” option in the dialog — or contact us directly.</p><p>For questions or feedback, feel free to reach out:<br> 📧 <strong>support@exploratory.io</strong></p><p>We can’t wait to see the reports you create — and how AI Note Editor helps you communicate insights faster, clearer, and more confidently than ever before.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8249816ffe48" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/ai-note-editor-create-reports-10x-faster-10x-better-8249816ffe48">AI Note Editor: Create Reports 10x Faster, 10x Better</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Turn GitHub Issues into Release Notes with AI]]></title>
            <link>https://blog.exploratory.io/turn-github-issues-into-release-notes-with-ai-8857dc7f1f27?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/8857dc7f1f27</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[ai]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Mon, 16 Mar 2026 11:43:31 GMT</pubDate>
            <atom:updated>2026-03-16T12:01:54.398Z</atom:updated>
            <content:encoded><![CDATA[<h4>How I Use Exploratory’s AI Function to Automatically Categorize, Rewrite, and Generate Release Notes</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zi3ttrkn75iE1EYPw7gVhw.png" /></figure><p>Preparing a release note used to take a few hours every release. With AI Function, it now takes just a few minutes.</p><p>Every time we ship a new version, we go through dozens of GitHub issues — bug fixes, enhancements, and new features — and turn them into a release note that our users can easily understand.</p><p>This used to be a very manual process.</p><p>For each issue we had to:</p><ol><li>Read the issue title and description</li><li>Decide whether it is a bug fix, enhancement, or documentation change</li><li>Assign it to a functional category (Data Wrangling, Chart, Analytics, etc.)</li><li>Rewrite the title so users understand what changed and why it matters</li></ol><p>When there are 50–100 issues per release, this quickly becomes tedious.</p><p>Today, we automate most of this workflow using AI Function and AI Note Editor inside Exploratory.</p><p>In this post, I’ll show you exactly how we do it.</p><p>My hope is that this gives you ideas for how you can use AI Function to automate your own text-based workflows.</p><h3>What is AI Function?</h3><p>AI Function in Exploratory lets you create a function using a prompt so that an AI (LLM) processes each row of your data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*wUFP3AAbAeKx1clP.png" /></figure><p>You can ask AI to:</p><ul><li>analyze text</li><li>classify information</li><li>summarize content</li><li>generate new text</li><li>transform messy data</li></ul><p>all directly inside your data workflow.</p><p>For example, if you have a column containing customer feedback comments, you can simply write a prompt like:</p><blockquote>Classify the text into several groups.</blockquote><p>AI will analyze each row and assign a 
category.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-J-Gk6nmRGGnFFnlNOxBYw.png" /></figure><p>You can learn more about AI Function here.</p><ul><li><a href="https://blog.exploratory.io/data-science-2-0-a-new-era-of-text-data-analysis-b6430baadba7">https://blog.exploratory.io/data-science-2-0-a-new-era-of-text-data-analysis-b6430baadba7</a></li></ul><p>I personally use AI Function almost every day.</p><p>When we built this feature, I realized something interesting:</p><blockquote><em>There was far more text data in my daily work than I had ever noticed before.</em></blockquote><p>Issue logs.</p><p>Customer feedback.</p><p>Meeting notes.</p><p>Support conversations.</p><p>Task descriptions.</p><p>These are all valuable data sources, but without the right tools they’re difficult to analyze or automate.</p><p>AI Function changes that.</p><h3>Example: Automating Our Release Notes</h3><p>Let me show you a real example from our workflow.</p><p>We manage all development work in GitHub Issues.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*13wKRqHKF41cAQl5G30g6g.png" /></figure><p>These issues include:</p><ul><li>bug reports</li><li>feature requests</li><li>internal development tasks</li><li>documentation updates</li></ul><p>When we release a new version, we publish the closed issues for that milestone as the release note.</p><p>But raw GitHub issue titles are not written for users — they’re written for developers.</p><p>So we need to transform them.</p><p>Specifically, we need to:</p><ol><li>Categorize issues by product area</li><li>Identify whether they are bug fixes or enhancements</li><li>Rewrite the title so users clearly understand the change</li></ol><p>Before AI Function, I did this manually.</p><p>Now it’s automated.</p><h4>Step 1: Import GitHub Issues</h4><p>First, we import GitHub issues directly into Exploratory.</p><p>For example, we can import issues for the milestone v14.5.</p><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/1*ErhMO9DYUfwzfFYWpDudcw.png" /></figure><p>Once imported, the dataset contains columns like:</p><ul><li>Issue title</li><li>Issue body</li><li>Labels</li><li>Status</li><li>Milestone</li></ul><p>Take a look at this how-to note for details.</p><ul><li><a href="https://exploratory.io/note/exploratory/How-to-Install-Github-Issue-Data-uGz1DrV2">How to Import Github Issue Data</a></li></ul><p>Now that the data is imported, it’s time to work with AI Function.</p><h4>Step 2: Categorize Issues by Product Area</h4><p>First, we categorize each issue into a functional area.</p><p>For this, I create an AI Function with a prompt like this:</p><pre>Based on the title and the body text, categorize text to one of the following groups.</pre><pre>AI Function<br>AI Prompt<br>Summary View<br>Table View<br>Data Source<br>Data Wrangling<br>Chart<br>Analytics<br>Note<br>Dashboard<br>Parameter<br>Project<br>Publish<br>Install<br>Document<br>Others</pre><p>Instead of letting AI invent categories, I give it a predefined list.</p><p>This improves consistency.</p><p>I also provide two columns as input:</p><ul><li>Title</li><li>Body</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JwFPIYrrganFeodUIsyEbA.png" /></figure><p>The title text and the body text look like this on the actual GitHub issue page.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WNs1bgTFhr-FHIUdj-1r3Q.png" /></figure><p>This gives the model enough context.</p><p>When executed, the AI analyzes the issue text and assigns an appropriate category for each row.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NNnb7mqyUudAsW2nsg9dhA.png" /></figure><h4>Step 3: Identify Issue Type</h4><p>Next, we classify whether the issue is:</p><ul><li>a bug fix</li><li>an enhancement</li><li>documentation</li><li>other</li></ul><p>The prompt is simple:</p><blockquote>Identify if a given sentence indicates whether it is an issue fix, a product enhancement, documents, or others.</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YR9CFqSiiC9dQ6jwVbCLcQ.png" /></figure><p>This correctly categorizes them into the right groups.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BVn8iI2346LaqxaSc0yinA.png" /></figure><h4>Step 4: Rewrite the Issue Title</h4><p>The final step is improving the issue titles.</p><p>GitHub issue titles are often short and technical.</p><p>For example:</p><blockquote>AI Function: Don’t run the cached step when duplicating the data frame</blockquote><p>That’s clear to developers, but not to most users.</p><p>We already built an internal system to improve and clean up the issue titles and generate the Release Note, but we realized that we can simply use AI Function to write a better title for each issue based on the title and the body text.</p><p>So I use AI Function again to generate a better description.</p><p>Prompt:</p><blockquote>Based on the ‘title’ and ‘body’ text, write a one or two sentence summary that clearly explains what the issue is and why it matters to users.</blockquote><p>In the AI Function dialog, I set the ‘title’ and ‘body’ columns as the Target Columns.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3LRjlRyM0rCdXHX58vByIw.png" /></figure><p>The result:</p><pre>Original: </pre><pre>AI Function: Don&#39;t run the cached step when duplicating the data frame</pre><pre>New:</pre><pre>Duplicating a data frame currently triggers re-execution of AI Function steps, which can cause unnecessary processing time and API costs.</pre><p>Much clearer.</p><p>And this happens for every issue automatically.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3xRb6EdX4a5W7dE_QB_1Cw.png" /></figure><p>That’s all!</p><p>Now that I have categorized 
the issues and improved the text, I can show them in a table format.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ojP3Gi0z3KsrW-NrOBRwrA.png" /></figure><h4>From Raw Issues to Clean Release Notes</h4><p>After these steps, we now have structured information:</p><ul><li>Issue category</li><li>Issue type</li><li>Improved description</li></ul><p>This makes it easy to organize them into a release note.</p><h3>Next Step: Generate Release Note with AI Note Editor</h3><p>Once the data is prepared, we take it one step further.</p><p>We use AI Note Editor inside Exploratory to generate the release note itself.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*o5InimZ9fxVKdpN5IFcAmA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JUpW2nTC-369t7O61s7F7w.png" /></figure><p>Check out this introductory blog post for AI Note Editor.</p><ul><li><a href="https://blog.exploratory.io/ai-note-editor-create-reports-10x-faster-10x-better-8249816ffe48">AI Note Editor: Create Reports 10x Faster, 10x Better</a></li></ul><p>I’ll write another blog post explaining exactly how we do this, so stay tuned!</p><h3>The Real Power: Reproducibility</h3><p>The biggest benefit of this workflow is <strong>reproducibility</strong>.</p><p>Once the AI Function steps are created, we can reuse them.</p><p>What if new issues were added at the last minute?</p><p>You can:</p><ol><li>Re-import issues for the new milestone</li><li>Run the workflow</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*o9PUnOE24MDLnHRmmX1A9Q.png" /></figure><p>What if we need to generate a release note for another version in the future?</p><ol><li>Click on the Data Source step and open the Import dialog</li><li>Update the milestone</li><li>Run the same workflow</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qZbM0sBB5TEgEENMqiM8LQ.png" /></figure><p>We release new versions often, so being 
able to repeat the same workflow automatically is important for us.</p><p>For any future release, I can simply click a button to re-import the data, and the AI Functions automatically process the new issues and produce clean, user-friendly descriptions.</p><p>No manual rewriting required.</p><p>This is the real power of combining:</p><ul><li>data workflows</li><li>AI functions</li><li>reproducibility</li></ul><p>Together, these form an automated data wrangling system that is reproducible and can be applied to any future incoming data.</p><h3>Why Not Just Copy and Paste the Issues into ChatGPT?</h3><p>At this point, you might be thinking:</p><blockquote><em>“Couldn’t I just copy the list of issues into ChatGPT and ask it to do the same thing?”</em></blockquote><p>Yes, you certainly could.</p><p>But once you try to use that approach in a real workflow, several practical limitations quickly appear.</p><p>What makes AI Function inside Exploratory powerful is not just that it uses AI. It’s that AI becomes part of a data workflow.</p><p>Here are some key differences.</p><h4>1. Iterative Workflow Development</h4><p>When working with AI Function, you can build your workflow step by step.</p><p>For example:</p><ol><li>Start with issue categorization</li><li>Review the results</li><li>Improve the prompt</li><li>Run it again</li><li>Add another AI Function step for issue type classification</li><li>Add another step to rewrite titles</li></ol><p>Each step produces a visible column of results, which makes it easy to:</p><ul><li>review</li><li>refine prompts</li><li>improve output quality</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nw4I4WtoYAGqDTsnTfLIxQ.png" /></figure><p>This iterative workflow is much harder to do when you are simply pasting text into an AI chat interface.</p><h4>2. 
Control Over the Output</h4><p>When AI is part of your data workflow, you have much more control over the result.</p><p>For example, you can:</p><ul><li>specify allowed categories</li><li>combine multiple columns as context</li><li>inspect the output row by row</li><li>filter and fix problematic cases</li></ul><p>This is important if you care about quality and consistency, not just quick answers.</p><h4>3. Reproducibility</h4><p>This is perhaps the most important difference.</p><p>Once you build the workflow, it becomes reproducible.</p><p>Every time you import a new set of issues, you can run the exact same steps and get consistent results.</p><p>With copy-paste AI workflows, you would have to:</p><ul><li>prepare the text</li><li>paste it into the AI</li><li>rewrite prompts</li><li>manually format the results</li></ul><p>every single time.</p><p>AI Function turns that process into something you can run repeatedly with one click.</p><h4>4. AI Works Directly on Your Data</h4><p>Instead of copying and pasting text back and forth, AI Function works directly on the data frame in Exploratory.</p><p>This means you can combine AI with other data operations, such as:</p><ul><li>filtering rows</li><li>joining datasets</li><li>grouping and summarizing results</li><li>visualizing patterns</li></ul><p>AI becomes just another data transformation step inside the workflow.</p><h4>5. Scales to Large Datasets</h4><p>Chat interfaces are great for small tasks, but they quickly become cumbersome when working with hundreds or thousands of rows of data.</p><p>AI Function processes your data row by row, allowing you to apply the same logic consistently across the entire dataset.</p><h4>6. 
Part of a Larger Data System</h4><p>Finally, AI Function integrates with the rest of Exploratory’s capabilities:</p><ul><li>data wrangling</li><li>visualization</li><li>machine learning</li><li>dashboards</li><li>notes</li></ul><p>In our case, the output of AI Function becomes the input for AI Note Editor, which then generates the release notes themselves.</p><p>This creates a complete pipeline:</p><p>GitHub Issues → AI Function → Enriched Data → AI Note Editor → Release Note</p><h4>The Real Difference</h4><p>Using AI in a chat interface is great for one-off tasks.</p><p>But AI Function allows you to turn those tasks into reusable data workflows.</p><p>And once that happens, something interesting occurs:</p><blockquote><em>AI stops being a tool you occasionally use, and becomes part of your everyday data process.</em></blockquote><p>That’s the real power of AI Function inside Exploratory.</p><h3>Key Takeaways: Why AI Function Is a Game Changer</h3><p>This example is not really about release notes.</p><p>It’s about something much bigger.</p><p>Many everyday workflows involve unstructured text data:</p><ul><li>GitHub issues</li><li>customer feedback</li><li>support tickets</li><li>meeting notes</li><li>task descriptions</li><li>survey comments</li></ul><p>Traditionally, these workflows required manual reading and interpretation, which made them difficult to automate.</p><p>That’s why many teams simply accept them as manual work.</p><p>But AI Function changes this.</p><p>Instead of writing complicated scripts or building custom AI pipelines, you can simply describe the task in plain language and apply it directly to your data.</p><p>In the release note example above, AI Function helped automate tasks that previously required manual effort:</p><ul><li>Categorizing issues by product area</li><li>Identifying whether they are bug fixes or enhancements</li><li>Rewriting issue titles into user-friendly descriptions</li></ul><p>Once the workflow is built, it becomes reusable 
and reproducible.</p><p>Every new release can go through the exact same process automatically.</p><p>What used to be a repetitive manual task becomes a data workflow you run with one click.</p><p>And this idea applies far beyond release notes.</p><p>Anywhere you have rows of text data, AI Function can help you:</p><ul><li>classify</li><li>summarize</li><li>transform</li><li>generate structured information</li></ul><p>directly inside your data workflow.</p><p>This is why we built AI Function inside Exploratory.</p><p>Not to replace human thinking, but to remove the repetitive parts of working with text, so you can focus on the insights and decisions that matter.</p><p>AI Function turns messy text data into something you can actually work with, and once that happens, many workflows that used to be manual suddenly become automated.</p><h3>Try AI Function with Your Own Data</h3><p>If you work with text data such as:</p><ul><li>issue logs</li><li>support tickets</li><li>customer feedback</li><li>meeting notes</li><li>task lists</li></ul><p>You can build similar workflows with AI Function.</p><p>If you’re new to AI Function, I recommend starting with the examples available in the Create AI Function menu.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TdwSFHL_kqKANImJSpVmBQ.png" /></figure><h3>Try AI Functions Today!</h3><p>You can start using AI Functions today in the latest version of Exploratory.</p><p>👉 Download Exploratory v14</p><p><a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an Exploratory account yet, please <a href="https://exploratory.io/">sign up here</a> to try it out. 
The first 30 days are a free trial period!</p><p>If your trial has already expired but you want to try the new AI features, simply launch the latest version and use the “Extend Trial” option in the dialog, or contact us (support@exploratory.io) directly.</p><p>If you have any questions or feedback, please contact me at <a href="mailto:kan@exploratory.io">kan@exploratory.io</a></p><p>We’d love to hear what data wrangling system you build with AI Functions!</p><p>Kan</p><p>CEO/Exploratory</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8857dc7f1f27" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/turn-github-issues-into-release-notes-with-ai-8857dc7f1f27">Turn GitHub Issues into Release Notes with AI</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Data Science 2.0: A New Era of Text Data Analysis]]></title>
            <link>https://blog.exploratory.io/data-science-2-0-a-new-era-of-text-data-analysis-b6430baadba7?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/b6430baadba7</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[ai]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Tue, 02 Dec 2025 01:11:55 GMT</pubDate>
            <atom:updated>2025-12-02T01:11:53.710Z</atom:updated>
            <content:encoded><![CDATA[<h4>How Generative AI and Prompts Are Transforming Data Science</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*19xGZatnAwdZH1Lkhmm0zg.png" /></figure><p>Since ChatGPT arrived in late 2022, generative AI has reshaped nearly every industry — and <strong>data science was no exception</strong>.</p><p>Suddenly, we could ask a machine to summarize an article, classify customer feedback, translate text, or even reason about documents it had never seen before.</p><p>It was clear: we were entering a different world.</p><p>But to understand why this shift is so significant, let’s step back and look at how data science has evolved over the years.</p><h3>Before Generative AI: What Is Data Science, Anyway?</h3><p>Let’s first clarify what data science is. While there are various opinions depending on who you ask, it can be simply defined as the intersection of the following three fields:</p><ul><li>Statistics</li><li>Machine Learning</li><li>Programming</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eY0CdVsOqyQ1jJQW1jeIzQ.png" /></figure><p>Before “data science” became a buzzword, the people doing this type of work were simply called <strong>statisticians</strong>. I still remember meeting a statistician friend at a conference years ago — only to find his business card now said “Data Scientist.”</p><p>Why?</p><p>Because the discipline had started to shift.</p><h4>The Rise of R, Python, and Open-Source Machine Learning</h4><p>The difference between <strong>data science</strong> and <strong>traditional statistics</strong> was a frequently debated topic at the time, but broadly speaking, it came down to <strong>programming</strong> and <strong>machine learning</strong>.</p><p>With the rapid evolution of R and Python in the open-source community, programming languages became a practical way to do data analysis and processing. 
And anyone who could program gained free access to advanced algorithms and models that were also developed in the open-source community.</p><p>Furthermore, the explosion of data generated by the internet and mobile devices fueled the continuous development of highly complex machine learning and deep learning models that, trained on this big data, delivered better prediction results.</p><p>This was a pivotal moment. Why?</p><p>Because previously, using such algorithms and models required purchasing expensive commercial software (e.g. SAS, SPSS, etc.) or building them from scratch. It was the “democratization of algorithms and models,” if you will.</p><p>And it changed everything.</p><p>The conventional wisdom that data analysis meant using statistical software was dramatically altered by the birth of these <strong>open-source programming languages and machine learning models</strong>. And the intersection of these three areas began to be called data science.</p><p>This was the birth of Data Science.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eY0CdVsOqyQ1jJQW1jeIzQ.png" /></figure><h3>Machine Learning -&gt; Generative AI</h3><p>But things have changed dramatically just over the last few years.</p><p>When ChatGPT (powered by GPT-3.5) was released in 2022, it quickly became clear that it could not only converse in a chat format but also <strong>interpret, summarize, classify, and answer questions about even the latest documents not included in its training data.</strong></p><p>Previously, if you wanted to analyze the sentiment of text data, for example, you would have to collect a large amount of text data, build a machine learning or deep learning model based on that data, tune it to improve accuracy, and then predict sentiment.</p><p>But only a select few with data science knowledge, access to vast amounts of data, and programming skills could do something like this.</p><p>Then, with the emergence of GPT (Generative Pre-trained Transformer) models 
used in generative AI, this necessity disappeared.</p><p>Large Language Models (LLMs) like GPT are massive models already trained on virtually all digitized data worldwide, possessing the necessary capabilities to interpret any text (or image or video) information available. Therefore, even when given text data it has supposedly never seen before, the model can summarize, classify, and score its sentiment.</p><p>While not universally true, the era of building custom models by feeding them vast amounts of text (or image) data has largely ended.</p><p>For relatively smaller datasets typically encountered in business, or for numerical data, custom statistical or machine learning models still tend to offer better predictive accuracy than generative AI models. This is because GPT’s architecture is inherently specialized in recognizing patterns in high-dimensional data specific to text and pixels.</p><p>But, when it comes to text data?</p><p><strong>Generative AI is the new default.</strong></p><h3>Programming -&gt; Prompts</h3><p>Another significant impact of generative AI on data science is related to accessing these models and “<strong>tuning</strong>” them to improve prediction accuracy.</p><p>Previously, this primarily involved writing code in languages like Python to build custom machine learning and deep learning models, and then tuning them by optimizing their parameters to improve predictive accuracy.</p><p>However, with these new AI models, we don’t write code or adjust parameters to improve the accuracy of the results or to get better output.</p><p>Instead, to improve prediction accuracy and obtain better output, we use prompts. And the language we use for the prompt is our everyday language, such as English.</p><p>This is a profound shift.</p><p>You no longer need programming skills to work with state-of-the-art models. 
Anyone can “tune” a world-class AI just by expressing what they want — in English, Japanese, or any language.</p><p>With this shift, machine learning in data science has been replaced by generative AI (GPT), and programming has been replaced by prompts. Combining these with statistics can be considered the new data science.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*19xGZatnAwdZH1Lkhmm0zg.png" /></figure><p>Of course, statistics has also been significantly impacted by the emergence of generative AI. However, statistics itself has not been transformed into something else or eliminated by AI. Rather, the execution and interpretation of statistics have become easier with AI support; at least, that’s where we are now.</p><h3>Challenges with AI Models for Data Analysis</h3><p>But even with these advances, many people quickly discovered that:</p><p><strong>“Using ChatGPT to analyze your own data doesn’t always work.”</strong></p><p>Common frustrations include:</p><ul><li>Inconsistent results halfway through</li><li>Output becomes sloppy or incomplete with many rows</li><li>Hard to verify correctness</li><li>Difficult to apply to large datasets</li><li>Some data requires pre-processing before being fed into AI, and AI doesn’t ‘magically’ take care of that.</li></ul><p>For example, imagine having hundreds or thousands of lines of free-text responses from a survey. If you feed this data to an AI to score the sentiment of each sentence or classify them, some parts might work well, but often, the output becomes inconsistent midway through.</p><p>There’s no guarantee that the AI has actually “reviewed” all the data you provided in the way a human would expect before generating responses for everything. 
Consequently, as the data volume grows, the answers can become unstable, the data may need pre-processing before being fed to the AI, and it can be difficult to judge whether the results returned for hundreds or thousands of rows are correct.</p><p>Furthermore, knowing what kind of prompt to write is a significant challenge for many people.</p><p>In other words:</p><p><strong>LLMs are powerful, but not reliable as a stand-alone data processing engine.</strong></p><p>You need a system that:</p><ul><li>Feeds data row by row</li><li>Ensures no rows are skipped</li><li>Produces consistent results</li><li>Lets you transform and clean data as preprocessing</li><li>Gives you a tool to explore and validate the result</li></ul><p>This is exactly where Exploratory’s ‘AI Function’ comes in.</p><h3>Exploratory v14 Introduces “AI Functions”</h3><p>To overcome these challenges and bring generative AI into real analytics workflows, we built <strong>AI Functions</strong> in Exploratory v14.</p><p>AI Functions let you apply AI — <strong>reliably, row by row</strong> — to your own dataset.</p><p>Just ask AI to analyze, transform, or generate information from your data — <strong>directly inside Exploratory</strong> — simply by typing instructions in plain language.</p><p>For example, if you want to classify customer feedback, just type:</p><p><strong>‘Classify the text into several groups.’</strong></p><p>AI will immediately analyze every comment and assign a category to each one.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fGGiB4uJi3_BGPaOGO3uyg.png" /></figure><p>And this is only the beginning.</p><p>With AI Function, you can create all kinds of custom model functions like:</p><p>“Score the sentiment of the text.”</p><p>“Translate and summarize the text.”</p><p>“Standardize company names.”</p><p>“Using these user attributes, write an email to schedule a meeting with each customer.”</p><p>“Get population for each country presented in 
this data.”</p><p>and much more with AI directly inside Exploratory.</p><p>Of course, to create this function, you don’t need to know which functions to use or how to use them, nor how to code in a programming language. You simply instruct the desired process in a prompt using everyday language. When executed, the AI returns results for each row according to the prompt’s instructions.</p><p>No coding.</p><p>No custom model building.</p><p>No machine learning pipeline.</p><p>All you need is to describe what you want in natural language.</p><p><strong>Exploratory handles the data, AI handles the logic, and you get consistent results for every row of your data.</strong></p><p>This is what <strong>Data Science 2.0</strong> promises.</p><h3>How to Use AI Functions</h3><p>Using AI Function inside Exploratory is straightforward.</p><p>First, select “Create AI Function” from the column header menu.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cdtPBAgyjlD48NrZddcJmA.png" /></figure><p>Then, enter the instruction describing what you want. For example:</p><blockquote><em>Classify the sentences into several groups.</em></blockquote><p>If you already know which categories you want, add them to your prompt for higher accuracy.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wZaaz2iVYQRCqsKXZ8qEuA.png" /></figure><p>Here is an example of classifying text into 9 groups.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VZjjYHQ2Skj4zc6_Q55_1g.png" /></figure><h4>About Data Splitting &amp; Parallel Processing</h4><p>By default, Exploratory splits large datasets (200+ rows) into smaller chunks and processes them in parallel. 
This dramatically improves performance.</p><p>Splitting data:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KTIN8zUIhqk7sd5HIF3bJg.png" /></figure><p>Combining results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*j7UL3wLtobBvYMyoPTeUSg.png" /></figure><p>However, splitting means that each AI request only sees part of the data. This can sometimes lead to slightly different interpretations from chunk to chunk.</p><p>If you prefer the AI to analyze the <strong>entire dataset at once</strong>, you can turn off:</p><p>If you are not happy with the result and rather want the AI to analyze the <strong>entire dataset at once</strong>, you can turn off ‘<strong>Enable parallel processing by splitting data</strong>’ option.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hAHWFxubEnE9_qWtXcw0xQ.png" /></figure><p>This will send the whole data to the AI model at once. The result will be globally consistent, though it will take longer time to process.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9tPKMIehPYFmkRxh17Cl8A.png" /></figure><h3>AI Prompt Templates</h3><p>You can save your prompt instruction as ‘Template’ so that you can use it later.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jtO_RZYrgK5dMXs2N52PkQ.png" /></figure><p>Once you saved it you can select it under ‘Use Template’ for any data you want to run the prompt with.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*T8e4VXuZWM5zP5YMHc0VnQ.png" /></figure><h4>Any Prompt Examples?</h4><p>Yes! In order to get you started quickly, we have prepared many examples. 
Visit <a href="https://exploratory.io/tag/?sort=&amp;language=ja&amp;q=tag%253A%2522Ai%2520Model%2520Function%2522&amp;searchType=keyword">the AI Function Gallery page</a> to see prompt examples and downloadable sample data you can test yourself.</p><h3>Why Exploratory’s AI Functions?</h3><p>When you use AI Functions inside Exploratory, you don’t just “call an AI model”. You combine AI with a complete data analysis environment.</p><p>This creates a workflow that is far more <strong>reliable, flexible, and powerful</strong> than sending raw data to ChatGPT, etc.</p><p>Here are the key advantages:</p><ol><li><strong>Flexible Pre-processing Before AI — </strong>Clean, filter, reshape, or join data before sending it to AI — ensuring high-quality, consistent outputs.<br>(Here’s a detailed post on <a href="https://blog.exploratory.io/a-new-paradigm-ai-prompt-based-data-wrangling-is-here-e4b43e63b08e">Exploratory’s AI Data Wrangling</a>.)</li><li><strong>Visual Verification You Can Trust — </strong>Use summary views, charts, and comparisons to check results quickly and intuitively. No guessing whether the AI processed everything.</li><li><strong>Seamless Connection to Deeper Analysis</strong> — Once your AI Function outputs are ready, you can immediately continue with Analytics such as Statistical Tests, Multivariate Analysis, Factor Analysis, etc., and visualize the data with various types of charts. Everything stays connected and reproducible.</li><li><strong>Fast Performance with Parallel Processing </strong>— Exploratory automatically splits large datasets and processes them in parallel — so AI stays fast even for thousands of rows.</li><li><strong>Stable, Row-by-Row Accuracy </strong>— Results stay consistent and reliable across your entire dataset. No missing rows. 
No unnoticed failures.</li><li><strong>Reusable Prompts as Templates</strong> — Turn any instruction into a reusable “function template” and apply it again and again — just like a real analytical function, but powered by AI.</li></ol><h3>Data Science 2.0 is here!</h3><p>With AI function, you can transform, enrich, and analyze text data simply by <strong>describing what you want</strong>:</p><ul><li>Sentiment scoring</li><li>Cleaning and standardizing company names</li><li>Detecting a customer’s country from a phone number</li><li>Translating sentences</li><li>Extracting company attributes (industry, size, region) from email domains</li><li>Auto-generating email drafts based on customer attributes</li><li>Classifying feedback into categories</li><li>Summarizing long text fields</li></ul><p>And much more.</p><p>In <strong>Data Science 2.0</strong> — where the triangle is <strong>Statistics × AI × Prompt</strong> — the ability to describe your intent in natural language and let Exploratory execute it reliably is a breakthrough.</p><p>But, generative AI alone isn’t enough. You need a platform that <strong>prepares the data, executes the prompts safely and intelligently, and integrates the results into analytics and visualization.</strong></p><p>This is why we built AI Function with Exploratory v14. And we believe this will make data science more accessible to a wider audience, leading to business improvements and better decision-making.</p><h3>Try AI Functions Today!</h3><p>You can start using AI Functions today in the latest version of Exploratory.</p><p>👉 <strong>Download Exploratory v14</strong><br> <a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an Exploratory account yet, please <a href="https://exploratory.io/">sign up here</a> to try it out. 
The first 30 days are a free trial period!</p><p>If your trial has already expired but you want to try the new AI features, simply launch the latest version and use the “Extend Trial” option in the dialog — or contact us directly.</p><p>If you have any questions or feedback, please contact us at kan@exploratory.io</p><p>We’d love to hear what you build with AI Functions — and how Data Science 2.0 transforms your workflow! 🚀</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b6430baadba7" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/data-science-2-0-a-new-era-of-text-data-analysis-b6430baadba7">Data Science 2.0: A New Era of Text Data Analysis</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Exploratory v14 — A New Era of AI-Powered Data Analysis]]></title>
            <link>https://blog.exploratory.io/exploratory-v14-a-new-era-of-ai-powered-data-analysis-04c974023395?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/04c974023395</guid>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Mon, 24 Nov 2025 14:35:00 GMT</pubDate>
            <atom:updated>2025-11-24T16:25:41.093Z</atom:updated>
            <content:encoded><![CDATA[<h3>Exploratory v14 — A New Era of AI-Powered Data Analysis</h3><h4>A new paradigm for text analysis, auto-chart interpretation, and automated reporting.</h4><p>Text analysis that used to take hours now takes minutes.</p><p>Reports that once felt painful now write themselves.</p><p>Charts that you’ve stared at for years now explain their own insights.</p><p>I’m super excited to introduce <strong>Exploratory v14</strong>, our most transformative release ever! 🎉</p><p>This version brings two breakthrough features — <strong>AI Model Functions</strong> and <strong>AI Note Editor</strong> — that make data analysis dramatically easier, faster, and smarter. Whether you’re analyzing &amp; cleaning data, interpreting charts, or writing reports, Exploratory v14 helps you get higher-quality results in a fraction of the time.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*iHvIjo81j7X_tg3zn5UTjg.png" /></figure><h3>1. AI Functions: Create AI Model Functions with Your Own Words</h3><p>You can now ask AI to analyze, transform, or generate information from your data — <strong>directly inside Exploratory</strong> — simply by typing instructions in plain language. We call this new capability <strong>AI Function</strong>.</p><p>For example, if you want to classify customer feedback, just type:</p><p><strong>‘Classify the text into several groups.</strong>’</p><p>AI will immediately analyze every comment and assign a category to each one.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5eKi3_9Zc3nVJLRWwaiJNg.png" /></figure><p>And this is only the beginning. 
With AI Function, you can create all kinds of custom model functions like:</p><p>“Score the sentiment of the text.”</p><p>“Translate and summarize the text.”</p><p>“Standardize company names.”</p><p>“Using these user attributes, write an email to schedule a meeting with each customer.”</p><p>“Get population for each country presented in this data.”</p><p>Just describe what you want, and AI takes care of the rest — no model building, no coding, no training data required.</p><p>If you wanted to do something like this before, you would have had to create your own algorithms or train machine learning models using large text data you collected.</p><p>Not anymore!</p><p>Thanks to generative AI (LLM), anyone can now perform tasks like <strong>text</strong> <strong>classification, sentiment scoring, prediction, translation, and data matching</strong> just by writing prompts in plain English.</p><h3>How to use AI Function?</h3><p>Using AI Function inside Exploratory is straightforward.</p><p>First, select “Create AI Function” from the column header menu.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cdtPBAgyjlD48NrZddcJmA.png" /></figure><p>Then, enter the instruction describing what you want. For example:</p><blockquote><em>Classify the sentences into several groups.</em></blockquote><p>If you already know which categories you want, add them to your prompt for higher accuracy.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wZaaz2iVYQRCqsKXZ8qEuA.png" /></figure><p>Here is an example of classifying text into 9 groups.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VZjjYHQ2Skj4zc6_Q55_1g.png" /></figure><h4>About Data Splitting &amp; Parallel Processing</h4><p>By default, Exploratory splits large datasets (200+ rows) into smaller chunks and processes them in parallel. 
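</p><p>To picture the split-and-combine pattern described above, here is a short illustrative sketch in Python. It is not Exploratory’s actual code; the <code>process_chunk</code> function is a hypothetical stand-in for one AI request covering a chunk of rows:</p>

```python
# Illustrative sketch of split-and-combine parallel processing.
# NOTE: process_chunk() is a hypothetical stand-in for one AI request;
# here it just uppercases each row so the example is runnable.
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # In a real workflow, this would send one chunk of rows to an AI model.
    return [text.upper() for text in chunk]

def split_apply_combine(rows, chunk_size=200):
    # 1. Split the rows into fixed-size chunks.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    # 2. Process the chunks concurrently; map() preserves chunk order.
    with ThreadPoolExecutor() as pool:
        processed = pool.map(process_chunk, chunks)
    # 3. Combine the per-chunk results back into one flat list,
    #    aligned with the original row order.
    return [row for chunk in processed for row in chunk]

out = split_apply_combine(["good", "bad", "okay"], chunk_size=2)
```

<p>Order is preserved when the results are combined, so the output still lines up row for row with the input.</p><p>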
This dramatically improves performance.</p><p>Splitting data:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KTIN8zUIhqk7sd5HIF3bJg.png" /></figure><p>Combining results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*j7UL3wLtobBvYMyoPTeUSg.png" /></figure><p>However, splitting means each AI request only sees part of the data. This can sometimes lead to slightly different interpretations from chunk to chunk.</p><p>If you are not happy with the result and would rather have the AI analyze the <strong>entire dataset at once</strong>, you can turn off the ‘<strong>Enable parallel processing by splitting data</strong>’ option.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hAHWFxubEnE9_qWtXcw0xQ.png" /></figure><p>This will send the whole dataset to the AI model at once. The result will be globally consistent, though it will take longer to process.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9tPKMIehPYFmkRxh17Cl8A.png" /></figure><p>I’ve written more about these trade-offs in a separate post, so feel free to check that out if you want the full details.</p><h3>Want Prompt Examples?</h3><p>Yes! To get you started quickly, we have prepared many examples. Visit <a href="https://exploratory.io/tag/?sort=&amp;language=en&amp;q=tag%3A%22Ai%20Function%22&amp;searchType=keyword">the AI Function Gallery page</a> to see prompt examples and downloadable sample data you can test yourself.</p><p>Anyway, with AI Function, you can do many other things simply by <strong>describing what you want to get from your data in the prompt</strong>, such as:</p><ul><li>Sentiment scoring</li><li>Company name cleanup &amp; standardization</li><li>Detecting a country from a phone number</li><li>Translating text</li><li>Adding company information (industry, size, etc.) 
from email addresses</li><li>Automatically generating email drafts based on user attributes</li></ul><h3>2. AI Note Editor: Create Reports 10x Faster, 10x Better</h3><p>In a typical data analysis workflow, <strong>writing the report</strong> is just as important as running the analysis itself.</p><p>But let’s be honest: most of us <em>don’t</em> enjoy writing.</p><p>So what happens? Screenshots get dropped into Slack… or pasted into PowerPoint… or left forgotten in a folder somewhere.</p><p>To make report writing easier, faster, and more effective, we’re introducing <strong>AI Note Editor</strong> in Exploratory v14. With AI Note Editor, you can turn your charts, comments, and analysis steps into <strong>high-quality reports — generated automatically by AI right inside Exploratory.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5kGLUn6-k_cub-SILa-o5Q.png" /></figure><p>Just add your charts and comments, and AI Note Editor will write a polished, structured report based on the context.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WUoySat756_1AibQWIaobA.png" /></figure><h4>Automatic Chart Interpretation</h4><p>AI Note Editor doesn’t just summarize your text; it can also <strong>interpret your charts</strong> and explain what’s happening in the data.</p><p>For example, given a radar chart like the one below, AI will describe the key patterns, strengths, weaknesses, and overall story behind the data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZUDUoBlwjresAk7ziyV8-w.png" /></figure><p>Even better, chart interpretation works across many chart types:</p><ul><li><strong>Scatter Plots:</strong> evaluates correlation strength &amp; significance</li><li><strong>XmR / Control Charts:</strong> identifies statistical signals</li><li><strong>Time 
Series:</strong> detects trends, change points, and comparison against benchmarks</li></ul><p>You can see detailed examples in <a href="https://exploratory.io/note/kanaugust/fVe5fal7qG">this separate article</a>.</p><p>This “chart interpretation” feature not only saves time for writing reports, but also improves the quality of analysis by helping you catch insights you might have missed or misinterpreted.</p><h4>AI Tools for Better Writing</h4><p>AI Note Editor also includes a set of tools to elevate your writing:</p><ul><li>Summarize long text</li><li>Fix grammar and spelling</li><li>Improve clarity and tone</li><li>Refine wording and expression</li><li>Translate your writing</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CbF-TnbIT3vk9iNp2a0S_A.png" /></figure><p>Whether you’re writing an internal update, a weekly KPI brief, or a full analysis report, AI Note Editor makes the process dramatically easier.</p><h3>Custom Prompts</h3><p>Need a report in a specific structure — like bullet points, executive summaries, or headings?</p><p>Just use ‘<strong>Run Custom Prompt</strong>’ and describe the format you want using prompts (Markdown supported).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZhxkebrAIf3BXzrml-eh-w.png" /></figure><p>You can also save your custom prompts as <strong>templates</strong> for future reuse.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mvJid8LtKTTDu43hyH1pMQ.png" /></figure><p>Once saved, they appear in the <strong>Templates</strong> list:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QZdBW9EtvGYSA6xAjE_IKg.png" /></figure><p>This is incredibly useful for reports you produce regularly — weekly reports, monthly summaries, recurring analyses, etc. 
Just pick a template and generate the report instantly, while maintaining consistent quality every time.</p><h3>Prompt Sample Collection</h3><p>To help you get started, we have prepared a collection of prompt examples in the <a href="https://exploratory.io/tag/?sort=&amp;language=en&amp;q=tag%3A%22Ai%20Note%20Editor%22&amp;searchType=keyword">AI Note Editor Gallery</a>. You can browse through the prompts and copy &amp; paste to start!</p><h3>Other New Features</h3><p>Along with the two major features, Exploratory v14 also includes several powerful enhancements that make your analysis workflow even smoother.</p><p>Here are a few highlights.</p><h4>New UI &amp; Enhancements for Reference Lines</h4><p>Reference Lines just got a major upgrade in v14. We redesigned the UI/UX around them so you can add and manage reference lines more easily — right from the top toolbar using the new <strong>Reference Line</strong> button.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rBdTvF0io9HSaNmmWWMjtw.png" /></figure><p>With this update, the following tasks are now much simpler!</p><p><strong>1. Centralized Management for All X and Y Axes</strong></p><p>When you click the <strong>Lines</strong> button, all reference lines are now displayed in a single dialog. You can edit, reorder, or delete reference lines for both axes in one place.</p><p><strong>2. Displaying Multiple Reference Lines</strong></p><p>You can now add multiple reference lines to the <strong>Y-axis</strong> as well, making it easier to highlight targets, thresholds, or comparison lines directly on your chart.</p><p><strong>3. 
New Types of Reference Lines</strong></p><p>Several calculations previously only available through <em>Window Calculations</em> can now be displayed as reference lines.</p><p>For example:</p><p>Add last year’s <strong>same-month value</strong> as a reference line to compare against current sales.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Wd6hWHc2s3rirBD9.png" /></figure><p>Or create Pareto-style visuals by adding <strong>cumulative totals</strong> or <strong>cumulative percentage</strong> lines alongside your bar charts:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*y67He9_oL0bcuVvI.png" /></figure><p>These enhancements make it easier to bring meaningful context into your charts without additional steps or calculations.</p><h4>Pivot Table: Window Calculations Now Supported in Totals &amp; Subtotals</h4><p>In previous versions of Exploratory, when you used <strong>Window Calculations</strong> (such as <em>Difference from Previous Period</em> or <em>Percentage Difference from Previous Period</em>) in a Pivot Table, the <strong>Totals</strong> and <strong>Subtotals</strong> would still use the original aggregation function (e.g., sum).</p><p>This often resulted in totals that didn’t match the logic of the window calculation being applied.</p><p>With Exploratory v14, <strong>Totals and Subtotals now fully respect your Window Calculations</strong>.</p><p>This means:</p><ul><li>If a column is showing <strong>Difference from Previous Period</strong>, the total column will also calculate <em>the difference of the totals</em>, rather than just summing the raw values.</li><li>If a column is showing <strong>Percentage Difference</strong>, the totals and subtotals will now show <em>percentage differences based on aggregated values</em>, not simple sums.</li></ul><p>For example, in the table below, the far-right Total column now displays the <strong>difference</strong> or <strong>percentage difference</strong> calculated 
from the aggregated totals — giving you a much more accurate and intuitive summary.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*d6kFbPO2MCHllZdn8I0lvA.png" /></figure><p>This enhancement also works for categorical breakdowns.</p><p>For example, if you calculate the <strong>percentage of employees</strong> across three marital statuses for each job type,</p><ul><li>each job type’s <strong>Subtotal</strong> will now correctly show the percentage by marital status.</li><li>the overall <strong>Total</strong> at the bottom will show the percentages by marital status based on all employees.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ktYoRhQxEySrVt1LhEGRew.png" /></figure><p>This update makes Pivot Tables more consistent, more accurate, and far more useful when working with Window Calculations such as period-over-period, ratio, cumulative sum, moving average, etc.</p><h4>Number Chart: Sub-Indicator Color Flip &amp; Custom Label</h4><p>The <strong>Number</strong> chart — widely used for KPIs in dashboards — allows you to display a <strong>sub-indicator</strong> beneath the main value (such as month-over-month change or year-over-year difference). 
Traditionally, this sub-indicator automatically showed:</p><ul><li><strong>Green</strong> for positive changes</li><li><strong>Red</strong> for negative changes</li></ul><p>Now, in Exploratory v14, you can <strong>flip</strong> these colors whenever the meaning of “good” and “bad” is reversed for your metric.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*b6KsUFN642oJLPH7uOVwKg.png" /></figure><p>This is especially useful for KPIs where <em>lower is better</em>, such as:</p><ul><li>Cancellation rate</li><li>Return rate</li><li>Error rate</li><li>Customer complaints</li><li>Downtime</li></ul><p>With the new <strong>Flip Positive/Negative Colors</strong> option:</p><ul><li>An <strong>increase</strong> (e.g., churn went up) → shown in <strong>red</strong></li><li>A <strong>decrease</strong> (e.g., churn went down) → shown in <strong>green</strong></li></ul><p><strong>Custom Label Text for Sub-Metric</strong></p><p>You can also add custom label text to explain what the sub-metric represents.</p><p>With these updates to the Number chart, your dashboards can communicate meaning more clearly without forcing your audience to reinterpret the numbers every time.</p><h3>Experience v14 — And See What’s Now Possible</h3><p>Exploratory v14 brings AI deeper into the analytics workflow than ever before — from transforming data, to interpreting charts, to generating full analysis reports. These new capabilities unlock faster, clearer, and more powerful insights for everyone, regardless of technical background.</p><p>Try the new features. 
Explore what they can do.<br> And let AI remove the friction so you can focus on thinking, discovering, and making better decisions.</p><p>We hope v14 transforms the way you work with data, and we can’t wait to hear what you think.</p><h3>Try Exploratory v14!</h3><p>Please download Exploratory v14 <a href="https://exploratory.io/download">here</a> and try the new <strong>AI Function &amp; AI Note Editor</strong>!</p><p>If you don’t have an Exploratory account yet, please <a href="https://exploratory.io/">sign up here</a> and try it out. The first 30 days are a free trial period!</p><p>If your trial period has already expired but you would like to try this new version, please contact us via the trial extension link that appears in the dialog when you launch the latest version of Exploratory.</p><p>If you have any questions or feedback, please contact us at <a href="mailto:support@exploratory.io">support@exploratory.io</a>!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=04c974023395" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/exploratory-v14-a-new-era-of-ai-powered-data-analysis-04c974023395">Exploratory v14 — A New Era of AI-Powered Data Analysis</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[🚀 Exploratory 13.7 Released!]]></title>
            <link>https://blog.exploratory.io/exploratory-13-7-released-a9bbb1135eaf?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/a9bbb1135eaf</guid>
            <category><![CDATA[business-intelligence]]></category>
            <category><![CDATA[announcements]]></category>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Sat, 16 Aug 2025 04:12:38 GMT</pubDate>
            <atom:updated>2025-08-16T04:11:23.263Z</atom:updated>
<content:encoded><![CDATA[<p>We’re excited to announce the release of <strong>Exploratory v13.7</strong>!</p><p>Although this is a patch release, just like our recent updates, it comes with impactful enhancements that will make your experience faster and more flexible. Here’s what’s new.</p><h4>⚡ Dashboard Performance Improvements</h4><p>We’ve made <strong>dashboard interactions noticeably faster</strong> — both in Exploratory Desktop and on Exploratory Server.</p><p><strong>In Exploratory Desktop</strong></p><ul><li><strong>20–30% faster end-to-end chart rendering</strong> thanks to optimized rendering logic.</li><li>When “<strong>Update Other Pages</strong>” is turned on, charts on the current page now refresh immediately without waiting for other pages to update. When you switch to another page, its charts will already be ready.</li></ul><p>You can click on charts to filter the data in the Dashboard.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hWwnIysGbF9qxtWl8SJikA.png" /></figure><p>Or you can use a Parameter to filter the data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZUDXhPPlznXn8Cp6iE9dbA.png" /></figure><p>💡 <strong>Pro Tip:</strong> Enable <strong>data cache</strong> on data wrangling steps used by your dashboard. This loads pre-processed data directly into the server workspace instead of re-running queries or wrangling steps — cutting launch time even more.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NApEN2TwYYHkhSga1Il1xQ.png" /></figure><h4>Dashboard at Exploratory Server</h4><p>Entering <strong>Interactive Mode</strong> is now faster, so you can start exploring your dashboard data with less waiting.</p><p>As you may know, you can publish your Dashboard to Exploratory Server so that you can share it with others. 
Once published, you or the viewer users you have shared it with can interact with the Dashboard.</p><p>But before you can interact with it, the Dashboard needs to launch Interactive Mode, which creates a working area on the server where the underlying R data processing runs (along with SQL queries, if used).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZZmyQFVX_i_paden2u_16g.png" /></figure><h4>Pivot Table — Custom Aggregate Functions for Totals/Subtotals</h4><p>You can now <strong>override the aggregate function</strong> used for Grand Totals and Subtotals in Pivot Tables and Summarize Tables.</p><p>By default, it uses the same function that you select for the values. Let’s say you select the ‘Sum’ function for the Sales column in the Value field; the same ‘Sum’ function will then be used to calculate the Total and Subtotal.</p><p>But now, you can override it.</p><p><strong>Example 1: Want to use Sum for each group’s sales values but Mean for the Grand Total.</strong></p><p>In such cases, you can select ‘Mean’ in the Format dialog.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-CtumQdvChh2ijIFC_7NMA.png" /></figure><p><strong>Example 2: Hide Totals/Subtotals since they don’t make sense for my analysis (e.g., % change month-over-month).</strong></p><p>Here’s an example of calculating the % difference from the previous value so that we can see the percentage of sales growth for each month. 
In this case, showing the Total and the Subtotal doesn’t make sense.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4RsfP6HxPDw1Q16_OagJPA.png" /></figure><h4>Filter — First Day of N Months Ago &amp; Last Day of N Months Later</h4><p>We’ve added more <strong>relative date filtering flexibility</strong>:</p><ul><li><strong>First Day of N Months Ago</strong></li><li><strong>Last Day of N Months Later</strong></li></ul><p>Perfect for scenarios like showing data from “the start of 3 months ago to the end of 3 months later.”</p><p>You can now select such options from the Filter’s dropdown.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*746tifV-tCZM12ZjaEsOnQ.png" /></figure><p>We have also added the ‘1st Day of Month’ and ‘Last Day of Month’ options under the ‘Relative Date Filter’ operator.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TcHoSp_WFR2Jlmf3hNbnRQ.png" /></figure><p>You can check out more details for the newly added Date filter options <a href="https://exploratory.io/note/exploratory/How-to-Specify-Periods-Using-Filters-TLs2lze5">here</a>.</p><h3>✨ Other Enhancements and Fixes</h3><p>As always, we’ve included additional improvements and bug fixes.<br>🔗 <a href="https://exploratory.io/release-notes">Release Notes — Exploratory v13.7</a></p><h3>📥 Upgrade Now to Exploratory v13.7</h3><p><a href="https://exploratory.io/download">Exploratory v13.7</a> is available now — get the latest version today!</p><p><a href="https://exploratory.io/download">Download</a></p><p>If you haven’t signed up yet, <a href="https://exploratory.io/">start your free 30-day trial</a>. 
If your trial has expired, you can request an extension directly inside the app.</p><p>If you have any questions or feedback, feel free to reach out to me at kan@exploratory.io — we always love hearing from you!</p><p>Cheers,<br>Kan</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a9bbb1135eaf" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/exploratory-13-7-released-a9bbb1135eaf">🚀 Exploratory 13.7 Released!</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Unlocking Insights from Open-Ended Survey Responses with AI-Powered Text Analysis in Exploratory]]></title>
            <link>https://blog.exploratory.io/unlocking-insights-from-open-ended-survey-responses-with-ai-powered-text-analysis-in-exploratory-7c134cdf9f3a?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/7c134cdf9f3a</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[text-analytics]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Wed, 30 Jul 2025 11:30:24 GMT</pubDate>
            <atom:updated>2025-07-30T11:30:07.395Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QWg3dvC3Uj6ONPnF9UtWgg.png" /></figure><p>Have you ever collected survey responses to open-ended questions like <strong>“Why did you cancel?”</strong> or <strong>“Any suggestions for improvement?”</strong> — only to be left with a pile of text and no clear path to action?</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2epyj-WCJSWiyzEieVyMQQ.png" /></figure><p>With such free-form text data, you can read the responses one by one to understand what your customers or audience are saying — until you have hundreds or thousands of them. At that point, you could still read them one by one if you have the time, or you might ask AI to summarize them for you. But the problem with reading one by one is that it not only takes a long time, but you also lose the big picture formed by the patterns in the feedback.</p><p>And the problem with AI summarizing is that you always wonder, ‘Is that all? Did I miss something important?’</p><p>This is exactly where <strong>Exploratory’s Text Analysis with AI Summary</strong> comes in. It helps you uncover patterns, themes, and actionable insights from free-form text — <em>without drowning in the details.</em></p><h3>Step 1: Text Analysis with Word Count</h3><p>Start by selecting your free-text column and running the <strong>Word Count</strong> analysis.</p><p>You’ll first see a <strong>Word Cloud</strong>, which highlights the most frequently used words.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_3-WHiNVMbUi4AVhoIHcdA.png" /></figure><p>But for a clearer view, we also provide a <strong>Bar Chart</strong> that ranks word frequencies precisely.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-KUjtg0lCz4Jgr7WTD1OvA.png" /></figure><p>Want to go further? 
Check out the <strong>Word Combination</strong> chart, which shows pairs of words that frequently appear together.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*q1HII3wZLl2Wttwwc2sejQ.png" /></figure><h3>Step 2: Discover Hidden Themes with the Co-Occurrence Network</h3><p>Here’s where things get interesting.</p><p>The <strong>Co-Occurrence Network Diagram</strong> maps how words relate to each other across responses. It reveals clusters of words used together, showing both common and subtle themes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*m8yEeYL00g9A0wC9A1on8Q.png" /></figure><p>While the Word Combination chart shows the frequency of only two words at a time, this network diagram shows relationships among many more.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jSg2s-mcy7r3lBWyfaNsgw.png" /></figure><p>For example:</p><ul><li>The word <strong>“time”</strong> (orange cluster) often appears with <em>“presentation,” “management,” “little,”</em> and <em>“much.”</em></li><li>The word <strong>“think”</strong> (green cluster) shows connections to <em>“participants,” “similar,”</em> and <em>“good.”</em></li></ul><p>You can explore the relationships by word color (themes) and line thickness (strength of connection).</p><p>But, I know what you are thinking.</p><ul><li>This seems like too much information — what patterns are we supposed to see here?</li><li>How exactly are those words used together, and in what context?</li></ul><p>To answer these questions, we can get help from AI.</p><h3>Step 3: Let AI Summarize the Patterns for You</h3><p>Click the <strong>AI Summary</strong> button, and AI will analyze the network structure and provide a concise summary for each theme — complete with example comments.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QWg3dvC3Uj6ONPnF9UtWgg.png" /></figure><p>For example, the <strong>orange cluster</strong> is summarized as ‘time allocation and session 
management’, and it shows five example comments such as:</p><ul><li>“I think it would be better to put some restrictions on the <strong>presentation</strong> <strong>time</strong> and the number of <strong>presentation</strong> slides so that <strong>presentation</strong> finish on <strong>time</strong>.”</li><li>There were a <strong>lot</strong> of very rich and exciting talks, but I felt it <strong>little</strong> too <strong>much</strong>. So maybe it would be good to have seminar that only focus on two case studies. Thank you very <strong>much</strong> for this <strong>time</strong>.</li></ul><p>It sounds like the participants in this seminar were not satisfied with the time management or allocation of the talks.</p><p>The cool thing about this is that it tells us something about the groups with less frequent words, which could have been easily missed by just looking at the network diagram.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*f14QeKjJudS2jeh3CPpMOg.png" /></figure><p>For example, there is a group of people talking about a desire for more data, examples, and explanations related to data analysis. Their comments are something like:</p><ul><li>I <strong>want</strong>ed to know the success pattern of <strong>data</strong> <strong>analysis</strong> for each industry.</li><li>I <strong>want</strong> you to distribute the <strong>materials</strong> as <strong>data</strong>.</li><li>I think I <strong>want</strong>ed to <strong>see</strong> more concrete <strong>examples</strong> and demos.</li></ul><p>This is where AI truly shines — it captures not only the loudest voices but also the quieter, insightful ones that often go unnoticed. And it tells us what we should do to prepare for the next seminar.</p><h3>Reality of Text Analysis with AI</h3><p>Since the rise of ChatGPT, I’ve tried using AI to analyze text directly. 
While it can be useful, there are common frustrations:</p><ul><li>Different AI models give <strong>different results </strong>— some find 6 clusters, others only 4. What happened to the other 2?!</li><li>Sometimes it stops analyzing after a certain number of rows.</li><li>So we still have to <strong>go back and verify everything</strong> against the raw data.</li></ul><p>Why? Because large language models are <strong>probabilistic</strong>, not deterministic. They <em>can</em> be helpful — but they’re not always <em>reliable</em>.</p><h3>Why AI Summary in Exploratory Is Different</h3><p>With Exploratory, the AI doesn’t guess.</p><p>We give it the <strong>already-analyzed data</strong> — like word counts, co-occurrence stats, and clusters — so it can summarize based on structured, reliable information.</p><p>That means:</p><ul><li>More <strong>consistent</strong> and <strong>explainable</strong> results</li><li>Better performance on <strong>rare but important themes</strong></li><li>And peace of mind that what you’re seeing is grounded in data</li></ul><p>AI is pretty good at summarizing data using its knowledge of the world. A text analysis result like the co-occurrence network diagram is useful, but it carries too much information for a human to digest — not so for AI.</p><p>Just as we can depend on AI almost 100% today for summarizing documents or articles, we can count on it to accurately summarize something that has already been analyzed.</p><p>As we have seen with AI Summary for Text Analysis, using AI in a narrow scope and in a specific way is the way to go for text analysis today.</p><h3>Try It Yourself</h3><p>Whether you’re preparing for your next customer seminar or trying to make sense of user feedback, <strong>Text Analysis with AI Summary in Exploratory </strong>gives you high-quality insights from your text data. 
It’s fast, intuitive, and surprisingly powerful.</p><p>You can download Exploratory from <a href="https://exploratory.io/download">here</a>.</p><p>If you haven’t signed up yet, <a href="https://exploratory.io/">start your free 30-day trial</a>. If your trial has expired, you can request an extension directly inside the app.</p><blockquote><em>Note: The AI Summary feature is available in all editions, including </em><strong><em>Personal, Business, Business Plus, Academic</em></strong><em>, and </em><strong><em>Public</em></strong><em>! 🔥</em></blockquote><p>Let us know what you think — and we can’t wait to hear what insights you’ll be able to gain from your text data!</p><p>Cheers,</p><p>Kan (CEO / Exploratory)</p><p>Contact: kan@exploratory.io</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7c134cdf9f3a" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/unlocking-insights-from-open-ended-survey-responses-with-ai-powered-text-analysis-in-exploratory-7c134cdf9f3a">Unlocking Insights from Open-Ended Survey Responses with AI-Powered Text Analysis in Exploratory</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Exploratory v13.5 & v13.6 Released — Selective Data Re-Import, New Date Filters, 2FA, and More! 🚀]]></title>
            <link>https://blog.exploratory.io/exploratory-v13-5-v13-6-released-selective-data-re-import-new-date-filters-2fa-and-more-6b1b7c3b6327?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/6b1b7c3b6327</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[announcements]]></category>
            <category><![CDATA[data-analysis]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Tue, 29 Jul 2025 15:33:51 GMT</pubDate>
            <atom:updated>2025-07-29T15:33:29.300Z</atom:updated>
            <content:encoded><![CDATA[<h3>Exploratory v13.5 &amp; v13.6 Released — Selective Data Re-Import, New Date Filters, 2FA, and More! 🚀</h3><p>We’re excited to announce that <strong>Exploratory Desktop v13.5</strong> was released last night — and <strong>v13.6</strong> followed this morning! 🎉</p><p>These are patch releases with several useful enhancements and important bug fixes to make your experience smoother and more secure.</p><p>Here are some of the key updates:</p><h3>🚀 Feature Highlights — v13.5 / v13.6</h3><h4><strong>Data Re-Import</strong></h4><p>When you have multiple data sources for your data frame, now you can choose exactly which ones to re-import from — giving you more control and flexibility.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*3hcMU9FRuV8AaCGD" /></figure><h4><strong>Date Filter</strong></h4><p>We’ve added new options to the Date Filter, including:</p><ul><li>“1st Day of N Months Ago”</li><li>“Last Day of Next N Months”<br>…and more to make dynamic filtering even easier.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*a2YSNuy7RdWiVQFY" /></figure><h4><strong>Pivot / Summarize Table</strong></h4><p>When exporting from Pivot Table or Summarize Table, <strong>Grand Totals and Subtotals</strong> are now included automatically in the exported data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*C4iW5jVPvS392Vsa" /></figure><h4><strong>Security</strong></h4><p>You can now enable <strong>Two-Factor Authentication (2FA)</strong> or <strong>Multi-Factor Authentication (MFA)</strong> from your <a href="https://exploratory.intercom-clicks.com/via/e?ob=L%2BlhZ0DawlqP7MwrPucivFpRfq49Lv9vguhwpQPIpqBP4kg8W34xZnde6h6GKXWq&amp;h=c6c69d1ff16dd9c103edd8e60916d08141a3b0c0-b6uma1h1_215470064573435&amp;l=6de2563e2475f7b29a2a3479aa492305f17825bf-162766537">Account Setting</a> page for added security.</p><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/0*c0-sSd-qwHGWAkv4" /></figure><h3>✨ Plus Other Enhancements and Fixes</h3><p>For other enhancements and bug fixes in Exploratory v13.5 / v13.6, you can check out the release notes here:<br>🔗 <a href="https://exploratory.io/release-notes">Release Notes — Exploratory v13.6</a></p><h3>📥 Upgrade Now to Exploratory v13.6</h3><p><a href="https://exploratory.io/download">Exploratory v13.6</a> is available now — get the latest version today!</p><p><a href="https://exploratory.io/download">Download</a></p><p>If you haven’t signed up yet, <a href="https://exploratory.io/">start your free 30-day trial</a>. If your trial has expired, you can request an extension directly inside the app.</p><p>If you have any questions or feedback, feel free to reach out to me at kan@exploratory.io — we always love hearing from you!</p><p>Cheers,<br>Kan</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6b1b7c3b6327" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/exploratory-v13-5-v13-6-released-selective-data-re-import-new-date-filters-2fa-and-more-6b1b7c3b6327">Exploratory v13.5 &amp; v13.6 Released — Selective Data Re-Import, New Date Filters, 2FA, and More! 🚀</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>