<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[learn data science - Medium]]></title>
        <description><![CDATA[Unpacking Data Science One Step At A Time - Medium]]></description>
        <link>https://blog.exploratory.io?source=rss----6ea408ec434d---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>learn data science - Medium</title>
            <link>https://blog.exploratory.io?source=rss----6ea408ec434d---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 06 Apr 2026 23:19:26 GMT</lastBuildDate>
        <atom:link href="https://blog.exploratory.io/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Why Deep Learning Didn’t Replace Tree Models for Tabular Data]]></title>
            <link>https://blog.exploratory.io/why-deep-learning-didnt-replace-tree-models-for-tabular-data-d80b796d652f?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/d80b796d652f</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Mon, 06 Apr 2026 02:38:19 GMT</pubDate>
            <atom:updated>2026-04-06T02:38:19.186Z</atom:updated>
            <content:encoded><![CDATA[<h4><em>Why models like XGBoost and LightGBM still dominate structured data problems.</em></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*57av6R0oTPPTCwFdClUrag.png" /></figure><p>Over the past decade, the world of machine learning and AI has been dominated by one idea: deep learning.</p><p>Neural Networks — the algorithms behind deep learning — have transformed fields like:</p><ul><li>computer vision</li><li>speech recognition</li><li>natural language processing</li></ul><p>Models such as transformers and convolutional neural networks now power everything from ChatGPT to self-driving cars.</p><p>Given that success, it seemed inevitable that deep learning would replace traditional machine learning everywhere.</p><p>But something interesting happened.</p><p>In the world of tabular data, the structured datasets used by most businesses, deep learning never completely took over.</p><p>Instead, algorithms like XGBoost and LightGBM continue to dominate many real-world machine learning applications.</p><p>This sounds surprising to some. 
But once you understand the nature of tabular data and what deep learning is good at (and not good at), it starts to make sense.</p><h3>Most Business Data Is Tabular Data</h3><p>Most organizations are not training AI models on billions of images or internet-scale text corpora.</p><p>Instead, they work with data that looks something like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WB7DjS9qZLqoWyYvh8enBA.png" /></figure><p>Columns represent different concepts:</p><ul><li>age</li><li>income</li><li>purchase behavior</li><li>geographic region</li></ul><p>Unlike images or language, there is no inherent structure connecting these variables.</p><p>Images have spatial structure</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rCGXp8Ef61wutW0I0eZ-bQ.png" /></figure><p>Language has sequential structure</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*u9M53dXZTKxdowwXaOFQBA.png" /></figure><p>But tabular data is different. It’s simply a collection of variables describing some phenomenon. And it doesn’t have the kind of structure that images and text have.</p><p>Tabular data</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fvvO4WciUlAy9g8Wep1V-Q.png" /></figure><p>Deep learning thrives when data has hierarchical structure (images, language).</p><p>Think of the modern Transformer, which is one of the most important algorithms in deep learning and AI today. When it builds a model on given text data, it takes the sequence of the text and the relations between words, sentences, etc. into account, and predicts the next word.</p><p>But tabular data typically does not have such sequence or structure. Each value in a given variable is typically independent of other values.</p><p>And it turned out that this difference matters a lot.</p><h3>Why Tree-Based ML Models Work So Well</h3><p>Machine Learning models like XGBoost, LightGBM, Random Forest, etc. 
are still the most common algorithms among practitioners who are building prediction models with business data, and they all share a common fundamental architecture.</p><p>That architecture is the Decision Tree.</p><p>Decision trees approach the problem in a way that feels very natural for this type of data.</p><p>Instead of learning abstract representations, they learn rules.</p><p>For example:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QJOru7Oaroj2hPERkLvwVw.png" /></figure><p>Or:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vi7udNnaiFT94bw11EWK0w.png" /></figure><p>These kinds of conditional rules often show up in real-world datasets, and decision trees are extremely good at discovering them.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*L3Ks5NLUisgpEho2OkDc6g.png" /></figure><h3>The Rise of XGBoost</h3><p>Around the mid-2010s, one algorithm in the tree family became very popular.</p><p>That algorithm was XGBoost.</p><p>XGBoost implemented gradient boosting in a highly optimized way and quickly became the default choice for many machine learning practitioners working with tabular data.</p><p>It builds trees sequentially, each one correcting the mistakes of the previous model.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*GrFtE50PoiEA8TD-.png" /></figure><p>For several years, it was the dominant choice among data science practitioners building prediction models for business data.</p><p>But as datasets grew larger, people began to encounter a new problem.</p><p>Training these models could take a long time.</p><h3>Enter LightGBM</h3><p>In 2017, researchers at Microsoft introduced a new boosting framework called LightGBM.</p><p>The goal wasn’t to reinvent gradient boosting.</p><p>Instead, the idea was to make boosting lighter.</p><p>The word <em>Light</em> in LightGBM refers to being lightweight in computation and memory usage.</p><p>Several clever design decisions helped 
achieve this:</p><ul><li>trees grow leaf-wise, focusing computation on the most useful splits</li><li>features are converted into histogram bins to reduce split evaluations</li><li>rows with large gradients are prioritized using GOSS</li><li>sparse features are compressed using exclusive feature bundling (EFB)</li></ul><h3>Level-wise vs Leaf-wise growth</h3><p>Level-wise (XGBoost)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zRsoL1AXHBVfumTg7jhLgw.png" /></figure><p>Leaf-wise (LightGBM)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*a2xmvGDdJqcJLz8lX7wzag.png" /></figure><p>LightGBM grows trees where the loss decreases most, focusing computation on the most informative parts of the model.</p><p>Together, these ideas dramatically reduce the amount of computation required to train models.</p><p>For people experimenting with machine learning models by trying many features and hyper-parameters, this speed improvement made a huge difference.</p><h3>Why Does Deep Learning Often Struggle?</h3><p>It’s not that people stopped trying. In fact, many researchers have tried applying neural networks to tabular datasets.</p><p>But very often, decision-tree-based boosting models such as XGBoost and LightGBM still performed better.</p><p>The reason is surprisingly simple.</p><p>Deep learning excels when the data contains rich internal structure.</p><p>Images contain spatial patterns. 
Language contains grammatical patterns.</p><p>Tabular data usually does not.</p><p>Instead, tabular datasets often contain:</p><ul><li>a diverse mix of variables</li><li>engineered features (artificially created extra variables based on original variables)</li><li>sparse categorical encodings (one-hot encoding)</li><li>nonlinear feature interactions</li></ul><p>Decision trees are well suited to discovering these kinds of patterns.</p><h3>What This Means in Practice</h3><p>In many real-world machine learning projects, the workflow often looks like this:</p><ol><li>Build a baseline model.</li><li>Try a tree boosting algorithm.</li><li>Improve the model through feature engineering and parameter tuning.</li></ol><p>And very often, the models that end up performing best are based on boosting algorithms such as XGBoost, LightGBM, etc.</p><p>These algorithms have become reliable workhorses for tabular data problems.</p><h3>Trying These Models Yourself</h3><p>One of the motivations behind building Exploratory was to make data science tools easier to use, and building prediction models with machine learning is one of them.</p><p>In Exploratory, you can train tree-based models such as:</p><ul><li>Random Forest</li><li>XGBoost</li><li>LightGBM</li></ul><p>directly from an interactive interface.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*4xO-j6CqRxB-6RH1.png" /></figure><p>You can take a look at <a href="https://exploratory.io/note/exploratory/Introduction-to-LightGBM-eAK7zbZ9">this how-to note</a> for more details on how to use LightGBM.</p><p>Instead of writing large amounts of code, you can focus on exploring your data, building features, and comparing models.</p><p>If you work with tabular data such as customer behavior, financial data, operational metrics, etc., it’s definitely worth trying them.</p><h3>A Final Thought</h3><p>As it turned out, the deep learning revolution didn’t eliminate traditional machine learning 
algorithms.</p><p>Instead, it clarified something important.</p><p>Different types of data require different tools.</p><p>For images and language, as you know, deep learning dominates.</p><p>But, for tabular data, tree based algorithms like XGBoost and LightGBM remain some of the most powerful methods available.</p><p>One of the things I appreciate about boosting algorithms like XGBoost and LightGBM is how practical they are.</p><p>You don’t need massive infrastructure or complicated neural network architectures. You start with your data, build some features (if required), and let the model discover useful patterns.</p><p>In many cases, the results are surprisingly good!</p><p>In Exploratory, you can train models such as Random Forest, XGBoost, and LightGBM directly from an interactive interface and compare their performance on your dataset.</p><p>Sometimes the easiest way to understand the strengths of these models is simply to see how they perform on your own data.</p><h3>Download Exploratory</h3><p>You can start using XGBoost, LightGBM, Random Forest, and other models today in the latest version of Exploratory.</p><p>👉 Download Exploratory</p><p><a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an account yet, sign up here to start your 30-day free trial.</p><p><a href="https://exploratory.io/">https://exploratory.io/</a></p><p>If your trial has expired, simply launch the latest version and use the Extend Trial option.</p><p>If you have questions or feedback, feel free to contact me at <a href="mailto:kan@exploratory.io">kan@exploratory.io .</a></p><p>We’d love to hear how you’re using Exploratory to uncover insights in your data.</p><p>Kan Nishida</p><p>CEO, Exploratory</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d80b796d652f" width="1" height="1" alt=""><hr><p><a 
href="https://blog.exploratory.io/why-deep-learning-didnt-replace-tree-models-for-tabular-data-d80b796d652f">Why Deep Learning Didn’t Replace Tree Models for Tabular Data</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[LightGBM Explained: How It Differs from Random Forest and XGBoost]]></title>
            <link>https://blog.exploratory.io/lightgbm-explained-how-it-differs-from-random-forest-and-xgboost-286836838fe7?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/286836838fe7</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[analytics]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Sun, 22 Mar 2026 17:03:57 GMT</pubDate>
            <atom:updated>2026-03-22T17:41:54.095Z</atom:updated>
            <content:encoded><![CDATA[<p><em>The evolution of tree-based models — from robustness to optimization to scalability</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*dhq6nuirJ7cliKbQ.png" /></figure><p>If you work with tabular data, the kind of structured data found in business analytics, finance, marketing, or operations, you’ve probably encountered three popular machine learning algorithms:</p><ul><li>Random Forest</li><li>XGBoost</li><li>LightGBM</li></ul><p>All three rely on decision trees, and they can all produce very strong predictive models. But they were designed with different priorities in mind.</p><p>Random Forest emphasizes simplicity and robustness. XGBoost focuses on highly optimized gradient boosting. LightGBM was created to make boosting faster and more scalable.</p><p>So, which one to choose?</p><p>It depends…</p><p>And this is why I wrote this blog post.</p><p>Understanding why LightGBM was created and how it works makes it much easier to decide whether it’s the right tool for your problem.</p><h3>The Problem LightGBM Was Designed to Solve</h3><p>By the mid-2010s, gradient boosting had already proven to be one of the most powerful techniques for predictive modeling on tabular data, or structured data if you will.</p><p>In particular, XGBoost had become extremely popular after dominating many machine learning competitions.</p><p>However, as datasets continued to grow, practitioners began to encounter a new challenge.</p><p>Performance.</p><p>Datasets with millions of rows became normal, and the number of variables (or features) grew into the thousands, often with sparse features produced by one-hot encoding.</p><p>Gradient boosting was powerful, but it could also be computationally heavy. 
This means it takes a long time to build models with boosting algorithms when the data size is big.</p><p>Researchers at Microsoft set out to redesign parts of the algorithm so it could handle large datasets more efficiently.</p><p>The result was LightGBM, and they released it as open source in 2017.</p><h3>Why Is It Called “LightGBM”?</h3><p>LightGBM stands for ‘Light Gradient Boosting Machine’.</p><p>The word “Light” does not refer to the speed of light.</p><p>Instead, it refers to the algorithm being lightweight in computation and memory usage.</p><p>The goal was to create a gradient boosting system that could:</p><ul><li>train faster</li><li>use less memory</li><li>scale to larger datasets</li></ul><p>while still maintaining strong predictive performance.</p><p>Before diving into LightGBM’s innovations, it helps to understand how the three algorithms differ conceptually.</p><h3>Random Forest: Many Independent Trees</h3><p>Random Forest builds many trees independently.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*89Fmf8FFrhdNfVbpMlQgOw.png" /></figure><p>Each tree:</p><ol><li>samples the dataset randomly</li><li>selects variables (or features) randomly when splitting</li><li>produces its own prediction</li></ol><p>The final prediction is simply the average (regression) or majority vote (classification).</p><p>The key idea is that many independent trees reduce variance and improve stability compared to a single tree (Decision Tree).</p><p>But the trees do not learn from each other.</p><h3>XGBoost: Trees That Correct Mistakes</h3><p>XGBoost uses a technique called gradient boosting. 
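</p><p>The boosting update this section walks through can be sketched in a few lines of plain Python. This is a minimal illustration with made-up numbers, not XGBoost’s actual implementation; the “trees” here are stubbed out as fixed predictions:</p>

```python
# One gradient-boosting step for squared-error loss, using made-up numbers.
actual = [10.0, 7.0, 5.0]

# Tree 1 makes the initial predictions.
pred_tree1 = [8.0, 8.0, 6.0]

# The gradients (residuals, for squared error) are what tree 2 is trained to predict.
residuals = [a - p for a, p in zip(actual, pred_tree1)]  # [2.0, -1.0, -1.0]

# Suppose tree 2 predicts the residuals approximately (hypothetical values).
pred_tree2 = [1.5, -1.0, -0.5]

# prediction_new = prediction_from_first_tree + learning_rate * prediction_from_second_tree
learning_rate = 0.5
pred_new = [p1 + learning_rate * p2 for p1, p2 in zip(pred_tree1, pred_tree2)]
print(pred_new)  # [8.75, 7.5, 5.75], each prediction moves toward the actual value
```

<p>Each additional tree repeats the same step against the remaining errors.</p><p>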
Instead of building independent trees, it builds trees sequentially to correct the errors made by the previous trees.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fQR2leC8wtZ44U0AoAtbeQ.png" /></figure><p>Boosting algorithms are really performing a form of gradient descent in function space.</p><p>The model builds the first tree, makes predictions, and calculates the errors (or loss). In a regression problem, the error can be the residual between the actual and predicted values. This is called the ‘gradient’.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JNaiZyOh6rK_7ipcddk_7w.png" /></figure><p>The next tree will be built to predict the gradient values, not the actual values, because the gradient tells the model how to move predictions to reduce loss.</p><p>Combining the predictions from the first tree with the predictions from the second tree produces the model’s new predictions.</p><blockquote>prediction_new = prediction_from_first_tree + learning_rate × prediction_from_second_tree</blockquote><p>Conceptually, the model evolves like this:</p><ul><li>Tree1 → initial prediction</li><li>Tree2 → fix errors from Tree1</li><li>Tree3 → fix remaining errors</li><li>Tree4 → continue improving</li></ul><p>This process usually leads to more accurate models than Random Forest.</p><p>However, as datasets grow larger, training boosting models can become computationally expensive.</p><p>That is where LightGBM comes in.</p><h3>LightGBM: Designed to Scale</h3><p>LightGBM does not change the basic idea of gradient boosting.</p><p>Instead of optimizing only the boosting algorithm, LightGBM also optimizes:</p><ul><li>how trees grow</li><li>how rows are sampled</li><li>how splits are evaluated</li><li>how features are represented</li></ul><p>to scale to very large datasets while maintaining strong accuracy.</p><p>These improvements come from four key innovations:</p><ul><li>Leaf-wise tree 
growth</li><li>Histogram-based splitting</li><li>GOSS (Gradient-based One-Side Sampling)</li><li>EFB (Exclusive Feature Bundling)</li></ul><p>Let’s walk through them one by one.</p><h3>1. Leaf-Wise Tree Growth</h3><p>One of the most distinctive features of LightGBM is leaf-wise tree growth.</p><p>Traditional tree algorithms such as Random Forest and XGBoost grow trees level-wise.</p><p>At each depth of the tree, all nodes are expanded.</p><p>Example:</p><pre>        Root<br>       /    \<br>      A      B<br>     / \    / \<br>    C   D  E   F</pre><p>Every level of the tree expands evenly, and it produces balanced trees, which are stable and predictable.</p><p>But they may waste computation expanding branches that do not significantly improve predictions.</p><p>LightGBM takes a different approach. Instead of expanding all nodes at the same depth, it expands the leaf that produces the greatest reduction in loss.</p><p>Example:</p><pre>        Root<br>       /    \<br>      A      B<br>     / \<br>    C   D<br>   /<br>  E</pre><p>The tree grows where the model improves most.</p><p>This approach allows LightGBM to reach strong predictive performance with fewer splits.</p><p>The trade-off is that trees can become deeper in certain branches, so LightGBM provides parameters such as max_depth and num_leaves to control model complexity.</p><h3>2. 
Histogram-Based Splitting</h3><p>Another important optimization in LightGBM is histogram-based splitting.</p><p>Standard tree algorithms may evaluate many possible split thresholds for continuous features.</p><p>Example:</p><pre>Age ≤ 21<br>Age ≤ 22<br>Age ≤ 23<br>Age ≤ 24</pre><p>LightGBM speeds this up using histogram binning.</p><p>Instead of evaluating every unique value, continuous features are grouped into bins.</p><p>Example:</p><p>Original values:</p><pre>23, 25, 27, 29, 35</pre><p>Converted to bins:</p><pre>20–25<br>25–30<br>30–40</pre><p>Now the algorithm evaluates splits only on bin boundaries.</p><p>This dramatically reduces the number of candidate splits and speeds up training.</p><h3>3. GOSS (Gradient-Based One-Side Sampling)</h3><p>Training boosting models on large datasets normally requires processing all rows.</p><p>LightGBM introduces GOSS (Gradient-based One-Side Sampling) to reduce the number of rows used during training.</p><p>The key idea is based on how boosting works.</p><p>In boosting algorithms, each data point has a gradient value indicating how much the model needs to adjust its prediction.</p><ul><li>Rows with large gradients represent predictions where the model is making large errors.</li><li>Rows with small gradients are already well predicted.</li></ul><p>While typical boosting algorithms, including XGBoost, sample the data randomly, LightGBM uses this gradient information to decide which rows to sample.</p><ul><li>Keeps all rows with large gradients</li><li>Keeps only a subset of rows with small gradients</li></ul><p>Example:</p><p>Dataset: 100,000 rows</p><ul><li>Top 20% largest gradients → keep all (20,000 rows)</li><li>Remaining 80% → sample 10% (8,000 rows)</li></ul><p>This way, it needs to use only 28,000 rows instead of 100,000.</p><p>This significantly reduces computation while preserving important learning signals.</p><h3>4. 
EFB (Exclusive Feature Bundling)</h3><p>Many modern datasets contain high-dimensional sparse features, especially when categorical variables are one-hot encoded.</p><p>Example:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_l3nJ6rkcW-4OmX2q6lMIw.png" /></figure><p>These features are <strong>mutually exclusive</strong>: only one can be active in each row.</p><p>Instead of treating them separately, LightGBM bundles them into one feature.</p><p>Original:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MROexJoGbDLdL38-bUYclw.png" /></figure><p>Bundled:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2f1vmdn0k3vqy7WSUG5N0w.png" /></figure><p>Now the algorithm evaluates splits on <strong>one feature instead of three</strong>.</p><p>This reduces feature dimensionality and speeds up training.</p><h3>How to Choose Among These Models</h3><p>Each algorithm has strengths.</p><h4>Random Forest</h4><p>Good when:</p><ul><li>you want a <strong>simple baseline</strong></li><li>datasets are <strong>relatively small</strong></li><li>minimal tuning is preferred</li></ul><h4>XGBoost</h4><p>Good when:</p><ul><li>datasets are <strong>moderate</strong> in size</li><li>you want strong predictive performance</li><li>stability and extensive tuning options are important</li></ul><h4>LightGBM</h4><p>LightGBM works particularly well when:</p><ul><li>datasets are <strong>large</strong></li><li>feature dimension is <strong>high</strong></li><li>features are <strong>sparse</strong></li><li>training time matters</li></ul><h4>Practical Recommendation</h4><p>A common workflow in machine learning projects is:</p><ol><li>Start with <strong>Random Forest</strong> as a baseline.</li><li>Try <strong>XGBoost</strong> or <strong>LightGBM</strong> to improve performance.</li><li>Prefer <strong>LightGBM</strong> when datasets become large or training time becomes a bottleneck.</li></ol><p>Because of its efficiency and scalability, LightGBM has 
become a popular choice for many machine learning tasks with tabular data.</p><h3>Final Thought</h3><p>Random Forest, XGBoost, and LightGBM all rely on decision trees, but they represent different philosophies:</p><ul><li>Random Forest focuses on <strong>robust ensembles</strong></li><li>XGBoost focuses on <strong>optimized gradient boosting</strong></li><li>LightGBM focuses on <strong>efficient and scalable boosting</strong></li></ul><p>Understanding these differences helps you choose the right tool, and explains why LightGBM has become an important algorithm in modern machine learning.</p><h3>Try LightGBM with Exploratory!</h3><p>You can try LightGBM with <a href="https://exploratory.io/">Exploratory</a> v14.5 or later versions.</p><ul><li>Go to Analytics view.</li><li>Select LightGBM.</li><li>Select a Target Variable.</li><li>Select Explanatory Variables (Features)</li><li>Click Run button.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*NTjiAbBsm8qzy0if.png" /></figure><p>You can take a look at <a href="https://exploratory.io/note/exploratory/Introduction-to-LightGBM-eAK7zbZ9">this how-to note</a> for more details on how to use LightGBM.</p><h4>Download Exploratory</h4><p>You can start using LightGBM today in the latest version of Exploratory.</p><p>👉 Download Exploratory v14</p><p><a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an account yet, sign up here to start your 30-day free trial.</p><p><a href="https://exploratory.io/">https://exploratory.io/</a></p><p>If your trial has expired but you’d like to try the new features, simply launch the latest version and use the Extend Trial option.</p><p>If you have questions or feedback, feel free to contact me at <a href="mailto:kan@exploratory.io">kan@exploratory.io .</a></p><p>We’d love to hear how you’re using Exploratory to uncover insights in your data.</p><p>Kan Nishida<br>CEO, Exploratory</p><img 
src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=286836838fe7" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/lightgbm-explained-how-it-differs-from-random-forest-and-xgboost-286836838fe7">LightGBM Explained: How It Differs from Random Forest and XGBoost</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Auto-Positioning Labels: Keep Your Charts Readable Automatically]]></title>
            <link>https://blog.exploratory.io/auto-positioning-labels-keep-your-charts-readable-automatically-3bbd1f1a0ed8?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/3bbd1f1a0ed8</guid>
            <category><![CDATA[exploratory-data-analysis]]></category>
            <category><![CDATA[data-visualization]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Thu, 19 Mar 2026 11:18:40 GMT</pubDate>
            <atom:updated>2026-03-19T11:18:39.961Z</atom:updated>
            <content:encoded><![CDATA[<h4>Introducing auto-positioning for chart values in Exploratory</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*0ZaCJYxsHlguMW4K.png" /></figure><p>Showing values directly on a chart is incredibly powerful.</p><p>Your audience first sees the pattern in the data. Then they see the exact labels and numbers behind the pattern.</p><p>That combination is often what makes a chart insightful.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*N5WjChl4RnFRC8I7.png" /></figure><p>Most charting systems place labels exactly at the coordinates of the data point.</p><p>For small datasets, this works perfectly.</p><p>But when you have many data points that are close to each other, your chart can end up looking something like this.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*HICPwfUgv-_vBEvD.png" /></figure><p>Instead of helping the reader, the labels start fighting each other.</p><p>Typical problems appear immediately:</p><ul><li>Labels overlap each other</li><li>Labels cover the data points</li><li>Values become unreadable</li><li>Analysts start manually adjusting labels</li></ul><p>What was supposed to clarify the chart ends up making it harder to read.</p><p>This problem is known as label overlap or label collision.</p><p>And it turns out to be surprisingly difficult to solve.</p><h3>The Idea: Let Labels Move (Just a Little)</h3><p>In Exploratory v14.3 we introduced Auto-Positioning for Labels.</p><p>The idea sounds simple:</p><blockquote><em>Instead of fixing labels exactly on the data points (or coordinates), allow them to move slightly until they no longer overlap.</em></blockquote><p>But there are a few important constraints.</p><p>The labels must:</p><ul><li>Avoid overlapping each other</li><li>Avoid being overlapped by markers (e.g. 
line, bar, etc.)</li><li>Stay visually close to the original point</li><li>Clearly indicate which data point they represent</li><li>Preserve readability</li></ul><p>When the system finds a better layout, the labels reposition automatically.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*yU6XUaOz67Ageq-5.png" /></figure><p>If a label moves away from its original point, a leader line (arrow) connects the label back to the data point.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*MbZujWD2Zy--FoNo.png" /></figure><p>The result is a chart that remains readable even with many labels.</p><h3>Our First Attempt: Physics Simulation</h3><p>This is a solved problem in R thanks to a package called ‘ggrepel’. But while Exploratory is built on top of R, we use Plotly.js (JavaScript) for chart rendering, so we needed to build an auto-positioning system in the JS layer.</p><p>Our first approach was to use d3-force, a physics simulation library in JS.</p><p>The idea was appealing.</p><p>Labels repel each other like particles with repulsive force.</p><p>Eventually they should settle into positions where nothing overlaps.</p><p>In theory.</p><p>In practice, it didn’t work.</p><p>When many values are densely packed, the labels tend to push each other away and scatter outside the chart area or across the entire screen.</p><p>So we gave up on that approach.</p><h3>The Algorithm That Worked: Simulated Annealing</h3><p>Instead, we adopted an algorithm called Simulated Annealing, which was used in D3-Labeler.</p><p>This is a classic optimization technique inspired by metallurgy.</p><p>When metal cools slowly, its atoms settle into a stable structure.</p><p>Simulated Annealing follows a similar idea:</p><ol><li>Start with the current label layout</li><li>Move labels slightly in random directions</li><li>Evaluate whether the layout improves</li><li>Gradually reduce the randomness over time</li></ol><p>After many small adjustments, the system 
converges toward a layout with minimal overlaps and good readability.</p><p>The key advantage is that labels stay close to their original positions, rather than flying across the chart.</p><h3>Engineering the System</h3><p>To integrate this into Exploratory, we created a new module that wraps the chart rendering process.</p><p>The workflow looks like this:</p><ol><li>Exploratory renders the chart normally</li><li>The wrapper module analyzes the label positions</li><li>The auto-positioning algorithm runs</li><li>Labels are repositioned if collisions are detected</li></ol><p>This adjustment happens automatically after the chart rendering.</p><h3>Keeping It Fast (Even with Hundreds of Labels)</h3><p>Optimization algorithms can become slow when the number of labels increases.</p><p>To keep performance smooth, we introduced spatial grid partitioning.</p><p>Instead of checking collisions between every pair of labels, the system divides the chart into small grid cells.</p><p>Labels only need to check nearby neighbors inside the same grid.</p><p>This dramatically reduces the number of collision checks.</p><p>As a result, the system remains responsive even with 500+ labels.</p><h3>Additional Improvements</h3><p>We added a few additional features to make the system more practical.</p><h3>Leader Lines</h3><p>If a label moves far from its original point, a leader line automatically appears.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*-HHUEcq1u6kqMrq7.png" /></figure><p>This maintains the visual connection between label and data point.</p><h3>Error Bar Awareness</h3><p>Bar charts with error bars introduce another challenge.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*904GVU9JWcGHAHDV.png" /></figure><p>Labels must avoid overlapping the confidence interval lines.</p><p>The algorithm takes error bar ranges into account when calculating positions.</p><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/0*EjnGqJTRXkuIctIc.png" /></figure><h3>How to Enable Auto-Positioning</h3><p>Using the Auto-Positioning feature is simple.</p><ol><li>Open the chart property dialog.</li><li>Click the gear icon at the top of the chart.</li><li>Then go to the Values tab.</li><li>Enable Show Values.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*_LZ8KK1ot46MRK3_.png" /></figure><p>When the Position is set to Automatic, Exploratory automatically adjusts label positions to avoid overlaps.</p><p>If labels move far from their original point, arrows appear to indicate the connection.</p><p>You can control the color and the arrow visibility using the Arrow Display Threshold setting.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*OTNOi2WxtvIggLtI.png" /></figure><p>Increasing the threshold hides shorter arrows and reduces clutter.</p><p>For newly created charts, when you enable showing the values (labels) on the chart, auto-positioning is set by default. 
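For developers who are curious, the Simulated Annealing loop described earlier can be sketched in a few lines of JavaScript. This is a simplified illustration, not Exploratory’s actual implementation: the cost function (pairwise overlap area plus distance from each label’s anchor point) and the cooling schedule are assumptions made for the sketch.

```javascript
// Minimal Simulated Annealing sketch for label placement (illustrative only).
// Each label starts at its data point; we nudge labels randomly and keep
// layouts that reduce total cost, occasionally accepting worse layouts early
// on (high "temperature") to escape local minima.

function overlapArea(a, b) {
  const w = Math.min(a.x + a.w, b.x + b.w) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.h, b.y + b.h) - Math.max(a.y, b.y);
  return Math.max(0, w) * Math.max(0, h);
}

function energy(labels) {
  // Cost = distance from each label's anchor point + heavily penalized overlaps.
  let e = 0;
  for (let i = 0; i < labels.length; i++) {
    const li = labels[i];
    e += Math.hypot(li.x - li.ax, li.y - li.ay); // stay near the data point
    for (let j = i + 1; j < labels.length; j++) {
      e += 10 * overlapArea(li, labels[j]);
    }
  }
  return e;
}

function anneal(labels, tries = 2000) {
  let temp = 1.0;
  const cooling = 0.995;
  let current = energy(labels);
  let best = current;
  let bestLayout = labels.map(l => ({ ...l }));
  for (let t = 0; t < tries; t++) {
    // Move one random label slightly; moves shrink as the system cools.
    const l = labels[Math.floor(Math.random() * labels.length)];
    const ox = l.x, oy = l.y;
    l.x += (Math.random() - 0.5) * 10 * temp;
    l.y += (Math.random() - 0.5) * 10 * temp;
    const next = energy(labels);
    // Accept improvements always; accept worse layouts with shrinking probability.
    if (next < current || Math.random() < Math.exp((current - next) / temp)) {
      current = next;
      if (current < best) {
        best = current;
        bestLayout = labels.map(m => ({ ...m }));
      }
    } else {
      l.x = ox; l.y = oy; // undo the rejected move
    }
    temp *= cooling;
  }
  return { layout: bestLayout, energy: best };
}
```

Tracking the best layout seen so far (rather than only the final state) guarantees the result is never worse than the starting layout, and because the cost function penalizes distance from the anchor, labels tend to stay close to their original positions instead of flying across the chart.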
For existing charts created before v14.3, you need to manually switch the Position to Automatic.</p><h3>Improving Placement Accuracy</h3><p>You can also control the optimization effort.</p><p>The setting ‘# Tries to Improve Accuracy’ determines how many optimization trials are performed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3EDsvFapEzU0ecidjiN-9Q.png" /></figure><ul><li>Increase the value → more accurate placement (slower)</li><li>Decrease the value → faster calculation (less precise)</li></ul><h3>A Small Feature That Makes Data Exploration Powerful</h3><p>In Exploratory v14.3 we introduced this auto-positioning system using Simulated Annealing and spatial optimization.</p><p>At first glance, this might look like a small feature.</p><p>But in practice, it changes something fundamental about how you explore data.</p><p>Exploratory Data Analysis is not just about creating charts.</p><p>It is about discovering things you did not expect to see.</p><p>That kind of discovery often happens when you start looking closely at individual data points.</p><p>For example, imagine a scatterplot showing 190 countries.</p><p>At first, you might look for a particular country you are interested in.</p><p>But once the labels become readable, something else begins to happen.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*PDFz0VpltJYt2OQN.png" /></figure><p>You start to notice patterns you did not plan to look for.</p><ul><li>Which countries are close to each other?</li><li>Which countries behave similarly?</li><li>Which countries stand apart from the rest?</li></ul><p>These moments of discovery are the essence of exploratory data analysis.</p><p>However, if the labels overlap and become unreadable, those discoveries become much harder.</p><p>Analysts end up spending time manually adjusting labels instead of exploring the data itself.</p><p>Our goal with Exploratory has always been to build a tool that helps people think 
better with data.</p><p>Not by automating the thinking process, but by removing the friction that gets in the way of exploration.</p><p>Auto-positioning labels is one of those small features that quietly makes exploration easier.</p><p>And when exploration becomes easier, new insights often follow.</p><p>That is why we believe this feature helps make Exploratory a better environment for true Exploratory Data Analysis.</p><h3>Try Auto-Positioning Today</h3><p>You can start using the Auto-Positioning feature today in the latest version of Exploratory.</p><p>👉 Download Exploratory v14</p><p><a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an account yet, sign up here to start your 30-day free trial.</p><p><a href="https://exploratory.io/">https://exploratory.io/</a></p><p>If your trial has expired but you’d like to try the new features, simply launch the latest version and use the Extend Trial option.</p><p>If you have questions or feedback, feel free to contact me at <a href="mailto:kan@exploratory.io">kan@exploratory.io</a>.</p><p>We’d love to hear how you’re using Exploratory to uncover insights in your data.</p><p>Kan Nishida</p><p>CEO, Exploratory</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=3bbd1f1a0ed8" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/auto-positioning-labels-keep-your-charts-readable-automatically-3bbd1f1a0ed8">Auto-Positioning Labels: Keep Your Charts Readable Automatically</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AI Note Editor: Create Reports 10x Faster, 10x Better]]></title>
            <link>https://blog.exploratory.io/ai-note-editor-create-reports-10x-faster-10x-better-8249816ffe48?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/8249816ffe48</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[data-scien]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Mon, 16 Mar 2026 11:56:24 GMT</pubDate>
            <atom:updated>2026-03-16T11:56:26.211Z</atom:updated>
            <content:encoded><![CDATA[<p>We’re thrilled to introduce <strong>AI Note Editor</strong> in Exploratory v14! 🎉</p><p>This new feature works like having a professional editor — and a data analyst — right beside you as you write. Except it never gets tired, always responds fast, and can analyze your charts with deep statistical knowledge.</p><p>It’s designed to <strong>help you create high-quality analysis reports quickly, clearly, and effortlessly.</strong></p><h3>Why Reporting Is the Most Underrated (and Most Painful) Step in Data Analysis</h3><p>In any real-world data analysis workflow, running the analysis is only half the job. The <em>other</em> half — often the harder half — is explaining what you discovered.</p><p>But let’s be honest: most of us <em>don’t</em> enjoy writing reports. Sometimes, we aren’t sure <strong>how to describe what the charts are showing</strong>. So, we end up sending a Slack message with “here’s the chart” or copying &amp; pasting charts into PowerPoint/Slides and hoping the audience magically understands them.</p><p>As a result:</p><ul><li>Screenshots pile up in Slack threads.</li><li>PowerPoint slides become chart image dumps.</li><li>Insights get lost because they’re never clearly communicated.</li></ul><p>This is the communication problem in the data science workflow that we wanted to solve with the new AI Note Editor.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5kGLUn6-k_cub-SILa-o5Q.png" /></figure><h3>Meet AI Note Editor: Turn Comments &amp; Charts Into a Clear, Polished Report</h3><p>With AI Note Editor, your analysis report writing workflow becomes something like this:</p><ol><li>Add your charts.</li><li>Write a few comments, if you like.</li><li>Get a complete data analysis report generated.</li><li>Edit it to fit your needs.</li><li>Share your report with others!</li></ol><p>Based on your comments and charts, AI Note Editor generates a <strong>polished, structured, ready-to-share 
report</strong> for you, right inside Exploratory.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WUoySat756_1AibQWIaobA.png" /></figure><p>No switching tools.</p><p>No copy &amp; paste to PowerPoint.</p><p>No asking somebody else to create reports.</p><p>No staring at a blinking cursor trying to think of the right words.</p><p>Your data analysis stays where it belongs — inside the Exploratory workflow — and AI handles the writing.</p><h3>Automatic Chart Interpretation</h3><p>AI Note Editor doesn’t just improve your writing; it can also <strong>interpret your charts</strong> and explain what’s happening in the data. This is something that no general-purpose writing tool can do.</p><p>Give it a chart and the AI will:</p><ul><li>Identify key patterns</li><li>Explain strengths and weaknesses</li><li>Describe trends, anomalies, or outliers</li><li>Summarize relationships between variables</li><li>Highlight important signals</li><li>Put everything into intuitive, natural language</li></ul><p>For example, given a radar chart like the one below, AI will describe the key patterns, strengths, weaknesses, and overall story behind the data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZUDUoBlwjresAk7ziyV8-w.png" /></figure><h4>Detect Trends &amp; Signals</h4><p>Chart interpretation goes beyond reading numerical values; it also provides context-aware interpretations, such as trends and signals within the data.</p><p>For example, with XmR charts (control charts), it can detect whether a signal is present and explain what it indicates accordingly.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OhfeHGHZoHT1fSdS3e2-1w.png" /></figure><h4>AI Note Editor Comes with Deep Statistical Knowledge</h4><p>Moreover, for charts containing statistical information, the AI Note Editor provides interpretations based on those statistics.</p><p>For example, for a scatter plot showing the relationship between two 
variables, it can explain how strong (or weak) the correlation is, what the relationship implies, and whether it is statistically significant or not.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Ph3oxT0qeSRo3eKwOHgMBQ.png" /></figure><h4>AI Note Editor Can Analyze Data and Explain What’s There</h4><p>You’ve probably experienced something like this before:</p><ul><li>“I know what I see in the chart… but how do I put it into words?”</li><li>“I think this trend is important, but I’m not confident how to explain it.”</li><li>“I looked at the chart, but I didn’t notice the pattern until someone pointed it out.”</li></ul><p>AI Note Editor can address such concerns by:</p><ul><li>Helping you verbalize what the data shows</li><li>Pointing out things you might have missed</li><li>Ensuring your analysis is complete</li><li>Reducing the risk of overlooking important signals</li><li>Improving the clarity and quality of your reports — instantly</li></ul><p>AI Note Editor is not just about writing faster. 
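To make the scatter-plot case concrete, the statistic behind such an interpretation can be sketched in JavaScript: the Pearson correlation between the two variables, plus the t-statistic used to judge significance. This is an illustrative calculation, not Exploratory’s internal code, and the strength thresholds (0.3, 0.7) are common rules of thumb, not Exploratory’s settings.

```javascript
// Pearson correlation and its t-statistic for a scatter plot (illustrative).

function pearson(xs, ys) {
  const n = xs.length;
  const mean = a => a.reduce((s, v) => s + v, 0) / a.length;
  const mx = mean(xs), my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - mx) * (ys[i] - my);
    sxx += (xs[i] - mx) ** 2;
    syy += (ys[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

function correlationReport(xs, ys) {
  const n = xs.length;
  const r = pearson(xs, ys);
  // t = r * sqrt((n - 2) / (1 - r^2)); compare |t| against the t-distribution
  // with n - 2 degrees of freedom to obtain a p-value.
  const t = r * Math.sqrt((n - 2) / (1 - r * r));
  const strength = Math.abs(r) > 0.7 ? "strong" : Math.abs(r) > 0.3 ? "moderate" : "weak";
  return { r, t, summary: `${strength} ${r >= 0 ? "positive" : "negative"} correlation` };
}
```

For example, `correlationReport([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])` reports r ≈ 0.775 and t ≈ 2.12, a strong positive correlation whose significance would then be read off the t-distribution with n - 2 degrees of freedom.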
It’s about <strong>analyzing data better</strong> and <strong>communicating more clearly</strong>.</p><h3>It Can Help You Write Better</h3><p>AI Note Editor also includes a set of tools to improve your writing:</p><ul><li>Summarize long text</li><li>Fix grammar and spelling</li><li>Improve clarity and tone</li><li>Refine wording and expression</li><li>Translate your writing</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CbF-TnbIT3vk9iNp2a0S_A.png" /></figure><p>Whether you’re writing an internal update, a weekly KPI brief, or a full analysis report, AI Note Editor makes the report writing process faster and better.</p><h3>Create Your Own Report Format with Custom Prompts</h3><p>Exploratory provides an “Analysis Report” style out of the box — but let’s be honest, <strong>not everyone writes reports the same way.</strong></p><p>You have your own reporting needs and might need to write in different formats or styles to fit your clients’ or audience’s needs from project to project.</p><p>That’s why AI Note Editor includes <strong>Custom Prompts</strong> support.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZhxkebrAIf3BXzrml-eh-w.png" /></figure><p>Just use ‘<strong>Run Custom Prompt</strong>’ and describe the format you want in Markdown, specifying a structure with:</p><ul><li>Headings and subheadings</li><li>Bullet points</li><li>Summary sections</li><li>Business-style executive summaries</li><li>Step-by-step analysis</li><li>Narrative storytelling</li><li>Even templates that match your corporate writing guidelines</li></ul><p>Just tell the AI the structure you want — and it will generate the full report in that format.</p><h3>Save &amp; Reuse Prompt Templates</h3><p>Once you create a prompt you like, you can save it as a <strong>template</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mvJid8LtKTTDu43hyH1pMQ.png" /></figure><p>Your templates appear in the Template list, 
ready to use anytime.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QZdBW9EtvGYSA6xAjE_IKg.png" /></figure><p>This is incredibly useful for reports you produce regularly — weekly reports, monthly summaries, recurring analyses, etc. Just pick a template and generate a polished report in seconds — consistent, clean, and always in the right style.</p><h3>A Gallery of Prompt Examples to Get You Started</h3><p>To help you get started, we have prepared a collection of prompt examples in the <a href="https://exploratory.io/tag/?sort=&amp;language=ja&amp;q=tag%3A%22Ai%20Note%20Editor%22&amp;searchType=keyword">AI Note Editor Gallery</a>. You can browse through the prompts, copy &amp; paste them, and tweak them however you like.</p><p>You can create sophisticated report formats immediately — even if you’ve never written a prompt before.</p><h3>Share Templates with Your Team</h3><p>AI Note Editor templates can also be <strong>shared across teams</strong>.</p><p>Your team members can directly import them by clicking the “Import” button in the “AI Prompt Template” dialog mentioned above.</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*MDVv4nYpEzl8fOlolhcvMA.png" /></figure><p>This means:</p><ul><li>Standardized reporting</li><li>Consistent communication style</li><li>Faster onboarding for new members</li><li>Reduced back-and-forth editing</li><li>Higher-quality reports across the organization</li></ul><h3>A New Standard for Data Communication</h3><p>AI Note Editor isn’t just a writing tool — it’s a communication tool with deep statistical knowledge. 
It helps you:</p><ul><li>Turn analysis into narrative</li><li>Turn charts into discoveries and insights</li><li>Turn discoveries into stories</li><li>Turn insights into action</li></ul><p>No more copy &amp; paste dumps in PowerPoint slides.</p><p>No more staring at charts wondering how to describe what you see.</p><p>No more wasting time wondering what to write and how to explain it.</p><p>AI Note Editor gives you clarity.</p><p>Your audience gets a deeper understanding of your discoveries.</p><p>And your analysis becomes dramatically more impactful.</p><p>This is what the future of data communication looks like — and it lives directly inside Exploratory.</p><h3>Try AI Note Editor Today!</h3><p>You can start using AI Note Editor right now in the latest version of Exploratory.</p><p>👉 <strong>Download Exploratory v14</strong><br> <a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an Exploratory account yet, please <a href="https://exploratory.io/">sign up here</a> to try it out. 
The first 30 days are a free trial period!</p><p>If your trial has already expired but you want to try the new AI features, simply launch the latest version and use the “Extend Trial” option in the dialog — or contact us directly.</p><p>For questions or feedback, feel free to reach out:<br> 📧 <strong>support@exploratory.io</strong></p><p>We can’t wait to see the reports you create — and how AI Note Editor helps you communicate insights faster, clearer, and more confidently than ever before.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8249816ffe48" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/ai-note-editor-create-reports-10x-faster-10x-better-8249816ffe48">AI Note Editor: Create Reports 10x Faster, 10x Better</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Turn GitHub Issues into Release Notes with AI]]></title>
            <link>https://blog.exploratory.io/turn-github-issues-into-release-notes-with-ai-8857dc7f1f27?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/8857dc7f1f27</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[ai]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Mon, 16 Mar 2026 11:43:31 GMT</pubDate>
            <atom:updated>2026-03-16T12:01:54.398Z</atom:updated>
            <content:encoded><![CDATA[<h4>How I Use Exploratory’s AI Function to Automatically Categorize, Rewrite, and Generate Release Notes</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zi3ttrkn75iE1EYPw7gVhw.png" /></figure><p>Preparing a release note used to take a few hours every release. With AI Function, it now takes just a few minutes.</p><p>Every time we ship a new version, we go through dozens of GitHub issues — bug fixes, enhancements, and new features — and turn them into a release note that our users can easily understand.</p><p>This used to be a very manual process.</p><p>For each issue we had to:</p><ol><li>Read the issue title and description</li><li>Decide whether it is a bug fix, enhancement, or documentation change</li><li>Assign it to a functional category (Data Wrangling, Chart, Analytics, etc.)</li><li>Rewrite the title so users understand what changed and why it matters</li></ol><p>When there are 50–100 issues per release, this quickly becomes tedious.</p><p>Today, we automate most of this workflow using AI Function and AI Note Editor inside Exploratory.</p><p>In this post, I’ll show you exactly how we do it.</p><p>My hope is that this gives you ideas for how you can use AI Function to automate your own text-based workflows.</p><h3>What is AI Function?</h3><p>AI Function in Exploratory lets you create a function using a prompt so that an AI (LLM) processes each row of your data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*wUFP3AAbAeKx1clP.png" /></figure><p>You can ask AI to:</p><ul><li>analyze text</li><li>classify information</li><li>summarize content</li><li>generate new text</li><li>transform messy data</li></ul><p>all directly inside your data workflow.</p><p>For example, if you have a column containing customer feedback comments, you can simply write a prompt like:</p><blockquote>Classify the text into several groups.</blockquote><p>AI will analyze each row and assign a 
category.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-J-Gk6nmRGGnFFnlNOxBYw.png" /></figure><p>You can learn more about AI Function here.</p><ul><li><a href="https://blog.exploratory.io/data-science-2-0-a-new-era-of-text-data-analysis-b6430baadba7">https://blog.exploratory.io/data-science-2-0-a-new-era-of-text-data-analysis-b6430baadba7</a></li></ul><p>I personally use AI Function almost every day.</p><p>When we built this feature, I realized something interesting:</p><blockquote><em>There was far more text data in my daily work than I had ever noticed before.</em></blockquote><p>Issue logs.</p><p>Customer feedback.</p><p>Meeting notes.</p><p>Support conversations.</p><p>Task descriptions.</p><p>These are all valuable data sources, but without the right tools they’re difficult to analyze or automate.</p><p>AI Function changes that.</p><h3>Example: Automating Our Release Notes</h3><p>Let me show you a real example from our workflow.</p><p>We manage all development work in GitHub Issues.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*13wKRqHKF41cAQl5G30g6g.png" /></figure><p>These issues include:</p><ul><li>bug reports</li><li>feature requests</li><li>internal development tasks</li><li>documentation updates</li></ul><p>When we release a new version, we publish the closed issues for that milestone as the release note.</p><p>But raw GitHub issue titles are not written for users — they’re written for developers.</p><p>So we need to transform them.</p><p>Specifically, we need to:</p><ol><li>Categorize issues by product area</li><li>Identify whether they are bug fixes or enhancements</li><li>Rewrite the title so users clearly understand the change</li></ol><p>Before AI Function, I did this manually.</p><p>Now it’s automated.</p><h4>Step 1: Import GitHub Issues</h4><p>First, we import GitHub issues directly into Exploratory.</p><p>For example, we can import issues for the milestone v14.5.</p><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/1*ErhMO9DYUfwzfFYWpDudcw.png" /></figure><p>Once imported, the dataset contains columns like:</p><ul><li>Issue title</li><li>Issue body</li><li>Labels</li><li>Status</li><li>Milestone</li></ul><p>Take a look at this how-to note for details.</p><ul><li><a href="https://exploratory.io/note/exploratory/How-to-Install-Github-Issue-Data-uGz1DrV2">How to Import Github Issue Data</a></li></ul><p>Now that the data is imported, it’s time to work with AI Function.</p><h4>Step 2: Categorize Issues by Product Area</h4><p>First, we categorize each issue into a functional area.</p><p>For this, I create an AI Function with a prompt like this:</p><pre>Based on the title and the body text, categorize text to one of the following groups.</pre><pre>AI Function<br>AI Prompt<br>Summary View<br>Table View<br>Data Source<br>Data Wrangling<br>Chart<br>Analytics<br>Note<br>Dashboard<br>Parameter<br>Project<br>Publish<br>Install<br>Document<br>Others</pre><p>Instead of letting AI invent categories, I give it a predefined list.</p><p>This improves consistency.</p><p>I also provide two columns as input:</p><ul><li>Title</li><li>Body</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JwFPIYrrganFeodUIsyEbA.png" /></figure><p>The title text and the body text look like this on the actual GitHub issue page.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WNs1bgTFhr-FHIUdj-1r3Q.png" /></figure><p>This gives the model enough context.</p><p>When executed, the AI analyzes the issue text and assigns an appropriate category for each row.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NNnb7mqyUudAsW2nsg9dhA.png" /></figure><h4>Step 3: Identify Issue Type</h4><p>Next, we classify whether the issue is:</p><ul><li>a bug fix</li><li>an enhancement</li><li>documentation</li><li>other</li></ul><p>The prompt is simple:</p><blockquote>Identify if a given sentence indicates whether it is an issue fix, a product enhancement, documents, or others.</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YR9CFqSiiC9dQ6jwVbCLcQ.png" /></figure><p>This correctly categorizes them into the right groups.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BVn8iI2346LaqxaSc0yinA.png" /></figure><h4>Step 4: Rewrite the Issue Title</h4><p>The final step is improving the issue titles.</p><p>GitHub issue titles are often short and technical.</p><p>For example:</p><blockquote>AI Function: Don’t run the cached step when duplicating the data frame</blockquote><p>That’s clear to developers, but not to most users.</p><p>We already built an internal system to improve and clean up the issue titles and generate the Release Note, but we realized that we can simply use AI Function to write a better title for each issue based on the title and the body text.</p><p>So I use AI Function again to generate a better description.</p><p>Prompt:</p><blockquote>Based on the ‘title’ and ‘body’ text, write a one or two sentence summary that clearly explains what the issue is and why it matters to users.</blockquote><p>In the AI Function dialog, I set the ‘title’ and ‘body’ columns as the Target Columns.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3LRjlRyM0rCdXHX58vByIw.png" /></figure><p>The result:</p><pre>Original: </pre><pre>AI Function: Don&#39;t run the cached step when duplicating the data frame</pre><pre>New:</pre><pre>Duplicating a data frame currently triggers re-execution of AI Function steps, which can cause unnecessary processing time and API costs.</pre><p>Much clearer.</p><p>And this happens for every issue automatically.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3xRb6EdX4a5W7dE_QB_1Cw.png" /></figure><p>That’s all!</p><p>Now that I have categorized 
the issues and improved the text, I can show them in a table format.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ojP3Gi0z3KsrW-NrOBRwrA.png" /></figure><h4>From Raw Issues to Clean Release Notes</h4><p>After these steps, we now have structured information:</p><ul><li>Issue category</li><li>Issue type</li><li>Improved description</li></ul><p>This makes it easy to organize them into a release note.</p><h3>Next Step: Generate Release Note with AI Note Editor</h3><p>Once the data is prepared, we take it one step further.</p><p>We use AI Note Editor inside Exploratory to generate the release note itself.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*o5InimZ9fxVKdpN5IFcAmA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JUpW2nTC-369t7O61s7F7w.png" /></figure><p>Check out this introductory blog post for AI Note Editor.</p><ul><li><a href="https://blog.exploratory.io/ai-note-editor-create-reports-10x-faster-10x-better-8249816ffe48">AI Note Editor: Create Reports 10x Faster, 10x Better</a></li></ul><p>I’ll write another blog post explaining exactly how we do this, so stay tuned!</p><h3>The Real Power: Reproducibility</h3><p>The biggest benefit of this workflow is <strong>reproducibility</strong>.</p><p>Once the AI Function steps are created, we can reuse them.</p><p>What if new issues were added at the last minute?</p><p>You can:</p><ol><li>Re-import issues for the new milestone</li><li>Run the workflow</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*o9PUnOE24MDLnHRmmX1A9Q.png" /></figure><p>What if we need to generate a release note for another version in the future?</p><ol><li>Click on the Data Source step and open the Import dialog</li><li>Update the milestone</li><li>Run the same workflow</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qZbM0sBB5TEgEENMqiM8LQ.png" /></figure><p>We release new versions often, so being 
able to repeat the same workflow automatically is important for us.</p><p>For any future release, I can simply click a button to re-import the data, and the AI Functions automatically process the new issues and produce clean, user-friendly descriptions.</p><p>No manual rewriting required.</p><p>This is the real power of combining:</p><ul><li>data workflows</li><li>AI functions</li><li>reproducibility</li></ul><p>Together, these form an automated data wrangling system that is reproducible and can be applied to any future incoming data.</p><h3>Why Not Just Copy and Paste the Issues into ChatGPT?</h3><p>At this point, you might be thinking:</p><blockquote><em>“Couldn’t I just copy the list of issues into ChatGPT and ask it to do the same thing?”</em></blockquote><p>Yes, you certainly could.</p><p>But once you try to use that approach in a real workflow, several practical limitations quickly appear.</p><p>What makes AI Function inside Exploratory powerful is not just that it uses AI. It’s that AI becomes part of a data workflow.</p><p>Here are some key differences.</p><h4>1. Iterative Workflow Development</h4><p>When working with AI Function, you can build your workflow step by step.</p><p>For example:</p><ol><li>Start with issue categorization</li><li>Review the results</li><li>Improve the prompt</li><li>Run it again</li><li>Add another AI Function step for issue type classification</li><li>Add another step to rewrite titles</li></ol><p>Each step produces a visible column of results, which makes it easy to:</p><ul><li>review</li><li>refine prompts</li><li>improve output quality</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nw4I4WtoYAGqDTsnTfLIxQ.png" /></figure><p>This iterative workflow is much harder to do when you are simply pasting text into an AI chat interface.</p><h4>2. 
Control Over the Output</h4><p>When AI is part of your data workflow, you have much more control over the result.</p><p>For example, you can:</p><ul><li>specify allowed categories</li><li>combine multiple columns as context</li><li>inspect the output row by row</li><li>filter and fix problematic cases</li></ul><p>This is important if you care about quality and consistency, not just quick answers.</p><h4>3. Reproducibility</h4><p>This is perhaps the most important difference.</p><p>Once you build the workflow, it becomes reproducible.</p><p>Every time you import a new set of issues, you can run the exact same steps and get consistent results.</p><p>With copy-paste AI workflows, you would have to:</p><ul><li>prepare the text</li><li>paste it into the AI</li><li>rewrite prompts</li><li>manually format the results</li></ul><p>every single time.</p><p>AI Function turns that process into something you can run repeatedly with one click.</p><h4>4. AI Works Directly on Your Data</h4><p>Instead of copying and pasting text back and forth, AI Function works directly on the data frame in Exploratory.</p><p>This means you can combine AI with other data operations, such as:</p><ul><li>filtering rows</li><li>joining datasets</li><li>grouping and summarizing results</li><li>visualizing patterns</li></ul><p>AI becomes just another data transformation step inside the workflow.</p><h4>5. Scales to Large Datasets</h4><p>Chat interfaces are great for small tasks, but they quickly become cumbersome when working with hundreds or thousands of rows of data.</p><p>AI Function processes your data row by row, allowing you to apply the same logic consistently across the entire dataset.</p><h4>6. 
Part of a Larger Data System</h4><p>Finally, AI Function integrates with the rest of Exploratory’s capabilities:</p><ul><li>data wrangling</li><li>visualization</li><li>machine learning</li><li>dashboards</li><li>notes</li></ul><p>In our case, the output of AI Function becomes the input for AI Note Editor, which then generates the release notes themselves.</p><p>This creates a complete pipeline:</p><p>GitHub Issues → AI Function → Enriched Data → AI Note Editor → Release Note</p><h4>The Real Difference</h4><p>Using AI in a chat interface is great for one-off tasks.</p><p>But AI Function allows you to turn those tasks into reusable data workflows.</p><p>And once that happens, something interesting occurs:</p><blockquote><em>AI stops being a tool you occasionally use, and becomes part of your everyday data process.</em></blockquote><p>That’s the real power of AI Function inside Exploratory.</p><h3>Key Takeaways: Why AI Function Is a Game Changer</h3><p>This example is not really about release notes.</p><p>It’s about something much bigger.</p><p>Many everyday workflows involve unstructured text data:</p><ul><li>GitHub issues</li><li>customer feedback</li><li>support tickets</li><li>meeting notes</li><li>task descriptions</li><li>survey comments</li></ul><p>Traditionally, these workflows required manual reading and interpretation, which made them difficult to automate.</p><p>That’s why many teams simply accept them as manual work.</p><p>But AI Function changes this.</p><p>Instead of writing complicated scripts or building custom AI pipelines, you can simply describe the task in plain language and apply it directly to your data.</p><p>In the release note example above, AI Function helped automate tasks that previously required manual effort:</p><ul><li>Categorizing issues by product area</li><li>Identifying whether they are bug fixes or enhancements</li><li>Rewriting issue titles into user-friendly descriptions</li></ul><p>Once the workflow is built, it becomes reusable 
and reproducible.</p><p>Every new release can go through the exact same process automatically.</p><p>What used to be a repetitive manual task becomes a data workflow you run with one click.</p><p>And this idea applies far beyond release notes.</p><p>Anywhere you have rows of text data, AI Function can help you:</p><ul><li>classify</li><li>summarize</li><li>transform</li><li>generate structured information</li></ul><p>directly inside your data workflow.</p><p>This is why we built AI Function inside Exploratory.</p><p>Not to replace human thinking, but to remove the repetitive parts of working with text, so you can focus on the insights and decisions that matter.</p><p>AI Function turns messy text data into something you can actually work with, and once that happens, many workflows that used to be manual suddenly become automated.</p><h3>Try AI Function with Your Own Data</h3><p>If you work with text data such as:</p><ul><li>issue logs</li><li>support tickets</li><li>customer feedback</li><li>meeting notes</li><li>task lists</li></ul><p>You can build similar workflows with AI Function.</p><p>If you’re new to AI Function, I recommend starting with the examples available in the Create AI Function menu.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TdwSFHL_kqKANImJSpVmBQ.png" /></figure><h3>Try AI Functions Today!</h3><p>You can start using AI Functions today in the latest version of Exploratory.</p><p>👉 Download Exploratory v14</p><p><a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an Exploratory account yet, please <a href="https://exploratory.io/">sign up here</a> to try it out. 
The first 30 days are a free trial period!</p><p>If your trial has already expired but you want to try the new AI features, simply launch the latest version and use the “Extend Trial” option in the dialog, or contact us (support@exploratory.io) directly.</p><p>If you have any questions or feedback, please contact me at <a href="mailto:kan@exploratory.io">kan@exploratory.io</a></p><p>We’d love to hear what data wrangling system you build with AI Functions!</p><p>Kan</p><p>CEO/Exploratory</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8857dc7f1f27" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/turn-github-issues-into-release-notes-with-ai-8857dc7f1f27">Turn GitHub Issues into Release Notes with AI</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Data Science 2.0: A New Era of Text Data Analysis]]></title>
            <link>https://blog.exploratory.io/data-science-2-0-a-new-era-of-text-data-analysis-b6430baadba7?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/b6430baadba7</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[ai]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Tue, 02 Dec 2025 01:11:55 GMT</pubDate>
            <atom:updated>2025-12-02T01:11:53.710Z</atom:updated>
            <content:encoded><![CDATA[<h4>How Generative AI and Prompts Are Transforming Data Science</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*19xGZatnAwdZH1Lkhmm0zg.png" /></figure><p>Since ChatGPT arrived in late 2022, generative AI has reshaped nearly every industry — and <strong>data science was no exception</strong>.</p><p>Suddenly, we could ask a machine to summarize an article, classify customer feedback, translate text, or even reason about documents it had never seen before.</p><p>It was clear: we were entering a different world.</p><p>But to understand why this shift is so significant, let’s step back and look at how data science has evolved over the years.</p><h3>Before Generative AI: What Is Data Science, Anyway?</h3><p>Let’s first clarify what data science is. While there are various opinions depending on who you ask, it can be simply defined as the intersection of the following three fields:</p><ul><li>Statistics</li><li>Machine Learning</li><li>Programming</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eY0CdVsOqyQ1jJQW1jeIzQ.png" /></figure><p>Before “data science” became a buzzword, the people doing this type of work were simply called <strong>statisticians</strong>. I still remember meeting a statistician friend at a conference years ago — only to find his business card now said “Data Scientist.”</p><p>Why?</p><p>Because the discipline had started to shift.</p><h4>The Rise of R, Python, and Open-Source Machine Learning</h4><p>The difference between <strong>data science</strong> and <strong>traditional statistics</strong> was a frequently debated topic at the time, but broadly speaking, it came down to <strong>programming</strong> and <strong>machine learning</strong>.</p><p>With the rapid evolution of R and Python in the open-source community, programming languages became a practical way to do data analysis and processing. 
And anyone who could program gained free access to advanced algorithms and models that were also developed in the open-source community.</p><p>Furthermore, the explosion of data generated by the internet and mobile devices fueled the continuous development of highly complex machine learning and deep learning models that, trained on this big data, delivered better prediction results.</p><p>This was a pivotal moment. Why?</p><p>Because previously, using such algorithms and models required purchasing expensive commercial software (e.g. SAS, SPSS, etc.) or building them from scratch. It was the “democratization of algorithms and models,” if you will.</p><p>And it changed everything.</p><p>The conventional wisdom that data analysis meant using statistical software was dramatically altered by the birth of these <strong>open-source programming languages and machine learning models</strong>. And the intersection of these three areas began to be called data science.</p><p>This was the birth of Data Science.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eY0CdVsOqyQ1jJQW1jeIzQ.png" /></figure><h3>Machine Learning -&gt; Generative AI</h3><p>But things have changed dramatically just over the last few years.</p><p>When ChatGPT (powered by GPT-3.5) was released in 2022, it quickly became clear that it could not only converse in a chat format but also <strong>interpret, summarize, classify, and answer questions about even the latest documents not included in its training data.</strong></p><p>Previously, if you wanted to analyze the sentiment of text data, for example, you would have to collect a large amount of text data, build a machine learning or deep learning model based on that data, tune it to improve accuracy, and then predict sentiment.</p><p>But only a select few with data science knowledge, access to vast amounts of data, and programming skills could do something like this.</p><p>Then, with the emergence of GPT (Generative Pre-trained Transformer) models 
used in generative AI, this necessity disappeared.</p><p>Large Language Models (LLMs) like GPT are massive models already trained on virtually all digitized data worldwide, possessing the necessary capabilities to interpret any text (or image or video) information available. Therefore, even when given text data it has supposedly never seen before, the model can summarize, classify, and score its sentiment.</p><p>While not universally true, the era of building custom models by feeding them vast amounts of text (or image) data has largely ended.</p><p>For relatively smaller datasets typically encountered in business, or for numerical data, custom statistical or machine learning models still tend to offer better predictive accuracy than generative AI models. This is because GPT’s architecture is inherently specialized in recognizing patterns in high-dimensional data specific to text and pixels.</p><p>But, when it comes to text data?</p><p><strong>Generative AI is the new default.</strong></p><h3>Programming -&gt; Prompts</h3><p>Another significant impact of generative AI on data science is related to accessing these models and “<strong>tuning</strong>” them to improve prediction accuracy.</p><p>Previously, this primarily involved writing code in languages like Python to build custom machine learning and deep learning models, and then tuning them by optimizing their parameters to improve predictive accuracy.</p><p>However, with these new AI models, we don’t write code or adjust parameters to improve the accuracy of the results or to get better output.</p><p>Instead, to improve prediction accuracy and obtain better output, we use prompts. And the language we use for the prompt is our everyday language, such as English.</p><p>This is a profound shift.</p><p>You no longer need programming skills to work with state-of-the-art models. 
Anyone can “tune” a world-class AI just by expressing what they want — in English, Japanese, or any language.</p><p>With this shift, machine learning in data science has been replaced by generative AI (GPT), and programming has been replaced by prompts. Combining these with statistics can be considered the new data science.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*19xGZatnAwdZH1Lkhmm0zg.png" /></figure><p>Of course, statistics has also been significantly impacted by the emergence of generative AI. However, statistics itself has not been transformed into something else or eliminated by AI. Rather, the execution and interpretation of statistics have become easier with AI support; at least, that’s where we are now.</p><h3>Challenges with AI Models for Data Analysis</h3><p>But even with these advances, many people quickly discovered that:</p><p><strong>“Using ChatGPT to analyze your own data doesn’t always work.”</strong></p><p>Common frustrations include:</p><ul><li>Inconsistent results halfway through</li><li>Output becomes sloppy or incomplete with many rows</li><li>Hard to verify correctness</li><li>Difficult to apply to large datasets</li><li>Some data requires pre-processing before being fed into AI, and AI doesn’t ‘magically’ take care of that.</li></ul><p>For example, imagine having hundreds or thousands of lines of free-text responses from a survey. If you feed this data to an AI to score the sentiment of each sentence or classify them, some parts might work well, but often, the output becomes inconsistent midway through.</p><p>There’s no guarantee that the AI has actually “reviewed” all the data you provided in the way a human would expect before generating responses for everything. 
Consequently, as the data volume grows, the answers can become unstable, the data may need pre-processing before being fed to the AI, and it can be difficult to judge whether the results returned for hundreds or thousands of rows are correct.</p><p>Furthermore, knowing what kind of prompt to write is a significant challenge for many people.</p><p>In other words:</p><p><strong>LLMs are powerful, but not reliable as a stand-alone data processing engine.</strong></p><p>You need a system that:</p><ul><li>Feeds data row by row</li><li>Ensures no rows are skipped</li><li>Produces consistent results</li><li>Lets you transform and clean data as preprocessing</li><li>Gives you a tool to explore and validate the result</li></ul><p>This is exactly where Exploratory’s ‘AI Function’ comes in.</p><h3>Exploratory v14 Introduces “AI Functions”</h3><p>To overcome these challenges and bring generative AI into real analytics workflows, we built <strong>AI Functions</strong> in Exploratory v14.</p><p>AI Functions let you apply AI — <strong>reliably, row by row</strong> — to your own dataset.</p><p>Just ask AI to analyze, transform, or generate information from your data — <strong>directly inside Exploratory</strong> — simply by typing instructions in plain language.</p><p>For example, if you want to classify customer feedback, just type:</p><p><strong>‘Classify the text into several groups.’</strong></p><p>AI will immediately analyze every comment and assign a category to each one.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fGGiB4uJi3_BGPaOGO3uyg.png" /></figure><p>And this is only the beginning.</p><p>With AI Function, you can create all kinds of custom model functions like:</p><p>“Score the sentiment of the text.”</p><p>“Translate and summarize the text.”</p><p>“Standardize company names.”</p><p>“Using these user attributes, write an email to schedule a meeting with each customer.”</p><p>“Get population for each country presented in 
this data.”</p><p>and much more with AI directly inside Exploratory.</p><p>Of course, to create this function, you don’t need to know which functions to use or how to use them, nor how to code in a programming language. You simply instruct the desired process in a prompt using everyday language. When executed, the AI returns results for each row according to the prompt’s instructions.</p><p>No coding.</p><p>No custom model building.</p><p>No machine learning pipeline.</p><p>All you need is to describe what you want in natural language.</p><p><strong>Exploratory handles the data, AI handles the logic, and you get consistent results for every row of your data.</strong></p><p>This is what <strong>Data Science 2.0</strong> promises.</p><h3>How to Use AI Functions</h3><p>Using AI Function inside Exploratory is straightforward.</p><p>First, select “Create AI Function” from the column header menu.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cdtPBAgyjlD48NrZddcJmA.png" /></figure><p>Then, enter the instruction describing what you want. For example:</p><blockquote><em>Classify the sentences into several groups.</em></blockquote><p>If you already know which categories you want, add them to your prompt for higher accuracy.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wZaaz2iVYQRCqsKXZ8qEuA.png" /></figure><p>Here is an example of classifying text into 9 groups.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VZjjYHQ2Skj4zc6_Q55_1g.png" /></figure><h4>About Data Splitting &amp; Parallel Processing</h4><p>By default, Exploratory splits large datasets (200+ rows) into smaller chunks and processes them in parallel. 
This dramatically improves performance.</p><p>Splitting data:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KTIN8zUIhqk7sd5HIF3bJg.png" /></figure><p>Combining results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*j7UL3wLtobBvYMyoPTeUSg.png" /></figure><p>However, splitting means that each AI request only sees part of the data. This can sometimes lead to slightly different interpretations from chunk to chunk.</p><p>If you prefer the AI to analyze the <strong>entire dataset at once</strong>, you can turn off:</p><p>If you are not happy with the result and rather want the AI to analyze the <strong>entire dataset at once</strong>, you can turn off ‘<strong>Enable parallel processing by splitting data</strong>’ option.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hAHWFxubEnE9_qWtXcw0xQ.png" /></figure><p>This will send the whole data to the AI model at once. The result will be globally consistent, though it will take longer time to process.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9tPKMIehPYFmkRxh17Cl8A.png" /></figure><h3>AI Prompt Templates</h3><p>You can save your prompt instruction as ‘Template’ so that you can use it later.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jtO_RZYrgK5dMXs2N52PkQ.png" /></figure><p>Once you saved it you can select it under ‘Use Template’ for any data you want to run the prompt with.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*T8e4VXuZWM5zP5YMHc0VnQ.png" /></figure><h4>Any Prompt Examples?</h4><p>Yes! In order to get you started quickly, we have prepared many examples. 
Visit <a href="https://exploratory.io/tag/?sort=&amp;language=ja&amp;q=tag%253A%2522Ai%2520Model%2520Function%2522&amp;searchType=keyword">the AI Function Gallery page</a> to see prompt examples and downloadable sample data you can test yourself.</p><h3>Why Exploratory’s AI Functions?</h3><p>When you use AI Functions inside Exploratory, you don’t just “call an AI model”. You combine AI with a complete data analysis environment.</p><p>This creates a workflow that is far more <strong>reliable, flexible, and powerful</strong> than sending raw data to ChatGPT, etc.</p><p>Here are the key advantages:</p><ol><li><strong>Flexible Pre-processing Before AI — </strong>Clean, filter, reshape, or join data before sending it to AI — ensuring high-quality, consistent outputs.<br>(Here’s a detailed post on <a href="https://blog.exploratory.io/a-new-paradigm-ai-prompt-based-data-wrangling-is-here-e4b43e63b08e">Exploratory’s AI Data Wrangling</a>.)</li><li><strong>Visual Verification You Can Trust — </strong>Use summary views, charts, and comparisons to check results quickly and intuitively. No guessing whether the AI processed everything.</li><li><strong>Seamless Connection to Deeper Analysis</strong> — Once your AI Function outputs are ready, you can immediately continue with Analytics such as Statistical Tests, Multivariate Analysis, Factor Analysis, etc., and visualize the data with various types of charts. Everything stays connected and reproducible.</li><li><strong>Fast Performance with Parallel Processing </strong>— Exploratory automatically splits large datasets and processes them in parallel — so AI stays fast even for thousands of rows.</li><li><strong>Stable, Row-by-Row Accuracy </strong>— Results stay consistent and reliable across your entire dataset. No missing rows. 
No unnoticed failures.</li><li><strong>Reusable Prompts as Templates</strong> — Turn any instruction into a reusable “function template” and apply it again and again — just like a real analytical function, but powered by AI.</li></ol><h3>Data Science 2.0 is here!</h3><p>With AI function, you can transform, enrich, and analyze text data simply by <strong>describing what you want</strong>:</p><ul><li>Sentiment scoring</li><li>Cleaning and standardizing company names</li><li>Detecting a customer’s country from a phone number</li><li>Translating sentences</li><li>Extracting company attributes (industry, size, region) from email domains</li><li>Auto-generating email drafts based on customer attributes</li><li>Classifying feedback into categories</li><li>Summarizing long text fields</li></ul><p>And much more.</p><p>In <strong>Data Science 2.0</strong> — where the triangle is <strong>Statistics × AI × Prompt</strong> — the ability to describe your intent in natural language and let Exploratory execute it reliably is a breakthrough.</p><p>But, generative AI alone isn’t enough. You need a platform that <strong>prepares the data, executes the prompts safely and intelligently, and integrates the results into analytics and visualization.</strong></p><p>This is why we built AI Function with Exploratory v14. And we believe this will make data science more accessible to a wider audience, leading to business improvements and better decision-making.</p><h3>Try AI Functions Today!</h3><p>You can start using AI Functions today in the latest version of Exploratory.</p><p>👉 <strong>Download Exploratory v14</strong><br> <a href="https://exploratory.io/download">https://exploratory.io/download</a></p><p>If you don’t have an Exploratory account yet, please <a href="https://exploratory.io/">sign up here</a> to try it out. 
The first 30 days are a free trial period!</p><p>If your trial has already expired but you want to try the new AI features, simply launch the latest version and use the “Extend Trial” option in the dialog — or contact us directly.</p><p>If you have any questions or feedback, please contact us at kan@exploratory.io</p><p>We’d love to hear what you build with AI Functions — and how Data Science 2.0 transforms your workflow! 🚀</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b6430baadba7" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/data-science-2-0-a-new-era-of-text-data-analysis-b6430baadba7">Data Science 2.0: A New Era of Text Data Analysis</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Exploratory v14 — A New Era of AI-Powered Data Analysis]]></title>
            <link>https://blog.exploratory.io/exploratory-v14-a-new-era-of-ai-powered-data-analysis-04c974023395?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/04c974023395</guid>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Mon, 24 Nov 2025 14:35:00 GMT</pubDate>
            <atom:updated>2025-11-24T16:25:41.093Z</atom:updated>
            <content:encoded><![CDATA[<h3>Exploratory v14 — A New Era of AI-Powered Data Analysis</h3><h4>A new paradigm for text analysis, auto-chart interpretation, and automated reporting.</h4><p>Text analysis that used to take hours now takes minutes.</p><p>Reports that once felt painful now write themselves.</p><p>Charts that you’ve stared at for years now explain their own insights.</p><p>I’m super excited to introduce <strong>Exploratory v14</strong>, our most transformative release ever! 🎉</p><p>This version brings two breakthrough features — <strong>AI Model Functions</strong> and <strong>AI Note Editor</strong> — that make data analysis dramatically easier, faster, and smarter. Whether you’re analyzing &amp; cleaning data, interpreting charts, or writing reports, Exploratory v14 helps you get higher-quality results in a fraction of the time.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*iHvIjo81j7X_tg3zn5UTjg.png" /></figure><h3>1. AI Functions: Create AI Model Functions with Your Own Words</h3><p>You can now ask AI to analyze, transform, or generate information from your data — <strong>directly inside Exploratory</strong> — simply by typing instructions in plain language. We call this new capability <strong>AI Function</strong>.</p><p>For example, if you want to classify customer feedback, just type:</p><p><strong>‘Classify the text into several groups.</strong>’</p><p>AI will immediately analyze every comment and assign a category to each one.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5eKi3_9Zc3nVJLRWwaiJNg.png" /></figure><p>And this is only the beginning. 
With AI Function, you can create all kinds of custom model functions like:</p><p>“Score the sentiment of the text.”</p><p>“Translate and summarize the text.”</p><p>“Standardize company names.”</p><p>“Using these user attributes, write an email to schedule a meeting with each customer.”</p><p>“Get population for each country presented in this data.”</p><p>Just describe what you want, and AI takes care of the rest — no model building, no coding, no training data required.</p><p>If you wanted to do something like this before, you would have had to create your own algorithms or train machine learning models using large text data you collected.</p><p>Not anymore!</p><p>Thanks to generative AI (LLM), anyone can now perform tasks like <strong>text</strong> <strong>classification, sentiment scoring, prediction, translation, and data matching</strong> just by writing prompts in plain English.</p><h3>How to use AI Function?</h3><p>Using AI Function inside Exploratory is straightforward.</p><p>First, select “Create AI Function” from the column header menu.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cdtPBAgyjlD48NrZddcJmA.png" /></figure><p>Then, enter the instruction describing what you want. For example:</p><blockquote><em>Classify the sentences into several groups.</em></blockquote><p>If you already know which categories you want, add them to your prompt for higher accuracy.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wZaaz2iVYQRCqsKXZ8qEuA.png" /></figure><p>Here is an example of classifying text into 9 groups.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VZjjYHQ2Skj4zc6_Q55_1g.png" /></figure><h4>About Data Splitting &amp; Parallel Processing</h4><p>By default, Exploratory splits large datasets (200+ rows) into smaller chunks and processes them in parallel. 
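</p><p>To picture the split-and-combine pattern described above, here is a short illustrative sketch in Python. It is not Exploratory’s actual code; the <code>process_chunk</code> function is a hypothetical stand-in for one AI request covering a chunk of rows:</p>

```python
# Illustrative sketch of split-and-combine parallel processing.
# NOTE: process_chunk() is a hypothetical stand-in for one AI request;
# here it just uppercases each row so the example is runnable.
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # In a real workflow, this would send one chunk of rows to an AI model.
    return [text.upper() for text in chunk]

def split_apply_combine(rows, chunk_size=200):
    # 1. Split the rows into fixed-size chunks.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    # 2. Process the chunks concurrently; map() preserves chunk order.
    with ThreadPoolExecutor() as pool:
        processed = pool.map(process_chunk, chunks)
    # 3. Combine the per-chunk results back into one flat list,
    #    aligned with the original row order.
    return [row for chunk in processed for row in chunk]

out = split_apply_combine(["good", "bad", "okay"], chunk_size=2)
```

<p>Order is preserved when the results are combined, so the output still lines up row for row with the input.</p><p>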
This dramatically improves performance.</p><p>Splitting data:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KTIN8zUIhqk7sd5HIF3bJg.png" /></figure><p>Combining results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*j7UL3wLtobBvYMyoPTeUSg.png" /></figure><p>However, splitting means each AI request only sees part of the data. This can sometimes lead to slightly different interpretations from chunk to chunk.</p><p>If you are not happy with the result and would rather have the AI analyze the <strong>entire dataset at once</strong>, you can turn off the ‘<strong>Enable parallel processing by splitting data</strong>’ option.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hAHWFxubEnE9_qWtXcw0xQ.png" /></figure><p>This will send the whole dataset to the AI model at once. The result will be globally consistent, though it will take longer to process.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9tPKMIehPYFmkRxh17Cl8A.png" /></figure><p>I’ve written more about these trade-offs in a separate post, so feel free to check that out if you want the full details.</p><h3>Want Prompt Examples?</h3><p>Yes! To get you started quickly, we have prepared many examples. Visit <a href="https://exploratory.io/tag/?sort=&amp;language=en&amp;q=tag%3A%22Ai%20Function%22&amp;searchType=keyword">the AI Function Gallery page</a> to see prompt examples and downloadable sample data you can test yourself.</p><p>Anyway, with AI Function, you can do many other things simply by <strong>describing what you want to get from your data in the prompt</strong>, such as:</p><ul><li>Sentiment scoring</li><li>Company name cleanup &amp; standardization</li><li>Detecting a country from a phone number</li><li>Translating text</li><li>Adding company information (industry, size, etc.) 
from email addresses</li><li>Automatically generating email drafts based on user attributes</li></ul><h3>2. AI Note Editor: Create Reports 10x Faster, 10x Better</h3><p>In a typical data analysis workflow, <strong>writing the report</strong> is just as important as running the analysis itself.</p><p>But let’s be honest: most of us <em>don’t</em> enjoy writing.</p><p>So what happens? Screenshots get dropped into Slack… or pasted into PowerPoint… or left forgotten in a folder somewhere.</p><p>To make report writing easier, faster, and more effective, we’re introducing <strong>AI Note Editor</strong> in Exploratory v14. With AI Note Editor, you can turn your charts, comments, and analysis steps into <strong>high-quality reports — generated automatically by AI right inside Exploratory.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5kGLUn6-k_cub-SILa-o5Q.png" /></figure><p>Just add your charts and comments, and AI Note Editor will write a polished, structured report based on the context.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WUoySat756_1AibQWIaobA.png" /></figure><h4>Automatic Chart Interpretation</h4><p>AI Note Editor doesn’t just summarize your text; it can also <strong>interpret your charts</strong> and explain what’s happening in the data.</p><p>For example, given a radar chart like the one below, AI will describe the key patterns, strengths, weaknesses, and overall story behind the data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZUDUoBlwjresAk7ziyV8-w.png" /></figure><p>Even better, chart interpretation works across many chart types:</p><ul><li><strong>Scatter Plots:</strong> evaluates correlation strength &amp; significance</li><li><strong>XmR / Control Charts:</strong> identifies statistical signals</li><li><strong>Time 
Series:</strong> detects trends, change points, and comparison against benchmarks</li></ul><p>You can see detailed examples in <a href="https://exploratory.io/note/kanaugust/fVe5fal7qG">this separate article</a>.</p><p>This “chart interpretation” feature not only saves time for writing reports, but also improves the quality of analysis by helping you catch insights you might have missed or misinterpreted.</p><h4>AI Tools for Better Writing</h4><p>AI Note Editor also includes a set of tools to elevate your writing:</p><ul><li>Summarize long text</li><li>Fix grammar and spelling</li><li>Improve clarity and tone</li><li>Refine wording and expression</li><li>Translate your writing</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CbF-TnbIT3vk9iNp2a0S_A.png" /></figure><p>Whether you’re writing an internal update, a weekly KPI brief, or a full analysis report, AI Note Editor makes the process dramatically easier.</p><h3>Custom Prompts</h3><p>Need a report in a specific structure — like bullet points, executive summaries, or headings?</p><p>Just use ‘<strong>Run Custom Prompt</strong>’ and describe the format you want using prompts (Markdown supported).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZhxkebrAIf3BXzrml-eh-w.png" /></figure><p>You can also save your custom prompts as <strong>templates</strong> for future reuse.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mvJid8LtKTTDu43hyH1pMQ.png" /></figure><p>Once saved, they appear in the <strong>Templates</strong> list:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QZdBW9EtvGYSA6xAjE_IKg.png" /></figure><p>This is incredibly useful for reports you produce regularly — weekly reports, monthly summaries, recurring analyses, etc. 
Just pick a template and generate the report instantly, while maintaining consistent quality every time.</p><h3>Prompt Sample Collection</h3><p>To help you get started, we have prepared a collection of prompt examples in the <a href="https://exploratory.io/tag/?sort=&amp;language=en&amp;q=tag%3A%22Ai%20Note%20Editor%22&amp;searchType=keyword">AI Note Editor Gallery</a>. You can browse through the prompts and copy &amp; paste to start!</p><h3>Other New Features</h3><p>Along with the two major features, Exploratory v14 also includes several powerful enhancements that make your analysis workflow even smoother.</p><p>Here are a few highlights.</p><h4>New UI &amp; Enhancements for Reference Lines</h4><p>Reference Lines just got a major upgrade in v14. We redesigned the UI/UX around them so you can add and manage reference lines more easily — right from the top toolbar using the new <strong>Reference Line</strong> button.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rBdTvF0io9HSaNmmWWMjtw.png" /></figure><p>With this update, the following tasks are now much simpler!</p><p><strong>1. Centralized Management for All X and Y Axes</strong></p><p>When you click the <strong>Lines</strong> button, all reference lines are now displayed in a single dialog. You can edit, reorder, or delete reference lines for both axes in one place.</p><p><strong>2. Displaying Multiple Reference Lines</strong></p><p>You can now add multiple reference lines to the <strong>Y-axis</strong> as well, making it easier to highlight targets, thresholds, or comparison lines directly on your chart.</p><p><strong>3. 
New Types of Reference Lines</strong></p><p>Several calculations previously only available through <em>Window Calculations</em> can now be displayed as reference lines.</p><p>For example:</p><p>Add last year’s <strong>same-month value</strong> as a reference line to compare against current sales.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Wd6hWHc2s3rirBD9.png" /></figure><p>Or create Pareto-style visuals by adding <strong>cumulative totals</strong> or <strong>cumulative percentage</strong> lines alongside your bar charts:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*y67He9_oL0bcuVvI.png" /></figure><p>These enhancements make it easier to bring meaningful context into your charts without additional steps or calculations.</p><h4>Pivot Table: Window Calculations Now Supported in Totals &amp; Subtotals</h4><p>In previous versions of Exploratory, when you used <strong>Window Calculations</strong> (such as <em>Difference from Previous Period</em> or <em>Percentage Difference from Previous Period</em>) in a Pivot Table, the <strong>Totals</strong> and <strong>Subtotals</strong> would still use the original aggregation function (e.g., sum).</p><p>This often resulted in totals that didn’t match the logic of the window calculation being applied.</p><p>With Exploratory v14, <strong>Totals and Subtotals now fully respect your Window Calculations</strong>.</p><p>This means:</p><ul><li>If a column is showing <strong>Difference from Previous Period</strong>, the total column will also calculate <em>the difference of the totals</em>, rather than just summing the raw values.</li><li>If a column is showing <strong>Percentage Difference</strong>, the totals and subtotals will now show <em>percentage differences based on aggregated values</em>, not simple sums.</li></ul><p>For example, in the table below, the far-right Total column now displays the <strong>difference</strong> or <strong>percentage difference</strong> calculated 
from the aggregated totals — giving you a much more accurate and intuitive summary.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*d6kFbPO2MCHllZdn8I0lvA.png" /></figure><p>This enhancement also works for categorical breakdowns.</p><p>For example, if you calculate the <strong>percentage of employees</strong> across three marital statuses for each job type,</p><ul><li>each job type’s <strong>Subtotal</strong> will now correctly show the percentage by marital status.</li><li>the overall <strong>Total</strong> at the bottom will show the percentages by marital status based on all employees.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ktYoRhQxEySrVt1LhEGRew.png" /></figure><p>This update makes Pivot Tables more consistent, more accurate, and far more useful when working with Window Calculations such as period-over-period, ratio, cumulative sum, moving average, etc.</p><h4>Number Chart: Sub-Indicator Color Flip &amp; Custom Label</h4><p>The <strong>Number</strong> chart — widely used for KPIs in dashboards — allows you to display a <strong>sub-indicator</strong> beneath the main value (such as month-over-month change or year-over-year difference). 
Traditionally, this sub-indicator automatically showed:</p><ul><li><strong>Green</strong> for positive changes</li><li><strong>Red</strong> for negative changes</li></ul><p>Now, in Exploratory v14, you can <strong>flip</strong> these colors whenever the meaning of “good” and “bad” is reversed for your metric.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*b6KsUFN642oJLPH7uOVwKg.png" /></figure><p>This is especially useful for KPIs where <em>lower is better</em>, such as:</p><ul><li>Cancellation rate</li><li>Return rate</li><li>Error rate</li><li>Customer complaints</li><li>Downtime</li></ul><p>With the new <strong>Flip Positive/Negative Colors</strong> option:</p><ul><li>An <strong>increase</strong> (e.g., churn went up) → shown in <strong>red</strong></li><li>A <strong>decrease</strong> (e.g., churn went down) → shown in <strong>green</strong></li></ul><p><strong>Custom Label Text for Sub-Metric</strong></p><p>You can also add custom label text to explain what the sub-metric represents.</p><p>With these updates to the Number chart, your dashboards can communicate meaning more clearly without forcing your audience to reinterpret the numbers every time.</p><h3>Experience v14 — And See What’s Now Possible</h3><p>Exploratory v14 brings AI deeper into the analytics workflow than ever before — from transforming data, to interpreting charts, to generating full analysis reports. These new capabilities unlock faster, clearer, and more powerful insights for everyone, regardless of technical background.</p><p>Try the new features. 
Explore what they can do.<br> And let AI remove the friction so you can focus on thinking, discovering, and making better decisions.</p><p>We hope v14 transforms the way you work with data, and we can’t wait to hear what you think.</p><h3>Try Exploratory v14!</h3><p>Please download Exploratory v14 <a href="https://exploratory.io/download">here</a> and try the new <strong>AI Function &amp; AI Note Editor</strong>!</p><p>If you don’t have an Exploratory account yet, please <a href="https://exploratory.io/">sign up here</a> and try it out. The first 30 days are a free trial period!</p><p>If your trial period has already expired but you would like to try this new version, please contact us via the trial extension link that appears in the dialog when you launch the latest version of Exploratory.</p><p>If you have any questions or feedback, please contact us at <a href="mailto:support@exploratory.io">support@exploratory.io</a>!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=04c974023395" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/exploratory-v14-a-new-era-of-ai-powered-data-analysis-04c974023395">Exploratory v14 — A New Era of AI-Powered Data Analysis</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[🚀 Exploratory 13.7 Released!]]></title>
            <link>https://blog.exploratory.io/exploratory-13-7-released-a9bbb1135eaf?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/a9bbb1135eaf</guid>
            <category><![CDATA[business-intelligence]]></category>
            <category><![CDATA[announcements]]></category>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Sat, 16 Aug 2025 04:12:38 GMT</pubDate>
            <atom:updated>2025-08-16T04:11:23.263Z</atom:updated>
<content:encoded><![CDATA[<p>We’re excited to announce the release of <strong>Exploratory v13.7</strong>!</p><p>Although this is a patch release, just like our recent updates, it comes with impactful enhancements that will make your experience faster and more flexible. Here’s what’s new.</p><h4>⚡ Dashboard Performance Improvements</h4><p>We’ve made <strong>dashboard interactions noticeably faster</strong> — both in Exploratory Desktop and on Exploratory Server.</p><p><strong>In Exploratory Desktop</strong></p><ul><li><strong>20–30% faster end-to-end chart rendering</strong> thanks to optimized rendering logic.</li><li>When “<strong>Update Other Pages</strong>” is turned on, charts on the current page now refresh immediately without waiting for other pages to update. When you switch to another page, its charts will already be ready.</li></ul><p>You can click on charts to filter the data in the Dashboard.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hWwnIysGbF9qxtWl8SJikA.png" /></figure><p>Or you can use a Parameter to filter the data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZUDXhPPlznXn8Cp6iE9dbA.png" /></figure><p>💡 <strong>Pro Tip:</strong> Enable <strong>data cache</strong> on data wrangling steps used by your dashboard. This loads pre-processed data directly into the server workspace instead of re-running queries or wrangling steps — cutting launch time even more.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NApEN2TwYYHkhSga1Il1xQ.png" /></figure><h4>Dashboard at Exploratory Server</h4><p>Entering <strong>Interactive Mode</strong> is now faster, so you can start exploring your dashboard data with less waiting.</p><p>As you may know, you can publish your Dashboard to Exploratory Server so that you can share it with others. 
Once published, you or the viewer users you have shared it with can interact with the Dashboard.</p><p>But before you can interact with it, the Dashboard needs to launch Interactive Mode, which creates a working area on the server where the underlying R data processing runs (along with SQL queries, if used).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZZmyQFVX_i_paden2u_16g.png" /></figure><h4>Pivot Table — Custom Aggregate Functions for Totals/Subtotals</h4><p>You can now <strong>override the aggregate function</strong> used for Grand Totals and Subtotals in Pivot Tables and Summarize Tables.</p><p>By default, it uses the same function that you select for the values. Let’s say you select the ‘Sum’ function for the Sales column in the Value field; the same ‘Sum’ function will then be used to calculate the Total and Subtotal.</p><p>But now, you can override it.</p><p><strong>Example 1: Want to use Sum for each group’s sales values but Mean for the Grand Total.</strong></p><p>In such cases, you can select ‘Mean’ in the Format dialog.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-CtumQdvChh2ijIFC_7NMA.png" /></figure><p><strong>Example 2: Hide Totals/Subtotals since they don’t make sense for my analysis (e.g., % change month-over-month).</strong></p><p>Here’s an example of calculating the % difference from the previous value so that we can see the percentage of sales growth for each month. 
In this case, showing the Total and the Subtotal doesn’t make sense.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4RsfP6HxPDw1Q16_OagJPA.png" /></figure><h4>Filter — First Day of N Months Ago &amp; Last Day of N Months Later</h4><p>We’ve added more <strong>relative date filtering flexibility</strong>:</p><ul><li><strong>First Day of N Months Ago</strong></li><li><strong>Last Day of N Months Later</strong></li></ul><p>Perfect for scenarios like showing data from “the start of 3 months ago to the end of 3 months later.”</p><p>You can now select such options from the Filter’s dropdown.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*746tifV-tCZM12ZjaEsOnQ.png" /></figure><p>We have also added the ‘1st Day of Month’ and ‘Last Day of Month’ options under the ‘Relative Date Filter’ operator.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TcHoSp_WFR2Jlmf3hNbnRQ.png" /></figure><p>You can check out more details for the newly added Date filter options <a href="https://exploratory.io/note/exploratory/How-to-Specify-Periods-Using-Filters-TLs2lze5">here</a>.</p><h3>✨ Other Enhancements and Fixes</h3><p>As always, we’ve included additional improvements and bug fixes.<br>🔗 <a href="https://exploratory.io/release-notes">Release Notes — Exploratory v13.7</a></p><h3>📥 Upgrade Now to Exploratory v13.7</h3><p><a href="https://exploratory.io/download">Exploratory v13.7</a> is available now — get the latest version today!</p><p><a href="https://exploratory.io/download">Download</a></p><p>If you haven’t signed up yet, <a href="https://exploratory.io/">start your free 30-day trial</a>. 
If your trial has expired, you can request an extension directly inside the app.</p><p>If you have any questions or feedback, feel free to reach out to me at kan@exploratory.io — we always love hearing from you!</p><p>Cheers,<br>Kan</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a9bbb1135eaf" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/exploratory-13-7-released-a9bbb1135eaf">🚀 Exploratory 13.7 Released!</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Unlocking Insights from Open-Ended Survey Responses with AI-Powered Text Analysis in Exploratory]]></title>
            <link>https://blog.exploratory.io/unlocking-insights-from-open-ended-survey-responses-with-ai-powered-text-analysis-in-exploratory-7c134cdf9f3a?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/7c134cdf9f3a</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[text-analytics]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Wed, 30 Jul 2025 11:30:24 GMT</pubDate>
            <atom:updated>2025-07-30T11:30:07.395Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QWg3dvC3Uj6ONPnF9UtWgg.png" /></figure><p>Have you ever collected survey responses to open-ended questions like <strong>“Why did you cancel?”</strong> or <strong>“Any suggestions for improvement?”</strong> — only to be left with a pile of text and no clear path to action?</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2epyj-WCJSWiyzEieVyMQQ.png" /></figure><p>With such free-form text data, you can read the responses one by one to understand what your customers or audience are saying — until you have hundreds or thousands of them. At that point, you could still read them one by one if you have the time, or you might ask AI to summarize them for you. But the problem with reading one by one is that it not only takes a long time, but you also lose the big picture formed by the patterns in the feedback.</p><p>And the problem with AI summarizing is that you always wonder, ‘Is that all? Did I miss something important?’</p><p>This is exactly where <strong>Exploratory’s Text Analysis with AI Summary</strong> comes in. It helps you uncover patterns, themes, and actionable insights from free-form text — <em>without drowning in the details.</em></p><h3>Step 1: Text Analysis with Word Count</h3><p>Start by selecting your free-text column and running the <strong>Word Count</strong> analysis.</p><p>You’ll first see a <strong>Word Cloud</strong>, which highlights the most frequently used words.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_3-WHiNVMbUi4AVhoIHcdA.png" /></figure><p>But for a clearer view, we also provide a <strong>Bar Chart</strong> that ranks word frequencies precisely.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-KUjtg0lCz4Jgr7WTD1OvA.png" /></figure><p>Want to go further? 
Check out the <strong>Word Combination</strong> chart, which shows pairs of words that frequently appear together.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*q1HII3wZLl2Wttwwc2sejQ.png" /></figure><h3>Step 2: Discover Hidden Themes with the Co-Occurrence Network</h3><p>Here’s where things get interesting.</p><p>The <strong>Co-Occurrence Network Diagram</strong> maps how words relate to each other across responses. It reveals clusters of words used together, showing both common and subtle themes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*m8yEeYL00g9A0wC9A1on8Q.png" /></figure><p>While the Word Combination chart shows the frequency of only two words at a time, this network diagram shows relationships among many more.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jSg2s-mcy7r3lBWyfaNsgw.png" /></figure><p>For example:</p><ul><li>The word <strong>“time”</strong> (orange cluster) often appears with <em>“presentation,” “management,” “little,”</em> and <em>“much.”</em></li><li>The word <strong>“think”</strong> (green cluster) shows connections to <em>“participants,” “similar,”</em> and <em>“good.”</em></li></ul><p>You can explore the relationships by word color (themes) and line thickness (strength of connection).</p><p>But, I know what you are thinking.</p><ul><li>This seems like too much information — what patterns are we supposed to see here?</li><li>How exactly are those words used together, and in what context?</li></ul><p>To answer these questions, we can get help from AI.</p><h3>Step 3: Let AI Summarize the Patterns for You</h3><p>Click the <strong>AI Summary</strong> button, and AI will analyze the network structure and provide a concise summary for each theme — complete with example comments.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QWg3dvC3Uj6ONPnF9UtWgg.png" /></figure><p>For example, the <strong>orange cluster</strong> is summarized as ‘time allocation and session 
management’, and it shows five example comments such as:</p><ul><li>“I think it would be better to put some restrictions on the <strong>presentation</strong> <strong>time</strong> and the number of <strong>presentation</strong> slides so that <strong>presentation</strong> finish on <strong>time</strong>.”</li><li>There were a <strong>lot</strong> of very rich and exciting talks, but I felt it <strong>little</strong> too <strong>much</strong>. So maybe it would be good to have seminar that only focus on two case studies. Thank you very <strong>much</strong> for this <strong>time</strong>.</li></ul><p>It sounds like the participants in this seminar were not satisfied with the time management or allocation of the talks.</p><p>The cool thing about this is that it tells us something about the groups with less frequent words, which could have been easily missed by just looking at the network diagram.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*f14QeKjJudS2jeh3CPpMOg.png" /></figure><p>For example, there is a group of people talking about a desire for more data, examples, and explanations related to data analysis. Their comments are something like:</p><ul><li>I <strong>want</strong>ed to know the success pattern of <strong>data</strong> <strong>analysis</strong> for each industry.</li><li>I <strong>want</strong> you to distribute the <strong>materials</strong> as <strong>data</strong>.</li><li>I think I <strong>want</strong>ed to <strong>see</strong> more concrete <strong>examples</strong> and demos.</li></ul><p>This is where AI truly shines — it captures not only the loudest voices but also the quieter, insightful ones that often go unnoticed. And it tells us what we should do to prepare for the next seminar.</p><h3>Reality of Text Analysis with AI</h3><p>Since the rise of ChatGPT, I’ve tried using AI to analyze text directly. 
While it can be useful, there are common frustrations:</p><ul><li>Different AI models give <strong>different results </strong>— some find 6 clusters, others only 4. What happened to the other 2?!</li><li>Sometimes it stops analyzing after a certain number of rows.</li><li>So we still have to <strong>go back and verify everything</strong> against the raw data.</li></ul><p>Why? Because large language models are <strong>probabilistic</strong>, not deterministic. They <em>can</em> be helpful — but they’re not always <em>reliable</em>.</p><h3>Why AI Summary in Exploratory Is Different</h3><p>With Exploratory, the AI doesn’t guess.</p><p>We give it the <strong>already-analyzed data</strong> — like word counts, co-occurrence stats, and clusters — so it can summarize based on structured, reliable information.</p><p>That means:</p><ul><li>More <strong>consistent</strong> and <strong>explainable</strong> results</li><li>Better performance on <strong>rare but important themes</strong></li><li>And peace of mind that what you’re seeing is grounded in data</li></ul><p>AI is pretty good at summarizing data using its knowledge of the world. A text analysis result like the co-occurrence network diagram is useful, but it carries too much information for a human to digest — not so for AI.</p><p>Just as we can depend on AI almost 100% today for summarizing documents or articles, we can count on it to accurately summarize something that has already been analyzed.</p><p>As we have seen with AI Summary for Text Analysis, using AI in a narrow scope and in a specific way is the way to go for text analysis today.</p><h3>Try It Yourself</h3><p>Whether you’re preparing for your next customer seminar or trying to make sense of user feedback, <strong>Text Analysis with AI Summary in Exploratory </strong>gives you high-quality insights from your text data. 
It’s fast, intuitive, and surprisingly powerful.</p><p>You can download Exploratory from <a href="https://exploratory.io/download">here</a>.</p><p>If you haven’t signed up yet, <a href="https://exploratory.io/">start your free 30-day trial</a>. If your trial has expired, you can request an extension directly inside the app.</p><blockquote><em>Note: The AI Summary feature is available in all editions, including </em><strong><em>Personal, Business, Business Plus, Academic</em></strong><em>, and </em><strong><em>Public</em></strong><em>! 🔥</em></blockquote><p>Let us know what you think — and we can’t wait to hear what insights you’ll be able to gain from your text data!</p><p>Cheers,</p><p>Kan (CEO / Exploratory)</p><p>Contact: kan@exploratory.io</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7c134cdf9f3a" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/unlocking-insights-from-open-ended-survey-responses-with-ai-powered-text-analysis-in-exploratory-7c134cdf9f3a">Unlocking Insights from Open-Ended Survey Responses with AI-Powered Text Analysis in Exploratory</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Exploratory v13.5 & v13.6 Released — Selective Data Re-Import, New Date Filters, 2FA, and More! 🚀]]></title>
            <link>https://blog.exploratory.io/exploratory-v13-5-v13-6-released-selective-data-re-import-new-date-filters-2fa-and-more-6b1b7c3b6327?source=rss----6ea408ec434d---4</link>
            <guid isPermaLink="false">https://medium.com/p/6b1b7c3b6327</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[announcements]]></category>
            <category><![CDATA[data-analysis]]></category>
            <dc:creator><![CDATA[Kan Nishida]]></dc:creator>
            <pubDate>Tue, 29 Jul 2025 15:33:51 GMT</pubDate>
            <atom:updated>2025-07-29T15:33:29.300Z</atom:updated>
            <content:encoded><![CDATA[<h3>Exploratory v13.5 &amp; v13.6 Released — Selective Data Re-Import, New Date Filters, 2FA, and More! 🚀</h3><p>We’re excited to announce that <strong>Exploratory Desktop v13.5</strong> was released last night — and <strong>v13.6</strong> followed this morning! 🎉</p><p>These are patch releases with several useful enhancements and important bug fixes to make your experience smoother and more secure.</p><p>Here are some of the key updates:</p><h3>🚀 Feature Highlights — v13.5 / v13.6</h3><h4><strong>Data Re-Import</strong></h4><p>When you have multiple data sources for your data frame, now you can choose exactly which ones to re-import from — giving you more control and flexibility.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*3hcMU9FRuV8AaCGD" /></figure><h4><strong>Date Filter</strong></h4><p>We’ve added new options to the Date Filter, including:</p><ul><li>“1st Day of N Months Ago”</li><li>“Last Day of Next N Months”<br>…and more to make dynamic filtering even easier.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*a2YSNuy7RdWiVQFY" /></figure><h4><strong>Pivot / Summarize Table</strong></h4><p>When exporting from Pivot Table or Summarize Table, <strong>Grand Totals and Subtotals</strong> are now included automatically in the exported data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*C4iW5jVPvS392Vsa" /></figure><h4><strong>Security</strong></h4><p>You can now enable <strong>Two-Factor Authentication (2FA)</strong> or <strong>Multi-Factor Authentication (MFA)</strong> from your <a href="https://exploratory.intercom-clicks.com/via/e?ob=L%2BlhZ0DawlqP7MwrPucivFpRfq49Lv9vguhwpQPIpqBP4kg8W34xZnde6h6GKXWq&amp;h=c6c69d1ff16dd9c103edd8e60916d08141a3b0c0-b6uma1h1_215470064573435&amp;l=6de2563e2475f7b29a2a3479aa492305f17825bf-162766537">Account Setting</a> page for added security.</p><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/0*c0-sSd-qwHGWAkv4" /></figure><h3>✨ Plus Other Enhancements and Fixes</h3><p>For other enhancements and bug fixes in Exploratory v13.5 / v13.6, you can check out the release notes here:<br>🔗 <a href="https://exploratory.io/release-notes">Release Notes — Exploratory v13.6</a></p><h3>📥 Upgrade Now to Exploratory v13.6</h3><p><a href="https://exploratory.io/download">Exploratory v13.6</a> is available now — get the latest version today!</p><p><a href="https://exploratory.io/download">Download</a></p><p>If you haven’t signed up yet, <a href="https://exploratory.io/">start your free 30-day trial</a>. If your trial has expired, you can request an extension directly inside the app.</p><p>If you have any questions or feedback, feel free to reach out to me at kan@exploratory.io — we always love hearing from you!</p><p>Cheers,<br>Kan</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6b1b7c3b6327" width="1" height="1" alt=""><hr><p><a href="https://blog.exploratory.io/exploratory-v13-5-v13-6-released-selective-data-re-import-new-date-filters-2fa-and-more-6b1b7c3b6327">Exploratory v13.5 &amp; v13.6 Released — Selective Data Re-Import, New Date Filters, 2FA, and More! 🚀</a> was originally published in <a href="https://blog.exploratory.io">learn data science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>