In a previous post, I built a simple Python tool to audit an Excel workbook. The goal there was fairly straightforward: scan the model programmatically and flag common structural issues such as hard-coded numbers in formulas, constants living outside designated input sheets, external links, and formulas that drift out of pattern down a column.
That approach gives us something useful: a list of potential problems. But lists of warnings only get you so far. They tell you what might be wrong, but they do not help you understand how the model itself is structured.
In this post I want to take the idea a step further and show how to visualize the structure of an Excel model using Python. Instead of just printing audit findings, we start thinking about the workbook as a network of dependencies. Once you look at it that way, you can actually map the model and make its logic visible.
If you want to follow along, you can download the demo workbook and the accompanying notebook below.
The notebook and full output are presented below. An explanation of what the script is doing and why these checks matter follows afterward.
At a very high level, the notebook treats the spreadsheet less like a grid and more like a system of connected calculations. That shift in perspective matters because many spreadsheet problems are not obvious when you look at cells one by one. They become clearer when you look at how information flows through the model.
With that in place, let’s walk through the code step by step and look at what each part is doing and what it tells us about the structure of the workbook.
Step 1: Loading the workbook as structured data
The first important move in the notebook is this:
import openpyxl

file = "demo_financial_model.xlsx"
wb = openpyxl.load_workbook(file, data_only=False)
print("Sheets:", wb.sheetnames)
When this runs, Python opens the workbook and reads it as a structured file rather than as a live Excel interface. That distinction matters.
Inside Excel, you are usually interacting with the model visually. You click cells, inspect formulas, trace arrows, and move around sheet by sheet. Python approaches the workbook differently. It reads the file as data. That means it can inspect every worksheet, every cell, and every formula systematically.
In this example, the workbook contains three sheets:
['Inputs', 'Calculations', 'Outputs']
That already tells us something useful about the intended design of the model. The workbook appears to be organized using a familiar structure:
- Inputs for assumptions
- Calculations for the working logic
- Outputs for final results
That separation is common in well-organized analytical models, and it gives the rest of the notebook something to work with. The code can now ask whether the workbook is actually behaving the way that structure suggests it should.
Step 2: Looking for hardcoded constants outside the Inputs sheet
The next section of the notebook scans the workbook for numeric constants that appear outside the Inputs sheet:
hardcoded_cells = set()
input_sheets = ["Inputs"]

for sheet in wb.sheetnames:
    ws = wb[sheet]
    for row in ws.iter_rows():
        for cell in row:
            if cell.data_type == "n":
                if sheet not in input_sheets:
                    if cell.value is not None:
                        hardcoded_cells.add(f"{sheet}!{cell.coordinate}")
This part is based on a simple modeling principle: in many structured workbooks, important assumptions should live in dedicated input areas, not be scattered across calculation sheets.
When assumptions are centralized, they are easier to review, update, and explain. If key numbers start appearing directly inside the calculation layer, the model becomes harder to trust. A reviewer has to ask: is this value intentional, or did someone overwrite a formula?
In this example, the notebook reports:
Hardcoded cells: {'Calculations!A3', 'Calculations!A4', 'Calculations!A2', 'Calculations!C2'}
This ties directly back to the previous post. There, I already discussed that A2, A3, and A4 were not especially concerning because they are just year labels. But Calculations!C2 was more significant. It represented a hardcoded revenue value in a place where I would normally expect a formula driven by inputs such as units and price.
That is the first reason this notebook matters. It is not just building a pretty visualization. It is carrying forward actual audit findings from the earlier script and placing them into the structural picture of the model.
Step 3: Expanding Excel ranges so the relationships are real
Next, the notebook defines a helper function called expand_range():
from openpyxl.utils import (
    coordinate_from_string,
    column_index_from_string,
    get_column_letter,
)

def expand_range(sheet, start, end):
    col1, row1 = coordinate_from_string(start)
    col2, row2 = coordinate_from_string(end)
    col1_i = column_index_from_string(col1)
    col2_i = column_index_from_string(col2)
    cells = []
    for col in range(col1_i, col2_i + 1):
        for row in range(row1, row2 + 1):
            coord = f"{get_column_letter(col)}{row}"
            cells.append(f"{sheet}!{coord}")
    return cells
This may look technical, but the idea behind it is very straightforward.
Excel formulas often refer to ranges instead of individual cells. For example, a formula might sum a range like E2:E4. To a human reader, that clearly means three cells. But if we want to build a dependency graph, we cannot treat E2:E4 as one mysterious blob. We need to expand it into its underlying parts:
- E2
- E3
- E4
That is exactly what this function does. If we skip this step, the graph will miss real dependencies and the workbook will appear simpler than it actually is. A model might look like it has fewer connections than it really does, and that could hide the true influence of certain cells.
This is a good example of where Python adds value beyond Excel’s built-in tools. Excel can certainly evaluate a range formula correctly, but Python lets us unpack that formula into explicit relationships that can be counted, visualized, and analyzed.
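To see what the expansion produces, here is a dependency-free sketch of the same idea; the two column-conversion helpers are my own stand-ins for openpyxl's `column_index_from_string` and `get_column_letter`:

```python
import re

def col_to_index(col):
    """Convert a column letter like 'AA' to a 1-based index."""
    idx = 0
    for ch in col:
        idx = idx * 26 + (ord(ch.upper()) - ord("A") + 1)
    return idx

def index_to_col(idx):
    """Convert a 1-based index back to a column letter."""
    letters = ""
    while idx > 0:
        idx, rem = divmod(idx - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def expand_range(sheet, start, end):
    """Expand a range like E2:E4 on `sheet` into individual cell references."""
    c1, r1 = re.match(r"([A-Za-z]+)(\d+)", start).groups()
    c2, r2 = re.match(r"([A-Za-z]+)(\d+)", end).groups()
    return [
        f"{sheet}!{index_to_col(c)}{r}"
        for c in range(col_to_index(c1), col_to_index(c2) + 1)
        for r in range(int(r1), int(r2) + 1)
    ]

print(expand_range("Inputs", "E2", "E4"))
# → ['Inputs!E2', 'Inputs!E3', 'Inputs!E4']
```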
Step 4: Building the dependency graph
This is the core of the notebook:
import re
import networkx as nx

G = nx.DiGraph()
cell_pattern = r"[A-Za-z_]+!\$?[A-Za-z]+\$?\d+"
range_pattern = r"([A-Za-z_]+)!([A-Za-z]+\d+):([A-Za-z]+\d+)"

for sheet in wb.sheetnames:
    ws = wb[sheet]
    for row in ws.iter_rows():
        for cell in row:
            location = f"{sheet}!{cell.coordinate}"
            if cell.data_type == "f":
                formula = cell.value
                ranges = re.findall(range_pattern, formula)
                for r in ranges:
                    rng_sheet, start, end = r
                    expanded = expand_range(rng_sheet, start, end)
                    for ref in expanded:
                        G.add_edge(ref, location)
                refs = re.findall(cell_pattern, formula)
                for ref in refs:
                    ref = ref.replace("$", "")
                    G.add_edge(ref, location)
At a high level, this code scans every formula in the workbook and asks a simple question:
Which cells does this formula depend on?
For every dependency it finds, it adds an edge to the graph. This is where the ideas of nodes and edges become important.
- A node is a cell
- An edge is a dependency relationship between cells
If one cell feeds another, the graph captures that as an arrow. So if Calculations!B3 depends on Inputs!B2, the graph includes a relationship like this:
Inputs!B2 → Calculations!B3
That arrow matters because it tells us something about the model’s logic. It says that the value in the calculation cell is not independent. It comes from somewhere else. If the source changes, the downstream cell changes too.
This is one of the biggest conceptual shifts in the notebook. Instead of seeing the workbook as just a set of formulas, we start seeing it as a directed network of cause and effect.
In this example, after building the graph, the notebook reports:
Nodes: 19
Edges: 10
So in this small workbook, Python has identified 19 participating cells and 10 relationships between them.
That may not sound dramatic, but it is the foundation for everything that follows. Once those relationships are represented as a graph, we can ask much more interesting questions than Excel’s built-in tools usually make easy.
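To make "more interesting questions" concrete, here is a toy example that is independent of networkx and the demo workbook. The cell names and edges are invented for illustration; the point is that once dependencies are edges, "what does this input ultimately affect?" becomes a simple graph traversal:

```python
from collections import defaultdict, deque

# Invented example edges (source cell -> dependent cell), not from the demo workbook
edges = [
    ("Inputs!B2", "Calculations!B3"),
    ("Inputs!B3", "Calculations!B3"),
    ("Calculations!B3", "Outputs!B2"),
]

graph = defaultdict(list)
for src, dst in edges:
    graph[src].append(dst)

def downstream(node):
    """Return every cell reachable from `node`, i.e. its transitive dependents."""
    seen, queue = set(), deque([node])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(downstream("Inputs!B2")))
# → ['Calculations!B3', 'Outputs!B2']
```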
Step 5: Making sure disconnected hardcoded cells still show up
After building the graph from formulas, the notebook adds this step:
for cell in hardcoded_cells:
if cell not in G:
G.add_node(cell)
If a value is hardcoded and not referenced by any formulas, it may never appear in the graph automatically. That is because the graph is being built from formula relationships. No formula relationship means no edge, and possibly no node.
But from an auditing perspective, that kind of disconnected cell can be extremely important.
Take Calculations!C2 in this workbook. That cell contains a hardcoded revenue value. If nothing is pointing to it and it is not pointing anywhere else, it could easily disappear from the graph unless we explicitly add it. But the fact that it is disconnected is itself informative. It tells us that the value is not participating in the model’s logic the way we might expect.
So this step ensures that hardcoded constants still appear in the visualization, even if they are structurally isolated.
That matters because the absence of a connection can be just as meaningful as the presence of one. A disconnected calculation may indicate a harmless label, an abandoned piece of logic, or a genuine modeling error. The graph should give us the chance to see that.
Step 6: Coloring the nodes so the structure becomes readable
Next, the notebook assigns colors based on each node’s role:
def node_color(node):
    if node in hardcoded_cells:
        return "gold"
    if node.startswith("Inputs"):
        return "lightgreen"
    elif node.startswith("Calculations"):
        return "lightskyblue"
    elif node.startswith("Outputs"):
        return "salmon"
    return "lightgray"
This is not just a cosmetic step. It helps translate the graph into something an Excel user can interpret quickly.
The colors in this example mean:
- green = inputs
- blue = calculations
- red = outputs
- gold = hardcoded constants outside the input layer
That last category is especially useful because it overlays the audit results from the first notebook directly onto the model structure. Instead of only printing a list of suspicious cells, we can now see where those cells live inside the architecture of the workbook.
This is where the graph starts to become more than just a technical artifact. It becomes an explanatory tool.
For example, a gold node sitting inside the Calculations sheet tells a story. It says: this cell is behaving more like an input, but it is not living in the input layer. That is a structural inconsistency.
That is exactly what was happening with Calculations!C2.
Step 7: Forcing a layered layout that mirrors how models are designed
Then the notebook creates positions for the nodes:
layers = {
    "Inputs": 0,
    "Calculations": 1,
    "Outputs": 2
}

pos = {}
y_positions = {}

for node in sorted(G.nodes):
    sheet = node.split("!")[0]
    layer = layers.get(sheet, 3)
    y_positions.setdefault(sheet, 0)
    pos[node] = (layer, -y_positions[sheet])
    y_positions[sheet] += 1
This step matters because raw network graphs can get messy very quickly. If we simply let a graphing library place everything wherever it wants, the result often looks like a tangle of arrows.
Here, instead of using a free-form layout, the notebook deliberately places the sheets in columns from left to right:
Inputs → Calculations → Outputs
That mirrors the intended design of the workbook itself.
This is a subtle but important choice. It makes the graph easier for Excel users to read because it aligns with the way they already think about a model. Inputs feed calculations, and calculations feed outputs.
So even though the graph is generated with Python, it still respects familiar spreadsheet design principles. The purpose is not to turn Excel into an abstract math problem. The purpose is to make the model’s structure more visible.
Step 8: Drawing the graph and sizing nodes by influence
The graph is then drawn with this code:
import matplotlib.pyplot as plt

# Apply the role colors defined by node_color() above
colors = [node_color(n) for n in G.nodes]

plt.figure(figsize=(12, 10))
sizes = [400 + len(nx.descendants(G, n)) * 100 for n in G.nodes]
nx.draw(
    G,
    pos,
    with_labels=True,
    node_color=colors,
    node_size=sizes,
    font_size=8,
    arrows=True
)
plt.title("Excel Model Dependency Graph")
plt.show()
Here’s the resulting visualization of our model:

Two things are happening here. First, the notebook is visualizing the dependency network we just built. Second, it is sizing each node based on the number of downstream cells that depend on it. In other words, cells that influence more of the model appear larger.
This matters because not all cells are equally important. Some cells are relatively local. Others sit near the top of the dependency chain and affect many downstream calculations.
Sizing nodes this way helps the graph communicate influence, not just existence.
That is another place where Python adds value beyond Excel’s built-in auditing tools. Excel can tell you the direct precedents or dependents of a selected cell, but Python can help summarize influence more broadly across the model.
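A minimal sketch of that idea, again using invented toy edges rather than the demo workbook: count each node's transitive dependents (which is what `nx.descendants` gives the notebook) and size nodes with the same `400 + 100 per downstream cell` rule:

```python
from collections import defaultdict

# Invented example edges (source cell -> dependent cell), not from the demo workbook
edges = [
    ("Inputs!B2", "Calculations!B3"),
    ("Inputs!B3", "Calculations!B3"),
    ("Calculations!B3", "Outputs!B2"),
    ("Calculations!B3", "Outputs!B3"),
]

graph = defaultdict(list)
nodes = set()
for src, dst in edges:
    graph[src].append(dst)
    nodes.update((src, dst))

def descendant_count(node):
    """Count every cell transitively downstream of `node`."""
    seen, stack = set(), [node]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)

# Mirror the notebook's node-sizing rule, then rank cells by influence
sizes = {n: 400 + descendant_count(n) * 100 for n in nodes}
ranking = sorted(sorted(nodes), key=descendant_count, reverse=True)
print(ranking[0], sizes[ranking[0]])
# → Inputs!B2 700
```

The inputs rank highest here because everything downstream depends on them, which is exactly the visual emphasis the node sizing creates.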
Even in a small workbook, this starts to reveal which parts of the spreadsheet are structurally important.
Step 9: Identifying orphan cells
Finally, the notebook looks for orphan cells:
orphans = [n for n in G.nodes if G.degree(n) == 0]
print("Orphan cells:", orphans)
In this example, the result is:
Orphan cells: ['Calculations!A3', 'Calculations!A4', 'Calculations!A2', 'Calculations!C2']
This is one of the most useful outputs in the notebook because it reinforces the earlier audit findings with structural evidence.
The year labels in A2, A3, and A4 are disconnected because they are simply labels. That is fine.
But Calculations!C2 is different. It is disconnected for a more interesting reason: it is a hardcoded value sitting outside the normal assumption flow of the model.
That is exactly what I pointed out in the previous post. There, the script flagged Calculations!C2 as a hardcoded constant. Here, the graph shows us something more: it is not just hardcoded, it is also structurally isolated.
That combination makes the issue much easier to understand. It is no longer just a suspicious number on a list. It is a number that sits outside the model’s normal flow.
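Since both checks produce plain Python collections, combining them mechanically is a one-liner. Here is a sketch that reuses the results reported above as literal data; treating A2:A4 as harmless year labels is a judgment call carried over from the earlier audit, not something the code infers:

```python
# Results reported earlier in this post, reused here as literal data
hardcoded_cells = {"Calculations!A2", "Calculations!A3",
                   "Calculations!A4", "Calculations!C2"}
orphans = ["Calculations!A2", "Calculations!A3",
           "Calculations!A4", "Calculations!C2"]

# Year labels judged harmless in the earlier audit
labels = {"Calculations!A2", "Calculations!A3", "Calculations!A4"}

# Cells that are both hardcoded and structurally isolated, minus known labels
suspicious = sorted((set(orphans) & hardcoded_cells) - labels)
print("Review first:", suspicious)
# → Review first: ['Calculations!C2']
```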
How this compares to Excel’s built-in auditing tools
Excel already includes several useful tools for auditing models. Features such as Trace Precedents, Trace Dependents, and Show Formulas can be very helpful when you are trying to understand a particular formula or troubleshoot a specific section of a workbook.
Where these tools become more limited is at the workbook level. They are designed primarily for interactive, cell-by-cell inspection. That approach works well when you already know where to look, but it becomes harder when you want to step back and understand the structure of the entire model.
The Python approach used in this notebook is aimed at a different level of analysis. Instead of examining formulas individually, the script scans the workbook in a single pass, extracts the relationships between cells programmatically, and then summarizes the structure of the model.
In practical terms, this means we can inspect the entire workbook at once rather than reviewing formulas one at a time. It also allows the same logic to be applied consistently across multiple files, makes it possible to combine structural visualization with audit findings, and can automatically surface cells that appear disconnected or suspicious.
The goal is not to replace Excel’s built-in auditing features. Rather, Python provides an additional layer of visibility. Excel remains excellent for manually exploring individual formulas and tracing specific dependencies. Python, on the other hand, is well suited for stepping back, analyzing a workbook systematically, and creating repeatable checks. Used together, the two approaches complement each other very well.
Why this matters now
This kind of workflow is becoming increasingly relevant as spreadsheet generation becomes faster and easier. AI tools can now produce workbook structures, formulas, and even entire models from prompts. As that capability grows, the bottleneck shifts. The challenge is no longer simply producing spreadsheets; it is understanding them, validating them, and maintaining them.
That is where an approach like this becomes useful. With a relatively small amount of code, we can begin to inspect workbook structure automatically, visualize how logic flows through the model, highlight structural violations, and identify cells that fall outside the intended design.
None of this removes the need for human judgment. As the earlier post showed, some audit findings will inevitably be false positives that require interpretation. What it does provide is a faster and more focused way to review a model. Instead of manually searching through a workbook for potential issues, we can use code to narrow down where our attention is most needed.
Want to build tools like this for your team?
This example is intentionally small, but the underlying idea is very practical. Teams that rely heavily on Excel can use Python to build lightweight validation tools, auditing utilities, and analysis workflows around their spreadsheets.
If you want to set up tools like this for your team, strengthen Python capabilities around Excel, or build more reliable analytics workflows, take a look at my How I Work page.
That is where I outline how I help organizations combine Excel, Python, and modern AI tools in ways that are useful, practical, and grounded in real analytical work.
