<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Paradigm Shift</title>
        <description>A personal blog to track my technical journey ...</description>      
        <link>http://toranbillups.com</link>
        <atom:link href="http://toranbillups.com/feed.xml" rel="self" type="application/rss+xml" />
        
            <item>
                <title>Embracing Disagreement</title>
                <description>&lt;p&gt;For years, I was that engineer who obsessed over the perfect abstraction, the cleanest architecture, the most elegant solution. I&apos;d spend hours refactoring code and building features that were interesting but not valuable. I was optimizing for everything except what mattered most: building products that people actually wanted.&lt;/p&gt;

&lt;p&gt;This shift from code-centric to impact-centric thinking didn&apos;t happen naturally. It required unlearning years of training that taught me to optimize for technical excellence. It also required embracing the uncomfortable conversations that push us to question assumptions we&apos;ve held sacred.&lt;/p&gt;

&lt;p&gt;What I&apos;ve discovered is that the path from overconfident engineer to builder is paved with disagreement. A very specific type of productive tension helps teams confront what really matters.&lt;/p&gt;

&lt;p&gt;The trouble with teams is that they develop unspoken rules, values, and habits that settle in over time. Everyone knows &quot;this is how we do things here&quot; until someone new joins and suddenly there&apos;s friction. Maybe you&apos;ve been the newcomer who sees things differently, or perhaps you&apos;ve been on the team when a fresh perspective comes to shake things up. In that moment, there&apos;s a choice: Do we smooth over the differences and keep the peace, or do we lean into tension and discover what really matters?&lt;/p&gt;

&lt;p&gt;My goal here isn&apos;t to confirm what you already believe. Instead I plan to challenge you, to make you question the practices and assumptions that have defined your career. Because here&apos;s the truth: disagreement doesn&apos;t just make for better code—it makes for stronger teams and better products.&lt;/p&gt;

&lt;p&gt;What I&apos;ve learned is that there are three specific types of disagreement that, when embraced, become a competitive advantage.&lt;/p&gt;

&lt;h2&gt;Disagreement About Process&lt;/h2&gt;

&lt;p&gt;The first is disagreement about the process. Simply put, do less but more often.&lt;/p&gt;

&lt;p&gt;Before we dive in, I want to recommend the book &lt;a href=&quot;https://basecamp.com/shapeup&quot;&gt;Shape Up&lt;/a&gt;, because it marks a defining moment in my software career. This book did something for me that I know it can do for you. It provided clarity about scope and brought to light what is arguably the most important skill of the current age, especially given recent advancements in tooling.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If we had asked how long it would take to build Basecamp, we never would have gotten it done. But if we had asked instead, what is a version of chat, for example, that we could deliver by the end of the week? Well, that&apos;s something entirely different.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So when I say do less, but do it more often, I&apos;m not talking about working fewer hours or &lt;a href=&quot;https://www.claude.com/product/claude-code&quot;&gt;Claude Code&lt;/a&gt;. I&apos;m talking about agreeing to deliver less. And if we agree to this individually and collectively, we not only move quicker by cutting scope, but we de-risk the project by reducing time to market.&lt;/p&gt;

&lt;p&gt;First you negotiate down to the essential core, as the authors refer to it. Then you expand on this core over time until you reach a point where the effort outweighs the value people are getting from it. Consistent delivery allows you to find this sweet spot without the disappointment that&apos;s so rampant in our industry today.&lt;/p&gt;

&lt;p&gt;Here&apos;s where the disagreement comes. When you start proposing this approach, you&apos;ll meet resistance and it will likely come from other programmers. We&apos;re trained to be meticulous and critical about details without exception. We&apos;re taught that shortcuts lead to technical debt, that rushing leads to bugs, that proper process prevents problems. This mismatch will put you into disagreements regularly. Your fellow engineers will push back.&lt;/p&gt;

&lt;p&gt;Your instinct, like mine, might be to avoid that conflict and just go along with the existing process. But organizations move at the speed of trust. The consistency and frequency of delivery are valuable and offer the most effective way to build that trust. And this process helps you and the team build the muscle memory required for scope hammering, with the added bonus that it helps you get to market faster and validate your ideas and often your assumptions.&lt;/p&gt;

&lt;p&gt;I was working with a team that historically delivered less often. Our standups were very technical and involved vague terms that kept our non-technical people at a distance. In short, this team isolated itself from collaboration and shared business objectives. I knew we could win the trust of our leaders by showing working software instead of making excuses. I knew this would invite more collaboration and foster real innovation. I knew proposing this would put me in direct conflict with the existing team.&lt;/p&gt;

&lt;p&gt;So instead of debating pros and cons endlessly, I simply started doing the work and modeled it for the wider team. I certainly felt some tension, but when questioned, I pointed to our goals and made it clear I wasn&apos;t cutting corners but scope. I was intentionally choosing progress over perfection daily, and it was working.&lt;/p&gt;

&lt;p&gt;In just a few short weeks, our business leaders felt at home in these standups because the discussion was much more outcome oriented. And this new energy and engagement brought a much faster feedback loop that our team had never seen, and it naturally helped us surface answers to questions we&apos;d stumbled on just the day before.&lt;/p&gt;

&lt;p&gt;To my surprise, most of the engineers followed my example, and the team made this transition from surviving to thriving. The cultural shift was uncomfortable at times but pushing through it transformed this company, allowing us to deliver more value more often.&lt;/p&gt;

&lt;h2&gt;Disagreement About Details&lt;/h2&gt;

&lt;blockquote&gt;
  &lt;p&gt;Nearly all software development policies and processes exist to prevent software from being released.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I &lt;a href=&quot;https://www.industriallogic.com/blog/faster-and-more-predictable/&quot;&gt;frame&lt;/a&gt; it this way because we&apos;re going to be looking at disagreement in the details. And if you break down a lot of our jobs day-to-day, it is truly just managing these details.&lt;/p&gt;

&lt;p&gt;What I&apos;ll be sharing here may challenge the very decision-making process that you have day-to-day. First, I want to start out by sharing the two different phases that show up in every project. You have the uphill work, where there&apos;s a lot of uncertainty and you&apos;re doing a lot of problem solving. Then there&apos;s the downhill work, where it&apos;s more about execution.&lt;/p&gt;

&lt;p&gt;Early in the project, the details don&apos;t carry as much weight. But over time, as things become more clear, more understood, they become more important. The trouble shows up when we apply equal weight to every possible sharp edge and detail that we encounter. We rarely take the time required to consider how important one detail is versus another. So we just pile on the scope, which ultimately delays delivery.&lt;/p&gt;

&lt;p&gt;This is where the daily demo comes in handy. Simply put, you&apos;re going to show the previous day&apos;s work. And the time pressure you feel to deliver frequently like this provides the forcing function required to cut scope aggressively. This magnifies the details that matter most and helps the team minimize a focus on edge cases that can be tackled another day.&lt;/p&gt;

&lt;p&gt;The added bonus of the daily demo is that working software quickly provides the clarity to set expectations. During these demos, make it a habit to encourage feedback from everyone involved. It doesn&apos;t mean that you commit to everything, but it does create an amazing feedback loop. And as a team, you get a ton of value here surfacing not only problems but often solutions.&lt;/p&gt;

&lt;p&gt;And this is when those disagreements begin to surface. This style of work often conflicts with more traditional roles and even the reward systems that optimize for code above all else. Remind everyone that incremental investment leads to monumental advancement. We&apos;re not shooting for a big bang release. This is truly working lean.&lt;/p&gt;

&lt;p&gt;One of the challenges with working lean is that decision-making day-to-day just looks radically different. One of the benefits though, is disagreement at this stage helps your team arrive at done more quickly by avoiding gold plating and bike shedding that run rampant when unchecked. The added bonus of course is you find technical issues faster because you&apos;re deploying more often. And equally important, you might flush out gaps in your domain knowledge by getting this in front of real stakeholders sooner.&lt;/p&gt;

&lt;p&gt;Recently, I went through a code review where I had used a plain dictionary instead of a typed class in Python. The reviewer flagged it and said the class would be an improvement. This reminded me that if we optimize for code on our teams and in our organizations, we&apos;re going to get more time in the code. It&apos;s only when you optimize for impact that you consider other factors like speed to market, learning, or simply return on investment.&lt;/p&gt;

&lt;p&gt;Now the individual wasn&apos;t doing anything out of the ordinary. In fact, most of us are rewarded for being champions of code above all else. And one of the challenges you&apos;re going to face is that if the team is not aligned about what matters most, technical people will default to doing technical things. Never lose sight of the impact.&lt;/p&gt;

&lt;p&gt;Now this disagreement actually gave us an opportunity to discuss the real impact and investment of time. So we talked about whether this improvement was necessary for the current iteration and whether we could wait until we validated this feature with real users.&lt;/p&gt;

&lt;p&gt;Now, I want to be clear. If you&apos;re working in a classic reward system, you may not always win these arguments. That&apos;s okay. Because you still shared something very important that technical people often miss, and that is the distinction between progress and perfection.&lt;/p&gt;

&lt;h2&gt;Disagreement About Purpose&lt;/h2&gt;

&lt;p&gt;For the final section, we&apos;re going to talk about disagreement over purpose. I recommend the book &lt;a href=&quot;https://mattlemay.com/&quot;&gt;Impact First Product Teams&lt;/a&gt; by Matt LeMay. In it, you will find the most practical career advice for someone writing software in 2025. It includes a call to action that will be difficult for most of us, but one that is necessary to grasp as we enter a future where the market is very different from what it has been for most of my lifetime working in tech.&lt;/p&gt;

&lt;p&gt;If nobody else is talking about it, ask about the highest-impact work you can do. Avoid what Matt calls the &quot;low impact death spiral&quot;: the easy but often unhelpful work that product teams get trapped in.&lt;/p&gt;

&lt;p&gt;Now there&apos;s a vast gap between the work and the impact, and we fill it with useful things like strategy, discovery, and scoping. This is often helpful, but the key is that, throughout, we should be making decisions with impact in view. All too often, the impact feels far away while the technical details are familiar, comfortable, and tangible.&lt;/p&gt;

&lt;p&gt;And this is the deepest level of disagreement, the most challenging for teams, honestly: questioning not just how we do the work, but applying real judgment about what work we choose to do at all.&lt;/p&gt;

&lt;p&gt;Something happened during our quarterly onsite that put our team&apos;s impact front and center. I put together a team building activity that essentially said, come up with the money to cover the team&apos;s cost each year, or we&apos;ll close up shop. Suddenly, every idea was on the table. This constraint stripped away all the comfortable busy work and forced us into difficult conversations.&lt;/p&gt;

&lt;p&gt;Should we focus on saving money or making money? This, of course, divided the group initially. What features actually drive revenue versus features that are interesting? What problems are we solving that people will actually pay for?&lt;/p&gt;

&lt;p&gt;After lots of refinement, thinking, and working through the problem and solution pairs, we landed on a single solution that everyone was energized about. The outcome was a clear, high-impact project with real potential to increase our revenue. And this type of disagreement is not only necessary, but critical for the longevity of the business. Without it, we choose to tackle low-risk and often low-value work that does little to move the business forward.&lt;/p&gt;

&lt;h2&gt;Your Competitive Advantage&lt;/h2&gt;

&lt;p&gt;My hope is that you will be set apart, not by following my advice or conventional wisdom from companies like Meta or Google, but by identifying those opportunities hidden in disagreement and turning them into your strategic advantage.&lt;/p&gt;

&lt;p&gt;Here&apos;s the key insight: &lt;strong&gt;Humility is the prerequisite for productive disagreement&lt;/strong&gt; because humility allows you to cut scope despite your intuition, ship imperfect code despite your instincts, and question fundamental assumptions to surface truly high-impact work. A measure of humility will bring your team closer to impact, however that looks in your organization.&lt;/p&gt;

&lt;p&gt;Remember, that lens you bring to the team is really important. It becomes a problem when you think it&apos;s the most important.&lt;/p&gt;

&lt;p&gt;So disagreements are out there, and they&apos;re coming for you whether you embrace them or not. The question is: will you see them as a burden or a competitive advantage?&lt;/p&gt;
</description>
                <pubDate>Sat, 04 Oct 2025 01:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2025/10/04/embracing-disagreement/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2025/10/04/embracing-disagreement/</guid>
            </item>
        
            <item>
                <title>Building BM25 Keyword Search In Postgres</title>
                <description>&lt;style&gt;
    .benchmarks {
        background: white;
        padding: 30px;
        padding-bottom: 5px;
        border-radius: 8px;
        box-shadow: 0 2px 8px rgba(0,0,0,0.1);
    }
    .metric-group {
        margin-bottom: 35px;
    }
    .metric-label {
        font-weight: 600;
        color: #333;
        margin-bottom: 8px;
        font-size: 14px;
    }
    .bars {
        display: flex;
        gap: 10px;
        margin-bottom: 5px;
    }
    .bar {
        height: 30px;
        display: flex;
        align-items: center;
        padding: 0 10px;
        color: white;
        font-weight: 600;
        font-size: 13px;
        border-radius: 4px;
        transition: transform 0.2s;
    }
    .bar:hover {
        transform: translateX(2px);
    }
    .bm25 {
        background: #3b82f6;
    }
    .ilike {
        background: #10b981;
    }
    .legend {
        display: flex;
        gap: 20px;
        margin-top: 30px;
        padding-top: 20px;
        border-top: 1px solid #e5e5e5;
    }
    .legend-item {
        display: flex;
        align-items: center;
        gap: 8px;
        font-size: 14px;
    }
    .legend-color {
        width: 20px;
        height: 20px;
        border-radius: 3px;
    }
    .improvement {
        color: #059669;
        font-size: 13px;
        font-weight: 600;
        margin-top: 3px;
    }
&lt;/style&gt;

&lt;p&gt;It turns out, you can build a surprisingly powerful keyword search, known as BM25, with just a handful of SQL functions. What follows is my journey including what I&apos;ve learned, how this algorithm works in simple terms, and how I implemented it without any external dependencies.&lt;/p&gt;

&lt;h3&gt;The Foundation: From TF-IDF to BM25&lt;/h3&gt;

&lt;p&gt;Before diving into BM25, it&apos;s helpful to understand TF-IDF (Term Frequency-Inverse Document Frequency). It&apos;s a simple but powerful idea that helps determine the importance of a word per document within a wider collection of documents.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Term Frequency (TF):&lt;/strong&gt; This is just what it sounds like. How often does a term appear in a single document? The intuition is that if the word &quot;data&quot; appears 5 times in a 100-word article, it&apos;s probably more relevant to that article than a word that appears only once in that same document.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Inverse Document Frequency (IDF):&lt;/strong&gt; This measures how rare a word is across &lt;em&gt;all&lt;/em&gt; documents. Common words like &quot;the&quot; or &quot;a&quot; will appear everywhere and have a low IDF score, effectively penalizing them. Rare words will appear in fewer documents and should have a higher IDF score, signaling that they are more significant.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By multiplying these two scores, you get the TF-IDF score, which gives you a weighted value for how important a word is to a specific document. This was the starting point for my exploration.&lt;/p&gt;
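
&lt;p&gt;To make the interaction of these two factors concrete, here is a minimal Python sketch (the toy corpus and function names are my own, not from any library):&lt;/p&gt;

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF of a term for one document, given a corpus of token lists."""
    tf = doc.count(term) / len(doc)          # how often the term appears here
    df = sum(term in d for d in corpus)      # how many documents contain it
    idf = math.log(len(corpus) / df)         # rare terms get a boost
    return tf * idf

corpus = [
    ["data", "pipeline", "data", "quality"],
    ["the", "pipeline", "broke"],
    ["cats", "sleep", "all", "day"],
]

# "data" appears twice in the first document and nowhere else, so it
# outscores "pipeline", which is spread across two documents
print(tf_idf("data", corpus[0], corpus))
print(tf_idf("pipeline", corpus[0], corpus))
```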

&lt;p&gt;BM25 is essentially a more refined version of this. It builds on the core concepts of TF-IDF but adds a couple of clever improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Term Frequency Saturation:&lt;/strong&gt; BM25 recognizes that a word&apos;s relevance doesn&apos;t increase linearly. The difference between a word appearing zero times and one time is huge. The difference between it appearing 20 times and 21 times is negligible. BM25 provides a simple configuration so you can control how quickly the term frequency score &quot;saturates.&quot;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Document Length Normalization:&lt;/strong&gt; Longer documents naturally have an advantage because they have more opportunities for a search term to appear. BM25 provides a simple configuration so you can penalize longer documents to level the playing field, ensuring that relevance isn&apos;t just a side effect of verbosity.&lt;/li&gt;
&lt;/ul&gt;
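
&lt;p&gt;Both refinements live in BM25&apos;s term-frequency component. Here is a small Python sketch of that component (the standard formula; the defaults k1=1.2 and b=0.75 match the defaults used in the search function later in this post):&lt;/p&gt;

```python
def bm25_tf(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25's saturated, length-normalized term-frequency component."""
    norm = k1 * (1 - b + b * doc_len / avg_len)  # penalize long documents
    return tf * (k1 + 1) / (tf + norm)           # saturates as tf grows

# Saturation: going from 0 to 1 occurrence matters far more than 20 to 21
print(bm25_tf(1, 100, 100) - bm25_tf(0, 100, 100))
print(bm25_tf(21, 100, 100) - bm25_tf(20, 100, 100))

# Length normalization: the same count is worth less in a longer document
print(bm25_tf(5, 100, 100) > bm25_tf(5, 500, 100))
```

&lt;p&gt;Raising k1 lets the score keep growing with repeated occurrences; lowering b toward zero turns off the length penalty.&lt;/p&gt;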

&lt;h3&gt;The Implementation&lt;/h3&gt;

&lt;p&gt;The beauty of this approach is that the entire search algorithm lives within Postgres and can be called from any tech stack.&lt;/p&gt;

&lt;p&gt;Here&apos;s a breakdown of the core components:&lt;/p&gt;

&lt;h4&gt;Text Processing and Tokenization&lt;/h4&gt;

&lt;p&gt;The first step in any search system is to break down raw text into meaningful terms, or tokens. I created a &lt;code&gt;tokenize_and_count&lt;/code&gt; function to handle this. It takes a block of text and converts it to lowercase, removes common stop words using the built-in English stop words list (like &apos;a&apos;, &apos;the&apos;, &apos;is&apos;), uses stemming to reduce words to their root form (e.g., &quot;canceling&quot; and &quot;canceled&quot; both become &quot;cancel&quot;), filters out blank tokens, and finally returns a table of unique terms and their frequencies in the text.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;CREATE OR REPLACE FUNCTION tokenize_and_count(input_text TEXT)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;RETURNS TABLE (term TEXT, count INTEGER) AS $$&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;BEGIN&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    -- Get stemmed tokens with stop words removed&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    RETURN QUERY&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    WITH tokens AS (&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        SELECT word&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        FROM ts_parse(&apos;default&apos;, lower(input_text)) AS t(tokid, word)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        WHERE tokid != 12  -- Filter out blank tokens&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    ),&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    processed_tokens AS (&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        SELECT&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            CASE&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                WHEN ts_lexize(&apos;public.simple_dict&apos;, word) = &apos;{}&apos;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                THEN NULL  -- It&apos;s a stop word&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                ELSE COALESCE(&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                    (ts_lexize(&apos;public.english_stem&apos;, word))[1],&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                    word&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                )&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            END AS processed_word&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        FROM tokens&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    )&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    SELECT&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        processed_word as term,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        COUNT(*)::INTEGER&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    FROM processed_tokens&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    WHERE processed_word IS NOT NULL&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    GROUP BY processed_word;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;END;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$$ LANGUAGE plpgsql;&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Storing Statistics for Performance&lt;/h4&gt;

&lt;p&gt;Calculating these scores on the fly for every document during every search would be incredibly slow. To optimize this, I created two key tables:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;verse_stats&lt;/code&gt;: This table stores the pre-calculated term frequencies for each document. It holds the document&apos;s ID, its total length, and a JSONB object mapping each term to its count.&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;term_stats&lt;/code&gt;: This table stores global statistics about each term, such as how many documents it appears in. This is crucial for calculating the IDF score efficiently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also created a materialized view, &lt;code&gt;global_stats&lt;/code&gt;, to keep track of the total number of documents and the average document length across the entire corpus, both of which are needed for the BM25 calculation.&lt;/p&gt;

&lt;h4&gt;The BM25 Scoring Function&lt;/h4&gt;

&lt;p&gt;With the data structures in place, I implemented the core BM25 logic. The &lt;code&gt;calculate_idf&lt;/code&gt; function determines a term&apos;s weight using a slightly modified version of the standard BM25 IDF formula, which ensures scores are non-negative.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;CREATE OR REPLACE FUNCTION calculate_idf(term_doc_count INTEGER, total_docs INTEGER)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;RETURNS FLOAT AS $$&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;BEGIN&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  -- Modified BM25 IDF formula to ensure non-negative scores&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  RETURN ln(1 + (total_docs - term_doc_count + 0.5) /&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                (term_doc_count + 0.5));&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;END;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$$ LANGUAGE plpgsql;&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;
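
&lt;p&gt;To see why the extra 1 inside the logarithm matters, here is the same formula translated to Python alongside the classic BM25 IDF, which goes negative for any term that appears in more than half the documents:&lt;/p&gt;

```python
import math

def calculate_idf(term_doc_count, total_docs):
    """Same formula as the SQL function: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (total_docs - term_doc_count + 0.5) / (term_doc_count + 0.5))

def classic_bm25_idf(term_doc_count, total_docs):
    """Unmodified BM25 IDF, shown only for comparison."""
    return math.log((total_docs - term_doc_count + 0.5) / (term_doc_count + 0.5))

# A term found in 95 of 100 documents: the classic form goes negative,
# the modified form stays positive
print(classic_bm25_idf(95, 100))
print(calculate_idf(95, 100))

# Rare terms still score far higher than common ones
print(calculate_idf(2, 100) > calculate_idf(95, 100))
```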

&lt;p&gt;The &lt;code&gt;bm25_term_score&lt;/code&gt; function brings it all together, combining the term frequency, document length, and IDF score to produce the final score for a single term in a single document.&lt;/p&gt;
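
&lt;p&gt;The body of bm25_term_score is not shown in this post, but given the argument list used later in search_verses (term frequency, document length, IDF, average length, k1, b), a standard BM25 per-term score would look like this in Python. Treat it as a sketch of the idea, not the author&apos;s exact SQL:&lt;/p&gt;

```python
def bm25_term_score(tf, doc_len, idf, avg_len, k1=1.2, b=0.75):
    """Standard BM25 contribution of one term to one document's score."""
    norm = k1 * (1 - b + b * doc_len / avg_len)  # document length penalty
    return idf * tf * (k1 + 1) / (tf + norm)     # IDF times saturated TF

# More occurrences score higher; the same count in a longer document scores lower
print(bm25_term_score(3, 120, 1.0, 120) > bm25_term_score(1, 120, 1.0, 120))
print(bm25_term_score(3, 120, 1.0, 120) > bm25_term_score(3, 240, 1.0, 120))
```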

&lt;h4&gt;Putting it all Together&lt;/h4&gt;

&lt;p&gt;The final piece is the &lt;code&gt;search_verses&lt;/code&gt; function, which orchestrates the entire process. It takes a user&apos;s query text and tokenizes it; to handle typos, it uses the &lt;code&gt;pg_trgm&lt;/code&gt; extension to find terms in &lt;code&gt;term_stats&lt;/code&gt; that are similar to the (potentially misspelled) query terms. It then joins the query terms against the &lt;code&gt;verse_stats&lt;/code&gt; and &lt;code&gt;term_stats&lt;/code&gt; tables, calculates the &lt;code&gt;bm25_term_score&lt;/code&gt; for each matching term in each document, sums the scores to get a final relevance score, and returns the top results.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;CREATE OR REPLACE FUNCTION search_verses(&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    query_text TEXT,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    k1 FLOAT DEFAULT 1.2,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    b FLOAT DEFAULT 0.75,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    limit_val INTEGER DEFAULT 10,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    similarity_threshold FLOAT DEFAULT 0.3&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;) RETURNS TABLE (&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    verse_id BIGINT,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    score FLOAT,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    content TEXT&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;) AS $$&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;DECLARE&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    v_total_docs INTEGER;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    v_avg_length FLOAT;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;BEGIN&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    SELECT gs.total_docs, gs.avg_length&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    INTO v_total_docs, v_avg_length&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    FROM global_stats gs;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    &lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    RETURN QUERY&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    WITH raw_query_terms AS (&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        SELECT term&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        FROM tokenize_and_count(query_text)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    ),&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    query_terms AS (&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        SELECT DISTINCT corrected_term AS term&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        FROM raw_query_terms rqt&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        CROSS JOIN LATERAL (&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            SELECT ts.term AS corrected_term&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            FROM term_stats ts&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            WHERE similarity(rqt.term, ts.term) &gt;= similarity_threshold&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            ORDER BY similarity(rqt.term, ts.term) DESC&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            LIMIT 1&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        ) AS best_match&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    ),&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    term_scores AS (&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        SELECT&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            d.verse_id,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            bm25_term_score(&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                (d.terms-&gt;&gt;t.term)::INTEGER,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                d.length,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                calculate_idf(ts.doc_count, v_total_docs),&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                v_avg_length,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                k1,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;                b&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            ) AS term_score,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            v.text AS doc_text&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        FROM&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            verse_stats d&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        JOIN&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            verses v ON v.id = d.verse_id&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        JOIN&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            query_terms t ON d.terms ? t.term&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        JOIN&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;            term_stats ts ON ts.term = t.term&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    )&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    SELECT&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        ts.verse_id,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        SUM(ts.term_score) AS score,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        ts.doc_text AS content&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    FROM&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        term_scores ts&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    GROUP BY&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        ts.verse_id,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        ts.doc_text&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    ORDER BY&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        score DESC&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    LIMIT limit_val;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;END;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$$ LANGUAGE plpgsql;&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;h4&gt;Highlights&lt;/h4&gt;

&lt;p&gt;With everything fully operational, I decided to summarize what I&apos;ve done and benchmark it against something much simpler to get a feel for how the relevance scoring works in practice.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;BM25 Ranking:&lt;/strong&gt; Industry-standard relevance scoring with proper IDF calculation and document length normalization&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;PostgreSQL-Native:&lt;/strong&gt; Zero external dependencies, using built-in text search and JSONB&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Fuzzy Matching:&lt;/strong&gt; Trigram-based typo correction with configurable similarity threshold&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Text Analysis:&lt;/strong&gt; Complete pipeline with tokenization, stemming and stopword removal&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Incremental Updates:&lt;/strong&gt; Efficient document re-indexing with proper term statistics maintenance&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Optimized Storage:&lt;/strong&gt; JSONB term frequencies with GIN indexing for fast lookups&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Materialized Statistics:&lt;/strong&gt; Pre-computed global metrics for efficient scoring&lt;/li&gt;
&lt;/ul&gt;
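&lt;p&gt;The scoring in that first bullet is compact enough to sketch outside the database. Here is a language-agnostic Python sketch of the classic BM25 term score—the corpus statistics and defaults below are illustrative, not the exact values from my Postgres function:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;python&quot;&gt;
  &lt;pre class=&quot;language-python&quot;&gt;
    &lt;code class=&quot;language-python&quot;&gt;import math&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;def bm25_term_score(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    # IDF with the +0.5 smoothing from classic BM25: rare terms weigh more&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    # Term frequency dampened by k1 and normalized by document length via b&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    norm_tf = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_len))&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    return idf * norm_tf&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;# A document scores the sum over its matching query terms,&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;# mirroring SUM(ts.term_score) in the SQL above&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;score = sum(bm25_term_score(tf, df, 1000, 20, 25) for tf, df in [(2, 30), (1, 400)])&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;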

&lt;p&gt;To benchmark this I wrote a simple &lt;a href=&quot;https://github.com/toranb/encoder-search/blob/main/beir.py&quot;&gt;BEIR&lt;/a&gt; script that showed a clear improvement in relevance ranking.&lt;/p&gt;
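&lt;p&gt;For the first chart, NDCG@10 measures how much of the ideal ranking&apos;s discounted gain the actual top 10 captures. A minimal sketch of the metric (BEIR computes this for you; the rankings here are illustrative):&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;python&quot;&gt;
  &lt;pre class=&quot;language-python&quot;&gt;
    &lt;code class=&quot;language-python&quot;&gt;import math&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;def dcg(relevances):&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    # Discounted cumulative gain: earlier positions count more&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;def ndcg_at_k(ranked_relevances, k=10):&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    # DCG of the actual ranking divided by DCG of the ideal ordering&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    ideal = sorted(ranked_relevances, reverse=True)&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    ideal_dcg = dcg(ideal[:k])&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg &gt; 0 else 0.0&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;# A relevant document ranked first earns a perfect 1.0; buried lower, it scores less&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;print(ndcg_at_k([1, 0, 0]))&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;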

&lt;div class=&quot;benchmarks&quot;&gt;
    &lt;div class=&quot;metric-group&quot;&gt;
        &lt;div class=&quot;metric-label&quot;&gt;NDCG@10 (relevance)&lt;/div&gt;
        &lt;div class=&quot;bars&quot;&gt;
            &lt;div class=&quot;bar bm25&quot; style=&quot;width: 100%;&quot;&gt;BM25: 0.2940&lt;/div&gt;
        &lt;/div&gt;
        &lt;div class=&quot;bars&quot;&gt;
            &lt;div class=&quot;bar ilike&quot; style=&quot;width: 37.6%;&quot;&gt;ILIKE: 0.1105&lt;/div&gt;
        &lt;/div&gt;
        &lt;div class=&quot;improvement&quot;&gt;BM25 leads by +166.1%&lt;/div&gt;
    &lt;/div&gt;

    &lt;div class=&quot;metric-group&quot;&gt;
        &lt;div class=&quot;metric-label&quot;&gt;Precision@10 (accuracy of the top 10 results)&lt;/div&gt;
        &lt;div class=&quot;bars&quot;&gt;
            &lt;div class=&quot;bar bm25&quot; style=&quot;width: 100%;&quot;&gt;BM25: 0.2806&lt;/div&gt;
        &lt;/div&gt;
        &lt;div class=&quot;bars&quot;&gt;
            &lt;div class=&quot;bar ilike&quot; style=&quot;width: 62.3%;&quot;&gt;ILIKE: 0.1747&lt;/div&gt;
        &lt;/div&gt;
        &lt;div class=&quot;improvement&quot;&gt;BM25 leads by +60.6%&lt;/div&gt;
    &lt;/div&gt;

    &lt;div class=&quot;metric-group&quot;&gt;
        &lt;div class=&quot;metric-label&quot;&gt;Recall % (found relevant documents)&lt;/div&gt;
        &lt;div class=&quot;bars&quot;&gt;
            &lt;div class=&quot;bar bm25&quot; style=&quot;width: 100%;&quot;&gt;BM25: 66.7%&lt;/div&gt;
        &lt;/div&gt;
        &lt;div class=&quot;bars&quot;&gt;
            &lt;div class=&quot;bar ilike&quot; style=&quot;width: 42.1%;&quot;&gt;ILIKE: 28.1%&lt;/div&gt;
        &lt;/div&gt;
        &lt;div class=&quot;improvement&quot;&gt;BM25 leads by +38.6pp&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;The result is a fast, efficient, and typo-tolerant search engine built without any external dependencies. It was a fascinating journey into the mechanics of search, and it&apos;s incredibly satisfying to see it working so well. Hopefully, this sheds some light on how modern keyword search works and inspires you to see what you can build with the tools you already have.&lt;/p&gt;
</description>
                <pubDate>Sat, 16 Aug 2025 01:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2025/08/16/building-keyword-search-in-postgres/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2025/08/16/building-keyword-search-in-postgres/</guid>
            </item>
        
            <item>
                <title>Writing the BERT Encoder with Nx</title>
                <description>&lt;p&gt;I set out to build a &lt;a href=&quot;https://github.com/toranb/encoder-search&quot;&gt;BERT like encoder&lt;/a&gt; from scratch this year. But that&apos;s not what happened. Instead I spent months stumbling over my inexperience, hitting walls, abandoning assumptions, and ultimately leaning on pretrained weights just to get off the ground. This is that retrospective—every wrong turn, every silent bug, and the hard-won lessons that came from building something I barely understood when I started.&lt;/p&gt;

&lt;h2&gt;The Vision and the Naivety&lt;/h2&gt;

&lt;p&gt;My original goal was ambitious: write an encoder from the ground up, train it on biblical text using Masked Language Modeling, and build a hybrid search for the New Testament. I wanted to understand embeddings at the deepest level, and this project was the proving ground. With the &lt;a href=&quot;https://arxiv.org/abs/1810.04805&quot;&gt;original BERT paper&lt;/a&gt; in hand I set out to reproduce their success on a much smaller corpus—and I was wildly underestimating what lay ahead. Nothing about this would come together quickly, and early wins like moving from SGD to the &lt;a href=&quot;https://docs.pytorch.org/docs/stable/generated/torch.optim.Adam.html&quot;&gt;Adam optimizer&lt;/a&gt; only revealed the volume of problems hiding underneath.&lt;/p&gt;

&lt;p&gt;What followed was months of exploration, discovery, dead ends, and hard-won lessons. Like all of my adventures, the mistakes were the most interesting part of the story.&lt;/p&gt;

&lt;h2&gt;The First Wall: Padding Masks&lt;/h2&gt;

&lt;p&gt;The biggest trap I fell into early on was assuming the compiler would find bugs for me. I jumped straight into training runs with complex architectures, convinced that more layers meant better results. I should have started by trying to memorize 100 tokens over 10 epochs to validate simple base assumptions.&lt;/p&gt;

&lt;p&gt;For a while I did see training loss improve, but validation loss was inconsistent. I didn&apos;t train long enough to see the plateau, which gave me a false impression that my attention implementation was working. Eventually I found that learning a single verse was possible, but when I tried to learn 10 verses—with an overfitting, repetitive dataset just to prove it could work—everything fell apart.&lt;/p&gt;

&lt;p&gt;That&apos;s when I pulled back to inspect the math behind each component individually.&lt;/p&gt;

&lt;p&gt;I used IO.inspect to print out the vectors for a shorter verse. The last 20 tokens were clearly padding IDs. I traced query, key, and value projections and realized I wasn&apos;t excluding padding positions from attention.&lt;/p&gt;

&lt;p&gt;The model was learning from padding. That single masking error was hiding behind what looked like architecture problems and sent me down more than a few dead ends.&lt;/p&gt;

&lt;p&gt;The fix was an additive mask that sets padding positions to negative infinity before softmax, effectively zeroing them out:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;defn self_attention(input, w_query, w_key, w_value, w_out, attention_mask) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  # ... projection code ...&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  attention_scores = Nx.dot(q, [3], [0, 1], k, [3], [0, 1])&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  scaling_divisor = Nx.sqrt(head_dim)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  scaled_attention_scores = Nx.divide(attention_scores, scaling_divisor)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  # The fix: -1.0e8 makes padding positions effectively zero after softmax&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  additive_mask = Nx.select(attention_mask, 0.0, -1.0e8)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  masked_scores = Nx.add(scaled_attention_scores, additive_mask)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  attention_weights = softmax(masked_scores)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  # ... rest of attention ...&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;img src=&apos;/content/images/2025/masking.png&apos; alt=&apos;Attention mask matrix heatmap showing padding positions zeroed out with negative infinity&apos;/&gt;

&lt;p&gt;Once I properly implemented padding masks, performance jumped dramatically. The learning: there are no shortcuts—you need to verify your math step-by-step on a tiny dataset before scaling up.&lt;/p&gt;

&lt;h2&gt;The Second Wall: Gradient Chaos&lt;/h2&gt;

&lt;p&gt;For the longest time, I never computed or printed my gradients to evaluate them. I was hypnotized by my training and validation loss curves, completely ignoring the underlying mathematics.&lt;/p&gt;

&lt;p&gt;The symptom was that validation loss didn&apos;t have a nice curve. It would improve, then jump around, never settling into the steady descent I expected. So much code had been written at this point, and much of it wasn&apos;t properly validated.&lt;/p&gt;

&lt;p&gt;I spent some time digging into the &lt;a href=&quot;https://github.com/elixir-nx/polaris&quot;&gt;Polaris&lt;/a&gt; library so I could compute gradients, and with that additional information I was finally able to see the problem: gradient explosion.&lt;/p&gt;

&lt;p&gt;Before epoch 6 or 8, the gradient norm would be something like 12. But as training continued I would see 18, then 20, then 33—and it never came down. I consulted with Gemini about this pattern, and you might say I took the LLM&apos;s word as gospel: this was a bad signal indicating erratic learning.&lt;/p&gt;

&lt;p&gt;From here, I read about gradient clipping and discovered how the original BERT team clipped at 1.0. The fix was straightforward with Polaris:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;{init_fn, update_fn} =&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  Polaris.Updates.clip_by_global_norm(max_norm: 1.0)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  |&gt; Polaris.Updates.scale_by_adam()&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  |&gt; Polaris.Updates.add_decayed_weights(decay: 0.01)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  |&gt; Polaris.Updates.scale_by_schedule(schedule_fn)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;init_optimizer_state = init_fn.(initial_params)&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This single change transformed my erratic training into steady, predictable progress. The learning: your gradients are telling you a story. Listen to them.&lt;/p&gt;

&lt;h2&gt;The Third Wall: Memory Leaks&lt;/h2&gt;

&lt;p&gt;As I started to scale up my encoder, my RTX 4090 ran out of vRAM by epoch 8 or 10. I moved to the cloud with &lt;a href=&quot;https://www.runpod.io/&quot;&gt;RunPod&lt;/a&gt; using my &lt;a href=&quot;https://github.com/toranb/runpod-cuda12&quot;&gt;runpod-cuda12 setup&lt;/a&gt;, and even an 80GB H100 failed after 16 hours. That was the clue: this wasn&apos;t a hardware limit—it was a leak.&lt;/p&gt;

&lt;p&gt;The issue was simple: I had Nx.backend_deallocate calls, but not in the innermost loop, so tensors from the first batch of each epoch were never released. Memory grew every epoch until the run collapsed around epoch 8-10.&lt;/p&gt;

&lt;p&gt;After moving deallocation into the inner loop, I was able to cut my cloud spending and go back to training at home with just one more trick: I cracked open the Nx types.ex file in my deps directory and changed the default float from f32 to bf16. This is still a direct hack—as of this writing, the configuration doesn&apos;t yet exist in Nx—but it allowed me to run without compromise in terms of layers, dimensionality, or attention heads.&lt;/p&gt;

&lt;h2&gt;The Architecture Shift: Pre-Layer Normalization&lt;/h2&gt;

&lt;p&gt;While training loss was improving, validation loss wasn&apos;t moving with my bigger dataset. After fixing the padding mask issue, I set my sights on this problem along with trying to dial in the number of layers and attention heads.&lt;/p&gt;

&lt;p&gt;I originally had post-layer normalization—the pattern from the original BERT paper where you normalize after the residual connection. After some chatting with Claude about my architecture, it recommended I try pre-layer normalization instead.&lt;/p&gt;

&lt;p&gt;The difference is subtle but important. Post-LN (original BERT):&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;# Post-LN: normalize AFTER residual add&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;attn_out = self_attention(x, ...)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;add = residual_connection(x, attn_out)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;norm = layer_norm(add)&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Pre-LN (what I switched to):&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;# Pre-LN: normalize BEFORE sublayer&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;norm_attn = layer_norm(x)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;attn_out = self_attention(norm_attn, ...)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;add = residual_connection(x, attn_out)&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Why does this matter? In post-LN, gradients must flow through normalization after the residual add. As you stack layers (12 in my case), that compounding effect means gradients can either explode or vanish before they reach the early layers.&lt;/p&gt;

&lt;p&gt;Pre-LN keeps the residual path clean. The gradient can flow directly through x + sublayer_output without passing through normalization, while attention and FFN paths still get normalized.&lt;/p&gt;

&lt;img src=&apos;/content/images/2025/preln.png&apos; alt=&apos;Diagram comparing the gradient flow in Post-LN versus Pre-LN transformer blocks&apos;/&gt;

&lt;p&gt;The original BERT team used post-LN with 12 layers, but they also had massive compute, huge datasets, and the ability to tune endlessly. With fewer than 500,000 examples, I needed stability. Pre-LN gave me predictable gradients and a much more forgiving learning rate.&lt;/p&gt;

&lt;h2&gt;The Recipe Ceiling&lt;/h2&gt;

&lt;p&gt;After countless training runs and months of tuning the encoder, validation loss was stuck around 1.6. I&apos;d tried everything I could think of: adjusting dropout rates, tweaking learning rates, adding layers or attention heads, increasing weight decay. Nothing broke through the floor. I was convinced I&apos;d hit a data ceiling—that 400,000 training examples simply weren&apos;t enough.&lt;/p&gt;

&lt;p&gt;Then I pulled in a fresh pair of eyes for a detailed code review, stepping through each component of the training loop. Within hours, two foundational bugs surfaced that had been compounding since my first training run.&lt;/p&gt;

&lt;p&gt;First, my dropout implementation was fundamentally wrong. I used a normal distribution for the mask, which quietly dropped far more activations than intended. The model was training under a much harsher regularizer than I thought, and the surviving activations were scaled as if nothing was wrong. It was a silent, destructive bug that made everything look like a data problem.&lt;/p&gt;
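&lt;p&gt;A quick simulation makes the damage concrete. Assuming the mask compared a random draw against the rate (an illustrative reconstruction, not my exact Elixir code), uniform draws drop the intended 16% of activations while standard-normal draws drop roughly 56%:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;python&quot;&gt;
  &lt;pre class=&quot;language-python&quot;&gt;
    &lt;code class=&quot;language-python&quot;&gt;import random&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;random.seed(42)&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;rate = 0.16  # the intended dropout rate&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;n = 200_000&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;# Intended: uniform draws on [0, 1) fall below the rate about 16% of the time&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;uniform_dropped = sum(1 for _ in range(n) if rate &gt; random.random()) / n&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;# The bug: standard-normal draws fall below 0.16 about 56% of the time&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;normal_dropped = sum(1 for _ in range(n) if rate &gt; random.gauss(0.0, 1.0)) / n&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;print(uniform_dropped, normal_dropped)  # roughly 0.16 vs 0.56&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;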

&lt;p&gt;The fix was textbook dropout—switch to a uniform distribution with a proper Bernoulli mask:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;defn dropout(input, key, training) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  if training do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    mask_shape = Nx.shape(input)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    random_vals = Nx.Random.uniform_split(key, 0.0, 1.0, shape: mask_shape)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    keep_prob = Nx.subtract(1.0, @dropout_rate)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    keep_mask = Nx.less(random_vals, keep_prob)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    scale_factor = Nx.divide(1.0, keep_prob)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    input |&gt; Nx.multiply(keep_mask) |&gt; Nx.multiply(scale_factor)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  else&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    input&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  end&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Second, my masked language modeling targets were static. Masking happened once during preprocessing, and every epoch saw the exact same masked positions with the same targets. The model could memorize &quot;token 47 is always masked in example 312&quot; rather than learning general language structure. With static masks, my 400,000 examples were effectively a much smaller dataset because the supervised signal never varied.&lt;/p&gt;

&lt;p&gt;The fix was to separate tokenization from masking—tokenize once during preprocessing, then apply masks dynamically per batch at training time. This mirrors standard BERT practice and means the model sees different masked positions every epoch, turning each example into many variants of itself.&lt;/p&gt;
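&lt;p&gt;A sketch of that per-batch step, following the standard BERT recipe of masking 15% of positions (80% become the mask token, 10% a random token, 10% stay unchanged). The helper and IDs here are illustrative rather than my actual Elixir implementation:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;python&quot;&gt;
  &lt;pre class=&quot;language-python&quot;&gt;
    &lt;code class=&quot;language-python&quot;&gt;import random&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;MASK_ID = 103      # [MASK] in the BERT-base uncased vocab&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;VOCAB_SIZE = 30_522&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;def dynamic_mask(token_ids, mask_prob=0.15, rng=random):&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    # Fresh masked positions every time the batch is seen&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    inputs = list(token_ids)&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    labels = [-100] * len(token_ids)  # -100 marks positions the loss ignores&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    for i, tok in enumerate(token_ids):&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;        if mask_prob &gt; rng.random():&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;            labels[i] = tok  # supervise this position with the original token&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;            roll = rng.random()&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;            if roll &gt; 0.2:&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;                inputs[i] = MASK_ID  # 80% of masked positions&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;            elif roll &gt; 0.1:&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10% random token&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;            # remaining 10%: leave the original token in place&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    return inputs, labels&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Calling this inside the training loop gives every epoch a different supervised signal for the same verse.&lt;/p&gt;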

&lt;p&gt;The results were immediate and dramatic:&lt;/p&gt;

&lt;img src=&apos;/content/images/2025/curves.png&apos; alt=&apos;Loss curves before and after fixing dropout and dynamic masking&apos;/&gt;

&lt;p&gt;By epoch 8, Run 8 already beat the best validation loss across all seven previous runs. By epoch 12, it reached 1.124—a 29% improvement over a ceiling I&apos;d spent months trying to break through. The overfitting I&apos;d been fighting wasn&apos;t a data ceiling at all. It was a recipe ceiling.&lt;/p&gt;

&lt;h2&gt;The Missing Projection: Weight Tying&lt;/h2&gt;

&lt;p&gt;With my loss curves looking better than ever, I thought I was finally at the end of my journey. Unfortunately, my evals showed a lack of depth because of another critical detail I skipped over early on.&lt;/p&gt;

&lt;p&gt;At some point I came to grips with the reality that I couldn&apos;t train weights from scratch like the team behind BERT—I didn&apos;t have a dataset anywhere near 3 billion tokens. So instead I gave myself a jump start by borrowing the initial weights from all 12 layers of BERT-base. The trouble was that I forgot to include the vocab projection weights, so the final MLM head failed to project my learned geometry into the proper vocabulary.&lt;/p&gt;

&lt;p&gt;My evaluation suite told the whole story. The encoder clearly understood theology—75% accuracy on contrastive tests—but couldn&apos;t predict the right vocabulary words at masked positions:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;DOC_001: expected=grace    got=[your(0.37), their(0.20), per(0.10), rep(0.05), ab(0.03)]&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;DOC_010: expected=image    got=[the(0.98), and(0.01), ##the(0.00), ##d(0.00), ##th(0.00)]&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;DOC_025: expected=kingdom  got=[father(0.999), ##otte(0.00), beg(0.00), ##k(0.00), ab(0.00)]&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The model was predicting function words, subword fragments, and punctuation instead of theological terms. Even at k=50, not a single doctrinal target appeared. The encoder&apos;s internal representations were strong, but the output projection to vocabulary was completely broken because I had initialized my MLM output weights randomly and never tied them to the embedding matrix.&lt;/p&gt;

&lt;p&gt;In a BERT encoder, two matrices deal with the vocabulary. The embedding matrix maps token IDs to 768-dimensional vectors at the input. The output projection maps those 768-dimensional hidden states back to vocabulary logits at masked positions. These two matrices perform inverse operations—one goes from vocabulary to hidden space, the other goes back. Weight tying makes this relationship explicit by using the same matrix for both:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;# Without weight tying: two independent matrices&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;logits = Nx.dot(hidden_states, out_w) |&gt; Nx.add(out_b)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;# With weight tying: one matrix, transposed for output&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;logits = Nx.dot(hidden_states, Nx.transpose(embeddings)) |&gt; Nx.add(out_b)&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The out_w parameter disappears entirely. The embedding matrix serves double duty, and every gradient that updates the embedding for &quot;grace&quot; simultaneously updates how the model predicts &quot;grace&quot; at masked positions. Without weight tying, I was asking a random matrix to decode a learned representation space—hoping that 768 x 30,522 parameters would independently converge to match the encoder&apos;s geometry. They never did.&lt;/p&gt;

&lt;p&gt;This also cut ~23 million trainable parameters, which meant less memory and smaller optimizer state. More importantly, it eliminated a failure mode: if the encoder knows what &quot;grace&quot; means, it can now predict &quot;grace&quot; at a masked position, because the same vector represents both.&lt;/p&gt;
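&lt;p&gt;The ~23 million figure falls straight out of the shapes—BERT-base uses a 30,522-token vocabulary and 768-dimensional hidden states:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;python&quot;&gt;
  &lt;pre class=&quot;language-python&quot;&gt;
    &lt;code class=&quot;language-python&quot;&gt;# Parameters the untied out_w matrix would need: one weight per (hidden, vocab) pair&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;vocab_size, hidden_dim = 30_522, 768&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;saved = vocab_size * hidden_dim&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;print(saved)  # 23440896 weights that tying removes from the optimizer&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;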

&lt;h2&gt;Aligning for Search: Contrastive Training&lt;/h2&gt;

&lt;p&gt;With a solid MLM training behind me, I set my sights on the work ahead to align this encoder for search tasks. MLM taught the model to understand tokens in context—&quot;fill in the blank&quot;—but search requires something different. I needed similar meanings to cluster together in the embedding space so that a query and a relevant verse would land near each other.&lt;/p&gt;

&lt;p&gt;To get there, I put together a dataset of paired verses drawn from different Bible translations. The same verse in the NLT and NIV is a natural positive pair—same meaning, different wording. I then took the final MLM weights and trained with a contrastive loss function that would pull matching pairs closer together while pushing everything else apart.&lt;/p&gt;

&lt;p&gt;The approach works by encoding both sides of a pair through the same encoder, mean-pooling the token outputs into a single sentence vector (with pad masking so padding doesn&apos;t count), and then L2-normalizing so cosine similarity becomes a simple dot product. For a batch of 64 pairs, this builds a 64x64 similarity matrix where the diagonal entries are the correct matches and everything off-diagonal is an implicit negative.&lt;/p&gt;
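&lt;p&gt;That diagonal setup is the standard in-batch contrastive (InfoNCE-style) loss. A small pure-Python sketch with toy vectors standing in for the pooled sentence embeddings—the temperature value here is illustrative:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;python&quot;&gt;
  &lt;pre class=&quot;language-python&quot;&gt;
    &lt;code class=&quot;language-python&quot;&gt;import math&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;def l2_normalize(v):&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    norm = math.sqrt(sum(x * x for x in v)) or 1.0&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    return [x / norm for x in v]&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;def contrastive_loss(queries, docs, temperature=0.05):&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    # Row i pairs with column i; every other column is an implicit negative&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    q = [l2_normalize(v) for v in queries]&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    d = [l2_normalize(v) for v in docs]&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    # Cosine similarity matrix as dot products of unit vectors, scaled by temperature&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    sims = [[sum(a * b for a, b in zip(qi, dj)) / temperature for dj in d] for qi in q]&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    # Cross-entropy against the diagonal: -log softmax(row)[i]&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    total = 0.0&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    for i, row in enumerate(sims):&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;        log_z = math.log(sum(math.exp(s) for s in row))&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;        total += log_z - row[i]&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    return total / len(sims)&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;With 64 pairs this builds the 64x64 matrix described above, and the loss falls as each verse moves closer to its own translation than to anything else in the batch.&lt;/p&gt;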

&lt;p&gt;While this was the simplest and least error-prone post-training step, it still came with some valuable lessons. The one that stuck with me is the key difference in the loss functions: in this training run the model is no longer predicting individual tokens. It&apos;s answering: &quot;which sentence out of these 64 (the batch size) is the true partner?&quot;&lt;/p&gt;

&lt;p&gt;This fine-tuning step took the encoder from understanding words in context to understanding that two differently worded verses about the same concept should live in the same neighborhood of embedding space.&lt;/p&gt;

&lt;h2&gt;The Last Mile: Query-to-Verse Alignment&lt;/h2&gt;

&lt;p&gt;After contrastive training, I had an encoder that grouped similar verses together, but there was still a gap between how the encoder organized scripture and how a person actually searches. People usually search with short phrases or even questions shaped by their own vocabulary and experience with other software, which often doesn&apos;t overlap with the biblical text at all.&lt;/p&gt;

&lt;p&gt;To close that gap, I generated a dataset of 1,400 high-quality query-to-verse mappings. Each pair linked a natural search phrase to the verse it was really asking about. For example, &quot;assurance of salvation&quot; mapped to Romans 8:1—&quot;there is therefore now no condemnation for those who are in Christ Jesus.&quot; The verse never mentions the word &quot;salvation,&quot; but it speaks directly to that theology.&lt;/p&gt;

&lt;p&gt;The training used the same contrastive loss as before, but with a fundamentally different dataset. Instead of teaching the encoder that two translations of the same verse should be neighbors, I was teaching it that a human search query and its best answer should be neighbors. This bent the geometry one final time—pulling the embedding space toward how people actually look for scripture, not just how scripture relates to itself.&lt;/p&gt;

&lt;p&gt;This was the smallest dataset of the three training phases, but arguably the most targeted. The 1,400 pairs acted as precise instructions: when someone asks about &quot;forgiveness after failure,&quot; the encoder should point toward the prodigal son, not just verses that happen to contain the word &quot;forgiveness.&quot; Quality mattered far more than quantity here, requiring loads of synthetic data engineering and genuine theological care to validate the mappings before training.&lt;/p&gt;

&lt;h2&gt;The Missing Baseline: BM25&lt;/h2&gt;

&lt;p&gt;If I had to do it all over again, I would have started with &lt;a href=&quot;/blog/archive/2025/08/16/building-keyword-search-in-postgres/&quot;&gt;BM25&lt;/a&gt; as my baseline from day one—not bolted it on toward the end as an afterthought. Keyword search isn&apos;t just a fallback. It&apos;s a genuinely powerful complement to semantic search, and understanding why made the final hybrid experience far better than either approach alone.&lt;/p&gt;

&lt;p&gt;BM25 and semantic search fail in complementary ways. BM25 excels when the user&apos;s words overlap with the text—exact phrases, proper nouns, specific references. The encoder excels when they don&apos;t: thematic queries like &quot;assurance of salvation&quot; or &quot;fruit of the spirit&quot; surface verses that speak to the concept without containing the exact keywords. The keyword signal anchors results for literal queries while the encoder lifts thematic results that pure text matching would miss.&lt;/p&gt;

&lt;p&gt;The challenge with combining them is that their scores live on completely different scales, so a naive weighted sum is notoriously difficult to tune. Reciprocal Rank Fusion sidesteps the problem entirely by ignoring raw scores and working only with rank positions—a document ranked highly by both rises to the top without any normalization at all.&lt;/p&gt;
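&lt;p&gt;Reciprocal Rank Fusion is only a few lines. A sketch—k=60 is the constant from the original RRF paper, and the document IDs below are made up:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;python&quot;&gt;
  &lt;pre class=&quot;language-python&quot;&gt;
    &lt;code class=&quot;language-python&quot;&gt;def rrf(rankings, k=60):&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    # Fuse ranked lists by summing 1 / (k + rank); raw scores never enter the math&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    scores = {}&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    for ranking in rankings:&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;        for rank, doc_id in enumerate(ranking, start=1):&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;    return sorted(scores, key=scores.get, reverse=True)&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;bm25_hits = [&quot;rom_8_1&quot;, &quot;jn_3_16&quot;, &quot;eph_2_8&quot;]    # keyword ranking&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;vector_hits = [&quot;rom_8_1&quot;, &quot;ps_23_1&quot;, &quot;jn_3_16&quot;]  # semantic ranking&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;# rom_8_1 wins by ranking first in both lists; no score normalization required&lt;/code&gt;
    &lt;code class=&quot;language-python&quot;&gt;print(rrf([bm25_hits, vector_hits]))&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;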

&lt;p&gt;Starting with BM25 as the baseline would have given me a strong search experience much earlier and a clearer benchmark for measuring what the encoder actually added.&lt;/p&gt;

&lt;h2&gt;The Data Reckoning&lt;/h2&gt;

&lt;p&gt;More than anything, I spent months building, curating, and tuning my dataset. I knew I couldn&apos;t achieve the same size and diversity the original BERT team had, but I was pleasantly surprised with what I could accomplish with just under 500,000 unique training examples.&lt;/p&gt;

&lt;p&gt;The trouble was I made several mistakes that cost me significant time:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False diversity.&lt;/strong&gt; I wanted a diverse set of Bible text and found myself using different translations—NLT, NIV, ESV, NET. This seemed like a good idea, but I didn&apos;t actually look at the dataset closely. While subtle differences exist between translations, they share similar tokens and themes. They aren&apos;t radically different. I later learned I should have included genuinely different text like Mere Christianity, rich theological writing that isn&apos;t the original Bible text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quote noise.&lt;/strong&gt; Biblical text has a lot of quotation marks, and I found this was distracting to the masked token prediction with my simplistic encoder. The model was learning patterns around quote boundaries rather than semantic meaning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lazy first pass.&lt;/strong&gt; While I expanded to books over time, I didn&apos;t do my best work on the first pass. A lot of that text data was less than ideal until I circled back to do real cleaning—which only happened by actually reading the text in those contextualized chunks.&lt;/p&gt;

&lt;p&gt;If I had to synthesize what I would do differently, it comes down to this: look at the data more. I underestimated the time involved in data work, which has become a recurring theme throughout my machine learning journey.&lt;/p&gt;

&lt;h2&gt;The Invisible Bugs&lt;/h2&gt;

&lt;p&gt;Looking back across every wall I hit, the pattern was always the same: something that appeared to work was silently wrong.&lt;/p&gt;

&lt;p&gt;Padding masks didn&apos;t throw errors—attention just quietly attended to nonsense. Dropout didn&apos;t crash—it just dropped 56% of activations instead of 16%. Static masking didn&apos;t fail—it let the model memorize targets instead of learning language. The output projection didn&apos;t break—it just never converged to match the encoder&apos;s geometry. In every case, training loss went down and nothing in the logs said something was broken.&lt;/p&gt;

&lt;p&gt;The most dangerous bugs in machine learning are the ones that look like they&apos;re working. If you&apos;re building a model from scratch, the single most valuable habit I can offer is this: validate every component in isolation on a tiny dataset before scaling up. Print the vectors. Inspect the gradients. Check that your dropout rate actually drops what you think it drops. And when you hit what looks like a ceiling, get a fresh pair of eyes on your code before you assume it&apos;s a data problem.&lt;/p&gt;

&lt;p&gt;You can find the source code on &lt;a href=&quot;https://github.com/toranb/encoder-search&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
</description>
                <pubDate>Sun, 22 Jun 2025 01:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2025/06/22/bert-from-scratch-with-nx/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2025/06/22/bert-from-scratch-with-nx/</guid>
            </item>
        
            <item>
                <title>Adventures with Synthetic Data</title>
                <description>&lt;p&gt;After &lt;a href=&quot;https://www.youtube.com/watch?v=-iZIZHgHa5M&quot;&gt;ElixirConf&lt;/a&gt; I found myself increasingly immersed in the world of fine-tuning. This curiosity soon led me on a journey to explore various use cases, eventually drawing me into the realm of synthetic data.&lt;/p&gt;

&lt;p&gt;One pivotal moment in this exploration was &lt;a href=&quot;https://edwarddonner.com/2024/01/11/fine-tune-llama-for-text-messages-part-1&quot;&gt;a post by Edward Donner&lt;/a&gt; that detailed his experience training a language model to mimic his unique texting style. The majority of my time was soon dedicated to cleaning raw text message data in order to create a high-quality dataset for fine-tuning.&lt;/p&gt;

&lt;p&gt;While &lt;a href=&quot;https://www.youtube.com/watch?v=R0VJIW0IYPo&quot;&gt;the talk&lt;/a&gt; was focused on synthetic data generation, I also delved into the intricacies of fine-tuning with &lt;a href=&quot;https://github.com/unslothai/unsloth&quot;&gt;Unsloth&lt;/a&gt;, evaluation strategies, and even model deployment and serving with &lt;a href=&quot;https://github.com/elixir-nx/nx&quot;&gt;Nx&lt;/a&gt;. I owe much of the data generation techniques I shared to &lt;a href=&quot;https://x.com/jon_durbin&quot;&gt;Jon Durbin&lt;/a&gt;, whose DPO approach in particular proved instrumental to my work. Lastly, a big thank you to &lt;a href=&quot;https://paraxial.io/&quot;&gt;Paraxial IO&lt;/a&gt; for giving me the opportunity to share my insights!&lt;/p&gt;

&lt;div style=&quot;padding-top: 20px; padding-bottom: 20px;&quot;&gt;
  &lt;div style=&quot;margin: auto; width: 90vw; height: 50vw; max-width: 768px; max-height: 432px&quot;&gt;
    &lt;iframe width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;https://www.youtube.com/embed/R0VJIW0IYPo?si=_DC60mccDxzhyIHA&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen&gt;&lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;The Python source code for fine-tuning with Unsloth is on &lt;a href=&quot;https://github.com/toranb/sloth&quot;&gt;Github&lt;/a&gt; for anyone interested. I also put the Elixir code for the &lt;a href=&quot;https://github.com/toranb/mistral-chat-f16&quot;&gt;chat app&lt;/a&gt; shown in the talk on Github.&lt;/p&gt;
</description>
                <pubDate>Tue, 07 May 2024 01:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2024/05/07/adventures-with-synthetic-data/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2024/05/07/adventures-with-synthetic-data/</guid>
            </item>
        
            <item>
                <title>Retrieval Augmented Generation Cohort</title>
                <description>&lt;p&gt;On November 1st I joined the first &lt;a href=&quot;https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/&quot;&gt;RAG&lt;/a&gt; cohort from &lt;a href=&quot;https://mlops.community/&quot;&gt;MLOps.Community&lt;/a&gt; and went hard for 8 weeks building a proof of concept that combined everything from &lt;a href=&quot;https://huggingface.co/blog/getting-started-with-embeddings&quot;&gt;embedding models&lt;/a&gt;, &lt;a href=&quot;https://developers.google.com/machine-learning/resources/prompt-eng&quot;&gt;prompting&lt;/a&gt;, and retrieval to &lt;a href=&quot;https://qdrant.tech/articles/hybrid-search/#the-precise-search-the-re-ranking-strategy&quot;&gt;ranking&lt;/a&gt; with a cross encoder to support &lt;a href=&quot;https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search.py#L37-L59&quot;&gt;hybrid search&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I want to extend a public &quot;thank you&quot; to &lt;a href=&quot;https://twitter.com/RahulParundekar&quot;&gt;Rahul Parundekar&lt;/a&gt; for all his hard work ahead of the long journey, his consistent encouragement, and the 1v1 coaching! If anyone needs a machine learning engineer and coach, don&apos;t hesitate to get in touch with Rahul.&lt;/p&gt;

&lt;p&gt;I put together a succinct version of the &lt;a href=&quot;https://vimeo.com/900118849&quot;&gt;final presentation&lt;/a&gt; to document my learning, commitments, and growth, and ultimately to keep myself accountable. The entire experience simultaneously energized me and propelled me further down the road of machine learning!&lt;/p&gt;

&lt;div style=&quot;padding-top: 20px; padding-bottom: 20px;&quot;&gt;
  &lt;div style=&quot;margin: auto; width: 90vw; height: 50vw; max-width: 768px; max-height: 432px&quot;&gt;
    &lt;iframe src=&quot;https://player.vimeo.com/video/900118849&quot; style=&quot;width:100%;height:100%;&quot; frameborder=&quot;0&quot; allow=&quot;autoplay; fullscreen; picture-in-picture&quot; allowfullscreen&gt;&lt;/iframe&gt;
  &lt;/div&gt;
  &lt;script src=&quot;https://player.vimeo.com/api/player.js&quot;&gt;&lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;The source code is up on &lt;a href=&quot;https://github.com/toranb/rag-n-drop&quot;&gt;Github&lt;/a&gt; for anyone who wants to see the final example with hybrid search.&lt;/p&gt;
</description>
                <pubDate>Sat, 06 Jan 2024 01:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2024/01/06/retrieval-augmented-generation-cohort/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2024/01/06/retrieval-augmented-generation-cohort/</guid>
            </item>
        
            <item>
                <title>Fine tuning language models with Axon</title>
                <description>&lt;p&gt;In 2023 I spent a lot of time at the intersection of software engineering and machine learning hoping to uncover the next great opportunity for automation. &lt;a href=&quot;https://www.youtube.com/watch?v=-iZIZHgHa5M&quot;&gt;My talk&lt;/a&gt; at &lt;a href=&quot;https://2023.elixirconf.com/&quot;&gt;ElixirConf US&lt;/a&gt; documents my nearly year-long effort to deconstruct deep learning for those who feel overwhelmed by the technical aspects.&lt;/p&gt;

&lt;p&gt;I started with &lt;a href=&quot;https://blog.codinghorror.com/fizzbuzz-the-programmers-stairway-to-heaven/&quot;&gt;FizzBuzz&lt;/a&gt; because it was a problem most software engineers had some familiarity with, which allowed me to subtly bridge the gap to more complex concepts. This widely known interview question was the perfect primer on the subject because underneath the diverse solutions sits a well-understood classification problem that provides an optimal foundation for machine learning.&lt;/p&gt;
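&lt;p&gt;The labeling side of that classification problem fits in a few lines. A minimal Python sketch (the talk itself used Elixir, and the binary-digit encoding here is just one common choice, not necessarily the one from the talk): every integer maps to one of four class labels, and its binary digits can serve as input features.&lt;/p&gt;

```python
def fizzbuzz_label(n):
    # Four classes: 0 = the number itself, 1 = "fizz", 2 = "buzz", 3 = "fizzbuzz"
    if n % 15 == 0:
        return 3
    if n % 3 == 0:
        return 1
    if n % 5 == 0:
        return 2
    return 0

def encode(n, bits=10):
    # One common input encoding for a small network: the binary digits of n
    return [(n >> i) & 1 for i in range(bits)]

labels = [fizzbuzz_label(n) for n in range(1, 16)]
print(labels)  # [0, 0, 1, 0, 2, 1, 0, 0, 1, 2, 0, 1, 0, 0, 3]
```

With `fizzbuzz_label` as the target and `encode(n)` as the features you can generate an arbitrarily large labeled dataset, which is exactly what makes the problem a friendly on-ramp to supervised learning.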

&lt;p&gt;After the more introductory concepts I discussed fine-tuning pre-trained models for text classification. I went into detail about creating a labeled dataset, tokenization, transforming text into embeddings and, finally, fine-tuning the model with Axon. I emphasized the critical role of data quality, the challenges that come with preparing or cleaning data, and the nuances of encoding text for machine learning models.&lt;/p&gt;

&lt;p&gt;Toward the end of the talk I spoke about the trend towards off-the-shelf, open source models, and the opportunities for developers to upskill. For those eager to delve deeper, I recommend &lt;a href=&quot;https://www.manning.com/books/grokking-deep-learning&quot;&gt;Grokking Deep Learning&lt;/a&gt; as an invaluable resource. This book greatly simplified the topic and inspired me to learn about neural networks and share with others.&lt;/p&gt;

&lt;p&gt;My goal was to make deep learning and model fine-tuning more approachable, especially for those in the Elixir community. While this talk was presented at ElixirConf, I think it&apos;s applicable to anyone who considers themselves unfamiliar with the basics of machine learning.&lt;/p&gt;

&lt;p&gt;Looking back, this journey has been about more than just acquiring knowledge; it&apos;s about sharing it, making the path less intimidating, and most importantly, showing how software engineering and machine learning collide to make way for innovation.&lt;/p&gt;

&lt;div style=&quot;padding-top: 20px; padding-bottom: 20px;&quot;&gt;
  &lt;div style=&quot;margin: auto; width: 90vw; height: 50vw; max-width: 768px; max-height: 432px&quot;&gt;
    &lt;iframe width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;https://www.youtube.com/embed/-iZIZHgHa5M?si=QUUNTUtgAVDdfPfz&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; allowfullscreen&gt;&lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;The source code is up on &lt;a href=&quot;https://github.com/toranb/elixirconf2023&quot;&gt;Github&lt;/a&gt; for anyone who wants to see a few of those examples in more detail. The full transcript for the talk can be downloaded &lt;a href=&quot;/content/presentations/2024/finetune.txt&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I did share a complete FizzBuzz implementation I put together with &lt;a href=&quot;https://gist.github.com/toranb/56a3e65ca81fba1c4f6c92c6f1857681&quot;&gt;Nx&lt;/a&gt; for those who want to see the most basic numerical computing solution in Elixir. The &lt;a href=&quot;https://gist.github.com/toranb/e5c48565e83e4baaaf2c5850531a8a58&quot;&gt;Axon&lt;/a&gt; example shows a more declarative approach that ultimately solves the same problem.&lt;/p&gt;
</description>
                <pubDate>Thu, 02 Nov 2023 01:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2023/11/02/fine-tuning-language-models-with-axon/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2023/11/02/fine-tuning-language-models-with-axon/</guid>
            </item>
        
            <item>
                <title>Fine tune Mistral 7B with the RTX 4090 and serve it with Nx</title>
                <description>&lt;p&gt;Fine-tuning with &lt;a href=&quot;https://hexdocs.pm/bumblebee/fine_tuning.html&quot;&gt;Bumblebee&lt;/a&gt; is great but large models such as &lt;a href=&quot;https://huggingface.co/mistralai/Mistral-7B-v0.1&quot;&gt;Mistral 7B&lt;/a&gt; demand over 100GB of vRAM to fine tune with full precision. To efficiently fine-tune this on a single RTX 4090 with only 24GB of vRAM, I turned to the open source Python project &lt;a href=&quot;https://github.com/Lightning-AI/lit-gpt&quot;&gt;lit-gpt&lt;/a&gt;. This approach enabled me to fine-tune locally, providing several advantages including fast feedback and the ability to keep proprietary data from external providers.&lt;/p&gt;

&lt;h3&gt;Setup&lt;/h3&gt;

&lt;p&gt;Although the process is well &lt;a href=&quot;https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/download_mistral.md&quot;&gt;documented&lt;/a&gt;, I decided to outline the steps, as much for myself as for anyone else.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ git clone https://github.com/Lightning-AI/lit-gpt lit&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ cd lit&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ git checkout bf60124fa72a56436c7d4fecc093c7fc48e84433&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ pip install -r requirements.txt&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ python3 scripts/download.py --repo_id mistralai/Mistral-7B-v0.1&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ python3 scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/mistralai/Mistral-7B-v0.1&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Data engineering&lt;/h3&gt;

&lt;p&gt;Next we need a dataset to fine tune the model with. Unlike the &lt;a href=&quot;https://toranbillups.com/blog/archive/2023/10/15/fine-tune-llama-2-and-serve-with-nx/&quot;&gt;llama 2 example&lt;/a&gt;, where I fine tuned for dialog, I instead wanted to fine tune Mistral 7B for capability to see what the model was able to learn. I &lt;a href=&quot;https://huggingface.co/jondurbin/airoboros-m-7b-3.1.2&quot;&gt;found a great fine tuned model&lt;/a&gt; worth emulating that creates expressions in JSON that &lt;a href=&quot;https://pypi.org/project/mathjson-solver/&quot;&gt;mathjson_solver&lt;/a&gt; can solve. The dataset has questions and answers labeled with `instruction` and `output` respectively.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;[&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  {&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    &quot;input&quot;: &quot;&quot;,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    &quot;instruction&quot;: &quot;Create a MathJSON solution to the following:\nPhillip is taking a math test and an English test on Monday. The math test has 40 questions and he gets 75% of them right. The English test has 50 questions and he gets 98% of them right. How many total questions does he get right?&quot;,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    &quot;output&quot;: &quot;&lt;mathjson&gt;\n[\n  \&quot;Add\&quot;,\n  [\n    \&quot;Multiply\&quot;,\n    40,\n    0.75\n  ],\n  [\n    \&quot;Multiply\&quot;,\n    50,\n    0.98\n  ]\n]\n&lt;/mathjson&gt;&quot;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  }&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;]&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;With the instruction JSON in hand, we copy the file into the data/alpaca directory and run a script to prepare the dataset for fine tuning.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ mkdir -p data/alpaca&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ cd data/alpaca&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ cp ~/somefolder/demo.json .&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ cd ../../&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ python3 scripts/prepare_alpaca.py --checkpoint_dir checkpoints/mistralai/Mistral-7B-v0.1 --data_file_name demo.json&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Fine tune Mistral 7B&lt;/h3&gt;

&lt;p&gt;Once the data is split into test and training sets we are finally ready to fine tune Mistral 7B. It&apos;s worth mentioning that we are not fine tuning with full precision because we are tuning with a single &lt;a href=&quot;https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4090/&quot;&gt;RTX 4090 24GB&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ python3 finetune/lora.py --data_dir data/alpaca --checkpoint_dir checkpoints/mistralai/Mistral-7B-v0.1 --precision bf16-true --quantize bnb.nf4&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;After the fine tuning process we need to merge the weights.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ mkdir -p out/lora_merged/Mistral-7B-v0.1&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ python3 scripts/merge_lora.py --checkpoint_dir checkpoints/mistralai/Mistral-7B-v0.1 --lora_path out/lora/alpaca/lit_model_lora_finetuned.pth --out_dir out/lora_merged/Mistral-7B-v0.1&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;To run this model we first need to copy over a few files from the original model.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ cd out/lora_merged/Mistral-7B-v0.1&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ cp ~/lit/checkpoints/mistralai/Mistral-7B-v0.1/tokenizer.model .&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ cp ~/lit/checkpoints/mistralai/Mistral-7B-v0.1/*.json .&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Evaluate&lt;/h3&gt;

&lt;p&gt;Before we serve the model with Nx it&apos;s important to evaluate it first. This step is optional, but it offers a simple way to verify the model has learned something.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ pip install sentencepiece&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ python3 chat/base.py --checkpoint_dir out/lora_merged/Mistral-7B-v0.1&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Serving with Nx&lt;/h3&gt;

&lt;p&gt;If the model is performing well enough we can pull over the two PyTorch model bin files and copy the config file so Bumblebee can find it.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ cd out/lora_merged/Mistral-7B-v0.1&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ cp ~/lit/checkpoints/mistralai/Mistral-7B-v0.1/pytorch_model-00001-of-00002.bin .&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ cp ~/lit/checkpoints/mistralai/Mistral-7B-v0.1/pytorch_model-00002-of-00002.bin .&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ cp lit_config.json config.json&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;To test this end to end we point Nx at the file system instead of pulling Mistral 7B from &lt;a href=&quot;https://huggingface.co&quot;&gt;Hugging Face&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def serving() do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  mistral = {:local, &quot;/home/toranb/lit/out/lora_merged/Mistral-7B-v0.1&quot;}&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  {:ok, spec} = Bumblebee.load_spec(mistral, module: Bumblebee.Text.Mistral, architecture: :for_causal_language_modeling)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  {:ok, model_info} = Bumblebee.load_model(mistral, spec: spec, backend: {EXLA.Backend, client: :host})&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  {:ok, tokenizer} = Bumblebee.load_tokenizer(mistral, module: Bumblebee.Text.LlamaTokenizer)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  {:ok, generation_config} = Bumblebee.load_generation_config(mistral, spec_module: Bumblebee.Text.Mistral)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  generation_config = Bumblebee.configure(generation_config, max_new_tokens: 500)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  Bumblebee.Text.generation(model_info, tokenizer, generation_config, defn_options: [compiler: EXLA])&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Next you can wire this up in your &lt;a href=&quot;https://github.com/toranb/pgvector-search/blob/mistral7b/lib/search/application.ex#L14&quot;&gt;application.ex&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def start(_type, _args) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  children = [&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    {Nx.Serving, serving: serving(), name: ChatServing}&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  ]&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You can prompt the model from &lt;a href=&quot;https://github.com/toranb/pgvector-search/blob/mistral7b/lib/search_web/live/page_live.ex#L44&quot;&gt;elixir code&lt;/a&gt; with &lt;a href=&quot;https://hexdocs.pm/nx/Nx.Serving.html&quot;&gt;Nx.Serving&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;Nx.Serving.batched_run(ChatServing, prompt)&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;With this fine tuned model up and running we can ask it to generate a MathJSON expression.&lt;/p&gt;

&lt;img style=&quot;margin-top: -10px; margin-bottom: -30px; width: 100%;&quot; src=&quot;/content/images/2023/examplemath.png&quot; border=&quot;0&quot;&gt;

&lt;p&gt;Finally, you can take this output from the model and verify it with help from mathjson_solver.&lt;/p&gt;

&lt;img style=&quot;margin-top: -10px; margin-bottom: -30px; width: 100%;&quot; src=&quot;/content/images/2023/solver.png&quot; border=&quot;0&quot;&gt;
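&lt;p&gt;For illustration, here is a tiny, hypothetical evaluator covering only the two MathJSON operators used in this post; the real verification shown above used the mathjson_solver package.&lt;/p&gt;

```python
import json

def evaluate(expr):
    # Minimal MathJSON evaluator: numbers evaluate to themselves,
    # lists are [operator, arg1, arg2, ...]
    if isinstance(expr, (int, float)):
        return expr
    op, *args = expr
    values = [evaluate(a) for a in args]
    if op == "Add":
        return sum(values)
    if op == "Multiply":
        product = 1
        for v in values:
            product *= v
        return product
    raise ValueError(f"unsupported operator: {op}")

# The expression from the dataset example: 40 * 0.75 + 50 * 0.98
expr = json.loads('["Add", ["Multiply", 40, 0.75], ["Multiply", 50, 0.98]]')
print(evaluate(expr))  # 79.0
```

Running the model&apos;s JSON output through a solver like this is what lets you check correctness mechanically instead of eyeballing the generated expression.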

&lt;p&gt;I want to give a big shout out to &lt;a href=&quot;https://twitter.com/jon_durbin&quot;&gt;Jon Durbin&lt;/a&gt; for creating the model that inspired this blog post, the &lt;a href=&quot;https://huggingface.co/datasets/jondurbin/airoboros-3.1/viewer/default/train?q=math+json&amp;row=402&quot;&gt;MathJSON dataset&lt;/a&gt; and for helping answer a great many questions I had along the way. I also want to thank &lt;a href=&quot;https://twitter.com/sean_moriarity/&quot;&gt;Sean Moriarity&lt;/a&gt; for his work &lt;a href=&quot;https://github.com/elixir-nx/bumblebee/pull/264&quot;&gt;implementing the Mistral 7B model in Bumblebee&lt;/a&gt; that made it possible to serve with Nx.&lt;/p&gt;
</description>
                <pubDate>Sat, 21 Oct 2023 01:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2023/10/21/fine-tune-mistral-and-serve-with-nx/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2023/10/21/fine-tune-mistral-and-serve-with-nx/</guid>
            </item>
        
            <item>
                <title>Install CUDA 12 on PopOS</title>
                <description>&lt;p&gt;Yesterday I upgraded to the latest version of &lt;a href=&quot;https://github.com/elixir-nx/bumblebee&quot;&gt;bumblebee&lt;/a&gt; only to learn &lt;a href=&quot;https://github.com/elixir-nx/nx/tree/main/exla&quot;&gt;EXLA&lt;/a&gt; took on the latest XLA which dropped support for CUDA 11.2.&lt;/p&gt;

&lt;p&gt;Because I was running &lt;a href=&quot;https://support.system76.com/articles/install-pop/&quot;&gt;Pop!OS&lt;/a&gt;, I assumed it would be something I could `apt install` and call it good. Unfortunately it was more work than my first go-round, so I wanted to detail it here for anyone who might follow. This is loosely based on the &lt;a href=&quot;https://gist.github.com/ksopyla/bf74e8ce2683460d8de6e0dc389fc7f5&quot;&gt;older CUDA 11 install instructions&lt;/a&gt; I found, but updated for Ubuntu 22.04.&lt;/p&gt;

&lt;p&gt;Before you get started downloading anything, &lt;a href=&quot;https://developer.nvidia.com/cudnn&quot;&gt;set up a developer account with nvidia&lt;/a&gt; so you can download cuDNN in a few minutes.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo add-apt-repository &quot;deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /&quot;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt update&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt install cuda-toolkit-12-2&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Next download &lt;a href=&quot;https://developer.nvidia.com/rdp/cudnn-download&quot;&gt;cuDNN&lt;/a&gt; from nvidia and install it.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.4.25_1.0-1_amd64.deb&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.4.25/cudnn-local-72322D7F-keyring.gpg /usr/share/keyrings/&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt update&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt install libcudnn8=8.9.4.25-1+cuda12.2&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Finally, set a few environment variables and the correct XLA_TARGET for EXLA.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;export LD_LIBRARY_PATH=&quot;$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64&quot;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;export CUDA_HOME=/usr/local/cuda&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;export PATH=&quot;/usr/local/cuda/bin:$PATH&quot;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;export XLA_TARGET=cuda120&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Open a new terminal session and verify CUDA 12 is set up correctly by running this command.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;nvcc -V&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;To upgrade from 8.9, download &lt;a href=&quot;https://developer.nvidia.com/cudnn-9-1-1-download-archive&quot;&gt;cuDNN 9.1&lt;/a&gt; from nvidia. Before you install it, remove libcudnn8 as shown below.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt remove libcudnn8&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo dpkg -i cudnn-local-repo-ubuntu2204-9.1.1_1.0-1_amd64.deb&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo cp /var/cudnn-local-repo-ubuntu2204-9.1.1/cudnn-local-AD7F4AC5-keyring.gpg /usr/share/keyrings/&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt update&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt install libcudnn9-cuda-12=9.1.1.17-1&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt install cuda-toolkit-12-6&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt update&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;After you upgrade to 9.1.1, if you ever want to downgrade to 8.9, follow these instructions.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt remove libcudnn9-cuda-12&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.7.29/cudnn-local-08A7D361-keyring.gpg /usr/share/keyrings/&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt update&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;$ sudo apt install libcudnn8=8.9.7.29-1+cuda12.2&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The only difference between 8.9 and 9.1 is a small tweak to the XLA_TARGET in your zshrc. For Nx 7 you would normally use cuda120, but after upgrading to Nx 9 and cuDNN 9.1.1 you need cuda12 instead.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;XLA_TARGET=cuda12&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;
</description>
                <pubDate>Sat, 19 Aug 2023 18:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2023/08/19/install-cuda-12-on-popos/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2023/08/19/install-cuda-12-on-popos/</guid>
            </item>
        
            <item>
                <title>Training Axon Models With Nvidia GPUs</title>
                <description>&lt;p&gt;A few months back I started a deep dive into machine learning. With all the excitement about &lt;a href=&quot;https://github.com/elixir-nx/nx&quot;&gt;Nx&lt;/a&gt; I spent the first few weeks building a toy example that solves fizzbuzz, first with &lt;a href=&quot;https://gist.github.com/toranb/e5c48565e83e4baaaf2c5850531a8a58&quot;&gt;Axon&lt;/a&gt; and later with &lt;a href=&quot;https://gist.github.com/toranb/56a3e65ca81fba1c4f6c92c6f1857681&quot;&gt;Nx&lt;/a&gt;. After getting more familiar with &lt;a href=&quot;https://github.com/elixir-nx/axon&quot;&gt;Axon&lt;/a&gt; I started tinkering with the &lt;a href=&quot;https://huggingface.co/bert-base-cased&quot;&gt;BERT&lt;/a&gt; &lt;a href=&quot;https://hexdocs.pm/bumblebee/fine_tuning.html&quot;&gt;fine tuning example&lt;/a&gt; and found the feedback loop was 45+ minutes.&lt;/p&gt;

&lt;p&gt;I didn&apos;t have a huge budget but I knew a previous generation nvidia card like the &lt;a href=&quot;https://www.amazon.com/dp/B0971BG25M?psc=1&amp;ref=ppx_yo2ov_dt_b_product_details&quot;&gt;RTX 3060&lt;/a&gt; would improve the turnaround time, allowing me to train models more quickly. After looking at some benchmarks and considering a few alternatives I decided to order the 12GB model and take it for a spin.&lt;/p&gt;

&lt;p&gt;I started by looking at Nvidia support for linux and decided to &lt;a href=&quot;https://support.system76.com/articles/install-pop/&quot;&gt;install Pop!OS&lt;/a&gt;. Next I installed &lt;a href=&quot;https://gist.github.com/toranb/93999c046d871c764f8db5f336dc2abe&quot;&gt;elixir with asdf&lt;/a&gt; to get a working dev environment before attempting to optimize it further. From a vanilla install of Pop!OS I found nothing was installed for me by default, so I had to list the nvidia drivers and install the latest stable driver.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;sudo ubuntu-drivers list&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;sudo ubuntu-drivers install nvidia-driver-525&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;sudo apt install system76-cuda-11.2 system76-cudnn-11.2&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Finally, I exported two environment variables that tell the XLA runtime which CUDA version is installed and where to find it.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;shell&quot;&gt;
  &lt;pre class=&quot;language-shell&quot;&gt;
    &lt;code class=&quot;language-shell&quot;&gt;export XLA_TARGET=cuda111&lt;/code&gt;
    &lt;code class=&quot;language-shell&quot;&gt;export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/lib/cuda-11.2&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;With this &lt;a href=&quot;https://elixir-lang.org/&quot;&gt;Elixir&lt;/a&gt; workstation you can generate training data, train your models and even &lt;a href=&quot;https://github.com/toranb/nx-serving-fizzbuzz/blob/main/lib/game/demo.ex&quot;&gt;serve them with Nx&lt;/a&gt;!&lt;/p&gt;
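
&lt;p&gt;For the GPU to actually be used, the project also needs to point Nx at the EXLA backend. A minimal sketch of that config, assuming the exla dependency is already in mix.exs:&lt;/p&gt;

```elixir
# config/config.exs
import Config

# Route all Nx tensor operations through EXLA, which compiles
# to XLA and uses the CUDA build selected via XLA_TARGET.
config :nx, default_backend: EXLA.Backend
```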

&lt;div style=&quot;padding-top: 20px; padding-bottom: 20px;&quot;&gt;
  &lt;div style=&quot;margin: auto; width: 90vw; height: 50vw; max-width: 768px; max-height: 432px&quot;&gt;
    &lt;iframe src=&quot;https://player.vimeo.com/video/822474762&quot; style=&quot;width:100%;height:100%;&quot; frameborder=&quot;0&quot; allow=&quot;autoplay; fullscreen; picture-in-picture&quot; allowfullscreen&gt;&lt;/iframe&gt;
  &lt;/div&gt;
  &lt;script src=&quot;https://player.vimeo.com/api/player.js&quot;&gt;&lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Despite all the promise and obvious speed improvements this GPU has to offer, I found that the fine tuning example I started my journey with throws &lt;a href=&quot;https://elixirforum.com/t/memory-issues-following-bumblebee-example/55391/5&quot;&gt;out of memory errors during the 2nd epoch&lt;/a&gt; because RAM usage jumps to 32GB with this specific BERT model.&lt;/p&gt;

&lt;p&gt;I was, however, able to complete the fine tuning example with a slightly smaller BERT variant. Here is the &lt;a href=&quot;https://gist.github.com/toranb/b37096fee0c9af93d16b0aaa1a9bcdf4&quot;&gt;full source&lt;/a&gt; for that Elixir module for those interested.&lt;/p&gt;
</description>
                <pubDate>Sat, 29 Apr 2023 18:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2023/04/29/training-axon-models-with-nvidia-gpus/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2023/04/29/training-axon-models-with-nvidia-gpus/</guid>
            </item>
        
            <item>
                <title>Empowered Product Teams: ownership and responsibility</title>
                <description>&lt;p&gt;Throughout my career I&apos;ve had the opportunity to blend, extend and adapt well known engineering practices to match the accelerated demands put on product teams. After years of trial and error I&apos;ve become convinced empowerment is at the core of both better product teams and better software.&lt;/p&gt;

&lt;p&gt;Truly &lt;a href=&quot;https://www.goodreads.com/book/show/53481975-empowered&quot;&gt;empowered&lt;/a&gt; product teams are given more ownership to learn, adapt and respond to feedback. With this ownership comes responsibility that requires a combination of &lt;a href=&quot;https://www.goodreads.com/en/book/show/58046715-continuous-discovery-habits&quot;&gt;discovery&lt;/a&gt;, experimentation, and &lt;a href=&quot;https://basecamp.com/shapeup&quot;&gt;delivery&lt;/a&gt;. Engineering teams must then focus on outcomes instead of output, which better equips them to organize, shape and sequence the work.&lt;/p&gt;

&lt;p&gt;A strong product team must navigate uncertainty. Yet too often, we find ourselves crammed into a framework that views software creation not as a journey of discovery, but as a fixed process with segregated roles. By embracing this new approach to knowledge work, software careers can be more fulfilling, rewarding, and impactful.&lt;/p&gt;

&lt;div style=&quot;padding-top: 20px; padding-bottom: 20px;&quot;&gt;
  &lt;div style=&quot;margin: auto; width: 90vw; height: 50vw; max-width: 768px; max-height: 432px&quot;&gt;
    &lt;iframe src=&quot;https://player.vimeo.com/video/740207486&quot; style=&quot;width:100%;height:100%;&quot; frameborder=&quot;0&quot; allow=&quot;autoplay; fullscreen; picture-in-picture&quot; allowfullscreen&gt;&lt;/iframe&gt;
  &lt;/div&gt;
  &lt;script src=&quot;https://player.vimeo.com/api/player.js&quot;&gt;&lt;/script&gt;
&lt;/div&gt;
</description>
                <pubDate>Mon, 19 Sep 2022 18:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2022/09/19/empowered-product-teams/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2022/09/19/empowered-product-teams/</guid>
            </item>
        
            <item>
                <title>A Philosophy of Software Design</title>
                <description>&lt;p&gt;Last week I started a new role and ahead of this new adventure I read &lt;a href=&quot;https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201&quot;&gt;A Philosophy of Software Design&lt;/a&gt; with the specific purpose of gifting the book and a summary of it to each of my engineers. What follows is my paraphrased summary of the first 9 chapters for those who might find the topic interesting.&lt;/p&gt;

&lt;h3&gt;Chapter 1&lt;/h3&gt;

&lt;p&gt;The greatest limitation in writing software is our ability to understand the systems we are creating&lt;/p&gt;
&lt;p&gt;The larger the program, and the more people that work on it, the more difficult it is to manage complexity&lt;/p&gt;
&lt;p&gt;2 approaches to fighting complexity&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;eliminate complexity by making code simpler and more obvious&lt;/li&gt;
  &lt;li&gt;encapsulate it so that programmers can work on a system without being exposed to all of its complexity at once&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Chapter 2&lt;/h3&gt;

&lt;p&gt;Complexity is anything related to the structure of a software system that makes it hard to understand and modify the system&lt;/p&gt;
&lt;p&gt;The overall complexity of a system is defined by the complexity of each part multiplied by the amount of time developers spend working on that part&lt;/p&gt;
&lt;p&gt;Symptoms of complexity&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;change amplification - a seemingly simple change requires code modifications in many different parts of the system&lt;/li&gt;
  &lt;li&gt;cognitive load - refers to how much a developer needs to know to complete a task&lt;/li&gt;
  &lt;li&gt;unknown unknowns - it’s not obvious what pieces of code should be modified to complete a task&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Causes of complexity&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;dependencies - exist when a given piece of code cannot be understood and modified in isolation. We cannot eliminate dependencies, so the goal is to have fewer of them and make it obvious how and where they are used&lt;/li&gt;
  &lt;li&gt;obscurity - when important information is not obvious&lt;/li&gt;
  &lt;li&gt;honorable mention - inconsistency (as a contributor to obscurity)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Complexity comes from an accumulation of dependencies and obscurities &lt;/p&gt;

&lt;h3&gt;Chapter 3&lt;/h3&gt;

&lt;p&gt;Complexity is incremental&lt;/p&gt;
&lt;p&gt;Working code isn’t enough because it can introduce complexity if written purely tactically&lt;/p&gt;
&lt;p&gt;The more strategic approach strikes a balance between design and working code&lt;/p&gt;

&lt;h3&gt;Chapter 4&lt;/h3&gt;

&lt;p&gt;One of the most important techniques for managing complexity is to design systems so that developers only need to face a small fraction of the overall complexity at any given time&lt;/p&gt;
&lt;p&gt;Software systems are decomposed into a collection of modules that are relatively independent&lt;/p&gt;
&lt;p&gt;The arguments to a function create a dependency between the function and the call site&lt;/p&gt;
&lt;p&gt;The goal of modular design is to minimize the dependencies between modules&lt;/p&gt;
&lt;p&gt;The best modules provide interfaces that are much simpler than their implementations&lt;/p&gt;
&lt;p&gt;If a developer needs to know a particular piece of information in order to use a module, that information is part of that module’s interface&lt;/p&gt;
&lt;p&gt;The best modules are deep: they have a lot of functionality hidden behind a simple interface&lt;/p&gt;
&lt;p&gt;A deep module is a good abstraction because only a small fraction of its internal complexity is visible to the user&lt;/p&gt;
&lt;p&gt;A shallow module is one whose interface is relatively complex in comparison to the functionality that it provides&lt;/p&gt;
&lt;p&gt;&lt;b&gt;By separating the interface of a module from its implementation, we can hide the complexity of the implementation from the rest of the system&lt;/b&gt;&lt;/p&gt;
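
&lt;p&gt;As a toy illustration of a deep module, consider a one-function interface that hides several decisions behind it (the module and its rules are my own hypothetical example, not one from the book):&lt;/p&gt;

```elixir
defmodule Slug do
  # Deep module sketch: callers see a single function, while the
  # normalization decisions (casing, allowed characters, separator)
  # stay hidden in the implementation and can change freely.
  def from(title) do
    title
    |> String.downcase()
    |> String.replace(~r/[^a-z0-9\s-]/, "")
    |> String.split()
    |> Enum.join("-")
  end
end
```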

&lt;h3&gt;Chapter 5&lt;/h3&gt;

&lt;p&gt;The most important technique for achieving deep modules is information hiding&lt;/p&gt;
&lt;p&gt;Each module should encapsulate a few pieces of knowledge, which represent design decisions. This knowledge is embedded in the module’s implementation but does not appear in its interface&lt;/p&gt;
&lt;p&gt;Information hiding reduces complexity in 2 ways&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;it simplifies the interface to a module, reducing cognitive load on the developer using the module&lt;/li&gt;
  &lt;li&gt;it makes it easier to evolve the system. With fewer dependencies, a change affects only the one module&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The opposite of information hiding is information leakage. This occurs when a design decision is reflected in multiple modules. This creates a dependency between the modules&lt;/p&gt;
&lt;p&gt;If a piece of information is reflected in the interface for a module, then by definition it has been leaked; thus simpler interfaces tend to correlate with better information hiding&lt;/p&gt;
&lt;p&gt;Back door leakage (that which is not visible in the interface) is more harmful because it’s not obvious&lt;/p&gt;
&lt;p&gt;Information hiding can often be improved by making a module slightly larger&lt;/p&gt;
&lt;p&gt;ex: if 2 modules must always run in a specific order, combining them can hide this ordering complexity&lt;/p&gt;
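
&lt;p&gt;A tiny sketch of that last point (my own example): if trimming must always happen before validation, exposing one function removes the ordering knowledge from every caller:&lt;/p&gt;

```elixir
defmodule Input do
  # Callers never learn that trimming must precede validation;
  # the ordering decision is hidden inside one function.
  def clean(raw) do
    raw
    |> String.trim()
    |> validate()
  end

  defp validate(""), do: {:error, :empty}
  defp validate(value), do: {:ok, value}
end
```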

&lt;h3&gt;Chapter 6&lt;/h3&gt;

&lt;p&gt;A module’s functionality should reflect your current needs, but its interface should not. Instead the interface should be general enough to support multiple uses&lt;/p&gt;
&lt;p&gt;But don&apos;t get carried away and build something so general purpose that it is difficult to use&lt;/p&gt;
&lt;p&gt;The most important benefit of the general purpose approach is that it results in simpler and deeper interfaces when compared to the special purpose approach&lt;/p&gt;
&lt;p&gt;One of the most important elements of software design is determining who needs to know what, and when&lt;/p&gt;
&lt;p&gt;Questions to ask about the design&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;what is the simplest interface that will cover all my current needs&lt;/li&gt;
  &lt;li&gt;in how many situations will this function be used&lt;/li&gt;
  &lt;li&gt;is this api easy to use for my current needs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;General purpose interfaces provide a cleaner separation between modules, whereas special purpose interfaces tend to leak information between modules&lt;/p&gt;
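
&lt;p&gt;The book&apos;s text editor example captures this well: general-purpose insert and delete primitives compose to cover special cases like backspace. A rough sketch of that idea (the Elixir below is my own illustration):&lt;/p&gt;

```elixir
defmodule Editor do
  # General-purpose interface: two primitives that cover many uses.
  def insert(text, pos, str) do
    {front, back} = String.split_at(text, pos)
    "#{front}#{str}#{back}"
  end

  def delete(text, pos, count) do
    {front, rest} = String.split_at(text, pos)
    {_deleted, back} = String.split_at(rest, count)
    "#{front}#{back}"
  end
end

# A special-purpose backspace is a composition of the primitives,
# not another method added to the interface:
backspace = fn text, cursor -> Editor.delete(text, cursor - 1, 1) end
```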

&lt;h3&gt;Chapter 7&lt;/h3&gt;

&lt;p&gt;When adjacent layers have similar abstractions, the problem often manifests itself in the form of pass-through functions&lt;/p&gt;
&lt;p&gt;A pass-through function typically indicates that there is not a clean division of responsibility between the modules&lt;/p&gt;
&lt;p&gt;A pass-through function makes modules shallow; they increase the interface complexity of a module, which adds complexity without increasing the total functionality of the system&lt;/p&gt;

&lt;h3&gt;Chapter 8&lt;/h3&gt;

&lt;p&gt;Pulling complexity down makes the most sense when...&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;the complexity pulled down is closely related to the module’s existing functionality&lt;/li&gt;
  &lt;li&gt;it results in simplifications elsewhere in the application&lt;/li&gt;
  &lt;li&gt;it simplifies the module’s interface&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Chapter 9&lt;/h3&gt;

&lt;p&gt;The act of subdividing creates additional complexity that was not present before subdivision&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;complexity can arise from the number of modules. The more you have, the more difficult it can be to keep track of them all and the more challenging it can be to find a desired component within the large collection (ie: more interfaces, every interface adds complexity)&lt;/li&gt;
  &lt;li&gt;you may need code to manage the collection of modules&lt;/li&gt;
  &lt;li&gt;subdivision can result in code duplication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The act of combining code could be beneficial when...&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;they share information&lt;/li&gt;
  &lt;li&gt;they overlap conceptually&lt;/li&gt;
  &lt;li&gt;it is difficult to understand one of the modules without also looking at the other&lt;/li&gt;
  &lt;li&gt;it results in a simpler interface for those who rely on the behavior(s)&lt;/li&gt;
&lt;/ul&gt;
</description>
                <pubDate>Tue, 05 Oct 2021 18:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2021/10/05/philosophy-of-software-design/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2021/10/05/philosophy-of-software-design/</guid>
            </item>
        
            <item>
                <title>Crucial Conversations</title>
                <description>&lt;p&gt;This week I started a new role and for the first time I&apos;ve put team health at the forefront by re-reading &lt;a href=&quot;https://www.amazon.com/Crucial-Conversations-Talking-Stakes-Second/dp/1469266822&quot;&gt;Crucial Conversations&lt;/a&gt; with the specific purpose of gifting the book and a summary of it to each of my engineers. What follows is my paraphrased summary of the book, excluding the last 2 chapters, for those who might find the topic interesting.&lt;/p&gt;

&lt;h3&gt;Chapter 1&lt;/h3&gt;

&lt;p&gt;The ability to talk openly about high stakes, emotional, controversial topics&lt;/p&gt;
&lt;p&gt;The key skill of effective teams is the capacity to skillfully address emotionally charged, risky issues&lt;/p&gt;
&lt;p&gt;Despite the importance of this skill we often avoid these difficult conversations&lt;/p&gt;
&lt;p&gt;When we do engage, our default response is often self-defeating because&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;We’ve not had great role models to draw from in our personal or professional lives&lt;/li&gt;
  &lt;li&gt;The reasoning capacity of our brain is cut in half as adrenaline starts pumping&lt;/li&gt;
  &lt;li&gt;It’s difficult to step back from the content and manage the flow of conversation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why study this topic? We only have 3 outcomes when faced with a crucial conversation&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;We can avoid them&lt;/li&gt;
  &lt;li&gt;We can have them and handle them poorly&lt;/li&gt;
  &lt;li&gt;We can have them and handle them well&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Chapter 2&lt;/h3&gt;

&lt;p&gt;The key to crucial conversations is effective dialog&lt;/p&gt;
&lt;p&gt;Dialog: the free flow of meaning between 2 or more people&lt;/p&gt;
&lt;p&gt;To be effective in dialog we seek to get all relevant information out into the open&lt;/p&gt;
&lt;p&gt;Beware: the fool’s choice - a false either/or between telling the truth and losing a friend - may encourage silence&lt;/p&gt;
&lt;p&gt;At the beginning of a crucial conversation we usually don’t share the same pool of meaning&lt;/p&gt;
&lt;p&gt;As the pool of shared meaning grows&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Better choices can be made because those involved have more relevant information &lt;/li&gt;
  &lt;li&gt;With input from everyone you get increased ownership and unity for actions that follow&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Chapter 3&lt;/h3&gt;

&lt;p&gt;To start, take a long hard look at yourself and recognize the role you play in dialog&lt;/p&gt;
&lt;p&gt;As much as others may need to change, the only person we can directly change is ourselves&lt;/p&gt;
&lt;p&gt;Under fire we naturally resist complexity and stop adding to the pool of meaning&lt;/p&gt;
&lt;p&gt;When emotions run high we swap our original motive for &quot;winning&quot; or even &quot;punishing&quot;&lt;/p&gt;
&lt;p&gt;As you feel your motives shift ask yourself “what do I really want here?”&lt;/p&gt;
&lt;p&gt;Sometimes we choose personal safety over dialog by choosing silence&lt;/p&gt;
&lt;p&gt;We accept the certainty of bad results to avoid the possibility of uncomfortable conversation&lt;/p&gt;

&lt;h3&gt;Chapter 4&lt;/h3&gt;

&lt;p&gt;Dialog requires safety for all involved&lt;/p&gt;
&lt;p&gt;People will not add to the shared pool of meaning when they do not feel safe&lt;/p&gt;
&lt;p&gt;Learn to look for safety problems (dual processing) while in the conversation&lt;/p&gt;
&lt;p&gt;Beware: watching for conditions (ie: safety) and content at the same time takes practice&lt;/p&gt;
&lt;p&gt;People become defensive because of fear (the condition) not the content itself&lt;/p&gt;
&lt;p&gt;The problem is not the message, but when we fail to help others feel safe hearing the message&lt;/p&gt;
&lt;p&gt;You can absorb threatening feedback when you respect their opinion and trust their motives&lt;/p&gt;
&lt;p&gt;When you remove fear the brain can function with full reasoning capacity&lt;/p&gt;

&lt;h3&gt;Chapter 5&lt;/h3&gt;

&lt;p&gt;Safety requires commitment to a shared mutual purpose (the entry condition)&lt;/p&gt;
&lt;p&gt;To stay in dialog we need to maintain mutual respect (the continuance condition)&lt;/p&gt;
&lt;p&gt;When necessary, step out of the content, make it safe, then step back into the conversation&lt;/p&gt;
&lt;p&gt;Beware: don’t sugar coat or water down your message&lt;/p&gt;

&lt;p&gt;The 4 skills to establish a mutual purpose&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Commit to seek mutual purpose (check your heart)&lt;/li&gt;
  &lt;li&gt;Recognize the purpose behind the strategy&lt;/li&gt;
  &lt;li&gt;Invent a mutual purpose&lt;/li&gt;
  &lt;li&gt;Brainstorm new strategies that serve all involved&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To establish respect when violated&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Apologize if you’ve truly disrespected someone&lt;/li&gt;
  &lt;li&gt;If respect was broken by misunderstanding, use contrasting to clarify the purpose or intent&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Chapter 6&lt;/h3&gt;

&lt;p&gt;Emotions don’t just happen&lt;/p&gt;
&lt;p&gt;We feel something because of a thought we ourselves create&lt;/p&gt;
&lt;p&gt;We generally tell ourselves a story with partial information&lt;/p&gt;
&lt;p&gt;These “stories” help us assign meaning so we can justify how we act&lt;/p&gt;

&lt;p&gt;Once you’ve created the emotions you have 2 choices&lt;/p&gt;
&lt;p&gt;1) You can act on them&lt;/p&gt;
&lt;p&gt;2) Or be acted on by them&lt;/p&gt;
&lt;p&gt;To challenge the emotional response or story ask “what evidence do I have that supports this?”&lt;/p&gt;
&lt;p&gt;Beware: don’t confuse stories with facts&lt;/p&gt;

&lt;p&gt;We generally tell 3 different stories&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Victim stories &quot;not my fault&quot;&lt;/li&gt;
&lt;li&gt;Villain stories &quot;they have bad motives&quot;&lt;/li&gt;
&lt;li&gt;Helpless stories &quot;I’m powerless&quot;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Relax your absolute certainty long enough for dialog - the only reliable way to discover motive&lt;/p&gt;

&lt;h3&gt;Chapter 7&lt;/h3&gt;

&lt;p&gt;Speak honestly but with confidence, humility and skill&lt;/p&gt;
&lt;p&gt;Share the facts, not the conclusions&lt;/p&gt;
&lt;p&gt;Invite opposing views&lt;/p&gt;
&lt;p&gt;Tell your story (be sure this follows the facts)&lt;/p&gt;
&lt;p&gt;Ask for others’ paths&lt;/p&gt;
&lt;p&gt;Talk tentatively&lt;/p&gt;
&lt;p&gt;Encourage testing (of your views and opinions)&lt;/p&gt;

&lt;h3&gt;Chapter 8&lt;/h3&gt;

&lt;p&gt;The best way to influence is to use your ears&lt;/p&gt;
&lt;p&gt;When you invite others to share, you must mean it&lt;/p&gt;
&lt;p&gt;Be curious, ask questions to seek understanding&lt;/p&gt;
&lt;p&gt;Beware: we often start to assume incorrect motives&lt;/p&gt;
&lt;p&gt;When you sense this happening, ask “why would a sane, rational person say this?”&lt;/p&gt;
&lt;p&gt;Retrace aloud the other person’s path to action (after you hear them explain it)&lt;/p&gt;
&lt;p&gt;Work to curb your reaction and return to the facts/story/emotion to seek understanding&lt;/p&gt;

&lt;p&gt;4 powerful listening skills&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Ask them to share their opinion&lt;/li&gt;
  &lt;li&gt;Mirror - when their tone or body language doesn’t match their words, mirror it back to them&lt;/li&gt;
  &lt;li&gt;Paraphrase - repeat it back to clarify your own understanding&lt;/li&gt;
  &lt;li&gt;Prime - sometimes we pour into the shared pool to encourage them to do the same&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep in mind we are trying to understand their point of view -not necessarily agree with it&lt;/p&gt;
&lt;p&gt;Beware: it’s both what you say and how you say it (keep your tone top of mind as you repeat things back)&lt;/p&gt;

&lt;p&gt;Agree when you agree (don’t waste time debating if you don’t disagree)&lt;/p&gt;
&lt;p&gt;Build when key pieces of information are left out -grow the pool of shared meaning&lt;/p&gt;
&lt;p&gt;Compare when you differ and be open minded&lt;/p&gt;

&lt;h3&gt;Chapter 9&lt;/h3&gt;

&lt;p&gt;How the decision will be made should be agreed on up front, ahead of the dialog itself&lt;/p&gt;
&lt;p&gt;To not violate expectations make it clear how the final decision will be made&lt;/p&gt;
&lt;p&gt;4 ways to make decisions (increasing degree of involvement)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Command - no involvement, the decision is made or delegated outright&lt;/li&gt;
  &lt;li&gt;Consult - invite others to influence the decision&lt;/li&gt;
  &lt;li&gt;Vote - when several great options are present&lt;/li&gt;
  &lt;li&gt;Consensus - most involved, but required when a unanimous decision is necessary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To decide ask a few questions&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Who cares? don’t invite people who aren’t involved&lt;/li&gt;
  &lt;li&gt;Who knows about this information? (to help with the shared pool, and decision making)&lt;/li&gt;
  &lt;li&gt;Who must agree to decide?&lt;/li&gt;
  &lt;li&gt;How many people is it worth involving?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You want to avoid violated expectations and inaction (hold people accountable to promises)&lt;/p&gt;
</description>
                <pubDate>Mon, 27 Sep 2021 18:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2021/09/27/crucial-conversations/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2021/09/27/crucial-conversations/</guid>
            </item>
        
            <item>
                <title>Elixir And Phoenix Upgrade Adventure</title>
                <description>&lt;p&gt;This week I finished an upgrade from &lt;a href=&quot;https://elixir-lang.org/&quot;&gt;Elixir&lt;/a&gt; 1.10 to 1.11.3 and more notably &lt;a href=&quot;https://www.phoenixframework.org/&quot;&gt;Phoenix&lt;/a&gt; 1.4 to 1.5.7. To get a sense of the risk I decided to upgrade Elixir first and keep the bulk of our dependencies as-is to measure what I was up against.&lt;/p&gt;

&lt;p&gt;Thankfully the Elixir upgrade, aside from a handful of compiler warnings, wasn&apos;t worth writing about. The downside of this easy upgrade was overconfidence, which revealed itself as an inability to estimate the effort required to move forward with the latest Elixir, Phoenix, Ecto and &lt;a href=&quot;https://github.com/absinthe-graphql/absinthe&quot;&gt;Absinthe&lt;/a&gt; dependencies.&lt;/p&gt;

&lt;p&gt;I&apos;ve done platform upgrades like this in the past and usually failed to share any lessons learned with the wider community. This time around I decided to take the time and document my adventure to help anyone else who might find themselves in a similar situation.&lt;/p&gt;

&lt;p&gt;I&apos;ll be doing a brain dump of the most memorable problems I faced with enough detail to unblock others who might feel stuck like I did at times. I would have struggled much more if it wasn&apos;t for all the wonderful blog posts, issues and contributions in our community so to all who paid it forward &quot;Thank You!&quot;.&lt;/p&gt;

&lt;h3&gt;PubSub 2.0&lt;/h3&gt;

&lt;p&gt;The biggest breaking change was the &lt;a href=&quot;https://hexdocs.pm/phoenix_pubsub/Phoenix.PubSub.html&quot;&gt;PubSub&lt;/a&gt; upgrade from 1.0 to 2.0. I would guess most had no trouble because the &lt;a href=&quot;https://gist.github.com/chrismccord/e53e79ef8b34adf5d8122a47db44d22f&quot;&gt;guide&lt;/a&gt; was simple and straightforward: just bump the dependencies, update the config and tweak application.ex. But for those of us who used the private API &lt;a href=&quot;https://github.com/phoenixframework/phoenix_pubsub/blob/1cfdb60c5df8ca55a5e1b60fe3d691234b68ad59/lib/phoenix/pubsub/local.ex#L160&quot;&gt;Phoenix.PubSub.Local.list&lt;/a&gt; ...well that&apos;s when things got interesting.&lt;/p&gt;

&lt;p&gt;At first glance PubSub 2.0 had a clear replacement for this functionality in &lt;a href=&quot;https://github.com/phoenixframework/phoenix_pubsub/blob/master/lib/phoenix/tracker.ex&quot;&gt;Phoenix.Tracker&lt;/a&gt;. The trouble was that Phoenix.Tracker centers around distributed presence tracking and what I needed was a metric about connected users for the local node to restore the &lt;a href=&quot;https://kubernetes.io/&quot;&gt;Kubernetes&lt;/a&gt; autoscaling solution my team put together last year.&lt;/p&gt;

&lt;p&gt;I threw together a simple &lt;a href=&quot;https://github.com/toranb/phx_pubsub_upgrade_demo/commit/87a2adfd6baa305489da0abe86591aaaab426a08&quot;&gt;demo&lt;/a&gt; app that shows the full solution for anyone who might be in a similar situation. TL;DR I used the &lt;a href=&quot;https://github.com/elixir-lang/elixir/blob/v1.11.3/lib/elixir/lib/registry.ex&quot;&gt;Registry&lt;/a&gt; to do the heavy lifting.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def handle_info(:after_join, %{assigns: %{session_id: session_id}} = socket) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  {:ok, _} = Registry.register(Subpub.Tracker.Registry, &quot;user_sockets&quot;, session_id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  {:noreply, socket}&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Now I could use the Registry to surface this metric about the local node. Thankfully this was a drop-in replacement for the behavior my team used in PubSub 1.0 and as a result the autoscaling feature was back in action with complete feature parity.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;Registry.lookup(Subpub.Tracker.Registry, &quot;user_sockets&quot;)&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;
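
&lt;p&gt;Because Registry ships with Elixir, the whole idea fits in a few lines. A minimal sketch (names like Demo.Tracker.Registry are illustrative, and the start_link call would normally live in your supervision tree):&lt;/p&gt;

```elixir
# A duplicate-key Registry lets many socket processes register under
# one shared key; entries are removed automatically when a process dies.
{:ok, _} = Registry.start_link(keys: :duplicate, name: Demo.Tracker.Registry)

# Each connecting socket registers itself under "user_sockets"...
{:ok, _} = Registry.register(Demo.Tracker.Registry, "user_sockets", "session-123")

# ...and the connected-user count for the local node is one lookup away.
connected = length(Registry.lookup(Demo.Tracker.Registry, "user_sockets"))
```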

&lt;h3&gt;handle_out/3 is undefined or private&lt;/h3&gt;

&lt;p&gt;Next was another PubSub related issue that showed itself at runtime when broadcasting to a Phoenix Channel from outside of Phoenix. The biggest hurdle was that the runtime error itself, `handle_out/3 is undefined or private`, was not immediately obvious.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def broadcast(topic, event, payload) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  message = %Phoenix.Socket.Broadcast{&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    event: event,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    payload: payload,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    topic: topic&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  }&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  Phoenix.PubSub.direct_broadcast(Node.self(), My.PubSub, topic, message)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;When I started searching around for clues on this I was lucky enough to find a forum post where Chris &lt;a href=&quot;https://elixirforum.com/t/broadcasting-to-phoenix-channel-without-phoenix/3514/4&quot;&gt;explains&lt;/a&gt; the problem in detail. Luckily the solution was simple enough, starting with the move away from `direct_broadcast` in favor of `local_broadcast`.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def broadcast(topic, event, payload) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  message = %Phoenix.Socket.Broadcast{&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    event: event,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    payload: payload,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    topic: topic&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  }&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  Phoenix.PubSub.local_broadcast(My.PubSub, topic, message)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Finally, I added a `handle_info` function to any channel that was used in this way to accomplish the push.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def handle_info(%{topic: _, event: event, payload: payload}, socket) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  push(socket, event, payload)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  {:noreply, socket}&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Absinthe Ecto&lt;/h3&gt;

&lt;p&gt;Anyone familiar with open source knows that a young ecosystem will see some amount of churn and Elixir is no different. As part of the upgrade I found a handful of places the &lt;a href=&quot;https://github.com/absinthe-graphql/absinthe_ecto/blob/master/lib/absinthe/ecto.ex#L119&quot;&gt;assoc&lt;/a&gt; helper from &lt;a href=&quot;https://github.com/absinthe-graphql/absinthe_ecto&quot;&gt;absinthe_ecto&lt;/a&gt; was used for `belongs_to` and `has_many` relationships. As part of the Phoenix upgrade, version compatibility became a problem with this hex library because of the bump to ecto_sql 3.5.3.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;object :post do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  field :user, :user, resolve: assoc(:user)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Because the library is deprecated I removed it and pulled in the source code for &lt;a href=&quot;https://gist.github.com/toranb/2c0cc42f77c249ba8ff180b59da03614&quot;&gt;assoc&lt;/a&gt; to move forward. Side note: I did spend a few minutes looking at &lt;a href=&quot;https://github.com/absinthe-graphql/dataloader&quot;&gt;dataloader&lt;/a&gt; but decided to pull that in another day when I&apos;m battling n+1 issues.&lt;/p&gt;

&lt;h3&gt;Ecto&lt;/h3&gt;

&lt;p&gt;In many ways the upgrade to &lt;a href=&quot;https://github.com/elixir-ecto/ecto&quot;&gt;Ecto&lt;/a&gt; 3.5 was painless, but I did see a few instances that failed to compile.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;schema &quot;posts&quot; do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  field :comments, {:array, BrokenSchema.Comment}, virtual: true&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;I did find a useful &lt;a href=&quot;https://github.com/elixir-ecto/ecto/issues/3396&quot;&gt;github issue&lt;/a&gt; that &lt;a href=&quot;https://github.com/josevalim&quot;&gt;José Valim&lt;/a&gt; commented on.&lt;/p&gt;

&lt;blockquote&gt;The error message is correct, as Comment is not an Ecto.Type. I am actually surprised to how it worked before... &lt;/blockquote&gt;

&lt;p&gt;To work around this I simply flipped to the `any` Ecto.Type and the compiler was happy.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;schema &quot;posts&quot; do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  field :comments, {:array, any}, virtual: true&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;JSON Serialization&lt;/h3&gt;

&lt;p&gt;The most frustrating part of the upgrade was the move from &lt;a href=&quot;https://github.com/devinus/poison&quot;&gt;Poison&lt;/a&gt; to &lt;a href=&quot;https://hexdocs.pm/jason/readme.html&quot;&gt;Jason&lt;/a&gt; for serialization. Because so much of the Phoenix app is consumed by another client I ran into several stumbling blocks worth mentioning.&lt;/p&gt;

&lt;p&gt;The first problem was a runtime error `(FunctionClauseError) no function clause matching in Jason.Encoder` when a field on a given Ecto schema didn&apos;t have the correct type during serialization. The biggest challenge with this error is that it failed to offer much information about how to resolve it. I did find a &lt;a href=&quot;https://github.com/michalmuskala/jason/issues/120&quot;&gt;github issue&lt;/a&gt; from Oct 2020 that got the ball rolling. And not long after a &lt;a href=&quot;https://github.com/michalmuskala/jason/pull/123&quot;&gt;pull request&lt;/a&gt; was merged to give these warnings at compile time.&lt;/p&gt;

&lt;p&gt;The only trouble with this was that no public version was published, so I pulled down the source code for the latest &lt;a href=&quot;https://github.com/michalmuskala/jason/blob/master/lib/encoder.ex&quot;&gt;encoder&lt;/a&gt; and used it locally to leverage the compiler. You could also reference the github commit SHA if you prefer. Either way, having these errors at compile time was a game changer, so huge thanks to the team for recognizing this and working to resolve it quickly.&lt;/p&gt;
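&lt;p&gt;For reference, pinning a hex package to a git commit looks something like this in `mix.exs`. The `ref` below is a hypothetical SHA; substitute the commit you actually want to track.&lt;/p&gt;

```elixir
# mix.exs: a sketch of pinning jason to a specific commit instead of
# a published hex version. The ref below is a hypothetical SHA.
defp deps do
  [
    {:jason, github: "michalmuskala/jason", ref: "0123abc"}
  ]
end
```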

&lt;p&gt;The next problem was a runtime error `cannot encode association :comments from Post to JSON because the association was not loaded`. The simple workaround for this problem was to add the association to the `except` list for `derive` but the real problem was how often this had to be done and how painful it was to find all of the edge cases.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;@derive {Jason.Encoder, except: [:__meta__, :comments]}&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In sharing this, I hope someone from the community can suggest a better solution. At worst, the exercise revealed all the places where I sorely needed controller tests to catch regressions like this.&lt;/p&gt;

&lt;p&gt;The last problem was that previously all keys would be serialized for any embedded_schema, even if a key wasn&apos;t explicitly declared as a field. This was trouble because, more times than I&apos;d like to admit, `Map.put` was used to insert a key and value that wasn&apos;t declared, meaning those values would silently vanish from the json at runtime.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;embedded_schema do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  field :title&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def make_fun(data, user) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  data |&gt; Map.put(:author, user)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Thankfully the solution was easy enough. You just declare the field on the embedded_schema.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;embedded_schema do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  field :title&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  field :author&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;
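&lt;p&gt;Why do undeclared keys vanish? A derived encoder only serializes the fields declared on the schema, which behaves roughly like taking the struct&apos;s known keys. The sketch below is a simplified illustration of that behavior using only the standard library, not Jason&apos;s actual implementation.&lt;/p&gt;

```elixir
defmodule Post do
  defstruct [:title]
end

# Map.put happily adds an undeclared key to the struct ...
post = %Post{title: "Upgrade notes"} |> Map.put(:author, "toran")

# ... but an encoder that only knows the declared fields drops it,
# which is roughly what a derived Jason.Encoder does.
declared = Map.keys(%Post{}) -- [:__struct__]
IO.inspect(Map.take(post, declared)) # => %{title: "Upgrade notes"}
```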

&lt;h3&gt;Absinthe&lt;/h3&gt;

&lt;p&gt;One last problem was a widespread runtime error in the GraphQL types. Previously a type of `:integer` would return a value like `10.4` without any trouble. But with the latest dependencies this would throw an error because of the type mismatch.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;object :user do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  field :some_avg, :integer&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You can decide what type to use here for the specific domain you are working in. For simple averages it was a toss-up between a string, to preserve precision, and a float. Either way, the solution is easy enough: just change the type in the Absinthe type declaration.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;object :user do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  field :some_avg, :float&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Thank You!&lt;/h3&gt;

&lt;p&gt;Thanks for reading and I hope someone else can move quickly as a result of my sharing this. Huge thanks to the community and core teams for making Elixir a great ecosystem!&lt;/p&gt;
</description>
                <pubDate>Sun, 17 Jan 2021 18:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2021/01/17/elixir-and-phoenix-upgrade-adventure/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2021/01/17/elixir-and-phoenix-upgrade-adventure/</guid>
            </item>
        
            <item>
                <title>Knowledge Work 3.0</title>
                <description>&lt;p&gt;This year I read &lt;a href=&quot;https://basecamp.com/shapeup&quot;&gt;Shape Up&lt;/a&gt;, &lt;a href=&quot;https://teamtopologies.com/book&quot;&gt;Team Topologies&lt;/a&gt; and a slightly more academic book on the topic of lean called &lt;a href=&quot;https://www.goodreads.com/book/show/6278270-the-principles-of-product-development-flow&quot;&gt;The Principles of Product Development Flow&lt;/a&gt; in an effort to learn more about delivery engineering. After weeks of careful study, I combined my takeaways with some practical work experience to produce a talk on the subject.&lt;/p&gt;

&lt;div style=&quot;padding-top: 20px; padding-bottom: 20px;&quot;&gt;
  &lt;div style=&quot;margin: auto; width: 90vw; height: 50vw; max-width: 768px; max-height: 432px&quot;&gt;
    &lt;iframe src=&quot;https://player.vimeo.com/video/458384193&quot; style=&quot;width:100%;height:100%;&quot; frameborder=&quot;0&quot; allow=&quot;autoplay; fullscreen; picture-in-picture&quot; allowfullscreen&gt;&lt;/iframe&gt;
  &lt;/div&gt;
  &lt;script src=&quot;https://player.vimeo.com/api/player.js&quot;&gt;&lt;/script&gt;
&lt;/div&gt;
</description>
                <pubDate>Tue, 15 Sep 2020 18:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2020/09/15/knowledge-work-3/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2020/09/15/knowledge-work-3/</guid>
            </item>
        
            <item>
                <title>Cookie Authentication with Phoenix LiveView</title>
                <description>&lt;p&gt;When I started learning &lt;a href=&quot;https://elixir-lang.org/&quot;&gt;Elixir&lt;/a&gt; a few years ago I built a &lt;a href=&quot;https://github.com/toranb/elixir-match&quot;&gt;multiplayer game&lt;/a&gt; for the iPad using &lt;a href=&quot;https://www.phoenixframework.org/&quot;&gt;Phoenix&lt;/a&gt;. The game had a fairly minimal authentication requirement so I looked around at some of the popular open source libraries available but ultimatly I decided to write something myself to get experience with the language and ecosystem.&lt;/p&gt;

&lt;p&gt;Earlier this year I started getting more serious about &lt;a href=&quot;https://www.phoenixframework.org/blog/build-a-real-time-twitter-clone-in-15-minutes-with-live-view-and-phoenix-1-5&quot;&gt;Phoenix LiveView&lt;/a&gt; and quickly discovered login forms present a unique challenge because &lt;a href=&quot;https://github.com/toranb/elixir-match/blob/3cfe1f75cef401ef5c6b7dd72ebb6e99e90629ea/app/lib/match_web/controllers/session_controller.ex#L29-L31&quot;&gt;Plug.Conn.put_session&lt;/a&gt; isn&apos;t available from any `handle_event` callback. This was an important detail because &lt;a href=&quot;https://toranbillups.com/blog/archive/2018/11/18/implementing-basic-authentication/&quot;&gt;my first&lt;/a&gt; &lt;a href=&quot;https://github.com/toranb/elixir-match/blob/3cfe1f75cef401ef5c6b7dd72ebb6e99e90629ea/app/lib/match_web/router.ex#L24&quot;&gt;plug pipeline&lt;/a&gt; used &lt;a href=&quot;https://github.com/toranb/elixir-match/blob/3cfe1f75cef401ef5c6b7dd72ebb6e99e90629ea/app/lib/match_web/controllers/session_controller.ex#L30&quot;&gt;Plug.Conn.put_session&lt;/a&gt; to signal that the user had been authenticated.&lt;/p&gt;

&lt;p&gt;As with any challenging technical problem, I decided to get creative and come up with something that would unlock this for my needs, and potentially the needs of others searching for answers like I was recently.&lt;/p&gt;

&lt;p&gt;TL;DR If you prefer to skip the story all the relevant code is available on &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/commit/863e3c3b7cf4f2a0230c4f036d8a10183adcdc5b&quot;&gt;github&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Cookies&lt;/h3&gt;

&lt;p&gt;Before I begin writing about the LiveView solution I need to share some about how Phoenix identifies the user for a given web request. By default the &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/master/lib/shop_web/endpoint.ex&quot;&gt;Endpoint&lt;/a&gt; in your Phoenix application includes &lt;a href=&quot;https://hexdocs.pm/plug/Plug.Session.html&quot;&gt;Plug.Session&lt;/a&gt; configured to use the &lt;a href=&quot;https://hexdocs.pm/plug/Plug.Session.COOKIE.html&quot;&gt;cookie session store&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;defmodule ShopWeb.Endpoint do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  use Phoenix.Endpoint, otp_app: :shop&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  &lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  # The session will be stored in the cookie and signed,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  # this means its contents can be read but not tampered with.&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  # Set :encryption_salt if you would also like to encrypt it.&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  @session_options [&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    store: :cookie,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    key: &quot;_shop_key&quot;,&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    signing_salt: &quot;bWk6pxHd&quot;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  ]&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  &lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  plug Plug.Session, @session_options&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;To see this in action spin up the server with iex and make a request to localhost.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;iex -S mix phx.server&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;curl http://localhost:4000 --verbose&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Using the `verbose` flag with curl, I saw that the response from Phoenix includes a &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie&quot;&gt;Set-Cookie&lt;/a&gt; header.&lt;/p&gt;

&lt;blockquote&gt;The Set-Cookie HTTP response header is used to send cookies from the server to the user agent, so the user agent can send them back to the server later. -MDN Web Docs&lt;/blockquote&gt;

&lt;p&gt;For &lt;a href=&quot;https://baekdal.com/thoughts/the-original-cookie-specification-from-1997-was-gdpr-compliant/&quot;&gt;better or worse&lt;/a&gt; this cookie is all that&apos;s needed to identify the user&apos;s session. Now that I understood where this cookie originated from, my attention quickly shifted to decoding the value so I could make sense of it.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;set-cookie: _shop_key=SFMyNTY.g3QAAAABbQAAAAtfY3NyZl90b2tlbm0AAAAYZFRuNUtQMkJ5YWtKT1JnWUtCeXhmNmdP.l0T3G-i8I5dMwz7lEZnQAeK_WeqEZTxcDeyNY2poz_M; path=/; HttpOnly &lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;I found a great in-depth blog post on &lt;a href=&quot;https://bitcrowd.dev/decoding-phoenix-session-cookies&quot;&gt;decoding Phoenix session cookies&lt;/a&gt; and the TL;DR looks something like this.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;set_cookie = &quot;SFMyNTY.g3QAAAABbQAAAAtfY3NyZl90b2tlbm0AAAAYZFRuNUtQMkJ5YWtKT1JnWUtCeXhmNmdP.l0T3G-i8I5dMwz7lEZnQAeK_WeqEZTxcDeyNY2poz_M&quot;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;[_, payload, _] = String.split(set_cookie, &quot;.&quot;, parts: 3)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;{:ok, encoded_term } = Base.url_decode64(payload, padding: false)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;:erlang.binary_to_term(encoded_term)&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;For the initial request I found the value only contained a &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Glossary/CSRF&quot;&gt;CSRF&lt;/a&gt; token.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;%{&quot;_csrf_token&quot; =&gt; &quot;GadhekDDOc28OZVc3tOfzQ==&quot;}&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;
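&lt;p&gt;Running the same steps in reverse shows how that payload segment is produced in the first place. The sketch below uses only the standard library and omits the signature segment that makes the real cookie tamper-evident.&lt;/p&gt;

```elixir
# Build the payload segment of a Phoenix-style session cookie by hand:
# the session map becomes an Erlang external term, then url-safe base64.
session = %{"_csrf_token" => "GadhekDDOc28OZVc3tOfzQ=="}

payload =
  session
  |> :erlang.term_to_binary()
  |> Base.url_encode64(padding: false)

# Decoding reverses both steps and recovers the original map.
# Never call :erlang.binary_to_term on untrusted input without
# the :safe option, since it can construct arbitrary terms.
{:ok, encoded_term} = Base.url_decode64(payload, padding: false)
IO.inspect(:erlang.binary_to_term(encoded_term, [:safe]))
```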

&lt;p&gt;After I got my head around all the moving parts I quickly started searching for ways to alter this cookie so that I could identify the session for each incoming request.&lt;/p&gt;

&lt;h3&gt;Server Rendered Authentication&lt;/h3&gt;

&lt;p&gt;Adding data to the cookie is simple with `Plug.Conn.put_session` because the session store is configured for this by default with Phoenix.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;conn |&gt; put_session(:user_id, id)&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;With a traditional server rendered app I would verify username/password in the POST controller action for &lt;a href=&quot;https://github.com/toranb/elixir-match/blob/master/app/lib/match_web/controllers/session_controller.ex#L23-L32&quot;&gt;login&lt;/a&gt;. When the username/password was correct I&apos;d insert something to identify the user.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;case Login.authenticate_user(username, password) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  nil -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; put_flash(:error, &quot;incorrect username or password&quot;)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; render(&quot;fail.html&quot;)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  id -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; put_session(:user_id, id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; redirect(to: some_path)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;blockquote&gt;Note: relying on the user id like this should be considered naive and exists solely for the purpose of illustrating the life cycle of the request for authentication at its most basic level.&lt;/blockquote&gt;

&lt;p&gt;With this user id in the cookie I was then able to write a &lt;a href=&quot;https://github.com/toranb/elixir-match/blob/master/app/lib/match_web/authenticator.ex&quot;&gt;plug&lt;/a&gt; that would look to see if any user id was present and if so, I would add it to `conn` for consistent use throughout the request life cycle.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;case get_session(conn, :user_id) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  nil -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  id -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    assign(conn, :current_user_id, id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Next I constructed a &lt;a href=&quot;https://github.com/toranb/elixir-match/blob/master/app/lib/match_web/router.ex#L6-L16&quot;&gt;plug&lt;/a&gt; that would redirect the user if `current_user_id` was not found. This redirect plug only runs for restricted parts of the site, unlike the plug above which runs for every request.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def redirect_unauthorized(conn, _opts) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  current_user_id = Map.get(conn.assigns, :current_user_id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  if current_user_id != nil do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  else&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; redirect(to: login_path(conn, :index))&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; halt()&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  end&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;The Workaround&lt;/h3&gt;

&lt;p&gt;The trouble with LiveView in this situation is that event handlers only have access to the web socket and cannot use `Plug.Conn.put_session`. To work around this limitation I use &lt;a href=&quot;https://elixir-lang.org/getting-started/mix-otp/ets.html&quot;&gt;ETS&lt;/a&gt; to accomplish the same end result.&lt;/p&gt;
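&lt;p&gt;If ETS is new to you, it&apos;s an in-memory key/value store provided by the Erlang runtime. Here is a minimal sketch of the create/insert/lookup life cycle, assuming the table is created once at application start.&lt;/p&gt;

```elixir
# Create a named, public table so the LiveView process can write to it
# and the plug (running in a different process) can read from it.
:ets.new(:shop_auth_table, [:set, :public, :named_table])

# Insert a key/value tuple ...
:ets.insert(:shop_auth_table, {:user_id, "42"})

# ... and look it up again; a hit returns a list of matching tuples
[{:user_id, user_id}] = :ets.lookup(:shop_auth_table, :user_id)
IO.inspect(user_id) # => "42"

# A miss returns an empty list rather than raising
[] = :ets.lookup(:shop_auth_table, :missing)
```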

&lt;p&gt;Like the traditional server rendered example I verify username/password in the &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/0f7d2b58b6be7208039d8e40d2aa2eed041b7dfa/lib/shop_web/live/login_live.ex#L124-L129&quot;&gt;event handler&lt;/a&gt;. When the username/password is correct I insert the user id into ETS and redirect the user.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;case Login.authenticate_user(changeset) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  %Shop.User{id: user_id} -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    :ets.insert(:shop_auth_table, {:user_id, &quot;#{user_id}&quot;})&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    redirect = socket |&gt; redirect(to: some_path)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    {:noreply, redirect}&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  changeset -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    {:noreply, assign(socket, changeset: changeset)}&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;blockquote&gt;Note: as with the server rendered example, relying on the user id like this should be considered naive and exists solely for the purpose of exploring a workaround.&lt;/blockquote&gt;

&lt;p&gt;With the user id in ETS I was then able to write a &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/0f7d2b58b6be7208039d8e40d2aa2eed041b7dfa/lib/shop_web/plugs/session.ex#L18-L26&quot;&gt;plug&lt;/a&gt; that would look to see if any user id was present and if so, I would add it to `conn` for consistent use throughout the request life cycle just as I did in the server rendered plug.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;case :ets.lookup(:shop_auth_table, :user_id) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  [{_, user_id}] -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    assign(conn, :user_id, user_id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  _ -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Finally, I constructed another &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/0f7d2b58b6be7208039d8e40d2aa2eed041b7dfa/lib/shop_web/plugs/session.ex#L5-L16&quot;&gt;plug&lt;/a&gt; that would redirect the user if `user_id` was not found. This redirect plug only runs for restricted parts of the site, unlike the plug above which runs for every request.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def redirect_unauthorized(conn, _opts) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  user_id = Map.get(conn.assigns, :user_id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  if user_id == nil do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; put_session(:return_to, conn.request_path)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; redirect(to: ShopWeb.Router.Helpers.login_path(conn, :index))&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; halt()&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  else&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  end&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;blockquote&gt;Note: This solution isn&apos;t sound engineering for a few reasons, the most glaring being visible after a single user id is inserted into ETS: from that point on, any incoming request from any user will appear to be authenticated. While not production ready, this prototype did validate a potential workaround that could be used to emulate `Plug.Conn.put_session`.&lt;/blockquote&gt;

&lt;h3&gt;LiveView Authentication&lt;/h3&gt;

&lt;p&gt;With this working prototype I now had a renewed sense of urgency to firm it up and share what I&apos;d learned. The first step was to replace the user id with a more privacy-friendly value I refer to as `session_uuid`, which should be self-explanatory.&lt;/p&gt;

&lt;p&gt;On the &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/master/lib/shop_web/plugs/session.ex#L19-L27&quot;&gt;initial&lt;/a&gt; request a new `session_uuid` is generated and stored in the cookie with `Plug.Conn.put_session`. This unique session id will be used in the LiveView process to associate the user with a specific ETS entry at login. For any request after the first we simply defer to `:ets.lookup` with the user&apos;s `session_uuid`, but more on that later.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;case get_session(conn, :session_uuid) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  nil -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; put_session(:session_uuid, Ecto.UUID.generate())&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  session_uuid -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; validate_session_token(session_uuid)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In the LiveView for login I extract the `session_uuid` right away in &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/master/lib/shop_web/live/login_live.ex#L70-L76&quot;&gt;mount&lt;/a&gt; and set it with `assign` for use in the event handlers as needed.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def mount(_params, %{&quot;session_uuid&quot; =&gt; key}, socket) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  changeset = ...&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  {:ok, assign(socket, key: key, changeset: changeset)}&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In the &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/master/lib/shop_web/live/login_live.ex#L121-L144&quot;&gt;event handler&lt;/a&gt; I pattern match out the `session_uuid`, &lt;a href=&quot;https://github.com/phoenixframework/phoenix/blob/e0e930cf7373f2b445c009f80a432d7de7de948c/lib/phoenix/token.ex#L110&quot;&gt;sign it&lt;/a&gt; to generate a token and finally I put that token into ETS. After all, this was the secret sauce of that original workaround, but this time the user is more securely linked with help from the unique `session_uuid` value found in their cookie.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def handle_info({:disable_form, changeset}, %{assigns: %{:key =&gt; key}} = socket) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  case Login.authenticate_user(changeset) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    %Shop.User{id: user_id} -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      salt = ShopWeb.Endpoint.config(:live_view)[:signing_salt]&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      token = Phoenix.Token.sign(ShopWeb.Endpoint, salt, user_id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      :ets.insert(:shop_auth_table, {:&quot;#{key}&quot;, token})&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      redirect = socket |&gt; redirect(to: some_path)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      {:noreply, redirect}&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    changeset -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      {:noreply, assign(socket, changeset: changeset)}&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  end&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Looking back at the &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/master/lib/shop_web/plugs/session.ex#L30-L44&quot;&gt;plug code&lt;/a&gt; from earlier, when a user&apos;s `session_uuid` does yield a token I then verify it. When this token checks out I take the `user_id` and add it to `conn` for consistent use throughout the request life cycle just as I did in the server rendered plug.&lt;/p&gt;

&lt;blockquote&gt;Note: Below I use &lt;a href=&quot;https://github.com/phoenixframework/phoenix/blob/e0e930cf7373f2b445c009f80a432d7de7de948c/lib/phoenix/token.ex#L188&quot;&gt;Phoenix.Token.verify&lt;/a&gt; to extract the token and pull user id from it. In truth I started with this because I assumed it was somehow &quot;more secure&quot; but later that evolved into a practical mechanism to expire the session.&lt;/blockquote&gt;
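&lt;p&gt;Under the hood, `Phoenix.Token.sign` attaches an HMAC-style signature derived from the endpoint secret so the payload can&apos;t be forged. The sketch below illustrates that idea with `:crypto` alone; it is not Phoenix.Token&apos;s actual wire format, and the secret value is made up.&lt;/p&gt;

```elixir
# Illustrates HMAC-style signing, the idea behind Phoenix.Token.
# This is NOT Phoenix.Token's actual format; it only shows why a
# forged payload fails verification. The secret here is made up.
secret = "bWk6pxHd"
user_id = "42"

signature = :crypto.mac(:hmac, :sha256, secret, user_id)
token = user_id <> "." <> Base.url_encode64(signature, padding: false)

# Verification recomputes the mac over the payload and compares
[payload, sig] = String.split(token, ".", parts: 2)

expected =
  :crypto.mac(:hmac, :sha256, secret, payload)
  |> Base.url_encode64(padding: false)

IO.inspect(sig == expected) # => true
```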

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def validate_session_token(conn, session_uuid) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  case :ets.lookup(:shop_auth_table, :&quot;#{session_uuid}&quot;) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    [{_, token}] -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      case Phoenix.Token.verify(ShopWeb.Endpoint, signing_salt(), token, max_age: 806_400) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        {:ok, user_id} -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;          conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;          |&gt; assign(:user_id, user_id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        _ -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;          conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      end&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    _ -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  end&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;
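
&lt;p&gt;One piece not shown above is the `signing_salt/0` helper. A minimal sketch, assuming it simply reads the LiveView signing salt from the endpoint configuration, the same value the login handler pulled when signing the token:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;defp signing_salt do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  # same salt used by Phoenix.Token.sign at login time&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  ShopWeb.Endpoint.config(:live_view)[:signing_salt]&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;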

&lt;p&gt;Finally, I use the same `redirect_unauthorized` &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/master/lib/shop_web/plugs/session.ex#L5-L16&quot;&gt;plug&lt;/a&gt; to redirect the user if `user_id` isn&apos;t available in `conn.assigns`.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def redirect_unauthorized(conn, _opts) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  user_id = Map.get(conn.assigns, :user_id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  if user_id == nil do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; put_session(:return_to, conn.request_path)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; redirect(to: ShopWeb.Router.Helpers.login_path(conn, :index))&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; halt()&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  else&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  end&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;What about mount?&lt;/h3&gt;

&lt;p&gt;After I got login working, I flipped over to the restricted LiveView assuming I could get `user_id` without any trouble. Unfortunately, I found that you don&apos;t have access to `conn.assigns` from &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/master/lib/shop_web/live/shop_live.ex#L14-L24&quot;&gt;mount&lt;/a&gt; as I would have expected. To work around this constraint I used the `session_uuid` to get the token, just like I did in the &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/master/lib/shop_web/plugs/session.ex#L31-L36&quot;&gt;plug&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def mount(_params, session, socket) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  socket = assign_new(socket, :current_user, fn -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    user_id = get_user_id(session)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    Shop.User&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; Shop.Repo.get(user_id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  end)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  {:ok, socket}&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/blob/master/lib/shop_web/live/shop_live.ex#L26-L40&quot;&gt;get_user_id&lt;/a&gt; function does exactly what the plug did: it looks up the token, then extracts the `user_id` from it, just as you might expect.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def get_user_id(%{&quot;session_uuid&quot; =&gt; session_uuid}) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  case :ets.lookup(:shop_auth_table, :&quot;#{session_uuid}&quot;) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    [{_, token}] -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      case Phoenix.Token.verify(ShopWeb.Endpoint, signing_salt(), token, max_age: 806_400) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        {:ok, user_id} -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;          user_id&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        _ -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;          nil&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      end&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    _ -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      nil&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  end&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;
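
&lt;p&gt;Worth noting: everything in this post assumes the `:shop_auth_table` ETS table already exists before the first login. A minimal sketch of creating it when the application boots (the exact location is an assumption on my part, it only needs to run before the table is first used):&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def start(_type, _args) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  # :named_table lets us reference the table by atom from any process&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  # :public allows both the plug and the LiveView to read and write&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  :ets.new(:shop_auth_table, [:set, :public, :named_table])&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  # ... the rest of the usual Application callback&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;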

&lt;h3&gt;Bonus!&lt;/h3&gt;

&lt;p&gt;If you don&apos;t like the idea of running `:ets.lookup` along with `Phoenix.Token.verify` each time the mount function executes, and you are comfortable exposing `user_id` as part of the cookie, you can instead alter the &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/commit/90ff976085f1dc7197ca15f436169fcda5ade7bf&quot;&gt;plug&lt;/a&gt; that previously only set `conn.assigns` so it also calls `put_session` with the `user_id`.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def validate_session_token(conn, session_uuid) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  case :ets.lookup(:shop_auth_table, :&quot;#{session_uuid}&quot;) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    [{_, token}] -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      case Phoenix.Token.verify(ShopWeb.Endpoint, signing_salt(), token, max_age: 806_400) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        {:ok, user_id} -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;          conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;          |&gt; assign(:user_id, user_id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;          |&gt; put_session(&quot;user_id&quot;, user_id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;        _ -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;          conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      end&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    _ -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;      conn&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  end&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Now in the restricted LiveView I can get `user_id` in the &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/commit/90ff976085f1dc7197ca15f436169fcda5ade7bf&quot;&gt;mount&lt;/a&gt; function with ease.&lt;/p&gt;

&lt;div class=&quot;highlight&quot; data-language=&quot;elixir&quot;&gt;
  &lt;pre class=&quot;language-elixir&quot;&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;def mount(_params, %{&quot;user_id&quot; =&gt; user_id}, socket) do&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  socket = assign_new(socket, :current_user, fn -&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    Shop.User&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;    |&gt; Shop.Repo.get(user_id)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  end)&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;  {:ok, socket}&lt;/code&gt;
    &lt;code class=&quot;language-elixir&quot;&gt;end&lt;/code&gt;
  &lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You can find the source code from my adventure on &lt;a href=&quot;https://github.com/toranb/cookie-authentication-phoenix-liveview-example/commits/master&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
</description>
                <pubDate>Fri, 26 Jun 2020 18:00:00 +0000</pubDate>
                <link>http://toranbillups.com/blog/archive/2020/06/26/cookie-authentication-with-phoenix-liveview/</link>
                <guid isPermaLink="true">http://toranbillups.com/blog/archive/2020/06/26/cookie-authentication-with-phoenix-liveview/</guid>
            </item>
        
    </channel>
</rss>
