AI in the Library, round two (vs ChatGPT and the zeitgeist)

Once upon a time, I taught a course on AI in the Library in an iSchool. (Check out the reading list and syllabus, which are relevant context for this post.) I haven’t taught it for a while — I got busy with other things, and then ChatGPT dropped and immediately rendered my syllabus obsolete and I wasn’t up for overhauling it.

At least, that was why I wasn’t teaching it in 2022-2024. In 2025 and 2026, there are some pretty different reasons not to be adjunct faculty, particularly in a public university, if you don’t have to be. Like an enormous Project-2025-based, federally directed but decentralized system to dictate syllabi, destroy academic freedom, and fire professors who don’t toe the line. (Examples unfortunately abound, such as University of Oklahoma, Texas A&M, San José State University — the last of these being the university where I taught.)

And in the midst of all this, here’s the (excellent) Texas Tribune reporting on ChatGPT prompts used at Texas A&M to flag course content that might take a perspective on race and gender. And so, with a rising sense of nausea, I figured I might as well see how my syllabus would fare in this brave new world.

First (and startlingly), the prompts only cover very high-level course metadata: course title, description, and learning objectives.1 The week-by-week descriptions and reading list, or indeed the syllabus as a whole, aren’t covered, so these prompts don’t reach week 2’s case study of an AI metadata generation prototype in a Black photographer’s archive, or week 3 on the broad and heavily researched field of (ironically) algorithmic bias.

And so, the results:

Course Title. Unsurprisingly, “AI in the Library” raises no issues for the state of Texas (though of course chatgpt takes a great many words to be unconcerned about it).

Course Description.

This course covers what artificial intelligence is and critical analysis of AI systems, in the context of real-world library and archival AI applications. Programming skills not required.

To my surprise, chatGPT is pretty okay with this:

Review of value-neutrality

  • Advocacy or ideological language:
    None identified. The description does not include prescriptive claims, affective outcomes, or ideological commitments.
  • Potential neutrality concerns:
    The phrase “critical analysis of AI systems” is standard academic language and does not, on its own, imply advocacy. However, “critical” can be interpreted broadly; specifying analytical methods or perspectives can improve clarity without reducing neutrality.
  • Overall assessment:
    The description is value-neutral, concise, and appropriate, though it is shorter than the requested 75–125 words and could benefit from greater methodological detail.

I was sure that “critical analysis” would get more pushback (how could we possibly want that in higher education in 2026??), but I guess mostly I’m supposed to be more unnecessarily wordy, and I could keep waving this IYKYK flag.

Learning Outcomes.

Upon successful completion of the course, students will be able to:

  1. Understand and explain the basics of AI: both its underlying principles and common machine learning techniques.
  2. Discuss realistic ways that AI can be a part of library services.
  3. Critically analyze potential pitfalls of AI systems, including the role of their underlying data sets and their ramifications in society.

Y’all. Despite some honestly fair ways chatgpt pointed out that these verbs are squishy and might be replaced with ones that are sharper and/or lend themselves more obviously to assessment, I absolutely slipped “critically analyze…their ramifications in society” right past it:

“Critically analyze potential pitfalls…” — This is largely value-neutral. The phrase “ramifications in society” is acceptable, though specifying analytical contexts (e.g., professional or institutional settings) would improve precision.

Again, let me reiterate that I have a case study on a Black archive. I have a reading list with, inter alia, Meredith Broussard and Catherine D’Ignazio and Safiya Noble and Virginia Eubanks. I have a reading list with Tay (aka “how to make a chatbot unbelievably racist and anti-Semitic in less than 24 hours”) and predictive policing and de-biasing word vectors and feminism and Soylent-Green-based limitations of a major AI ethical framework.

Pretty sure any antagonistic human reading learning objective number 3 there would know what I was getting at, but I guess chatgpt still can’t replace all the jobs.

  1. There’s also a prompt for “course justification”, but I was not asked to write one of these for my syllabus. The closest it gets is a “core competencies” section aligning the course with learning objectives for the overall iSchool program. However, these objectives are governed by the iSchool, not the individual professors. Therefore I’ll be leaving the course justification prompt aside.

In which we ask Copilot to do the team a solid

So, I wrote an alien artifact no one else on my team understood. (I know, I know.) I’m not a monster — it has documentation and tests, it went through code review, for all that that didn’t accomplish its usual knowledge-transfer goals — and there were solid business reasons the alien artifact had to exist and solid skillset reasons I had to write it, and yet. There we were. With an absolutely critical microservice that no one understood except me.

One day someone reported a bug and my creative and brilliant coworker Ivanok Tavarez was like, you know, I’m pretty sure I know where this bug is in the code. I have no idea what’s going on there, but I asked Copilot to fix it. Also I have no idea what Copilot did. But it seems to have fixed it. Knowing that I’m rather more of an AI skeptic than he is, he asked, would I entertain this code?

And you know what? Let’s do it.

the Mean Girls girls-in-car meme template captioned 'get in loser, we're reviewing code live'

I mean obviously we’ve gotta have a human review this code before it lives in main and there isn’t an option besides me. But suddenly we have an opportunity, because if I turn this code review into a Twitch-meets-Webex livestreaming party, my whole team can watch me talk my way through it, interrupt with questions, and hear my whole mental model of this section of the code, right? Hear my reasons why this code works or doesn’t, fits with the existing structure or doesn’t?

It turns out I just needed some code from outside of myself to make this possible. And the only way to get code for our alien artifact was from an alien author.

And it was great.

I could see the gears turning, the lightbulbs flipping on, the “+1 XP”s pinging up. I think it was the first time anyone else on the team got real mental traction on this code. Actually it was so great I’m now doing a series of walkthroughs where I just narrate my way through the code with the team until we all get tired of it or feel adequately enlightened. And for the first time, I feel like if we need to add more functionality to this microservice, I might actually be able to assign someone else to do it — with handholding and consultation, yeah, but without me being that guy from Nebraska for my team.

https://xkcd.com/2347/ , the one with the Jenga tower labeled 'all modern digital infrastructure' and the tiny yet load-bearing element labeled 'a project some random person in Nebraska has been thanklessly maintaining since 2003'

So…yeah. I’m pretty interested in machine learning (I did the original Andrew Ng Coursera course and several of its followups, on my own time! I did a whole experimental neural net interface (now offline) to MIT’s thesis collection in 2017! I taught an AI course at the SJSU iSchool! I was an ML researcher for the Library of Congress!). But I’m also reflexively skeptical of it (I use it in my own coding for autocomplete, but I’ve never described code and had the LLM write it! that AI course sure did talk a lot about algorithmic bias! I went to a college whose actual mission statement is about the impact of technology on society! I believe in the importance of real human thought and kinda want LLMs to just get off my lawn!). This use case captivated me because it genuinely surprised me. I hadn’t thought about it as a way to potentially expand the capacities of my team — not in some hackneyed capitalist-grind productivity way, but by getting us outside my own head (the limiting feature in this case) and giving us a shared artifact that we could use as the basis of a conversation to genuinely advance our own skills.

I can hear you asking whether the code was any good. For the purposes of this use case, the great thing is it doesn’t matter; it just has to be relevant enough to support substantive review. But fine, fine, I won’t leave you hanging: I had minor stylistic quibbles but in fact it correctly identified a semantic bug I had totally missed and fixed it in a way which fit cleanly into the existing structure, and I merged it without qualms.

And yesterday Ivanok came up with a clever new way to leverage AI for teambuilding and skill-building purposes, so I’m gonna tell you about that too! But in the interests of word count, you’ll have to wait for part two :).

how I think about classes

Someone in a Slack I’m in asked for advice on how to understand classes in programming, and I wrote a mini-novel about how I understand them, so I figured it might as well be a blog post! Here we go —

A class is a noun. That is to say:

  • it is a Specific Thing
  • which I can name.

And I make a class when I have a noun that needs to know things about itself.

What do I mean by “needs to know things about itself”? I mean that this Specific Thing has associated behavior and/or data. For instance, if I’m writing a circulation system for a library, an important noun might be a Checkout.

Checkouts have data they need to know about themselves: for instance, what time was the object checked out, what object is it, when is it due back, what user ID has it. Every Checkout needs to know this kind of data about itself, but the specific values of the data will differ for each Checkout.

Similarly, Checkouts have behavior — that is, verbs that happen to this noun! For instance, a Checkout might know how to resolve itself: add a checkin timestamp, remove the user ID for privacy purposes, et cetera.

And now the sweet part: Checkouts need to know the mechanics of how to resolve themselves, but code outside of Checkouts only needs to know that Checkouts can be resolved. This makes code a lot more readable — if you give the methods on Checkout good names, then you see something like checkout.resolve() and you understand what’s happening; you don’t need to get bogged down in the details of how it happens. You can even update the mechanics of how Checkout resolution works later, and the rest of the code doesn’t need to change. Only the Checkout class itself needs to keep track of the specifics. This principle of encapsulation is fundamental to software design and makes it easier to maintain code over time.
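
Here’s a minimal sketch of that Checkout idea in Python (field names are invented for illustration, not taken from any real circulation system):

from datetime import datetime, timedelta

class Checkout:
    """A Specific Thing that knows its own data and behavior."""

    def __init__(self, item_id, user_id, loan_days=21):
        # data every Checkout needs to know about itself
        self.item_id = item_id
        self.user_id = user_id
        self.checked_out_at = datetime.now()
        self.due_back = self.checked_out_at + timedelta(days=loan_days)
        self.checked_in_at = None

    def resolve(self):
        """Close out the loan: record the checkin, drop the user ID for privacy."""
        self.checked_in_at = datetime.now()
        self.user_id = None

# code elsewhere only needs to know that Checkouts can be resolved
checkout = Checkout(item_id="b1234567", user_id="u42")
checkout.resolve()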

OK, so how do I know when to write a class? I mean, at first, I usually don’t! I write some functions to handle behavior, and then over time I realize one or more of the following:

  • There’s a bunch of data that tends to travel together. Maybe it wants to be different properties of one thing.
  • I keep seeing functions that need more than 4 arguments. Maybe several of these arguments want to be properties of a common thing. (This is the most common way that I notice that I have a bunch of data traveling together; there’s a sketch of this refactor right after this list.)
  • I realize that there’s an important object or idea in my problem domain, and a big part of the cognitive work I’m doing is modeling its data and behavior, and my life will be easier if I just slap a name on that and treat it as a Specific Thing.
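
As promised, here’s roughly what that too-many-arguments refactor looks like (function and field names invented for illustration):

# before: a bunch of data traveling together as loose arguments
def send_overdue_notice(item_id, user_id, checked_out_at, due_back, email_backend):
    ...

# after: the traveling companions become properties of one Specific Thing
def send_overdue_notice(checkout, email_backend):
    ...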

All of this, of course, was a learning journey I had to go on as a baby programmer. It’s easiest to get started by just writing functions that Do The Thing! That’s definitely where I got started, and it took me a while to build an actionable mental model of classes. For me, I think the first thing that got me over the cognitive hurdle was when Django introduced class-based views. I’d been working in Django but it had only had function-based views, and then I had to adapt to this new way of thinking. It was a challenge at first, but Django did such a phenomenal job of writing class-based views as repeating patterns — so the same sorts of methods tended to be exposed in different ones, and you could just grab and overwrite what you needed — that I started to understand their utility. Also, of course, the documentation was spectacular.

I also found it really helpful to watch a whole bunch of Sandi Metz’s Smalltalk-inflected Ruby conference talks. She has incredibly lucid things to say about object-oriented programming; in particular, the way she talks about message-passing really influenced how I think about writing tests, hiding vs showing information, and different objects’ scopes of responsibility. I don’t think you can go wrong with any of them, but the ones that have stuck with me are Nothing is Something, All the Little Things, and Polly Want a Message.

Happy encapsulating!

shaving yaks with gensim 3.8.3

I’m updating some old code which includes a model trained under gensim 3.8.3. (Or so I hope, based on the poetry.lock file.) Current stable is 4.3.0, so…I have some updating to do. In theory I can load a 3.8.x model in 4.0.x, save it, then open that in 4.1.x, et cetera. I’d rather do that than retrain the model (which would be a festival of limited documentation, missing institutional knowledge, and additional yaks), so here I am installing gensim 3.8.3 on my 2021 M1 Mac. What could go wrong, right?
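
In outline, each hop looks something like this (a sketch; the paths are made up, and the model in question is a Doc2Vec):

# run under the intermediate gensim (4.0.x first), one version at a time
from gensim.models.doc2vec import Doc2Vec

model = Doc2Vec.load("model_trained_under_3.8.3")   # old format, via the compatibility shims
model.save("model_resaved_under_4.0")               # re-saved in the newer format
# ...then repeat under 4.1.x, 4.2.x, and so on up to current stable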

First yak: Cython

gensim depends on numpy, numpy depends on Cython, and Cython 0.29.14 (the version in my poetry.lock) is all,

AttributeError: module 'collections' has no attribute 'Iterable'

However, it turned out my version of numpy wanted a higher version of Cython:

RuntimeError: Building NumPy requires Cython >= 0.29.30, found 0.29.14 at [local directory structure]/lib/python3.8/site-packages/Cython/__init__.py

So I just tried 0.29.30 (being as conservative as possible about upgrading dependencies, since my ultimate target is an old version of gensim), and the error went away.

Second yak: numpy

Every time I try to install numpy on an M1 Mac I run into errors. I can’t even find the errors in my scrollback any more (other than the one where it couldn’t install the C bindings and pleaded for a better Cython), but if you’re reading this post, you probably know the ones. I tried Rosetta; it didn’t help.1

I ended up with the same solution I always end up with2; to wit (modulo numpy version),

poetry run python -m pip install --no-use-pep517 --no-binary :all: numpy==1.24.2

It bothers me that I have to special-case the numpy install, because I’m always thinking about how this would look in CI/CD. But by running pip under poetry, I at least ensure that numpy ends up in the right virtualenv, and poetry is able to find it when it does dependency installation and resolution.

Boss yak: gensim

Now I have all the dependencies I need to install gensim; hooray! I verify that my model loads and saves in 3.8.3. It does. Now I install 4.0.x so that I can —

AttributeError: 'dict' object has no attribute '__NUMPY_SETUP__'

Oh. Oh dear.

This is actually a known bug that has been solved in newer versions of gensim, but that fix didn’t make it back to 4.0.x. OK. Well. I clone gensim locally, apply that patch to my version, and discover that you can poetry install a local project (this is actually an extremely sweet feature).

Now I can load my model in 4.0.x, relying on the backward compatibility guarantee, and —

  File "/path/to/local/version/gensim/models/doc2vec.py", line 328, in docvecs
    return self.dv
AttributeError: 'Doc2Vec' object has no attribute 'dv'. Did you mean: 'dm'
File "/path/to/local/version/gensim/models/keyedvectors.py", line 272, in _upconvert_old_vocab
    if 'sample_int' in self.expandos:
AttributeError: 'KeyedVectors' object has no attribute 'expandos'
  File "/path/to/local/version/gensim/models/keyedvectors.py", line 1700, in _upconvert_old_d2vkv
    self.vocab = self.doctags
  File "/path/to/local/version/gensim/models/keyedvectors.py", line 654, in vocab
    self.vocab()  # trigger above NotImplementedError
  File "/path/to/local/version/gensim/models/keyedvectors.py", line 645, in vocab
    raise AttributeError(
AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.

Oh. Oh dear.

This was actually all pretty easy to solve, given that I now had an installable version on localhost. I added a property to gensim/models/doc2vec.py:

@property
def dv(self):
    return self.__dict__['docvecs']

And a check, before trying to access self.expandos:

if not hasattr(self, 'expandos'):
    self.expandos = {}

And I replaced a line that threw a KeyError:

# del self.expandos['offset']
self.expandos.pop('offset', None)

I also commented out the line self.vocab = self.doctags in _upconvert_old_d2vkv, because the very next line calls a function which destroys self.vocab, whereas this line just triggers a NotImplementedError in attempting to reference self.vocab, which should render the assignment moot.

Next steps

I’m going to have to verify that my model actually works as expected with these changes!

I’m also going to pull together a PR patching 4.0.x, assuming the maintainers are open to it, and work is cool with my using a version of this model as a test case (I need something which triggers the bugs in order to test the patch, you know?).

Also, of course, I will glory in these clouds of yak hair I am now surrounded by.

A majestic yak stares directly into the camera
Photo by Quaritsch Photography on Unsplash

Footnotes

1. But it let me have some fun messing around with Terminal settings. Now I have a duplicated version of Terminal.app, named Terminal_Rosetta, which is set to always open in Rosetta mode. I also gave it a different default color scheme so that I can always tell which architecture my terminal is in, because you know it would be the easiest thing in the world to end up with things not working because of weird architecture clash bugs that would take forever to track down.

I will probably never use this again.

2. Which maybe I will remember after having written a whole blog post about it? Honestly, probably not. But at least I might remember that I wrote a blog post about it and will therefore be able to find the answer faster.

Tech screens, how do they work

I just had So Many job interviews, which means I had so many tech screens, all of which ran differently. I didn’t know what to expect going into most of them — I’ve only interviewed in the library world in the past, where many organizations don’t have tech screens at all, but some of these were industry jobs — and organizations varied in how much they explained what to expect, so I’m going to outline the varieties of tech screen I faced in hopes this will help you. All organization names are, obviously, anonymized.

The Order of the Promethean Banner

A screen (preceding any human interviews) via Triplebyte, which presented short blocks of code and asked multiple-choice questions about their outputs, flaws, et cetera. Questions were timed. Googling, documentation, REPLs, et cetera were disallowed.

I am pretty sure I went down in flames on this one. While I got to choose a language I am familiar with, the questions mostly concerned obscure edge cases and details of that language — some of which I know, some of which I have literally never used in ten years of writing software. (One of which my husband, who has been writing extremely hardcore software for thirty years, thinks he’s used once.)

But also: I don’t think it’s a fair assessment of my skills, or of the skills the job actually requires. In the real world, I actually do read documentation, or write tests, or try something out in a REPL if I’m not sure of how something works. And: the job description wasn’t for someone writing 6-line functions to exercise weird edge cases; it was for someone operating at a much more senior level where architecture and communication skills are important.

This was also, in effect, the front door to the company. I decided I didn’t want to work at a place whose opener was so hostile, and withdrew this application.

The Society of the Emerald Smiles

“Bring a piece of code you have written, or substantially contributed to, and tell us about it.”

This was great. I knew exactly what to expect in advance. My prep was limited to choosing code and finding its GitHub link, and that was short because I know off the top of my head the code I’m proudest of. I knew that code showcased — not only my skills in general — but the particular skills I wanted them to see for this position. I came in feeling confident and relaxed, because I know this code, and I know I did an excellent job with it. I got to be the expert in the room storytelling and fielding questions rather than a supplicant in an interrogation. The code was firmly grounded in the real world. We got to talk about design decisions, aesthetics, documentation, et cetera as well as syntax-level stuff. 👍

The Union of the Ancient Oak

A take-home: they added me to a private repo which contained a small but working app, and asked me to add some small and clearly-scoped features. Then in the interview we talked about the code I’d written, why I made those choices, alternative possibilities, et cetera.

I’m sensitive to concerns that take-homes unfairly privilege people with lots of spare time, but I don’t think it’s unreasonable to ask candidates to do short prep work, and this was well-scoped. (They said it should take less than three hours, and it did; quite a lot less if I recall correctly.) It used a familiar web framework and followed its conventions, so for anyone familiar with that framework (which successful candidates for this job will be) it was quite easy to get oriented to the code and figure out where the new features might go. The code came with tests (not all of which worked — a nice way to check that your candidates actually run the tests — all of which were simple to fix). Writing the code here was not hard but it was realistic. The sample data had a sense of humor, which was fun, and made their company culture look good.

As with the previous format, this meant that I got to come into the interview with code I was familiar with and could talk about from a position of expertise, which meant I felt confident. Not coincidentally, in both of these interviews, I got strong positive feedback about my code — because both of them had set me up for success, made it easy for me to show them my best work.

The League of the Ghostly Well

Another take-home, this one strictly timed to 90 minutes (solidly within my “reasonable to expect of a candidate” realm). Two parts: one writing up a feature proposal; one doing code review. In other words, this one is striking for involving no actual coding! But above the junior level these skills of design, communication, and mentorship are more important, and this exercise seemed well-scoped to capture those skills. At the same time, it did draw on my syntax-level knowledge due to the utter haplessness of the code I was reviewing 😂

I did feel some breathlessness from the time pressure (while simultaneously appreciating that the organization was being respectful of people who have less available time). But as with the preceding, the format meant that I got to the subsequent human interview with code that I was familiar with and could discuss from a position of some confidence. The interviewers were also totally aware that in the real world they would have needed more time to do their best work on these prompts, so they didn’t expect me to be perfect. It left room for conversations about alternatives I would have considered with more time, which is a good sort of thing to talk about in an interview.

The Guild of the Radiant Visage

We had a video call via Hackerrank. They sent me a link to a Hackerrank sandbox first so that I could get familiar with the platform, which was a hospitable touch. It’s quite nice, actually; in-browser coding with input, output, REPL, editor; nice syntax coloring, tab completion, the stuff I do subconsciously while typing works more or less as I expect, the editor has vim and emacs modes too if you’re into that sort of thing.

I was worried this might be like Triplebyte, but no; they presented me with a description of a function they wanted to implement. This function was a simplified version of something they encounter in their day-to-day work, and they took the time to explain the context. The description included sample input and expected output.

I found it extremely nerve-wracking to do live coding, but they were nice about both stepping back when I needed space to think, and throwing me some lifelines. In the end I missed one edge case and was totally kicking myself about that (it’s absolutely the sort of thing I would have caught in a calmer situation) and sure I’d tanked it. But my husband (who does a lot of interviewing for his company) said that the fact that I finished with time to spare was a good sign and that in his standard interview problem (a code review), people who get 40% of the available bugs get a thumbs-up from him. Whew. And as it turns out I did get an offer from this organization, so apparently you really don’t have to be perfect!

Hope this helps, people of the internet. Good luck in your interviews.

“Just a few files”: technical labor, academe, and care

Let’s dissect this quote from Drucker’s “Sustainability and complexity” [1]: “They tired of care-taking, even though this involved little more than continuing to host the project files on a server.”

If there are enough people who want to host “just a few files”, it’s a lot of labor. And server costs add up. The PHP or ruby or whatever programming language your dynamic files run on, that needs to be upgraded too. And again, this may necessitate maintenance so your files remain usable.

You probably have a database backing the system. Guess what, that needs maintenance too.

Upgrading those files to maintain compatibility is probably less than a week of work, assuming that the files come with documentation about how to deploy them and a test suite to make sure the upgrades don’t break key features.

Guess what?

If the team writing your “project files on a server” didn’t include software engineers, your files bloody well do not come with documentation and a test suite.

Given that the article includes neither the word “software” nor the word “engineering”, I know my hypothesis on this one.

OK but let’s back up a bit. I said, best case (remember: this is not the best case), that maintenance is probably under a week of work. Say, yearly.

If you have 30 faculty members who want to host “just a few files”, someone’s doing that full-time for 2-7 months a year.

I actually enjoy doing maintenance work — to a point. Most software engineers don’t. Necessary, but not fun. You ask people to spend too high a percentage of their time on maintenance, they hate their jobs, and they quit.

Alternatively, they look the other way, letting “just a few files” languish unmaintained, and if they are any good they KNOW in their hearts the bitrot, the attack surface. They work in a place that doesn’t value quality and if they’re any good it eats them.

I used to fantasize about being able to sunset things. I inventoried every “just a few files” on a whole set of servers. There were 243 of them. There were, at most, 6 of us. We couldn’t kill them. Because faculty like this might complain.

Some of them were critical services. Most of them were running on obsolete versions of whatever they ran on. We didn’t have business owners, documentation, tests. No principled way to transition or upgrade them. No SLAs giving us permission to kill them in 5 years.

“Just a few files.”

TWO HUNDRED FORTY THREE PROJECTS. SIX OF US.

If you want to talk sustainability for software projects — first, talk to an actual software engineer. Maintainability is part of their expertise.

And second, the labor conditions of your techies’ work lives are an important part of that plan. If your plan rests on making their jobs suck, it is not a sustainable plan. Or one that anyone should feel good about.

Thank you for coming to my TED talk.

Footnotes

[1] “Sustainability and complexity: Knowledge and authority in the digital humanities.”
Johanna Drucker. Digital Scholarship in the Humanities, Volume 36, Issue Supplement_2, October 2021, Pages ii86–ii94, https://doi.org/10.1093/llc/fqab025.

[2] For example, the actual months, on and off, I spent trying to deal with five years’ worth of deferred maintenance coming due at once on one of my personal projects. I was able to wait that long to touch it because my code had good tests and the deployment system had been set up responsibly by the very capable Andy Dorner…but the result was, when I couldn’t avoid doing maintenance any more, 1) everything was broken, and 2) I was dealing with a system I wasn’t super familiar with, because I hadn’t been the one who set it up, and Andy and I were no longer coworkers, so I had limited access to his expertise.

[3] I say this with tremendous respect for graduate students. I teach them, and some of them have been brilliant, hardworking people whose work has already had, and will continue to have, significant impact beyond the classroom. Nonetheless, the difference between writing code and doing engineering is significant. Just as having read Romeo and Juliet (however insightfully!) does not mean you can stage a Broadway production, knowing how to write scripts that achieve particular tasks does not mean you know how to make those scripts usable by others, flexible around changing requirements, easy to debug, or performant under increased load. Nor does it mean that you can document them clearly, figure out where and how to deploy them, automate that deployment system, write effective tests, or work with others. These are skills people generally develop through experience, by working as part of a community of practice on large, shared projects over time. Research code — code written in the course of academic work, for the sake of prototyping something or generating data for a paper — often does not have these characteristics. Indeed, some of the worst code I and my colleagues have encountered has been written by people with Ph.D.s in computer science. They may well be brilliant in their areas of research, but computer science and software engineering are very different. Just as my professional experience has given me little formal background about algorithms and I couldn’t write a proof involving big O notation to save my life, academic training rarely provides the type of experience needed to develop engineering skills.

[4] Truly, I cannot emphasize enough how much the community of practice matters here. The critical habits of mind — things like “writing tests is important” — all come from being embedded in a culture that values them. The techniques and tools for carrying out these habits of mind come from coworkers, conferences, techie Slacks, blog posts, hallway conversations. The ability to keep up with those techniques and tools as they change, doubly so. Asking a software practitioner to work solo inherently undermines their skills. If you support a team that, for whatever reasons of budgetary reality, has only one such person, I implore you to budget for them to attend conferences regularly, and to make time for them to participate in a professional community via conferences, local meetups, and/or open-source participation; this is the most effective professional development you can do, and will also be good for their mental health.

[5] This is a problem that funders are starting to grapple with, typically under the framing of sustainability. I know Mellon has this on its radar, for example, and I want to gratefully recognize this trend. That said, there are also real down sides to funding operations on soft money (see, e.g., “sustainability” here, or the Collective Responsibility white paper on contingent labor in libraries, archives, and museums). Ultimately, good technology work rests in the institution having both a commitment to employ adequate labor to sustain core services, and a funding model to pay that labor.

the great thing about deferred maintenance is everything catches fire at the same time

Once upon a time in 2017, my colleague Andy Dorner, who is awesome at devops, made a magical deploy script for HAMLET. I was like, ugh, Heroku is not a good solution, I have to AWS, I regret my life choices, and he was like, never fear! I will throw together a script which will make it all test/build/deploy every time you git push, using magic.

It worked great and I basically didn’t have to think about it until July of this year.

And then, inevitably in retrospect, I found a deploy didn’t work because nothing worked.

The presenting issue was that a URL we’d used to download a thing for certbot now 404ed. But why stop there? Travis was no longer a thing, so there goes deployment. The Amazon Linux platform had gone comprehensively obsolete, replaced by AL2 (which has a subtly but importantly different set of instructions for…everything, and they usually aren’t clearly distinguished in the various StackOverflow and Amazon documentation places one naturally ends up; for extra fun Amazon Linux isn’t quite RHEL or Centos and definitely isn’t Ubuntu or Debian, so instructions you find for those platforms almost-but-don’t-quite work, too). The supported versions of Python were different.

A sensible and disciplined person carefully swaps out one variable at a time for ease of troubleshooting. I found myself standing in the wreck of a Jenga tower, with no such option. Here’s how I rebuilt it.

(This is the linear, simplified version. In practice, picture two months of spare time flailing, git pushing to redeploy and find out what broke, leaving an absolute wreck of a git history, which it kills me not to clean up, but which I am leaving so that you feel less alone in your own desperate sallies against AWS.)

Gonna need a web server (Beanstalk/AL2)

I created a new EC2 instance, using the latest available Amazon Linux 2/Python 3.8 platform. (It turns out you can’t swap out EC2 instances under your running environment, so I couldn’t just upgrade in place.) I then created a new Beanstalk environment for that instance.

To my great shock, the sample application that Amazon provides for your brand-new Python instance is a pleasant-looking page that links to AWS Python and Django documentation. It was weirdly hospitable.

I also manually copied over all the application-environment variables from the old instance to the new one in the AWS console (Elastic Beanstalk > Environments > (name of env) > Configuration).

Also, make sure to specify a keypair when you create a new environment. You can do this on instance creation, but it’s harder, and specifying it with the environment will apply it to any instances that get autoscaled. If you don’t do this you won’t be able to ssh to the instances, and then you will be sad during the inevitable debugging slog.

Finally, the default (free) instance type that gets created here is t2.micro, but that doesn’t have enough capacity for memory-hungry neural net dependencies, so I had to destroy my first attempt and recreate an environment where I went into advanced options somewhere and specified t2.small.

A digression, in which we update our application

HAMLET had been on Python 3.6, but the only available instances were Python 3.7 and 3.8. So there’s a moment in here where I create a new 3.8 pipenv on localhost and reinstall all the things, updating as needed. Luckily this is a pretty small change and the tests still pass.

How about that database (RDS)

Then I went to attach my new instance to my old database and discovered that’s not a thing, because why would it be. Instead, you need to snapshot your existing database on the old instance; create a new RDS instance attached to your new beanstalk environment; and attach the snapshot during the creation process. Luckily RDS figured it out automagically from there (I was not looking forward to spending time configuring a database).

And then I had a database that Django couldn’t connect to, because pipenv install psycopg2 fails unless you have pg_config, which means you have to install postgresql-devel in the packages section of an .ebextensions config file, which is an example of why my git history is such a mess.

What if code needs to get from GitHub to AWS (GitHub Actions)

This was the easy part. I blew away my travis config and set up GitHub Actions, which didn’t exist in 2017, but do now. This took, like, fifteen minutes and one config file, and just worked. Great job, GitHub.

Psych, your deploy is still broken (.ebextensions)

Remember how Amazon Linux and AL2 have all sorts of subtle, patchily documented differences? Yeah. It turns out the deployment process is among them. The syntax available in .ebextensions files is a little different! .platform/hooks now exists, and you can put shell scripts there and it’s honestly pretty useful — once you figure out what executes when! I referred frequently to this documentation of AL2 config options. After alarmingly many yakshaving commits: before, after.

Mostly this was removing stuff, which was great. Got rid of stuff which was there only to hack around problems with an old version of WSGI. No longer needed logging hacks, because logs work better on AL2. Got rid of certbot conf temporarily because I needed it to work on port 80 before I complicated things with port 443 (stay tuned!) And…

Everything’s better without Apache (nginx)

Andy had set it up with Apache and it worked so I didn’t touch it. That said, I personally have never gotten Apache to work for anything, fleeing instead to the comparatively smooth and glowing embrace of nginx. As long as everything was broken anyway, and AL2 defaults to nginx and certbot has clear instructions for nginx…why not try that.

This meant destroying the Apache config in .ebextensions and then not writing nginx config, because the existing stuff just worked (modulo one little change to the syntax for the WSGI file location — hamlet.wsgi:application instead of hamlet/wsgi.py). That was pretty great.

What if (when) you need to debug it

The most helpful thing here was connecting directly to the EC2 instance (you can do this with eb ssh but I just used the web console) and hand-running commands to see what happened, rather than making an .ebextensions or .platform/hooks change, deploying it, and then hunting through logs. This was also particularly helpful for dealing with packages issues; instructions for various install processes usually reference apt-get and Debian/Ubuntu, but I have yum and AL2, and packages often have slightly different names, and oy, just logging into the instance and doing some yum list is so much easier than guessing and hoping.

Connecting directly to the EC2 instance also makes it easy to view logs, though the download full logs option in the Beanstalk console is good too. Don’t just get the last 100 lines; this doesn’t include all the log files you will need — notably the cfn-init.log, to which I referred constantly. eb-engine.log was very helpful too, and sometimes the nginx access and error logs.

There was also a hilarious moment (…several moments) (…not initially hilarious) when I found that my captcha wasn’t working and updating it didn’t help, and when I went to view the application logs I discovered we weren’t writing them because reasons. Rather than figure out how to do that, there I was learning strace to grab log writes before they went to the ether, which is how I discovered this psycopg2 bug. captcha is still not working.

Miscellaneous small yaks

I needed to repoint my CNAME record for hamlet.andromedayelton.com from my old Beanstalk environment to the new one. The cryptic URL is clearly visible in the AWS web console for Beanstalk, and the hardest part of this was remembering where my CNAME record lived.

Correspondingly, I had to update ALLOWED_HOSTS in the Django config.

There were, at this point, CSRF failures when I submitted forms, but I decided to ignore them until after I’d fixed SSL, on the theory that the CSRF_COOKIE_SECURE = True Django setting might, you know, not work if the “secure” wasn’t available. This was the correct call, as CSRF magically worked once I stood up SSL. Similarly, I didn’t bother to think about whether I was redirecting http to https until after SSL was working, and it turned out there was nothing I needed to do here — my existing conf worked. (Whatever it is. Seriously, I have no idea what Amazon puts in its default nginx file, or how certbot edits it.)
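
For reference, the Django side of this is just a couple of settings (hostnames here are illustrative):

# settings.py, roughly
ALLOWED_HOSTS = [
    "hamlet.andromedayelton.com",
    ".elasticbeanstalk.com",  # the environment's own cryptic URL
]

CSRF_COOKIE_SECURE = True  # which is why CSRF only started working once SSL did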

I updated Pillow. captcha is now working.

Speaking of certbot, isn’t that where we started? (SSL)

Remember how my presenting problem was that certbot installation 404ed? And here we are several screens later and I haven’t installed certbot yet? Yes. Well. The prerequisites for the honestly quite helpful certbot instructions include having a working web site on port 80, and we’ve only just gotten there. I used the other Linux + pip instructions variant, because AL2 is extremely other, and no one really seemed to know how to install snapd on it. I filled in some .ebextensions and .platform/hooks specifics with this extremely helpful AL2 + certbot blog post, which (bonus points!) has the clearest diagram I have encountered of the order in which things happen during Beanstalk deployments. In particular, this blog post tipped me off that the certbot certificate installation script needs to run during a postdeploy hook, not during .ebextensions, so that the changes that it makes to nginx configuration will persist across autoscaling events.

On the whole, though, EFF has improved its certbot installation process since 2017, and once I had the entire rest of the web site working, this part was fairly straightforward.

The life-changing magic of tidying up

At this point I had 3 Beanstalk environments — original HAMLET; failed (t2.micro) first attempt at upgrading to AL2; and shiny new working HAMLET. Don’t forget to destroy the old ones when you no longer need them! Terminating them will also terminate associated resources like databases, and then you stop getting charged for them.

Which brings me to the last item on my todo list, PARTY AND BLOG ABOUT IT.

🎉 🥳 🎊 Hi! I did it!

…but it is still broken

Remember that captcha? The form it guards 500s on submission. More time with strace later, I find that the fast version of (my very old) gensim isn’t compiling, so it’s falling back to the slow version, so it doesn’t have neg_labels available, and it needs that to run infer_vector. I have tried a variety of things to install gensim (notably, build-essentials isn’t a thing on this system; I need yum groupinstall "Development Tools"), but it still doesn’t compile. This means parts of the site work — the slow gensim path is still available — but not anything involving processing uploaded text.

I strongly suspect I’m caught in version shear problems (my gensim is extremely outdated), but upgrading that is going to be its own slow and careful thing, outside the scope of getting HAMLET to stand up again at all.

🎉 🥳 🎊 Hi! I sort of did it!

I haven’t failed, I’ve tried an ML approach that *might* work!

When last we met I was turning a perfectly innocent neural net into a terribly ineffective one, in an attempt to get it to be better at face recognition in archival photos. I was also (what cultural heritage technology experience would be complete without this?) being foiled by metadata.

So, uh, I stopped using metadata. 🤦‍♀️ With twinges of guilt. And full knowledge that I was tossing out a practically difficult but conceptually straightforward supervised learning problem for…what?

Well. I realized that the work that initially inspired me to try my hand at face recognition in archival photos was not, in fact, a recognition problem but a similarity problem: could we find multiple instances of the same person across the Charles Teenie Harris collection? This doesn’t require me to identify people, per se; it just requires me to know if they are the same or different.

And you know what? I can do a pretty good job of getting different people by randomly selecting two photos from my data set — they’re not guaranteed to be different, but I’ll settle for pretty good. And I can do an actually awesome job of guaranteeing that I have two photos of the same person with the ✨magic✨ of data augmentation.

Keras (which, by the way, is about a trillionty times better than hand-coding stuff in Octave, for all I appreciate that Coursera made me understand the fundamentals by doing that) — Keras has an ImageDataGenerator class which makes it straightforward to alter images in a variety of ways, like horizontal flips, rotations, or brightness changes — all of which are completely plausible ways that archival photos of the same person might differ inter alia! So I can get two photos of the same person by taking one photo, and messing with it.
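
Concretely, it looks something like this (the augmentation ranges and the tf.keras import path are illustrative, not necessarily what I settled on):

from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img

# plausible ways two archival photos of the same person might differ
augmenter = ImageDataGenerator(
    horizontal_flip=True,
    rotation_range=15,
    brightness_range=(0.7, 1.3),
)

photo = img_to_array(load_img("some_cropped_face.jpg"))  # hypothetical path
same_person = augmenter.random_transform(photo)          # same face, messed with
pair = (photo, same_person)                              # a guaranteed same-person pair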

And at this point I have a Siamese network with triplet loss, another concept that Coursera set me up with (via the deeplearning.ai sequence). And now we are getting somewhere!

Well. We’re getting somewhere once you realize that, when you make a Siamese network architecture, you no longer have layers with the names of your base network; you have one GIANT layer which is just named VGGFace or whatever, instead of having all of its constituent layers, and so when you try to set layer.trainable = True whenever the layer name is in a list of names of VGGFace layers…uh…well…it turns out you just never encounter any layers by that name and therefore don’t set layers to be trainable and it turns out if you train a neural net which doesn’t have any trainable parameters it doesn’t learn much, who knew. But. Anyway. Once you, after embarrassingly long, get past that, and set layers in the base network to be trainable before you build the Siamese network from it…
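
In code, the fix amounts to flipping trainability on the base network before you wrap it, something like this sketch (ResNet50 stands in for the VGGFace-style network just to keep things self-contained; shapes and layer counts are illustrative):

import tensorflow as tf
from tensorflow.keras import Input, Model

# stand-in for the real embedding network
base_model = tf.keras.applications.ResNet50(weights=None, include_top=False, pooling="avg", input_shape=(224, 224, 3))

# set trainability HERE, while the constituent layers are still addressable by name...
for layer in base_model.layers[-4:]:
    layer.trainable = True

# ...because once it's nested in the Siamese wrapper, it shows up as one giant layer
anchor, positive, negative = (Input(shape=(224, 224, 3)) for _ in range(3))
embeddings = [base_model(x) for x in (anchor, positive, negative)]
siamese = Model(inputs=[anchor, positive, negative], outputs=embeddings)

print([layer.name for layer in siamese.layers])  # the base network appears once, under its own name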

This turns out to work much better! I now have a network which does, in fact, have decreased loss and increased accuracy as it trains. I’m in a space where I can actually play with hyperparameters to figure out how to do this best. Yay!

…ok, so, does it get me anywhere in practice? Well, to test that I think I’m actually going to need a corpus of labeled photos so that I can tell if given, say, one of WEB Du Bois, it thinks the most similar photos in the collection are also those of WEB Du Bois, which is to say…

Alas, metadata.

I haven’t failed, I’ve just tried a lot of ML approaches that don’t work

"nope" handpainted in white over wood

“Let’s blog every Friday,” I thought. “It’ll be great. People can see what I’m doing with ML, and it will be a useful practice for me!” And then I went through weeks on end of feeling like I had nothing to report because I was trying approach after approach to this one problem that simply didn’t work, hence not blogging. And finally realized: oh, the process is the thing to talk about…

Hi. I’m Andromeda! I am trying to make a neural net better at recognizing people in archival photos. After running a series of experiments — enough for me to have written 3,804 words of notes — I now have a neural net that is ten times worse at its task. 🎉

And now I have 3,804 words of notes to turn into a blog post (a situation which gets harder every week). So let me catch you up on the outline of the problem:

  1. Download a whole bunch of archival photos and their metadata (thanks, DPLA!)
  2. Use a face detection ML library to locate faces, crop them out, and save them in a standardized way (there’s a sketch of this step just after the list)
  3. Benchmark an off-the-shelf face recognition system to see how good it is at identifying these faces
  4. Retrain it
  5. Benchmark my new system
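
Step 2, for the curious, looks roughly like this (the library choice and crop size are illustrative, not necessarily what I used):

import face_recognition
from PIL import Image

def crop_faces(photo_path, out_prefix, size=(224, 224)):
    """Detect faces in one archival photo, crop them, and save them at a standard size."""
    image = face_recognition.load_image_file(photo_path)
    for i, (top, right, bottom, left) in enumerate(face_recognition.face_locations(image)):
        face = Image.fromarray(image[top:bottom, left:right]).resize(size)
        face.save(f"{out_prefix}_face{i}.jpg")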

Step 3: profit, right? Well. Let me also catch you up on some problems along the way:

Alas, metadata

Archival photos are great because they have metadata, and metadata is like labels, and labels mean you can do supervised learning, right?

Well….

Is he “Du Bois, W. E. B. (William Edward Burghardt), 1868-1963” or “Du Bois, W. E. B. (William Edward Burghardt) 1868-1963” or “Du Bois, W. E. B. (William Edward Burghardt)” or “W.E.B. Du Bois”? I mean, these are all options. People have used a lot of different metadata practices at different institutions and in different times. But I’m going to confuse the poor computer if I imply to it that all these photos of the same person are photos of different people. (I have gone through several attempts to resolve this computationally without needing to do everything by hand, with only modest success.)

What about “Photographs”? That appears in the list of subject labels for lots of things in my data set. “Photographs” is a person, right? I ended up pulling in an entire other ML component here — spaCy, to do some natural language processing to at least guess which lines are probably names, so I can clear the rest of them out of my way. But spaCy only has ~90% accuracy on personal names anyway and, guess what, because everything is terrible, in predictable ways, it has no idea “Kweisi Mfume” is a person.
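
The spaCy pass is short, for what it’s worth (a sketch, assuming the small English model; the function name is mine):

import spacy

nlp = spacy.load("en_core_web_sm")

def probable_person_names(subject_lines):
    """Keep only the subject strings spaCy thinks contain a PERSON entity."""
    return [line for line in subject_lines
            if any(ent.label_ == "PERSON" for ent in nlp(line).ents)]

probable_person_names([
    "Du Bois, W. E. B. (William Edward Burghardt), 1868-1963",
    "Photographs",
    "Kweisi Mfume",  # which, alas, it can miss
])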

Is a person who appears in the photo guaranteed to be a person who appears in the photo? Nope.

Is a person who appears in the metadata guaranteed to be a person who appears in the photo? Also nope! Often they’re a photographer or other creator. Sometimes they are the subject of the depicted event, but not themselves in the photo. (spaCy will happily tell you that there’s personal name content in something like “Martin Luther King Day”, but MLK is unlikely to appear in a photo of an MLK day event.)

Oh dear, linear algebra

OK but let’s imagine for the sake of argument that we live in a perfect world where the metadata is exactly what we need — no more, no less — and its formatting is perfectly consistent. 🦄

Here you are, in this perfect world, confronted with a photo that contains two people and has two names. How do you like them apples?

I spent more time than I care to admit trying to figure this out. Can I bootstrap from photos that have one person and one name — identify those, subtract them out of photos of two people, go from there? (Not reliably — there’s a lot of data I never reach that way — and it’s horribly inefficient.)

Can I do something extremely clever with matrix multiplication? Like…once I generate vector space embeddings of all the photos, can I do some sort of like dot-product thing across all of my photos, or big batches of them, and correlate the closest-match photos with overlaps in metadata? Not only is this a process which begs the question — I’d have to do that with the ML system I have not yet optimized for archival photo recognition, thus possibly just baking bad data in — but have I mentioned I have taken exactly one linear algebra class, which I didn’t really grasp, in 1995?
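
For the record, the matrix-multiplication version fits in a few lines (a sketch; embeddings is assumed to be an n_photos x embedding_dim NumPy array):

import numpy as np

def most_similar_pairs(embeddings, top_k=10):
    """Cosine similarity between every pair of photo embeddings, best matches first."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarity = normed @ normed.T  # all pairwise similarities in one matrix multiply
    i_upper, j_upper = np.triu_indices(len(embeddings), k=1)  # each pair once, no self-matches
    order = np.argsort(similarity[i_upper, j_upper])[::-1][:top_k]
    return list(zip(i_upper[order], j_upper[order]))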

What if I train yet another ML system to do some kind of k-means clustering on the embeddings? This is both a promising approach and some really first-rate yak-shaving, combining all the question-begging concerns of the previous paragraph with all the crystalline clarity of black box ML.

Possibly at this point it would have been faster to tag them all by hand, but that would be admitting defeat. Also I don’t have a research assistant, which, let’s be honest, is the person who would usually be doing this actual work. I do have a 14-year-old and I am strongly considering paying her to do it for me, but to facilitate that I’d have to actually build a web interface and probably learn more about AWS, and the prospect of reading AWS documentation has a bracing way of reminding me of all of the more delightful and engaging elements of my todo list, like calling some people on the actual telephone to sort out however they’ve screwed up some health insurance billing.

Nowhere to go but up

Despite all of that, I did actually get all the way through the 5 steps above. I have a truly, spectacularly terrible neural net. Go me! But at a thousand-plus words, perhaps I should leave that story for next week….

this time: speaking about machine learning

No tech blogging this week because most of my time was taken up with telling people about ML instead!

One talk for an internal Harvard audience, “Alice in Dataland”, where I explained some of the basics of neural nets and walked people through the stories I found through visualizing HAMLET data.

One talk for the NISO plus conference, “Discoverability in an AI World”, about ways libraries and other cultural heritage institutions are using AI both to enhance traditional discovery interfaces and provide new ones. This was recorded today but will be played at the conference on the 23rd, so there’s still time to register if you want to see it! NISO Plus will also include a session on AI, metadata, and bias featuring Dominique Luster, who gave one of my favorite code4lib talks, and one on AI and copyright featuring one of my go-to JD/MLSes, Nancy Sims.

And I’m prepping for an upcoming talk that has not yet been formally announced.

Which is to say, I guess, I have a lot of talks about AI and cultural heritage in my back pocket, if you were looking for someone to speak about that 😉