Planet Code4Lib

Good Questions / David Rosenthal

Source
On November 21st Bryce Elder posed Five questions from an ignorant no-coiner about the crypto crash. Each of his five questions identified some interesting apparent anomalies.

Below the fold I look into each of his questions, asking how anomalous its anomalies really were and whether they have persisted into the New Year.

TL;DR none of them are really surprising but reaching that conclusion took a good deal of research.

Why doesn’t network difficulty fall?

Elder explains Bitcoin's difficulty mechanism:
Bitcoin’s proof-of-work algorithm has a difficulty ratchet to keep production steady. Network difficulty adjusts every two weeks, approximately, based on whether miners have been finding new blocks for the blockchain too quickly or too slowly.
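To make the mechanism concrete, here is a minimal Python sketch of the retarget rule and of the block-time-based hashrate estimate that comes up below (standard consensus parameters; the variable names and example figures are illustrative, not Elder's):

    # A minimal sketch of the retarget rule and of the block-time-based hashrate
    # estimate, using Bitcoin's standard consensus parameters; variable names and
    # example figures are illustrative.

    RETARGET_BLOCKS = 2016                      # blocks per adjustment window
    TARGET_SECONDS = RETARGET_BLOCKS * 600      # ~two weeks at one block per 10 minutes

    def retarget(old_difficulty: float, actual_seconds: float) -> float:
        """New difficulty after a window that took actual_seconds to mine."""
        ratio = TARGET_SECONDS / actual_seconds
        ratio = max(0.25, min(4.0, ratio))      # consensus clamps each step to 4x either way
        return old_difficulty * ratio

    def implied_hashrate(difficulty: float, avg_block_seconds: float = 600.0) -> float:
        """Estimated network hashrate (hashes/second) from difficulty and block time."""
        return difficulty * 2**32 / avg_block_seconds

    d = 140e12                                            # roughly 140trn
    print(retarget(d, TARGET_SECONDS * 0.95) / d)         # blocks 5% too fast -> ~+5.3%
    print(implied_hashrate(d) / 1e18, "EH/s")             # ~1000 EH/s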
Elder then identifies the first anomaly:
Difficulty tightens when a higher bitcoin price is encouraging miners to join the market, that much makes sense. But when a lower bitcoin price is squeezing margins and pushing the least efficient miners out, shouldn’t difficulty fall?
Here is the chart Elder is looking at.

Source
Then Elder asks:
Maybe difficulty lags behind the price? As said above, the ratchet only adjusts approximately every two weeks. Total hashrate offers a more dynamic measure of power deployed to the network, albeit an estimated one based on how long it’s taking to mine a block. Any change in market structure might show up there first.

And has it? Not really:
The first thing to note is that, depending upon power costs and where they are in the queue for the latest rigs from Bitmain, miners' margins vary greatly. So Elder is right that, if one of the two-week adjustments increases the difficulty, some of the least economic rigs should be switched off. In theory, if the adjustment decreases the difficulty, some of these idled rigs should be switched back on.

Eyeballing the graph, between early July and late October the price was between $110K and $120K, a 9% range. During that period the difficulty increased from around 140trn to around 155trn, about an 11% range. These aren't big changes, but it appears a bit strange that roughly flat price coincides with a steady-ish increase in difficulty.

Elder has some possible explanations:
Do miners with sunk costs keep running on negative margins in the hope of getting lucky? Are a handful of big miners, maybe advantaged by free power or whatever, keeping difficulty high to drive out competitors, either inadvertently or as part of some devious plan to centralise production and control the network? Or has mining become 55 per cent more efficient since last November?
I estimate that during that time Bitmain and its competitors shipped an additional 100EH/s to the miners, over and above the new rigs replacing obsolete ones. These leading-edge rigs would have been turned on immediately. During that time the hash rate rose from about 870EH/s to around 1120EH/s, or about 250EH/s. So around 150EH/s must have represented idled, less-efficient rigs being turned on. At the start of the period at most 85% of the rig fleet was working. At the end of the period, as the price started to fall, the most efficient miners had about 100EH/s more than they did at the beginning.

In the subsequent 2 months the price dropped from about $120K to around $85K and the hash rate dropped from around 1120EH/s to around 1060EH/s, or by 60EH/s. But in that time there was an incremental 50EH/s of new rigs. So 110EH/s of the 150EH/s from the originally idle fleet of uneconomic rigs were turned off. This is within the margin of error of my back-of-the-envelope estimates.
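The same arithmetic, laid out as a minimal sketch (all figures in EH/s, using the estimates above):

    # A sketch of the back-of-the-envelope arithmetic above
    # (all figures in EH/s, taken from the estimates in the text).

    # Early July to late October
    start, peak = 870, 1120
    new_rigs = 100                              # leading-edge shipments, turned on at once
    restarted = (peak - start) - new_rigs       # ~150 EH/s of idled rigs switched back on
    fleet_in_july = start + restarted
    print(f"share of fleet working in July: {start / fleet_in_july:.0%}")   # ~85%

    # Late October to early January
    trough = 1060
    late_new_rigs = 50
    switched_off = (peak - trough) + late_new_rigs   # ~110 of the 150 EH/s idled again
    print(f"re-idled: {switched_off} of {restarted} EH/s")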

My take is that what happened was what should have been expected. Elder's view was too simplistic, and he was misled by (a) the noise in the graph, and (b) the fact that the Y axis doesn't start at zero, making the noise much more evident.

Source
Elder goes on to note an issue I've been writing about for a long time:
Related, is it a worry that just three mining pools accounted for more than 45 per cent of this week’s block production? Given nearly all the hardware used across the network is Chinese-made, with Beijing-based Bitmain Technologies alone having an estimated market share of 82 per cent, when do concentration levels among a few organisations become a security concern?
Yes, this concentration has been a problem for Bitcoin's entire history, but everyone has decided to ignore it. As I write, on 6th January 2026, over the last three days Foundry USA and AntPool have controlled 51.2% of the hash power. As I explained at length in Sabotaging Bitcoin, there are practical difficulties facing an insider attack with 30% of the hash power, but with 51.2% things are much easier.
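For illustration, here is a sketch of how such a concentration figure is typically derived from recent block attributions (the pool block counts below are made up to total roughly three days of blocks; only the resulting 51.2% share comes from the observation above):

    # An illustrative sketch of deriving a pool-concentration figure: count
    # recent blocks by the pool tagged in each coinbase transaction and take
    # the top pools' share of the window. The counts below are made up.

    from collections import Counter

    def combined_share(blocks_by_pool: Counter, pools: list[str]) -> float:
        total = sum(blocks_by_pool.values())
        return sum(blocks_by_pool[p] for p in pools) / total

    # Roughly three days of blocks (~144/day), with invented attributions.
    window = Counter({"Foundry USA": 130, "AntPool": 91, "ViaBTC": 60,
                      "F2Pool": 45, "others": 106})
    print(f"{combined_share(window, ['Foundry USA', 'AntPool']):.1%}")   # 51.2%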

And, in November, Reuters reported that Bitcoin mining in China rebounds, defying 2021 ban:
Bitcoin mining is quietly staging a comeback in China despite being banned four years ago, as individual and corporate miners exploit cheap electricity and a data center boom in some energy-rich provinces, according to miners and industry data.

China had been the world's biggest crypto mining country until Beijing banned all cryptocurrency trading and mining in 2021, citing threats to the country's financial stability and energy conservation.

After having seen its global bitcoin mining market share slump to zero as a result of the ban, China crept back to third place with a 14% share at the end of October, according to Hashrate Index, which tracks bitcoin mining activities.
This may be what the 80% of new data centers in China that were left empty because they were unsuitable for AI are now being used for.

What’s the deal with Tether?

Zeke Faux described Tether as "practically quilted out of red flags" and they are still flapping. What attracted Elder's attention was this:
Stablecoins have long been pitched as crypto’s on-ramp. Swapping fiat money for a fiat-pegged stablecoin like Tether’s USDT or Circle’s USDC allows a trader to switch in and out of positions without having to touch tradfi.

Shouldn’t an on-ramp also work as an off-ramp? There’s not much evidence it does. In the six weeks or so when $1.2tn in value was drawn down from the cryptoverse, the market cap of USDT has increased by approximately $20bn:
Source
This USDT graph is updated from the one Elder used. It shows a clear break in the upward trend on 24th October. In the 10 preceding weeks it increased by $17B (~10%) as Bitcoin traded between a low of $108K and a high on 6th October of almost $125K.

In the 10 weeks since it has increased by only $3B as Bitcoin dropped from $111K to a low of $84K on 22nd November, subsequently trading in a range from there to $94K.

The only other stablecoin that matters is USDC:
USDC hasn’t been quite as resilient but over the period is still basically flat:
Source
Here is its updated graph. On 24th October its market cap was around $76B; it is now around $75B. So, yes, Elder was right: it has been basically flat for 10 weeks, whereas in the preceding 10 weeks it had increased by $8B (~12%).

Had the trend of the previous 10 weeks continued, the market cap of (USDT+USDC) would have been $25B higher than it is. One way of looking at this would be that traders "withdrew" $25B, or around 10% of the (USDT+USDC) market cap as of 24th October. On this basis Elder is just wrong: the off-ramp has been quite busy.
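Spelled out, the arithmetic is just each coin's growth in the prior ten weeks minus its growth since the break; a minimal sketch with the figures quoted above:

    # The shortfall arithmetic spelled out (figures in $bn, taken from the text;
    # "since" means the ten weeks after 24th October).

    growth = {
        "USDT": {"before": 17, "since": 3},
        "USDC": {"before": 8, "since": -1},   # ~$76bn down to ~$75bn
    }

    shortfall = sum(g["before"] - g["since"] for g in growth.values())
    print(f"(USDT+USDC) market cap vs. continued trend: about ${shortfall}bn lower")
    # -> about $23bn lower, i.e. roughly the ~$25bn figure above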

Elder notes that:
Even if we assume a large percentage of stablecoins are used for non-crypto things (sports betting, remittances, crimes), the recent issuance still looks at odds with the trend.
First, this contradicts the premise of Elder's question as I understand it. Second, most of the uses Elder cites involve a chain of transactions from fiat to stablecoin to fiat on one or more exchanges. These would not cause a net increase in demand for the stablecoin, which would come from and go back to the exchange's reserves.

Then Elder suggests that:
Maybe demand is high because crypto traders have been parking money rather than seeking to withdraw it?
First, demand isn't high relative to the historic trend, it is now $25B lower. Second, even if we accept Elder's $20B number, this is peanuts relative to the $1.2T drop in aggregate cryptocurrency market cap.

Source
Finally in this section Elder asks:
More idle money in the system ought to be good news for the likes of Coinbase, which uses the promise of higher yields on USDC deposits to sell monthly subscription schemes. And how have Coinbase shares been doing?
On the 8th, the day Bitcoin hit the skids, COIN closed at $387.27. By 20th November it was down 38% while Bitcoin was down 47%. It can hardly be a surprise that COIN is highly correlated with Bitcoin.
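Anyone wanting to check that correlation can do so in a few lines; a hypothetical sketch, with placeholder CSV files and column names standing in for a real data source:

    # A hypothetical sketch of checking the correlation: Pearson correlation of
    # daily log returns for the two price series. The CSV paths and column names
    # are placeholders for illustration, not a real data source.

    import numpy as np
    import pandas as pd

    coin = pd.read_csv("coin_daily.csv", parse_dates=["date"], index_col="date")["close"]
    btc = pd.read_csv("btc_daily.csv", parse_dates=["date"], index_col="date")["close"]

    returns = pd.DataFrame({
        "COIN": np.log(coin).diff(),
        "BTC": np.log(btc).diff(),
    }).dropna()

    print(returns["COIN"].corr(returns["BTC"]))   # historically strongly positive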

What explains trash crash PTSD?

Next, Elder asks why cryptocurrencies haven't resumed their progress moonwards:
A popular argument among crypto commentators is that token prices are down because traders are still digesting one bad day in early October.
He suggests some reasons:
Reasons for the October drawdown go from banal (maybe the high-beta cryptosphere just amplified an equities pullback on US-China trade tensions?) to wonky (maybe it all cascaded from a weird synthetic stablecoin depegging on one marketplace?) to the darkly cynical (maybe the big sharp drop was to let bucket-shop crypto brokers close out customer positions they’d never actually bought?).
Source
In anything to do with cryptocurrency, and definitely in this case, I'd go with "darkly cynical". He then asks:
If a market’s not deep, efficient or clean enough to digest a bit of one-day volatility, why get involved?
I'd agree with Elder on this. If we look at the log plot of Bitcoin's history, we can barely see the 47% drop in 6 weeks last October. We can just see, to take some recent examples, the 48% drop in 5 weeks in early 2020, and the 42% drop in 2 weeks in mid-2021. Traders who don't learn from history are doomed to repeat it. Of course, the volatility is precisely the thing that brings the traders to Bitcoin.

Where’s the volume?

Elder notes a wave of selling:
Crypto exchange-traded products have been haemorrhaging money all week. Spot ETP net redemptions yesterday were $1.14bn, including $901mn just from bitcoin ETPs, according to JPMorgan estimates. That’s the worst single day for net outflows since February.

With so much selling, you might expect to see an increase in bitcoin velocity, which measures the rate at which tokens move on the chain.
Source
This graph covers a longer history than Elder's. Already back in 2021 Igor Makarov & Antoinette Schoar found that the "rate at which tokens move on the chain" was irrelevant:
90% of transaction volume on the Bitcoin blockchain is not tied to economically meaningful activities but is the byproduct of the Bitcoin protocol design as well as the preference of many participants for anonymity ... exchanges play a central role in the Bitcoin system. They explain 75% of real Bitcoin volume.
Velocity dropped sharply until November 2023, and then continued to drop gradually. November 2023 was the start of a major increase in Bitcoin's price, which coincided with a sustained increase in trading volume on exchanges.
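For readers unfamiliar with the metric, here is a minimal sketch of one common way a velocity figure is computed (the exact series behind the chart may differ, and the example numbers are purely illustrative):

    # One common definition of "velocity" (a sketch, not necessarily the exact
    # metric behind the chart): on-chain transaction volume over a period
    # divided by the average market capitalisation over the same period.

    def velocity(onchain_volume_usd: float, avg_market_cap_usd: float) -> float:
        return onchain_volume_usd / avg_market_cap_usd

    # Purely illustrative numbers: $5bn/day of "real" on-chain volume against a
    # $1.8trn market cap gives an annualised velocity of roughly 1.
    print(velocity(5e9 * 365, 1.8e12))   # ~1.0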

Source
It thus seems likely that the trend identified by Makarov & Schoar, that on-chain activity was largely confined to exchanges, strengthened between 2021 and 2023, and then saturated. Elder's explanation is only part of the story:
Bitcoin velocity has been plummeting for years, for reasonable reasons. “Digital gold” overtook “internet money” as the preferred reason to hold, while derivatives like perpetual futures removed any need to faff around with the underlying asset.
It isn't just derivatives. Spot trading happens on exchanges. The blockchain is pretty much only used for inter-exchange netting transactions. So Elder's question misses the point:
Nevertheless, is it odd to have a sudden wave of selling that’s almost invisible in the underlying asset? Bitcoin velocity has barely changed over the past month, having bounced meekly off a record low in early October. Why?
Having a vast derivative market based off a much smaller spot market on exchanges based on a tiny set of transactions on the blockchain is a wonderful playground for traders, because it is easy to manipulate.

Is past performance indicative of future results?

Source
Spoiler: No. In this section Elder expresses skepticism about a piece by Alex “Crypto Alex” Saunders of Citigroup, suggesting that:
The halving cycle is a reason that long-time Bitcoin holders are nervous. We show the price performance in the years after halvings in Figure 3 with the second year showing weakness. These crypto winters have been associated with 80%+ drawdowns in the past as shown in Figure 4.
Elder doesn't need any help from me on this "chart necromancy". Halvenings primarily affect miners by roughly halving their income without a corresponding decrease in costs. This may force them to sell coins they have stashed, which at the margin may drive the price down. But with miners' income currently around $40M/day and recent volume peaks on major exchanges around $1.4B/day, this is likely to have a marginal impact.
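The order-of-magnitude comparison is easy to reproduce; a minimal sketch, using the protocol's current 3.125 BTC subsidy and the rough price and volume figures above:

    # A sketch of the order-of-magnitude comparison above. The 3.125 BTC block
    # subsidy (post-April-2024 halving) and ~144 blocks/day are protocol facts;
    # the price and exchange-volume figures are the rough values from the text.

    BLOCKS_PER_DAY = 144
    SUBSIDY_BTC = 3.125
    price_usd = 90_000                          # roughly where Bitcoin has been trading

    miner_income = BLOCKS_PER_DAY * SUBSIDY_BTC * price_usd   # ignores transaction fees
    exchange_volume = 1.4e9                                   # recent daily peak, per the text

    print(f"subsidy income: ${miner_income / 1e6:.0f}M/day")                        # ~$40M/day
    print(f"share of peak exchange volume: {miner_income / exchange_volume:.0%}")   # ~3%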

Fellow Reflection: Emily Woehrle / Digital Library Federation

This post was written by Emily Woehrle, who attended the 2025 DLF Forum as an Emerging Professionals Fellow. The views and opinions expressed in this blog post are solely those of the author and do not necessarily reflect the official policy or position of the Digital Library Federation or CLIR. 2025 Emerging Professionals Fellowships were supported by a grant from MetaArchive.

Emily Woehrle is a Digital Content Librarian at the University of Toronto Libraries, where she supports a large-scale website renewal project and manages the library’s LibGuides service. She also works as a part-time librarian at the Toronto Public Library. With a background in non-profit communications and content management, Emily brings an interdisciplinary perspective to her work and looks forward to sharing experiences and learning from peers at the DLF Forum to advance sustainable, user-centered digital library practices.


Documentation as responsible digital stewardship

Attending the DLF Forum as one of the Emerging Professionals fellows was an incredibly positive experience that left me with new connections, new ideas, and a validating sense of solidarity. The Forum was only my second library conference, yet I felt immediately comfortable among people who get it and were ready to share and listen to each other’s experiences. It was also a joy to navigate the conference with my fellow Fellows.

Two presentations stood out to me over the course of the Forum, and neither focused on trendy hot topics. Instead, they highlighted the importance of documentation; the practical, behind-the-scenes work that keeps digital libraries and archives running by codifying tacit knowledge and establishing the workflows, structures, and guidelines that sustain digital library work.

In my current role, I’m coordinating a large-scale website consolidation project that requires my colleagues and me to build processes and governance structures impacting 20+ libraries and departments. Documentation has become essential to scaling the project and keeping everyone aligned. Over the past year, I’ve spent a lot of time thinking about the most effective ways to develop knowledge-management systems, why some workplace cultures prioritize them more than others, and what that means for long-term success.

The first session, “Agile Documentation Development for Digital Preservation Systems,” offered strategies to make documentation immediately useful, iterative, and collaborative. The presenters emphasized creating “minimum viable documents” that favor progress over perfection – start with something usable, then refine it over time. They also underscored how role clarity and interdepartmental culture shape the success of documentation efforts. This session helped me reframe documentation as a living tool whose maintenance must be built into our work rather than treated as an afterthought.

The second session, “Renaming Failure as “readme.files”: Lessons Learned from Early and Mid-career Archives Perspectives,” reminded me that unexpected challenges are inevitable and that they can serve as learning opportunities instead of being perceived as failures. The speaker spoke about the importance of recording “detours” as they happen and how documentation can play a key part in reflection on lessons learned. She also discussed the value of using documentation to close “open loops” when offboarding from a project, ensuring that future staff can build on past work rather than unknowingly duplicating it. It was a practical reminder that documenting failure isn’t about dwelling on mistakes; it’s about giving the next person a clearer path forward.

Taken together, these sessions reinforced that documentation is more than a checklist. It can be a form of care—not only for colleagues and users who will later take on or inherit the work, but also for the library systems that depend on it. Creating collaborative documentation is an often overlooked and undervalued core competency, yet it is fundamental to both project and organizational success. I left the Forum with a renewed commitment to integrating these approaches into my own knowledge-management practices and to advocating for clearer, more collaborative documentation across the teams I work with.

The post Fellow Reflection: Emily Woehrle appeared first on DLF.

2026-01-14: Reflections on the Teaching Experience / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University



In the fall of 2025, I was presented with the exciting opportunity to teach CS 433/533: Web Security at Old Dominion University (ODU). This course was designed and previously taught by Dr. Michael L. Nelson. Having experienced this class firsthand as a student, I was thrilled to transition into the role of an instructor, bringing my unique perspective to the curriculum. Through this blog, I aim to share my journey and insights from teaching this course, with the hope that my experiences will serve as a useful resource for fellow colleagues and those venturing into teaching for the first time. 


The goal of this course was to review common web security vulnerabilities and exploits, along with the defenses designed to counter them. Students explored topics such as browser security models, web application vulnerabilities, and various attack and defense mechanisms including injection, denial-of-service, cross-site scripting, and more. Alongside theoretical knowledge, students also gained hands-on experience with technologies like Git and GitHub, DOM and JavaScript, the command line interface (CLI), and Node.js.


Before the class commenced, one of the primary tasks involved setting up the course online where students could easily access materials and engage in communication. This platform is crucial, as it serves as the central point for sharing the syllabus, detailing office hours, distributing course materials, and facilitating ongoing dialogue with students. This course utilized GitHub primarily for assignments and sharing resources. While Google Groups were previously utilized effectively for communication, I transitioned to Canvas, the official learning management system of ODU. This decision leveraged a familiar environment, minimizing the learning curve for students and streamlining course interactions.


The first task on Canvas was crafting an updated syllabus that clearly conveyed the course's essence and objectives. It also laid out prerequisites, grading criteria, and ODU's plagiarism policies. It provided students with a clear roadmap of what to expect, including schedules and exam details. Additionally, I strongly recommend implementing a weekly Summary Schedule, an approach inspired by Dr. Michele Weigle's courses. This schedule detailed the topics to be discussed each week, listed any assignments for that week, and their respective due dates. This approach not only helped students gain a clear overview of the entire course, but it also helped me keep track of everything and stay on schedule. It also facilitated timely communication, as it reminded me when to send out weekly announcements, release assigned tasks, and mark due dates to ensure efficient grading.


One of the highlights of this class was the weekly discussion forum, which significantly enhanced communication among students. Initially, Dr. Nelson had developed an activity where students would retweet tweets related to web security weekly on Twitter/X, followed by in-class discussions. This approach was an excellent way for students to learn about current trends and major news in web security. However, as the class was asynchronous, I needed a different method to maintain this engagement. I utilized Canvas to create weekly discussion forums where students could earn points by sharing the latest stories or news related to web security. Each week, I observed students sharing intriguing news and actively reacting to and discussing each other's posts. This method was not only enjoyable, but it also fostered a collaborative learning environment, allowing us all to discover new information together each week.


Canvas Discussion Forum for students to post weekly updates on web security news


Next, it's essential to establish a communication channel with students. I used Canvas Announcements to send weekly messages that informed students about the week's focus, the materials they would need, links to lectures and slides, and any updates regarding assignments. If you have prepared materials in advance, Canvas allows you to schedule these announcements to be released at a later date. This approach ensures timely and organized communication, keeping students informed and engaged throughout the course.


As the saying goes, "All fingers are not equal," and this holds true for students as well. Each student learns at their own pace: some grasp concepts quickly, while others need a bit more time. I learned that it's crucial to offer flexibility and understanding to accommodate these differences. When I first noticed that some students were scarcely participating in class, I realized I needed to take proactive steps. Reaching out to struggling students via email, acknowledging their challenges, and inviting them to connect if they needed support proved to be an effective strategy. This simple gesture often served as an icebreaker, leading to increased attendance at office hours and more frequent email communication. I also discovered that many students in my class preferred office hours after standard working hours due to their job commitments. In response, I adjusted my regular office hours to accommodate students' working schedules, moving them to after 5 PM. Timely grading and constructive feedback also proved key to encouraging student development. Prompt feedback allowed students to reflect on and improve their work over the semester. Additionally, offering extra credit opportunities gave students the chance to boost their grades and reinforce their understanding.


Reflecting on my experience teaching this course, I've come to appreciate the dynamic nature of education and the profound impact that thoughtful instructional design can have on students' learning experiences. Transitioning from student to instructor offered me a unique perspective, allowing me to tailor the course to address the diverse needs and learning styles of my students. Developing online resources, a detailed syllabus, and an interactive discussion forum highlighted the importance of clear communication. Building a supportive environment requires flexibility and empathy, acknowledging each student's unique challenges. Adjusting office hours and maintaining open communication ensured students felt supported, while consistent feedback and extra credit fostered growth and improvement. Overall, I had immense fun connecting with the students, understanding their perspectives, and learning alongside them. The collaborative learning atmosphere enriched not just their educational journey but also my teaching experience, reminding me how rewarding it can be to explore new ideas together.


I extend my heartfelt gratitude to Dr. Michael L. Nelson, Dr. Michele C. Weigle, and Dr. Steven J. Zeil for providing me with this invaluable experience. This course greatly benefited from Dr. Nelson's materials. I also appreciate Dr. Nelson and Dr. Weigle for their willingness to address any questions I had.


- Kritika Garg (@kritika_garg)



Fellow Reflection: Dorian McIntush / Digital Library Federation

This post was written by Dorian McIntush, who attended the 2025 DLF Forum as an Emerging Professionals Fellow. The views and opinions expressed in this blog post are solely those of the author and do not necessarily reflect the official policy or position of the Digital Library Federation or CLIR. 2025 Emerging Professionals Fellowships were supported by a grant from MetaArchive.

Dorian McIntush is the Open Scholarship and Data Resident Librarian at Washington and Lee University, where he supports faculty and students with digital research and open knowledge initiatives. He has a commitment to creating equitable access to knowledge and is particularly interested in exploring the environmental impact of digital technologies and open scholarship models that prioritize accessibility and long-term sustainability. Beyond his professional work, he enjoys tromping through Virginia’s hiking trails with his dog, taking on new knitting projects, and cooking interesting recipes for dinner.


When I stepped into my current role of Data and Open Scholarship Resident Librarian at Washington and Lee University in July of this year, I was also stepping for the first time into the world of academic libraries. I focused on public libraries during my MLIS, and after graduation I worked at the DC Public Library. My new academic librarian position was also brand new at my institution and was constructed to be a librarian residency. This meant that I would have a lot of freedom to grow into and shape the role, but also no real history to use as a support and learn from. This was simultaneously a gift and a little daunting.

Coming from public libraries, I had grown used to thinking about access in very practical, immediate terms. Who has a library card? Who can physically get to our building? What barriers keep people from the resources they need? But the sessions at DLF pushed me to think about access in ways that felt more expansive.

Amber Dierking’s presentation on the Queer Liberation Library was a particular highlight for me. I’d already been a user and huge fan of QLL, but hearing Dierking talk about the work behind it reinforced everything I loved about their approach. QLL didn’t reinvent the wheel. They focused on using existing tools, keeping it simple, making it free. As someone building a role from scratch, the creative pragmatism of QLL felt like a blueprint I could make use of.

I was also drawn to Mariam Ismail’s presentation on the 23/54 Project. The work of preserving a community quilt through 3D scanning and building an interactive digital exhibit felt like a perfect example of what digital humanities could be at its best: deeply rooted in community, respectful of material culture, and genuinely expanding access rather than just digitizing for digitization’s sake. It made me think about the special collections and archives at my own institution and how we might engage descendant communities and students in similar ways.

The Data Advocacy for All Toolkit presentation tied these threads together for me in a way I didn’t expect. The team was talking about who gets to tell stories with data, who gets left out of those stories, and how we can teach people to use data ethically for social change. This toolkit offered a framework that felt aligned with my public library values, one that’s accessible, focused on equity, and designed to empower data users and learners.

DLF gave me permission to think big while starting small. I’m returning to W&L with a clearer sense of what this residency could become, not a replica of someone else’s role, but something shaped by the communities I serve and the values I bring from public libraries into this new academic space.

The post Fellow Reflection: Dorian McIntush appeared first on DLF.

Evaluation / Ed Summers

Hopefully this isn’t perceived as me caving, but I’m trying to redirect my ire at the spread of genAI into learning how to evaluate genAI, especially when comparing it to existing non-genAI systems, but also between different genAI solutions (models, etc).

I don’t consider myself a doomer or a boomer, and see genAI as normal technology that needs to be evaluated as a technology. I know this is a broad area that overlaps somewhat with how the models are themselves built (benchmarking), but if you have recommendations please let me know?

2026-01-12: Eight WSDL Classes Offered for Spring 2026 / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

https://xkcd.com/329/

Eight courses from the Web Science and Digital Libraries (WS-DL) Group will be offered for Spring 2026.  The classes will be a mixture of on-line synchronous, asynchronous, and f2f. 

  • CS 418/518 Web Programming, Ms. Nasreen Muhammad Arif, Mondays & Wednesdays 3:00-4:15 
    Topics: MySQL, React, Express.js, NodeJS, MVC (Model-View-Control), Search Engines, GitHub  
  • CS 431/531 Web Server Design, Mr. David Calano, Mondays, Wednesdays, & Fridays, 10:00-10:50 
    Topics: HTTP, REST (Representational State Transfer), HATEOAS   
  • CS 432/532 Web Science, Ms. Nasreen Muhammad Arif, asynchronous  
    Topics: Python, R, D3, ML, and IR.  
  • CS 620 Intro to Data Science & Analytics, Dr. Bhanuka Mahanama, Web, Tuesdays & Thursdays 6:00-7:15 and asynchronous (8 week session starting in October)
    Topics: Python, Pandas, NumPy, NoSQL, Data Wrangling, ML, Colab.
  • CS 625 Data Visualization, Dr. Bhanuka Mahanama, Tuesdays & Thursdays 1:30-2:45
    Topics: Tableau, Python Seaborn, Matplotlib, Vega-Lite, Observable, Markdown, OpenRefine, data cleaning, visual perception, visualization design, exploratory data analysis, visual storytelling 
  • CS 732/832 Human Computer Interaction, Dr. Bhanuka Mahanama, Tuesdays 4:30-7:10 
    Topics: Cognitive and social phenomena surrounding human use of computers.  
  • CS 733/833 Natural Language Processing, Dr. Vikas Ashok, asynchronous 
    Topics: Language Models, Parsing, Word Embedding, Machine Translation, Text Simplification, Text Summarization, Question Answering
  • CS 734/834 Information Retrieval, Dr. Santosh Nukavarapu, Tuesdays 4:30-7:10 
    Topics: crawling, ranking, query processing, retrieval models, evaluation, clustering, machine learning. 
While not strictly WSDL courses, Dr. Michael L. Nelson will be teaching CS800 (Mondays, 4:30-7:10), and Himarsha Jayanetti will be teaching CS 450/550 (Mondays & Wednesdays, 3:00-4:15).  Dr. Jian Wu is on sabbatical and will not be teaching in the Spring, Dr. Sampath Jayarathna is now the Assistant Dean for the Perry Honors College and will not be teaching in the Spring, and Dr. Michele C. Weigle is now an Associate Dean and will have few, if any, classes going forward.  

The Fall 2026 semester is still undecided, but will likely be similar to the previous Fall semesters. Previous course offerings: F25, S25, F24, S24, F23, S23, F22, S22, F21, S21, F20, S20, F19, S19, and F18.


--Michael


🇳🇬 Open Data Day 2025 in Owerri: Leveraging Open Data for Child Advocacy / Open Knowledge Foundation

This text, part of the #ODDStories series, tells a story of Open Data Day‘s grassroots impact directly from the community’s voices. The event ‘Leveraging Open Data for Child Advocacy in a Polycrisis Context’ was successfully held on 6th March 2025 in Imo State, bringing together child rights, teachers, advocates, policymakers, data analysts, church leaders and...

The post 🇳🇬 Open Data Day 2025 in Owerri: Leveraging Open Data for Child Advocacy first appeared on Open Knowledge Blog.

🇧🇩 Open Data Day 2025 in Sunamganj: Harnessing Open Data for Flood Preparedness / Open Knowledge Foundation

This text, part of the #ODDStories series, tells a story of Open Data Day‘s grassroots impact directly from the community’s voices. On March 4, 2025, the YouthMappers Chapter at the Institute of Remote Sensing and GIS (IRS), Jahangirnagar University, proudly celebrated Open Data Day 2025 with an impactful event titled “Mapping Resilience: Harnessing Open Data...

The post 🇧🇩 Open Data Day 2025 in Sunamganj: Harnessing Open Data for Flood Preparedness first appeared on Open Knowledge Blog.

Reproducible Notebooks / Ed Summers

I learned about Marimo a few months ago from this podcast episode and have been meaning to try it out. As a long time user of Jupyter notebooks I was interested to hear how Marimo was built (in part) to help solve the problem of reproducibility in data science and research software (Pimentel, Murta, Braganholo, & Freire, 2019). This is a problem I have had a lot of experience with, especially when sharing Jupyter notebooks with others.

The key thing that Marimo brings to the Python notebook to improve reproducibility is reactive execution. Marimo uses the Python AST to remember what cells depend on other cells, and when a change in one requires the execution of another, it will go and update it for you. Because of how they are created and edited, it’s very common for Jupyter notebooks to get into an inconsistent state because of the order in which cells are executed. This problem goes away with Marimo.
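To make the reactive model concrete, here is a minimal sketch of what a marimo notebook file looks like on disk (the cell contents are my own illustration; marimo normally generates and maintains this file for you):

    # A minimal sketch of a marimo notebook file. It is plain Python: each cell
    # is a function, and the variables a cell takes and returns give marimo its
    # dependency graph, so changing `threshold` re-runs the cell that uses it.

    import marimo

    app = marimo.App()

    @app.cell
    def _():
        threshold = 10
        return (threshold,)

    @app.cell
    def _(threshold):
        values = [3, 8, 12, 20]
        above = [v for v in values if v > threshold]
        above   # the last expression is rendered as the cell's output
        return (above, values)

    if __name__ == "__main__":
        app.run()

Because the notebook is an ordinary Python file, this same structure is what makes the git-friendly diffs and command-line runs mentioned in the list below work so naturally.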

But, this aside, I thought I’d mention some somewhat superficial things that I immediately liked about Marimo…bearing in mind I’ve only been using it for one day.

  • Pandas dataframes appear as nicely formatted tables that can be easily scrolled horizontally, without truncation of values, or elided columns. I’ve gotten this to work in Jupyter notebooks in the past, but it always requires some fiddling it seems, and Marimo does it out of the box.
  • Table columns are sortable, filterable and can be summarized.
  • You can page through large tables.
  • You can easily download tables.
  • You can commit your notebook to a git repository, and diffs in pull requests make sense.
  • You can easily run the notebook from the command line.
  • You can embed tests in your notebooks, and run them separately.
  • Built-in basic charts (pie, bar, line, etc.) which you can view source for and hand craft if you want (seems to use Altair).

2026-01-12: ODU CS 2025 Trick-or-Research Event Recap / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

 

The Department of Computer Science (CS) at Old Dominion University (ODU) hosted its annual Trick-or-Research event on October 31, 2025, blending academic exploration with Halloween-inspired creativity. Check out our previous Trick-or-Research blog posts for 2021, 2022, 2023, and 2024. Building on the success of previous years, this event brought together faculty, staff, and students to showcase the department’s cutting-edge research and encourage new collaborations. Designed especially to introduce undergraduate students to the wide range of research opportunities within the department, Trick-or-Research featured interactive lab tours, engaging demonstrations, and opportunities to network with professors and join research groups.

🎃 Trick or Research! | ODU CS Lab Tours 2025 👩‍💻👻 Tour CS labs, meet faculty & grad students, and win prizes! — Computer Science Graduate Society - ODU (@CSGS_ODU) October 13, 2025

Participants explored CS research labs in person at The E.V. Williams Engineering & Computational Sciences Building (ECSB) and Dragas Hall, with a virtual option via Gather.town for remote attendees. To add to the festive atmosphere, attendees were encouraged to show their Halloween spirit by dressing in costume, with opportunities to win CS swag and a special prize for the best costume. The combination of hands-on research experiences, creative expression, and community engagement made Trick-or-Research an event that was both academically enriching and fun. 🎃

Online @gather_town Trick-or-Research

— Sampath Jayarathna (@OpenMaze) October 31, 2025


Lab Visits

Our annual Trick-or-Research is happening today @oducs Come and say hi to our research groups. @WebSciDL @NirdsLab @ODUSCI @odu @CSGS_ODU
— Sampath Jayarathna (@OpenMaze) October 31, 2025

Web Science and Digital Libraries (WS-DL)

The Web Science and Digital Libraries (WS-DL) group, led by Dr. Michael Nelson and Dr. Michele Weigle, welcomed visitors interested in web-focused research and digital preservation. Lab members discussed ongoing projects through posters and informal conversations, answering questions about research directions and opportunities for student involvement. The group’s research spans web archiving, web science, social media analysis, digital preservation, human-computer interaction, accessibility, information visualization, natural language processing, machine learning, artificial intelligence, and scholarly data mining. Visitors also had the opportunity to learn more about CS graduate courses and pathways into research.

Lab for Applied Machine Learning and Natural Language Processing Systems (LAMP-SYS)

The LAMP-SYS lab, led by Dr. Jian Wu, shared its work in applied machine learning and natural language processing. Visitors learned about ongoing research in areas such as entity extraction, mining electronic documents, and computational reproducibility in deep learning. Lab members discussed how machine learning and NLP techniques are applied to real-world problems using building blocks from information retrieval, digital libraries, and scholarly big data.

Neuro Information Retrieval and Data Science (NIRDS) Lab

The Neuro Information Retrieval and Data Science (NIRDS) lab, led by Dr. Sampath Jayarathna, engaged attendees with demonstrations and discussions centered on perception- and cognition-aware systems. Students shared examples of their research involving eye tracking, EEG, wearable sensors, and explained how these technologies are used to study user behavior and support real-world applications. The lab’s work emphasizes integrating psychophysiological signals with information retrieval and data science.

Accessible Computing Group

The Accessible Computing group, led by Dr. Vikas Ashok, introduced visitors to research focused on improving digital experiences for users with visual impairments. Lab members discussed intelligent interactive systems, accessible user interfaces, and image-to-speech technologies. Through demonstrations and conversations, the team highlighted how their work enhances web accessibility and human-computer interaction.

Bioinformatics Lab

The Bioinformatics Lab, led by Dr. Jing He, showcased research in computational biology and bioinformatics. Posters and discussions highlighted projects involving genomic data analysis, protein structure modeling, and 3D molecular imaging. Lab members also explained how machine learning techniques are applied to address challenges in medicine and health-related research.

High Performance Scientific Computing Team for Efficient Research Simulations (HiPSTERS)

The HiPSTERS group, led by Dr. Desh Ranjan and Dr. Mohammad Zubair, introduced students to interdisciplinary research in high-performance computing (HPC). Visitors learned about the group’s use of advanced mathematical methods and GPU programming to support large-scale simulations, including examples related to particle collider beam dynamics and scientific computing workflows.

Artificial Intelligence (AI) and Applications Research Group

The AI and Applications research group, led by Dr. Yaohang Li, shared ongoing work in artificial intelligence and machine learning. Through posters and discussions, visitors learned about projects involving machine learning-based physics event generation, particle production simulations, protein crystallization classification, financial data analysis, and generative models. The group connected with students interested in applying AI techniques across scientific and real-world domains.

Hands-On Lab

The Hands-On Lab, established by Ajay Gupta, presented research focused on building practical, end-to-end systems. Lab members discussed projects involving real-time monitoring using sensors and wearables, health-related data collection and analysis platforms, learning management portals, and mobile applications for medical and educational use. Visitors learned how these systems are designed, implemented, and evaluated in real-world settings.

Data Mining Lab

The Data Mining Lab, led by Dr. Lusi Li, shared research in data mining, machine learning, and optimization theory. Topics discussed included online machine learning, representation learning, transfer and multi-view learning, recommender systems, and explainable AI. Attendees explored how these techniques are applied to complex datasets across different application areas.

Internet Security Research Lab

The Internet Security Research Lab, led by Dr. Shuai Hao, presented research focused on networking and security. Lab members discussed measurement-driven and data-centric approaches to studying internet infrastructure, web security, privacy, and cybercrime. Visitors learned how empirical analysis and large-scale data studies are used to understand and address modern security challenges.

Wadduwage Lab

The Wadduwage Lab, led by Dr. Dushan N. Wadduwage, focuses on developing novel computational microscopy techniques that capture biological systems at their most information-rich representations while minimizing redundancy. Students learned how the lab integrates optics, machine learning, and signal processing to advance high-fidelity biomedical imaging and build trustworthy medical AI systems. Lab members discussed research in computational and differentiable microscopy, interpretable and reliable AI for medical decision-making, and advanced tracking algorithms that enable pinpoint-level object and particle tracking in microscopic environments. Through these conversations, visitors gained insight into how interdisciplinary methods are applied to solve real-world challenges in biomedical imaging.


Summary

The 2025 Trick-or-Research event once again highlighted the depth and diversity of research within ODU’s Computer Science department. By combining hands-on demonstrations, engaging conversations, and a festive hybrid format, the event provided students with valuable insight into research opportunities and pathways for involvement. Whether attending in person or virtually, participants left with a deeper understanding of the innovative work happening across CS. If you missed this year’s event, be sure to keep an eye out for the next Trick-or-Research! 🎃💻


-- Pasindu Thenahandi (@Psnd_Snklp) --


Fellow Reflection: Darcy Ruppert / Digital Library Federation

This post was written by Darcy Ruppert, who attended the 2025 DLF Forum as a Public Library Worker Fellow. The views and opinions expressed in this blog post are solely those of the author and do not necessarily reflect the official policy or position of the Digital Library Federation or CLIR. The 2025 Public Library Worker Fellowship was supported by Platinum sponsor AM Quartex.

Darcy Ruppert is an archivist and librarian living and working in the greater Seattle area. Since receiving their MLIS from the University of Washington in 2023, Darcy has worked in various museums, special collections, and community archives, specializing in the digital preservation of audiovisual collections. Their professional work has been defined by a commitment towards democratizing access to digital archives themselves and the tools of digital preservation. They are currently managing King County Library System’s Memory Lab, a Mellon Foundation-funded community oral history project with a mission to record, preserve, and share the stories of King County.


I am so grateful to have had the opportunity to attend my first DLF Forum as a 2025 Public Library Worker Fellow. In my current work as project manager for a new community oral history project based out of a large public library system, I often feel somewhat separate from the day-to-day of the rest of my organization. The work that I do connects to and is born out of the work of the library, but the unique nature of the program within my organization means that I am often working through problems and making decisions on my own. For this and many other reasons, it was refreshing to be surrounded by a community of my professional peers who are facing similar challenges and grappling with similar emergent questions, both practical and philosophical, within our field.

The first day’s opening plenary by Dr. Kay Coghill provided, for me, a grounding moment at the very beginning of the Forum. Dr. Coghill shared an entreaty for all of us, as digital library professionals and as human beings, to use our platforms, skills, and resources to mitigate harms to members of systemically marginalized groups in digital spaces. Their incisive talk made me think about my own positionality in digital spaces, and drove me to reflect on my own professional decisions and the cascading effects these (at times, seemingly small) decisions can have. In the work that we do, I think it’s easy to become complacent, to lose sight of the fact that we hold power to protect and, by the same token, harm, when we introduce new solutions without appropriate testing, community consultation, and evaluation. I think there’s a real danger in our field of focusing so much on integrating new technologies and providing “results” to our institutions that we may passively introduce real risks into the lives of our users and members of our greater community.

With Dr. Coghill’s talk framing the rest of the Forum for me, I think I appreciated the sessions I attended with a rejuvenated perspective towards care and stewardship. I thought about the race to adopt “industry standard” digital preservation tools, and what we lose when we fail to properly evaluate these tools for the unique needs of our organizations and user groups. At the University of Denver’s session on curating digital exhibits, I reflected on our ability as cultural heritage stewards to uplift and uncover marginalized voices rather than bend to dominant cultural narratives. At a session on AI-powered transcription, I considered the balance of risk and reward that is inherent to the world of LLMs. I’m thankful for all of the speakers who generously shared their work at the Forum, and for the opportunity to reflect on these important issues alongside my peers.

The post Fellow Reflection: Darcy Ruppert appeared first on DLF.

Author Interview: Kelly Scarborough / LibraryThing (Thingology)

Kelly Scarborough

LibraryThing is pleased to sit down this month with Kelly Scarborough, who makes her authorial debut with Butterfly Games, a historical novel set in the Swedish royal court during the early 19th century. After working for two decades as a law firm partner and white-collar prosecutor, Scarborough returned to her interest in historical fiction and her love of writing, determined to tell stories about fascinating women who lived through challenging times. Scarborough sat down with Abigail to discuss her new book, due out later this month from She Writes Press.

Butterfly Games is based on a true story, and its heroine, Jacquette Gyldenstolpe, on a real person. Tell us a little bit about that story and how you discovered it. What made you feel that it needed to be retold?

Like so many turning points in my life, Butterfly Games began with a book. As a teenager, I fell in love with Désirée, Annemarie Selinko’s novel about Désirée Clary—the silk merchant’s daughter who was once engaged to Napoleon and later became Queen of Sweden. I read it over and over, fascinated by how a woman could be swept into history by forces she never chose.

Years later, during a difficult period in my life, that novel came back to me. I began researching Désirée’s descendants—the Bernadotte dynasty, which still reigns in Sweden today—and uncovered a world of political upheaval, fragile alliances, and private heartbreak. That’s when I stumbled across Jacquette Gyldenstolpe.

Jacquette appears in the historical record mostly as a scandal: a young countess who fell in love with Prince Oscar, the heir to the throne. But the more I read—letters, memoirs, court gossip—the more I realized how much of her story had been left untold. She wasn’t just a footnote in someone else’s rise to power. She was a young woman navigating impossible choices in a world where love could threaten a dynasty.

Once I found her, I couldn’t look away. I knew her story needed to be retold.

What kind of research did you need to do, while writing the book, and what were some of the most interesting things you learned in that process?

Can you see me smiling? I don’t think I’m capable of separating the research I needed to do from the research that simply called to me and took over my brain.

Over the course of several years, I spent more than eighty nights in Sweden, translated hundreds of handwritten letters, and built a chronology with more than five thousand entries to track who was where, with whom, and why. Jacquette’s world became a place I loved to inhabit. One day stands out above all others. I was granted special access to Finspång Castle, Jacquette’s childhood home—now a corporate headquarters, a place closed to the public. No photographs were allowed, so I took frantic notes on my phone as we walked through the women’s wing. In a sitting room, I noticed a small mother-of-pearl nécessaire—a sewing and writing box with tiny compartments for her most personal objects. It stopped me cold. My guide, a retired corporate executive who knew the house intimately, leaned in and whispered, “Jacquette’s.”

The box had been a gift from Jacquette’s husband, Carl Löwenhielm. That moment—imagining her hands opening it, choosing a needle or a quill knife—changed the direction of the book.

Suddenly, Jacquette wasn’t a scandal or a symbol. She was real.

Your book has been described as a good fit for admirers of Philippa Gregory and Allison Pataki. Did the work of these authors, or others, influence you when writing your story?

Absolutely—though in different ways. Philippa Gregory is a master of taking a story with a known, often tragic ending and making it feel suspenseful and intimate. I admire how she builds emotional momentum even when readers think they know what’s coming. Two of my favorites are The Kingmaker’s Daughter and her most recent novel, Boleyn Traitor.

Allison Pataki has also been influential, particularly in how she blends rigorous research with accessible storytelling. I love the smart, resourceful heroines she creates from women who otherwise might be lost to history. Her work reminds me that historical fiction can be immersive without being intimidating—and romantic without losing its seriousness. Both my book clubs loved Finding Margaret Fuller, and I did, too.

You’ve had a full career as a lawyer and prosecutor, before turning to writing. How has that work informed your writing and storytelling?

Don’t get me wrong, I had a lot to learn before writing a novel, but some of the things I loved about law proved useful for writing historical fiction. Law trained me to think in terms of evidence, motive, and connections. When you’re preparing a case, you assemble fragments—documents, testimony, inconsistencies—and shape them into a coherent narrative that persuades a jury.

Writing historical fiction isn’t so different. The facts matter deeply, but facts alone don’t tell a story. You have to decide what belongs at the center, what remains in the background, and where the emotional truth lives. My legal background also made me comfortable sitting with ambiguity. History is full of unanswered questions, and I don’t feel the need to resolve every one neatly. Sometimes what’s most compelling is what can’t be proven.

Tell us a little bit about your writing process. Do you have a particular routine—a time and place you like to write, a particular method? Do you plot your stories out ahead of time, or discover how they will unfold as you go along?

When the stars align, I retreat early in the day to the attic office of my nineteenth-century house in Connecticut, take my Shih Tzu upstairs with me, and leave the modern world behind. I wrote Butterfly Games in nine drafts. There was an outline, but I changed the plot in significant ways as I went along. For the sequel, I’m trying to be a little more disciplined. I started with an outline—but found myself getting too granular—so I switched to ninety old-fashioned index cards. Each card holds one scene: chapter number, date, setting, point-of-view character, and the scene’s pivot point. There’s barely room left for anything else, which forces clarity. I transcribed those cards into Scrivener, and now I’m writing. We’ll see how closely I stick to the plan.

What comes next? Are you working on any additional books?

Yes. Butterfly Games is the first novel in a planned series. The second book picks up after the events of the first and follows Jacquette and Oscar into a far more dangerous phase of their lives—when love has consequences, secrets carry weight, and survival requires choices that can’t be undone.

Tell us about your library. What’s on your own shelves?

My physical library is filled mostly with historical fiction, especially novels with complex, non-linear structures. I return again and again to Hamnet and The Marriage Portrait, as well as The Time Traveler’s Wife and Pure.

On a special shelf, I keep books connected to Jacquette’s world—like Désirée and The Queen’s Fortune—alongside more than a hundred antique Swedish memoirs and histories, many written by people who actually knew Jacquette.

And for bedtime? A Kindle packed with historical romance by Sarah MacLean, Julia Quinn, Tessa Dare, and Lisa Kleypas.

What have you been reading lately, and what would you recommend to other readers?

For lovers of royal historical fiction, Boleyn Traitor is a must-read. I was also lucky enough to read an advance copy of It Girl, which I loved.

My favorite read last year was Broken Country—a deeply emotional novel with one of those intricate narrative structures that stays with you. In fact, I want to read it again.

Weekly Bookmarks / Ed Summers

These are some things I’ve wandered across on the web this week.

🔖 The General Strike

A general strike is when working people refuse their labor until demands are met • Research shows we need 3.5% of the population, or 11 million Americans, to be successful • The Strike Card below tracks our progress so we all know when it’s time to strike •

🔖 A Metabolic Workspace

Your Second Brain has little somatic metadata. It’s disembodied text floating in a void, stripped of the rich contextual markers that would actually help you remember and use it. When you try to retrieve something from your Obsidian vault, you’re searching keywords. When you try to retrieve something from your actual brain, you might think “that thing I was reading when I was annoyed at that airport” and the whole cluster of associated memories lights up.

We’ve been treating digital notes like they’re interchangeable with mental notes, when they’re actually a much degraded format.

🔖 User Interface by Kent Beck

Here are the beginnings of a pattern language for user interface design. These patterns drive the initial phases of design. They key off of Story, a pattern from the early development language. The patterns are:

  1. User Decision
  2. Task
  3. Task Window
I’m indebted to the user interface pattern mailing list and to Ward Cunningham for the seeds for these patterns.

🔖 How to disable Gemini on Android, Gmail, Chrome, Photos, & Google apps. Opt out of AI tracking now!

Earlier this year, we warned you to turn off Gemini on Android because Google decided that its AI would have access to its users’ apps - even if they had previously turned tracking for Gemini Apps Activity off. So regardless of whether you told Google’s Gemini not to track you in the past, its AI tool might be able to run tasks like send WhatsApp messages, set timers, and even make calls on your Android device. Worryingly, Google’s AI invasion does not stop there. Now, Google has turned on settings like smart features, without your consent, and Gemini can scan your Gmail emails, Photos, Drive, and other apps. In this guide, we show you how to turn off Gemini on Android and all your Google apps – Gmail, Chrome, Photos, and more. Protect your privacy and your data from invasive AI tracking

🔖 /e/OS

/e/OS is an open-source mobile operating system paired with carefully selected applications. They form a privacy-enabled internal system for your smartphone. And it’s not just claims: open-source means auditable privacy. /e/OS has received academic recognition from researchers at the University of Edinburgh and Trinity College Dublin.

🔖 Digital Sufficiency in Data Centers: Studying the Impact of User Behaviors

The Information Technologies (IT) industry has an increasing carbon footprint (2.1-3.9% of global greenhouse gas emissions in 2020), incompatible with the rapid decarbonization needed to mitigate climate change. Data centers hold a significant share due to their electricity consumption amounting to 1% of the global electricity consumption in 2018. To reduce this footprint, research has mainly focused on energy efficiency measures and use of renewable energy. While these works are needed, they also convey the risk of rebound effects, i.e., a growth in demand as a result of the efficiency gains. For this reason, it appears essential to accompany them with sufficiency measures, i.e., a conscious use of these technologies with the aim to decrease the total energy and resource consumption. In this thesis, we introduce a model for data centers and their users. In the first part, we focus on direct users, interacting with the infrastructure by submitting jobs. We define five sufficiency behaviors they can adopt to reduce their stress on the infrastructure, namely Delay, Reconfig, Space Degrad, Time Degrad and Renounce. We characterize these behaviors through simulation on real-world inputs. The results allow us to classify them according to their energy saving potential, impact on scheduling metrics and effort required from users.

🔖 A data model for Git (and other docs updates)

Hello! This past fall, I decided to take some time to work on Git’s documentation. I’ve been thinking about working on open source docs for a long time – usually if I think the documentation for something could be improved, I’ll write a blog post or a zine or something. But this time I wondered: could I instead make a few improvements to the official documentation?
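
The post itself is about documentation work, but the data model it concerns is compact enough to show in a few lines. As a rough illustration (my own sketch, not code from the post), this is the standard way Git derives a blob’s object ID: it hashes a short header plus the file contents, so identical contents always get the same name.

    import hashlib

    def blob_id(content: bytes) -> str:
        # Git stores each object as "<type> <size>\0<content>" and names it by
        # the SHA-1 of those bytes; this mirrors `git hash-object` for blobs.
        header = b"blob " + str(len(content)).encode() + b"\0"
        return hashlib.sha1(header + content).hexdigest()

    print(blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad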

🔖 Zstandard Compression for WARC Files 1.0

This specification defines a Zstandard-based format for compressed WARC files, as an alternative to the GZIP-based format defined in WARC/1.1 Annex D.

In general, Zstandard can produce significantly smaller files than GZIP while also achieving faster compression and decompression. Zstandard also offers a much wider range of compression levels, ranging from extremely fast compression with a modest size reduction to extremely slow compression with a much better reduction. For files containing many small records, Zstandard dictionaries can be used to reduce file size even further, while still permitting random access to individual records.
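
To make the dictionary idea concrete, here is a minimal sketch using the python-zstandard package. It is not the .warc.zst record framing defined by the specification, just the general pattern it relies on: train a shared dictionary over many small records, compress each record independently so it can still be fetched on its own, and decompress with the same dictionary. The sample records are invented for illustration.

    import zstandard as zstd

    # Invented stand-ins for many small WARC-like records.
    records = [f"WARC-Record-ID: <urn:uuid:{i}>\r\npayload {i}\r\n".encode()
               for i in range(1000)]

    # Train a shared dictionary from sample records, then compress each record
    # independently so individual records remain randomly accessible.
    dictionary = zstd.train_dictionary(16 * 1024, records)
    cctx = zstd.ZstdCompressor(level=19, dict_data=dictionary)
    compressed = [cctx.compress(r) for r in records]

    # Decompression requires the same dictionary.
    dctx = zstd.ZstdDecompressor(dict_data=dictionary)
    assert dctx.decompress(compressed[0]) == records[0]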

🔖 The Long Heat: Climate Politics When It’s Too Late

A scathing critique of proposals to geoengineer our way out of climate disaster, by the bestselling authors of Overshoot

The world is crossing the 1.5°C global warming limit, perhaps exceeding 2°C soon after. What is to be done when these boundaries, set by the Paris Agreement, have been passed? In the overshoot era, schemes proliferate for muscular adaptation or for new technologies to turn the heat down at a later date by removing CO2 from the air or blocking sunlight. Such technologies are by no means safe; they come with immense risks and provide an excuse for those who would prefer to avoid limiting emissions in the present. But do they also hold out some potential? Can the catastrophe be reversed, masked or simply adapted to once it is a fact? Or will any such roundabout measures simply make things worse?

The Long Heat maps the new front lines in the struggle for a liveable planet and insists on the climate revolution long overdue. In the end, no technology can absolve us of responsibility…

🔖 Edible Perennials for Community Preparedness

Perennials are plants that have a life cycle of at least three years—as opposed to annuals (plants that have to be grown from seed anew each year because they have a single-year lifecycle) and biennials (plants that spend a year growing and then another year producing seed and dying—so they have a two-year life cycle). Some perennial plants are herbaceous (meaning they have tender stalks that may die back in the cold season, then grow again from the root) and others are woody (meaning they have woody stalks that tend to continue growing above soil year after year). In this zine we are mainly discussing edible herbaceous perennials, but we will mention a few woody plants as well.

🔖 DeltaChat

Secure, cross-platform, decentralized super-app messenger.

🔖 Ocean Biodiversity Listening Project

In this project, we attempt to establish a large-scale soundscape monitoring network and characterize ecosystem-specific soundscapes by separating sounds from geophonic, biological, and anthropogenic sources. Based on information retrieval techniques, the acoustic data are transformed into metrics that describe the quality of acoustic habitat, the behavior of soniferous animals, and noise-generating activities. The outcomes will allow managers and stakeholders to use soundscape information to monitor the trends of marine ecosystems and perform data-driven decision making in conservation management.

🔖 MTV Rewind

An indexed collection of MTV videos on YouTube.

🔖 « Il neige. » Alors faites de vos culs des luges.

« Il neige. » (“It is snowing.”) Independently of all that and of all this self, snow has always been a conversation piece; an element of our shared conversations, because even in the days when it was not rare, even in the days when it was expected, it came and arrived suddenly, most often at night or early in the morning, behind the windows of our schools and of our eyes still dazed with sleep. It arrived and the whole landscape changed. Snow is a mutation of the landscape and a hook for nearly all of our senses. One of the rare mutations of nature that we are given to observe in its entirety.

🔖 Sorry, Baby (film)

Sorry, Baby is a 2025 American black comedy-drama film written and directed by Eva Victor, in their directing debut. It stars Victor, Naomi Ackie, Louis Cancelmi, Kelly McCormack, Lucas Hedges, and John Carroll Lynch. The film follows a reclusive college literature professor struggling with depression following a sexual assault.

The film had its world premiere at the 2025 Sundance Film Festival on January 27, where it won the Waldo Salt Screenwriting Award and received widespread critical acclaim. It was released by A24 in selected theaters in the United States on June 27, before expanding nationwide on July 25. The film grossed $3.3 million worldwide against a production budget of nearly $1.5 million. For their acting, Victor was nominated for Best Actress in a Motion Picture – Drama at the 83rd Golden Globe Awards. For their filmmaking, Victor won Best Directorial Debut from the National Board of Review and was nominated for Best Original Screenplay at the 31st Critics’ Choice Awards.

🔖 Deep Learning With Python

Deep Learning with Python, Third Edition makes the concepts behind deep learning and generative AI understandable and approachable. This complete rewrite of the bestselling original includes fresh chapters on transformers, building your own GPT-like LLM, and generating images with diffusion models. Each chapter introduces practical projects and code examples that build your understanding of deep learning, layer by layer.

The third edition is available here for anyone to read online, free of charge.

🔖 The future of htmx

htmx is the New jQuery

Now, that’s a ridiculous (and arrogant) statement to make, of course, but it is an ideal that we on the htmx team are striving for.

In particular, we want to emulate these technical characteristics of jQuery that make it such a low-cost, high-value addition to the toolkits of web developers. Alex has discussed “Building The 100 Year Web Service” and we want htmx to be a useful tool for exactly that use case.

Websites that are built with jQuery stay online for a very long time, and websites built with htmx should be capable of the same (or better).

Going forward, htmx will be developed with its existing users in mind.

If you are an existing user of htmx—or are thinking about becoming one—here’s what that means

🔖 The Case for Blogging in the Ruins

Diderot built the Encyclopédie because he believed that organizing knowledge properly could change how people thought. He spent two decades on it. He went broke. He watched collaborators quit and authorities try to destroy his work. He kept going because the infrastructure mattered, because how we structure the presentation of ideas affects the ideas themselves.

We’re not going to get a better internet by waiting for platforms to become less extractive. We build it by building it. By maintaining our own spaces, linking to each other, creating the interconnected web of independent sites that the blogosphere once was and could be again.

🔖 Building software to last forever

This is a great question, and one I have put a lot of thought into, even going so far as to put “Built to last forever” on the landing page. While drafting a lengthy reply I realised that I’d never articulated the design philosophy of Bear to anyone bar friends.

So, without further ado, here are the choices I made in designing and building Bear with regards to longevity…

🔖 Triptych Proposals

Triptych is three simple proposals that make HTML much more expressive in how it can make and handle network requests.

If you are a practical person, you could say it brings the best of htmx (and other attribute-based page replacement libraries, like turbo and unpoly) to HTML. For the more theoretically-inclined, it completes HTML’s ability to do Representational State Transfer (REST) by making it a sufficient self-describing representation for a much wider variety of problem spaces.

🔖 Building a fast website with the MASH stack in Rust

I’m building Scour, a personalized content feed that sifts through noisy feeds like Hacker News Newest, subreddits, and blogs to find great content for you. It works pretty well – and it’s fast. Scour is written in Rust and if you’re building a website or service in Rust, you should consider using this “stack”.

After evaluating various frameworks and libraries, I settled on a couple of key ones and then discovered that someone had written it up as a stack. Shantanu Mishra described the same set of libraries I landed on as the “mash 🥔 stack” and gave it the tagline “as simple as potatoes”. This stack is fast and nice to work with, so I wanted to write up my experience building with it to help spread the word.

TL;DR: The stack is made up of Maud, Axum, SQLx, and HTMX and, if you want, you can skip down to where I talk about synergies between these libraries. (Also, Scour is free to use and I’d love it if you tried it out and posted feedback on the suggestions board!)

🔖 Disrupting the first reported AI-orchestrated cyber espionage campaign

We have developed sophisticated safety and security measures to prevent the misuse of our AI models. While these measures are generally effective, cybercriminals and other malicious actors continually attempt to find ways around them. This report details a recent threat campaign we identified and disrupted, along with the steps we’ve taken to detect and counter this type of abuse. This represents the work of Threat Intelligence: a dedicated team at Anthropic that investigates real world cases of misuse and works within our Safeguards organization to improve our defenses against such cases.

🔖 The XY Problem

The XY problem is asking about your attempted solution rather than your actual problem. This leads to enormous amounts of wasted time and energy, both on the part of people asking for help, and on the part of those providing help.

🔖 feedtoot

feedtoot is a python script that downloads an RSS or Atom feed, processes the new entries and posts a summary of each as a toot on Mastodon.
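
For a sense of what such a script does, here is a minimal sketch of the same idea using the feedparser and Mastodon.py libraries. It is not feedtoot’s actual code, and the feed URL, instance, and token are placeholders; a real script would also persist which entries it has already posted between runs.

    import feedparser
    from mastodon import Mastodon

    # Placeholder feed and credentials.
    feed = feedparser.parse("https://example.org/blog/feed.xml")
    masto = Mastodon(access_token="YOUR_TOKEN", api_base_url="https://example.social")

    already_posted = set()  # a real script would load/save this between runs
    for entry in feed.entries:
        uid = entry.get("id", entry.link)
        if uid not in already_posted:
            masto.toot(f"{entry.title} {entry.link}")  # post a short summary as a toot
            already_posted.add(uid)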

🔖 Getting off US tech: a guide

The United States has become the world’s biggest bully, threatening any country that doesn’t do as it demands with tariffs, and its tech companies are taking full advantage by flexing their muscle and trying to avoid effective regulation around the world. The drawbacks of our dependence on US tech companies have become more obvious with every passing year, but now there can hardly be any denying that where we can pry ourselves away from them, we should make the effort to do so.

🔖 Cloudbreak: Your own cloud in 6 hours

Break up with the Big Tech cloud in this unique 1 x 6hr live training intensive where you are wholly supported in the process of building up your own sovereign, self-hosted and private infrastructure – free from AI, shady data-sharing agreements and prying eyes. Running atop renewable energy on a European Internet backbone, the cloud platform is both tightly locked-down and easy to maintain

AI in the Library, round two (vs ChatGPT and the zeitgeist) / Andromeda Yelton

Once upon a time, I taught a course on AI in the Library in an iSchool. (Check out the reading list and syllabus, which are relevant context for this post.) I haven’t taught it for a while — I got busy with other things, and then ChatGPT dropped and immediately rendered my syllabus obsolete and I wasn’t up for overhauling it.

At least, that was why I wasn’t teaching it in 2022-2024. In 2025 and 2026, there are some pretty different reasons not to be adjunct faculty, particularly in a public university, if you don’t have to be. Like an enormous Project-2025-based, federally directed but decentralized system to dictate syllabi, destroy academic freedom, and fire professors who don’t toe the line. (Examples unfortunately abound, such as University of Oklahoma, Texas A&M, San José State University — the last of these being the university where I taught.)

And in the midst of all this, here’s the (excellent) Texas Tribune reporting on ChatGPT prompts used at Texas A&M to flag course content that might take a perspective on race and gender. And so, with a rising sense of nausea, I may as well see how my syllabus would fare in this brave new world.

First (and startlingly), the prompts only cover very high-level course metadata: course title, description, and learning objectives.1 The week-by-week descriptions and reading list, or indeed the syllabus as a whole, aren’t covered, so these prompts don’t reach week 2’s case study of an AI metadata generation prototype in a Black photographer’s archive, or week 3 on the broad and heavily researched field of (ironically) algorithmic bias.

And so, the results:

Course Title. Unsurprisingly, “AI in the Library” raises no issues for the state of Texas (though of course chatgpt takes a great many words to be unconcerned about it).

Course Description.

This course covers what artificial intelligence is and critical analysis of AI systems, in the context of real-world library and archival AI applications. Programming skills not required.

To my surprise, chatGPT is pretty okay with this:

Review of value-neutrality

  • Advocacy or ideological language:
    None identified. The description does not include prescriptive claims, affective outcomes, or ideological commitments.
  • Potential neutrality concerns:
    The phrase “critical analysis of AI systems” is standard academic language and does not, on its own, imply advocacy. However, “critical” can be interpreted broadly; specifying analytical methods or perspectives can improve clarity without reducing neutrality.
  • Overall assessment:
    The description is value-neutral, concise, and appropriate, though it is shorter than the requested 75–125 words and could benefit from greater methodological detail.

I was sure that “critical analysis” would get more pushback (how could we possibly want that in higher education in 2026??), but I guess mostly I’m supposed to be more unnecessarily wordy, and I could keep waving this IYKYK flag.

Learning Outcomes.

Upon successful completion of the course, students will be able to:

  1. Understand and explain the basics of AI: both its underlying principles and common machine learning techniques.
  2. Discuss realistic ways that AI can be a part of library services.
  3. Critically analyze potential pitfalls of AI systems, including the role of their underlying data sets and their ramifications in society.

Y’all. Despite some honestly fair ways chatgpt pointed out that these verbs are squishy and might be replaced with ones that are sharper and/or lend themselves more obviously to assessment, I absolutely slipped “critically analyze…their ramifications in society” right past it:

“Critically analyze potential pitfalls…” — This is largely value-neutral. The phrase “ramifications in society” is acceptable, though specifying analytical contexts (e.g., professional or institutional settings) would improve precision.

Again, let me reiterate that I have a case study on a Black archive. I have a reading list with, inter alia, Meredith Broussard and Catherine D’Ignazio and Safiya Noble and Virginia Eubanks. I have a reading list with Tay (aka “how to make a chatbot unbelievably racist and anti-Semitic in less than 24 hours”) and predictive policing and de-biasing word vectors and feminism and Soylent-Green-based limitations of a major AI ethical framework.

Pretty sure any antagonistic human reading learning objective number 3 there would know what I was getting at, but I guess chatgpt still can’t replace all the jobs.

  1. There’s also a prompt for “course justification”, but I was not asked to write one of these for my syllabus. The closest it gets is a “core competencies” section aligning the course with learning objectives for the overall iSchool program. However, these objectives are governed by the iSchool, not the individual professors. Therefore I’ll be leaving the course justification prompt aside. ↩

Scaling research support at Monash University Library / HangingTogether

“Ghost kitchens” are pop-up restaurants geared entirely toward food delivery. They typically rent space in traditional restaurants to prepare food, take orders online, and deliver them to the doorstep via delivery apps like DoorDash or Uber Eats. Ghost kitchens proliferated during the COVID pandemic, which for a time practically extinguished dine-in food service. Restaurants of all descriptions needed to restructure their operations to scale up food delivery as their main service model; ghost kitchens were the extreme example, with the entire service model built around delivery.

The story of ghost kitchens is one of a specific business sector—restaurants—retooling traditional operational structures and service models to meet changing conditions in the marketplace. Gary Pearce, Director, Academic Services in the Monash University Library, touched on a similar theme in a recent OCLC Research Library Partnership webinar, describing how the Library reimagined its operational and service models to scale up research support capacities and better address institutional needs and priorities. As with ghost kitchens, Monash sought to reimagine its services in response to changing imperatives—specifically, the need to deliver research support at scale, within the confines of prevailing budgetary limitations. This situation will surely resonate with other research libraries, and there is much to be learned from Monash’s experiences and innovative solutions.

Retooling operational structures and service models

Academic Services is one of three portfolios at Monash University Library. To address the need to scale research support services and align more closely with stakeholder needs, Academic Services shifted from a traditional liaison librarian model organized on disciplinary lines to a functional specialization approach based on library expertise. This change moved away from multiple teams providing duplicate services to specific disciplines, in favor of agile, project-based service teams that work across disciplines.

A key aspect of Monash’s approach is the creation of a new Library Business Partner role, whose chief responsibility is strategic relationship management with senior leadership in a specific academic area. The Library Business Partner serves as a conduit for two-way communication between the Library and its academic stakeholders: on the one hand, communicating library messaging to the academic unit, and on the other, gathering intelligence and feedback on the unit’s needs and mobilizing capacity within the Library’s service teams to address them.

Pearce provided a rich description of how this retooling of operational structures and service models was conceived and implemented. Here are a few of the themes that emerged from his discussion:

  • Acknowledging relationship management as a dedicated role: A key innovation was the creation of the Library Business Partner role to manage outreach and engagement with academic units. The Library Business Partner represents the entire Library and therefore can provide a comprehensive view of Library capacities, as well as expedite responses to stakeholder needs. Separating relationship management from service delivery facilitated a shift from a reactive, transactional model to a more proactive, two-way partnership.
  • Emphasizing a culture of agility: Building a service model that was both scalable and responsive led Monash to adopt an agile approach. Academic Services implemented a matrix organizational structure in which staff have fixed reporting lines with flexible membership across multiple service teams—including research support. Staff have the option to rotate across teams to deepen expertise and experience. Work is divided between “business as usual” work and project work, the latter of which can be scaled up or down as needs and resource availability dictate. While this new operational structure could pose challenges to long-standing professional identities tied to traditional service models, it also opens up new pathways to leverage existing areas of expertise and develop new ones.
  • Close attention to change management: The new operational structures were a significant departure from previous models. In recognition of this, the development and implementation processes were characterized by consultation, transparency, and communication, including a series of consultative visits to peer institutions facing similar challenges in adapting service development and delivery; presentations to stakeholder groups; regular updates to staff, along with clear milestones and timelines; open channels for questions and feedback; and planning for professional development needs.     

These are just some of the key themes that provide the foundation for Monash Library’s story of transformation, scalability, and responsiveness.

Managing implementation is crucial

Pearce’s presentation elicited many questions from the audience in attendance. Collectively, the questions reflected a keen interest in the implementation aspect of the shift to new operational structures and service models, touching on issues like:

  • Stakeholder response and buy-in: how researchers and staff reacted to service model changes; channels for communication and feedback
  • Staffing implications: impact of restructuring on staffing counts; work allocations between “business as usual work” and project work; cross-training opportunities
  • Strategic relationship management: interest in the details of how the new Library Business Partner role works in practice

The audience’s interest in these topics highlights that a shift from a traditional/subject-focused service model to a functional/specialization model requires attention to both structural innovation and the staffing and stakeholder reactions to significant organizational change.

Additional reading

Pearce’s webinar intersects with several OCLC Research studies that complement some of the themes emerging from the service model transformation experience. First, check out our work on social interoperability, which we define as the creation and maintenance of working relationships between individuals and organizational units within an institution. Our report describes strategies and tactics that can help strengthen social interoperability skills—an essential element of roles like Monash’s Library Business Partner. In addition, OCLC Research’s forthcoming work on the Library Beyond the Library—an operational principle that emphasizes the importance of the library engaging with the broader institutional environment through strategic alignment, collaboration, and storytelling—connects with Monash’s ambitions to retool its service model to better align with institutional research needs and priorities.

Ready to dive deeper? Listen to the full recording

The webinar and subsequent Q&A offered a richly informative look behind the scenes of a major shift in operational structures and service models to better address the needs of stakeholders. If you didn’t have a chance to join us for the live webinar, please take some time to view the recording. Many thanks to Gary Pearce for sharing his perspective with all of us!

The post Scaling research support at Monash University Library appeared first on Hanging Together.

Fellow Reflection: Cláudia De Souza / Digital Library Federation

This post was written by Cláudia De Souza, who attended the 2025 DLF Forum as the Grassroots Archives and Cultural Heritage Workers Fellow. The views and opinions expressed in this blog post are solely those of the author and do not necessarily reflect the official policy or position of the Digital Library Federation or CLIR.

Cláudia De Souza is an associate professor at the College of Communication and Information at the University of Puerto Rico. She teaches in the Graduate Program in Information Science, focusing on information organization and retrieval, with particular emphasis on the analysis, evaluation, and design of digital libraries and archives. She is also the academic coordinator of the UPR Caribe Digital project, an initiative dedicated to advancing Digital Humanities research and scholarship in and for the Caribbean. She advocates for and fosters open access to knowledge, supports digital preservation, and promotes the dissemination of documentary heritage. Her work is driven by a commitment to enhancing the visibility and accessibility of information resources across the Insular Caribbean.


Attending the 2025 DLF Forum for the first time was an exceptional experience, especially as the sole fellow selected for the Grassroots Archives and Cultural Heritage Workers program. I arrived with high expectations, and the event exceeded them, offering a rich environment for learning, collaboration, and professional growth. I had the chance to meet and engage with attendees from a wide range of institutions and backgrounds, including students, emerging professionals, librarians, and archivists. I am especially grateful to the selection committee for choosing such a diverse group of fellows. This experience of connection and dialogue with new voices exposed me to a diversity of approaches in professional and academic practice and showed me the value of building collaborative networks.

One of the challenges was choosing among the multiple sessions happening simultaneously over the three days. I decided to focus on the topics most closely related to my work at the University of Puerto Rico and that I am most passionate about in library and information science: open access, metadata, information organization and retrieval, community archives, and the visibility of digital collections.

Among the new tools and approaches I explored, the Marriott Reparative Metadata Assessment Tool (MaRMAT) stood out: an open-source application designed to support reparative metadata evaluations and processes of repair and justice in information. It allows librarians to identify harmful, outdated, or otherwise problematic language in tabular metadata using pre-curated or custom lexicons. I definitely plan to explore its use in the work we are developing with community groups in Puerto Rico, as part of the UPR Caribe Digital project. I was also inspired by Krystyna Matusiak’s presentation, which demonstrated how digital collections can be expanded through exhibits that highlight the stories of underrepresented communities. Digital curation not only extends the reach of archives but also provides new opportunities to tell stories in an inclusive and meaningful way, showing how information organization can impact representation and access to collective memory. I leave the Forum inspired and motivated to apply these insights to the Digital Humanities initiatives we are planning at my institution, where they will serve as a model to follow.

I am grateful for the opportunity to participate and look forward to continuing these conversations and collaborations. I also hope that next year I will have the chance to submit a presentation to share with others all that I have put into practice. See you at DLF 2026!

The post Fellow Reflection: Cláudia De Souza appeared first on DLF.

Fellow Reflection: Amaobi Otiji / Digital Library Federation

This post was written by Amaobi Otiji, who attended the 2025 DLF Forum as a Student Fellow. The views and opinions expressed in this blog post are solely those of the author and do not necessarily reflect the official policy or position of the Digital Library Federation or CLIR. 2025 Student Fellowships were supported by a grant from MetaArchive.

Amaobi Otiji is pursuing his Master of Information at Rutgers University concentrating in the Technology, Information, and Management pathway. Prior to entering this program, Amaobi earned a bachelor’s degree in history from Howard University and has worked in roles involving federal collections, both digitized and born-digital. His professional interests center on digital curation, metadata development, and exploring new approaches to preserving and sharing underrepresented histories. He is focused on increasing equitable access to information and helping to shape how emerging technologies influence our cultural memory. In his spare time, Amaobi enjoys playing baritone ukulele, attending live theater, and playing video games.


Digital Memory Work Across Regions and Histories: Reflections on Community-Driven Projects

As I attended this year’s DLF Forum, I kept returning to two themes that resonated with me across multiple presentations: community engagement and the quiet connective work that builds the infrastructure for it. Two of the sessions I attended during the conference stood out to me in particular because they approached these ideas from different perspectives but used similar underlying practices to tailor their projects to their communities’ needs. These sessions were about the HBCU Digital Library Trust and the Borderlands storytelling initiative. Both of these projects work with different kinds of communities, rooted in distinct histories spread across North America. Yet both demonstrated how effective digital stewardship can be when community engagement is built into the planning of a project rather than treated as a final step at the end.

The Historically Black Colleges and Universities (HBCU) Digital Library Trust session outlined a fantastic model that was centered on providing long-term support for HBCUs and their unique archival histories. Their emphasis on shared ownership reflected a great understanding of these institutions and their historical challenges in navigating limited resources, public scrutiny, and a society that too often worked against them. What stood out to me in particular was how intentional their approach felt. Rather than expecting HBCUs to adapt to them, they adapted to the HBCUs by meeting them where they were. The Trust, hosted by the Atlanta University Center’s Woodruff Library, focused on building capacity in ways that supported institutional autonomy and reflected the needs of the communities they were trying to serve. Their model felt refreshing, grounded in culturally informed practices that support long-term institutional resilience.

The Borderlands storytelling session approached community engagement from another direction. Their work seemed shaped by the layered histories and cultural dynamics found in the U.S.-Mexico Borderlands and by the University of Arizona’s position as a Hispanic-Serving Institution (HSI) with ties to the region. Their presentation drew from the movements, identities, and complex narratives that define the region as well as from the data-intensive methods they were using to support the work. They spoke about their efforts in mapping, visualization, and other forms of data storytelling that were central to how researchers were interpreting that complexity. What especially stood out to me was how they treated “place” as more than a backdrop. It functioned as a structure that was allowed to shape the research itself and set the terms for how they partnered with their researchers. It felt rooted in the region in a way that kept the work responsive, so that it could move across research, teaching, and student engagement while still staying grounded in the histories and contexts that give it direction.

Across both sessions, I found myself thinking about how our communities and networks shape our digital memory work every day. The HBCU Digital Library Trust and the Borderlands initiative each operate within distinct historical and cultural environments, yet they are undeniably interconnected through their commitment to engagement shaped by the histories and needs of the communities they serve. Together, these sessions illuminated how digital memory work is always anchored somewhere and shaped by places, relationships, and shared histories that give it meaning. For those hoping to steward this work, our role is to listen closely enough that those anchors guide the paths we build.

The post Fellow Reflection: Amaobi Otiji appeared first on DLF.

Meta: Post #1000 / David Rosenthal

This is the one thousandth post to this blog in the 212 months since the first post. That is an average of 4.7 posts per month, or just over one per week, which is my long-term goal for the roughly half of my time that isn't taken up with grandparenting.

Some posts are a lot of work, and take more than a week. Major talks, such as The Gaslit Asset Class or Lessons from LOCKSS, typically represent a month's work, as do long posts such as Sabotaging Bitcoin, Drones or The Dawn Of Nvidia's Technology.

The 1000 posts have gained over 6.88M page views, 7.6% of which were for my EE380 Talk. Less publicized but popular posts get around 30K page views, well above the 6.9K average.

The only one of these statistics that I care about is the goal of a post a week. Having an audience is nice when it happens, but that's not why I'm writing. I write for myself, to understand not necessarily to communicate. Despite this, I'd like to thank those who read and comment.

January 2026 Early Reviewers Batch Is Live! / LibraryThing (Thingology)

Win free books from the January 2026 batch of Early Reviewer titles! We’ve got 227 books this month, and a grand total of 2,976 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.

If you haven’t already, sign up for Early Reviewers. If you’ve already signed up, please check your mailing/email address and make sure they’re correct.

» Request books here!

The deadline to request a copy is Monday, January 26th at 6PM EST.

Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to Canada, the US, the UK, Australia, Belgium, Netherlands, Poland, Latvia, Lithuania, Luxembourg and more. Make sure to check the message on each book to see if it can be sent to your country.

[A cover gallery of this month’s Early Reviewer titles appeared here.]

Thanks to all the publishers participating this month!

Alcove Press
Anchorline Press
Autumn House Press
Baker Books
Bellevue Literary Press
Bigfoot Robot Books
Cozy Cozies
Egret Lake Books
Entrada Publishing
Femficatio Publishing
First Person Press
Gefen Publishing House
Henry Holt and Company
HTF Publishing
Legacy Books Press
Marina Publishing Group
NeoParadoxa
NewCon Press
Paper Phoenix Press
Prolific Pulse Press LLC
PublishNation
Real Nice Books
Rootstock Publishing
Running Wild Press, LLC
Shadow Dragon Press
Sunrise Publishing
Tundra Books
University of Nevada Press
University of New Mexico Press
Unsolicited Press
UpLit Press
Vibrant Publishers
W4 Publishing, LLC
WorthyKids

DLF Digest: January 2026 / Digital Library Federation

A monthly round-up of news, upcoming working group meetings and events, and CLIR program updates from the Digital Library Federation. See all past Digests here

Hi DLF Community! I’ve really enjoyed connecting with so many of you at the Forum and in Working Group meetings over the past few months. Every conversation, formal and informal, has left me inspired and grateful to be part of a community fueled by care, creativity, and collaboration. I’m excited to keep learning with you all and stay in touch. If you want to chat one-on-one, just send me an email anytime (swillis@clir.org). Wishing you an encouraged, grounded start to 2026! 

— Shaneé from Team DLF

 

This month’s news:

 

This month’s open DLF group meetings:

For the most up-to-date schedule of DLF group meetings and events (plus NDSA meetings, conferences, and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find the meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.

  • DLF Digital Accessibility Working Group (DAWG): Tuesday, 1/6, 2pm ET / 11am PT
  • DLF Born-Digital Access Working Group (BDAWG): Tuesday, 1/6, 2pm ET / 11am PT
  • DLF AIG Cultural Assessment Working Group: Monday, 1/12, 1pm ET / 10am PT
  • AIG User Experience Working Group: Friday, 1/16, 11am ET / 8am PT
  • DLF Digitization Interest Group: Monday, 1/26, 2pm ET / 11am PT
  • DLF Committee for Equity & Inclusion: Monday, 1/26, 3pm ET / 12pm PT
  • DLF Climate Justice Working Group: Tuesday, 1/27, 1pm ET / 10am PT
  • NEW GROUP! DLF Open Source Capacity Group: Wednesday, 1/28, 1pm ET / 10am PT
  • DLF AIG Metadata Assessment: Thursday, 1/29, 1:15 pm ET / 10:15 am PT
  • DLF Digital Accessibility Policy & Workflows subgroup: Friday, 1/30, 1pm ET / 10am PT

DLF groups are open to ALL, regardless of whether or not you’re affiliated with a DLF member organization. Learn more about our working groups on our website. Interested in scheduling an upcoming working group call or reviving a past group? Check out the DLF Organizer’s Toolkit. As always, feel free to get in touch at info@diglib.org

 

Get Involved / Connect with Us

Below are some ways to stay connected with us and the digital library community: 

 

The post DLF Digest: January 2026 appeared first on DLF.

The double life of being chronically ill at work, slow librarianship, and checking in as an expression of care / Meredith Farkas

Developing a long-term illness, whether chronic or acute, is like being dropped into a country completely unfamiliar to you. You don’t know the language, the customs, the cuisine, the people. You feel alone, isolated, and totally out of your depth. Eventually, you start to learn the language, the customs. You find community, fellow travelers, people who can help you understand your new life better. It doesn’t stop being hard, but the learning curve becomes less steep and the isolation less intense. 

However, unlike when you’re immersed in a new country and culture, you’re falling into this new place and experiencing that painfully slow acculturation while you’re trying to still live your regular life in parallel. You’re expected to be a good parent, partner, family member, friend, employee, housekeeper, bill-payer, etc. But you’re living in two different realities now and because of that, it’s easy to feel alienated from your regular life, especially if you don’t feel like you can bring that other part of yourself to your interactions at work, at home, or out with friends. The cognitive dissonance can be jarring.

It’s hard enough to live that double life, but adding in the vagaries of seeking out a diagnosis and often not being believed, not to mention coping with the symptoms themselves, can make life feel completely untenable. Before my autoimmune diagnosis, I spent more than a year seeing medical professionals who didn’t believe there was anything wrong with me other than the normal discomforts of aging. I kept asking doctors if they thought my symptoms could be autoimmune and was told “no” over and over again, though that felt wrong to me. One PA suggested that some of my symptoms might stem from anxiety since I have a history of anxiety (like I wouldn’t know at this point what anxiety feels like). Within five minutes of talking with the first rheumatologist I saw, after waiting five months for the appointment, he said to me “this doesn’t sound rheumatological at all.” Luckily he still did all the standard testing which showed that he was very wrong. But not being believed by so many doctors for so long stays with you. It leaves a scar. Every time I see a doctor now, I feel like I’m going to court and I’m ready to be cross-examined, to be picked apart. I’m a bundle of nerves.

And my experience is painfully common, especially for women, as poet Meghan O’Rourke writes in her amazing book about chronic illness, The Invisible Kingdom (a meditation on and journalistic exploration of chronic illness and how it is positioned in our social fabric):

And so it is a truth universally acknowledged among the chronically ill that a young woman in possession of vague symptoms like fatigue and pain will be in search of a doctor who believes she is actually sick. More than 45 percent of autoimmune disease patients, a survey by the Autoimmune Association found, “have been labeled hypochondriacs in the earliest stages of their illness.” Of the nearly one hundred women I interviewed, all of whom were eventually diagnosed with an autoimmune disease or another concrete illness, more than 90 percent had been encouraged to seek treatment for anxiety or depression by doctors who told them nothing physical was wrong with them. (p. 103)

Once I got on meds for my condition that finally started working (most first-line meds for autoimmune conditions take three months on average to produce any effects on the immune system), I thought I was past the worst of it. Other than occasional much smaller flares, I was essentially in remission. I learned my limits. I protected my spoons with my life. I kept my stress low. I felt like I had it figured out. I felt good even. And then I got sicker with new symptoms that have stolen so much from me, including my sleep. The past 10 months have honestly been a nightmare with a carousel of doctors who all have completely different theories of what is going on and with a condition that is constantly evolving so what they see in the moment isn’t the full picture. And each doctor I’ve seen hyperfocuses on something different and ignores every other aspect of my case. It’s all making me feel like I’m going crazy. 

Each wrong diagnosis brought me to another country, another reality, another identity, that I lived in for a short while. And in each of these countries, I spent countless hours learning, learning, learning all I could, going down research and subreddit rabbit holes, and spending way too much money on products that did nothing because none of those diagnoses were correct. The dermatologist I see who specializes in autoimmune conditions at a research university seems to have given up even trying to diagnose me and she’s basically my last best hope in the state. She wouldn’t even give me differential diagnoses last time beyond “it’s clearly autoimmune given your systemic symptoms.” She’s the one who has put me on a serious immunosuppressant with debilitating side effects, which I guess at least shows she’s taking it seriously, but if she doesn’t know what she’s treating me for, how does she know that this drug that is making me feel terrible is even going to help? Atul Gawande said of doctors that “nothing is more threatening to who you think you are than a patient with a problem you cannot solve” (quoted in The Invisible Kingdom, page 209) and I feel that in how I’m treated at every appointment I go to. With the exception of the charlatan immunologist who was desperate to diagnose me with MCAS though there was no evidence in support of it, not one doctor has seemed at all interested in figuring out what this is – it’s felt more like a game of “not it.”

Of a similar liminal moment in her own illness, O’Rourke wrote, “in my illness I was moored in an unreachable northern realm, exiled to an invisible kingdom, and it made me angry. I wanted to rejoin the throngs. In dark moments I continued to wonder if the wrongness was me” (99). And of course people will feel like that wrongness is them when we live in a culture that views chronic illness as some sort of weakness or something we caused through our own bad habits. Like O’Rourke, I feel both exiled from my world and forced to be in it at the same time, which is a unique form of torture. Going to work, pretending things are ok, doing my job, meeting deadlines, helping students, smiling, all the while my body is attacking itself, I’m barely sleeping, I’m spontaneously bleeding from my skin and under my skin, and I’m so itchy I sometimes have to wear gloves to bed or I’ll scratch myself raw in my sleep. You feel like you’re play-acting being yourself, being a person in the world, because you’re not really there anymore. And when you’re suffering, and don’t know what your illness is, and you feel abandoned, it’s easy to go down rabbit holes of self-loathing along with those rabbit holes of fruitless research that make you feel like an unhinged obsessive with a murder board and yarn. As Meghan O’Rourke wrote, “your sense of story is disrupted” (p. 259) and you feel like a stranger to yourself.

I started writing about slow librarianship long before I got really sick, but even then, I knew the importance of fostering a work culture where you can be a whole person. I knew how it felt to have a child and feel like you couldn’t prioritize family obligations over work ever (though working during family time? Totally ok, right?). In a workplace that encourages people to be whole people, workers feel like they can prioritize the things in their lives outside of work that are important – their caregiving responsibilities, their health, the people they love, etc. They feel like they can talk about these things – that they don’t make them liabilities. They can be vulnerable and real. And feeling like you can be vulnerable and real about who you are and how you’re doing means that you can also be vulnerable and real in your work, which makes us better employees who are energized to try new things.

I think a lot of people in positions of power might even want a culture like this, but very few actively create it. They might think that saying “take what time you need” when someone is facing illness is enough. But I think two pieces are missing from this. First, managers need to not only say “take what time you need” but work with their direct reports to address the work that would otherwise pile up. If you say “take all the time you need” but all the work with its stressors and deadlines is still there, you’re not really giving people space. Can you take good care of yourself while you watch the work pile up and up and up? How many of us have come back to work while still not fully recovered from an illness because of the work that was piling up or a class we needed to teach? 

Also, managers need to model vulnerability, transparency, and being whole people themselves. If they put up a false front of strength, if they’re not willing to be vulnerable and human and real themselves, if they do not model transparency, there’s no way that others will feel safe doing so. I was lucky to have a boss in my first academic library job who was deeply human in her interactions with her employees, so I got to see what that looked like. And it was her humanity that engendered fierce loyalty in her employees – we all thought the world of her. Even when she made decisions that people didn’t like (which was rare as she really did take our insights to heart), she explained her thinking in a transparent way. Given my later experiences, her way of being feels vanishingly rare. I think a lot of managers feel like they need to project strength, not explain their decisions, not let their direct reports really know them as people with full lives, but I don’t think that’s true. A lot of managers operate out of a place of fear or insecurity, but my first academic library director was confident enough to be her full self, flaws and all. 

In a culture where we don’t feel like we can bring ourselves fully to work, I don’t feel like I can talk about my illness. I feel selfish and weak for even considering it. Like, we all have shit going on, right? The world is pretty awful right now. People’s lives are complicated and messy and there’s probably a lot of suffering I know nothing about happening all around me. If they don’t talk about it, who am I to talk about it? I’m not special. While I’ve mentioned being sick at work in the context of being immunocompromised and needing to protect myself and not participate in large, crowded events, even that has felt really uncomfortable. Everyone should feel like they are important enough to their places of work and valued enough to bring up these things without feeling embarrassed or like they’re asking for “special treatment.” I read recently (can’t remember the source) that close to 60% of people with chronic illnesses have not told people at work about it. Imagine hiding something that is such a pivotal and ineluctable piece of many people’s identities and think about the double life that forces them to live.

At work I feel a lot of shame about being sick and I work more than I should given how I’ve been feeling (this is common). I feel like I need to mask how I’m really doing, that people don’t want to hear it. And it’s true that most don’t. There are three people at work I can talk to about my illness, but others seem so incredibly uncomfortable when I mention it, so I’ve learned to just pretend it’s all ok or say nothing. I know that some of my reticence and shame comes from my own internalized ableism as it’s the water we all swim in, but when I worked in that library where the culture encouraged vulnerability, humanity, and care, I remember how different it felt. How much less distance there was between the person I really was and the person I was at work.  

In a meeting last year, our Dean was talking about making the next all-library meeting in-person only. Previously, they had always offered them hybrid, but she didn’t like that most people were choosing not to come in-person. And I totally get it, even though it sucks to always be the outlier. She wants it to be a team-building experience and that’s really hard to do when most people are participating from home with their cameras off. At that meeting, for the first time, I disclosed my illness in front of a bunch of people and talked about how important it is to always offer an online option for folks who are medically vulnerable or at least find ways to make indoor spaces safer for those who can’t afford to get sick. My boss then asked to meet with me to talk about how to make our spaces safer. I talked about airflow, encouraging and providing masks, and, during temperate months, having the meetings at places where windows and doors can remain open or even holding them outside (we’d had one meeting at a park a few years ago which had been the best one ever from both a health and team-building perspective). I suggested that we make the Winter meeting fully virtual since it’s the height of flu season and you can usually get people participating more in a fully virtual meeting than a hybrid one, where the online people feel like weird lurkers. I was thanked for my feedback and didn’t hear anything after that. This September, the all-library meeting was held in our campus library where the windows don’t open (albeit with a couple of HEPA filters scattered around, but I know we do have large college spaces where doors and windows can be opened because I went to an all-day union meeting in one where they did just that and we’d used that space for library meetings in the past) and our February meeting is going to be held in-person during the worst flu season in decades. Obviously, none of this was personal or intended to cause harm, but, at the same time, how should I feel under the circumstances? Clearly speaking out in that meeting, something that I had to really steel myself to do, had been pointless. Why would I ever bring it up or ask for anything again?

When we come back from winter break, people inevitably ask each other how it was, but do they really want to know? I think they want to hear “good,” “fun,” “restful,” etc. How do you talk about a “vacation” in which you spent most of it doubled over in pain after eating anything thanks to the toxic meds you have to take and your father-in-law was in the hospital dying the entire time? I wish that I felt I could share what an absolute shitshow it’s been, to feel like I could be a full person at work, but when you know no one really wants to hear it, when it just makes people uncomfortable, it becomes so much easier to smile and say “it was good!” and move on. Who wants to be a buzzkill? 

Slow librarianship puts worker well-being over productivity and deadlines, allows workers to be whole people at work, and supports a culture of care. While a radical idea, this even makes good business sense because depleted and burned out workers have been shown to be a major drain on the organization and negatively impact the culture. If you’re a manager and you’re not actively fostering a culture where people can bring their whole selves to work, then you are fostering a culture where people do not feel safe being vulnerable and having needs. Workers, if you know a colleague is struggling with something (an illness, losing a loved one, a difficult caregiving situation, etc.) and you don’t check in to see how things have been going for them, you’re sending the message that you don’t want to hear about these things, that they make you uncomfortable, that they’re not appropriate to take to work. I think we’ve all probably been guilty of this at some point in our lives and maybe we even thought that not asking was the right thing to do. I can imagine some people think that asking is invasive or reminds the person that they are sick, but it is an expression of care. As Philip Hoover writes in his excellent Sick Times article entitled “You know someone with Long COVID. They need you to ask about it genuinely” 

Approach us with empathy and curiosity. Ask questions that show a sincere desire to understand. How are you feeling this week? works because it acknowledges the chronic, fluctuating nature of my health. Or a version of Nunez’s question, and one I’ve longed to be asked since I’ve been ill: What has this been like for you? 

If our response is tough to hear, try not to smother it in optimism. And tread lightly. While some long-haulers may appear okay in public, much of our suffering occurs in private, shade-drawn rooms, across lonely afternoons, stuck in bed. When in doubt, remember that the act of asking never hurts — but never being asked certainly does.

While I don’t have Long COVID, this piece so perfectly encapsulates how I feel as someone with a mostly invisible chronic illness who would just love to be asked “how are you feeling this week?” instead of feeling like I have to pretend I’m okay. I told my manager at the start of Fall term that my autoimmune disease had become significantly worse and that I didn’t know what my capacity might look like going forward. She expressed sympathy, told me to take what time I needed, and never checked in with me after that. That September, I was taking over a very important committee chair role that was made enormously more time-consuming and onerous by the departure of the three colleagues most involved in supporting this work (two of whom had more than 20 years of institutional knowledge locked up in their heads). I didn’t feel like I had the leeway or support to let things drop as more and more kept piling up on my plate related to my chair role and it became clear that I was expected to do a lot of onboarding for the new people in this role even though I was new to my role and was no one’s manager. While I know my boss is extraordinarily busy, the message that not checking in with how I was doing sent me was very different from what I assume she’d wanted to convey. Checking in with a colleague or direct report seems like such a small thing, and it is in terms of the effort it requires, but the impact it can have in making someone feel cared for and less like they have to live a painful double life is enormous.

Weekly Bookmarks / Ed Summers

These are some things I’ve wandered across on the web this week.

🔖 Pho & Banh Mi Saigonese

We have been at this location for over 20 years as formerly Saigonese Restaurant. The Wheaton area has changed dramatically over time as our customers and their preferences. Our restaurant became more and well-known good places to enjoy Vietnamese cuisines. After serious considerations, with kind suggestions from our customers, we rebranded to Pho and Banh Mi Saigonese.

🔖 About Standard Ebooks

Standard Ebooks is a volunteer-driven effort to produce a collection of high quality, carefully formatted, accessible, open source, and free public domain ebooks that meet or exceed the quality of commercially produced ebooks. The text and cover art in our ebooks are already believed to be in the U.S. public domain, and Standard Ebooks dedicates its own work to the public domain, thus releasing the entirety of each ebook file into the public domain. All the ebooks we produce are distributed free of cost and free of U.S. copyright restrictions.

🔖 The Rime of the Ancient Maintainer

Every culture produces heroes that reflect its deepest anxieties. The Greeks, terrified of both mortality and immortality, gave us Achilles. The Victorians, haunted by social mobility, gave us the self-made industrialist. And Silicon Valley, drunk on exponential curves and both terrified and entranced by endless funding rounds, has given us the Hero Developer: a figure who ships features at midnight, who “moves fast and breaks things,” who transforms whiteboard scribbles into billion-dollar unicorns through sheer caffeinated will.

We celebrate this person constantly. They’re on the front page of TechCrunch et al. They keynote conferences. Their GitHub contributions get screenshotted and shared like saintly relics.

Meanwhile, an unsung developer is updating dependencies, patching security vulnerabilities, and refactoring code that the Hero Developer wrote three years ago before moving on to their next “zero to one” opportunity.

They will never be profiled in Wired.

But they’re doing something far more important than innovation.

They’re preventing collapse.

🔖 Glamorous Christmas: Bringing Charm to Ruby

Today, Ruby 4.0 was released. What an exciting milestone for the language!

This release brings some amazing new features like the experimental Ruby::Box isolation mechanism, the new ZJIT compiler, significant performance improvements for class instantiation, and promotions of Set and Pathname to core classes. It’s incredible to see how Ruby continues to thrive and be pushed forward 30 years after its first release.

To celebrate this release, I’m happy to announce that I’ve been working on porting the Charmbracelet Go terminal libraries to Ruby, and today I’m releasing a first version of them. What better way to make this Ruby 4.0 release a little more glamorous and charming?

🔖 Finding a broken trace on my old Mac with the help of its ROM diagnostics

I remembered that people have been working towards documenting the Mac ROM startup tests and using them to diagnose problems, so I decided to give it a shot and see if Apple’s Serial Test Manager could identify my Performa’s issue. Where was the fault on this complicated board? Sure, I could test a zillion traces by hand, but why bother when the computer already knows what is wrong?

🔖 Public Domain Day 2026

1925 was a watershed year for the recording industry. The Jazz Age was in full swing and beginning in March, 1925, widespread adoption of electrical recording meant greater fidelity and new realism for recordings. DAHR documents over 11,000 recordings made in 1925–an all-time high for the industry.

🔖 Pulled 60 Minutes segment on CECOT

This is a screen recording of a 60 Minutes segment about the Centro de Confinamiento del Terrorismo (CECOT) prison in El Salvador, which was intended to be aired December 22, 2025 but was pulled last minute for unclear reasons. Despite being pulled, it aired on Global-TV in Canada anyway.

🔖 Two Years After Cormac McCarthy’s Death, Rare Access to His Personal Library Reveals the Man Behind the Myth

Cormac McCarthy, one of the greatest novelists America has ever produced and one of the most private, had been dead for 13 months when I arrived at his final residence outside Santa Fe, New Mexico. It was a stately old adobe house, two stories high with beam-ends jutting out of the exterior walls, set back from a country road in a valley below the mountains. First built in 1892, the house was expanded and modernized in the 1970s and extensively modified by McCarthy himself, who, it turns out, was a self-taught architect as well as a master of literary fiction.

I was invited to the house by two McCarthy scholars who were embroiled in a herculean endeavor. Working unpaid, with help from other volunteer scholars and occasional graduate students, they had taken it upon themselves to physically examine and digitally catalog every single book in McCarthy’s enormous and chaotically disorganized personal library. They were guessing it contained upwards of 20,000 volumes. By comparison, Ernest Hemingway, considered a voracious book collector, left behind a personal library of 9,000.

🔖 How uv got so fast

uv is fast because of what it doesn’t do, not because of what language it’s written in. The standards work of PEP 518, 517, 621, and 658 made fast package management possible. Dropping eggs, pip.conf, and permissive parsing made it achievable. Rust makes it a bit faster still.

🔖 Post-Platform Digital Publishing Toolkit

The Post-Platform Digital Publishing Toolkit is an iterative digital and print publication by Well Gedacht Publishing exploring how to overcome the limitations of digital publishing on social media and other online platforms, and advocates for self-hosted infrastructures and practices for artists and artists’ book publishers. You can find the first iteration of the print publication here.

🔖 Giambattista Vico

Giambattista Vico (born Giovan Battista Vico /ˈviːkoʊ/; Italian: [ˈviko]; 23 June 1668 – 23 January 1744) was an Italian philosopher, rhetorician, historian, and jurist during the Italian Enlightenment. He criticized the expansion and development of modern rationalism, finding Cartesian analysis and other types of reductionism impractical to human life, and he was an apologist for classical antiquity and the Renaissance humanities, in addition to being the first expositor of the fundamentals of social science and of semiotics. He is recognised as one of the first Counter-Enlightenment figures in history.

🔖 Robots Can Be Hacked in Minutes, Chinese Cybersecurity Experts Warn

Commercial robots have widespread and exploitable vulnerabilities that can allow hackers to take over within hours or even minutes, according to Chinese cybersecurity experts.

Security in the robotics industry is “riddled with holes,” said Xiao Xuangan, who works at Darknavy, an independent cybersecurity research and services firm based in Singapore and Shanghai. Xiao noted that when testing low-level security issues in quadruped robots, his team gained control of one of Deep Robotics’ Lite-series products in just an hour.

2025-12-31: Review of WS-DL's 2024 / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University




Better late than never, right? This review of WS-DL's 2024 is incredibly late, but some family concerns delayed my writing and then I never quite got back on track.  We had quite a productive 2024, graduating a record three MS students and four PhD students.  

Students and Faculty

We did not add or lose any faculty this year, but Dr. Jayarathna received tenure early!  Congratulations to Sampath!


We graduated four PhD students:



And we also graduated three MS students, two of whom stayed and entered the PhD program:


We added three new PhD students, two of whom joined from our MS program:



We also had six PhD students advance their status:



Publications and Presentations

To generate our annual publication list, we use our tool "Scholar Groups", which scrapes Google Scholar profiles and merges and deduplicates publications.  As such, our list is limited by the accuracy of Google Scholar, which is pretty good but not perfect.  It looks like for 2024 we published about 28 refereed conference papers, 12 journal articles, and one patent (congratulations, Dr. Ashok!). 



Conferences have mostly returned to f2f, but frequently there are still virtual/remote options. Below is a partial list of trip reports for the events where we presented our work:



Research Presentations and Outreach

In addition to the conferences and workshops listed above where we presented papers, we also had an array of presentations and outreach.  


We hosted the third and final year of our NSF Disinformation Detection and Analytics Research Experiences for Undergraduates (REU) Program.  The cohort of the third year was especially successful, as reflected in the mid-point and final presentations. As you can see from the list above, many of the REU projects resulted in publications. 


We also had our 5th annual "Trick or Research" event. Dr. Jayarathna initiated the event five years ago and the entire department has embraced it. 


We also had our fourth annual Research Expo, where we had five PhD students give overviews of their work.


In addition, we had the following events and outreach:



Software, Data Sets, Services

Our scholarly contributions are not limited to conventional publications or presentations: we also advance the state of the art through releasing software, data sets, and proof-of-concept services & demos.  Some of the software, data sets, and services that we either initially released or made significant updates to in 2024 include: 



Awards and Recognition

Many members of our group received awards and other forms of recognition, a sampling of which includes: 


Funding 

In 2024, we received > $7.8M in six new externally funded grants:



Summary

2024 was a strong year for us: four PhD and three MS students graduated, three new PhD students joined, and six PhD students advanced their status.  One of our faculty members, Dr. Sampath Jayarathna, received tenure.  We continued to publish in prestigious venues, with about 40 refereed publications.  We helped generate more than $7.8M in new external funds, from six different grants.  WS-DL continues to grow and thrive, and I am proud of all the members and alumni and their progress in 2024.


If you would like to join WS-DL, please get in touch.  To get a feel for our recent activities, please review our previous WS-DL annual summaries (2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, and 2013) and be sure to follow us at @WebSciDL to keep up to date with publications, software releases, trip reports and other activities.  We especially would like to thank all those who have publicly complimented some aspect of the Web Science and Digital Libraries Research Group, our members, and the impact of our research.  



–Michael



Public Domain Day 2026: Celebrating human creativity and sharing / John Mark Ockerbloom

I’m glad we’ve reached a new Public Domain Day, and that the works I’ve been featuring in my countdown, and many more, are now free to copy and reuse. I’ve been posting about works joining the public domain in the United States, which include sound recordings published in 1925, and other works published in 1930 that had maintained their copyrights. (Numerous works from 1930, and later, that had to renew their copyrights, and did not, were already in the public domain, though many of the best-known works did renew copyrights as required.) This is the eighth straight year Americans have seen a year’s worth of works join the public domain, after a 20-year freeze following a 1998 copyright extension.

I intend my countdown not just to celebrate the works joining the public domain, but also to celebrate what people have done with those works. In some posts, I note later creations based on those works. In nearly all my posts, I link to things that people have written about those works. Like the works themselves, those responses may have flaws or quirks, but I value them as human reactions to human creations. Whether they’re reviews, personal blog posts, professionally written essays, scholarly analyses, or Wikipedia articles, they’re created by people who encountered an interesting work and cared about it enough to craft a response to it and share it with the world. Those shared responses in turn pique my interest in the writers and the works.

It wasn’t always easy for me to find such responses online. Sometimes I’d go searching for responses to a promising-sounding work, and only find sales listings on e-commerce sites, social media posts not easily linkable or displayable without logging into a commercial platform, paywalled articles that many of my readers can’t view, or generic-sounding pages that read like they were generated by a large language model or a content farm, but not by anyone who I could clearly tell cared about or even read the work in question. Some works I initially hoped to feature got left off my countdown, replaced by other works where I could more readily link to an interesting response.

The people publishing the responses I link to are often swimming against a strong current online. Many online writing systems, including the one I’ve been writing these posts on, are now urging their users to “improve” their posts by letting “AI” write them. Some writers may be tempted to allow it, when facing an impending deadline or writer’s block or anxiety, even when the costs can include muffling one’s own voice, signing onto falsehoods confidently stated by a stochastic text generator, or abusively exploiting existing content and services. Other writers may feel pushed to put their work behind paywalls or other access controls that make them less likely to be plagiarized or aggressively crawled by those same “AI” systems. And most writers, myself included, find it easy to dash off a quick short take on a social media platform, be quickly gratified by some “like”s, and then have it forgotten. It’s harder to take the time to craft something longer or more thought-out that will be readable for years, and whose appreciation might take much longer to reach us. The easy alternatives can discourage people from devoting their time to better, more lasting creations.

As I’ve noted before, both copyright and the public domain serve important purposes in encouraging the creation, dissemination, sharing, and reuse of literature and art. One reason I write my public domain posts is to promote a better balance between them, particularly in encouraging shorter copyright lengths to benefit both original creators and the public. Similarly, as I’ve noted in another recent post, I value both human creation and automated processes, but I increasingly see a need to improve the balance between those as well, especially as some corporations aggressively push “generative AI”. While I appreciate many ways in which automation can help us create and manage our work, I treasure the humanity that people thoughtfully put into the creation of literature and art of all kinds, and the human responses that those creations elicit.

Today I’m thankful for all of the people, most no longer with us, who made the works that are joining the public domain today. I’m thankful for the new opportunities we have to share and build on those works now that they’re public domain. I’m thankful to all the people who have responded to those works, whether as brief reactions or as new works as ambitious as the works they respond to. And I hope you’ll keep making and sharing those responses with the world when you can. I look forward to reading them, and perhaps linking to them in future posts.

New Job: Project Gutenberg / Eric Hellman

Personal Note, January 1 2026: I have a new job: Executive Director of the Project Gutenberg Literary Archive Foundation. Here's what I wrote for PG's January Newsletter.

Greetings from the new Executive Director

Project Gutenberg logo with manual printing press

Happy Public Domain Day! You might hear people say that books published in 1930 have "fallen" into the US Public Domain, or, that they have lost copyright "protection". This is not quite correct. Rather, books published in 1930 have been FREED of copyright restrictions. They have ASCENDED into the public domain and into the embrace of organizations like Project Gutenberg. They now belong to ALL of us, and we need to take care of them for future generations.

On October 21, Project Gutenberg lost its longtime leader, Greg Newby, to pancreatic cancer. I had agreed to step up as Acting Executive Director so that Project Gutenberg could continue the mission that had become Greg's life work: to serve and preserve public domain books so that all of us can use and enjoy them without restrictions. Although I've been doing development work for Project Gutenberg for the past 8 years, I did not really understand what Greg's job entailed, or how many tasks he had been juggling. Three months in, I'm still discovering mysterious-to-me aspects of the organization. I've also been amazed at the dedication and talent of the many volunteers behind Project Gutenberg and our sister organization, Distributed Proofreaders. And at the large number of donors who make the organization financially viable and sustainable. So as of 2026, with your support, I'm continuing as Executive Director.

In the past three months Project Gutenberg has proven to be resilient; we took a heavy blow and managed to keep going. My top priority going forward is to make Project Gutenberg even more sustainable as well as resilient. In other words, my job is to be one runner in a relay race: take the baton and make sure I get it to the next runner. That's what we all have to do with public domain books, too. We want them to still be there in 50 years! Whether you're already a volunteer or booster, an avid reader, or just someone curious about what we do, I hope you'll help us pass that baton.

 

Sabotaging Bitcoin / David Rosenthal

Source
I find myself in the unusual position of defending Bitcoin from its critics, if only reluctantly.

In 2024 Soroush Farokhnia & Amir Kafshdar Goharshady published Options and Futures Imperil Bitcoin's Security and:
showed that (i) a successful block-reverting attack does not necessarily require ... a majority of the hash power; (ii) obtaining a majority of the hash power ... costs roughly 6.77 billion ... and (iii) Bitcoin derivatives, i.e. options and futures, imperil Bitcoin’s security by creating an incentive for a block-reverting/majority attack.
Source
It is worth noting that they are not talking about profiting from double-spending. The Bitcoin blockchain transacts around $17B/day of nominal value in around 450K transactions (average ~$38K), but in 2021 Igor Makarov & Antoinette Schoar found that:
90% of transaction volume on the Bitcoin blockchain is not tied to economically meaningful activities but is the byproduct of the Bitcoin protocol design as well as the preference of many participants for anonymity ... exchanges play a central role in the Bitcoin system. They explain 75% of real Bitcoin volume.
Of course, just because they aren't "economically meaningful" doesn't mean they aren't worth attacking! The average block has ~3.2K transactions, so ~$121.6M/block. As a check, $121.6M * 144 blocks/day = $17.5B. So to recover their cost for a 51% attack would require double-spending roughly nine hours worth of transactions.
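As a sanity check in Python (a minimal sketch using the rounded figures above; the $6.77B is Farokhnia & Goharshady's hash-power estimate):

    # Back-of-the-envelope check on the nominal value flowing through Bitcoin,
    # using the rounded figures quoted above.
    txs_per_block = 3_200            # ~3.2K transactions per block
    avg_tx_value = 38_000            # ~$38K nominal value per transaction
    blocks_per_day = 144             # one block every ~10 minutes

    value_per_block = txs_per_block * avg_tx_value        # ~$121.6M
    value_per_day = value_per_block * blocks_per_day      # ~$17.5B

    attack_cost = 6.77e9             # Farokhnia & Goharshady's 51% estimate
    hours_to_recover = attack_cost / value_per_day * 24   # ~9.3 hours
    print(f"${value_per_block/1e6:.1f}M/block, ${value_per_day/1e9:.1f}B/day, "
          f"{hours_to_recover:.1f} hours to recover the attack cost")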

I agree with their technical analysis of the attack, but I believe there would be significant difficulties in putting it into practice. Below the fold I try to set out these difficulties.

brevity is for the weak
Maciej Cegłowski

First, I should point out that I wrote about using derivatives to profit from manipulating Bitcoin's price more than three years ago in Pump-and-Dump Schemes. These schemes have a long history in cryptocurrencies, but they are not the attack involved here. I don't claim expertise in derivatives trading, so it is possible my analysis is faulty. If so, please point out the problems in a comment.

The Attack

Farokhnia & Goharshady build on the 2018 work of Ittay Eyal & Emin Gün Sirer in Majority is not enough: Bitcoin mining is vulnerable:
The key idea behind this strategy, called Selfish Mining, is for a pool to keep its discovered blocks private, thereby intentionally forking the chain. The honest nodes continue to mine on the public chain, while the pool mines on its own private branch. If the pool discovers more blocks, it develops a longer lead on the public chain, and continues to keep these new blocks private. When the public branch approaches the pool's private branch in length, the selfish miners reveal blocks from their private chain to the public.
...
We further show that the Bitcoin mining protocol will never be safe against attacks by a selfish mining pool that commands more than 1/3 of the total mining power of the network. Such a pool will always be able to collect mining rewards that exceed its proportion of mining power, even if it loses every single block race in the network. The resulting bound of 2/3 for the fraction of Bitcoin mining power that needs to follow the honest protocol to ensure that the protocol remains resistant to being gamed is substantially lower than the 50% figure currently assumed, and difficult to achieve in practice.
In April 2024 Farokhnia & Goharshady observed that:
Given that the rule of thumb followed by most practitioners is to wait for 6 confirmations, a fork that goes 6 levels deep can very likely diminish the public’s trust in Bitcoin and cause a crash in its market price. It is also widely accepted that a prolonged majority attack (if it happens) would be catastrophic to the cryptocurrency and can cause its downfall.
But, as they lay out, this possibility is discounted:
The conventional wisdom in the blockchain community is to assume that such block-reverting attacks are highly unlikely to happen. The reasoning goes as follows:
  1. Reverting multiple blocks and specifically double-spending a transaction that has 6 confirmations requires control of a majority of the mining power;
  2. Having a majority of the mining power is prohibitively expensive and requires an outlandish investment in hardware;
  3. Even if a miner, mining pool or group of pools does control a majority of the mining power, they have no incentive to act dishonestly and revert the blockchain, as that would crash the price of Bitcoin, which is ultimately not in their favor, since they rely on mining rewards denominated in BTC for their income.
Source
Starting in late 2020, as shown in The Economist's graphic, the spot market in Bitcoin became dwarfed by the derivatives markets. In the last month $1.7T of Bitcoin futures traded on unregulated exchanges, and $6.4B on regulated exchanges. Compare this with the $1.8B of the spot market in the same month.

These huge futures markets enable Farokhnia & Goharshady's attack:
In short, an attacker can first use the Bitcoin derivatives market to short Bitcoin by purchasing a sufficient amount of put options or other equivalent financial instruments. She can then invest any of the amounts calculated above, depending on the timeline of the attack, to obtain the necessary hardware and hash power to perform the attack. If the attacker chooses to obtain a majority of the hash power, her success is guaranteed and she can revert the blocks as deeply as she wishes. However, she also has the option of a smaller upfront investment in hardware in exchange for longer wait times to achieve a high probability of success. In any case, as long as her earnings from shorting Bitcoin and then causing an intentional price crash outweighs her investments in hardware, there is a clear financial incentive to perform such an attack. The numbers above show that the annual trade volume in Bitcoin derivatives is more than three orders of magnitude larger than the required investment in hardware. Thus, it is possible and profitable to perform such an attack.

Assumptions

Farokhnia & Goharshady make some simplifying assumptions:
  • We only consider the cost of hardware at the time of writing. We assume the attacker is buying the hardware, rather than renting it and do not consider potential discounts on bulk orders.
  • We ignore electricity costs as they vary widely based on location.
The justification for the first assumption is that it keeps our analysis sound, i.e. we can only over-approximate the cost by making this assumption. As for the second assumption, we note that electricity costs are often negligible in comparison to hardware costs and that our main argument, i.e. the vulnerability of Bitcoin to majority attacks and block-reverting attacks, remains intact even if the estimates we obtain here are doubled. Indeed, as we will soon see, the trade volume of Bitcoin derivatives is more than three orders of magnitude larger than the numbers obtained here.

Goal

As Farokhnia & Goharshady stress, the success of a block-reverting attack is probabilistic, so the attacker needs to have a high enough probability of making a large enough profit to make up for the risk of failure.

My analysis thus assumes that the goal of the attacker is to have a 95% probability of earning at least double the cost of the attack.

Attacker

There are two different kinds of attackers with different sets of difficulties:
  • Outsiders: someone who has to acquire or rent sufficient hash power.
  • Insiders: someone or some mining pool who already controls sufficient hash power.
Farokhnia & Goharshady study the outsider case. For both kinds of attacker, the practical problems occur in two areas:
  • Obtaining and maintaining for the duration of the attack sufficient hash power without detection.
  • Obtaining and maintaining for the duration of the attack a sufficient short position in Bitcoin without detection.
The short position must be maintained for the duration of the attack because success may come at any 10-minute block time, and there would not be time to obtain a large enough short position in ten minutes.

Hash Power

The outsider's problems are more complex than the insider's.

Outsider Attack

The outsider attacker requires three kinds of resource:
  • Mining rigs.
  • Power to run the rigs.
  • Data center space to hold the rigs.
Each of these is problematic, but assuming that the difficulties could be overcome, there is then the question of what it would cost to run the attack.
Mining rigs
  • Could they acquire mining rigs sufficient to provide 30% of the combined insider and outsider hash power, or ~43% of the pre-attack hash power?
  • How long would it take to acquire the rigs?
  • Would their acquisition of the rigs be detected?
Bitmain is estimated to have 82% of the market for mining rigs, and they either control or have very close relations with all the major mining pools, who thus have priority access to the latest rigs. Because rigs depreciate rapidly, position in the queue for rigs has a big impact on the profitability of mining. Bitmain is unlikely to give a new customer priority access to rigs.

Because the economic life of mining rigs is less than two years, the first part of Bitmain's production goes into maintaining the hash rate by replacing obsolete rigs. The second part goes into increasing the hash rate. If we assume that the outsider attacker could absorb the second part of Bitmain's production, how long would it take to get the necessary 43% of the previous hash power?

Source
This can be estimated by examining the hash power through time graph. Over the last 3 years the hash rate has increased from about 240EH/s to about 1120EH/s, or about 24EH/s/month. Roughly, 82% of this is Bitmain's output, or about 20EH/s/month. The attacker needs 43% of the current hash rate, or about 482EH/s, or 24 months of the second part of Bitmain's production. At the current price for leading-edge rigs of $14.11/TH/s this would cost about $6.8B plus say $340M in interest at 5%, or $7.14B.
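A minimal Python sketch of this back-of-the-envelope estimate, using the assumptions above (82% Bitmain share, $14.11/TH/s for leading-edge rigs, a year's interest at 5%):

    # Rough cost and lead time for an outsider to amass ~43% of the current
    # hash rate, using the figures assumed in the text.
    network_ths = 1120e6                   # current hash rate in TH/s (1120 EH/s)
    # To hold 30% of the post-attack total the attacker must add x such that
    # x / (network + x) = 0.30, i.e. roughly 43% of the pre-attack hash rate.
    needed = network_ths * 0.43            # ~482 EH/s

    growth_per_month = (1120e6 - 240e6) / 36       # ~24 EH/s/month over 3 years
    bitmain_incremental = growth_per_month * 0.82  # ~20 EH/s/month

    months = needed / bitmain_incremental          # ~24 months of supply
    rig_cost = needed * 14.11                      # $/TH/s -> ~$6.8B
    interest = rig_cost * 0.05                     # ~1 year at 5% -> ~$340M
    print(f"{needed/1e6:.0f} EH/s over ~{months:.0f} months: "
          f"${rig_cost/1e9:.1f}B of rigs + ${interest/1e6:.0f}M interest")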

The lack of rigs to increase the hash rate over a period of much less than two years would clearly be detectable.
Power
The Cambridge Bitcoin Energy Consumption Index's current estimate is that the network consumes 22GW. The outsider attacker would need 43% of this, or about 9.5GW, for the duration of the attack. For context, Meta's extraordinarily aggressive AI data center plans claim to bring a single 1GW data center online in 2026, and the first 2GW phase of their planned $27B 5GW Louisiana data center in 2030. The constraint on the roll-out is largely the lack of access to sufficient power. The attacker would need double the power Meta's Louisiana data center plans to have in 2030.

Access to gigawatts of power is available only on long-term contracts and only after significant delays.
Data centers
Hyperion
Meta's 5GW Louisiana "Hyperion" data center's "footprint will be large enough to cover most of Manhattan", and the outsider attacker would need two of them. If Meta expects to take more than 5 years to build one of them, the outsider attacker is likely to need a decade.

Estimates for AI data centers are that 60% of the capital cost is the hardware and 40% everything else. Thus the "everything else" for Meta's $27B 5GW data center is $10.8B. "Everything else" for the attacker's two similar data centers would thus be $21.6B. Plus say 5 years of interest at 5% or $5.4B.
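The power and data center arithmetic follows the same pattern; a quick sketch under the assumptions above (CBECI's 22GW estimate, a 60/40 hardware/other capital split, 5 years of simple interest at 5%):

    # Power and non-hardware data-center costs for the outsider, from the
    # figures assumed above (CBECI's 22GW estimate, Meta's $27B 5GW campus).
    attacker_power_gw = 22 * 0.43                # ~9.5 GW

    meta_capex = 27e9                            # $27B for a 5GW campus
    non_hardware = meta_capex * 0.40             # ~$10.8B "everything else"
    two_campuses = 2 * non_hardware              # ~$21.6B for ~10GW
    interest = two_campuses * 0.05 * 5           # ~$5.4B, 5 years simple interest
    print(f"~{attacker_power_gw:.1f} GW, ${two_campuses/1e9:.1f}B build-out "
          f"+ ${interest/1e9:.1f}B interest")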
Operational cost
Ignoring the evident impossibility of the outsider attacker amassing the necessary mining rigs, power and data center space, what would the operational costs of the attack be?

It is hard to estimate the costs for power, data center space, etc. But an estimate can be based upon the cost to rent hash power, noting that in practice renting 43% of the total would be impossible, and guessing that renters have a 30% margin. A typical rental fee would be $0.10/TH/day so the costs might be $0.07/TH/day. The attack would have a 95% probability of needing 482EH/s over 34 days or less, so $516M or less.

Thus the estimated total cost for the hash power used in the attack would have a 95% probability of being no more than $7.66B. Plus about $27B in data center cost, which could presumably be repurposed to AI after the attack.

Insider Attack

Source
The insider attacker already controls the 30% of the hash power they need, so only the question of detection remains. The essence of the block-reverting attack is that the attacker mines in secret until they can publish a chain with 6 blocks following a target block. A reduction of 30% of the public hash rate over an average period of 17 days would clearly be detectable. The hash rate is noisy, but the graph shows that over the last year the largest drop was 16% from June 14th to 27th. There was one large drop in the hash rate, 51% between May 10th and July 1st 2021 as China cracked down on mining.

The insider's loss of income from the blocks they would otherwise have mined would have a 95% probability of being 4,590 BTC or less, or about $425M.
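That figure appears to follow from the current 3.125 BTC block subsidy over the 95th-percentile 34-day attack duration; a sketch (the ~$92.5K BTC price is my assumption to match the dollar figure, and transaction fees are ignored):

    # Forgone block rewards for an insider mining in secret rather than
    # publicly, assuming the post-April-2024 subsidy of 3.125 BTC and the
    # 95th-percentile attack duration of 34 days (transaction fees ignored).
    forgone_btc = 0.30 * 144 * 34 * 3.125    # share * blocks/day * days * subsidy
    btc_price = 92_500                       # assumed price to match ~$425M
    print(f"{forgone_btc:,.0f} BTC, ~${forgone_btc * btc_price / 1e6:.0f}M")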

Short Position

Both kinds of attackers need to ensure that, when the attack succeeds, they have a large enough short position in Bitcoin that would generate their expected return from the attack's decrease in the Bitcoin price. There are two possibilities:
  • When the attacker's chain is within one block of being the longest, they have ten minutes to purchase the shorts. There is unlikely to be enough liquidity in the market to accommodate this sudden demand, which in any case would greatly increase the price of the shorts. I will ignore this possibility in what follows.
  • At the start of the attack the attacker gradually accumulates sufficient shorts. Even assuming there were enough liquidity, and that the purchases didn't increase the price, the attacker has to bear both the cost of maintaining the shorts for the duration of the attack, and the risk of the market moving up enough to cause the position to be liquidated.
The success of a block-reverting attack on Bitcoin would have implications on other cryptocurrencies. It would likely increase the prices of Proof-of-Stake coins such as Ethereum, as being much more difficult to attack, and decrease the price of other Proof-of-Work coins. Derivatives on these coins might be included in the attacker's toolkit, but I will ignore this possibility as the open interest on these coins is smaller.

Farokhnia & Goharshady note that:
At the time of writing, the open interest of BTC options is a bit more than 20 billion USD. Thus, a malicious party performing the attack mentioned in this work would need to obtain a considerable amount of the available put contracts. This may lead to market disruptions whose analysis is beyond the scope of this work. This being said, if the derivatives market continues to grow and becomes much larger than it currently is, purchasing this amount of contracts might not even be detected.
There are two different kinds of market in which Bitcoin shorts are available:
  • Regulated exchanges such as the CME offering options on Bitcoin and stock exchanges with Bitcoin ETFs and Bitcoin treasury companies such as Strategy.
  • Unregulated exchanges such as Binance offering "perpetual futures" (perps) on Bitcoin.

Unregulated Exchanges

Patrick McKenzie's Perpetual futures, explained is a clear and comprehensive description of the derivative common on unregulated exchanges:
Instead of all of a particular futures vintage settling on the same day, perps settle multiple times a day for a particular market on a particular exchange. The mechanism for this is the funding rate. At a high level: winners get paid by losers every e.g. 4 hours and then the game continues, unless you’ve been blown out due to becoming overleveraged or for other reasons (discussed in a moment).

Consider a toy example: a retail user buys 0.1 Bitcoin via a perp. The price on their screen, which they understand to be for Bitcoin, might be $86,000 each, and so they might pay $8,600 cash. Should the price rise to $90,000 before the next settlement, they will get +/- $400 of winnings credited to their account, and their account will continue to reflect exposure to 0.1 units of Bitcoin via the perp. They might choose to sell their future at this point (or any other). They’ll have paid one commission (and a spread) to buy, one (of each) to sell, and perhaps they’ll leave the casino with their winnings, or perhaps they’ll play another game.

Where did the money come from? Someone else was symmetrically short exposure to Bitcoin via a perp. It is, with some very important caveats incoming, a closed system: since no good or service is being produced except the speculation, winning money means someone else lost.
So the exchange makes money from commissions, and from the spread against the actual spot price. The price of the perp is maintained close to the spot price by the "basis trade", traders providing liquidity by shorting the perp and buying the spot when the perp is above spot, and vice versa. Of course, the spot price itself may have been manipulated, for example by Pump-and-Dump Schemes.

How else does the exchange make money?
Perp funding rates also embed an interest rate component. This might get quoted as 3 bps a day, or 1 bps every eight hours, or similar. However, because of the impact of leverage, gamblers are paying more than you might expect: at 10X leverage that’s 30 bps a day.
A "basis point (bps)" is "one hundredth of 1 percentage point", so 30bps/day is 0.3%/day or around 120%/year. But the lure of leverage is the competitive advantage of unregulated exchanges:
In a standard U.S. brokerage account, Regulation T has, for almost 100 years now, set maximum leverage limits (by setting minimums for margins). These are 2X at position opening time and 4X “maintenance” (before one closes out the position). Your brokerage would be obligated to forcibly close your position if volatility causes you to exceed those limits.
Unregulated markets are different:
Binance allows up to 125x leverage on BTC.
Although these huge amounts of leverage greatly increase the reward from a small market movement in favor of the position, they greatly reduce the amount the market has to move against the position before something bad happens. The first bad thing is liquidation:
One reason perps are structurally better for exchanges and market makers is that they simplify the business of blowing out leveraged traders. The exact mechanics depend on the exchange, the amount, etc, but generally speaking you can either force the customer to enter a closing trade or you can assign their position to someone willing to bear the risk in return for a discount.

Blowing out losing traders is lucrative for exchanges except when it catastrophically isn’t. It is a priced service in many places. The price is quoted to be low (“a nominal fee of 0.5%” is one way Binance describes it) but, since it is calculated from the amount at risk, it can be a large portion of the money lost. If the account’s negative balance is less than the liquidation fee, wonderful, thanks for playing and the exchange / “the insurance fund” keeps the rest, as a tip.
The bigger and faster the market move, the more likely the loss exceeds your collateral:
In the case where the amount an account is negative by is more than the fee, that “insurance fund” can choose to pay the winners on behalf of the liquidated user, at management’s discretion. Management will usually decide to do this, because a casino with a reputation for not paying winners will not long remain a casino.

But tail risk is a real thing. The capital efficiency has a price: there physically does not exist enough money in the system to pay all winners given sufficiently dramatic price moves. Forced liquidations happen. Sophisticated participants withdraw liquidity (for reasons we’ll soon discuss) or the exchange becomes overwhelmed technically / operationally. The forced liquidations eat through the diminished / unreplenished liquidity in the book, and the magnitude of the move increases.
The second bad thing is automatic de-leveraging (ADL):
Risk in perps has to be symmetric: if (accounting for leverage) there are 100,000 units of Somecoin exposure long, then there are 100,000 units of Somecoin exposure short. This does not imply that the shorts or longs are sufficiently capitalized to actually pay for all the exposure in all instances.

In cases where management deems paying winners from the insurance fund would be too costly and/or impossible, they automatically deleverage some winners.
McKenzie illustrates ADL with an example:
So perhaps you understood, prior to a 20% move, that you were 4X leveraged. You just earned 80%, right? Ah, except you were only 2X leveraged, so you earned 40%. Why were you retroactively only 2X? That’s what automatic deleveraging means. Why couldn’t you get the other 40% you feel entitled to? Because the collective group of losers doesn’t have enough to pay you your winnings and the insurance fund was insufficient or deemed insufficient by management.
For our purposes, this is an important note:
In theory, this can happen to the upside or the downside. In practice in crypto, this seems to usually happen after sharp decreases in prices, not sharp increases. For example, October 2025 saw widespread ADLing as (more than) $19 billion of liquidations happened, across a variety of assets.
How does this affect the outsider attacker? Let's assume that the attack has a 95% probability of costing no more than $7.5B and would reduce the Bitcoin price from $100K to $80K in a single 4-hour period. With 10X leverage, each BTC posted as margin controls $1M of exposure, so this $20K move would generate $200K/BTC of margin in gains.

Source
The outsider wants to double the cost of the attack, so needs $15B in gains. If the margin is posted in BTC, each BTC of margin nets $180K (the $200K gain less the $20K fall in the margin's own value), so the attacker needs to post $15B/$180K, or 83,333 BTC, of margin at 10X leverage for the duration of the attack. Establishing the position costs $8.333B. Assuming the BTC price is fixed at $100K until the attack succeeds, the funding rate is zero. But we have to assume that the attacker borrowed the $8.333B for the duration, so would pay interest, plus two commissions and two spreads. I'll ignore these costs.
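A sketch of that arithmetic. Reading the $180K as the $200K gain per BTC of margin net of the $20K fall in the value of the BTC-denominated margin is my interpretation, but it reproduces the $8.333B figure here and the $83B notional below:

    # Sizing the outsider's short position, following the figures above.
    # Assumption: margin is posted in BTC, so each BTC of margin nets the
    # $200K gain on its $1M of exposure minus the $20K fall in its own value.
    price_before, price_after = 100_000, 80_000
    leverage = 10
    target_gain = 2 * 7.5e9                      # double the ~$7.5B attack cost

    drop = price_before - price_after            # $20K
    net_per_btc_margin = leverage * drop - drop  # $180K per BTC of margin

    margin_btc = target_gain / net_per_btc_margin     # ~83,333 BTC
    margin_usd = margin_btc * price_before            # ~$8.33B to establish
    notional_usd = margin_usd * leverage              # ~$83.3B short exposure
    print(f"{margin_btc:,.0f} BTC of margin (${margin_usd/1e9:.2f}B), "
          f"${notional_usd/1e9:.1f}B notional")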

Source
I will also ignore the fact that $83B is around 148% of the peak aggregated open interest in Bitcoin options over the past year of about $56B.

The way liquidation of a short works is that as the market moves up, the position's effective leverage rises above its initial level. Each exchange has a limit on the leverage it will allow so, allowing for the liquidation fee, if the leverage of the short position reaches this limit the exchange will liquidate it.

Move %    Leverage
   0        10
   1        11.1
   2        12.5
   3        14.3
   4        16.7
   5        20
   6        25
   7        33.3
   8        50
   9        100
The table shows the effect of increasing percentage moves against an initial 10X leveraged short. If we assume a short with an initial 10X leverage and an exchange limit of 50X was taken out on the first of each month of 2025, on 9 of the 12 months it would have been liquidated before the month was out. So Bitcoin is volatile enough that the attacker's short has a high probability of being liquidated before the attack succeeds. And note Binance's "nominal fee" of 0.5% for liquidating $83.33B, or $417M.
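The leverage column in the table follows from a simple approximation in which the adverse move eats into the margin while the growth of the notional is ignored; a sketch:

    # Effective leverage of an initially 10X short after the price moves
    # against it by m%, using the approximation behind the table above:
    # leverage = initial / (1 - initial * m / 100).
    initial = 10
    for m in range(10):
        print(f"{m}%  {initial / (1 - initial * m / 100):.1f}x")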

In the unlikely event that the attack succeeds early enough to avoid liquidation there would have been one of those "sharp decreases in prices" that cause ADL, so as a huge winner it would be essentially certain that the attacker would suffer ADL and most of the winnings needed to justify the attack would evaporate.

Regulated Exchanges

The peak open interest in Bitcoin futures on the Chicago Mercantile Exchange over the past year was less than $20B, so even if we add together both kinds of exchange, the peak open interest over the last year isn't enough for the attacker.

Conclusions

Neither an outsider nor an insider attack appears feasible.

Outsider Attack

An outsider attack seems infeasible because in practice:
  • They could not acquire 43% or more of the hash power.
  • Even if they could it would take so long as to make detection inevitable.
  • Even if they could and they were not detected, the high cost of the rigs makes the necessary shorts large relative to the open interest, and expensive to maintain.
  • These large shorts would need to be leveraged perpetual futures, bringing significant risks of loss of collateral through liquidation, and of the potential payoff being reduced through automatic de-leveraging.
  • The attacker would need more than the peak aggregate open interest in Bitcoin futures over the past year.

Insider Attack

The order-of-magnitude lower direct cost of an insider attack makes it appear less infeasible, but insiders have to consider the impact on their continuing mining business. If the assumed 20% drop in the Bitcoin price were sustained for a year, the cost to the miner controlling 30% of the hash rate would be about 15,750 BTC or nearly $1.5B making the total cost of the attack (excluding the cost of carrying the shorts) almost $2B.

Source
The drop in Bitcoin's price and the smaller, lagging drop in network difficulty over the last three months has decreased miners' revenue by about 25%. In the medium term a further drop is in prospect. Sometime in April 2028 the regular Bitcoin halvening will occur, halving the income of miners in aggregate Bitcoin terms.

Source
These drops turn the insiders' access to data center space and power into a double-edged sword because, as Vicky Ge Huang explains in Bitcoin Miners Thrive Off a New Side Hustle: Retooling Their Data Centers for AI:
mining-company stocks are still flying, even with cryptocurrency prices in retreat. That's because these firms have something in common with the hottest investment theme on the planet: the massive, electricity-hungry data centers expected to power the artificial-intelligence boom. Some companies are figuring out how to remake themselves as vital suppliers to Alphabet, Amazon, Meta, Microsoft and other "hyperscalers" bent on AI dominance.
...
Miners often have to build new, specialized facilities, because running AI requires more-advanced cooling and network systems, as well as replacing bitcoin-mining computers with AI-focused graphics processing units. But signing deals with miners allows AI giants to expand faster and cheaper than starting new facilities from scratch.
...
Shares of Core Scientific quadrupled in 2024 after the company signed its first AI contract that February. The stock has gained 10% this year. The company now expects to exit bitcoin mining entirely by 2028.
I wonder why the date is 2028! As profit-driven miners use their buoyant stock prices to fund a pivot to AI, the hash rate and the network difficulty will decrease, making an insider attack less infeasible. The drop in their customers' income will likely encourage Bitmain to similarly pivot to AI, devoting an increasing proportion of their wafers to AI chips, especially given the Chinese government's goal of localizing AI.

A 30% miner whose rigs were fully depreciated might consider an insider attack shortly before the halvening as a viable exit strategy, since their future earnings from mining would be greatly reduced. But they would still be detected.

Counter-measures

Even if we assume the feasibility of both the hash rate and the short position aspects of the attack, it is still the case that, for example, an attack with 30% of the hash power and a 95% probability of success will, on average, last 17 days. It seems very unlikely that the coincidence over an extended period of a large reduction in the expected hash rate and a huge increase in short interest would escape attention from Bitcoin HODL-ers, miners and exchanges, not to mention Bitmain. What counter-measures could they employ?

Source
The theoretically correct counter-measure would be to raise the 6-block finality criterion to the 24 blocks that corresponds to a pool with 30% of the hash power. But this would violate what people incorrectly believe is the revealed word of Satoshi. And Goharshady correctly pointed out in email that this is in any case impractical:
  • The 6-block rule is just a convention, there is no dial that can be turned.
  • Much of the access to the Bitcoin blockchain is via APIs that typically have the 6-block rule hard-coded in.
  • Many, typically low-value, transactions do not wait for even a single confirmation.
  • Even if it were possible, changing from a one-hour to a four-hour confirmation would have significant negative impacts on the Bitcoin ecosystem.
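For reference, the 24-block figure presumably comes from the attacker-success calculation in the Bitcoin whitepaper, which for an attacker with 30% of the hash power first drops below 0.1% at 24 confirmations. A Python port of that calculation:

    # Nakamoto's whitepaper calculation: the probability that an attacker
    # with fraction q of the hash power ever catches up from z blocks behind.
    from math import exp

    def attacker_success(q: float, z: int) -> float:
        p = 1.0 - q
        lam = z * q / p
        poisson = exp(-lam)
        total = 1.0
        for k in range(z + 1):
            if k > 0:
                poisson *= lam / k
            total -= poisson * (1.0 - (q / p) ** (z - k))
        return total

    # With q = 0.30, six confirmations still leave a ~13% chance of an
    # eventual catch-up; pushing the probability below 0.1% takes z = 24.
    for z in (6, 12, 24):
        print(z, f"{attacker_success(0.30, z):.4f}")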
In the case of an insider attack, the absence of a pool previously mining around one in three of all blocks would readily de-anonymize the attacker. Bitmain would necessarily be aware of the identity of an outsider attacker. Although unregulated exchanges are notoriously poor at KYC/AML, the sums involved in the shorts are so large that they would be highly motivated to use the blockchain information to de-anonymize the attacker. Given their terms of service, and the lack of effective recourse, they would be able to hamstring the attack.

Acknowledgements

This post benefited greatly from insightful comments on a draft from Jonathan Reiter, Amir Kafshdar Goharshady and Joel Wallenberg, but the errors are all mine.

Happy New Year / Ed Summers

Happy New Year

Blackeye peas and collard greens for us.

2025 down, 2026 to go / Coral Sheldon-Hess

I don’t think any of us went into 2025 thinking this would be an easy year, and boy wasn’t it. Rather than restate our collective challenges, I’m going to stick to my own little life and keep it short with bullet points and pictures. I’ll also list as many positives as I can, because even in the darkest of times, there are sources of joy and reasons to be grateful.

Real life, in no particular order

  • (OK, there’s some order, let’s do the hard one first) My mom passed away in late October – our relationship was complicated, and so are my feelings, but I’m working through it all with a therapist
  • (Bittersweet) I adopted Mom’s green cheek conure, Tutu (which we jokingly spell “Teauxtu,” thanks to my brother), who bonded to me quickly and thoroughly, but who is currently plucking all of his feathers anyway — we’re working on it with his vet, and I’m hopeful we’ll get him past this (though I’ll love him anyway, even if he’s a naked bird forever; I cannot overstate how sweet a little guy he is!)
  • My brother got married to a really nice lady, and I’m so happy for them!
  • In early October, we moved from Midcoast Maine to Western NY, where there is a larger and more active covid-conscious community – because the economy was so unstable, it took months to sell our house in Maine, but we did finally succeed, the day before Christmas 🎉
  • We joined the mask bloc here for a D&D night and had a lot of fun – looking forward to getting involved in mask distribution and attending more events in 2026
  • I finally read all of the Murderbot Diaries series by Martha Wells, spurred on by the release of the TV show – then I listened to the audiobooks, and it became my comfort listen for the whole latter part of the year
  • I ran an accessible birding event here in Rochester, and I’m hoping to run more next year
  • I took American Sign Language 101 and 102, which I’ve really enjoyed. I’ll be retaking 102 in early 2026 (it was a rough autumn/winter), and then I hope to move on to 103 and 104.
  • I learned to darn (as in, mending damaged knitted items), and I’m working on expanding my mending skills repertoire – this is well-timed, since, besides his own feathers and our hair, Tutu loves to bite on shirts until they have holes in them
  • We saw the most vivid aurora we’ve ever seen, including during our time in Alaska
  • My aunt sent me her scone recipe, and I made so. many. scones. this year

Bird Buddies! (Outdoor birds captured with our feeder cameras.)

The three baby bluebirds in the video below brought us so much joy this spring! Here, they appear to have a little conference, to decide whether it’s time to fly away or not, before a starling shows up.

Bird Buddies who live indoors

Tutu, a green cheek conure, snuggled into a human hand.

Miscellany!

Work, in no particular order

  • I ran the first part of a major, potentially multi-year LibGuides accessibility remediation project (“potentially” because there’s a task force considering how important LibGuides are to us and our patrons, and maybe we will shift focus away from them or have a single accessibility remediator or … something I haven’t thought of, who knows?)
  • I co-ran a task force focused on accessibility training for library staff
  • We survived a university-wide “Cybersecurity Alignment” (or the first part of one?)
  • I learned the developer side of Figma (kind of, I still find parts of it baffling)
  • I learned a ridiculous amount about authentication to electronic resources (which would be really satisfying if I didn’t keep running into things I still don’t know)
A fairly complicated diagram, showing the various paths users travel while authenticating to electronic resources at CSU.

Coming up in 2026, an incomplete list

  • I’ll take more ASL classes
  • I’ll mend some things
  • I’ll ride my bike more, because there are more safe places to do so where I live!
  • I’m hoping to work through more herbal studies and practice more of those skills – it may become vital in the near future
  • There will be an IT “alignment” at work mid-year – it could mean anything from “nothing” to “now I report to Central IT (who doesn’t care about my MLIS and might actually consider it a negative) instead of the Library (who doesn’t require an MLIS for my position, but knows it’s a strength, and also my team is so good, I don’t want to lose them, OK, yes, I am very stressed about this possibility)”
  • Our landlord will be selling our house, something I think (hope) he didn’t realize he’d be doing when we started renting here – we’ll get first right of refusal on buying it, and he’s willing to write us a lease for as long as we want one if we choose not to buy (NY law means the new owner will have to honor it), so while it’s a stressful situation, it’s not as bad as it could be
  • A valued colleague (everyone in my department is valued, truly, but his institutional knowledge is unmatched) will be retiring, and because of budget constraints, that will probably mean some shuffling of responsibilities semi-permanently
  • If we decide we aren’t moving (so, we decide to buy this house or to sign a longer lease), I’ll set up some garden beds and grow some things
  • I want this to be a thing, but financially, it may not: I really very much want to build or buy a small (like, Class B, or C at the highest) camper so that I can travel again

“Last…” will be first / John Mark Ockerbloom

Ninety-five (or 100) years is a very long time for copyrights to last. But Olaf Stapledon saw a much longer future for us in Last and First Men (reviewed here and here). His book tells a story of successive human species over the next 2 billion years.

Stapledon died in 1950, and his work is already public domain most places outside the US. Tomorrow a copy in Australia will be among the first books to be relisted in my new books listing, finally free for all.

2025-12-31: From Tables to Triumph: A PhD Journey in Uncertainty-Aware Scientific Data Extraction / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

 



In January 2021, I began a journey that would span nearly five years, three children, countless late nights, and a singular focus: teaching machines to extract data from complex scientific tables with confidence—and to know when they're uncertain. On October 29, 2025, I successfully defended my dissertation titled "SCITEUQ: Toward Uncertainty-Aware Complex Scientific Table Data Extraction and Understanding" at Old Dominion University. This milestone represents not just the culmination of intensive research but a testament to perseverance, family support, and the power of focused determination.

Finding My Path at LAMP-SYS

When I joined the Lab for Applied Machine Learning and Natural Language Processing Systems (LAMP-SYS), part of ODU's Web Science and Digital Libraries Research Group (WSDL) under the guidance of Dr. Jian Wu, I knew exactly what problem I wanted to solve: making scientific table data extraction both accurate and trustworthy through uncertainty quantification.

Scientific tables are ubiquitous in research papers, containing critical experimental data, statistical results, and research findings. Yet extracting this data automatically from PDF documents remains surprisingly difficult. Unlike the simple, well-structured tables you might see on Wikipedia, scientific tables are complex beasts—featuring multi-level headers, merged cells, irregular layouts, and domain-specific notations that confound even state-of-the-art machine learning models.

But here's the real problem: existing methods don't tell you when they're wrong. They extract data with the same confidence whether they're processing a simple table or struggling with a complex one. For scientific applications where accuracy is paramount, this means researchers must manually verify every single extracted cell—a task that doesn't scale when you're dealing with thousands of tables.

The Research Challenge

My research addressed a fundamental question: How can we build systems that not only extract data from complex scientific tables but also quantify their uncertainty, allowing us to focus human verification effort only where it's needed?

To tackle this challenge, I formulated four research questions:

RQ1: What is the status of reproducibility and replicability of existing TSR models?

Before building something new, I needed to understand what already existed. I conducted the first systematic reproducibility and replicability study of 16 state-of-the-art Table Structure Recognition (TSR) methods. The results were sobering: only 8 of 16 papers made their code and data publicly available, and merely 5 had executable code. When I tested these methods on my newly created GenTSR dataset (386 tables from six scientific domains), none of the methods replicated their original performance. This highlighted a critical gap in the field. This work was published at ICDAR 2023: "A Study on Reproducibility and Replicability of Table Structure Recognition Methods."

RQ2: How do we quantify the uncertainties of TSR results?

To address this, I developed TTA-m, a novel uncertainty quantification pipeline that adapts Test-Time Augmentation specifically for TSR. Unlike vanilla TTA, my approach fine-tunes pre-trained models on augmented table images and employs ensemble-based methods to generate cell-level confidence scores. On the GenTSR dataset, TTA-m achieved an F1-score of 0.798, with over 80% accuracy for high-confidence predictions—enabling reliable automatic detection of extraction errors. This work was published at IEEE IRI 2024: "Uncertainty Quantification in Table Structure Recognition."
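TTA-m itself fine-tunes the pre-trained model on augmented table images before ensembling, so the following is only a loose illustration of the underlying ensemble-agreement idea: run the recognizer once per augmented view of the same table and score each predicted cell by how consistently it reappears. The (row, col) cell identifiers and the toy views are hypothetical, not from the paper:

    from collections import Counter

    def cell_confidences(predictions):
        """Given one set of predicted cells per augmented view of the same
        table image, score each cell by the fraction of views in which it
        was predicted; cells found under every augmentation score 1.0."""
        runs = [set(p) for p in predictions]
        votes = Counter(cell for run in runs for cell in run)
        return {cell: count / len(runs) for cell, count in votes.items()}

    # Toy example: three views of one table, cells identified by (row, col).
    views = [
        {(0, 0), (0, 1), (1, 0), (1, 1)},          # original image
        {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)},  # one view adds a spurious cell
        {(0, 0), (0, 1), (1, 1)},                  # one view misses a cell
    ]
    for cell, conf in sorted(cell_confidences(views).items()):
        print(cell, round(conf, 2))

Low-agreement cells like the spurious (2, 0) end up with low confidence and can be routed to a human, while unanimous cells can be accepted automatically.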

RQ3: How can we integrate uncertainties from TSR and OCR for holistic table data extraction?

I designed and implemented the TSR-OCR-UQ framework, which integrates table structure recognition (using TATR), optical character recognition (using PaddleOCR), and conformal prediction-based uncertainty quantification into a unified pipeline. The results were compelling: the accuracy improved from 53-71% to 83-97% for different complexity levels, with the system achieving 69% precision in flagging incorrect extractions and reducing manual verification labor by 53%. This work was published at ICDAR 2025: "Uncertainty-Aware Complex Scientific Table Data Extraction."
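The conformal-prediction step is what turns raw per-cell scores into a calibrated "flag for human review" decision. As a rough sketch of generic split conformal prediction (the nonconformity scores below are made up, and this is not the TSR-OCR-UQ implementation):

    import numpy as np

    def conformal_threshold(calib_scores, alpha=0.1):
        """Split conformal prediction: pick a cutoff from nonconformity
        scores on a labelled calibration set so that, with probability at
        least 1 - alpha, a new score falls at or below it."""
        n = len(calib_scores)
        q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        return float(np.quantile(calib_scores, q, method="higher"))

    rng = np.random.default_rng(0)
    calib = rng.beta(2, 8, size=500)   # per-cell nonconformity scores with known labels
    fresh = rng.beta(2, 8, size=10)    # scores for newly extracted cells

    cutoff = conformal_threshold(calib, alpha=0.1)
    flagged = fresh > cutoff           # only these cells go to a human reviewer
    print(f"cutoff = {cutoff:.3f}, flagged {int(flagged.sum())} of {len(fresh)} cells")

Only the cells whose scores exceed the calibrated cutoff need manual verification, which is the mechanism behind the reported reduction in verification labor.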

RQ4: How well do LLMs answer questions about complex scientific tables?

To evaluate the QA capability of Large Language Models on scientific tables, I created SciTableQA, a benchmark dataset containing 8,700 question-answer pairs across 320 complex scientific tables from multiple domains. My evaluation revealed that while GPT-3.5 achieved 79% accuracy on cell selection TableQA tasks, performance dropped to 49% on arithmetic reasoning TableQA tasks—highlighting significant limitations of current LLMs when dealing with complex table structures and numerical reasoning. This work was published at TPDL 2025: "SciTableQA: A Question-Answering Benchmark for Complex Scientific Tables."

The SCITEUQ Framework

Putting it all together, SCITEUQ (Scientific Table Extraction with Uncertainty Quantification) represents a comprehensive solution to uncertainty-aware scientific table data extraction. The framework achieves state-of-the-art performance while providing essential uncertainty quantification capabilities that enable efficient human-in-the-loop verification.

Each component contributes to a more reliable approach:

  • GenTSR provides rigorous cross-domain evaluation
  • TTA-m quantifies uncertainties in structure recognition
  • TSR-OCR-UQ integrates structure and content extraction with uncertainty maps
  • SciTableQA enables systematic evaluation of reasoning capabilities

Publications and Research Impact

My research resulted in five first-author publications at top-tier conferences and journals:

Each of these papers faced initial rejection before ultimately being accepted. This taught me an invaluable lesson: rejection is not failure; it's an opportunity to refine and improve your work.

Industry Experience: From Azure to Alexa to Microsoft AI

While my research focused on scientific tables, my internships at Microsoft and Amazon broadened my perspective on applying machine learning at scale.

Microsoft (Summers 2022, 2023, 2025)

My first two summers at Microsoft were with the Azure team, where I worked on infrastructure optimization problems far from my research area. I developed an AI-human hybrid LLM-based multi-agent system for AKS Cluster configuration, reducing cluster type generation time from 2 weeks to 1 hour (link to blog post). I also designed ML anomaly detection systems on Azure Synapse that reduced hardware maintenance costs by over 20% and formulated new metrics for characterizing node interruption rates that decreased hardware downtime by 25% (link to blog post).

In Summer 2025, I joined the Microsoft AI team under the Bing organization, working on problems at the intersection of large-scale search and AI—which is what I'll be doing when I return to Microsoft full-time in January 2026.

Amazon (Summer 2024)

At Amazon, I worked with the Alexa Certification Technology team in California, where I drove 10% customer growth by designing LLM-based RAG systems with advanced prompt engineering techniques and increased the revenue by over 5% by developing LLM Agents on AWS to improve Alexa-enabled applications (link to blog post).

These internships, while not directly related to my dissertation research, taught me how to apply ML thinking to diverse industrial problems and to work effectively in large, complex organizations.

Balancing PhD Life with Family

Perhaps the most challenging aspect of my PhD journey had nothing to do with research—it was combining my studies with raising three young children. My youngest son, Daniel, was born just six months after I enrolled in the PhD program. Managing research deadlines, experimental runs, paper submissions, and the demands of parenting three boys (Paul, David, and Daniel) required discipline and sacrifice.

I developed a strict routine: work from 9 AM to 3 PM every day at my research lab, then pick up my kids from school and be fully present for them. This meant no late nights in the lab, no weekend marathons of coding—just consistent, focused work during designated hours. It wasn't always easy. Conference deadlines sometimes meant asking my wife, Olabisi, to take on even more, or my mother, Beatrice, to provide extra support. But this routine kept me grounded and taught me that quality of work matters more than quantity of hours.

The Defense

On October 27, 2025, I defended my dissertation before my committee:

Their thoughtful feedback, probing questions, and constructive critiques throughout my PhD journey were instrumental in refining my research and pushing me to think deeper about the implications and limitations of my work.

Lessons Learned

Looking back on nearly five years of doctoral work, several lessons stand out:

1. Embrace Rejection as Refinement

Four of my papers were initially rejected. Each rejection stung, but each one ultimately led to a stronger paper. The review process, while sometimes frustrating, forced me to clarify my arguments, strengthen my experiments, and address weaknesses I hadn't noticed. My TPDL 2025 paper on SciTableQA went through two rounds of revisions, but the final version is significantly better than the original submission.

2. Establish Non-Negotiable Boundaries

My 9 AM to 3 PM schedule wasn't just convenient—it was essential for maintaining my sanity and my family relationships. While some might argue that PhD students need to work 80-hour weeks, I proved that focused, disciplined work during reasonable hours can produce quality research. Those boundaries also made me more efficient: when you only have six hours a day, you learn to prioritize ruthlessly.

3. Build for Reproducibility from Day One

My systematic study on TSR reproducibility taught me the hard way how difficult it is to reproduce other people's work. This experience shaped how I approached my own research. Every framework I built—TTA-m, TSR-OCR-UQ, SciTableQA—comes with comprehensive documentation, publicly available code, and clear instructions for replication. Future researchers shouldn't struggle to build upon my work the way I struggled with others'.

4. Choose Problems That Matter to You

I entered my PhD knowing I wanted to work on table extraction with uncertainty quantification, and I never wavered from that focus. This singular vision helped me navigate the inevitable setbacks and distractions that come with doctoral research. When experiments failed or papers got rejected, I could always return to the core question: How do we make scientific data extraction both accurate and trustworthy?

5. Internships Broaden Your Perspective

While my Microsoft and Amazon internships didn't directly contribute to my dissertation, they fundamentally shaped how I think about research. Working on production systems with millions of users taught me to think about scalability, robustness, and real-world constraints in ways that academic research rarely emphasizes. These experiences make me a better researcher because I can now evaluate my work not just on benchmark performance, but on whether it could actually be deployed at scale.

Looking Forward

In January 2026, I'll be joining Microsoft as a Data Scientist 2 with the Microsoft AI team at the Redmond campus in Washington state. My family and I are excited about this new chapter—moving from Norfolk, Virginia, to the Pacific Northwest, and transitioning from academic research to industry applications.

While I'll be working on different problems at Microsoft, the skills and mindset I developed during my PhD—rigorous experimentation, systematic evaluation, uncertainty quantification, and reproducible research—will continue to guide my work. I'm particularly excited about the opportunity to apply research-driven thinking to real-world problems at a scale that can impact millions of users.

Acknowledgments

This journey would have been impossible without extraordinary support:

To Dr. Jian Wu, my advisor, mentor, and guide—thank you for believing in my research vision, for pushing me to think bigger, and for your patience during the inevitable frustrations of doctoral research. Your mentorship has not only shaped my research but also my approach to solving complex problems.

To Dr. Yi He, my co-advisor at William & Mary, your expertise and thoughtful feedback greatly enriched this research. Thank you for your guidance and support throughout this journey.

To my dissertation committee—Drs. Michael Nelson, Michele Weigle, and Sampath Jayarathna—your constructive critiques and expert insights were essential in refining my ideas and strengthening this work.

To my colleagues in WSDL and LAMP-SYS, the collaborative environment, intellectual exchanges, and camaraderie made this journey both enriching and memorable.

To my wife, Olabisi—you walked beside me every step of this journey with unwavering devotion and love. Your patience during the long hours, your understanding through the challenges, and your constant encouragement when the path seemed difficult made this achievement possible. This accomplishment is as much yours as it is mine.

To my sons—Paul, David, and Daniel—you are my greatest blessings and my constant source of joy and motivation. I hope this work serves as an example that with dedication and faith, you too can achieve your dreams.

To God Almighty, who is the source of all wisdom and strength, I give thanks and praise.


-Kehinde Ajayi (@KennyAJ)

I-80 / Ed Summers

Recorded in an Ohio service center somewhere along I-80.

The muted sound of cars and trucks passing at high speed about 200 feet away.

This is a snippet cut and pasted a few times in REAPER:

Soon graduating into the public domain / John Mark Ockerbloom

Yale’s 1905 commencement ceremonies included an honorary doctorate for British composer Edward Elgar, and a portion of his first “Pomp and Circumstance” march. It’s been a staple of graduation processions ever since. The full suite of five marches that Elgar finished takes about 30 minutes to play, but took nearly three decades to complete. The first march has long been in the US public domain; the last, published in 1930, joins it there in two days.

Two great blues musicians, and 2000 more records / John Mark Ockerbloom

David Seubert writes that more than 2500 records from 1925 digitized by the UC Santa Barbara Library will soon be freely downloadable there. A full listing of these recordings is online (though note that recordings made in 1925 but not released that year won’t be public domain yet).

A highlight of the collection is “St. Louis Blues”, sung by Bessie Smith with Louis Armstrong on cornet. One of the top selling records of 1925, it will be public domain in 3 days.

Marlene Dietrich comes to America / John Mark Ockerbloom

Marlene Dietrich enjoyed success on stage and screen in 1920s Berlin, but became an international star in 1930. That year she came to the United States to star in Morocco alongside Gary Cooper. Her performance was nominated for an Academy Award. So was the direction by Josef von Sternberg, who also directed her in The Blue Angel and several other films. Morocco was inducted into the National Film Registry in 1992, and will be inducted into the public domain in 4 days.

Mind The GAAP / David Rosenthal

Senator Everett Dirksen is famously alleged to have remarked "a billion here, a billion there, pretty soon you're talking real money".

Source
Oracle is talking real money; they're borrowing $1.64B each working day. Mr. Market is skeptical that the real money is going to be repaid, as Caleb Mutua reports in Morgan Stanley Warns Oracle Credit Protection Nearing Record High:
A gauge of risk on Oracle Corp.’s (ORCL) debt reached a three-year high in November, and things are only going to get worse in 2026 unless the database giant is able to assuage investor anxiety about a massive artificial intelligence spending spree, according to Morgan Stanley.

A funding gap, swelling balance sheet and obsolescence risk are just some of the hazards Oracle is facing, according to Lindsay Tyler and David Hamburger, credit analysts at the brokerage. The cost of insuring Oracle Corp.’s debt against default over the next five years rose to 1.25 percentage point a year on Tuesday, according to ICE Data Services.
Mutua reports that:
The company borrowed $18 billion in the US high-grade market in September. Then in early November, a group of about 20 banks arranged a roughly $18 billion project finance loan to construct a data center campus in New Mexico, which Oracle will take over as tenant.

Banks are also providing a separate $38 billion loan package to help finance the construction of data centers in Texas and Wisconsin developed by Vantage Data Centers,
Source
But notice that only $18B of this debt appears on Oracle's balance sheet. Despite that, their credit default swaps spiked and the stock dropped 29% in the last month.

Below the fold I look into why Oracle's and other hyperscalers' desperate efforts to keep the vast sums they're borrowing off their books aren't working.

Part of the reason the market is unhappy started in mid-September with The Economist's The $4trn accounting puzzle at the heart of the AI cloud. It raised the issue that I covered in Depreciation, that the hardware that represents about 60% of the cost of a new AI data center doesn't last long. It took a while for the financial press to focus on the issue, but now they have.

The most recent one I've seen was triggered by the outage at the CME (caused by overheating in Chicago in November!). In AI Can Cook the Entire Market Now Tracy Alloway posted part of the transcript of an Odd Lots podcast with Paul Kedrosky pointing out a reason, which I didn't cover, why the GPUs in AI data centers depreciate quickly:
When you run using the latest, say, an Nvidia chip for training a model, those things are being run flat out, 24 hours a day, seven days a week, which is why they're liquid-cooled, they're inside of these giant centers where one of your primary problems is keeping them all cool. It's like saying ‘I bought a used car and I don't care what it was used for.’ Well, if it turns out it was used by someone who was doing like Le Mans 24 hours of endurance with it, that's very different even if the mileage is the same as someone who only drove to church on Sundays.

These are very different consequences with respect to what's called the thermal degradation of the chip. The chip's been run hot and flat out, so probably its useful lifespan might be on the order of two years, maybe even 18 months. There's a huge difference in terms of how the chip was used, leaving aside whether or not there's a new generation of what's come along. So it takes us back to these depreciation schedules.
There was a similar problem after the Ethereum merge:
73% of Ethereum miners have just given up: “About 10.6 million RTX 3070 equivalents have stopped mining since the merge.”

We strongly recommend that you do not hit eBay for a cheap video card, despite the listings reassuring you that this card was only used by a little old lady to play Minecraft on Sundays and totally not for crypto mining, and that you should ignore the burnt odor and the charred RAM. Unless you’re poor, and the card’s so incredibly cheap that you’re willing to play NVidia Roulette.

How well do miners treat their precious babies? “GPU crypto miners in Vietnam appear to be jet washing their old mining kit before putting the components up for sale.” There are real cleaning methods that involve doing something like this with liquid fluorocarbons — but the crypto miners seem to be using just water.
But this depreciation problem is only one part of why the market is skeptical of the hyperscalers technique for financing their AI data centers. The technique is called Conduit Debt Financing, and Les Barclays' Unpacking the Mechanics of Conduit Debt Financing provides an accessible explanation of how it works:
Conduit debt financing is a structure where an intermediary entity (the “conduit”) issues debt securities to investors and passes the proceeds through to an end borrower. The key feature distinguishing conduit debt from regular corporate bonds is that the conduit issuer has no substantial operations or assets beyond the financing transaction itself. The conduit is purely a pass-through vehicle, the debt repayment relies entirely on revenues or assets from the ultimate borrower.

Think of it this way: Company A wants to borrow money but doesn’t want that debt appearing on its balance sheet or affecting its credit rating. So it works with a conduit entity, Company B, which issues bonds to investors. Company B takes that capital and uses it to build infrastructure or acquire assets that Company A needs. Company A then enters into long-term lease or service agreements with Company B, and those payments service the debt. On paper, Company A is just a customer making payments, not a debtor owing bondholders.

The structure creates separation. The conduit issuer’s creditworthiness depends on the revenue stream from the end user, not on the conduit’s own balance sheet (because there isn’t really one). This is why conduit debt is often referred to as “pass-through” financing, the economics flow through the conduit structure to reach the underlying obligor.
The article continues to examine Meta's deal in great detail, and notes some of the legal risks of this technique:
Legal risks when things break: Substantive consolidation (court merges conduit with sponsor), recharacterization (lease treated as secured financing), and fraudulent transfer challenges. The structures haven’t been stress-tested yet because hyperscalers are wildly profitable. But if AI monetization disappoints or custom silicon undercuts demand, we’ll discover whether bondholders have secured claims on essential infrastructure or are functionally unsecured creditors of overleveraged single-purpose entities.
The article asks the big question:
Why would Meta finance this via the project finance markets? And why does it cost $6.5 billion more?

That’s how much more Meta is paying to finance this new AI data center using the project finance market versus what they could have paid had they used traditional corporate debt. So why on earth is this being called a win? And even crazier, why are other AI giants like Oracle and xAI looking to copy it?
The $6.5B is the total of the 1% extra interest above Meta's corporate bond rate over the 20 years.

Meta data center
If Conduit Debt Financing is a standard tool of project finance, why is Mr. Market unhappy with the hyperscalers' use of it? Jonathan Weil's somewhat less detailed look at Meta's $27B deal in AI Meets Aggressive Accounting at Meta’s Gigantic New Data Center reveals how they are pushing the envelope of GAAP (Generally Accepted Accounting Principles):
Construction on the project was well under way when Meta announced a new financing deal last month. Meta moved the project, called Hyperion, off its books into a new joint venture with investment manager Blue Owl Capital. Meta owns 20%, and funds managed by Blue Owl own the other 80%. Last month, a holding company called Beignet Investor, which owns the Blue Owl portion, sold a then-record $27.3 billion of bonds to investors, mostly to Pimco.

Meta said it won’t be consolidating the joint venture, meaning the venture’s assets and liabilities will remain off Meta’s balance sheet. Instead Meta will rent the data center for as long as 20 years, beginning in 2029. But it will start with a four-year lease term, with options to renew every four years.

This lease structure minimizes the lease liabilities and related assets Meta will recognize, and enables Meta to use “operating lease,” rather than “finance lease,” treatment. If Meta used the latter, it would look more like Meta owns the asset and is financing it with debt.
Under GAAP, when would Meta be required to treat it as a finance lease?
The joint venture is what is known in accounting parlance as a variable interest entity, or VIE for short. That term means the ownership doesn’t necessarily reflect which company controls it or has the most economic exposure. If Meta is the venture’s “primary beneficiary”—which is another accounting term of art—Meta is required to consolidate it.

Under the accounting rules, Meta is the primary beneficiary if two things are true. First, it must have “the power to direct the activities that most significantly impact the VIE’s economic performance.” Second, it must have the obligation to absorb significant losses of the VIE, or the right to receive significant benefits from it.
Does Meta have “the power to direct the activities" at the data center it will operate?:
Blue Owl has control over the venture’s board. But voting rights and legal form aren’t determinative for these purposes. What counts under the accounting rules is Meta’s substantive power and economic influence. Meta in its disclosures said “we do not direct the activities that most significantly impact the venture’s economic performance.” But the test under the accounting rules is whether Meta has the power to do so.
Does Meta receive "significant benefits"? Is it required to "absorb losses"?:
The second test—whether Meta has skin in the game economically—has an even clearer answer. Meta has operational control over the data center and its construction. It bears the risks of cost overruns and construction delays. Meta also has provided what is called a residual-value guarantee to cover bondholders for the full amount owed if Meta doesn’t renew its lease or terminates early.
The lease is notionally for 20 years but Meta can get out every four years. Is Meta likely to terminate early? In other words, how likely in 2041 is Meta to need an enormous 16-year old data center? Assuming that the hardware has an economic life of 2 years, the kit representing about 60% of the initial cost would be 8 generations behind the state of the art. In fact 60% of the cost is likely to be obsolete by the first renewal deadline, even if we assume Nvidia won't actually be on the one-year cadence it has announced.

But what about the other 40%? It has a longer life, but not that long. The reason everyone builds new data centers is that the older ones can't deliver the power and cooling current Nvidia systems need. 80% of recent data centers in China are empty because they were built for old systems.

But the new ones will be obsolete soon:
Today, Nvidia's rack systems are hovering around 140kW in compute capacity. But we've yet to reach a limit. By 2027, Nvidia plans to launch 600kW racks which pack 576 GPU dies into the space once occupied by just 32.
Current data centers won't handle these systems - indeed how to build data centers that do is a research problem:
To get ahead of this trend toward denser AI deployments, Digital Realty announced a research center in collaboration with Nvidia in October.

The facility, located in Manassas, Virginia, aims to develop a new kind of datacenter, which Nvidia CEO Jensen Huang has taken to calling AI factories, that consume power and churn out tokens in return.
If the design of data centers for Nvidia's 2027 systems is only now being researched, how likely is it that Meta will renew the lease on a data center built for Nvidia's 2025 systems in 2041? So while the risk that Meta will terminate the lease in 2029 is low, termination before 2041 is certain. And thus so are residual-value guarantee payments.

How does the risk of non-renewal play out under GAAP?
Another judgment call: Under the accounting rules, Meta would have to include the residual-value guarantee in its lease liabilities if the payments owed are “probable.” That could be in tension with Meta’s assumption that the lease renewal isn’t “reasonably certain.”

If renewal is uncertain, the guarantee is more likely to be triggered. But if the guarantee is triggered, Meta would have to recognize the liability.
Weil sums it up concisely:
Ultimately, the fact pattern Meta relies on to meet its conflicting objectives strains credibility. To believe Meta’s books, one must accept that Meta lacks the power to call the shots that matter most, that there’s reasonable doubt it will stay beyond four years, and that it probably won’t have to honor its guarantee—all at the same time.
David Sacks Nov 6
These accounting shenanigans explain why Sam Altman said the quiet part out loud recently and then had to walk it back. Jose Antonio Lanz reports this in OpenAI Sought Government Loan Guarantees Days Before Sam Altman's Denial (my emphasis):
OpenAI explicitly requested federal loan guarantees for AI infrastructure in an October 27 letter to the White House—which kindly refused the offer, with AI czar David Sacks saying that at least 5 other companies could take OpenAI’s place—directly contradicting CEO Sam Altman's public statements claiming the company doesn't want government support.

The 11-page letter, submitted to the Office of Science and Technology Policy, called for expanding tax credits and deploying "grants, cost-sharing agreements, loans, or loan guarantees to expand industrial base capacity" for AI data centers and grid components. The letter detailed how "direct funding could also help shorten lead times for critical grid components—transformers, HVDC converters, switchgear, and cables—from years to months."
After this PR faux pas, some less obvious way for taxpayer dollars to keep the AI bubble inflating had to be found. Just over two weeks later Thomas Beaumont reported that Trump signs executive order for AI project called Genesis Mission to boost scientific discoveries:
Trump unveiled the “Genesis Mission” as part of an executive order he signed Monday that directs the Department of Energy and national labs to build a digital platform to concentrate the nation’s scientific data in one place.

It solicits private sector and university partners to use their AI capability to help the government solve engineering, energy and national security problems, including streamlining the nation’s electric grid, according to White House officials who spoke to reporters on condition of anonymity to describe the order before it was signed.
This appears to be a project of David Sacks, the White House AI advisor and a prominent member of the "PayPal Mafia". Sacks was the subject of a massive, 5-author New York Times profile entitled Silicon Valley’s Man in the White House Is Benefiting Himself and His Friends:
  • Mr. Sacks has offered astonishing White House access to his tech industry compatriots and pushed to eliminate government obstacles facing A.I. companies. That has set up giants like Nvidia to reap as much as an estimated $200 billion in new sales.
  • Mr. Sacks has recommended A.I. policies that have sometimes run counter to national security recommendations, alarming some of his White House colleagues and raising questions about his priorities.
  • Mr. Sacks has positioned himself to personally benefit. He has 708 tech investments, including at least 449 stakes in companies with ties to artificial intelligence that could be aided directly or indirectly by his policies, according to a New York Times analysis of his financial disclosures.
  • His public filings designate 438 of his tech investments as software or hardware companies, even though the firms promote themselves as A.I. enterprises, offer A.I. services or have A.I. in their names, The Times found.
  • Mr. Sacks has raised the profile of his weekly podcast, “All-In,” through his government role, and expanded its business.
The article quotes Steve Bannon:
Steve Bannon, a former adviser to Mr. Trump and a critic of Silicon Valley billionaires, said Mr. Sacks was a quintessential example of ethical conflicts in an administration where “the tech bros are out of control.”

“They are leading the White House down the road to perdition with this ascendant technocratic oligarchy,” he said.
David Sacks Nov 24
Gary Marcus asked Has the bailout of generative AI already begun?:
“The way this works”, said an investor friend to me this morning: “is that when Nvidia is about to miss their quarter, Jen Hsun calls David Sacks, who then gets this government initiative to place a giant order for chips that go into a warehouse.”

I obviously can’t confirm or deny that actually happened. My friend might or might not have been kidding. But either way the White House’s new Science and AI program, Genesis, announced by Executive Order on Monday, does seem to involve the government buying a lot of chips from a lot of AI companies, many of which are losing money.

And David Sacks’s turnaround from “read my lips, no AI bailout” (November 6) to his “we can’t afford to [let this all crash]” tweet (November 24) came just hours before the Genesis announcement.
I think the six companies Sacks was talking about are divided into two groups:
  • OpenAI, Anthropic and xAI, none of whom have a viable business model.
  • Meta, Google and Microsoft, all of whom are pouring the cash from their viable business models into this non-viable business.
This is the reason why the hyperscalers are taking desperate financial measures. They are driven by FOMO but they all see the probability that the debt won't be paid back. Where is the revenue to pay them back going to come from? It isn't going to come from consumers, because edge inference is good enough for almost all consumers (which is why 92% of OpenAI's customers pay $0). It isn't going to come from companies laying off hordes of low-paid workers, because they're low-paid.

So before they need to replace the 60% of the loan's value with the next generation of hardware in 2027, they need to find enterprise generative AI applications that are so wildly profitable for their customers that they will pay enough over the cost of running the applications to cover not just the payments on the loans but also another 30% of the loan value every year. For Meta alone this is around $30B a year!

And they need to be aware that the Chinese are going to kill their margins. Thanks to their massive investments in the "hoax" of renewable energy, power is so much cheaper in China that systems built with their less efficient chips are cost-competitive with Nvidia's in operation. Not to mention that the Chinese chip makers operate on much lower margins than Nvidia. Nvidia's chips will get better, and so will the Chinese chips. But power in the US will get more expensive, in part because of the AI buildout, and in China it will get cheaper.

This won't end well

Update: 28th December 2025

Source
Scott Galloway agrees with me in 2026 Predictions:
If I were advising Xi, I’d counsel him to go for the jugular by engaging in AI-dumping, a repeat of their aughts steel-dumping playbook. It’s already underway — and working. Eighty percent of a16z startups use open-source Chinese models. Same story at Airbnb. China is registering similar or better performance as the American LLM leaders, but with a fraction of the capex. Flooding the market with competitive, less-expensive AI models will put pressure on the margins and pricing power of the Mag 7, taking down a frighteningly concentrated S&P and likely sending the U.S., possibly the globe, into recession.
The average of the three US models' cost is $12.33. The average of the three Chinese models' cost is $1.36. The US models are 9 times more expensive, but they are nowhere near 9 times better.

2025-12-28: IEEE High-Performance Computing 2025 Trip Report / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

The primary objective of this trip was to present our work at the 32nd IEEE International Conference on High-Performance Computing and Data Analytics (HiPC 2025) held from December 17th to 20th in Hyderabad, India. The work presented was primarily developed during my summer internship at Fermi National Accelerator Laboratory (Fermilab) under the supervision of Dr. Marc Paterno.

This was only the second conference I have ever attended, with my first being ACM Hypertext in September, which focused on hypertext and web-based systems. Having that experience as a reference point made it easier to navigate HiPC, which also highlighted how different large international conferences can feel in terms of scale, structure, and research focus. 

This trip can be broken down into two main parts: a short sightseeing visit to Delhi and Agra, followed by the conference in Hyderabad. 

Delhi and Agra

The trip started with a couple of days in New Delhi. Since I had not been to this part of the world before, I wanted to take the opportunity to explore the city before the conference. Delhi is enormous, both geographically and culturally. Over the first two days of the trip, I ended up walking more than 50 kilometers. 

During this time, I visited several UNESCO World Heritage sites, including the Taj Mahal, Agra Fort, and landmarks within Delhi such as Qutub Minar, which is the world's tallest brick minaret. While in Delhi, I also met up with a few friends from Fermilab. Although we didn't do any sightseeing together, we did manage to go out for dinner one evening, which was a nice break from traveling and a fun way to catch up before the conference started. Starting the trip with sightseeing was a great contrast to the dense technical program that followed.

HiPC 2025

I arrived in Hyderabad on December 17, one day before the main technical program began. HiPC 2025 turned out to be the largest conference I have attended so far, both in terms of attendance and breadth of topics covered, spanning high-performance computing, AI systems, and quantum computing. 

Day 1: December 18, 2025

The main conference started on December 18. One notable statistic that stood out in the opening session was that only 29% of all submitted papers were accepted to the main proceedings. I was fortunate that our paper on Zeus (described below) was part of that small fraction. 

The Day 1 keynote was given by Dr. Pratyush Kumar from Sarvam AI. His talk focused on what it actually takes to train large language models from scratch. He walked through the challenges of setting up compute and data infrastructure and shared lessons learned while building LLMs in practice. 

One part I found especially interesting was his discussion of real-world applications, including examples where language models helped produce educational videos with real-time audio in multiple languages, while keeping the same voice as the original speaker. Overall, the keynote gave a very practical view of LLM development, beyond just model architectures. 

The rest of the day featured workshops and technical sessions covering HPC systems, AI, and education.

Day 2: December 19th, 2025

The second day was truly special, as it was the day I presented our work at the conference. The Day 2 keynote was delivered by Dr. Jasjeet Singh Bagla from IISER Mohali, who gave an overview of the history and evolution of high-performance scientific computing in India, starting from early academic efforts and moving toward current large-scale systems. He also discussed challenges faced by the scientific community, especially as we move toward exascale computing and increased use of AI/ML in scientific applications. A major point he emphasized was that effective use of HPC systems depends not just on hardware, but also on ease of use, documentation, maintenance, training, and user support. 

During the invited talks on the second day, I also listened to a talk by Aravind Ratnam from Q-CTRL titled “Q-CTRL’s critical infrastructure software unlocks useful performance in Quantum Computers.” His talk focused on how infrastructure software plays a crucial role in making today's quantum computers actually useful, especially by reducing errors and improving reliability. 

He explained how Q-CTRL's software helps unlock better performance across a wide range of applications, including logistics, scheduling, quantum machine learning, and automotive design. He also shared real-world deployments with industry partners such as Airbus, Mitsubishi, and Mazda. 

Another interesting point he made was about the future direction of quantum computing, including integrating quantum systems into data centers and using smaller, more affordable quantum processors that can still deliver useful results when paired with strong, pre-validated software. Overall, the talk gave a clear and practical perspective on where quantum computing is today and how software will play a major role in its progress. 

Later in the day, during Technical Session 4 (Algorithms), I presented our work titled “Zeus: An Efficient GPU Optimization Method Integrating PSO, BFGS, and Automatic Differentiation”. This presentation marked a major milestone for me. In this work, we introduce Zeus, a GPU-accelerated global optimization algorithm that combines:
  • PSO (particle swarm optimization) for global exploration of the search space,
  • BFGS, a quasi-Newton method, for fast local convergence,
  • forward-mode automatic differentiation for efficient gradient computation,
  • and massively parallel GPU execution, where hundreds or thousands of independent optimizations run concurrently.

The algorithm operates in two phases. First, a small number of PSO iterations are used to improve the quality of the starting points. In the second phase, each particle independently invokes a BFGS optimization on the GPU, using forward-mode AD to compute gradients efficiently. Once sufficient convergence is reached, the threads use atomic operations to signal early termination and stop. 
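To make the two-phase control flow concrete, here is a minimal CPU-only sketch using NumPy and SciPy. It is an illustration under my own assumptions (made-up PSO coefficients, SciPy's numerical-gradient BFGS, and a serial loop over particles), not the Zeus GPU implementation, which runs the second phase as concurrent GPU threads with forward-mode AD and atomic early stopping:

    import numpy as np
    from scipy.optimize import minimize

    def rastrigin(x):
        # Standard multimodal benchmark; global minimum 0 at the origin.
        return 10 * len(x) + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x))

    def pso_then_bfgs(f, dim=5, particles=64, pso_iters=10, seed=0):
        rng = np.random.default_rng(seed)
        pos = rng.uniform(-5.12, 5.12, size=(particles, dim))
        vel = np.zeros_like(pos)
        pbest = pos.copy()
        pbest_val = np.array([f(p) for p in pos])
        gbest = pbest[np.argmin(pbest_val)]
        # Phase 1: a few PSO iterations to improve the starting points.
        for _ in range(pso_iters):
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = pos + vel
            vals = np.array([f(p) for p in pos])
            better = vals < pbest_val
            pbest[better], pbest_val[better] = pos[better], vals[better]
            gbest = pbest[np.argmin(pbest_val)]
        # Phase 2: an independent local BFGS run from each particle
        # (serial here; Zeus launches these concurrently on the GPU).
        results = [minimize(f, p, method="BFGS") for p in pbest]
        best = min(results, key=lambda r: r.fun)
        return best.x, best.fun

    x_best, f_best = pso_then_bfgs(rastrigin)
    print(f"best objective value found: {f_best:.4f}")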

By running many independent optimizations in parallel on GPUs, Zeus achieves 10x--100x speedups over a perfectly parallel CPU implementation while also improving accuracy compared to existing GPU-based methods. One of the advantages of the parallel algorithm is that it is less sensitive to poor starting points, whereas for the sequential version, we must repeatedly restart until sufficient convergence is achieved. 

In the talk, I also discussed experimental results from both synthetic benchmark functions, such as the Rastrigin and Rosenbrock, and a real-world high-energy physics application. The example plot shows simulated data from proton-proton collisions at the Large Hadron Collider. When protons collide, their quarks and gluons produce sprays of particles called jets. When two jets are produced, their invariant mass can be reconstructed and fitted by minimizing a negative log likelihood. The pull distribution measures how far each data point is from the fit, in units of its expected uncertainty. A good fit should have pulls fluctuating around zero and mostly within ±2σ. This shows agreement between the simulated data and the model prediction. I also touched on current limitations, such as handling objectives with discontinuous derivatives, and outlined future work, including deeper levels of parallelism and improved stopping criteria. 
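For readers unfamiliar with the term, a pull is simply the residual divided by its expected uncertainty; the check described above amounts to something like this toy Gaussian example (illustrative only, not the actual analysis code):

    import numpy as np

    def pulls(observed, predicted, sigma):
        """Pull per data point: (observed - predicted) / expected uncertainty."""
        return (np.asarray(observed) - np.asarray(predicted)) / np.asarray(sigma)

    # For a good fit, pulls scatter around 0 and roughly 95% lie within 2 sigma.
    rng = np.random.default_rng(1)
    predicted = np.full(1000, 100.0)
    observed = rng.normal(predicted, 10.0)
    p = pulls(observed, predicted, 10.0)
    print(f"mean pull = {p.mean():.2f}, within 2 sigma: {(np.abs(p) < 2).mean():.1%}")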

Presenting this work felt especially meaningful because it tied together my internship experience at Fermilab and my growing interest in high-performance computing. It was rewarding to share our ideas with the community and see how the broader themes of the conference connected directly with our contribution.  


Day 3: December 20th, 2025

The third and final day focused heavily on AI/ML topics, along with a very interesting keynote speaker, and concluded with a quantum computing workshop.

The Day 3 keynote was given by Dr. Christos Kozyrakis from Stanford University and NVIDIA Research. His talk focused on how AI workloads are shaping modern datacenter design. He argued that current AI systems often follow a supercomputing-style approach, which may not be the best fit as models continue to scale. 

Instead, he made a case for scale-out AI systems, where efficiency and system-level design play a bigger role. One idea that stayed with me was his discussion of power and energy efficiency, especially the question of how much AI can realistically fit within a gigawatt of power. 

Later in the day, I attended the Quantum Computing Workshop, which was one of the highlights of the conference for me. This workshop was particularly exciting for me, as I will be taking the Quantum Computing course in Spring 2026, and I am interested in exploring how Zeus could be mapped into a hybrid classical-quantum optimization algorithm. 

To close the workshop, a speaker from Fujitsu presented the current state of their quantum research, including ambitious plans toward a 1000-qubit machine. After the workshop, I had several valuable discussions with experts in the field. In particular, Dr. Anirban Pathak provided initial guidance on how my current algorithm could be adapted toward a hybrid classical-quantum approach. 

Additionally, Aravind Ratnam pointed me to Q-CTRL's learning tutorials, which he recommended as an excellent hands-on resource for building a stronger foundation in quantum computing. 

To close the conference, I attended the banquet, which featured a cultural program and an Indian Dinner at the rooftop restaurant. 

Closing Thoughts

As only my second conference, HiPC 2025 was both intense and deeply rewarding. Compared to my first conference, I felt noticeably more confident presenting my work, asking questions, and engaging with researchers across different fields. At the same time, the experience reinforced a familiar lesson that conferences are just as much about people and conversations as they are about papers and talks. 

I am grateful for the opportunity to present this work, for the feedback I received, and for the many discussions that will shape my future research directions. HiPC 2025 was an unforgettable experience, and I hope to return again. 

~Dominik Soós (@DomSoos)

One Year of Learning 2025 / Peter Murray

Inspired by Tom Whitwell's 52 things I learned in 2022, I started my own list of things I learned in 2023 and repeated it last year. Reaching the end of another year, it is time for Things I Learned In 2025. Part way through the year I had the brilliant idea of putting a learning at the bottom of my weekly newsletter, and that worked well until the middle of the year when I stopped publishing newsletter issues. So here is a half year of learnings.

What did you learn this year? Let me know on Mastodon or Bluesky.

In Ethiopia, time follows the sun like nowhere else

Because Ethiopia is close to the Equator, daylight is pretty consistent throughout the year. So many Ethiopians use a 12-hour clock, with one cycle of 1 to 12 — from dawn to dusk — and the other cycle from dusk to dawn. Most countries start the day at midnight. So 7:00 a.m. in East Africa Time, Ethiopia's time zone, is 1:00 in daylight hours in local Ethiopian time. At 7:00 p.m., East Africa Time, Ethiopians start over again, so it's 1:00 on their 12-hour clock.
If you have a meeting in Ethiopia, you'd better double check the time, The World from PRX, 30-Jan-2015

This could have easily gone in the Thursday Threads on time standards. There are 12 hours of daylight, numbered 1 through 12. Then 12 hours of night, numbered 1 through 12. What could be easier?
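For the mechanically minded, the conversion is just an offset: the Ethiopian count starts at dawn rather than midnight. A minimal sketch, assuming dawn at 6:00 a.m. and ignoring the separate Ethiopian calendar:

    def ethiopian_clock(hour_24):
        """Convert a 24-hour East Africa Time hour to the Ethiopian
        12-hour clock, assuming the day cycle starts at 6:00 a.m. and
        the night cycle at 6:00 p.m."""
        period = "day" if 6 < hour_24 <= 18 else "night"
        hour = ((hour_24 - 7) % 12) + 1
        return f"{hour} o'clock in the {period}"

    for h in (7, 12, 19):   # 7 a.m., noon, and 7 p.m. East Africa Time
        print(f"{h:02d}:00 EAT -> {ethiopian_clock(h)}")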

From Thursday Threads issue 104 on Long Term Digital Storage.

A biographer embedded with the Manhattan Project influenced what we think about the atomic bomb

In early 1945, a fellow named Henry DeWolf Smyth was called into an office in Washington and asked if he would write this book that was about a new kind of weapon that the US was developing. The guy who had called him into his office, Vannevar Bush, knew that by the end of the year, the US was going to drop an atomic bomb that had the potential to end the war, but also that as soon as it was dropped, everybody was going to want to know what is this weapon, how was it made, and so forth. Smyth accepted the assignment. It was published by Princeton University Press about a week after the bomb was dropped. It explained how the US made the bomb, but it told a very specific kind of story, the Oppenheimer story that you see in the movies, where a group of shaggy-haired physicists figured out how to split the atom and fission, and all of this stuff. The thing is, the physics of building an atomic bomb is, in some respects, the least important part. More important, if you actually want to make the thing explode, is the chemistry, the metallurgy, the engineering that were left out of the story.
Wars Are Won By Stories, On the Media, 22-Jan-2025

The quote above comes from the transcript of this podcast episode. I've thought about this a lot in the past week as the Trump administration's flood-the-zone strategy overwhelms the senses. In a valiant effort to cover everything that is news, I can't help but wonder about the lost perspective of what isn't being covered. And I wonder where I can look to find that perspective.

From Thursday Threads issue 105 on Facial Recognition.

The origin of the computer term "mainframe" comes from "main frame" — the 1952 name of an IBM computer's central processing section

Based on my research, the earliest computer to use the term "main frame" was the IBM 701 computer (1952), which consisted of boxes called "frames." The 701 system consisted of two power frames, a power distribution frame, an electrostatic storage frame, a drum frame, tape frames, and most importantly a main frame.
The origin and unexpected evolution of the word 'mainframe', Ken Shirriff's blog, 1-Feb-2025

"Mainframe" is such a common word in my lexicon that it didn't occur to me that its origins was from "main frame" — as in the primary frame in which everything else connected. I've heard "frame" used to describe a rack of telecommunications equipment as well, but a quick Kagi search couldn't find the origins of the word "frame" from a telecom perspective.

From Thursday Threads issue 106 on How much do you know about the credit card industry?.

It takes nearly 3¢ to make a penny, but almost 14¢ to make a nickel

FY 2024 unit costs increased for all circulating denominations compared to last year. The penny’s unit cost increased 20.2 percent, the nickel’s unit cost increased by 19.4 percent, the dime’s unit cost increased by 8.7 percent, and the quarter-dollar’s unit cost increased by 26.2 percent. The unit cost for pennies (3.69 cents) and nickels (13.78 cents) remained above face value for the 19th consecutive fiscal year
2024 Annual Report, United States Mint

I knew pennies cost the U.S. mint more than one cent to make, but I didn't realize that the cost of nickels is so much more out of whack. I also learned a new word: seigniorage — the difference between the face value of money and the cost to produce it.

From Thursday Threads issue 107 on the humble battery.

It is much harder to get to the Sun than it is to Mars

The Sun contains 99.8 percent of the mass in our solar system. Its gravitational pull is what keeps everything here, from tiny Mercury to the gas giants to the Oort Cloud, 186 billion miles away. But even though the Sun has such a powerful pull, it’s surprisingly hard to actually go to the Sun: It takes 55 times more energy to go to the Sun than it does to go to Mars.
It’s Surprisingly Hard to Go to the Sun, NASA, 8-Aug-2018

I suppose that headline above needs some nuance. It is easy to point yourself at the Sun once you escape Earth's gravity; the hard part is shedding the enormous sideways orbital velocity that Earth carries around the Sun, so that you actually fall inward in a controlled way without burning up along the way.
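To put a rough number on it, here is a back-of-the-envelope sketch (my own illustration, not NASA's exact 55x accounting): treating Earth and Mars as circular, coplanar orbits and ignoring the planets' own gravity, it compares the heliocentric velocity change needed to stop and fall into the Sun with the burn that starts a minimum-energy Hohmann transfer toward Mars.

```typescript
// Rough heliocentric delta-v comparison, assuming circular coplanar orbits
// and ignoring the planets' own gravity wells; illustrative only.
const GM_SUN = 1.327e20;   // Sun's gravitational parameter, m^3/s^2
const R_EARTH = 1.496e11;  // Earth's orbital radius, m
const R_MARS = 2.279e11;   // Mars's orbital radius, m

// Vis-viva: speed on an orbit of semi-major axis a, at distance r from the Sun.
const visViva = (r: number, a: number) => Math.sqrt(GM_SUN * (2 / r - 1 / a));

const vEarth = visViva(R_EARTH, R_EARTH); // circular speed, ~29.8 km/s

// To drop straight into the Sun you must cancel (nearly) all of that speed.
const dvSunDrop = vEarth;

// Hohmann transfer to Mars: boost from Earth's circular speed up to the
// transfer ellipse's perihelion speed.
const aTransfer = (R_EARTH + R_MARS) / 2;
const dvMars = visViva(R_EARTH, aTransfer) - vEarth; // ~2.9 km/s

console.log(`cancel Earth's orbital speed: ~${(dvSunDrop / 1000).toFixed(1)} km/s`);
console.log(`Hohmann departure burn to Mars: ~${(dvMars / 1000).toFixed(1)} km/s`);
```

Falling into the Sun means shedding roughly ten times the velocity change of the Mars departure burn, which is why missions like the Parker Solar Probe rely on repeated Venus gravity assists rather than trying to burn off that speed directly.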

From Thursday Threads issue 108 on Educational Technology.

There are now 23 Dark Sky Sanctuaries in the World

Rum, a diamond-shaped island off the western coast of Scotland, is home to 40 people. Most of the island — 40 square miles of mountains, peatland and heath — is a national nature reserve, with residents mainly nestled around Kinloch Bay to the east. What the Isle of Rum lacks is artificial illumination. There are no streetlights, light-flooded sports fields, neon signs, industrial sites or anything else casting a glow against the night sky. On a cold January day, the sun sets early and rises late, yielding to a blackness that envelopes the island, a blackness so deep that the light of stars manifests suddenly at dusk and the glow of the moon is bright enough to navigate by.
Take a Look: A Dark Scottish Isle Where Starlight Reigns Supreme, New York Times, 24-Feb-2025

The pictures that accompany this article from the New York Times are stunning (gift link). And to think that there are only 23 places in the world that have reached this level of commitment to the environment.

From Thursday Threads issue 109 on Generative AI in Libraries.

Mexico has only one gun store for the entire country

Mexico notes that it is a country where guns are supposed to be difficult to get. There is just one store in the whole country where guns can be bought legally, yet the nation is awash in illegal guns sold most often to the cartels.
Mexico faces off with U.S. gunmakers at the Supreme Court, NPR, 4-Mar-2025

And not only is there just one gun store: that single store is located on an army base and run by soldiers, according to a 2016 article in the Associated Press.

From Thursday Threads issue 110 on Research into Generative AI.

Plants reproduce by spreading little plant-like things

This is where pollen comes in. Like sperm, pollen contains one DNA set from its parent, but unlike sperm, pollen itself is actually its own separate living plant made of multiple cells that under the right conditions can live for months depending on the species... So this tiny male offspring plant is ejected out into the world, biding its time until it meets up with its counterpart. The female offspring of the plant, called an embryosac, which you're probably less familiar with since they basically never leave home. They just stay inside flowers. Like again, they're not part of the flower. They are a separate plant living inside the flower. Once the pollen meets an embryosac, the pollen builds a tube to bridge the gap between them. Now it's time for the sperm. At this point, the pollen produces exactly two sperm cells, which it pipes over to the embryosac, which in the meantime has produced an egg that the sperm can meet up with. Once fertilized, that egg develops into an embryo within the embryosac, hence the name, then a seed and then with luck a new plant. This one with two sets of DNA.
Pollen Is Not Plant Sperm (It’s MUCH Weirder), MinuteEarth, 7-Mar-2025

Pollen is not sperm...it is a separate living thing! And it meets up with another separate living thing to make a seed! Weird! The video is only three and a half minutes long, and it is well worth checking out at some point today.

From Thursday Threads issue 111 on End-to-end Encryption.

Most plastic in the ocean isn't from littering, and recycling will not save us

Littering is responsible for a very small percentage of the overall plastic in the environment. Based on this graph from the OECD, you can see littering is this teeny-tiny blue bar here, and mismanaged waste, not including littering, is this massive one at the bottom. Mismanaged waste includes all the things that end up either in illegal dump sites or burned in the open or in the rivers or oceans or wherever. The focus on littering specifically, it's an easy answer because obviously there's nothing wrong with discouraging people from littering, but it focuses on individual people's bad choices rather than systemic forces that are basically flushing plastic into the ocean every minute. Mismanaged waste includes everything that escapes formal waste systems. So they might end up dumped, they might end up burned, they might end up in the environment.
You're Being Lied To About Ocean Plastic, Business Insider via YouTube, 26-Sep-2024

Contrary to popular belief, most plastic in the Great Pacific Garbage Patch stems from the fishing industry, with only a small fraction linked to consumer waste. The video highlights that mismanaged waste, rather than individual littering, is the primary contributor to plastic pollution, with 82% of macroplastic leakage resulting from this issue. It emphasizes the ineffectiveness of recycling as a solution, noting that less than 10% of plastics are currently recycled, and the industry has perpetuated the myth that recycling can resolve the plastic crisis. Microplastics, which are increasingly recognized as a major problem, originate from various sources, including tires and paint, with new data suggesting that paint is a significant contributor.

From Thursday Threads issue 112 on Social Media Research.

"But where is everybody?!?" — the origins of Fermi's Paradox

The eminent physicist Enrico Fermi was visiting his colleagues at Los Alamos National Laboratory in New Mexico that summer, and the mealtime conversation turned to the subject of UFOs. Very quickly, the assembled physicists realized that if UFOs were alien machines, that meant it was possible to travel faster than the speed of light. Otherwise, those alien craft would have never made it here. At first, Fermi boisterously participated in the conversation, offering his usual keen insights. But soon, he fell silent, withdrawing into his own ruminations. The conversation drifted to other subjects, but Fermi stayed quiet. Sometime later, long after the group had largely forgotten about the issue of UFOs, Fermi sat up and blurted out: “But where is everybody!?”
All by ourselves? The Great Filter and our attempts to find life, Ars Technica, 26-Mar-2025

This retelling of the Fermi Paradox comes from this story about why, despite the vastness of the universe, we have yet to encounter evidence of extraterrestrial civilizations. Enrico Fermi famously posed the question, "Where is everybody?", suggesting a disconnect between the expectation of abundant intelligent life and the lack of observable evidence. With this comes the Great Filter notion...proposing that there may be significant barriers preventing intelligent life from becoming spacefaring. The article goes on to speculate where we are relative to the "Great Filter" — are we past it, or is it yet in front of us? In other words, have we survived the filter, or is our biggest challenge ahead of us?

From Thursday Threads issue 113 on Copyright and Foundational AI Models.

The pronoun "I" was capitalized to distinguish it from similarly typset letters

In fact, the habit of capitalizing “I” was also a practical adaptation to avoid confusion, back in the days when m was written “ııı” and n was written “ıı.” A stray “i” floating around before or after one of those could make the whole thing hard to read, so uppercase it went. And now it seems perfectly logical.
I Have a Capital Suggestion for a New Pronoun, New York Times, 27-Mar-2025

I'm not buying the opinion author's underlying premise (capitalizing “they” in writing when it refers to a nonbinary person), but the origins of why we capitalize "I" and not other pronouns are fascinating.

From Thursday Threads issue 114 on Digital Privacy.

The word "scapegoat" originated in a 1530 bible translation

Early English Christian Bible versions follow the translation of the Septuagint and Latin Vulgate, which interpret azazel as "the goat that departs" (Greek tragos apopompaios, "goat sent out", Latin caper emissarius, "emissary goat"). William Tyndale rendered the Latin as "(e)scape goat" in his 1530 Bible. This translation was followed by subsequent versions up through the King James Version of the Bible in 1611: "And Aaron shall cast lots upon the two goats; one lot for the Lord, and the other lot for the scapegoat."
Scapegoat, Wikipedia

Have you stared at a word and suddenly wondered about its origins? This entry from the New York Times Flashback Quiz had me wondering about "scapegoat". "scape" — "goat". Why do we say that? It comes from a phrase in the bible where a goat is sent into the wilderness on the Day of Atonement as a symbolic bearer of the sins of the people — Leviticus 16:22, to be exact. The translator coined the term from the interpretation of "the goat that departs" and "emissary goat" in that verse.

From Thursday Threads issue 115 on Public and Private Camera Networks.

"Leeroy Jenkins!!!!" was staged

It was one of the first memes ever, a viral sensation that went mainstream back when people still used dial-up internet. Yet the cameraman behind “Leeroy Jenkins” still seems stupefied that anyone fell for it.
The Makers Of 'Leeroy Jenkins' Didn't Think Anyone Would Believe It Was Real, Kotaku, 25-Dec-2017

First posted on May 10, 2005, this year marks the 20th anniversary of this bit of internet folklore. I remember when this first came out, and I totally believed it was real until earlier this year.

From Thursday Threads issue 116 on Government Surveillance.

Ammonium chloride may be the 6th basic taste

Ammonium chloride is a slightly toxic chemical most notably found in “salmiak,” a salt licorice candy, which is popular in northern Europe. In a new study, researchers found that the compound triggers a specific proton channel called OTOP1 in sour taste receptor cells, which fulfills one of the key requirements to be considered a primary taste like sweet, salty, sour, bitter, and umami. Ammonium is commonly found in waste products and decaying organic matter and is slightly toxic, so it makes sense that vertebrates evolved a specific taste sensor to recognize it.
Ammonium chloride tastes like nothing else. It may be the sixth basic taste, Big Think, 11-Oct-2023

From Thursday Threads issue 117 on Local Government Surveillance.

Banned in Texas / John Mark Ockerbloom

Struggle over academic freedom in Texas state universities has a long history. Today it’s often over race and gender; in the 1940s, it was over things like John Dos Passos’s USA trilogy. When the University of Texas Board of Regents banned it from classrooms, university president Homer Price Rainey objected to their interference. After they fired him, thousands protested on campus. The first part of the USA trilogy, The 42nd Parallel, joins the public domain in 5 days.

In the public domain soon, in libraries now / John Mark Ockerbloom

The Penn Libraries, where I work, has first editions of many of the works featured in my countdown. From today through Public Domain Day, the Libraries' social media will feature photos of some distinctive books from 1930.

One of the featured photos, also shown here, is of the 1930 edition of W. H. Auden's Poems, which made his work known to readers worldwide. Glynn Young writes about the "brilliant collection" of 30 poems, and one verse drama, joining the public domain in 6 days.

O come, let all 4,850 of us adore him / John Mark Ockerbloom

In 1925 the Associated Glee Clubs of America put on a concert like no other. 15 choral groups, with over 850 singers in all, came together in New York’s Metropolitan Opera House to sing a program broadcast on radio across America. Portions were electrically recorded, including “Adeste Fideles”, where the audience of 4000 joined in the carol. Lloyd Winstead writes about the record, which is now in the National Recording Registry, and joins the public domain in 7 days.

The debut of a dramatic duo / John Mark Ockerbloom

Moss Hart wrote the first draft of Once in a Lifetime, a comedy about Hollywood's transition to "talkies", as a 25-year-old unknown. Established playwright George S. Kaufman helped revise it into a Broadway hit. Steve Vineberg calls it "the finest comedy ever written by Americans", and discusses how it began a long successful collaboration between the two writers. Still making theatergoers laugh in the 21st century, the 1930 play joins the public domain in 8 days.

2025-12-24: Beyond Alt-Text: Surprising Truths About How Blind Users Experience Online Discussions / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University


Figure 1: (a) Difficulties in understanding posts due to missing or assumed context, (b) need for standardization of posts (Figure 1 in M.J. Ferdous et al.).

Introduction

Social media sites such as Reddit, Facebook, and YouTube play a central role in how people exchange ideas, debate current issues, and build communities, with threaded discussions serving as a key mechanism for interaction across these platforms. These platforms are designed around visual structure: nested replies, indentation, spacing, and visual grouping allow sighted users to quickly scan conversations, identify relationships between posts, and decide where to engage.

For blind users who rely on screen readers, accessing these conversations involves a fundamentally different interaction model. As shown in the accompanying video, screen readers present content linearly, reading one element at a time while the user moves through posts and interface elements with keyboard commands such as Tab and Shift+Tab. This element-by-element experience provides access to the text, but it does not preserve the visual organization that conveys conversational structure, creating a mismatch between how a conversation is laid out on screen and how it is revealed through speech.

Most accessibility efforts on the web focus on technical compliance, which does not close this gap. In Figure 1(a), for example, a reply references prior context that appears earlier in the thread, but that relationship is not explicitly conveyed; a blind reader must navigate sequentially through multiple preceding posts to reconstruct the missing context, making it difficult to understand the intent of the reply without significant effort. In Figure 1(b), posts written in an informal, non-standardized style are difficult to interpret aurally, increasing the need for clearer, more standardized wording to support comprehension. The result is a gap between technical accessibility and conversational usability.

In this blog post, I summarize the findings from our recently published IJHCI paper "Understanding Online Discussion Experiences of Blind Screen Reader Users", in which we examined how blind screen reader users experience and navigate online discussions, with a focus on challenges that extend beyond traditional notions of web accessibility.

Study Overview

Our findings are based on a qualitative interview study with 20 blind individuals who regularly participate in online discussions. Participants ranged in age from 30 to 60 and included both expert (8) and non-expert (12) screen reader users. All participants reported frequent use of platforms such as Reddit, Facebook, and YouTube for reading and contributing to discussions. We relied on in-depth, semi-structured interviews rather than log analysis or automated metrics. This approach allowed participants to describe their experiences, challenges, and strategies in their own words (Figure 2). The goal was to capture not only what difficulties occur, but also why they occur and how users adapt to them in practice.


Figure 2: Illustration of the interview study process (Figure 2 in M.J. Ferdous et al.).


Key Findings 

1. Preference for Longer, Context-Rich Posts
A majority of participants (14 out of 20) reported a preference for longer discussion posts rather than short or minimal replies. This preference contrasts with common design assumptions that shorter content is always easier to consume. Participants explained that longer posts often restate context, clarify intent, and make explicit references to earlier points in the discussion. This reduces the need to navigate backward through a thread to recover missing information. Longer posts also help mitigate issues related to pronunciation errors, slang, or non-standard language, which can be difficult for screen readers to handle accurately. For these users, additional detail supports comprehension and reduces cognitive effort.


2. Difficulty Joining Ongoing Conversations
Nearly all participants (18 out of 20) described joining an already active discussion as particularly challenging. When a discussion has many replies, blind users often need to listen to a substantial portion of the thread before understanding the current state of the conversation. This process can take considerable time, especially when new posts continue to appear while the user is still catching up. Participants frequently described feeling out of sync with the conversation. By the time they felt confident enough to contribute, the discussion had often moved on, resulting in fewer responses to their comments. Two expert users reported fewer difficulties, attributing their success to extremely high speech rates and years of experience developing audio-based skimming strategies. However, these strategies require significant effort to acquire and are not representative of typical screen reader use.

3. Impact of Missing Context

All participants reported encountering posts that were difficult to interpret because they lacked sufficient context. Common examples included replies that referenced earlier comments without restating them, posts that relied on images or videos without description, and comments that implicitly addressed specific parts of an article without indicating which section was being discussed. When context is missing, blind users often attempt to navigate backward through the thread to locate the original reference. This process is time-consuming and cognitively demanding, and it does not always succeed. Participants noted that while searching backward, they sometimes forgot the original question or comment that prompted the search, further increasing confusion and frustration.

4. Limitations of a Single Screen Reader Voice

Some participants, particularly those with extensive screen reader experience, highlighted limitations related to how conversations sound. Listening to multi-person discussions through a single, uniform voice made it difficult to distinguish between speakers and reduced engagement. Participants noted that while basic access was available, the experience lacked the social cues present in face-to-face conversations or even audio chats. These users suggested that auditory differentiation, such as varying voice characteristics across speakers, could improve both comprehension and engagement. This finding suggests that once basic access barriers are addressed, experiential factors become increasingly important for long-term participation.

Design Implications

The findings point to several opportunities for improving conversational usability for blind screen reader users. These opportunities focus on reducing cognitive load and improving contextual awareness rather than solely improving access to individual elements.
  • Thread summarization could help users quickly understand the main points of a discussion without listening to every post.
  • Context-aware navigation could allow users to follow specific reply chains or sub-conversations more easily.
  • Text normalization could convert slang, abbreviations, and informal language into more screen-reader-friendly forms while preserving original content.
  • Auditory differentiation could improve speaker identification and conversational flow in multi-participant discussions (a rough sketch of this idea follows after this list).
These directions build directly on participant feedback and reflect practical extensions of existing assistive technologies.
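
Purely as an illustration of that last bullet (not something from the paper), a browser-based discussion reader could assign each participant a distinct synthesized voice through the standard Web Speech API. The speakThread function, the Post shape, and the pitch heuristic below are all hypothetical.

```typescript
// Hypothetical sketch: read a threaded discussion aloud with a different
// synthesized voice (and slightly different pitch) per author, using the
// Web Speech API available in modern browsers.
interface Post {
  author: string;
  text: string;
}

function speakThread(posts: Post[]): void {
  // Note: getVoices() can return an empty list until the browser fires its
  // 'voiceschanged' event; a real implementation would wait for that.
  const voices = window.speechSynthesis.getVoices();
  const authorIndex = new Map<string, number>();

  for (const post of posts) {
    if (!authorIndex.has(post.author)) {
      authorIndex.set(post.author, authorIndex.size);
    }
    const idx = authorIndex.get(post.author)!;

    const utterance = new SpeechSynthesisUtterance(`${post.author} says: ${post.text}`);
    if (voices.length > 0) {
      // Give each author a distinct installed voice, wrapping around if needed.
      utterance.voice = voices[idx % voices.length];
    }
    // Also vary pitch slightly per author as a cheap extra auditory cue (range 0-2).
    utterance.pitch = 0.8 + (idx % 5) * 0.1;
    window.speechSynthesis.speak(utterance);
  }
}

// Example: two participants get two different voices.
speakThread([
  { author: "alice", text: "Has anyone tried the new release?" },
  { author: "bob", text: "Replying to the question above: yes, it works for me." },
]);
```

Even this crude per-author variation speaks to the complaint that a single uniform voice flattens multi-person conversations; it is the kind of auditory differentiation the study's expert participants asked for.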

Conclusion

This study highlights that accessibility in online discussions involves more than making text readable by assistive technologies. For blind screen reader users, the primary challenge lies in understanding and participating in conversations that were designed around visual structure and rapid interaction. Addressing these challenges requires attention to conversational context, navigation, and cognitive effort. By examining the lived experiences of blind users, this work emphasizes the importance of designing discussion platforms that support not only access but also meaningful participation. As online discussions continue to shape public discourse, improving conversational usability is an essential step toward more inclusive digital spaces.

-- Md Javedul Ferdous (@jaf_ferdous)


Reference


Md Javedul Ferdous, Akshay Kolgar Nayak, Yash Prakash, Nithiya Venkatraman, Sampath Jayarathna, Hae-Na Lee, and Vikas Ashok. "Understanding Online Discussion Experiences of Blind Screen Reader Users." International Journal of Human–Computer Interaction (2025): 1-31.

See Dick and Jane free / John Mark Ockerbloom

Given how much “Dick and Jane” have been used sardonically, one might think Zerna Sharp’s schoolbook characters were already public domain. But you can’t copyright names, expressive style, or stock situations, and fair use allows limited copying for purposes like criticism, analysis, and parody.

In 9 days, the 1930 Elson Basic Readers introducing Dick and Jane join the public domain, and their stories’ full texts and original artwork can be reused without limit.

An impressive body of work / John Mark Ockerbloom

Four writers get credit on the 1930 copyright registration for “Body and Soul”: composer Johnny Green, and lyricists Robert Sour, Edward Heyman, and Frank Eyton. But many more artists shaped the perennial jazz standard we know today. Among its more than 1700 recorded interpretations and variations are covers by Coleman Hawkins, Frank Sinatra, Billie Holiday, Ella Fitzgerald, Tony Bennett and Amy Winehouse. The song surrenders itself to the public domain in 10 days.