<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

    <title>Wim Vanderbauwhede</title>
    <generator uri="https://github.com/jekyll/jekyll">Jekyll v4.3.3</generator>
		<icon>https://limited.systems/apple-touch-icon-precomposed.avif</icon>
    <subtitle>Wim Vanderbauwhede</subtitle>
    <link href="https://limited.systems/atom.xml" rel="self"/>
    <link href="https://limited.systems/" rel="alternate" type="text/html"/>
    <updated>2026-01-24T09:07:03+00:00</updated>
    <id>https://limited.systems/</id>
    <author>
			<name>Wim Vanderbauwhede</name>
			<uri>https://limited.systems/</uri>
			
		</author>

    
    <entry>
        <title>Resisting software driven hardware obsolescence</title>
        <link href="https://limited.systems/articles/resisting-hardware-obsolescence/"/>
        <updated>2025-12-02T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/resisting-hardware-obsolescence</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/resisting-hardware-obsolescence_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;Considering the sheer volume of mobile phones, and the high greenhouse gas emissions resulting from their manufacture, it is imperative to extend their useful life by several decades. What would it take?&lt;/p&gt;

&lt;h2 id=&quot;hardware-obsolescence&quot;&gt;Hardware obsolescence&lt;/h2&gt;

&lt;p&gt;Computer hardware, which includes phones and IoT devices, exists to run software. When hardware becomes obsolete, it is therefore often because the software no longer supports it. There are of course other reasons for hardware obsolescence, such as changes in standards (e.g. very soon 3G phones will be obsolete because there will be no 3G networks left) or fashion (it has the wrong colour, is considered too big or too small, the camera is not good enough, …). Addressing these requires a change in consumer attitudes. And of course the hardware can fail or break and may not currently be repairable. The &lt;a href=&quot;https://commission.europa.eu/law/law-topic/consumer-protection-law/directive-repair-goods_en&quot;&gt;EU directive on repair of goods&lt;/a&gt; will come into force next year and will go some way towards addressing physical hardware repairs. Amongst other things, it requires &lt;a href=&quot;https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=OJ:L:2023:214:FULL&quot;&gt;at least 7 years’ availability of spare parts for most common repairs&lt;/a&gt;. What is also very interesting is that it comes with a mandatory Repairability Index and Energy Efficiency Index.&lt;/p&gt;

&lt;p&gt;I will focus on software driven hardware obsolescence of mobile phones, but the problem exists for other hardware (e.g. laptops, TVs,…) as well. &lt;!-- should I discuss the recent Win11 situation as well? --&gt;&lt;/p&gt;

&lt;!-- 
* I got a smartphone in 2011 and it lasted until about 2022, when the USB connector finally failed completely. I moved on to an old iPhone which I got from a colleague, until the physical buttons on that one started to fail. So in both cases, for my needs, it was not the software that made these devices obsolete. 
--&gt;

&lt;h2 id=&quot;extending-phone-lifetimes-to-reduce-emissions&quot;&gt;Extending phone lifetimes to reduce emissions&lt;/h2&gt;

&lt;p&gt;Currently, most users replace their phones after two to three years. However, most of the greenhouse gas emissions related to phones are incurred during their manufacturing. To minimise the overall emissions, phones should be used for much longer. According to the &lt;a href=&quot;https://eeb.org/en/library/coolproducts-report/&quot;&gt;European Environmental Bureau report “Coolproducts don’t cost the Earth”&lt;/a&gt; (from 2019):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;from a global warming perspective our phones should last at least 20 years longer than they currently do&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For a phone to last 20-25 years, there would be a need for hardware repairs, and the device would have to be designed for repairability. At the very least, the battery should be replaceable. But that is not enough. Current devices typically get 4 years of software updates. The &lt;a href=&quot;https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=OJ:L:2023:214:FULL&quot;&gt;EU directive on repair of goods&lt;/a&gt; requires at least 5 years. Recently, both Apple and Samsung have increased after-sales software update support to 7 years. This is a considerable improvement, and hopefully indicates a trend of users holding on to their phones for longer, but it is still far from long enough.&lt;/p&gt;

&lt;p&gt;The result of the short replacement cycles is that there are lots of perfectly good phones out there, if you don’t mind being behind the curve. Buying a used phone means prolonging that phone’s useful life and avoiding manufacturing of a new phone, so it is a good way to reduce emissions.&lt;/p&gt;

&lt;h2 id=&quot;ideological-detour&quot;&gt;Ideological detour&lt;/h2&gt;

&lt;p&gt;I don’t mind being “behind the curve” at all. The curve is a social construct created to make you replace your technology early. My previous phone was from 2011 and it lasted until about 2022, when the USB connector finally failed completely. I moved on to an old iPhone which I got from a colleague, until the physical buttons on that one started to fail. Now I have a used phone from 2017. It’s an Android phone but I have not created a Google account on it. I use my phone effectively as barely more than a dumb phone. I don’t have a contract and I very rarely use data, positioning, Bluetooth or even Wifi. I find there is no need. I also often walk around without a phone. It is hard to avoid all interaction with the likes of Google and Apple, but I don’t have to play their game.&lt;/p&gt;

&lt;h2 id=&quot;upgrading-the-phone-software&quot;&gt;Upgrading the phone software&lt;/h2&gt;

&lt;p&gt;Older phones will be running older software versions and therefore be of limited use. One option is to install a newer operating system on the phone (more on the terminology later). This is possible for Android devices (and also old Windows phones) but not for Apple devices.&lt;/p&gt;

&lt;p&gt;Upgrading the phone software means upgrading the operating system. The operating system is the software that makes it possible to make phone calls, use wifi or bluetooth, run apps etc.&lt;/p&gt;

&lt;h3 id=&quot;android-aosp-lineageos-e&quot;&gt;Android, AOSP, LineageOS, /e/&lt;/h3&gt;

&lt;p&gt;Android is an operating system for mobile phones. It is developed by the Open Handset Alliance, led by Google. The core system is known as the &lt;a href=&quot;https://source.android.com/&quot;&gt;Android Open Source Project (AOSP)&lt;/a&gt; and is free/open-source software. Alternatives such as &lt;a href=&quot;https://e.foundation/&quot;&gt;/e/&lt;/a&gt; or &lt;a href=&quot;https://lineageos.org/&quot;&gt;LineageOS&lt;/a&gt; are based on AOSP and typically support a more recent version of Android.&lt;/p&gt;

&lt;p&gt;Even though AOSP is technically free and open, Google drives the development, as they have the contracts with the phone makers. If Google were to disappear, AOSP, and therefore LineageOS and /e/, would no longer progress. To my mind that is not a big issue in itself, but it is important to realise that the phone makers will always rely on a large company to provide their OS rather than on an open source project, unless such a project had the backing of a powerful entity (for example, if the EU were to sponsor AOSP, it could exist independently of Google).&lt;/p&gt;

&lt;p&gt;In an ideal world, upgrading to LineageOS or /e/ or any such alternative would be supported as part of the regular update mechanism of your phone: when the commercial software support is discontinued, you would be given the option to upgrade to an open source variant. In the real world, as the phone manufacturers profit from the short replacement cycles, this is not the case. As a result, while it is technically possible to upgrade the phone software, the process is not seamless. There are many barriers to upgrading your phone.&lt;/p&gt;

&lt;h3 id=&quot;terminology&quot;&gt;Terminology&lt;/h3&gt;

&lt;p&gt;First of all there is the terminology, which speaks of “ROMs”, “flashing”, “sideloading” and a lot besides.&lt;/p&gt;

&lt;p&gt;“ROM” stands for Read-Only Memory, but “a ROM” means the file that contains the software to run on your phone. In the olden days, the system software of a computer would be stored in ROM memory.&lt;/p&gt;

&lt;p&gt;A phone has internal storage. Part of this storage is used for the operating system. On Samsung phones this software is called Android; as mentioned above, it is effectively made and controlled by Google.&lt;/p&gt;

&lt;p&gt;“Flashing” derives from “flash memory”, a kind of non-volatile memory that can be erased in a flash. The typical storage in a phone is a kind of flash memory, and “flashing” is the process of writing the ROM in the appropriate locations on that storage.&lt;/p&gt;

&lt;h3 id=&quot;tools&quot;&gt;Tools&lt;/h3&gt;

&lt;p&gt;When you search online, you will come across a bewildering array of tools that are all used in the process of upgrading a phone, for example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Odin&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Fastboot&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Freya&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrangeFox&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Heimdall&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TWRP&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;adb&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Magisk&lt;/code&gt;, and many more besides. You only need a few of them, but finding the right ones can be hard.&lt;/p&gt;

&lt;h3 id=&quot;finding-the-right-version-for-your-phone&quot;&gt;Finding the right version for your phone&lt;/h3&gt;

&lt;p&gt;Different models of phone run slightly different variants of the operating system, even for the same version of Android, because they have slightly different specifications. You have to find the &lt;em&gt;exact&lt;/em&gt; version of the software for your phone or the upgrade will fail. 
(This is changing with the move towards what is called a “Generic System Image”, a single ROM that works on many devices, but as far as I can tell neither LineageOS nor /e/ uses this approach yet.)&lt;/p&gt;

&lt;p&gt;Just finding out exactly what device you have is not trivial. Consider the humble “Samsung Galaxy A3”. It is not simply that: at the very least it is a “2016” or a “2017” (and the difference is not apparent); the “2017” can be an SM-A320 F, FL, FL/DS, Y or Y/DS; and the “2016” has 10 different models. And the “ROM” you need for it will have yet another name.&lt;/p&gt;
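One way to pin down the exact device, assuming you have adb installed and USB debugging enabled on the phone, is to query the standard Android build properties; a minimal sketch:

```shell
# Query the phone's identity over USB (requires adb and USB debugging).
adb shell getprop ro.product.model           # marketing model, e.g. SM-A320FL
adb shell getprop ro.product.device          # device codename used by ROM builds
adb shell getprop ro.build.version.release   # currently installed Android version
```

The codename reported by `ro.product.device` is the name that ROMs and recovery images are published under, which is why it matters more than the marketing name on the box.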

&lt;h3 id=&quot;the-upgrade-process&quot;&gt;The upgrade process&lt;/h3&gt;

&lt;p&gt;Let’s suppose you identified your phone, found a ROM and found appropriate versions of all the tools. What’s next?&lt;/p&gt;

&lt;p&gt;To replace the operating system, the phone has a special “recovery mode”, which the user can activate by pressing specific keys. In this recovery mode the phone runs special “recovery” software that can install updates and wipe data. This is what the engineer in the shop uses when your phone is “bricked” because of a software problem.&lt;/p&gt;

&lt;p&gt;It is possible to replace this recovery software with different software that can be used to install a different operating system on the phone. For this, the phone has a “download mode”, again activated by pressing specific keys. In that mode, when the phone is connected to a computer, yet another special program on the computer can install this new recovery software, which can then be used to install the new operating system. Some phones have a “fastboot” mode which serves a similar purpose.&lt;/p&gt;

&lt;p&gt;In my case, the alternative recovery software I used is called &lt;a href=&quot;https://twrp.me/about/&quot;&gt;TWRP&lt;/a&gt;; the program running on my laptop to install this new recovery software is called &lt;a href=&quot;https://glassechidna.com.au/heimdall/&quot;&gt;Heimdall&lt;/a&gt;,  the program to install the new operating system is called &lt;a href=&quot;https://developer.android.com/tools/adb&quot;&gt;adb&lt;/a&gt; and the new operating system itself is called LineageOS. So what I had to do is to find the correct versions of TWRP and LineageOS for my phone, as well as versions of Heimdall and adb that would work with those, and then carefully follow instructions – and hope for the best.&lt;/p&gt;
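Sketched as shell commands, that route looks roughly like the following. The file names are the ones from my case; the partition name passed to Heimdall varies per device, so treat this as an outline under those assumptions rather than a recipe:

```shell
# Phone in download mode, connected over USB:
# inspect the partition layout first, then flash the custom recovery.
heimdall print-pit
heimdall flash --RECOVERY twrp-3.7.0_9-0-a3y17lte.img --no-reboot

# Boot straight into the new recovery (TWRP), enable ADB sideload
# in its menu, then push the new operating system from the computer:
adb -d sideload lineage-19.1-20220904-UNOFFICIAL-a3y17lte.zip
```

Each step can brick the phone if the wrong image or partition is used, which is exactly why the process is so unforgiving for non-specialists.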

&lt;h2 id=&quot;case-study-lineageos-on-a-samsung-galaxy-a3&quot;&gt;Case Study: LineageOS on a Samsung Galaxy A3&lt;/h2&gt;

&lt;h3 id=&quot;what-is-to-be-gained&quot;&gt;What is to be gained?&lt;/h3&gt;

&lt;p&gt;The most recent version of LineageOS I found for the Samsung Galaxy A3 2017 is LineageOS 19.1. This version corresponds to Android Open Source Project version 12.1. The phone came with Android 6.0.1, upgradable to Android 8, so the LineageOS version is four generations more recent. Official Android security updates for this model stopped in 2021. LineageOS stopped updates for 19.1 on officially supported devices in 2024. The Samsung Galaxy A3 2017 was never officially supported, but beggars can’t be choosers. Meanwhile, LineageOS is at 23 and AOSP at version 16, so the updated device is still four generations behind, but that is still better than eight generations. For example, UK Government apps work on anything from Android 10 upwards. (They are only available on Google Play, but that’s another story.)&lt;/p&gt;

&lt;p&gt;In any case, the limited support (both from Google and from LineageOS) is an organisational decision; it is not due to any device limitations. Ironically, the Samsung Galaxy A3 2016 is supported up to AOSP version 13. But in principle, they could support the most recent version, even on much older devices. Maybe we’ll live to see that day.&lt;/p&gt;

&lt;h3 id=&quot;why-do-this-at-all&quot;&gt;Why do this at all?&lt;/h3&gt;

&lt;p&gt;The main reason for this experiment was to find out what skills and knowledge are needed to do this.&lt;/p&gt;

&lt;h3 id=&quot;the-actual-experience&quot;&gt;The actual experience&lt;/h3&gt;

&lt;p&gt;Frankly, it was a bit of a nightmare.&lt;/p&gt;

&lt;p&gt;First I tried &lt;a href=&quot;https://openandroidinstaller.org&quot;&gt;OpenAndroidInstaller&lt;/a&gt;, “The graphical installer that makes installing alternative Android distributions nice and easy”. It runs on Windows and Linux. I am using the Ubuntu flavour of Linux, and installation was easy, but to run it I still needed to open a terminal and type&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;flatpak run org.openandroidinstaller.OpenAndroidInstaller
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The first hurdle was that it turned out I had the wrong version of LineageOS (the “ROM”). I had searched for “lineageos samsung galaxy A3” and, after some persevering, found the “latest unofficial release” (the “latest official release” was no longer there). It turns out this file is for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a3xelte&lt;/code&gt; (i.e. the 2016 version of the A3) and I needed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a3y17lte&lt;/code&gt; (because it turned out at that point that my device was a 2017 A3). I found &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineage-19.1-20220904-UNOFFICIAL-a3y17lte.zip&lt;/code&gt; and I also found &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;twrp-3.7.0_9-0-a3y17lte.img&lt;/code&gt;, so all was well.&lt;/p&gt;

&lt;p&gt;Then it got stuck because of insufficient permissions on the USB device. Thank you, detailed log info! OK, I know how to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;chmod&lt;/code&gt;, so I fixed that.&lt;/p&gt;
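For completeness: a chmod on the device node does not survive re-plugging the phone. A more durable fix on Linux is a udev rule; the sketch below assumes a Samsung phone (USB vendor ID 04e8), and the rule file name and the plugdev group are conventions, not requirements:

```shell
# Allow non-root access to Samsung devices over USB (vendor ID 04e8).
echo 'SUBSYSTEM=="usb", ATTR{idVendor}=="04e8", MODE="0666", GROUP="plugdev"' \
  | sudo tee /etc/udev/rules.d/51-android.rules
# Make udev pick up the new rule without a reboot:
sudo udevadm control --reload-rules
```
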

&lt;p&gt;Then it got stuck at the actual OS installation; the detailed log said &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;adb -d wait-for-recovery&lt;/code&gt;, and wait it did, forever, because by the time it issued that command the phone was already past the recovery stage. How a simple race condition can throw a spanner in the works. So that was the end of my attempt using the OpenAndroidInstaller. I don’t want to blame the project; it is a really good effort to make things easier. They listed this phone under “Officially supported devices” as “tested”, so it is a bit odd that it did not work for me.&lt;/p&gt;

&lt;p&gt;My next attempt was to use TWRP (version &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;twrp-3.7.0_9-0-a3y17lte&lt;/code&gt;). I used the “adb sideload” method as explained &lt;a href=&quot;https://wiki.lineageos.org/devices/a3xelte/install/&quot;&gt;on the LineageOS wiki&lt;/a&gt;. I found this information very hard to locate on the wiki: there is no page for the A3 2017, only for the A3 2016, and even that one is unofficial, so it is not linked from the landing page.&lt;/p&gt;

&lt;p&gt;I did all the wiping etc. using TWRP as instructed, enabled adb sideload, issued&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;adb -d sideload  lineage-19.1-20220904-UNOFFICIAL-a3y17lte.zip
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And that failed, “E2001: Failed to update vendor image.”&lt;/p&gt;

&lt;p&gt;Some &lt;a href=&quot;https://stackoverflow.com/questions/66788788/twrs-flashing-error-e2001-failed-to-update-vendor-image&quot;&gt;more internet searching&lt;/a&gt; got me both the explanation and the solution, after a fashion.&lt;/p&gt;

&lt;p&gt;The problem is that the ROM expects a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vendor&lt;/code&gt; partition and this was not present. This is because of a crucial difference between the A3 2016 and the A3 2017: Google introduced Project Treble, a major change to the Android OS to make it easier and faster for manufacturers to update devices to a new version of Android. Crucially, this involved the introduction of a new partition called “vendor”. The factory version of the A3 2017 doesn’t have this, as it predates this change. So the solution is to add this partition, and the poster provided a script to do this.&lt;/p&gt;

&lt;p&gt;The explanation of how to use the script was “Flash attached zip file”. It did not specify how to flash it. I flashed it with Heimdall, and it did not work. At this point the phone was a brick: it had been wiped completely. I could not get back into the recovery mode or even switch it off.&lt;/p&gt;

&lt;p&gt;I re-flashed TWRP and could get back into recovery mode when connected to USB. The phone showed up on my computer so I copied the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;repartitioner_a3y17lte.zip&lt;/code&gt; file to it and ran the script using TWRP’s “Install” function.&lt;/p&gt;

&lt;p&gt;This worked, but it installed an older version of TWRP in the process, and on that version, adb sideload did not work.&lt;/p&gt;

&lt;p&gt;I found a more recent version of TWRP with support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vendor&lt;/code&gt; partition and flashed that using Heimdall. Now I could finally sideload the actual LineageOS zip file using adb, and it worked.&lt;/p&gt;

&lt;h2 id=&quot;required-skills-and-knowledge&quot;&gt;Required skills and knowledge&lt;/h2&gt;

&lt;p&gt;I think the most important skill needed to successfully upgrade a phone when things don’t go smoothly is debugging. This is not something you can teach quickly; it is a skill acquired through practice. You need to formulate a hypothesis of what went wrong, test it, and then remedy it. But doing that requires an understanding of what is supposed to happen, and that in turn requires an understanding of the overall system and the tools.&lt;/p&gt;

&lt;p&gt;In my case, I needed to know how to install and use a series of command-line tools. Without an understanding of what partitions are, I would not have been able to solve the problem as I would not have understood the proposed solution. I also needed to know how to deal with Linux device permissions.&lt;/p&gt;

&lt;p&gt;If I hadn’t found that handy partition script, I would have had to repartition the disk myself. This requires quite a bit of detailed Linux knowledge as it is a very critical step.&lt;/p&gt;

&lt;p&gt;Finally, if my phone had not been supported, e.g. if I wanted the most recent version of LineageOS on it, I would have had to change the LineageOS software to work with my phone. This requires detailed knowledge and understanding of both the phone hardware and the Android operating system. Android is based on Linux, so it also requires detailed knowledge and understanding of the Linux kernel, which is the part of the operating system that interfaces with the hardware. It also requires familiarity with the process of creating a ROM. And it requires programming skills.&lt;/p&gt;

&lt;p&gt;If we were serious about the circular economy, we should train specialists to install upgrades. I think this training should not take more than a week. Training the developers of the upgrades would of course take a lot longer, but there might not really be a need; it would be more important to ensure they are properly paid. In this way we would get more recent upgrades, as developers would be paid to create and support them, as well as professionals to install them, and anyone could easily prolong the life of their phone through software upgrades.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The banner picture shows a very blurry image of a person, seen from the back, on the phone in a busy street.&lt;/em&gt;&lt;/p&gt;


        </content>
    </entry>
    
    <entry>
        <title>The Anti-Dystopians’ Guide to Generative AI</title>
        <link href="https://limited.systems/articles/anti-dystopians-guide-to-ai/"/>
        <updated>2025-10-10T00:00:00+01:00</updated>
        <id>https://limited.systems/articles/anti-dystopians-guide-to-ai</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/anti-dystopians-guide-to-ai_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;These are the slides for the talk “The Anti-Dystopians’ Guide to Generative AI – for students &amp;amp; educators” which is based on the &lt;a href=&quot;https://alinautrata.substack.com/p/the-anti-dystopians-guide-to-genai&quot;&gt;guide of the same title&lt;/a&gt; written by &lt;a href=&quot;https://www.alinautrata.com/&quot;&gt;Alina Utrata&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you’d like me to deliver this talk for your community, please &lt;a href=&quot;mailto:Wim.Vanderbauwhede@glasgow.ac.uk&quot;&gt;contact me&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://limited.systems/presentation/The-Anti-Dystopians-Guide-to-Understanding-AI-pres-2025-10-10.pdf&quot;&gt;The Anti-Dystopians’ Guide to Generative AI (PDF)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://limited.systems/presentation/The-Anti-Dystopians-Guide-to-Understanding-AI-pres-2025-10-10.odp&quot;&gt;The Anti-Dystopians’ Guide to Generative AI (ODP)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


        </content>
    </entry>
    
    <entry>
        <title>Demystifying AI</title>
        <link href="https://limited.systems/articles/demystifying-ai/"/>
        <updated>2025-09-19T00:00:00+01:00</updated>
        <id>https://limited.systems/articles/demystifying-ai</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/demystifying-ai_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;These are the resources for a webinar I have delivered in collaboration with &lt;a href=&quot;https://strategic-consulting.scot/&quot;&gt;Catherine Brys&lt;/a&gt;. If you’d like me to deliver this webinar for your community, please &lt;a href=&quot;mailto:Wim.Vanderbauwhede@glasgow.ac.uk&quot;&gt;contact me&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;webinar-description&quot;&gt;Webinar description&lt;/h2&gt;

&lt;p&gt;In this webinar I cover:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The difference between AI, Generative AI and Machine Learning and why it matters&lt;/li&gt;
  &lt;li&gt;How generative AI tools used for daily tasks actually work – and what they can and can’t do&lt;/li&gt;
  &lt;li&gt;What to be aware of when using generative AI tools or commissioning work that uses generative AI tools&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;webinar-slides&quot;&gt;Webinar slides&lt;/h2&gt;

&lt;p&gt;These are the slides from the webinar of 18 September 2025.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://limited.systems/presentation/Demystifying-AI-webinar-2025-09-18.pdf&quot;&gt;Demystifying AI slides (PDF)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;h4 id=&quot;part-i-generative-ai-vs-machine-learning&quot;&gt;Part I: Generative AI vs Machine Learning&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.mdpi.com/2076-3417/12/17/8668&quot;&gt;A Two-Step Learning Model for the Diagnosis of Coronavirus Disease-19 Based on Chest X-ray Images with 3D Rotational Augmentation &lt;/a&gt; Hyuk-Ju Kwon &amp;amp; Sung-Hak Lee, Applied Sciences, 2022, 12(17), 8668&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;part-ii-genai-based-tools&quot;&gt;Part II: GenAI-based tools&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://www.bbc.co.uk/news/articles/c0m17d8827ko&quot;&gt;AI chatbots unable to accurately summarise news, BBC finds&lt;/a&gt;; full report: &lt;a href=&quot;https://www.bbc.co.uk/aboutthebbc/documents/bbc-research-into-ai-assistants.pdf&quot;&gt;Representation of BBC News content in AI Assistants&lt;/a&gt;, Oli Elliott, February 2025&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://arstechnica.com/ai/2024/09/australian-government-trial-finds-ai-is-much-worse-than-humans-at-summarizing/&quot;&gt;Australian government trial finds AI is much worse than humans at summarizing&lt;/a&gt;; full report: &lt;a href=&quot;https://www.aph.gov.au/DocumentStore.ashx?id=b4fd6043-6626-4cbe-b8ee-a5c7319e94a0&quot;&gt;Generative artificial intelligence (AI) document summarisation proof of concept&lt;/a&gt; AWS Professional Services, prepared for the Australian Securities and Investments Commission (ASIC), March 2024&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php&quot;&gt;AI Search Has A Citation Problem&lt;/a&gt; Klaudia Jaźwińska, Aisvarya Chandrasekar,  Columbia Journalism Review, 2025&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/2407.16894&quot;&gt;Estimating the Increase in Emissions caused by AI-augmented Search&lt;/a&gt; Wim Vanderbauwhede, June 2024&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/&quot;&gt;Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
&lt;/a&gt;
Joel Becker, Nate Rush, Elizabeth Barnes, David Rein
10 July 2025&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf&quot;&gt;The GenAI Divide – State of AI in Business 2025&lt;/a&gt; Aditya Challapally Chris Pease Ramesh Raskar Pradyumna Chari July 2025&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://assets.publishing.service.gov.uk/media/68adbe409e1cebdd2c96a19d/dbt-microsoft-365-copilot-evaluation.pdf&quot;&gt;The Evaluation of the M365 Copilot Pilot in the Department for Business and Trade&lt;/a&gt;,  Digital Data and Technology Monitoring and Evaluation team,  Department for Business and Trade, August 2025&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;!--
* [Endoscopist deskilling risk after exposure to artificial intelligence in colonoscopy: a multicentre, observational study](https://www.thelancet.com/journals/langas/article/PIIS2468-1253(25)00133-5/abstract) Krzysztof Budzyń, Marcin Romańczyk, Diana Kitala, Paweł Kołodziej, Marek Bugajski, Hans O Adami, et al., The Lancet, Volume 10, Issue 10 p896-903 October 2025
--&gt;

&lt;h4 id=&quot;part-iii-ethical-issues&quot;&gt;Part III: Ethical issues&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://goodlawproject.org/ai-giants-are-stealing-our-creative-work/&quot;&gt;AI giants are stealing our creative work&lt;/a&gt; Good Law Project&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://restofworld.org/2025/big-tech-ai-labor-supply-chain-african-workers/&quot;&gt;How Big Tech hides its outsourced African workforce&lt;/a&gt; Stephanie Wangari &amp;amp; Gayathri Vaidyanathan, April 2025&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://data-workers.org/&quot;&gt;Data Workers’ Inquiry&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://docs.un.org/en/A/HRC/56/68&quot;&gt;Contemporary forms of racism, racial discrimination, xenophobia and related intolerance&lt;/a&gt; UN Special Rapporteur on contemporary forms of racism, racial discrimination, xenophobia and related intolerance, Ashwini K.P., July 2024&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://www.reuters.com/article/world/insight-amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK0AG/&quot;&gt;Insight - Amazon scraps secret AI recruiting tool that showed bias against women&lt;/a&gt; Jeffrey Dastin, October 2018&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://www.reuters.com/legal/tutoring-firm-settles-us-agencys-first-bias-lawsuit-involving-ai-software-2023-08-10/&quot;&gt;Tutoring firm settles US agency’s first bias lawsuit involving AI software&lt;/a&gt; Daniel Wiessner, August 2023&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://www.amnesty.org.uk/files/2025-02/Automated%20Racism%20Report%20-%20Amnesty%20International%20UK%20-%202025.pdf&quot;&gt;Automated Racism: How police data and algorithms code discrimination into policing&lt;/a&gt; Amnesty International UK, February 2025&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://mit-genai.pubpub.org/pub/8ulgrckc/release/2&quot;&gt;The Climate and Sustainability Implications of Generative AI&lt;/a&gt; 
Bashir, Noman, Priya Donti, James Cuff, Sydney Sroka, Marija Ilic, Vivienne Sze, Christina Delimitrou, and Elsa Olivetti, in “An MIT Exploration of Generative AI”, 2024.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://limited.systems/articles/the-insatiable-hunger-of-openai/&quot;&gt;The insatiable hunger of (Open)AI&lt;/a&gt; Wim Vanderbauwhede, March 2024&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;part-iv-societal-impact&quot;&gt;Part IV: Societal impact&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://doi.org/10.2139/ssrn.4895486&quot;&gt;Generative AI Can Harm Learning&lt;/a&gt; Bastani, Hamsa, Osbert Bastani, Alp Sungu, Haosen Ge, Özge Kabakcı, and Rei Mariman, 2024&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://limited.systems/articles/the-real-problem-with-AI&quot;&gt;The real problem with the AI hype&lt;/a&gt; Wim Vanderbauwhede, January 2025&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;books&quot;&gt;Books&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://illusion.baldurbjarnason.com/&quot;&gt;The Intelligence Illusion&lt;/a&gt; Baldur Bjarnason&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://thecon.ai/&quot;&gt;The AI Con&lt;/a&gt; Emily M Bender &amp;amp; Alex Hanna&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://bristoluniversitypress.co.uk/resisting-ai&quot;&gt;Resisting AI&lt;/a&gt; Dan McQuillan&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;other-resources&quot;&gt;Other Resources&lt;/h2&gt;

&lt;h3 id=&quot;ai-free-search&quot;&gt;AI-free search&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;“Disable AI” browser plugin (for Firefox and Chrome only)
    &lt;ul&gt;
      &lt;li&gt;Disables Google’s AI Overview, DuckDuckGo’s AI Assist, Ecosia’s AI Overview, Brave Search’s Answer with AI, and Qwant’s AI Flash Answer&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Google specific, no plugin needed: add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-ai&lt;/code&gt; to your search&lt;/li&gt;
  &lt;li&gt;Or use a different search engine, for example &lt;a href=&quot;https://www.ecosia.org/&quot;&gt;Ecosia&lt;/a&gt; or &lt;a href=&quot;https://www.qwant.com/&quot;&gt;Qwant&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

        </content>
    </entry>
    
    <entry>
        <title>Interview on generative AI</title>
        <link href="https://limited.systems/articles/interview-ctxt-2025/"/>
        <updated>2025-02-05T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/interview-ctxt-2025</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/interview-ctxt-2025_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;I was interviewed by &lt;a href=&quot;https://ctxt.es/user/profile/elenadesus&quot;&gt;Elena de Sus&lt;/a&gt; of Spanish progressive magazine &lt;a href=&quot;https://ctxt.es&quot;&gt;CTXT (Contexto y Acción)&lt;/a&gt;. The article in Spanish is published &lt;a href=&quot;https://ctxt.es/es/20250201/Politica/48565/Elena-de-Sus-entrevista-Wim-Vanderbauwhede-IA-deepseek-medioambiente.htm&quot;&gt;on their web site&lt;/a&gt;. This is the English version of the interview, published with kind permission of CTXT.&lt;/p&gt;

&lt;p&gt;Wim Vanderbauwhede is Professor in Computing Science at the University of Glasgow, where he leads the  Low Carbon and Sustainable Computing research group.&lt;/p&gt;

&lt;p&gt;He has written about &lt;a href=&quot;https://limited.systems/articles/climate-cost-of-ai-revolution/&quot;&gt;the high energy consumption&lt;/a&gt; of the Large Language Models used for generative AI such as ChatGPT. He posits that their projected expansion is not sustainable. He is also skeptical that advances in efficiency will lead to a reduction in emissions from this industry.&lt;/p&gt;

&lt;p&gt;He talked to CTXT over video call.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;You research low-carbon and sustainable computing. How did you get interested in the topic?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;I have been aware of climate change for a very long time, since the 1980s. After all, it is nothing new. Originally I am from Belgium, and when I lived there I was active in an environmental organization, I did volunteering work with them.&lt;/p&gt;

&lt;p&gt;For my academic career, my focus has been mostly on improving the efficiency of computers. But it’s been known for a long time that if you improve the efficiency of something, usually it means it gets cheaper and then you get more demand, and because you get more demand, your emissions actually go up, not down.&lt;/p&gt;

&lt;p&gt;The whole history of the Industrial Revolution is one of improved efficiency. The efficiency gains of the steam engine led to us burning lots of coal, because it allowed the mines to be pumped more efficiently, so mining coal became cheaper.&lt;/p&gt;

&lt;p&gt;Computers have become literally millions of times more efficient since the 1930s or 40s. At the same time their use has become ubiquitous. So the total emissions from computing have gone up despite all the efficiency savings. There was a conflict with doing efficiency work, and I wanted to look at the sustainability of computing more widely. So some years ago I got the opportunity, with the support of the head of department, to start a new research activity in the department where I work, and that became the Low Carbon and Sustainable Computing group, which we have now.&lt;/p&gt;

&lt;p&gt;The term I use is &lt;em&gt;frugal computing&lt;/em&gt;. The frugal computing message is that we should use fewer computing resources, just like essentially we should use fewer of any resource if we don’t want catastrophic climate change. We should not go for growth in the terms of growth of resource consumption and growth of energy consumption, because that is destructive. Our whole societal model is built to encourage us to use more resources and more energy. But that is not a sustainable model.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;And the opposite is happening because we are developing very resource intensive stuff like generative AI and Bitcoin.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;At some point before the AI hype started, there was the Bitcoin bubble, and it looked like Bitcoin might end up using a huge amount of compute resources. But Bitcoin is not a viable currency for something like a nation state. The former finance minister of Greece, Yanis Varoufakis, has written extensively to explain why. If there is need of evidence, El Salvador has abandoned Bitcoin as its currency. That means that Bitcoin and the derived models will probably remain rather popular in some circles, but not grow dramatically, and therefore the emissions will probably not go up a lot. Also, cryptocurrencies like Ethereum, based on the proof-of-stake protocol as opposed to proof of work, have become more popular. The carbon footprint of those is literally a hundred times less, so the emissions from cryptocurrencies haven’t grown dramatically. The current amount of emissions is not dramatic, so if it stays that way it is not really a problem. It could have been different, but it didn’t turn out like that.&lt;/p&gt;

&lt;p&gt;AI is a different issue because there’s huge support from governments around the world. Everybody seems to think it’s like magic and it will create unlimited growth. Or maybe even if they don’t believe that, they act like they believe it. That means that this is a serious driver for creating more computer chips, data centers and generating more electricity. And at the moment, around 70% of the electricity is still from fossil fuels. So it means we’re just going to burn more coal.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;So the problem here is the state support of AI?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;It works as a delaying factor. If there is a hype bubble, it would normally collapse by itself, because people start to see that there’s nothing in it. But if governments think this is a good idea and they should invest in it, they will commit the investments and the investments will happen even after people start to see that it wasn’t worth it, because they are slow. So the whole thing gets delayed. That’s really what happens. Not so much that it stops the process. It just causes a delay. But in that delay, of course, you create more emissions.&lt;/p&gt;

&lt;p&gt;These days it’s very hard to get economic growth. And if you believe that you must get growth, then anything that promises to give you growth is something that you may want to look into. The UK government, for example, is like that. The US government also, they think that AI is going to give them growth, so they commit to investments in that area and those investments will happen. Probably even if the bubble would burst this year, they would still happen, because they have already set things in motion. It’s not that the governments drive the hype, it just doesn’t help. Of course, if your government is saying that AI is good, then it’s much harder for the ordinary person to say that AI is bad.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;With the launch of generative AI models by the Chinese company DeepSeek, which are apparently more efficient and had to get around the chip export constraints for China, it looks like the bubble has burst. Nvidia stock has gone down and there has been a lot of discourse about the fact that we don’t need that many data centers if we can have a more efficient AI. But I think you are skeptical about this.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;I looked at DeepSeek, based on the information that they have wanted to give. To start with, the narrative that they had to use less capable computer chips, because of the export restrictions of the US government, in short, this is not true. I’ll explain why. 
There are export restrictions on GPUs, and to satisfy these restrictions Nvidia has created a special series of chips for the Chinese market. They are less capable in one specific respect: double precision floating point performance. Now, AI does not use double precision floating point. It is needed for supercomputers that do scientific computing. But the Chinese have their own very good supercomputers for scientific computing, so they’re not buying Nvidia GPUs for their supercomputers. They’re buying them for AI; and for AI, you don’t need this.&lt;/p&gt;

&lt;p&gt;OpenAI, Google, and the rest of the US companies use an Nvidia GPU called the A100 for running the model. For training, they use a more capable type called the H100. For the Chinese market Nvidia sells the A800 and H800. In their whitepaper, DeepSeek says that they use the H800. Now the H800 is more capable than the A100 in almost every respect. It’s a little bit less capable in the networking: if you combine several of these GPUs in a network, the bandwidth of the network is less, and that’s what they explained in that paper, what they did to get around that. And that’s nice engineering, but that’s not getting you so much of a benefit.&lt;/p&gt;

&lt;p&gt;All in all, it’s not as if this is some really constrained compute device. This is high end. This is actually better than what most of the large companies use now for their data centers.&lt;/p&gt;

&lt;p&gt;DeepSeek has been very clever in two ways. They have this app that people started to like. Their pricing is competitive. They have a lot of smaller models that people can play with. And I think that’s what the media has jumped on, these smaller models. But then that’s nothing special. Meta also has released open source smaller models with Llama. They are not really open source, but DeepSeek is also not really open source because the data set is not open. It’s only the code that does the compute that is released as a binary. But that’s a very different discussion.&lt;/p&gt;

&lt;p&gt;I think this was a clever way to say that they have a small model, but what runs the main inference is not all that small.
Compared to GPT-4, if they can really get the same performance, then they have done something quite clever because they use a lot less of the parameters at any one time. So it will be a bit more energy efficient, but not all that much.&lt;/p&gt;

&lt;p&gt;I mean, this idea is clever, they show that it works and it is a good thing. But we get the same problem. If their pricing is competitive, it means it’s cheaper so more people will use it. So it’s very likely that there will not really be a decrease in emissions as a result. It might very well be an increase if the company gets really big.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;We talk a lot about the cost of training the generative AI models, but you have written that using them is a lot more costly.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Um, yes. And that’s both true in terms of environmental cost and in terms of financial cost. I’m not the only one to write about this. There’s lots of people who look at the economic costs and say the training cost is fast becoming just a detail. I calculated it.&lt;/p&gt;

&lt;p&gt;For inference (running prompts), it scales with the number of users, whereas the model training, of course, only scales if you make a bigger model. And this is probably where DeepSeek has done something clever because their cluster is not very big. So they managed to train the model on a smaller cluster. That saves them their initial cost. Because they started as a small company. So that’s the training cost, but if they are going to become a big company, then they will need lots of data centers for serving all the queries. Then that will be the dominant cost. If you’re an AI company that works with big models and you train them, and that’s a huge cost, then you need a huge amount of users to make it profitable. But to get all those users, you need a lot of hardware to support them, and that’s expensive.&lt;/p&gt;

&lt;p&gt;A few years ago, the costs of training were a lot higher because they started by training these models not very efficiently. They didn’t really know how to do this well. So they needed a lot of resources to get a not so good model. They probably needed to do it a few times and so on. But now it’s definitely the cost of inference that dominates. And also the emissions from inference dominate everything.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Do you think like the markets have overreacted a little bit?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Absolutely. Yeah. Especially the US markets because this is Chinese and they are afraid of China. But I think Nvidia should not be worried. I mean, for the reason I explained, their sales depend more on the fact that people project enormous growth.&lt;/p&gt;

&lt;p&gt;The CEOs of big tech companies had been saying that they need like a 100 times expansion of chip manufacturing in the next ten years or so. These claims made the market go up. The problem is the data centers are getting committed. And then the electricity generators have to provision the electricity ahead of time because the data center needs to get the electricity as soon as it’s finished. That means if you want more electricity, you have to start building today. And because most of it is not renewable, it’s going to be gas or coal. So even if all this AI stuff does not happen at all, they will have started building it and then they will want to use it because, well, once you’ve built capacity you want to sell your electricity, right? Otherwise you have made a very bad deal. So that’s the damage I think this is doing. It’s the hype that does the damage.&lt;/p&gt;

&lt;p&gt;It’s not possible to scale up semiconductor manufacturing by a factor of 100 because, at best, we could scale up the global mining capacity for all the materials that we need to make the chips by a factor of two. So this factor of 100 is not going to happen. And probably all those people know that.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;So everybody may know it’s a bubble?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Yes. But it does a lot of damage because it gives the fossil fuel industry a perfect excuse to produce more fossil fuels, to provide the energy that they say we will need for something that is probably not going to happen.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Do you think these large language models are obviously not worth it, even if they can be useful for some things?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Yeah, I personally think that the generative AI that is pushed by OpenAI and then the other companies that follow suit to compete with them, this is not very useful. I mean, it is useful for specific scenarios, but then usually when you have a specific scenario, you could use a much, much smaller model and achieve the same thing.&lt;/p&gt;

&lt;p&gt;We have had the really big models that can do everything for everybody since 2020 or so, meanwhile global productivity definitely has not gone up. The companies that start using the Microsoft Copilot and so on, the large language models for programming, they see that it’s problematic, because it’s much harder to debug code that was not written by your own developers, but written by a machine. Although you may think that you write code faster because the machine writes it, the machine doesn’t guarantee your code is correct. It can’t. You know, a generative language model has no notion of what it means. It’s just guessing. So if you’re lucky, the guess works, and usually it doesn’t. And then the developers still have to debug it. And that takes more time because they can’t read the code so well because they haven’t written it.&lt;/p&gt;

&lt;p&gt;And there’s a lot of things like that. If you look at generative AI for image processing, for generating images, superficially it looks brilliant, but it’s actually quite average. It will not replace good illustrators because people who really want a decent illustration cannot use this. Do you want to burn the planet to produce cheap illustrations?&lt;/p&gt;

&lt;p&gt;I think before generative AI appeared, there was no demand from people for it. And so it’s technology push, right? That’s what it’s called, rather than market pull. So the problem is that by creating this extra technology we create a lot of extra emissions at a point in time where we cannot afford any extra emissions. Emissions should go down. It’s not affordable for the planet as a whole. That’s really the problem. Whether it’s useful or not is neither here nor there. It may be extremely useful, but if it still burns the planet, it’s no good.&lt;/p&gt;

&lt;p&gt;And from the calculations I’ve done, if those business people’s projections were realized, AI on its own would be enough to make us miss all the climate targets. Like I said, this is very unlikely. But it means that they don’t care whether it happens. In terms of energy expenditure, we can’t afford it.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Plain and simple.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;We can afford smaller models. In computing science, we really make a big distinction between what we prefer to call Machine Learning and what is being called AI, which usually means generative AI.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Yeah. Okay. There’s a lot of confusion with this. Could you explain what is the difference?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;The UK government also makes this mistake. They talk about how AI can do great things like detect cancer in an MRI or X-ray image, and therefore we should build more data centers for generative AI. But SegNet, the leading model in colon cancer detection, with a 99% accuracy rate, has 7.6 million parameters, while GPT-4 has more than a trillion. 
This means that SegNet uses about 100,000 times less energy than GPT-4. It can run on a PC in the hospital. You don’t need to build any data center to get better diagnostics. Just a few servers in hospitals.&lt;/p&gt;
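&lt;p&gt;The ratio quoted above follows directly from the parameter counts; the sketch below assumes, as a simplification, that per-query energy roughly tracks the number of parameters used.&lt;/p&gt;

```python
# Parameter counts quoted in the interview.
segnet_params = 7.6e6   # SegNet
gpt4_params = 1e12      # GPT-4, "more than a trillion"

# Simplifying assumption: per-query energy scales with parameter count.
ratio = gpt4_params / segnet_params
print(f"roughly {ratio:,.0f}x fewer parameters")  # roughly 131,579x
```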

&lt;p&gt;&lt;b&gt;But is there something in common between these different things that we call AI?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Most models these days use a neural network. A neural network is an abstraction inspired by the brain where essentially you get some inputs, which are numbers, and you multiply them by other numbers and add them, and that gives you another number as an output. And then usually you try to limit the range of that other number. That’s what they call a neuron: something that gets a few inputs, multiplies them by weights, then adds them, normalizes the result and sends that on to another neuron. And if you do that enough times, you get something that can do extrapolation on a very large parameter space. So it is very good at… let’s call it guessing, but it’s statistical approximation.&lt;/p&gt;

&lt;p&gt;The model that is used for cancer detection is called a convolutional neural network. These are the ones that are used for images. For text, it’s a recurrent neural network. In an image, you have to look at all the data in parallel in space: all the pixels are next to one another. In language, your words come one after another. So there are basically two types of neural networks. The generative AI large language models that we use are essentially much more advanced versions of the simple neural networks that I described.&lt;/p&gt;

&lt;p&gt;There is a difference between a model that detects a pattern in an image and a generative model, which has to produce new text or a new image. That is more work than just finding a pattern. It’s also why generative models are more expensive in energy terms: they need to do more computations because they do more work.&lt;/p&gt;
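&lt;p&gt;The weighted-sum-and-normalise neuron described above can be sketched in a few lines of Python; the weights here are illustrative, not taken from any real model.&lt;/p&gt;

```python
import math

# One artificial neuron: multiply inputs by weights, add them up,
# and squash the sum into the range (0, 1) with the logistic function.
def neuron(inputs, weights):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-total))  # normalise the output

print(neuron([0.5, 0.2, 0.9], [0.4, -0.6, 0.1]))  # roughly 0.542
```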

&lt;p&gt;&lt;b&gt;Some people are saying that there is a limit on the training data, that these models have already used as much data as possible and they maybe cannot find a lot more. I don’t know if that’s true.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Uh, it’s worse than that. A lot of the content on the internet now is AI generated. And the problem is that if you give an AI model AI generated content as input data, then it tends to get very poor performance very quickly. It’s called poisoning. But it’s not easy to avoid because the bots that scrape the internet cannot tell whether a page is AI generated or not. That actually means the best quality general purpose data sets will be from before 2022.&lt;/p&gt;

&lt;p&gt;Also, you can’t really keep on making the models bigger, you have to start doing these things like what DeepSeek does. Actually OpenAI already did that, they just didn’t do it on the same scale. OpenAI has a model that is 1.76 trillion parameters, but at any point in time they only use, what was it again, two times 200 billion. So they use a much smaller subset of that. And this is purely because you can’t access all of them all the time. What DeepSeek has shown is that if you use even less, it still works well. Most of the concepts that they use in their paper are already being researched by all the other companies as well.&lt;/p&gt;

&lt;p&gt;Anyway, you can’t keep on making the models bigger and expect that they will perform better, because there are limits both on the quality of the data and on the engineering of making this happen. So yeah, the performance will probably start to stagnate, it will not get much better.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;So the notion that we can reach some kind of artificial general intelligence by doing this is false?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;This is absurd. I mean, I think the people who promote this idea know this is a distraction, right? Because then you can say “Oh, artificial general intelligence would be very dangerous and we have to have all kinds of safeguards in place to make sure that if we have one, it behaves and it does the right thing for us and so on”, and that’s a perfect distraction from having to worry about all the real negative consequences of the fact that companies put out all these products. That’s, to my mind, what’s behind it.&lt;/p&gt;

&lt;p&gt;There is no chance that what is effectively just a statistical pattern generator can become intelligent. There is nothing in the model that actually mimics intelligence.&lt;/p&gt;

&lt;p&gt;I mean, people have been thinking about artificial intelligence for probably 50 years or more. Very deeply. And I think anyone who really has spent a lot of thought on this would agree that the generative AI models or whatever models that we call AI now really are not of the type that would give us a self-aware piece of software.
It seems intelligent because almost everything that we know is in there. A summary of all the knowledge that humans have put online is in those models. So there is an approximation of just about anything in there. But it’s by no means intelligent.&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>Cheaper AI does not mean greener AI</title>
        <link href="https://limited.systems/articles/cheaper-ai-is-not-greener-ai/"/>
        <updated>2025-01-26T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/cheaper-ai-is-not-greener-ai</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/cheaper-ai-is-not-greener-ai_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;In this article I have a look at the cost of running queries for GPT-4 and similar models, in view of the drop in price per prompt. The main conclusions are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The energy efficiency gains for queries to large language models (LLM) are not leading to lower emissions.&lt;/li&gt;
  &lt;li&gt;On the contrary, the lower prices are likely to lead to increased use and therefore higher emissions.&lt;/li&gt;
  &lt;li&gt;The cost of a query is mainly made up of the fixed cost (capex) of the data centre (building, cooling and network infrastructure) and GPU servers. The electricity consumption contribution is a small proportion.&lt;/li&gt;
  &lt;li&gt;Therefore, to maximise profit, the GPU server utilisation is optimised to support as many users as possible on the available hardware.&lt;/li&gt;
  &lt;li&gt;But higher utilisation means higher energy consumption and therefore higher emissions, even if the energy consumption per query is lower. The projected strong growth in the number of queries makes this even worse, as it means the data centre capacity needs to grow steeply as well.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;the-urgent-need-to-reduce-emissions&quot;&gt;The urgent need to reduce emissions&lt;/h2&gt;

&lt;p&gt;To reiterate, according to the 2024 Emissions Gap Report of the UN &lt;a href=&quot;#1&quot;&gt;[1]&lt;/a&gt;, the world must cut global greenhouse gas emissions to 20 gigatons CO₂-equivalent per year (GtCO₂e/y) by 2040 from the current level of 60 GtCO₂e/y to avoid catastrophic global warming, where “catastrophic” is meant quite literally: there will be a huge increase in the frequency and severity of natural catastrophes if we don’t do this. Large parts of the earth will become unsuitable for habitation and agriculture.&lt;/p&gt;

&lt;p&gt;To arrive at a sustainable level of emissions by 2040, global CO₂ emissions should be reduced by close to 20% per year. However, currently, emissions are still rising at 1% – 2% per year, despite the increase in renewable electricity generation capacity.&lt;/p&gt;

&lt;p&gt;The 2024 Emissions Gap Report of the UN &lt;a href=&quot;#1&quot;&gt;[1]&lt;/a&gt; explains in detail why renewables, carbon dioxide removal and carbon offsetting alone will not be sufficient to meet the targets.&lt;/p&gt;

&lt;h2 id=&quot;cheaper-prompts-greener-prompts&quot;&gt;Cheaper prompts, greener prompts?&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Note on terminology: I use the terms prompt and query interchangeably. The prompt is what you type, the query is the action of sending it to the server. Furthermore, a token is a small group of characters, between a single character and a word.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The price per query or token for various LLMs has come down considerably compared to prices when GPT-3 was released &lt;a href=&quot;#2&quot;&gt;[2]&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, the energy consumption of GPT-4 is still several times larger than for GPT-3 &lt;a href=&quot;#5&quot;&gt;[5]&lt;/a&gt;, and the energy consumption of Gemini 1.5 Pro is still of the order of GPT-3. How is this compatible with the more than ten times lower prices for GPT-4 compared to GPT-3? Let’s have a look at the figures — and the factors that influence these.&lt;/p&gt;

&lt;h3 id=&quot;electricity-pricing-for-data-centres-is-very-low&quot;&gt;Electricity pricing for data centres is very low&lt;/h3&gt;

&lt;p&gt;Large users of electricity pay wholesale prices for electricity. So the more electricity you use, the cheaper it is per unit, ironically. Because of their size, companies like Google or OpenAI pay the lowest prices. The price they pay for their electricity is less than 6 cents per kWh &lt;a href=&quot;#6&quot;&gt;[6]&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;energy-consumption-of-a-gpt-4-style-large-language-model&quot;&gt;Energy consumption of a GPT-4 style large language model&lt;/h3&gt;

&lt;p&gt;As I have discussed in detail in my article &lt;a href=&quot;#7&quot;&gt;[7]&lt;/a&gt;, the best estimate for the electricity consumption for GPT-3 and BLOOM is 0.003 kWh per query. For the queries used in that work, the average query response length was 100 words. At 6 cents per kWh, the electricity cost for such a query would be 0.018 cents, i.e. $0.00018.&lt;/p&gt;
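&lt;p&gt;As a back-of-the-envelope check, the electricity cost per query follows directly from those two figures:&lt;/p&gt;

```python
# Electricity cost of one GPT-3/BLOOM-class query, using the
# estimates above: 0.003 kWh per query at 6 cents per kWh.
kwh_per_query = 0.003
usd_per_kwh = 0.06
cost_per_query = kwh_per_query * usd_per_kwh
print(f"${cost_per_query:.5f} per query")  # $0.00018 per query
```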

&lt;p&gt;GPT-4 is said to be 3× more expensive than GPT-3 &lt;a href=&quot;#5&quot;&gt;[5]&lt;/a&gt;, but GPT-4 Turbo could be only 1.5× more expensive, as it is a compressed model.&lt;/p&gt;

&lt;p&gt;Gemini 1.5 Pro is said to have 200B parameters &lt;a href=&quot;#17&quot;&gt;[17]&lt;/a&gt;, which is of the same order as GPT-3. Using the cost per query for GPT-3, and model energy consumption scaling as the square root of parameter size as in this paper &lt;a href=&quot;#18&quot;&gt;[18]&lt;/a&gt;, we estimate that it is 1.07× more expensive than GPT-3. Some say that it is only a 120B parameter model &lt;a href=&quot;#19&quot;&gt;[19]&lt;/a&gt;; if that is the case, the factor is 0.8×, i.e. slightly cheaper than GPT-3.&lt;/p&gt;
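&lt;p&gt;The square-root scaling used in these estimates can be sketched as follows; GPT-3’s commonly reported size of 175B parameters is assumed as the baseline.&lt;/p&gt;

```python
import math

# Per-query energy relative to GPT-3, assuming energy scales as the
# square root of parameter count (as in reference [18] of the text).
gpt3_params = 175e9  # commonly reported size of GPT-3

def relative_cost(params):
    return math.sqrt(params / gpt3_params)

print(f"Gemini 1.5 Pro at 200B: {relative_cost(200e9):.2f}x")  # 1.07x
print(f"Gemini 1.5 Pro at 120B: {relative_cost(120e9):.2f}x")  # 0.83x
```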

&lt;h2 id=&quot;what-makes-up-the-cost-of-a-query&quot;&gt;What makes up the cost of a query?&lt;/h2&gt;

&lt;p&gt;There are three main components to the cost of running a query:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the capex cost of the servers,&lt;/li&gt;
  &lt;li&gt;the capex cost of the data centre and&lt;/li&gt;
  &lt;li&gt;the running cost of the data centre.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;the-capex-cost-of-the-servers&quot;&gt;The capex cost of the servers&lt;/h3&gt;

&lt;p&gt;For example, an Nvidia DGX-A100 server with eight A100 GPUs &lt;a href=&quot;#9&quot;&gt;[9]&lt;/a&gt; would cost $240k. (As a sanity check: in Feb 2023, SemiAnalysis &lt;a href=&quot;#10&quot;&gt;[10]&lt;/a&gt; quoted $195k for an 8× A100 server.) Running it for a year would cost about $2,400 in electricity (using a power consumption of 4550 W as reported by Nvidia &lt;a href=&quot;#11&quot;&gt;[11]&lt;/a&gt;).
So, for the running cost to exceed the fixed cost, the GPU server would need to run for a hundred years. But the servers will likely be replaced by the next generation GPU, which will arrive after two years, or at best after 5 years, so the hardware cost makes up the majority of the price.&lt;/p&gt;
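&lt;p&gt;A quick sketch of that comparison, using only the figures above:&lt;/p&gt;

```python
# One DGX-A100 server: purchase cost vs yearly electricity cost,
# using the figures above ($240k capex, 4550 W, 6 cents per kWh).
capex_usd = 240_000
power_kw = 4.55
usd_per_kwh = 0.06
hours_per_year = 24 * 365

electricity_per_year = power_kw * hours_per_year * usd_per_kwh
years_to_match_capex = capex_usd / electricity_per_year

print(f"electricity: ${electricity_per_year:,.0f}/year")            # $2,391/year
print(f"capex equals {years_to_match_capex:.0f} years of running")  # 100 years
```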

&lt;h3 id=&quot;the-capex-cost-of-the-data-centre&quot;&gt;The capex cost of the data centre&lt;/h3&gt;

&lt;p&gt;Hyperscale data centres are very expensive to build. The cost for a 60MW data centre that could accommodate 10,000 of the above servers is between $420 and $770M for construction &lt;a href=&quot;#13&quot;&gt;[13]&lt;/a&gt;. Such a data centre has an expected life of 15 to 20 years &lt;a href=&quot;#12&quot;&gt;[12]&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;the-running-cost-of-the-data-centre&quot;&gt;The running cost of the data centre&lt;/h3&gt;

&lt;p&gt;The running cost of the data centre is dominated by the cost of the electricity for running the servers, network and cooling. In a modern data centre, the contribution of the network and cooling is small, certainly less than 10%.&lt;/p&gt;

&lt;p&gt;So let’s consider a 60MW data centre that can host 10,000 servers. As discussed above, running a single server for a year costs $2,400. We assume a conservative model where we replace the servers only after 5 years (usually they are replaced after 3 years). We take the average cost of $595M for construction and a 20-year lifespan. The data centre will not operate at full capacity from the start, so we assume that we start at 1/4 capacity (2,500 servers), and add 1/4 every 5 years for 20 years. With those assumptions, the electricity cost would be $15M/year.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;This assumes that server and electricity costs don’t change much over that period.&lt;/li&gt;
  &lt;li&gt;It is likely that each newer generation of hardware is about twice as energy efficient. If we took that into account, the electricity cost would be only $2.4M/year, or less than 2% of the average cost over 15 years.&lt;/li&gt;
  &lt;li&gt;If the costs for servers and electricity decreased over this period, the relative component of the infrastructure would increase but the electricity cost would still be a small proportion.&lt;/li&gt;
  &lt;li&gt;If servers were replaced more frequently (every two years), then the contribution of the electricity usage to the cost would be even lower.&lt;/li&gt;
&lt;/ul&gt;
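&lt;p&gt;The $15M/year electricity figure follows directly from the phased build-out described above, using the $2,400/year per-server cost:&lt;/p&gt;

```python
# Average electricity cost of the phased build-out: start at 2,500
# servers and add 2,500 every 5 years over a 20-year life.
cost_per_server_year = 2400          # from the per-server estimate above

phases = [2500, 5000, 7500, 10000]   # servers during each 5-year phase
avg_servers = sum(phases) / len(phases)               # 6,250
electricity_per_year = avg_servers * cost_per_server_year
print(electricity_per_year / 1e6)    # 15.0, i.e. $15M/year
```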

&lt;p&gt;(For completeness’ sake, such a 60MW data centre would use about 400M gallons of cooling water per year &lt;a href=&quot;#14&quot;&gt;[14]&lt;/a&gt;, but that would cost only about $1M/year.)&lt;/p&gt;

&lt;h3 id=&quot;overall-costs&quot;&gt;Overall costs&lt;/h3&gt;

&lt;p&gt;On a yearly basis, we have $120M/year for the capex contribution of the servers and $30M/year for the capex contribution of the infrastructure. Consequently, more than 70% of the cost of running a query is the capex contribution of the servers, and the $15M/year for the electricity is less than 10% of the total cost of about $165M/year.&lt;/p&gt;
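&lt;p&gt;The cost shares can be checked directly from the yearly figures:&lt;/p&gt;

```python
# Yearly cost shares implied by the figures above.
servers_capex = 120e6   # capex contribution of the servers
infra_capex = 30e6      # capex contribution of the data centre
electricity = 15e6      # electricity cost

total = servers_capex + infra_capex + electricity     # $165M/year
print(round(servers_capex / total, 2),                # 0.73 (more than 70%)
      round(electricity / total, 2))                  # 0.09 (less than 10%)
```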

&lt;p&gt;What this tells us is that what matters in terms of profit is to optimise the utilisation of those expensive GPUs. So when the cost per query goes down, it is likely the consequence of improved utilisation, which means more users can be supported simultaneously, rather than improved energy efficiency.&lt;/p&gt;

&lt;h2 id=&quot;pricing-versus-energy-cost&quot;&gt;Pricing versus energy cost&lt;/h2&gt;

&lt;p&gt;Let’s consider the pricing for two popular large language models: Google’s Gemini 1.5 Pro and OpenAI’s GPT-4. Both are very recent models and similar in capabilities.&lt;/p&gt;

&lt;h3 id=&quot;gemini-15-pro-pricing&quot;&gt;Gemini 1.5 Pro pricing&lt;/h3&gt;

&lt;p&gt;Generating 10,000 words using Gemini 1.5 Pro (10 RPM) costs ~$0.28 &lt;a href=&quot;#15&quot;&gt;[15]&lt;/a&gt;, and the cost is proportional to the number of generated words (1,000 words is ~$0.028; 100 words is ~$0.003).&lt;/p&gt;

&lt;h3 id=&quot;gpt-4-pricing&quot;&gt;GPT-4 pricing&lt;/h3&gt;

&lt;p&gt;OpenAI charges between $0.030 and $0.120 per 1,000 tokens on GPT-4 &lt;a href=&quot;#16&quot;&gt;[16]&lt;/a&gt; depending on the context length. The $0.030 is for GPT-4 Turbo, which is likely smaller than GPT-4.&lt;/p&gt;

&lt;h3 id=&quot;the-price-is-much-higher-than-the-energy-cost&quot;&gt;The price is much higher than the energy cost&lt;/h3&gt;

&lt;p&gt;From the above data on cost and pricing of the models, we can calculate the price/cost ratios.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;For Gemini 1.5 Pro, the ratio is $0.028 / ($0.0018 × 1.07) = 14.55×; in other words, the price is about 15× higher than the electricity cost. If the model were only 120B parameters, the ratio would be 18.8×.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;For GPT-4, the ratios are {0.030/1.5, 0.060/3, 0.120/3} / 0.0018 = {11.1×, 11.1×, 22.2×}; in other words, the price is 11× to 22× higher than the electricity cost.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
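&lt;p&gt;These ratios can be reproduced from the pricing figures (the $0.0018 per-query energy cost, the 1.07 factor and the 1.5 and 3 divisors are the values used in the calculations above):&lt;/p&gt;

```python
# Price / electricity-cost ratios from the figures in the text.
energy_cost = 0.0018                       # per-query energy cost used above

gemini_ratio = 0.028 / (energy_cost * 1.07)            # ~14.5x
gpt4_ratios = [0.030 / 1.5 / energy_cost,              # 11.1x
               0.060 / 3 / energy_cost,                # 11.1x
               0.120 / 3 / energy_cost]                # 22.2x
print(round(gemini_ratio, 1), [round(r, 1) for r in gpt4_ratios])
```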

&lt;p&gt;These figures are consistent with the relative cost contributions: with the GPU server and data centre capex cost about 10× larger than the electricity cost, the price should indeed be more than 10× that of the electricity consumption.&lt;/p&gt;

&lt;p&gt;As shown above, the electricity consumption does not contribute much to the overall cost. Therefore, Google and OpenAI don’t have a huge incentive to prioritise increasing energy efficiency. The main incentive is to increase utilisation. A higher utilisation means lower energy consumption per query and also a smaller contribution of the capex per query. But it also means a higher overall energy consumption.&lt;/p&gt;

&lt;p&gt;It’s also worth noting that the drop in prices can’t be explained by the utilisation gains or the energy efficiency gains: going from 50% utilisation to 100% would reduce the capex contribution by a factor of two and the energy consumption per query by 30%. And based on the above estimates, none of the models have improved dramatically in terms of energy efficiency. So most of the price drop is due to increased competition.&lt;/p&gt;

&lt;!--
Say load is 50% and idle is 30%., then 100*.5 +30*.5 for .5 = 130 per q but only 65 overall
Say load is 100%, 100*1 for 1 = 100 per q but 100 overall

Suppose we are 20% more energy efficient 

80*.5+30*.5 = 55
80*1+30*0 = 80 
--&gt;
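&lt;p&gt;The utilisation effect can be sketched numerically. The load/idle power split below is an illustrative assumption, not a measured value; with an idle power closer to half of the load power, the per-query reduction approaches the ~30% quoted above:&lt;/p&gt;

```python
# Per-query vs overall energy at different utilisation levels.
# load_power and idle_power are illustrative units, not measurements.
def per_query_energy(util, load_power=100, idle_power=30):
    # Average power drawn over busy and idle periods, divided by the
    # fraction of time spent doing useful work.
    return (load_power * util + idle_power * (1 - util)) / util

half = per_query_energy(0.5)   # 130.0 per query at 50% utilisation
full = per_query_energy(1.0)   # 100.0 per query at 100% utilisation
print(half, full, round(1 - full / half, 2))  # ~23% less energy per query
```

&lt;p&gt;Note that while the energy &lt;em&gt;per query&lt;/em&gt; drops, the &lt;em&gt;overall&lt;/em&gt; energy consumption at full utilisation is higher, as the server is never idle.&lt;/p&gt;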

&lt;h2 id=&quot;a-note-on-the-emissions&quot;&gt;A note on the emissions&lt;/h2&gt;

&lt;p&gt;There are two main components to the emissions for a query: the electricity use and the emissions from manufacturing the server. We have created a detailed life cycle analysis model for the GPU servers in an AI data centre &lt;a href=&quot;#20&quot;&gt;[20]&lt;/a&gt; and calculated the embodied carbon emissions and emissions from use for hardware replacement cycles of 2, 3 and 5 years. The results depend on many assumptions, but the conclusion is robust: embodied carbon from manufacturing the servers will be of the same order as the emissions from running the servers. Replacing the servers sooner by newer hardware does not change the overall picture much.&lt;/p&gt;

&lt;p&gt;This is mainly because of the strong growth in demand for AI data centres, which leads to production of ever increasing amounts of hardware, and the increased energy efficiency of the new hardware does not make up for this growth. I have used a figure of 22% growth per year as per the analysis by McKinsey &lt;a href=&quot;#21&quot;&gt;[21]&lt;/a&gt;. So although the energy efficiency of the hardware increases with every generation, the combination of embodied carbon emissions and emissions from use resulting from the growth in demand results in a huge increase in emissions. For a more detailed discussion on the growth projections of the demand for AI data centres and the concomitant emissions, please read my article “The real problem with the AI hype” &lt;a href=&quot;#22&quot;&gt;[22]&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;For both Gemini 1.5 Pro and GPT-4, we see that their energy consumption is still of the order of GPT-3, and even with current low prices, the price is more than ten times the energy cost. This is because the high cost and relatively short lifetime of the GPU servers makes up most of the total cost of running a query. Of course the argument is that both models are more capable than GPT-3. But the point is that large-scale deployment of these models leads to unacceptably high and rapidly increasing CO₂ emissions.&lt;/p&gt;

&lt;p&gt;From a climate change perspective, energy efficiency gains are only really meaningful if they result in a reduction of the overall emissions. That is clearly not the case. And the low price is likely to make this only worse, as it will drive adoption and further growth in data centres and so increase both embodied carbon and runtime emissions.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;&lt;small&gt;&lt;span id=&quot;1&quot;&gt;[1] &lt;a href=&quot;https://www.unep.org/resources/emissions-gap-report-2024&quot;&gt;&lt;em&gt;“Emissions Gap Report 2024”&lt;/em&gt;, UN Environment Programme, 24 October 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;2&quot;&gt;[2] &lt;a href=&quot;https://simonwillison.net/2024/Dec/31/llms-in-2024/&quot;&gt;&lt;em&gt;“Things we learned about LLMs in 2024”&lt;/em&gt;, Willison S., 31 December 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;5&quot;&gt;[5] &lt;a href=&quot;https://semianalysis.com/2023/07/10/gpt-4-architecture-infrastructure/&quot;&gt;&lt;em&gt;“GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE Demystifying GPT-4: The engineering tradeoffs that led OpenAI to their architecture”&lt;/em&gt;, Patel D. and Wong G., 10 July 2023, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;6&quot;&gt;[6] &lt;a href=&quot;https://h5datacenters.com/cincinnati-data-centre.html&quot;&gt;&lt;em&gt;“Cincinnati I Data Center Attributes”&lt;/em&gt;, H5 Data Centers, 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;7&quot;&gt;[7] &lt;a href=&quot;https://arxiv.org/abs/2407.16894&quot;&gt;&lt;em&gt;“Estimating the Increase in Emissions caused by AI-augmented Search”&lt;/em&gt;, Vanderbauwhede W., 6 January 2025, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;8&quot;&gt;[8] &lt;a href=&quot;https://www.stylefactoryproductions.com/blog/chatgpt-statistics&quot;&gt;&lt;em&gt;“ChatGPT Statistics — The Key Facts and Figures”&lt;/em&gt;, Walsh M., 22 April 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;9&quot;&gt;[9] &lt;a href=&quot;https://bizon-tech.com/bizon-g9000.html#4654:46140;4656:46198;4658:46417;4660:46293;4661:46306&quot;&gt;&lt;em&gt;“BIZON G9000 – 4x 8x NVIDIA A100, H100, H200 Tensor Core AI GPU Server with AMD EPYC, Intel Xeon”&lt;/em&gt;, BIZON, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;10&quot;&gt;[10] &lt;a href=&quot;https://semianalysis.com/2023/02/09/the-inference-cost-of-search-disruption&quot;&gt;&lt;em&gt;“The Inference Cost Of Search Disruption – Large Language Model Cost Analysis $30B Of Google Profit Evaporating Overnight, Performance Improvement With H100 TPUv4 TPUv5”&lt;/em&gt;, Patel D. and Ahmad A., 9 February 2023, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;11&quot;&gt;[11] &lt;a href=&quot;https://developer-blogs.nvidia.com/wp-content/uploads/2024/10/Energy-Efficiency-GTC.pdf&quot;&gt;&lt;em&gt;“Energy and Power Efficiency for Applications on the Latest NVIDIA Technology”&lt;/em&gt;, Gray A., GTC 2024, 20 March 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;12&quot;&gt;[12] &lt;a href=&quot;https://www.datacenterdynamics.com/en/analysis/the-data-center-life-story/&quot;&gt;&lt;em&gt;“The data center life story”&lt;/em&gt;, Judge P., 21 July 2017, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;13&quot;&gt;[13] &lt;a href=&quot;https://dgtlinfra.com/how-much-does-it-cost-to-build-a-data-center/&quot;&gt;&lt;em&gt;“How Much Does it Cost to Build a Data Center?”&lt;/em&gt;, Zhang M., 5 November 2023, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;14&quot;&gt;[14] &lt;a href=&quot;https://eng.ox.ac.uk/case-studies/the-true-cost-of-water-guzzling-data-centres&quot;&gt;&lt;em&gt;“Water-guzzling data centres”&lt;/em&gt;, Ashtine M. and Mytton D., retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;15&quot;&gt;[15] &lt;a href=&quot;https://invertedstone.com/calculators/gemini-pricing/&quot;&gt;&lt;em&gt;“Gemini Pro API Pricing Calculator”&lt;/em&gt;, InvertedStone, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;16&quot;&gt;[16] &lt;a href=&quot;https://help.openai.com/en/articles/7127956-how-much-does-gpt-4-cost&quot;&gt;&lt;em&gt;“How much does GPT-4 cost?”&lt;/em&gt;, OpenAI, 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;17&quot;&gt;[17] &lt;a href=&quot;https://felloai.com/2024/09/google-gemini-pro-1-5-all-you-need-to-know-about-this-near-perfect-ai-model/&quot;&gt;&lt;em&gt;“Google Gemini PRO 1.5: All You Need To Know About This Near Perfect AI Model”&lt;/em&gt;, Shittu H., 9 September 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;18&quot;&gt;[18] &lt;a href=&quot;https://ieeexplore.ieee.org/document/10549890&quot;&gt;&lt;em&gt;“Measuring and Improving the Energy Efficiency of Large Language Models Inference”&lt;/em&gt;, Argerich M. and Patiño-Martínez M., IEEE Access, 2024, Vol. 12, 5 June 2024&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;19&quot;&gt;[19] &lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1cxlsa9/i_read_the_full_gemini_15_may_technical_paper_i/&quot;&gt;&lt;em&gt;“Discussion: Gemini 1.5 May Technical paper”&lt;/em&gt;, 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;20&quot;&gt;[20] &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/low-carbon-computing/src/branch/master/LCA-model-equations/runLCAModel-AI.hs&quot;&gt;&lt;em&gt;“LCA model for servers in a data centre”&lt;/em&gt;, Vanderbauwhede W., 16 January 2025, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;21&quot;&gt;[21] &lt;a href=&quot;https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/ai-power-expanding-data-center-capacity-to-meet-growing-demand&quot;&gt;&lt;em&gt;“AI power: Expanding data center capacity to meet growing demand”&lt;/em&gt;, McKinsey &amp;amp; Company, 29 October 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;22&quot;&gt;[22] &lt;a href=&quot;https://limited.systems/articles/the-real-problem-with-AI/&quot;&gt;&lt;em&gt;“The real problem with the AI hype”&lt;/em&gt;,  Vanderbauwhede, W., 16 January 2025 , retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;/small&gt;&lt;/p&gt;


        </content>
    </entry>
    
    <entry>
        <title>The real problem with the AI hype</title>
        <link href="https://limited.systems/articles/the-real-problem-with-AI/"/>
        <updated>2025-01-16T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/the-real-problem-with-AI</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/the-real-problem-with-AI_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;Climate change is an environmental problem. But our environment is what allows our society to thrive. The damage from climate change is societal and economical as well as ecological. And the only way to minimise this damage is to reduce global CO₂ emissions. Even keeping them at the current level will cause catastrophic warming. All this is explained in detail in the 2024 Emissions Gap Report of the United Nations &lt;a href=&quot;#16&quot;&gt;[16]&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The current push for generative AI is deeply problematic in many ways. In this article I want to focus on the very real environmental damage caused not so much by the technology itself as by the hype surrounding it.&lt;/p&gt;

&lt;p&gt;In short:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The hype creates an expectation of huge growth in demand for AI. Data centre companies have to start building capacity before this demand is realised, even if it never materialises. As a result, data centre capacity is being built up right now at an unprecedented scale.&lt;/li&gt;
  &lt;li&gt;This requires electricity generators to provision capacity for those future data centres, even if they would never be used.&lt;/li&gt;
  &lt;li&gt;Without strong growth in demand, electricity generators would phase out fossil fuel generation because generating electricity from renewable sources is more cost-effective. Because of the AI hype, they are no longer phasing out fossil fuel generation as they want to maximise generation capacity to maximise future profits. New fossil fuel powered electricity plants are being developed as a result &lt;a href=&quot;#1&quot;&gt;[1]&lt;/a&gt; and existing ones are kept open for longer &lt;a href=&quot;#2&quot;&gt;[2]&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;As electricity generators of course want to optimise current profits as well, they want to sell all the electricity they can generate, rather than let plants idle.&lt;/li&gt;
  &lt;li&gt;And so global emissions from electricity generation are not decreasing at all, and are even expected to rise in the near future. This at a time when we need to reduce global emissions urgently and drastically.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;breakdown-of-data-centre-emissions&quot;&gt;Breakdown of data centre emissions&lt;/h2&gt;

&lt;p&gt;The greenhouse gas emissions from a data centre can be broken down into a few main components, based on when and where the emissions are incurred:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Emissions incurred while building the data centre infrastructure (mainly the building itself and the cooling system). We will not discuss this further as it is most likely a relatively small contribution.&lt;/li&gt;
  &lt;li&gt;Emissions incurred while manufacturing the servers used in the data centre. The part of server manufacturing that produces by far the most emissions is the chip production. We will therefore focus on emissions from chip production.&lt;/li&gt;
  &lt;li&gt;Emissions incurred while producing the electricity to run the servers. This is called the carbon intensity of electricity generation. This depends on where the data centre is located. But as data centres are globally distributed, we can consider the global average emissions of electricity production.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We call 1. and 2. the “embodied carbon emissions” and 3. the “emissions from use”.&lt;/p&gt;

&lt;p&gt;I think it is important to note that the global carbon intensity of electricity generation is decreasing. Unfortunately, this does not result in a decrease in emissions: what we see is that renewables are installed &lt;em&gt;in addition&lt;/em&gt; to fossil fuel generation, rather than replacing them.&lt;/p&gt;

&lt;h2 id=&quot;a-rough-estimate-of-the-growth-in-emissions&quot;&gt;A rough estimate of the growth in emissions&lt;/h2&gt;

&lt;p&gt;Let’s first do a rough estimate. I present it mainly to give an idea of the scale of the global data centre power demand and of the chip production for the data centre servers.&lt;/p&gt;

&lt;h3 id=&quot;emissions-from-data-centre-use&quot;&gt;Emissions from data centre use&lt;/h3&gt;

&lt;p&gt;In 2023, the global electricity demand of data centres was 55 GW according to McKinsey &lt;a href=&quot;#3&quot;&gt;[3]&lt;/a&gt;. According to Cushman &amp;amp; Wakefield &lt;a href=&quot;#4&quot;&gt;[4]&lt;/a&gt; it was 17 GW + 6 GW + 11 GW = 34 GW for the Americas, EMEA and APAC combined, with another 49 GW under development. Savills confirm 11 GW for APAC &lt;a href=&quot;#5&quot;&gt;[5]&lt;/a&gt; and 6 GW for EMEA &lt;a href=&quot;#6&quot;&gt;[6]&lt;/a&gt;; an extensive report by Lawrence Berkeley National Laboratory (LBNL) &lt;a href=&quot;#7&quot;&gt;[7]&lt;/a&gt; gives 20 GW for the US. So the figure used by McKinsey seems to be reliable.&lt;/p&gt;

&lt;p&gt;In terms of energy consumption that means 482 TWh/year. Global electricity generation is 30,000 TWh/y according to Our World in Data &lt;a href=&quot;#8&quot;&gt;[8]&lt;/a&gt;.&lt;/p&gt;
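&lt;p&gt;The conversion from power demand to yearly energy consumption is straightforward:&lt;/p&gt;

```python
# Converting the 55 GW demand figure into yearly energy consumption.
demand_gw = 55
hours_per_year = 24 * 365                     # 8,760
twh_per_year = demand_gw * hours_per_year / 1000   # GW x h / 1000 = TWh
print(round(twh_per_year))                    # 482
```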

&lt;h3 id=&quot;embodied-carbon-emissions&quot;&gt;Embodied carbon emissions&lt;/h3&gt;

&lt;p&gt;I estimate the production of chips for data centres to consume 20 TWh/y. This is a very rough approximation, and likely an underestimate: I took the data for TSMC in Taiwan, which produces most of Nvidia’s GPUs but is of course not the only chip producer in the world. The four GigaFabs in Taiwan make most of the GPUs, so I used their consumption as my estimate.&lt;/p&gt;

&lt;p&gt;TSMC reports a total of eight GigaFabs, four of which are in Taiwan &lt;a href=&quot;#9&quot;&gt;[9]&lt;/a&gt;, along with four 8-inch fabs and one 6-inch fab, also in Taiwan; overall, TSMC produced 16 million 12-inch-equivalent wafers in 2023. The combined capacity of the four Taiwanese GigaFabs exceeded 12 million 12-inch wafers in 2023, so they account for about 3/4 of the total production.&lt;/p&gt;

&lt;p&gt;In 2023, TSMC’s fabs consumed 23 TWh. Based on a report by S&amp;amp;P Global &lt;a href=&quot;#10&quot;&gt;[10]&lt;/a&gt; this was 8% of Taiwan’s total electricity production, which was 288 TWh in 2023 &lt;a href=&quot;#11&quot;&gt;[11]&lt;/a&gt;. An article from September 2024 in IEEE Spectrum says it will be 12.5% in 2025 &lt;a href=&quot;#12&quot;&gt;[12]&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;growth-in-overall-data-centre-emissions&quot;&gt;Growth in overall data centre emissions&lt;/h3&gt;

&lt;p&gt;Suppose the demand for AI data centres grows by 10× in ten years. That is about 25% per year and is within McKinsey’s &lt;a href=&quot;#3&quot;&gt;[3]&lt;/a&gt; projections. Assuming a global electricity carbon intensity of 480 gCO₂e/kWh &lt;a href=&quot;#13&quot;&gt;[13]&lt;/a&gt;, or 0.000480 GtCO₂e/TWh, by 2035 the data centres would consume 4800 TWh/y, resulting in 2.3 GtCO₂e of emissions. The chip production would consume 200 TWh/y, adding another 0.1 GtCO₂e. So with this rough estimate, the total data centre emissions resulting from 25% year-on-year growth over 10 years would be 2.4 GtCO₂e.&lt;/p&gt;
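&lt;p&gt;The rough estimate can be summarised as:&lt;/p&gt;

```python
# The rough 10x-growth estimate, using the figures from the text.
dc_twh_2023 = 482            # data centre consumption in 2023, TWh/y
chips_twh_2023 = 20          # chip production for data centres, TWh/y
growth = 10                  # 10x over ten years (~25%/year)
intensity = 0.000480         # GtCO2e per TWh [13]

dc_twh_2035 = dc_twh_2023 * growth        # ~4,800 TWh/y
chips_twh_2035 = chips_twh_2023 * growth  # ~200 TWh/y
total_gt = (dc_twh_2035 + chips_twh_2035) * intensity
print(round(total_gt, 1))                 # 2.4 GtCO2e
```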

&lt;h2 id=&quot;issues-with-this-estimate&quot;&gt;Issues with this estimate&lt;/h2&gt;

&lt;p&gt;There are several oversimplifications in the above estimate. The most important one is that we can expect the electricity carbon intensity to keep decreasing over time. This will also affect the emissions from chip production. Furthermore, the servers are not running at full power all the time. So the actual emissions should be lower than this rough estimate. There are many other factors affecting the estimate, so to do this properly we need a much more refined model.&lt;/p&gt;

&lt;h2 id=&quot;a-better-estimate-of-the-growth-in-emissions&quot;&gt;A better estimate of the growth in emissions&lt;/h2&gt;

&lt;p&gt;We made a model for the evolution of data centre emissions over time, which takes into account both the embodied carbon &lt;a href=&quot;#15&quot;&gt;[15]&lt;/a&gt; and the emissions from use &lt;a href=&quot;#14&quot;&gt;[14]&lt;/a&gt;. 
This model does not take into account the embodied carbon emissions from creating the actual infrastructure (data centres themselves, electricity supplies, networking, roads, water supplies). According to UNEP &lt;a href=&quot;#17&quot;&gt;[17]&lt;/a&gt;, the footprint of the global construction sector was 10 GtCO₂e/y. Data centres are only a small fraction of all global construction, but I could not find reliable data and therefore could not include this contribution in the calculations.&lt;/p&gt;

&lt;p&gt;Using our model, we arrive at an electricity consumption of 3330 TWh/y for the data centres by 2035. This is of the same order as the rough estimate, but as expected somewhat lower.&lt;/p&gt;

&lt;p&gt;The embodied carbon (which we modelled above via the estimated share of TSMC’s electricity) is considerably higher. The overall figure is 2.7 GtCO₂e of additional emissions (rather than the 2.4 GtCO₂e from our rough model; but rough as it was, this is quite close).&lt;/p&gt;

&lt;p&gt;This is quite problematic: the total global CO₂ budget for 2035 is 22 GtCO₂e/y according to the UNEP Emissions Gap Report 2024 &lt;a href=&quot;#16&quot;&gt;[16]&lt;/a&gt;. Global emissions from electricity generation were 14 GtCO₂e in 2023 and projected to rise to 15 GtCO₂e by 2035 without the growth in AI.&lt;/p&gt;

&lt;p&gt;And if this trend were to persist, then after 20 years of 19% year-on-year growth (McKinsey’s lowest estimate), the additional emissions would be almost 20 GtCO₂e/y.&lt;/p&gt;

&lt;h2 id=&quot;what-about-a-hundred-times-growth&quot;&gt;What about a hundred times growth?&lt;/h2&gt;

&lt;p&gt;If you think that the above growth rates sound crazy, Dell’s CEO has said that data centre capacity will increase by 100× over the next 10 years &lt;a href=&quot;#18&quot;&gt;[18]&lt;/a&gt;. And OpenAI’s Altman has said &lt;a href=&quot;#19&quot;&gt;[19]&lt;/a&gt; that the world needs 100× more semiconductor production capacity, which amounts to the same.&lt;/p&gt;

&lt;p&gt;Needless to say, 100× growth would be a disaster: even if it happened in 20 years rather than in Michael Dell’s 10 years, it would mean we end up at 58,000 TWh/y just for the data centres, and additional emissions of 31 GtCO₂e/y against a global CO₂ budget of only 10 GtCO₂e/y &lt;a href=&quot;#16&quot;&gt;[16]&lt;/a&gt; by 2045. In other words, purely the emissions from making and running the servers would exceed the global emissions budget for meeting the climate targets.&lt;/p&gt;

&lt;h2 id=&quot;the-hype-on-its-own-is-the-real-problem&quot;&gt;The hype on its own is the real problem&lt;/h2&gt;

&lt;p&gt;To come back to my original premise: even if the growth in AI never materialises, the hype has set in motion a chain of events which, if allowed to go unchecked, can only lead to a rise in emissions.&lt;/p&gt;

&lt;p&gt;Once the extra electricity generation capacity has been created, generators will want to sell that electricity and therefore push hard to increase consumption. They will feel they have little choice, as they need at least to recoup their investment.&lt;/p&gt;

&lt;p&gt;Data centre operators also want to make a profit, or at least not a loss, so even if AI would die an ignoble death, they will try to find new workloads, and again push at consumers to use those new services.&lt;/p&gt;

&lt;p&gt;In this way the AI hype leads to increased emissions, even if there were no growth in AI workloads. And this at a time when we need to reduce global emissions urgently and drastically. Therefore &lt;em&gt;any&lt;/em&gt; source of considerable additional emissions is problematic.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: the banner picture is&lt;/em&gt; not &lt;em&gt;AI-generated&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;&lt;small&gt;
&lt;span id=&quot;1&quot;&gt;[1] &lt;a href=&quot;https://www.ft.com/content/63c3ceb2-5e30-44f4-bd39-cb40edafa4f8&quot;&gt;&lt;em&gt;“AI set to fuel surge in new US gas power plants”&lt;/em&gt;, Financial Times, January 2025, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;2&quot;&gt;[2] &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/iea-global-coal-power-use-reaches-all-time-high-driven-by-increased-electricity-demand/&quot;&gt;&lt;em&gt;“IEA: Global coal power use reaches all time high, driven by increased electricity demand”&lt;/em&gt;, Skidmore Z., 19 December 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;3&quot;&gt;[3] &lt;a href=&quot;https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/ai-power-expanding-data-center-capacity-to-meet-growing-demand&quot;&gt;&lt;em&gt;“AI power: Expanding data center capacity to meet growing demand”&lt;/em&gt;,  McKinsey &amp;amp; Company, 29 October 2024 , retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;4&quot;&gt;[4] &lt;a href=&quot;https://cushwake.cld.bz/K5Eiws&quot;&gt;&lt;em&gt;“Global Data Center Market Comparison”&lt;/em&gt;, Cushman &amp;amp; Wakefield, 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;5&quot;&gt;[5] &lt;a href=&quot;https://pdf.savills.asia/asia-pacific-research/asia-pacific-research/ap-data-centre-spotlight-05-2024.pdf&quot;&gt;&lt;em&gt;“Asia Pacific Data Centres”&lt;/em&gt;, Savills, May 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;6&quot;&gt;[6] &lt;a href=&quot;https://www.savills.co.uk/insight-and-opinion/savills-news/362723-0/european-data-centre-power-capacity-projected-to-rise-to-approximately-13-100-mw-by-2027&quot;&gt;&lt;em&gt;“European data centre power capacity projected to rise to approximately 13,100 MW by 2027”&lt;/em&gt;, Savills, 30 May 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;7&quot;&gt;[7] &lt;a href=&quot;https://eta-publications.lbl.gov/sites/default/files/2024-12/lbnl-2024-united-states-data-center-energy-usage-report.pdf&quot;&gt;&lt;em&gt;“2024 United States Data Center Energy Usage Report”&lt;/em&gt;, Shehabi et al., December 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;8&quot;&gt;[8] &lt;a href=&quot;https://ourworldindata.org/grapher/electricity-prod-source-stacked?tab=table&quot;&gt;&lt;em&gt;“Electricity production by source”&lt;/em&gt;, Our World in Data, 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;9&quot;&gt;[9] &lt;a href=&quot;https://www.tsmc.com/english/aboutTSMC/TSMC_Fabs&quot;&gt;&lt;em&gt;“TSMC Fabs”&lt;/em&gt;, TSMC, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;10&quot;&gt;[10] &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/tsmc-could-account-for-24-of-taiwans-electricity-consumption-by-2030/&quot;&gt;&lt;em&gt;“TSMC could account for 24% of Taiwan’s electricity consumption by 2030”&lt;/em&gt;, Trueman C., 7 October 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;11&quot;&gt;[11] &lt;a href=&quot;https://www.ceicdata.com/en/indicator/taiwan/electricity-production&quot;&gt;&lt;em&gt;“Taiwan Electricity Production”&lt;/em&gt;, CEIC, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;12&quot;&gt;[12] &lt;a href=&quot;https://spectrum.ieee.org/taiwan-semiconductor&quot;&gt;&lt;em&gt;“TSMC’s Energy Demand Drives Taiwan’s Geopolitical Future”&lt;/em&gt;, Fairley P., IEEE Spectrum, 3 September 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;13&quot;&gt;[13] &lt;a href=&quot;https://ourworldindata.org/grapher/carbon-intensity-electricity?tab=chart&amp;amp;country=EU-27~EU~G20+%28Ember%29~OWID_WRL&quot;&gt;&lt;em&gt;“Carbon intensity of electricity generation, 2000 to 2023”&lt;/em&gt;, Our World in Data, 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;14&quot;&gt;[14] &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/low-carbon-computing/src/branch/master/LCA-model-equations/runLCAModel-AI.hs&quot;&gt;“LCA model for servers in a data centre”, Vanderbauwhede W., 16 January 2025, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;15&quot;&gt;[15] &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/low-carbon-computing/src/branch/master/LCA-model-equations/calculateServerEmbodiedCarbon-DGX-A100.hs&quot;&gt;&lt;em&gt;“Server embodied carbon model”&lt;/em&gt;, Vanderbauwhede W., 17 January 2025, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;16&quot;&gt;[16] &lt;a href=&quot;https://www.unep.org/resources/emissions-gap-report-2024&quot;&gt;&lt;em&gt;“Emissions Gap Report 2024”&lt;/em&gt;, UN Environment Programme, 24 October 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;17&quot;&gt;[17] &lt;a href=&quot;https://www.unep.org/resources/publication/2022-global-status-report-buildings-and-construction&quot;&gt;&lt;em&gt;“2022 Global Status Report for Buildings and Construction”&lt;/em&gt;, UN Environment Programme, 9 November 2022, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;18&quot;&gt;[18] &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/michael-dell-ai-to-drive-data-center-demand-up-100-fold-over-next-10-years/&quot;&gt;&lt;em&gt;“Michael Dell: AI to drive data center demand up 100x over next 10 years”&lt;/em&gt;, Yadav N., 18 March 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id=&quot;19&quot;&gt;[19] &lt;a href=&quot;https://limited.systems/articles/the-insatiable-hunger-of-openai/&quot;&gt;&lt;em&gt;“The insatiable hunger of (Open)AI”&lt;/em&gt;, Vanderbauwhede W., 10 March 2024, retrieved 17 January 2025&lt;/a&gt;&lt;/span&gt;&lt;/small&gt;&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>Emissions from ChatGPT are much higher than from conventional search (updated)</title>
        <link href="https://limited.systems/articles/google-search-vs-chatgpt-emissions/"/>
        <updated>2025-01-06T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/google-search-vs-chatgpt-emissions</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/google-search-vs-chatgpt-emissions_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;Chat-assisted search is one of the key applications for ChatGPT. To illustrate the impact of ChatGPT-style augmented search queries more clearly, I compare the energy consumption and emissions of a ChatGPT-style query with those of a conventional Google-style search query. If all search queries are replaced by ChatGPT-style queries, what does that mean for energy consumption and emissions?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;tl;dr: Emissions would increase by 60x for a GPT-3-style model of around 175B parameters; for a GPT-4-style model, it could be 200x.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In &lt;a href=&quot;https://limited.systems/articles/climate-cost-of-ai-revolution&quot;&gt;a previous post&lt;/a&gt; I wrote about the potential climate impact from widespread adoption of ChatGPT-style Large Language Models. My projections are in line with those made by de Vries in his recent article &lt;a href=&quot;#5&quot;&gt;[5]&lt;/a&gt;. In this post, I look in more detail at the increase in energy consumption from using ChatGPT for search tasks. A more detailed analysis is available &lt;a href=&quot;https://arxiv.org/abs/2407.16894&quot;&gt;as a preprint paper&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;update-2025-01-06&quot;&gt;Update 2025-01-06&lt;/h2&gt;

&lt;p&gt;I originally published the estimates below on 2023-11-17, and they are for GPT-3. According to &lt;a href=&quot;https://www.latent.space/p/geohot&quot;&gt;George Hotz&lt;/a&gt;, GPT-4 consists of 8 instances of 220B-parameter GPT-3-style models; &lt;a href=&quot;https://semianalysis.com/2023/07/10/gpt-4-architecture-infrastructure/&quot;&gt;Dylan Patel
and Gerald Wong&lt;/a&gt; claim it is 16 instances of about ~111B each, which is roughly the same in total. But they also state that only 2 of these are routed, so inference is performed on two 111B models. Furthermore, there are ~55B shared parameters for attention, giving about 280B parameters in total. For comparison, GPT-3.5 is a single 175B-parameter model. According to Patel and Wong, the inference cost of GPT-4 is 3x that of GPT-3.5, which is much more than the 1.6x increase in parameters would suggest. As energy consumption dominates the cost of inference, this means that GPT-4 likely consumes up to 3x more energy than GPT-3.5, and consequently a search query with a GPT-4-generated AI summary might have emissions about 200x larger than a search query without one.&lt;/p&gt;
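&lt;p&gt;As a back-of-the-envelope check, the parameter and cost arithmetic above can be sketched as follows (all figures are the third-party estimates quoted above, not measurements, and the variable names are mine):&lt;/p&gt;

```python
# Sketch of the GPT-4 vs GPT-3.5 arithmetic described above.
# All numbers are estimates quoted from Hotz / Patel and Wong.

routed_experts = 2           # experts active per query, per Patel and Wong
params_per_expert = 111e9    # ~111B parameters per expert
shared_attention = 55e9      # ~55B shared attention parameters

gpt4_active_params = routed_experts * params_per_expert + shared_attention
gpt35_params = 175e9         # GPT-3.5 is a single 175B-parameter model

param_ratio = gpt4_active_params / gpt35_params   # ~1.6x
cost_ratio = 3                                    # claimed inference cost ratio

# If energy scales with inference cost, the ~60x per-query emissions
# increase estimated below becomes ~3x larger, i.e. on the order of 200x.
emissions_increase = 60 * cost_ratio

print(round(gpt4_active_params / 1e9))  # ~277 (about 280B)
print(round(param_ratio, 1))            # ~1.6
```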

&lt;h2 id=&quot;google-search-energy-and-emissions&quot;&gt;Google search energy and emissions&lt;/h2&gt;

&lt;p&gt;In 2009, The Guardian published &lt;a href=&quot;https://www.theguardian.com/environment/ethicallivingblog/2009/jan/12/carbon-emissions-google&quot;&gt;an article about the carbon cost of Google search&lt;/a&gt;. Google &lt;a href=&quot;https://googleblog.blogspot.com/2009/01/powering-google-search.html&quot;&gt;posted a rebuttal on their blog to the claim that every search emits 7 g of CO₂&lt;/a&gt;. They claimed that, in 2009, the energy cost was 0.0003 kWh per search, or about 1 kJ. That corresponded to 0.2 g CO₂, which I think was indeed the more accurate estimate.&lt;/p&gt;

&lt;p&gt;This number is still often cited, but it is entirely outdated. In the meantime, &lt;a href=&quot;https://www.science.org/doi/abs/10.1126/science.aba3758&quot;&gt;computing efficiency has increased rapidly&lt;/a&gt;: Power Usage Effectiveness (PUE, a metric for the overhead of the data centre infrastructure) dropped by 25% from 2010 to 2018; server energy intensity dropped by a factor of four; the average number of servers per workload dropped by a factor of five; and average storage drive energy use per TB dropped by almost a factor of ten. Google has released &lt;a href=&quot;https://www.google.co.uk/about/datacenters/efficiency/&quot;&gt;some figures about their data centre efficiency&lt;/a&gt; that are in line with these broad trends. Interestingly, their PUE has not improved much in the last decade.&lt;/p&gt;

&lt;p&gt;Therefore, with the ChatGPT hype, I wanted to revise that figure from 2009. Three things have changed: the carbon intensity of electricity generation has dropped &lt;a href=&quot;#11&quot;&gt;[11]&lt;/a&gt;, server energy efficiency has increased considerably &lt;a href=&quot;#9&quot;&gt;[9]&lt;/a&gt;, and the PUE of data centres has improved &lt;a href=&quot;#10&quot;&gt;[10]&lt;/a&gt;. Combining all that, my new estimates for the energy consumption and carbon footprint of a Google search are 0.00004 kWh and 0.02 g CO₂ (using the carbon intensity for the US).
According to Masanet’s peer-reviewed article &lt;a href=&quot;#9&quot;&gt;[9]&lt;/a&gt;, hardware efficiency increased by a factor of 4.17 from 2010 to 2018. This growth follows a power law, so extrapolating it to 12 years gives 6.70x. I use 12 years rather than the 14 since 2009 because servers typically have a life of 4 years, so the most likely assumption is that the current servers are two years old, i.e. they have the efficiency of 2021.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PUE: 1.16 in 2010; 1.1 in 2023;
efficiency increase of hardware in 12 years: 6.70x
US overall carbon intensity: 367 gCO₂/kWh

0.0003*(1.1/1.16)*(1/6.70) = 0.0000424 kWh per search
0.0000424*367 = 0.02 g CO₂ per search
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So the energy consumption per conventional search query has dropped by 7x in 14 years. There is considerable uncertainty on this estimate, but it is conservative: the drop will not be less than 7x, and could be as much as 10x. Microsoft has not published similar figures, but there is no reason to assume that their trend would be different; in fact, their use of FPGAs should in principle lead to lower energy consumption per query. Over that same period, carbon emissions per search have dropped about 10x because of the decrease in the carbon intensity of electricity.&lt;/p&gt;

&lt;h2 id=&quot;chatgpt-energy-consumption-per-query&quot;&gt;ChatGPT energy consumption per query&lt;/h2&gt;

&lt;p&gt;There are several estimates of the energy consumption per query for ChatGPT. I have summarised the ones that I used in the following table. There are many more; these are the top-ranked ones in a conventional search ☺.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Ref&lt;/th&gt;&lt;th&gt;Estimate (kWh/query)&lt;/th&gt;&lt;th&gt;Increase vs Google search&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;!-- &lt;tr&gt;&lt;td&gt;&lt;a href=&quot;#1&quot;&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;0.00297&lt;/td&gt;&lt;td&gt;42x&lt;/td&gt; &lt;/tr&gt; --&gt;
&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;#1&quot;&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;0.001 - 0.01&lt;/td&gt; &lt;td&gt;24x - 236x&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;#2&quot;&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;0.0017 - 0.0026&lt;/td&gt;&lt;td&gt;40x - 61x&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;#3&quot;&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;0.0068&lt;/td&gt;&lt;td&gt;160x&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;#4&quot;&gt;[4]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;0.0012&lt;/td&gt;&lt;td&gt;28x&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;#5&quot;&gt;[5]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;0.0029&lt;/td&gt;&lt;td&gt;68x&lt;/td&gt; &lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Reference &lt;a href=&quot;#5&quot;&gt;[5]&lt;/a&gt; is the peer-reviewed article by Alex de Vries. It uses the estimates from &lt;a href=&quot;#6&quot;&gt;[6]&lt;/a&gt; for energy consumption but does not present a per-query value so I used the query estimate from &lt;a href=&quot;#6&quot;&gt;[6]&lt;/a&gt;. Overall, the estimates lie between 24x and 236x (from [1], which is a collation of estimates from Reddit and therefore very broad) or 28x to 160x (all other sources).&lt;/p&gt;

&lt;p&gt;I consider any estimate lower than 0.002 kWh/query overly optimistic and any estimate higher than 0.005 kWh/query overly pessimistic. However, rather than judging, I calculated the mean over all these estimates. I used four types of means. Typically, an ordinary average gives more weight to large numbers; a harmonic mean gives more weight to small numbers. Given the nature of the data, I think the geometric mean is the best estimate:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Type of Mean&lt;/th&gt;&lt;th&gt;Mean increase &lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Average&lt;/td&gt;&lt;td&gt;88&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Median&lt;/td&gt;&lt;td&gt;61&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Geometric mean&lt;/td&gt;&lt;td&gt;63&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Harmonic mean&lt;/td&gt;&lt;td&gt;48&lt;/td&gt; &lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
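&lt;p&gt;For reference, these means can be reproduced from the per-query estimates in the first table. A minimal sketch follows; the input list is my reading of the sources above, with each range contributing both endpoints, and the results may differ by ±1 from the table depending on how the inputs were rounded:&lt;/p&gt;

```python
from statistics import mean, median, geometric_mean, harmonic_mean

# kWh per conventional Google search, as estimated earlier in the article.
baseline_kwh = 0.0000424
# Per-query kWh estimates from refs [1]-[5]; ranges give two entries each.
chatgpt_kwh = [0.001, 0.01, 0.0017, 0.0026, 0.0068, 0.0012, 0.0029]

# Increase factor of each estimate relative to a conventional search.
increases = [e / baseline_kwh for e in chatgpt_kwh]

avg = mean(increases)             # ~88
med = median(increases)           # ~61
geo = geometric_mean(increases)   # ~64
har = harmonic_mean(increases)    # ~49
```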

&lt;p&gt;As you can see, there is not that much difference between the geometric mean and the median. So we can conclude that ChatGPT consumes between fifty and ninety times more energy per query than a conventional (“Google”) search, with sixty times being the most likely estimate.&lt;/p&gt;

&lt;h2 id=&quot;other-factors-contributing-to-emissions&quot;&gt;Other factors contributing to emissions&lt;/h2&gt;

&lt;h3 id=&quot;training&quot;&gt;Training&lt;/h3&gt;

&lt;p&gt;Contrary to popular belief, it is the use of ChatGPT, not its training, that dominates emissions. I wrote about this in &lt;a href=&quot;https://limited.systems/articles/climate-cost-of-ai-revolution&quot;&gt;my previous post&lt;/a&gt;. In the initial phase of adoption, with low numbers of users, emissions from training are not negligible, but I assume the scenario where conventional search is replaced by ChatGPT-style queries, and in that case emissions from training are only a small fraction. How much is hard to say, as we don’t know how frequently the model gets retrained or what the emissions of retraining are; they are almost certainly much lower, as the changes in the corpus are small, so it is closer to fine-tuning than to full training.&lt;/p&gt;

&lt;h3 id=&quot;data-centre-efficiency&quot;&gt;Data centre efficiency&lt;/h3&gt;

&lt;p&gt;As far as I can tell, PUE is not taken into account in the above estimates. For a typical hyperscale data centre, it is around 1.1.&lt;/p&gt;

&lt;h3 id=&quot;embodied-carbon&quot;&gt;Embodied carbon&lt;/h3&gt;

&lt;p&gt;Neither the Google search estimate nor the ChatGPT query estimates include embodied carbon. The embodied carbon can be anywhere between 20% and 50% of the emissions from use, depending on many factors. My best guess is that the embodied emissions are proportional to the energy consumption, so this would not affect the factor much.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Taking all this into account, it is possible that the emissions from a ChatGPT query are more than a hundred times those of a conventional search query. But as I don’t have enough data to back this up, I will keep the conservative estimates from above (50x - 90x; 60x most likely).&lt;/p&gt;

&lt;p&gt;Now, if we want sustainable ICT, the sector as a whole needs to reduce its emissions to a quarter of current levels by 2040. The combined increase in energy use per query and the growth in adoption of ChatGPT-like applications is therefore deeply problematic.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;h3 id=&quot;google-search-energy-consumption-estimates&quot;&gt;Google Search energy consumption estimates&lt;/h3&gt;

&lt;p&gt;&lt;a name=&quot;7&quot;&gt;[7]&lt;/a&gt; &lt;a href=&quot;https://www.theguardian.com/environment/ethicallivingblog/2009/jan/12/carbon-emissions-google&quot;&gt;“The carbon cost of Googling”, Leo Hickman, 2009, The Guardian&lt;/a&gt;&lt;br /&gt;
&lt;a name=&quot;8&quot;&gt;[8]&lt;/a&gt; &lt;a href=&quot;https://googleblog.blogspot.com/2009/01/powering-google-search.html&quot;&gt;“Powering a Google search”, Google, 2009&lt;/a&gt;&lt;br /&gt;
&lt;a name=&quot;9&quot;&gt;[9]&lt;/a&gt; &lt;a href=&quot;https://www.science.org/doi/abs/10.1126/science.aba3758&quot;&gt;“Recalibrating global data center energy-use estimates”, Eric Masanet et al, 2020&lt;/a&gt;&lt;br /&gt;
&lt;a name=&quot;10&quot;&gt;[10]&lt;/a&gt; &lt;a href=&quot;https://www.google.co.uk/about/datacenters/efficiency/&quot;&gt;“Data Centers: Efficiency”, Google, 2023&lt;/a&gt;&lt;br /&gt;
&lt;a name=&quot;11&quot;&gt;[11]&lt;/a&gt; &lt;a href=&quot;https://ourworldindata.org/grapher/carbon-intensity-electricity&quot;&gt;“Carbon intensity of electricity, 2022”, Our World in Data, 2023&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;chatgpt-energy-consumption-estimates&quot;&gt;ChatGPT energy consumption estimates&lt;/h3&gt;

&lt;!-- &lt;a name=&quot;1&quot;&gt;[1]&lt;/a&gt; [&quot;ChatGPT’s Electricity Consumption&quot;, Kasper Groes Albin Ludvigsen, 2023, Towards Data Science](https://towardsdatascience.com/chatgpts-electricity-consumption-7873483feac4) --&gt;
&lt;p&gt;&lt;a name=&quot;1&quot;&gt;[1]&lt;/a&gt; &lt;a href=&quot;https://lifestyle.livemint.com/news/big-story/ai-carbon-footprint-openai-chatgpt-water-google-microsoft-111697802189371.html&quot;&gt;“AI and its carbon footprint: How much water does ChatGPT consume?”, Nitin Sreedhar, 2023, lifestyle.livemint.com&lt;/a&gt;&lt;br /&gt;
&lt;a name=&quot;2&quot;&gt;[2]&lt;/a&gt; &lt;a href=&quot;https://towardsdatascience.com/chatgpts-energy-use-per-query-9383b8654487&quot;&gt;“ChatGPT’s energy use per query”, Kasper Groes Albin Ludvigsen, 2023, Towards Data Science&lt;/a&gt;&lt;br /&gt;
&lt;a name=&quot;3&quot;&gt;[3]&lt;/a&gt; &lt;a href=&quot;https://medium.com/@zodhyatech/how-much-energy-does-chatgpt-consume-4cba1a7aef85&quot;&gt;“How much energy does ChatGPT consume?”, Zodhya, 2023, medium.com&lt;/a&gt;&lt;br /&gt;
&lt;a name=&quot;4&quot;&gt;[4]&lt;/a&gt; &lt;a href=&quot;https://medium.com/@chrispointon/the-carbon-footprint-of-chatgpt-e1bc14e4cc2a&quot;&gt;“The carbon footprint of ChatGPT”, Chris Pointon, 2023, medium.com&lt;/a&gt;&lt;br /&gt;
&lt;a name=&quot;5&quot;&gt;[5]&lt;/a&gt; &lt;a href=&quot;https://www.cell.com/joule/fulltext/S2542-4351(23)00365-3&quot;&gt;“The growing energy footprint of artificial intelligence”, Alex de Vries, 2023, Joule&lt;/a&gt;&lt;br /&gt;
&lt;a name=&quot;6&quot;&gt;[6]&lt;/a&gt; &lt;a href=&quot;https://www.semianalysis.com/p/the-inference-cost-of-search-disruption&quot;&gt;“The Inference Cost Of Search Disruption – Large Language Model Cost Analysis”, Dylan Patel and Afzal Ahmad, 2023, SemiAnalysis&lt;/a&gt;&lt;/p&gt;


        </content>
    </entry>
    
    <entry>
        <title>FORTRAN, an “infantile disorder”?</title>
        <link href="https://limited.systems/articles/dijkstra-fortran/"/>
        <updated>2024-11-23T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/dijkstra-fortran</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/dijkstra-fortran_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;A few notes on what the computing scientist Edsger Dijkstra said and didn’t say about Fortran, and the origin of that term “infantile disorder”.&lt;/p&gt;

&lt;h2 id=&quot;edsger-dijkstra&quot;&gt;Edsger Dijkstra&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://inference-review.com/article/the-man-who-carried-computer-science-on-his-shoulders&quot;&gt;Edsger Dijkstra&lt;/a&gt; was a Dutch computing scientist who helped shape the field in the 1960s-1980s and made important contributions to networking, concurrency and structured programming. He is probably best known for his strong objection to GOTO statements.&lt;/p&gt;

&lt;h2 id=&quot;what-dijkstra-said-and-didnt&quot;&gt;What Dijkstra said and didn’t&lt;/h2&gt;

&lt;p&gt;In an article about &lt;a href=&quot;https://time.com/69316/basic/&quot;&gt;Fifty years of BASIC&lt;/a&gt; ¹ I came across a claim about something Dijkstra had allegedly said:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;He also spewed bile in the direction of FORTRAN (an “infantile disorder”), PL/1 (“fatal disease”) and COBOL (“criminal offense”).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, in the article referenced to back up this claim, &lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/947923.947924&quot;&gt;“How do we tell truths that might hurt?” (1975)&lt;/a&gt;, Dijkstra writes:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;FORTRAN, “the infantile disorder”, by now nearly 20 years old, is hopelessly inadequate for whatever computer application you have in mind today: it is now too clumsy, too risky, and too expensive to use.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;PL/I –“the fatal disease”– belongs more to the problem set than to the solution set.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Note the quotes, which indicate that it was someone else who called FORTRAN that. In &lt;a href=&quot;https://www.cs.utexas.edu/~EWD/transcriptions/EWD03xx/EWD340.html&quot;&gt;The Humble Programmer (1972)&lt;/a&gt; he says:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;When FORTRAN has been called an infantile disorder, full PL/1, with its growth characteristics of a dangerous tumor, could turn out to be a fatal disease.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So there, too, Dijkstra indicates that someone else called it that; I will come back to this later. His own thoughts on FORTRAN are quite a bit more nuanced:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The second major development on the software scene that I would like to mention is the birth of FORTRAN. At that time this was a project of great temerity and the people responsible for it deserve our great admiration. It would be absolutely unfair to blame them for shortcomings that only became apparent after a decade or so of extensive usage: groups with a successful look-ahead of ten years are quite rare! In retrospect we must rate FORTRAN as a successful coding technique, but with very few effective aids to conception, aids which are now so urgently needed that time has come to consider it out of date.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Where Dijkstra went wrong in my opinion is in how he followed on from that:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The sooner we can forget that FORTRAN has ever existed, the better, for as a vehicle of thought it is no longer adequate: it wastes our brainpower, is too risky and therefore too expensive to use. FORTRAN’s tragic fate has been its wide acceptance, mentally chaining thousands and thousands of programmers to our past mistakes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because what effectively happened was that by 1977, the FORTRAN 77 standard had incorporated basic tenets of structured programming (block IF/ELSE and other structured control constructs), and FORTRAN therefore effectively became a structured programming language, or at least a language that allowed structured programming while not enforcing it.&lt;/p&gt;

&lt;p&gt;This was most likely a consequence of Dijkstra’s own efforts, and as a result we have a programming language that has never been forgotten but has instead grown to incorporate more and more modern features. Yet it still has that bad reputation among “serious” computing scientists.&lt;/p&gt;

&lt;p&gt;When Dijkstra calls FORTRAN “too risky” to use, it is hard to know which precise risks he had in mind, but presumably a key risk was that in early FORTRAN, every loop was effectively a labeled jump rather than a proper control structure, and certain forms of selection were thinly disguised conditional jumps. Dijkstra held (see e.g. &lt;a href=&quot;https://dl.acm.org/doi/10.1145/362929.362947&quot;&gt;his famous “GOTO” article&lt;/a&gt;) that no loop should have an early exit.&lt;/p&gt;

&lt;p&gt;Another obvious risk is that even FORTRAN 77 is not type safe. However, Fortran 90 programs that disallow certain legacy FORTRAN 77 constructs are actually type safe.&lt;/p&gt;

&lt;p&gt;And in both cases, it is possible to use source-to-source compilation to &lt;a href=&quot;https://link.springer.com/article/10.1007/s11227-021-03839-9&quot;&gt;convert the risky code into safe code&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;the-infantile-disorder&quot;&gt;The “Infantile Disorder”&lt;/h2&gt;

&lt;p&gt;The term “infantile disorder” is not a medical term and was most likely borrowed from the famous work by Lenin, &lt;a href=&quot;https://www.marxists.org/archive/lenin/works/1920/lwc/&quot;&gt;“Left-Wing” Communism: An Infantile Disorder&lt;/a&gt;. But it should be noted that the original term in Russian, Детская болезнь (Detskaya Bolezn), means something closer to “childhood ailment”; in Dutch, Dijkstra’s mother tongue, it would be “kinderziekte”, which is a lot less dramatic. In the first English translation, the term in the title was “infantile sickness”.&lt;/p&gt;

&lt;p&gt;And indeed, that was Lenin’s view: “Left-Wing” Communism was merely a childhood ailment, easy to cure, and one which, when cured, would result in a more robust organism:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;There is therefore nothing surprising, new, or terrible in the “infantile disorder” of “Left-wing communism” among the Germans. The ailment involves no danger, and after it the organism even becomes more robust. 
(Ch. 5)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;the error consisted in numerous manifestations of that “Leftwing” infantile disorder which has now come to the surface and will consequently be cured the more thoroughly, the more rapidly and with greater advantage to the organism. 
(Ch. 8)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;Of course, the mistake of Left doctrinairism in communism is at present a thousand times less dangerous and less significant than that of Right doctrinairism (i.e., social-chauvinism and Kautskyism); but, after all, that is only due to the fact that Left communism is a very young trend, is only just coming into being. It is only for this reason that, under certain conditions, the disease can be easily eradicated, and we must set to work with the utmost energy to eradicate it. 
(Ch. 10)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is probably a stretch to extend this to the case of FORTRAN, but when we do, we could consider the state of FORTRAN before its standardisation in 1977 as the childhood with its ailments; the standardisation process as the cure; and the subsequent revisions as the growth into the robust and versatile programming language that Fortran is today, including &lt;a href=&quot;https://wg5-fortran.org/N1801-N1850/N1824.pdf&quot;&gt;co-arrays&lt;/a&gt;, &lt;a href=&quot;https://epubs.stfc.ac.uk/manifestation/2080/oo_fortran.pdf&quot;&gt;OOP support&lt;/a&gt;, and &lt;a href=&quot;https://github.com/wavebitscientific/functional-fortran&quot;&gt;many functional programming features&lt;/a&gt;.&lt;/p&gt;

&lt;!-- and even [lambdas](https://flibs.sourceforge.net/lambda_expressions.pdf) . --&gt;

&lt;p&gt;&lt;small&gt;¹ I could say a lot about BASIC too; it was the first language I ever learned, thanks to the brilliant TI-99/4A user manual. I used various dialects over the years, up to Visual Basic. Dijkstra said that “It is practically impossible to teach good programming to students that have had a prior exposure to BASIC”. Luckily, nobody taught me how to program.&lt;/small&gt;&lt;/p&gt;
&lt;hr /&gt;

&lt;p&gt;&lt;em&gt;The banner picture shows the K Computer Mae Station (京コンピュータ前駅) in Kobe, Japan. The K Computer was the most powerful supercomputer in 2011, and its dominant workload until it was decommissioned in 2019 was Fortran.&lt;/em&gt;&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>Decarbonising the Computing Science curriculum</title>
        <link href="https://limited.systems/articles/sustainability-in-cs-education/"/>
        <updated>2024-04-17T00:00:00+01:00</updated>
        <id>https://limited.systems/articles/sustainability-in-cs-education</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/sustainability-in-cs-education_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;In this post I’d like to explain how we have embedded sustainability and decarbonisation learning outcomes in our undergraduate Computing Science and Software Engineering degree programmes.&lt;/p&gt;

&lt;h2 id=&quot;the-ambitious-goal&quot;&gt;The ambitious goal&lt;/h2&gt;

&lt;p&gt;I am a professor in the School of Computing Science of the University of Glasgow, in Scotland. In 2021, I started the &lt;a href=&quot;https://www.gla.ac.uk/schools/computing/research/researchthemes/lowcarbon/&quot;&gt;Low Carbon and Sustainable Computing&lt;/a&gt; research activity in our School (Glasgow has Schools and Colleges where other universities have Departments and Faculties). One of my aims is to get sustainability and decarbonisation learning outcomes embedded in all degree programmes of the University.&lt;/p&gt;

&lt;p&gt;Our students are the generation that will be affected strongly by climate change. Therefore,  they should learn about the need for decarbonisation, sustainability and sustainable development, the reasons for the current crisis and the ways in which change can be achieved. Through the curriculum, students should be equipped with knowledge, skills, attributes and values to act in their personal and professional lives, spread awareness and help create systemic solutions.&lt;/p&gt;

&lt;h2 id=&quot;the-current-state-of-affairs&quot;&gt;The current state of affairs&lt;/h2&gt;

&lt;p&gt;From next academic year (2024-25), all undergraduate degree programmes in Computing Science and Software Engineering will have sustainability and decarbonisation learning outcomes embedded in such a way that they must be assessed; in other words, teaching sustainability is now a non-optional part of these programmes. I think this is a success, even though there is a long way to go yet.&lt;/p&gt;

&lt;h2 id=&quot;the-approach&quot;&gt;The approach&lt;/h2&gt;

&lt;p&gt;In 2022 I wrote a &lt;a href=&quot;https://limited.systems/papers/decarbonising-education-position-paper.html&quot;&gt;position paper on the need to embed sustainability and decarbonisation in our programmes&lt;/a&gt;, and shared it with some key people. My key point was that to realise this goal, Schools needed dedicated support. A lot of academics are willing to teach about sustainability and decarbonisation, but they don’t know where to start and are usually overworked as well. So I proposed to create a new role, that of “Sustainability Subject Advisor”. This person would have the know-how and the time to advise staff on embedding Sustainability and Decarbonisation in their courses and programmes. (The terminology was the subject of much debate, every single word of it — such is academia. Also, it turned out that, while “sustainability” is not controversial, the term “decarbonisation” can be).&lt;/p&gt;

&lt;p&gt;The Vice-Principal for Learning and Teaching agreed to sponsor a grassroots initiative in my College. I would have preferred the University Learning and Teaching Committee to approve the role at university level, but at least it meant I had the authority to push for the creation of this new role in my College. The Dean for Learning and Teaching of the College championed the proposal in the College Senior Management Committee, which is made up of all Heads of School. So the role became official in every School.&lt;/p&gt;

&lt;!--
## The setback

But from there it went wrong: the volunteers did in practice not get the time to work on this and got no traction, because effectively they got no support internally. The main reason is the lack of high-level support in the University. The priorities of the Heads of School depend on the priorities of the University management, and at that level, although sustainability is a priority, embedding sustainability in teaching is not.
--&gt;

&lt;h2 id=&quot;the-implementation&quot;&gt;The implementation&lt;/h2&gt;

&lt;p&gt;So we went ahead. The Sustainability Subject Advisor (my colleague &lt;a href=&quot;https://lauritzthamsen.org/&quot;&gt;Dr. Lauritz Thamsen&lt;/a&gt;) and I analysed the programmes and created programme-level Aims and Intended Learning Outcomes (ILOs), which are the core of a programme specification. It is tempting to simply create a new course that will meet the aims, but our experience with other topics such as ethics has been that this does not work well, as students consider it “not real computing”. Also, in the Glasgow BSc Honours degrees, 1st and 2nd year do not count towards the degree classification, so our choice was to embed the material throughout the programme in existing courses, and preferentially in 3rd-year ones.&lt;/p&gt;

&lt;p&gt;As agreed with the VP, we started with a pilot programme. We analysed the courses and identified suitable courses with coordinators that were willing to support the initiative, and worked with them to define the sustainability Aims and ILOs. When we got approval for the pilot programme (a lengthy process), we repeated the exercise for all our undergraduate programmes. Because they share a common core of mandatory courses, in the end the number of courses with additional sustainability Aims and ILOs was small. Last month, we obtained approval for all these changes, so they can now be rolled out.&lt;/p&gt;

&lt;p&gt;This focus on relatively few courses does not mean that we only teach sustainability in those courses. But in those selected courses, these learning outcomes are not optional, so the material &lt;em&gt;must&lt;/em&gt; be taught and assessed, whereas in the other courses, it is optional. In this way we ensure that teaching of sustainability is embedded as an essential component of the programmes.&lt;/p&gt;

&lt;h2 id=&quot;lessons-learned&quot;&gt;Lessons learned&lt;/h2&gt;

&lt;p&gt;What we have learned from this initiative is that having a dedicated person like our Sustainability Subject Advisors is a necessary requirement; another key requirement is buy-in from the local stakeholders (lecturers and members of the teaching committee) for the required change. Once you have the buy-in, people will help you to achieve the change. 
However, to extend this initiative to the whole university, buy-in from high-level management is required, because that is what is needed to ensure that the role can be created in every School in the university, and that the Heads of School will support the initiative locally.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;I intend to keep working on two fronts: one, to keep on trying to get sustainability embedded in Glasgow’s degree programmes, first in our College, then more widely; and the other, to try and convince departments at other universities to follow suit, by giving talks, and writing articles like this one. So if you’d like me to give a talk to your department on this, please &lt;a href=&quot;https://limited.systems/about/&quot;&gt;get in touch&lt;/a&gt;.&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>The insatiable hunger of (Open)AI</title>
        <link href="https://limited.systems/articles/the-insatiable-hunger-of-openai/"/>
        <updated>2024-03-10T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/the-insatiable-hunger-of-openai</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/the-insatiable-hunger-of-openai_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;(Open)AI needs enormous amounts of energy and compute hardware. Meeting these needs would lead to a huge increase in CO₂ emissions. The only way to avoid catastrophic warming is to drastically reduce CO₂ emissions. In other words, the planned growth of (Open)AI is entirely unsustainable.&lt;/p&gt;

&lt;p&gt;Others have written about the energy needed to provide AI services. Here I focus on the impact of producing the computer chips needed for this.&lt;/p&gt;

&lt;h2 id=&quot;the-urgent-need-to-reduce-emissions&quot;&gt;The urgent need to reduce emissions&lt;/h2&gt;

&lt;p&gt;To &lt;a href=&quot;https://limited.systems/articles/frugal-computing&quot;&gt;reiterate&lt;/a&gt;, &lt;a href=&quot;https://www.unep.org/resources/emissions-gap-report-2023&quot;&gt;according to the UN Emissions Gap Report 2023&lt;/a&gt;, the world must cut global greenhouse gas emissions to 20 gigatons CO₂-equivalent per year (GtCO₂e/y) by 2040 from the current level of 60 GtCO₂e/y to avoid catastrophic global warming, where “catastrophic” is meant quite literally: there will be a huge increase in frequency and severity of natural catastrophes if we don’t do this. Large parts of the earth will become unsuitable for habitation and agriculture.&lt;/p&gt;

&lt;p&gt;To arrive at a sustainable level of emissions by 2040, global CO₂ emissions should be reduced by close to 20% per year. However, emissions are currently still rising at 1–2% per year.&lt;/p&gt;

&lt;p&gt;The Emissions Gap Report explains in detail why renewables, carbon dioxide removal and carbon offsetting alone will not be sufficient to meet the targets.&lt;/p&gt;

&lt;h2 id=&quot;the-growth-of-ai-is-unsustainable&quot;&gt;The growth of AI is unsustainable&lt;/h2&gt;

&lt;p&gt;Many experts have pointed out that the energy required to provide AI services is huge, and that this in itself means the steep growth of AI is unsustainable. Apart from &lt;a href=&quot;https://limited.systems/articles/climate-cost-of-ai-revolution/&quot;&gt;my own estimates&lt;/a&gt; about the energy needs and resulting CO₂ emissions from AI, there have been many other recent articles, for example &lt;a href=&quot;https://www.nature.com/articles/d41586-024-00478-x&quot;&gt;Kate Crawford in Nature&lt;/a&gt;, &lt;a href=&quot;https://theconversation.com/it-takes-a-lot-of-energy-for-machines-to-learn-heres-why-ai-is-so-power-hungry-151825&quot;&gt;Kate Saenko in The Conversation&lt;/a&gt;, &lt;a href=&quot;https://www.cell.com/joule/fulltext/S2542-4351(23)00365-3&quot;&gt;Alex de Vries in specialist journal Joule&lt;/a&gt;, or &lt;a href=&quot;https://www.theguardian.com/technology/2024/mar/07/ai-climate-change-energy-disinformation-report&quot;&gt;this recent article in The Guardian&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the rest of this article I focus on OpenAI as the main proponent of this unsustainable growth. There are many other such AI companies (e.g. Anthropic), but OpenAI has been the most explicit in its messaging about its needs for energy and computer chips.&lt;/p&gt;

&lt;h2 id=&quot;openai-needs-a-lot-more-energy-in-the-world&quot;&gt;(Open)AI needs a lot more energy in the world&lt;/h2&gt;

&lt;p&gt;OpenAI is probably the best known AI company. It is responsible for AI products such as DALL·E, ChatGPT and Sora. OpenAI is a private company owned 49% by Microsoft (49% by all other investors, and 2% by the OpenAI non-profit foundation). So for practical purposes, it is a Microsoft-controlled company, the same Microsoft that claims it will be &lt;a href=&quot;https://blogs.microsoft.com/blog/2020/01/16/microsoft-will-be-carbon-negative-by-2030/&quot;&gt;“carbon negative by 2030”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In &lt;a href=&quot;https://www.bloomberg.com/news/videos/2024-01-16/openai-s-atlman-and-makanju-on-global-implications-of-ai&quot;&gt;an interview with Bloomberg at the WEF in Davos in January 2024&lt;/a&gt;, the CEO of OpenAI, Sam Altman, made clear how huge the energy needs of this company are, and admitted that this goes contrary to meeting the global climate targets:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Interviewer: Considering the compute costs and the need for chips, does the development of AI in the path to AGI threaten to take us in the opposite direction on the climate?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;Altman: Yes, we do need way more energy in the world than I think we thought we needed before. My my whole model of the world is that the two important currencies of the future are compute/intelligence and energy. You know, the ideas that we want and the ability to make stuff happen and the ability to like run the compute.
And I think we still don’t appreciate the energy needs of this technology.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then Altman goes on to say we need more nuclear and we need fusion, “at massive scale, like a scale that no one is really planning for”.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Interviewer: So I want to just go back to my question in terms of moving in the opposite direction. It sounds like the answer is potentially yes on the demand side, unless we take drastic action on the supply side.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;Altman: But there, there is no – I see no way to supply this, to manage the supply side, without a really big breakthrough.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;Interviewer: Right. Which is this is, does this frighten you guys? Because you know, the world hasn’t been that versatile when it comes to supply. But AI as you know, as you have pointed out, is not going to take its time until we start generating enough power.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;Altman: It motivates us to go invest more in fusion and invest more in new storage.  And not only the technology but what it’s going to take to deliver this at the scale that AI needs and that the whole globe needs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;where-can-all-that-energy-come-from&quot;&gt;Where can all that energy come from?&lt;/h3&gt;

&lt;p&gt;There is a question of timescales here. Most companies have horizons of 5-10 years but not much beyond that. Building new nuclear power plants takes at least 20 years, and fusion at scale is still not even viable in the lab, so at best that will also take another 20 years. And in the meantime, we desperately need to cut emissions to stop catastrophic warming, and we can only do this by stopping the use of fossil fuels.&lt;/p&gt;

&lt;p&gt;What Altman (and therefore Microsoft) is really saying is therefore “keep the fossil fuels” and even “increase fossil fuel electricity generation” because they know that fusion will not be around, and nuclear capacity will not increase substantially, for another several decades. They also know that the growth in renewables is too slow to meet their demands. So for the next decades, the only way to produce the energy they need is by burning more carbon. And two decades more of emissions from fossil fuels is an unmitigated disaster.&lt;/p&gt;


&lt;h3 id=&quot;but-what-about-energy-efficiency&quot;&gt;But what about energy efficiency?&lt;/h3&gt;

&lt;p&gt;The energy efficiency of computing is still doubling every 2.6 years. If we assume, very optimistically, that this trend holds until 2040 (roughly six more doublings), then computers would by then be 64 times more energy efficient. So in this best-case scenario, we could double compute capacity every 2.6 years without increasing energy consumption.&lt;/p&gt;

&lt;p&gt;Which means that Altman wants AI to grow even faster than that. This is borne out by another action of OpenAI that was recently in the news. To do all that additional compute, it needs a lot more computers, and that requires a dramatic increase in chip manufacturing.&lt;/p&gt;

&lt;p&gt;And making chips is one of the human activities that releases huge amounts of greenhouse gases.&lt;/p&gt;

&lt;h2 id=&quot;openai-also-needs-tremendous-amounts-of-chips&quot;&gt;(Open)AI also needs tremendous amounts of chips&lt;/h2&gt;

&lt;p&gt;According to &lt;a href=&quot;https://www.wsj.com/tech/ai/sam-altman-seeks-trillions-of-dollars-to-reshape-business-of-chips-and-ai-89ab3db0&quot;&gt;a February 2024 article in the Wall Street Journal&lt;/a&gt;, discussed in more detail &lt;a href=&quot;https://www.cnbc.com/2024/02/09/openai-ceo-sam-altman-reportedly-seeking-trillions-of-dollars-for-ai-chip-project.html&quot;&gt;on CNBC&lt;/a&gt;,&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;OpenAI CEO Sam Altman wants to overhaul the global semiconductor industry with trillions of dollars in investment&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;Altman has said AI chip limitations hinder OpenAI’s growth, and as this project would increase chip-building capacity globally&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Altman wants to raise $7,000 billion. For reference, &lt;a href=&quot;https://technode.com/2022/06/06/tsmc-to-invest-34-billion-to-expand-2nm-chip-production-report/&quot;&gt;the new 2 nm TSMC fab will cost $34 billion&lt;/a&gt;. Seven trillion dollars would pay for about two hundred such fabs. According to &lt;a href=&quot;https://www.z2data.com/insights/9-statistics-on-new-semiconductor-fabs-being-built&quot;&gt;Z2Data&lt;/a&gt;, 16 fabs for &amp;lt; 10 nm are currently being built. So Altman’s plan could increase this more than tenfold, even if some of the money is used for other purposes.&lt;/p&gt;

&lt;p&gt;TSMC says that the combined capacity of their four GigaFabs &lt;a href=&quot;https://www.tsmc.com/english/dedicatedFoundry/manufacturing/gigafab&quot;&gt;exceeded 12 million 12-inch wafers in 2023&lt;/a&gt;. As the fabs also produce silicon for older nodes &amp;gt; 28 nm, I have conservatively assumed that the capacity for a new GigaFab would be 1.2 million 12-inch wafers per year. Using data on embodied carbon of chip production from &lt;a href=&quot;https://doi.org/10.1145/3470496.3527408&quot;&gt;a paper by researchers from Harvard University&lt;/a&gt;, such a fab is responsible for 13.6 MtCO₂e/y of embodied carbon in chips.&lt;/p&gt;

&lt;p&gt;If there were two hundred such fabs, that would amount to 2.7 GtCO₂e/y.&lt;/p&gt;

&lt;p&gt;Considering that the planet’s carbon budget by 2040 is 20 GtCO₂e/y, merely making so many chips would take 14% of the global carbon budget; running them could take as much again, so if this estimate is accurate, this plan could see “AI” eating almost 30% of the global carbon budget for 2040.&lt;/p&gt;
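&lt;p&gt;As a quick sanity check of this arithmetic, here is a Haskell sketch using only the figures quoted above (13.6 MtCO₂e/y of embodied carbon per GigaFab, two hundred new fabs, and a 20 GtCO₂e/y budget):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Embodied carbon from two hundred new GigaFabs,
-- as a share of the sustainable 2040 emissions budget
newFabs    = 200
perFab     = 13.6e-3                -- GtCO₂e/y of embodied carbon per GigaFab
budget2040 = 20                     -- GtCO₂e/y sustainable global emissions
embodied   = newFabs * perFab       -- 2.72 GtCO₂e/y
share      = embodied / budget2040  -- 0.136, i.e. about 14%
&lt;/code&gt;&lt;/pre&gt;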

&lt;h2 id=&quot;refining-the-estimates&quot;&gt;Refining the estimates&lt;/h2&gt;

&lt;p&gt;(tl;dr: estimates have considerable uncertainty but the overall conclusions don’t change)&lt;/p&gt;

&lt;p&gt;It looks doubtful that production of raw materials, especially rare earths, can meet this kind of demand. From the January 2024 MIT Technology Review article &lt;a href=&quot;https://www.technologyreview.com/2024/01/05/1084791/rare-earth-materials-clean-energy/&quot;&gt;&lt;em&gt;The race to produce rare earth elements&lt;/em&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;According to the IEA, demand for rare earths is expected to reach 3 to 7 times current levels by 2040. Delivering on the 2016 Paris Agreement would require the global mineral supply to quadruple within the same time frame. At the current rate, supply is on track to merely double.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The reason why the 2016 Paris Agreement requires the global mineral supply to quadruple is mainly the need to electrify the global economy, not chip production. On the other hand, a gigantic investment in the semiconductor industry would most likely increase the rate of production of raw materials. So we can cautiously assume that chip production might still quadruple.&lt;/p&gt;

&lt;p&gt;This is not necessarily good news, though: although it means that we can’t produce ten times more chips, so embodied carbon emissions will grow more slowly, it also means there is less scope for moving to more energy-efficient chips. If a company can’t replace its servers with more energy-efficient ones but still wants to grow its compute capacity, it will need to keep using the previous, less efficient generation. The compute capability of that generation will be similar to that of the next one; the main difference is in energy efficiency. The result is that growth in compute capacity means growth in emissions.&lt;/p&gt;

&lt;p&gt;The estimate from Harvard I used might be on the high side. &lt;a href=&quot;https://www.imec-int.com/en/articles/environmental-footprint-logic-cmos-technologies&quot;&gt;Imec’s 2020 analysis&lt;/a&gt; gives a considerably lower estimate, nearly ten times lower, but their model only counts water, electricity and greenhouse gases from the process itself, not the emissions from mining, producing the precursor materials, packaging, etc., so it does not cover the full emissions. Nevertheless, even taking those into account, their estimate might still be 3–4 times lower.&lt;/p&gt;

&lt;p&gt;On the other hand, I did not include in my estimate the older fabs that are still producing chips, nor the 16 fabs being built at the moment. The emissions for anything &amp;lt; 28 nm are of the same order as those for a 2 nm process. Using the data from Wikipedia’s &lt;a href=&quot;https://en.wikipedia.org/wiki/List_of_semiconductor_fabrication_plants&quot;&gt;&lt;em&gt;List of semiconductor fabrication plants&lt;/em&gt;&lt;/a&gt;, the total production capacity of all current fabs &amp;lt; 28 nm amounts to the equivalent of about 14 GigaFabs. Taking these 30 fabs into account would increase my estimate by about 10% to 12%.&lt;/p&gt;

&lt;p&gt;Also, in the above calculation I assumed that the chips produced were CPUs or GPUs, as this is what a GigaFab produces. In practice, every server will have RAM and SSDs as well. So let’s assume the two hundred new fabs produce these in equal amounts: instead of producing four CPUs, for every CPU they will produce a GPU, RAM and an SSD. Using data from &lt;a href=&quot;https://hotcarbon.org/assets/2022/pdf/hotcarbon22-tannu.pdf&quot;&gt;the 2022 paper by Tannu et al&lt;/a&gt;, the contributions of these to a server’s embodied carbon are respectively 4%, 11%, 9% and 38%. Compared to the CPU, this means that the GPU has 2.7x more embodied carbon, the RAM 2.2x and the SSD nearly 10x. So we’d need to revise our estimate upwards by a factor of (1+2.7+2.2+10)/4 ≈ 4x. We also see from these figures that the chips account for only 62% of the total embodied carbon, so a closer estimate for the embodied carbon resulting from the envisaged expansion in semiconductor production capacity might be up to six times higher.&lt;/p&gt;
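&lt;p&gt;The revision factor follows directly from Tannu et al.’s per-component shares; a small Haskell sketch, using the relative values quoted above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Embodied carbon of GPU, RAM and SSD relative to a CPU (= 1.0)
cpu = 1.0
gpu = 2.7
ram = 2.2
ssd = 10.0
-- one of each component, instead of four CPUs:
serverFactor = (cpu + gpu + ram + ssd) / 4   -- 3.975, i.e. about 4x
&lt;/code&gt;&lt;/pre&gt;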

&lt;p&gt;Using the Harvard and Imec figures as upper and lower bounds, and assuming at least a doubling and at most a tenfold increase in production, the best estimate is an increase in emissions of between 0.7 and 20 GtCO₂e/y. The geometric average is 3.7 GtCO₂e/y, which is in any case a staggeringly high number, about 20% of the world’s 2040 carbon budget, purely due to the embodied carbon in the production of the chips.&lt;/p&gt;
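&lt;p&gt;Expressed numerically, the central estimate is the geometric mean of the two bounds; a one-line Haskell check:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;lowBound  = 0.7                          -- GtCO₂e/y (Imec figures, doubling)
highBound = 20                           -- GtCO₂e/y (Harvard figures, tenfold)
bestGuess = sqrt (lowBound * highBound)  -- 3.74 GtCO₂e/y, the geometric mean
&lt;/code&gt;&lt;/pre&gt;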

&lt;h2 id=&quot;conclusion-the-world-cant-afford-this-growth-in-ai&quot;&gt;Conclusion: the world can’t afford this growth in AI&lt;/h2&gt;

&lt;p&gt;Both the embodied carbon and the emissions from use entailed purely by the needs of OpenAI are huge.&lt;/p&gt;

&lt;p&gt;Even with my most optimistic estimate, they would account for close to 10% of the world’s 2040 carbon budget. OpenAI’s plans would make emissions from ICT grow steeply at a time when we simply can’t afford &lt;em&gt;any&lt;/em&gt; rise in emissions. This projected growth will make it incredibly hard to reduce global emissions to a sustainable level by 2040.&lt;/p&gt;

&lt;p&gt;In the worst case, the embodied emissions of the chips needed for AI compute could already exceed the world’s 2040 carbon budget. Running the computations would make the situation even worse. AI on its own could be responsible for pushing the world into catastrophic warming.&lt;/p&gt;


        </content>
    </entry>
    
    <entry>
        <title>Universal stack operations using Uxn primitives</title>
        <link href="https://limited.systems/articles/universal-stack-operations-uxn/"/>
        <updated>2023-12-04T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/universal-stack-operations-uxn</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/universal-stack-operations-uxn_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;I wanted to know if the &lt;a href=&quot;https://wiki.xxiivv.com/site/uxntal.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Uxntal&lt;/code&gt;&lt;/a&gt; primitives are complete, in the sense that they allow access to all elements on the stack (what I call “covering the stack”). And if so, what the minimum set of Uxntal primitives is required to move an element to or from any location on the working stack.&lt;/p&gt;

&lt;p&gt;The proof of these claims is by construction of a recursive algorithm, so it is a proof by induction. I have written it in Haskell notation because I think that is clearest. For completeness, I also looked at duplication and deletion. As the set of primitives is Uxn-specific, I assume the presence of a return stack.&lt;/p&gt;

&lt;p&gt;tl;dr:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The Uxn primitives are complete (no surprise there)&lt;/li&gt;
  &lt;li&gt;The instructions &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SWP&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STH&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STHr&lt;/code&gt; are sufficient to cover the entire stack.&lt;/li&gt;
  &lt;li&gt;Adding &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DUP&lt;/code&gt;, we can replicate arbitrary sequences.&lt;/li&gt;
  &lt;li&gt;Adding &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;POP&lt;/code&gt; we can remove arbitrary sequences.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;uxntal-and-uxn&quot;&gt;Uxntal and Uxn&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://wiki.xxiivv.com/site/uxntal.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Uxntal&lt;/code&gt;&lt;/a&gt; is the programming language for the &lt;a href=&quot;https://wiki.xxiivv.com/site/uxn.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Uxn&lt;/code&gt;&lt;/a&gt; virtual machine. As Uxn is a stack machine, Uxntal is a stack language, similar to e.g. &lt;a href=&quot;https://forth-standard.org/&quot;&gt;Forth&lt;/a&gt; or &lt;a href=&quot;https://dev.to/palm86/church-encoding-in-the-concatenative-language-joy-3nd8&quot;&gt;Joy&lt;/a&gt; in that it uses reverse Polish notation (postfix). It is an assembly language with opcodes for 8-bit and 16-bit operations on the stack and memory. To get the most out of this article, it is best if you have basic knowledge of Uxntal, either from the above resources or for example &lt;a href=&quot;https://compudanzas.net/uxn_tutorial.html&quot;&gt;the great tutorial at Compudanzas&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;a-brief-overview-of-used-haskell-syntax&quot;&gt;A brief overview of used Haskell syntax&lt;/h2&gt;

&lt;p&gt;In Haskell, a function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f&lt;/code&gt; applied to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; is written as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f x y&lt;/code&gt;, with no parentheses. Parentheses are only used to enforce precedence, for example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f (x+1)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Also, a recursive function can be defined as&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So there is no need for an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;case&lt;/code&gt; or similar constructs. Also, if a function argument is used in final position on both sides of the definition, it can be omitted, for example&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;can be written as&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(In this example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;id&lt;/code&gt; is the identity function, which returns its argument.)&lt;/p&gt;

&lt;p&gt;Finally, composing functions is done using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.&lt;/code&gt; operator. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f1 (f2 (f3 x))&lt;/code&gt; can be written as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(f1 . f2 . f3) x&lt;/code&gt;. Combining both features, we can write for example&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;minimal-set-of-primitives&quot;&gt;Minimal set of primitives&lt;/h2&gt;

&lt;p&gt;The type for all primitives is&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;n&quot;&gt;prim&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;::&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Byte&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Byte&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Byte&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Byte&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The primitive &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;prim&lt;/code&gt; takes a tuple (pair) of two lists of bytes and returns a tuple of two lists of bytes. (You can read the arrow as “from … to”.) For this discussion, the stacks don’t have to be of fixed size, so using lists is fine.&lt;/p&gt;

&lt;p&gt;The notation &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x:xs&lt;/code&gt; means that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; is the first element of a list and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xs&lt;/code&gt; is the rest of the list.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The following Uxn primitives allow access to the entire stack&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;        &lt;span class=&quot;n&quot;&gt;swp&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- swap in Forth&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;rot&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;z&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;z&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;sth&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;sthr&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
  &lt;li&gt;For replication and deletion, we also need&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;        &lt;span class=&quot;n&quot;&gt;dup&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;pop&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wst&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- drop in Forth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;accessing-the-stack&quot;&gt;Accessing the stack&lt;/h2&gt;

&lt;p&gt;I define two functions: one moves the kth element of the stack to the 1st position; the other is its inverse.&lt;/p&gt;

&lt;h3 id=&quot;put-the-kth-element-of-the-stack-at-position-1&quot;&gt;Put the kth element of the stack at position 1&lt;/h3&gt;

&lt;p&gt;In Forth terminology this is called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;roll&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;c1&quot;&gt;-- ( x a b -- a b x )&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;k_to_top&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;k_to_top&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;swp&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;k_to_top&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rot&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;k_to_top&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;swp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sthr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k_to_top&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;put-the-1st-element-of-the-stack-at-position-k&quot;&gt;Put the 1st element of the stack at position k&lt;/h3&gt;

&lt;p&gt;Somewhat to my surprise, Forth does not define an inverse for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;roll&lt;/code&gt;. Strictly speaking it is not necessary, as applying &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;roll k&lt;/code&gt; k-1 times has the same effect. But that approach has quadratic complexity, and I like to have an inverse for every primitive.&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;c1&quot;&gt;-- ( a b x -- x a b )&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;top_to_k&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- k==1, do nothing&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;top_to_k&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;swp&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;top_to_k&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rot&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rot&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;top_to_k&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sthr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;top_to_k&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sth&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;swp&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;top_to_k&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k_to_top&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With these two functions, any element on the stack can be moved between two arbitrary positions k1 and k2:&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;top_to_k&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k_to_top&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This proves that the four primitives are sufficient to reach any position on the stack.&lt;/p&gt;

&lt;h2 id=&quot;growing-the-stack&quot;&gt;Growing the stack&lt;/h2&gt;

&lt;p&gt;Variants that keep the original element in place and so grow the stack are easily defined based on these two:&lt;/p&gt;

&lt;h3 id=&quot;put-the-kth-element-of-the-stack-at-position-1-keep-the-original-element&quot;&gt;Put the kth element of the stack at position 1, keep the original element&lt;/h3&gt;

&lt;p&gt;This function grows the stack by duplicating the element. In Forth terms, this is called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pick&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;c1&quot;&gt;-- ( x a b -- x a b x )&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;k_to_top_keep&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dup&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;k_to_top_keep&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;swp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sthr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k_to_top_keep&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The inverse of this is simply &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pop&lt;/code&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;drop&lt;/code&gt; in Forth).&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pop&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k_to_top_keep&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;put-the-1st-element-of-the-stack-at-position-k-keep-the-original-element&quot;&gt;Put the 1st element of the stack at position k, keep the original element&lt;/h3&gt;

&lt;p&gt;This function also grows the stack by duplicating the element:&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;c1&quot;&gt;-- ( a b x -- x a b x )&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;top_to_k_keep&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;top_to_k_keep&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sthr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;top_to_k&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sth&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dup&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The inverse of this requires the ability to delete a value at a given position of the stack:&lt;/p&gt;

&lt;h2 id=&quot;shrinking-the-stack&quot;&gt;Shrinking the stack&lt;/h2&gt;

&lt;p&gt;This function deletes an element from an arbitrary position on the stack.&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;c1&quot;&gt;-- ( x a b -- a b )&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;delete_k&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pop&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;delete_k&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sthr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;delete_k&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With this function, the inverse of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;top_to_k_keep&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_k (k+1)&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;delete_k&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;top_to_k_keep&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
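
&lt;p&gt;These definitions can be collected into a small self-contained model of the stack pair (working stack, return stack), with the head of each list as its top. The fragments above fix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sthr&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dup&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pop&lt;/code&gt;; the versions of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sth&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;swp&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rot&lt;/code&gt; below are my reconstruction, so they may differ in detail from the linked code (and, as a sketch, stack underflow is not handled):&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    sth  (x:wst,rst)     = (wst,x:rst)     -- stash the top on the return stack
    sthr (wst,x:rst)     = (x:wst,rst)     -- unstash it again
    swp  (x:y:wst,rst)   = (y:x:wst,rst)   -- ( a b -- b a )
    rot  (x:y:z:wst,rst) = (z:x:y:wst,rst) -- ( x a b -- a b x )
    dup  (x:wst,rst)     = (x:x:wst,rst)
    pop  (x:wst,rst)     = (wst,rst)

    -- with k_to_top and top_to_k as defined above:
    --   k_to_top 3 ([&quot;a&quot;,&quot;b&quot;,&quot;x&quot;],[])                 == ([&quot;x&quot;,&quot;a&quot;,&quot;b&quot;],[])
    --   (top_to_k 3 . k_to_top 3) ([&quot;a&quot;,&quot;b&quot;,&quot;x&quot;],[])  == ([&quot;a&quot;,&quot;b&quot;,&quot;x&quot;],[])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;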

&lt;p&gt;The &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/devel/stack-manipulation-universal-uxn.hs&quot;&gt;Haskell code&lt;/a&gt; shows these functions at work and demonstrates the inverses. Implementing the above functions in Uxntal is straightforward; I leave that to the reader as an exercise.&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>Embedding a stack-based programming language</title>
        <link href="https://limited.systems/articles/stack-based-programming-in-raku/"/>
        <updated>2023-11-27T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/stack-based-programming-in-raku</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/stack-based-programming-in-raku_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;When &lt;a href=&quot;https://scholar.social/@lizmat@mastodon.social&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@lizmat&lt;/code&gt;&lt;/a&gt; asked me to write a post for the &lt;a href=&quot;https://raku-advent.blog/category/2023/&quot;&gt;Raku advent calendar&lt;/a&gt; I was initially a bit at a loss. I have spent most of the year not writing &lt;a href=&quot;https://raku.org/&quot;&gt;Raku&lt;/a&gt; but working on my own language &lt;a href=&quot;https://limited.systems/articles/funktal/&quot;&gt;Funktal&lt;/a&gt;, a postfix functional language that compiles to &lt;a href=&quot;https://wiki.xxiivv.com/site/uxntal.html&quot;&gt;Uxntal&lt;/a&gt;, the stack-based assembly language for the tiny &lt;a href=&quot;https://wiki.xxiivv.com/site/uxn.html&quot;&gt;Uxn&lt;/a&gt; virtual machine.&lt;/p&gt;

&lt;p&gt;But as Raku is nothing if not flexible, could we do Uxntal-style stack-based programming in it? Of course I could embed the entire Uxntal language in Raku using &lt;a href=&quot;https://raku.land/zef:lizmat/Slangify&quot;&gt;a slang&lt;/a&gt;. But could we have a more shallow embedding? Let’s find out.&lt;/p&gt;

&lt;h2 id=&quot;stack-oriented-programming&quot;&gt;Stack-oriented programming&lt;/h2&gt;

&lt;p&gt;An example of simple arithmetic in Uxntal is&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    6 4 3 ADD MUL
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This should be self-explanatory: it is called postfix, stack-based or reverse Polish notation. In infix notation, that is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;6*(4+3)&lt;/code&gt;. In prefix notation, it’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MUL(6, ADD(4,3))&lt;/code&gt;. The integer literals are pushed on a stack, and the primitive operations &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ADD&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MUL&lt;/code&gt; pop the arguments they need off the stack and push the result.&lt;/p&gt;

&lt;h2 id=&quot;the-mighty--operator-part-i-definition&quot;&gt;The mighty &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;∘&lt;/code&gt; operator, part I: definition&lt;/h2&gt;

&lt;p&gt;In Raku, we can’t redefine the whitespace to act as an operator. I could of course do something like&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    my \stack-based-code = &amp;lt;6 4 3 ADD MUL&amp;gt;;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;but I don’t want to write an interpreter starting from strings. So instead, I will define an infix operator &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;∘&lt;/code&gt;. Something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    6 ∘ 4 ∘ 3 ∘ ADD ∘ MUL
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The operator either puts literals on the stack or calls the operation on the values on the stack.&lt;/p&gt;

&lt;p&gt;By necessity, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;∘&lt;/code&gt; is a binary operator, but it will put only one element on the stack. I chose to have it process its second argument and ignore the first, because that makes it easy to terminate the calculation. However, as a consequence, the first element of each sequence needs to be handled separately.&lt;/p&gt;

&lt;h2 id=&quot;returning-the-result&quot;&gt;Returning the result&lt;/h2&gt;

&lt;p&gt;To obtain a chain of calculations, the operator needs to put the result of every computation on the stack. This means that in the example, the result of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MUL&lt;/code&gt; will be on the stack, and not returned to the program. To return the final result to the program, I slightly abuse Uxntal’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BRK&lt;/code&gt; opcode. On encountering this opcode, the value of the computation is returned and the stack is cleared (in native Uxntal, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BRK&lt;/code&gt; simply terminates the program). So a working example of the above code is&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    my \res = 6 ∘ 4 ∘ 3 ∘ ADD ∘ MUL ∘ BRK
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&quot;some-abstraction-with-subroutines&quot;&gt;Some abstraction with subroutines&lt;/h2&gt;

&lt;p&gt;Uxntal allows you to define subroutines. They are just blocks of code that you jump to. In my Raku implementation we can simply define custom subroutines and call them using the Uxntal instructions &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSR&lt;/code&gt; (jump and return), &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JMP&lt;/code&gt; (jump and don’t return, used for tail calls) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JCN&lt;/code&gt; (conditional jump).&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;my \res =  3 ∘ 2 ∘ 1 ∘ INC ∘ ADD ∘ MUL ∘ 4 ∘ &amp;amp;f ∘ JMP ;

sub f {
    SUB ∘ 5 ∘ MUL ∘ 2 ∘ ADD ∘ RET
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(The instruction &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RET&lt;/code&gt; is called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JMP2r&lt;/code&gt; in Uxntal.)&lt;/p&gt;

&lt;h2 id=&quot;stack-manipulation-operations&quot;&gt;Stack manipulation operations&lt;/h2&gt;

&lt;p&gt;One of the key features of a stack language is that it allows you to manipulate the stack. In Uxntal, there are several operations to duplicate, remove and reorder items on the stack. Here is a contrived example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;my \res =
    4 ∘ 2 ∘ DUP ∘ INC ∘ # 4 2 3
    OVR ∘  # 4 2 3 2
    ROT ∘ # 4 3 2 2
    ADD ∘ 2 ∘ ADD ∘ MUL ∘ BRK ; # 42
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;keeping-it-simple&quot;&gt;Keeping it simple&lt;/h2&gt;

&lt;p&gt;Uxntal has more ALU operations and device IO operations. It also has a second stack (return stack) and operations to move data between stacks. Furthermore, every instruction can take the suffix ‘2’, meaning it will work on two bytes, and ‘k’, meaning that it will not consume its arguments but leave them on the stack. I am omitting all these for simplicity.&lt;/p&gt;

&lt;h2 id=&quot;the-mighty--operator-part-ii-implementation&quot;&gt;The mighty &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;∘&lt;/code&gt; operator, part II: implementation&lt;/h2&gt;

&lt;p&gt;With the above, we have enough requirements to design and implement the operator. As usual, I will eschew the use of objects. It was my intention to use all kinds of fancy Raku features such as introspection, but it turns out I don’t need them.&lt;/p&gt;

&lt;p&gt;We start by defining the Uxntal instructions as enums. I could use a single enum but grouping them makes their purpose clearer.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;enum StackManipOps is export &amp;lt;POP NIP DUP SWP OVR ROT BRK&amp;gt; ;
enum StackCalcOps is export &amp;lt;ADD SUB MUL INC DIV&amp;gt;;
enum JumpOps is export &amp;lt;JSR JMP JCN RET&amp;gt;;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We use a stateful custom operator with the stack &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@wst&lt;/code&gt; (working stack) as state. The operator returns the top of the stack and is left-associative. Anything that is not an Uxntal instruction is pushed onto the stack.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;our sub infix:&amp;lt;∘&amp;gt;(\x, \y)  is export {
    state @wst = ();

    if y ~~ StackManipOps {
        given y {
            when POP { ... }
            ...
        }
    } elsif y ~~ StackCalcOps {
        given y {
            when INC { ... }
            ...
        }
    } elsif y ~~ JumpOps {
        given y {
            when JSR { ... }
            ...
        }
    } else {
        @wst.push(y);
    }

    return @wst[0]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is not quite good enough: the operator is binary, but the above implementation ignores the first element. This is only relevant for the first element in a sequence. We handle this using a boolean state &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$isFirst&lt;/code&gt;. When &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;True&lt;/code&gt;, we simply call the operator again with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Nil&lt;/code&gt; as the first element.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$isFirst&lt;/code&gt; state is reset on every &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BRK&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    state Bool $isFirst = True;
    ...
    if $isFirst {
        @wst.push(x);
        $isFirst = False;
        Nil ∘ x
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The final complication lies in the need to support conditional jumps. The problem is that in e.g.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    &amp;amp;ft ∘ JCN ∘ &amp;amp;ff ∘ JMP
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;depending on the condition, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ft&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ff&lt;/code&gt; should be called. If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ft&lt;/code&gt; is called, nothing after &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JCN&lt;/code&gt; should be executed. I solve this by introducing another boolean state variable, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$skipInstrs&lt;/code&gt;, which is set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;True&lt;/code&gt; when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JCN&lt;/code&gt; is called with a true condition.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    when JCN {
        my &amp;amp;f =  @wst.pop;
        my $cond = @wst.pop;
        if $cond&amp;gt;0 {
            $isFirst = True;
            f();
            $skipInstrs = True;
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The boolean is cleared on encountering a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JMP&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RET&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    if $skipInstrs {
        if (y ~~ JMP) or (y ~~ RET) {
            $skipInstrs = False
        }
    } else {
        ...
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This completes the implementation of the operator &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;∘&lt;/code&gt;. The final structure is:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;our sub infix:&amp;lt;∘&amp;gt;(\x, \y)  is export {
    state @wst = ();
    state Bool $isFirst = True;
    state $skipInstrs = False;

    if $skipInstrs {
        if (y ~~ JMP) or (y ~~ RET) {
            $skipInstrs = False
        }
    } else {

        if $isFirst and not (x ~~ Nil) {
            @wst.push(x);
            $isFirst = False;
            Nil ∘ x
        }

        if y ~~ StackManipOps {
            given y {
                when POP { ... }
                ...
            }
        } elsif y ~~ StackCalcOps {
            given y {
                when INC { ... }
                ...
            }
        } elsif y ~~ JumpOps {
            given y {
                when JSR { ... }
                ...
            }
        } else {
            @wst.push(y);
        }
    }
    return @wst[0]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&quot;memory-and-pointers&quot;&gt;Memory and pointers&lt;/h2&gt;

&lt;p&gt;Like most stack languages, Uxntal also has load and store operations to work with memory. Uxntal does not have a separate instruction memory, so you can mix code and data and even write self-modifying code. There are load and store operations on absolute (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LDA&lt;/code&gt;,&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STA&lt;/code&gt;) and relative (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LDR&lt;/code&gt;,&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STR&lt;/code&gt;) addresses. In my Raku implementation, I don’t distinguish between those. I use arrays as named stretches of memory. So for example the following native Uxntal snippet&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;@array #11 #22 #33
;array #0001 ADD2 LDA ( puts 0x22 on the stack )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;becomes&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    my @array = 0x11, 0x22, 0x33;
    @array ∘ 0x0001 ∘ ADD ∘ LDA
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and that would be close enough, were it not that in Uxntal memory is declared after subroutines. So what I actually need to do is&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    (array) ∘ 0x0001 ∘ ADD ∘ LDA
    sub array { [ 0x11, 0x22, 0x33 ] }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The way I handle the pointer arithmetic is by pattern matching on the type. Instructions &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ADD&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SUB&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INC&lt;/code&gt; can take an integer or a label, which in Raku is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Array&lt;/code&gt;. In type pseudo-code, the valid argument type for these operations is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Pair  = (Fst,Int)
Fst = Array | (Fst,Int)
Addr = Int | Array | Pair
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In words, it can be an integer, a label or a pair where the second element of the pair must be an integer and the first is either a label or a pair.&lt;/p&gt;

&lt;p&gt;For example for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INC&lt;/code&gt; operation, we do&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    given (arg) {
        when Int { push @wst,arg+1}
        when Array { push @wst,(arg,1)}
        when List { push @wst,(arg,1)}
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ADD&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SUB&lt;/code&gt; we do something similar but check if either arg is Int, Array or List. 
If both arguments are Int, we return the result of the operation; if only one of the arguments is an Int, we return the pair of arguments as a List; otherwise we throw an error as it is not a valid type.&lt;/p&gt;
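
&lt;p&gt;As an illustration, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ADD&lt;/code&gt; case could look as follows. This is a sketch of the idea rather than the actual module code; in particular, the normalisation to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(label-or-pair, Int)&lt;/code&gt; pair is an assumption on my part:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    when ADD {
        my $b = @wst.pop; # top of the stack
        my $a = @wst.pop;
        if ($a ~~ Int) &amp;amp;&amp;amp; ($b ~~ Int) {
            push @wst, $a + $b;  # plain arithmetic
        } elsif ($a ~~ Int) || ($b ~~ Int) {
            # exactly one argument is an Int: label plus offset,
            # normalised to (label-or-pair, Int) as in the Pair type above
            push @wst, $a ~~ Int ?? ($b, $a) !! ($a, $b);
        } else {
            die 'ADD: invalid argument types';
        }
    }
&lt;/code&gt;&lt;/pre&gt;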

&lt;p&gt;The non-integer return values of this kind of arithmetic are used in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LDA&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STA&lt;/code&gt;. Here, the only valid type is the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Addr = Array | (Addr, Int)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In other words, an address must always be relative to a label, so we check that the argument is not a plain integer.&lt;/p&gt;
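
&lt;p&gt;One possible way to resolve such an address, sketched here as a hypothetical helper (the name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;resolve-addr&lt;/code&gt; and its structure are mine, not the actual module code), is to recurse on the pairs and accumulate the offsets:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    # Reduce an address to (base label, total offset)
    sub resolve-addr($addr) {
        given $addr {
            when Array { ($addr, 0) }    # a bare label
            when List  {                 # an (Addr, Int) pair
                my ($base, $off) = resolve-addr($addr[0]);
                ($base, $off + $addr[1])
            }
            default { die 'address must be relative to a label' }
        }
    }

    # so that LDA could become
    when LDA {
        my ($base, $off) = resolve-addr(@wst.pop);
        push @wst, $base[$off];
    }
&lt;/code&gt;&lt;/pre&gt;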

&lt;h2 id=&quot;hello-uxntal&quot;&gt;Hello, Uxntal&lt;/h2&gt;

&lt;p&gt;With this machinery we can run the following Uxntal “Hello World” program&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    |0100

        ;hello JSR2

    BRK

    @hello
        ;hello-word ;print-text JSR2
    JMP2r

    @print-text ( str* -- )
        ;while JSR2
        POP2
    JMP2r

    @while
            ( send ) DUP2 LDA #18 DEO
            ( loop ) INC2 DUP2 LDA ;while JCN2
    JMP2r

    @hello-word &quot;Hello 20 &quot;World! 00

( the `#18 DEO` instruction prints a byte on STDOUT )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;in Raku as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    #|0100
        &amp;amp;hello ∘ JSR2 ∘ 
    BRK;

    sub hello {
        (hello-world) ∘ &amp;amp;print-text ∘ JSR2 ∘ 
        RET
    }

    sub print-text { # str* --
        &amp;amp;loop ∘ JSR2 ∘ 
        RET
    }

    sub loop {
        DUP2 ∘ LDA ∘ 0x18 ∘ DEO ∘ 
        INC2 ∘ DUP2 ∘ LDA ∘ &amp;amp;loop ∘ JCN2 ∘ 
        RET
    }

    sub hello-world { [&quot;Hello,&quot;,0x20,&quot;World!&quot;, 0x0a,0x00] }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As this program has a loop implemented as a tail recursion, it illustrates all the essential features of a stack-based program in Uxntal.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;We can easily embed a stack-based programming language such as Uxntal in Raku purely by defining a single binary operator and a few enums for the instructions. This is mainly courtesy of Raku’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;state&lt;/code&gt; variables and powerful pattern matching.&lt;/p&gt;

&lt;p&gt;The code for this little experiment is available in my &lt;a href=&quot;https://github.com/wimvanderbauwhede/raku-examples&quot;&gt;raku-examples&lt;/a&gt; repo, with two sample programs &lt;a href=&quot;https://github.com/wimvanderbauwhede/raku-examples/blob/master/stack-based-programming.raku&quot;&gt;stack-based-programming.raku&lt;/a&gt; and &lt;a href=&quot;https://github.com/wimvanderbauwhede/raku-examples/blob/master/hello-uxntal.raku&quot;&gt;hello-uxntal.raku&lt;/a&gt; and the module implementing the operator &lt;a href=&quot;https://github.com/wimvanderbauwhede/raku-examples/blob/master/Uxn.rakumod&quot;&gt;Uxn.rakumod&lt;/a&gt;.&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>More on Funktal: I/O devices and state</title>
        <link href="https://limited.systems/articles/funktal-devices-state/"/>
        <updated>2023-08-03T00:00:00+01:00</updated>
        <id>https://limited.systems/articles/funktal-devices-state</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/funktal-devices-state_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;In &lt;a href=&quot;https://limited.systems/articles/funktal/&quot;&gt;a previous post&lt;/a&gt;, I introduced &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Funktal&lt;/code&gt;&lt;/a&gt;, a frugal functional programming language created for the &lt;a href=&quot;https://wiki.xxiivv.com/site/uxn.html&quot;&gt;Uxn&lt;/a&gt; VM. The Uxn VM is the heart of a clean-slate computing platform called &lt;a href=&quot;https://wiki.xxiivv.com/site/varvara.html&quot;&gt;Varvara&lt;/a&gt;. This post explains how you can access Varvara’s I/O devices from Funktal, and the closely related support for mutable state. I illustrate these features using three small GUI-based programs.&lt;/p&gt;

&lt;p&gt;The main purpose of Uxn and Varvara is as a portable platform for GUI-based applications, such as the &lt;a href=&quot;https://wiki.xxiivv.com/site/left.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;left&lt;/code&gt;&lt;/a&gt; editor, the &lt;a href=&quot;https://wiki.xxiivv.com/site/noodle.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;noodle&lt;/code&gt;&lt;/a&gt; drawing program and many others, in particular games. To make Funktal a practical language for this platform, support for I/O devices (keyboard, mouse, screen, audio) is essential.&lt;/p&gt;

&lt;h2 id=&quot;mutable-state-why-and-how&quot;&gt;Mutable state: why and how&lt;/h2&gt;

&lt;p&gt;Uxn I/O is event based: for example, a button press or mouse click results in an event handler (aka vector or callback) being called. These handlers can access the Uxn VM’s working and return stacks as well as its memory. Most programs are stateful, so an efficient mechanism for handling state is important.&lt;/p&gt;

&lt;h3 id=&quot;keeping-state-on-the-stack&quot;&gt;Keeping state on the stack&lt;/h3&gt;

&lt;p&gt;We could keep the state on the stack. This is fine for simple cases but quickly gets cumbersome. Suppose we have three items of state; then our function needs to look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(\s1_in s2_in s3_in.
    &amp;lt;all computations&amp;gt;
    s3_out s2_out s1_out
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But we don’t know the types of the items on the stack, so we need explicit typing, e.g.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;types {
    RGB = Int Int Int MkRGB
}

(\(RGB,Int,Bool) &amp;lt;- s1_in:Bool&amp;lt;- s2_in: Int&amp;lt;- s3_in:RGB.
    &amp;lt;all computations&amp;gt;
    s3_out s2_out s1_out
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The tuple syntax only indicates that the function pushes multiple values on the stack, it is not an algebraic data type and so we can’t construct it or pattern match on it.&lt;/p&gt;

&lt;h3 id=&quot;putting-the-state-in-a-record-type&quot;&gt;Putting the state in a record type&lt;/h3&gt;

&lt;p&gt;An improvement would be to hold the state in an instance of a record type, because then all we have to keep on the stack is that single instance.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;types {
    RGB = Int Int Int MkRGB
    State = Bool Int RGB MkState
}

(\State &amp;lt;- s_in:State. &amp;lt;all computations&amp;gt; s_out )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This state still has to remain on the stack, and if there are several event handlers, each with their own state, that means either juggling those states or creating an overall state for all event handlers in the program and passing it around. Either way, that requires additional code to access the state.&lt;/p&gt;

&lt;p&gt;Because each instance of a type is immutable, this approach also means that every update of the state requires constructing a new instance. (And in practice we’d have to delete the old one, otherwise we’d run out of memory very quickly. But Funktal does not have managed memory yet; that is another story.) Constructing instances is expensive (and deleting them even more so).&lt;/p&gt;

&lt;h3 id=&quot;making-state-mutable&quot;&gt;Making state mutable&lt;/h3&gt;

&lt;p&gt;For all these reasons, I decided to add mutable state to Funktal. It is very simple: in a special block called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;state&lt;/code&gt; we define a singleton instance of the type used for the state:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;types {
    RGB = Int Int Int MkRGB
    State = Bool Int RGB MkState
}

state {
    s : State
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So this defines &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; as a mutable instance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;State&lt;/code&gt;. There can only be one such instance per type.
And now the function simply becomes&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;( &amp;lt;all computations on s&amp;gt; )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To access the information in a stateful record, there are two built-in functions, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;put&lt;/code&gt;, which access a field in the record using its (base-0) index:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;0_1 s get -- gets the Bool from the record
42 1_1 s put -- puts 42 in the Int slot
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;These are polymorphic functions that work for any record type registered as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;state&lt;/code&gt;.&lt;/p&gt;
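&lt;p&gt;As a sketch of these semantics (in Python, purely for illustration; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;StateRecord&lt;/code&gt; is a made-up name, not part of Funktal), &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;put&lt;/code&gt; behave like indexed access to a single mutable record:&lt;/p&gt;

```python
# Illustrative model of a mutable state record with base-0 field indices.
class StateRecord:
    def __init__(self, *fields):
        self.fields = list(fields)
    def get(self, idx):
        return self.fields[idx]
    def put(self, value, idx):
        self.fields[idx] = value

# Modelling State = Bool Int RGB MkState
s = StateRecord(True, 7, (255, 0, 0))
assert s.get(0) is True   # 0_1 s get  -- gets the Bool from the record
s.put(42, 1)              # 42 1_1 s put -- puts 42 in the Int slot
assert s.get(1) == 42
```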

&lt;p&gt;To make this more readable, I define some constants to identify the fields. Suppose the purpose of the fields in the State type are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;greyscale&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;transparency&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;colour&lt;/code&gt;, we can define&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;constants {
    greyscale#State = 0_1
    transparency#State = 1_1
    colour#State = 2_1
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and write&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;greyscale#State s get -- gets the greyscale from the record
42 transparency#State s put -- puts 42 in the transparency field
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(The ‘#’ has no special meaning; I use it as a separator so that I can use the same field names in different types.)&lt;/p&gt;

&lt;p&gt;Because &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;State&lt;/code&gt; is a proper Funktal type, you can also use pattern matching to bind the field values to names:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s (\None &amp;lt;- (greyscale transparency colour MkState) : State . &amp;lt;all computations on s&amp;gt; )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But as bound variables are immutable, you can’t update the state this way.&lt;/p&gt;

&lt;h3 id=&quot;mutable-arrays&quot;&gt;Mutable arrays&lt;/h3&gt;

&lt;p&gt;Funktal has a built-in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Array&lt;/code&gt; type for immutable array constants, with built-in functions &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;size&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;at&lt;/code&gt;.
Within the context of a mutable state type, such arrays can be updated using a built-in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;update&lt;/code&gt; function. 
For example, assuming a state &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s : AState&lt;/code&gt; has a field labelled &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array&lt;/code&gt; which is of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Array&lt;/code&gt;, we can write&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;val idx array#AState s get update
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;devices&quot;&gt;Devices&lt;/h2&gt;

&lt;p&gt;I/O devices in Uxn are typically defined in this way:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|20 @Screen &amp;amp;vector $2 
    &amp;amp;width $2 &amp;amp;height $2 &amp;amp;pad $2 
    &amp;amp;x $2 &amp;amp;y $2 &amp;amp;addr $2 
    &amp;amp;pixel $1 &amp;amp;sprite $1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In Funktal we simply define a corresponding record type:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;types {
    Screen = Int Int Int Int Int Int Int Int8 Int8 MkScreen
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And to make this more readable, there is an aliasing mechanism:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aliases {
    Vector = Int
    Width = Int; Height = Int; Pad = Int; X = Int; Y = Int; Addr = Int; Pixel = Int8; Sprite = Int8
}

types {
    Screen = Vector Width Height Pad X Y Addr Pixel Sprite MkScreen
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The key difference with our mutable state is that each device has a unique address.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;devices {
    0x20 scr : Screen
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And to make a clear distinction from state operations, the built-in functions to access I/O devices are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;write&lt;/code&gt;, for example&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(\None&amp;lt;-colour:Int8 . colour sprite#Screen scr write )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;bits-and-bobs-blocks-loops-done&quot;&gt;Bits and bobs: blocks, loops, done&lt;/h2&gt;

&lt;p&gt;There are a few more features of Funktal that make implementing device interactions easier.&lt;/p&gt;

&lt;h3 id=&quot;the-block-type&quot;&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; type&lt;/h3&gt;

&lt;p&gt;A very common action in Uxn programs is to read constant data for sprites. For example, the following is typical:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;;font-hex ADD2 .Screen/addr DEO2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This can’t be implemented with the existing Arrays API in Funktal, because the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;addr&lt;/code&gt; field of the screen device expects the actual address in Uxn memory. Therefore I added a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; datatype, which is simply a contiguous sequence of bytes:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;font_hex : Byte 128 Block = [
     0x00,0x7c ,0x82,0x82 ,0x82,0x82 
    ... ]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is only used for constants, and the instance of the type (i.e. the constant) contains the address of the first element (like an array in C). So you can write:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;font_hex + addr#Screen scr write
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
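&lt;p&gt;The idea can be sketched in Python (illustration only; the memory layout and address are made up): a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; constant evaluates to the address of its first byte, so plain address arithmetic selects an offset into it, C-style:&lt;/p&gt;

```python
# Illustrative model: memory is a flat byte array, and a Block constant
# is just the address of its first byte within that memory.
memory = bytearray(256)
font_hex = 0x40                    # made-up start address of the block
memory[font_hex:font_hex + 6] = bytes([0x00, 0x7C, 0x82, 0x82, 0x82, 0x82])

# `font_hex +` in Funktal is plain address arithmetic, as with C arrays:
glyph_offset = 2
addr = font_hex + glyph_offset
assert memory[addr] == 0x82
```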

&lt;h3 id=&quot;the-loop-built-in&quot;&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;loop&lt;/code&gt; built-in&lt;/h3&gt;

&lt;p&gt;Although I conceived Funktal as a functional language, where a natural programming paradigm is using higher-order list functions such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fold&lt;/code&gt;, for I/O actions it is very common to loop over some action that does not return anything. It is of course easy to implement such a loop in native Funktal, but it can be done more efficiently in Uxntal, so Funktal has a built-in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;loop&lt;/code&gt; function which takes a start and end value and a lambda function, and returns nothing:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;loop: None &amp;lt;- Int &amp;lt;- Int &amp;lt;- (None&amp;lt;-Int)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For example&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;0 9 `(\None&amp;lt;-idx:Int. idx 1 + print ) loop
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;will print the numbers 1 to 10. (And although I know all the arguments to the contrary, the range is inclusive: 0 to 9, not 0 to 8.)&lt;/p&gt;
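&lt;p&gt;The behaviour of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;loop&lt;/code&gt; can be modelled in Python (an illustrative sketch, not how the compiler implements it):&lt;/p&gt;

```python
# Sketch of Funktal's `loop` built-in: the range is inclusive on both
# ends, so `0 9 ... loop` runs the body ten times.
def loop(start, end, body):
    for idx in range(start, end + 1):   # end + 1 makes the upper bound inclusive
        body(idx)

printed = []
loop(0, 9, lambda idx: printed.append(idx + 1))  # models `idx 1 + print`
assert printed == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```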

&lt;h3 id=&quot;stateful-loop-iteration&quot;&gt;Stateful loop iteration&lt;/h3&gt;

&lt;p&gt;Sometimes you might want to maintain some state during the loop iteration, similar to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fold&lt;/code&gt;. This can be done via the stack:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;0 -- This puts a 0 on the working stack
0 len 1_1 - Int
`(\ Int &amp;lt;- val : Int &amp;lt;- idx : Int .
    idx array at (\Int &amp;lt;- elt : Int .
        val idx array update
        elt
    )
) loop
-- Clear the working stack
(\None&amp;lt;-null:Int.)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This example loops over indices 0 to len-1, and gets the element from an array. It then updates the array with the value on the stack, and puts the element on the stack. So if the array was [11, 22, 33, 44], it will become [0, 11, 22, 33]. The one thing to look out for is that this will leave the final element on the stack, so you may want to remove it by calling an empty lambda on it, which is what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(\None&amp;lt;-null:Int.)&lt;/code&gt; does.&lt;/p&gt;
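&lt;p&gt;What this stateful loop computes can be modelled in Python (an illustrative sketch; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shift_right&lt;/code&gt; is a made-up name, and the accumulator variable plays the role of the value kept on the working stack):&lt;/p&gt;

```python
# Model of the stateful loop: carry an accumulator "on the stack"
# while shifting the array one place to the right.
def shift_right(array, seed):
    acc = seed                  # the 0 pushed before the loop
    for idx in range(len(array)):
        elt = array[idx]        # idx array at
        array[idx] = acc        # val idx array update
        acc = elt               # elt stays on the stack for the next pass
    return acc                  # the leftover final element

a = [11, 22, 33, 44]
leftover = shift_right(a, 0)
assert a == [0, 11, 22, 33]
assert leftover == 44           # dropped by the empty lambda in Funktal
```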

&lt;h3 id=&quot;the-done-built-in&quot;&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;done&lt;/code&gt; built-in&lt;/h3&gt;

&lt;p&gt;Funktal normally expects a called function to return, but event handlers have nowhere to return to. As a simple way to handle this, I added the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;done&lt;/code&gt; built-in, which tells the compiler to emit a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BRK&lt;/code&gt; rather than a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JMP2r&lt;/code&gt; at the end of a function. If it is used elsewhere, it simply emits a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BRK&lt;/code&gt;, which is useful for debugging.&lt;/p&gt;

&lt;h2 id=&quot;examples&quot;&gt;Examples&lt;/h2&gt;

&lt;p&gt;I have implemented a few examples of GUI-based programs. Two are ports of Uxn demos, the third renders the Funktal logo.&lt;/p&gt;

&lt;h3 id=&quot;example-1-dvd&quot;&gt;Example 1: DVD&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/devel/examples/dvd.ftal&quot;&gt;This example&lt;/a&gt; is a simple program that bounces the DVD icon around the screen.&lt;/p&gt;

&lt;figure&gt;
&lt;img src=&quot;https://limited.systems/images/dvd.avif&quot; alt=&quot;A white DVD logo on a blue background&quot; title=&quot;A white DVD logo on a blue background&quot; /&gt;
&lt;figcaption&gt;DVD icon bouncing&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;It is a straight port of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dvd.tal&lt;/code&gt; program. We define types for the system and screen devices and the dvd state:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;types {
    System = Word Int8 Int8 Int Int8 Int8 Word Word Word Int8 Int8 MkSystem
    Screen = Int Int Int Byte Int8 Int Int Int Int8 Int8 MkScreen
    DVD = Int Int Int Int MkDVD
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then we create the instance:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;state {
    dvd : DVD
}

devices {
    0x00 sys : System
    0x20 scr : Screen
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are no loops in this program, and no controls. It simply calculates the new position of the icon on every clock tick. The only event handler is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;onFrame&lt;/code&gt;. Because this is a straight port, it does use the stack for some values. The main program registers that event handler, does some setup and puts the borders on the stack:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;main {
    ...
`onFrame vec#Screen scr write
...
    20 w#Screen scr read 20 -
    20 h#Screen scr read 20 -
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;These borders are used by the event handler but not modified:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;functions {

    -- takes the borders from the stack
    onFrame = ( \ xmin xmax ymin ymax .
        0x00 drawDVD
        x#DVD dvd get
        y#DVD dvd get
        dx#DVD dvd get
        dy#DVD dvd get
        (\ Int &amp;lt;- x : Int &amp;lt;- y : Int &amp;lt;- dx : Int &amp;lt;- dy : Int .
            ... calculate new position ....
        )
        0x01 drawDVD
        -- put the borders back on the stack
        xmin xmax ymin ymax
        done
    )
}

-- takes the colour (0 or 1) as argument
drawDVD = (\ None &amp;lt;- c : Int8. ... )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note the use of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;done&lt;/code&gt; at the end of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;onFrame&lt;/code&gt; function, because it is an event handler.&lt;/p&gt;

&lt;h3 id=&quot;example-2-snake&quot;&gt;Example 2: Snake&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/devel/examples/snake.ftal&quot;&gt;This example&lt;/a&gt; is a straight port of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snake.tal&lt;/code&gt; program, a game where you control a snake to eat apples, and the snake grows longer and longer.&lt;/p&gt;

&lt;figure&gt;
&lt;img src=&quot;https://limited.systems/images/snake.avif&quot; alt=&quot;A green snake with a tail of 5 segments, about to eat a red apple, on a black background &quot; title=&quot;A green snake with a tail of 5 segments, about to eat a red apple, on a black background&quot; /&gt;
&lt;figcaption&gt;Snake in action&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;The main differences from the DVD code are that there are buttons to control the direction of the snake, so we need a controller device, and that the tail of the snake is an array, so we use loops to update it.&lt;/p&gt;

&lt;p&gt;The controller device:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;types {
    Controller = Vector Button Key MkController
}

devices {
    0x80 ctl : Controller
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The button handler is registered as before:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;`onButton vec ctl write
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The handler itself:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;onButton = (
    button#Ctl ctl read
    (\None &amp;lt;-b:Int8.
         b 8_1 /=
        `( b noEscape )
        `( reset )
         if
    )
    done
)

noEscape = (\ None &amp;lt;- b : Int8 .
    b 4_1 &amp;gt;&amp;gt; (\None &amp;lt;-bb : Int8 .
        bb 0_1 /=
        `( bb dir#Snake snake put )
        `() if
    )
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So we read the button value, compute a value based on the button that is pressed, and write that into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dir&lt;/code&gt; field of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snake&lt;/code&gt; state. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Snake&lt;/code&gt; type is&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aliases {
    X=Int8; Y=Int8
    Dir=Int8; Len=Int8; Dead=Int8
    Tail = Int 32 Array
}

types {
    Snake = Dir Len Dead X Y Tail MkSnake
}

state {
    snake : Snake
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is an example of a state containing an array. The main action is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;drawSnake&lt;/code&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;drawSnake = (\None&amp;lt;-c:Int8.
    -- draw tail
    snake_icns addr#Screen scr write 
    len#Snake snake get (\None &amp;lt;- len:Int8 .
        0 len 1_1 - Int
        `(\None&amp;lt;-idx:Int .
            idx (tail#Snake snake get) at (\None &amp;lt;- xy:Int.
            xy 8_1 &amp;gt;&amp;gt; Int8 Int 3_1 &amp;lt;&amp;lt; x#Screen scr write
            xy        Int8 Int 3_1 &amp;lt;&amp;lt; y#Screen scr write
        )
        c sprite#Screen scr write
        ) loop
    )
    -- draw head
    ...
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The tail is an array of 16-bit integers, each of which is actually a pair of 8-bit integers representing the coordinates of one segment of the tail. So the code reads &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xy&lt;/code&gt; from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tail&lt;/code&gt; array, unpacks it into an x and a y value, and multiplies these by 8, because the sprites are 8x8 so movement is in steps of 8 pixels.&lt;/p&gt;
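&lt;p&gt;The unpacking and scaling can be sketched in Python (illustration only; integer division and modulo stand in for the shift and truncation in the Funktal code, and the function name is mine):&lt;/p&gt;

```python
# Each tail entry packs an (x, y) pair of 8-bit values into one 16-bit
# integer; both halves are scaled by 8 because the sprites are 8x8 pixels.
def segment_screen_coords(xy):
    x = (xy // 256) * 8   # high byte: models `xy 8_1 >> ... 3_1 <<`
    y = (xy % 256) * 8    # low byte:  models the Int8 truncation and shift
    return x, y

assert segment_screen_coords(0x0302) == (24, 16)  # cell (3, 2) in pixels
```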

&lt;h3 id=&quot;example-3-funktal-logo-rendering&quot;&gt;Example 3: Funktal logo rendering&lt;/h3&gt;

&lt;p&gt;I implemented various ways of rendering the Funktal logo.&lt;/p&gt;

&lt;figure&gt;
&lt;img src=&quot;https://limited.systems/images/funktal-logo.avif&quot; alt=&quot;The Funktal logo, triangles and rectangles rendered as a grid of black dots&quot; title=&quot;The Funktal logo, triangles and rectangles rendered as a grid of black dots&quot; /&gt;
&lt;figcaption&gt;Funktal logo rendered using the example code&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/devel/examples/render-logo-gui-state-onFrame.ftal&quot;&gt;This example&lt;/a&gt; renders the logo one pixel at a time on every frame event (clock tick), so it has no internal loops. It is a bit complicated because the Funktal logo is decomposed into isosceles right triangles (the base and height are equal) combined with symmetry operations.
The array &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;triangleEncodings&lt;/code&gt; contains a list of (triangle orientation, triangle position) pairs. There is a state &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;counters&lt;/code&gt; with a counter for the current triangle and the row and column position of the current pixel to be drawn.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;onFrame = (\None.
    0_1 run get 1_1 ==
   `(
        trCodeIdx counters get
        rowIdx counters get
        colIdx counters get
        (\None &amp;lt;- tIdx:Int8 &amp;lt;- rIdx:Int8 &amp;lt;- cIdx:Int8.
             tIdx Int triangleEncodings at rIdx cIdx drawPixel
             cIdx rIdx &amp;lt;
            `( cIdx 1_1 + colIdx counters put )
            `( 0_1 colIdx counters put
                 rIdx triangleDim 1_1 - &amp;lt;
                `( rIdx 1_1 + rowIdx counters put )
                `( 0_1 rowIdx counters put
                     tIdx 14_1 &amp;lt;
                    `( tIdx 1_1 + trCodeIdx counters put )
                    `() if
                 ) if
             ) if
        )
    ) `() if
    done
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is a single control: pressing any key toggles between pausing and continuing the rendering. That is the purpose of the first condition in the code above; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run&lt;/code&gt; is a separate state.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I hope this post and the previous one have provided some idea of what Funktal is like and what you can do with it. I plan to write another, longer article on how the compiler is made. For more information, see the &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/SPEC.md&quot;&gt;specification&lt;/a&gt; (aimed at people who want to program in Funktal) and the &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/DESIGN.md&quot;&gt;design document&lt;/a&gt; (aimed at people who want to help develop Funktal or are just curious), both still very much in flux.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The banner picture shows a detail of the control panel of a Japanese train&lt;/em&gt;&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>Funktal: a frugal functional programming language</title>
        <link href="https://limited.systems/articles/funktal/"/>
        <updated>2023-04-10T00:00:00+01:00</updated>
        <id>https://limited.systems/articles/funktal</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/funktal_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Funktal&lt;/code&gt;&lt;/a&gt; is a functional programming language for the &lt;a href=&quot;https://wiki.xxiivv.com/site/uxn.html&quot;&gt;Uxn&lt;/a&gt; virtual machine, a tiny VM with 8-bit opcodes and 64 kB of memory. I have written about implementing functional constructs in Uxn’s native stack based assembly language &lt;a href=&quot;https://wiki.xxiivv.com/site/uxntal.html&quot;&gt;Uxntal&lt;/a&gt; in &lt;a href=&quot;https://limited.systems/articles/uxntal-quoting/&quot;&gt;a previous post&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rationale&quot;&gt;Rationale&lt;/h2&gt;

&lt;p&gt;The main reason for creating Funktal was to see if it was possible to create a statically typed functional language with &lt;a href=&quot;https://limited.systems/articles/roles-as-adts-in-raku/&quot;&gt;algebraic data types&lt;/a&gt; and &lt;a href=&quot;https://limited.systems/articles/function-types/&quot;&gt;function types&lt;/a&gt; that could run on the Uxn VM, with a compiler that could be implemented in Uxntal. This is motivated by the observation that most modern languages are very resource-intensive: typical projects take a lot of disk space, compilers are large and require a lot of CPU cycles and memory to compile code, and the programs themselves are also very often CPU- and memory-intensive.&lt;/p&gt;

&lt;p&gt;Hard disks and solid-state drives are major contributors to the &lt;a href=&quot;https://principles.green/principles/embodied-carbon/&quot;&gt;embodied carbon&lt;/a&gt; in a computer, followed by the CPU and memory. Reducing these resources is an important way to reduce CO₂ emissions from computing. The ability to write useful software for older generations of hardware allows them to be used for longer, and that is the main way to reduce embodied carbon.&lt;/p&gt;

&lt;h2 id=&quot;funktal-design-principles&quot;&gt;Funktal design principles&lt;/h2&gt;

&lt;p&gt;The main principle for the design of Funktal is that it should use as little memory as possible, both for the compiler and the programs it produces. This influences most of the design decisions. At the same time, it should be minimal but fully featured: I want a pure, strict functional language with a sufficiently expressive static type system. I also want the language to make simple code easy to write, while being expressive enough for more complex code.&lt;/p&gt;

&lt;p&gt;The main characteristics of Funktal are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;It uses postfix notation (aka Reverse Polish Notation, Forth-style).&lt;/li&gt;
  &lt;li&gt;It is entirely based on lambda functions (anonymous functions), but has named functions too.&lt;/li&gt;
  &lt;li&gt;All variables are immutable (as they are lambda function arguments).&lt;/li&gt;
  &lt;li&gt;The type system is based on primitive types, with product, sum and function types to create new types.&lt;/li&gt;
  &lt;li&gt;Typing is optional, and therefore Funktal is not fully type safe.&lt;/li&gt;
  &lt;li&gt;I/O is not pure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is a &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/SPEC.md&quot;&gt;specification&lt;/a&gt; (aimed at people who want to program in Funktal) and a &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/DESIGN.md&quot;&gt;design document&lt;/a&gt; (aimed at people who want to help develop Funktal or are just curious), both still very much in flux.&lt;/p&gt;

&lt;h2 id=&quot;funktal-by-example&quot;&gt;Funktal by example&lt;/h2&gt;

&lt;p&gt;All examples can be found in the &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/examples&quot;&gt;examples folder in the repo&lt;/a&gt;. To try them out, please see the &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/&quot;&gt;README&lt;/a&gt; for installation instructions.&lt;/p&gt;

&lt;h3 id=&quot;basic-syntax&quot;&gt;Basic syntax&lt;/h3&gt;

&lt;p&gt;Funktal is whitespace-separated and consists of a number of blocks, the most important of which are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;types&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;constants&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;functions&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;. There are only expressions, so the entire main program is a single sequence of expressions. Newlines serve only for readability and as comment delimiters: anything from a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--&lt;/code&gt; until the next newline is considered a comment.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;main {
    6 7 * print -- prints 42
    0x0a print  -- prints a newline
    &quot;Hello&quot; print -- prints Hello
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;small&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/examples/01-print.ftal&quot;&gt;Example 1: printing&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;

&lt;h3 id=&quot;lambdas-and-named-functions&quot;&gt;Lambdas and named functions&lt;/h3&gt;

&lt;p&gt;Funktal is a functional language so the key building block is the lambda function (anonymous function). Each lambda is enclosed in parentheses and the arguments are listed between &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;\&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Lambdas do not need to have arguments. Because Funktal is a stack language, if there is anything on the stack, it will be used as an argument for the function. But arguments are often convenient.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;main {
    6 (\x. x x * x 2 * + x -  print ) -- 42
    6  (\x. 7 (\y . x y * print ) ) -- 42
    2 84 `( / ) (\ x y div . y x div apply ) print -- 0x002a
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;small&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/examples/02-lambdas.ftal&quot;&gt;Example 2: lambdas&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;The second example shows nested lambdas, with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; argument of the outer lambda in scope in the body of the nested lambda.&lt;/p&gt;

&lt;p&gt;The last line is an example of &lt;em&gt;quoting&lt;/em&gt; of functions: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;( / )&lt;/code&gt; is an anonymous function without arguments which only performs a division. More fully we could write it as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(\ x y . x y /)&lt;/code&gt;. Normally, this function would be called right away. By quoting it with a backtick, it is not called until we call it explicitly using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apply&lt;/code&gt;. The example also shows that functions can be passed as arguments to other functions.&lt;/p&gt;
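&lt;p&gt;Quoting can be sketched in Python (a hedged analogy, not how the compiler works): quoting corresponds to passing a function value without calling it, and &lt;code&gt;apply&lt;/code&gt; to finally calling it.&lt;/p&gt;

```python
# A quoted function is just a function value that has not been called yet.
quoted_div = lambda a, b: a // b        # plays the role of the quoted `( / )

# (\ x y div . y x div apply): binds 2, 84 and the quoted division,
# then applies the division to 84 and 2
call_with = lambda x, y, d: d(y, x)

print(call_with(2, 84, quoted_div))     # 84 // 2 = 42
```

&lt;p&gt;As in Funktal, the quoted function stays inert until it is explicitly applied.&lt;/p&gt;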

&lt;h3 id=&quot;primitive-types&quot;&gt;Primitive types&lt;/h3&gt;

&lt;p&gt;Funktal has primitive types &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Int8&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Int16&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AChar&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Byte&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Short&lt;/code&gt;. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Int&lt;/code&gt; is currently a synonym for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Int16&lt;/code&gt;. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AChar&lt;/code&gt; is an ASCII character; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Byte&lt;/code&gt; is a raw byte value and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Short&lt;/code&gt; a raw 2-byte value. The arrow &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;-&lt;/code&gt; in the type signature separates arguments and return type of a function. The colon &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;:&lt;/code&gt; separates the argument from its type.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;main {
    6 (\ Int &amp;lt;- x : Int . x x * 2 x * + x - ) print
    (\ Int8 . 6 (\ Int8 &amp;lt;- x : Int8 . x x * 2 x * + x - ) print )
    0x2a 0x2b (\ Byte &amp;lt;- b1: Byte &amp;lt;- b2 : Byte . b1 b2 &amp;amp; ) print
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;small&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/examples/03-primitive-types.ftal&quot;&gt;Example 3: primitive types&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;Without type information, Funktal defaults to 16-bit operations. If you want to use 8-bit operations, explicit typing is necessary, as in the examples above. As shown in the second example, a lambda without arguments can still have an explicit return type.&lt;/p&gt;

&lt;h3 id=&quot;constants&quot;&gt;Constants&lt;/h3&gt;

&lt;p&gt;Constants are a convenience. The main purpose is to define arrays of values (e.g. bitmaps) and strings, but scalars are also supported. There is a built-in type that is not strictly speaking primitive: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Array&lt;/code&gt;, used to create array constants. Its constructor takes the type and number of the elements in the array.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;constants {
    hello : AChar 6 Array = &quot;Hello!&quot;
}

main {
    hello print
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;small&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/examples/04-constants.ftal&quot;&gt;Example 4: constants&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;

&lt;h3 id=&quot;sum-types-and-conditionals&quot;&gt;Sum types and conditionals&lt;/h3&gt;

&lt;p&gt;As mentioned above, Funktal has algebraic data types. The Boolean type is an example of a sum type (similar to an enum): it has two alternatives, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;True&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;False&lt;/code&gt;. The example also shows the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; builtin, which takes two quoted lambdas and a condition, i.e. any expression which returns True or False.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Any&lt;/code&gt; type is the supertype of all types. Funktal does not (currently) have polymorphism. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Any&lt;/code&gt; type can be used explicitly and is also the type of any untyped expression. Because Funktal allows untyped expressions and does not do type inference (yet), it is not fully type safe.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;types {
    Bool = True | False
}

main {
    True (\ Any &amp;lt;- cond : Bool .
         cond
        `( &quot;true&quot; print )
        `( &quot;false&quot; print )
         if
    )
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;small&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/examples/05-if.ftal&quot;&gt;Example 5: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt;&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;
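&lt;p&gt;The behaviour of the &lt;code&gt;if&lt;/code&gt; builtin can be mimicked in Python (a hypothetical sketch): the quoted lambdas become thunks, and only the branch selected by the condition is ever applied.&lt;/p&gt;

```python
# `if` as an ordinary function: both branches arrive quoted (deferred),
# and only the chosen branch is actually called
def if_(cond, then_quoted, else_quoted):
    return then_quoted() if cond else else_quoted()

print(if_(True, lambda: "true", lambda: "false"))   # true
```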

&lt;h3 id=&quot;product-data-type-construction-and-pattern-matching&quot;&gt;Product data type construction and pattern matching&lt;/h3&gt;

&lt;p&gt;This is an example of a record or product type. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RGB&lt;/code&gt; type is a triplet of 8-bit integers. The reason why the entire expression is wrapped in a typed lambda is that otherwise the integers would be treated as 16-bit. Funktal does not have proper type checking yet. The return type of a function determines the size of the operations and constants used in the function body.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;types {
    RGB =  Int8 Int8 Int8 RGB
}

main {
        (\ Int8 . 42 43 44 RGB (\ Int8 &amp;lt;- (r g b RGB) : RGB . r ) ) print
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;small&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/examples/06-rgb.ftal&quot;&gt;Example 6: records&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;
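&lt;p&gt;Construction and pattern matching of the &lt;code&gt;RGB&lt;/code&gt; triplet can be sketched in Python with a named tuple (an analogy only; Funktal records are not tuples at runtime):&lt;/p&gt;

```python
from collections import namedtuple

# a product type like `RGB = Int8 Int8 Int8 RGB`
RGB = namedtuple("RGB", ["r", "g", "b"])

# construct the record, then pattern-match its components, keeping only r
r, g, b = RGB(42, 43, 44)
print(r)   # 42
```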

&lt;h3 id=&quot;recursion&quot;&gt;Recursion&lt;/h3&gt;

&lt;p&gt;Funktal does not have mutable variables so it has no loops. Instead, it uses recursion.&lt;/p&gt;

&lt;h4 id=&quot;factorial-with-named-functions&quot;&gt;Factorial with named functions&lt;/h4&gt;

&lt;p&gt;This is a straightforward recursion to calculate a factorial: if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b==e&lt;/code&gt; then return the result &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r*e&lt;/code&gt;, else recurse with the counter &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b+1&lt;/code&gt; and the accumulator &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r*b&lt;/code&gt;. This is a tail recursion. It also demonstrates the use of named functions, and Funktal’s natural ability to support point-free programming.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;functions {
    fact = ( 1 1 fact_rec )
    fact_rec =  (\ Int  &amp;lt;- e : Int &amp;lt;- b : Int &amp;lt;- r : Int . b e == `( r e * )  `(  e b 1 + r b * fact_rec) if )
}

main {
    5 fact 1 * print
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;small&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/examples/07-fact.ftal&quot;&gt;Example 7: factorial&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;I use an actual recursive function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fact_rec&lt;/code&gt; and a wrapper to initialise the counter and the accumulator.&lt;/p&gt;
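&lt;p&gt;For readers unfamiliar with postfix notation, the same tail recursion can be transliterated into Python (a sketch, not generated code):&lt;/p&gt;

```python
def fact_rec(e, b, r):
    # if b == e, return r * e; else recurse with counter b+1, accumulator r*b
    return r * e if b == e else fact_rec(e, b + 1, r * b)

def fact(n):
    # wrapper initialises the counter and the accumulator
    return fact_rec(n, 1, 1)

print(fact(5))   # 120
```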

&lt;h4 id=&quot;recursion-without-named-functions&quot;&gt;Recursion without named functions&lt;/h4&gt;

&lt;p&gt;Recursion means a function calls itself. But what if the function doesn’t have a name? A &lt;em&gt;fixed-point combinator&lt;/em&gt; makes it possible to do recursion on unnamed functions. The most common one is the Y-combinator. The way I’ve done this here is a bit different, but equivalent. The quoted function describes the recursion. But because that function has no name, it can’t call itself. The function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(\f. f f apply)&lt;/code&gt; takes it as its argument, so now it has a name and can be called recursively.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;main {
    5
    `(\ n &amp;lt;- f . n 1 == `(1) `( n 1 - f f apply n * ) if)
    (\ Int &amp;lt;- f : Any . f f apply ) print
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;small&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/examples/08-fact-fix.ftal&quot;&gt;Example 8: factorial with fixed-point&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;
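&lt;p&gt;The same self-application trick works in any language with first-class functions; in Python (a hedged analogue, with &lt;code&gt;f f apply&lt;/code&gt; written as &lt;code&gt;f(f)&lt;/code&gt;):&lt;/p&gt;

```python
# the recursion body receives itself as f, so it can call itself via f(f)
body = lambda f: lambda n: 1 if n == 1 else f(f)(n - 1) * n

# self-application: the anonymous function is given itself as argument
print((lambda f: f(f))(body)(5))   # 120
```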

&lt;h4 id=&quot;lists-and-fold&quot;&gt;Lists and fold&lt;/h4&gt;

&lt;p&gt;Algebraic data types can also be used to construct lists, like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;types {
    List = List Any Cons | Nil
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is a recursive type: a List is either a function Cons, which takes a List and some value, or a function Nil, which takes no arguments. So we can build lists by writing e.g.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Nil 11 Cons 22 Cons 33 Cons
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In practice, it is handy to have a function to generate a range of numbers (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;range&lt;/code&gt; below) and list manipulation functions like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;head&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tail&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fold&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt;. The example shows the use of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;range&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fold&lt;/code&gt; to calculate a factorial by multiplying all values in a list.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;functions {
    -- First element of a list
    head = (\ Any &amp;lt;- (xs x Cons) . x )
    -- The rest of the list
    tail = (\ List &amp;lt;- (xs x Cons) . xs )
    -- Creates a list with a range of integers
    range = ( Nil range_rec )
    range_rec = (\List &amp;lt;- b: Int &amp;lt;- e: Int &amp;lt;- lst : List . b e == `( lst e Cons ) `( b 1 + e lst b Cons range_rec ) if )
    -- A reduction: fold takes a list, an accumulator and a function and combines all elements of the list into the accumulator
    fold = (\ Any &amp;lt;- lst : List &amp;lt;- acc : Any  &amp;lt;- f : Any . lst `Nil is `( acc ) `( lst tail acc lst head f apply f fold ) if )
}

main {
    (\Int . 1 5 range 1 `( * ) fold ) print
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;small&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/funktal/src/branch/main/examples/09-lists.ftal&quot;&gt;Example 9: lists&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;
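&lt;p&gt;To make the recursion in example 9 easier to follow, here is a hypothetical Python encoding, with &lt;code&gt;Nil&lt;/code&gt; as &lt;code&gt;None&lt;/code&gt; and &lt;code&gt;Cons&lt;/code&gt; as a pair (a sketch only; Funktal’s runtime representation differs):&lt;/p&gt;

```python
# List = List Any Cons | Nil, encoded as nested pairs ending in None
Nil = None
def Cons(xs, x): return (xs, x)
def head(lst): return lst[1]      # first element of a list
def tail(lst): return lst[0]      # the rest of the list

def range_rec(b, e, lst):
    # if b == e, prepend e and stop; else prepend b and recurse
    return Cons(lst, e) if b == e else range_rec(b + 1, e, Cons(lst, b))

def fold(lst, acc, f):
    # combine all elements of the list into the accumulator
    return acc if lst is Nil else fold(tail(lst), f(acc, head(lst)), f)

print(fold(range_rec(1, 5, Nil), 1, lambda a, x: a * x))   # 5! = 120
```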

&lt;h2 id=&quot;implementation&quot;&gt;Implementation&lt;/h2&gt;

&lt;p&gt;The Funktal compiler should be implementable in Uxntal (or even Funktal) and run on Uxn. I did not feel I was sufficiently fluent in Uxntal to use it as the implementation language. Instead, I opted to write the compiler in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Fortran&lt;/code&gt;, but in such a way that porting to Uxntal should be straightforward.&lt;/p&gt;

&lt;p&gt;Why Fortran? Funktal is essentially an art project; using Fortran is a statement. I could have done this in C, but I prefer Fortran’s arrays. I am using Fortran-90 but with a very restricted feature set. In case you don’t know Fortran, here are some of its characteristics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;No lexical scoping&lt;/li&gt;
  &lt;li&gt;Numeric labels for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;goto&lt;/code&gt;; no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;break&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Arrays start at 1 by default but can start at any integer value&lt;/li&gt;
  &lt;li&gt;No unsigned integers&lt;/li&gt;
  &lt;li&gt;No native hash tables&lt;/li&gt;
  &lt;li&gt;Implicit typing based on the first letter of the variable name (*)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;small&gt;(*) But luckily you can disable that feature in Fortran-90&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;Furthermore, because of the restricted subset I use:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;No pointers so no native linked lists&lt;/li&gt;
  &lt;li&gt;No derived types, so no structs&lt;/li&gt;
  &lt;li&gt;No dynamic allocation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is almost as if I’d taken the &lt;a href=&quot;https://homepages.inf.ed.ac.uk/rni/papers/realprg.html&quot;&gt;“Real Programmers Don’t Use PASCAL”&lt;/a&gt; essay too literally:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;LANGUAGES&lt;br /&gt;
The easiest way to tell a Real Programmer from the crowd is by the programming language he (or she) uses. Real Programmers use FORTRAN. […]&lt;/p&gt;

  &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Real Programmers do List Processing in FORTRAN
Real Programmers do String Manipulation in FORTRAN.
[...] If you can&apos;t do it in FORTRAN, do it in assembly language. If you can&apos;t do it in assembly language, it isn&apos;t worth doing.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;  &lt;/div&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;STRUCTURED PROGRAMMING&lt;br /&gt;
[…] Some quick observations on Real Programmers and Structured Programming:&lt;/p&gt;

  &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Real Programmers aren&apos;t afraid to use GOTO&apos;s.
[...]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;  &lt;/div&gt;

  &lt;p&gt;[…]. As all Real Programmers know, the only useful data structure is the Array. Strings, lists, structures, sets – these are all special cases of arrays and can be treated that way just as easily without messing up your programming language with all sorts of complications. […]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Be that as it may, this restricted subset maps cleanly to Uxntal, and also forces me to think very carefully about data structures. As a result, the compiler in its current state is about 5,000 lines of code, allocates less than 64 kB and compiles to an executable of about 100 kB. For reference, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uxnasm&lt;/code&gt; is 20 kB, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uxncli&lt;/code&gt; is 25 kB and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uxnemu&lt;/code&gt; is 50 kB. But &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gcc&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gfortran&lt;/code&gt; are 1.2 MB and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rustc&lt;/code&gt; is 15 MB.&lt;/p&gt;

&lt;h2 id=&quot;status&quot;&gt;Status&lt;/h2&gt;

&lt;p&gt;Funktal needs a lot more work (compilers are never finished), but it is now in a state where most of &lt;a href=&quot;https://git.sr.ht/~rabbits/uxn/tree/main/item/projects/examples/demos&quot;&gt;the Uxn demo applications&lt;/a&gt; can be ported to it. It already supports devices and state, as explained in &lt;a href=&quot;https://limited.systems/articles/funktal-state-devices&quot;&gt;the follow-on post&lt;/a&gt;. At the top of my long wish list are library support and memory management.&lt;/p&gt;

&lt;p&gt;Apart from that, there are plenty of bugs and shortcomings that need fixing. But it is already good enough to have some fun with, which is of course the main purpose.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The banner picture shows a wooden telephone in a temple in Kyoto&lt;/em&gt;&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>The climate cost of the AI revolution</title>
        <link href="https://limited.systems/articles/climate-cost-of-ai-revolution/"/>
        <updated>2023-03-06T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/climate-cost-of-ai-revolution</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/climate-cost-of-ai-revolution_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;ChatGPT and other AI applications such as Midjourney have pushed “Artificial Intelligence” high on the hype cycle. In this article, I want to focus specifically on the energy cost of training and using applications like ChatGPT, what their widespread adoption could mean for global CO₂ emissions, and what we could do to limit these emissions.&lt;/p&gt;

&lt;h2 id=&quot;key-points&quot;&gt;Key points&lt;/h2&gt;

&lt;h3 id=&quot;training-of-large-ai-models-is-not-the-problem&quot;&gt;Training of large AI models is not the problem&lt;/h3&gt;

&lt;p&gt;Training of large AI models requires a lot of electricity. However, for a modest growth scenario where there would be a hundred very popular AI-based services in the entire world, I estimate that the global CO₂ emissions from training AI are likely to remain relatively small.&lt;/p&gt;

&lt;h3 id=&quot;large-scale-use-of-large-ai-models-would-be-unsustainable&quot;&gt;Large-scale use of large AI models would be unsustainable&lt;/h3&gt;

&lt;p&gt;For that same  modest growth scenario, with a hundred very popular AI-based services in the entire world, the electricity consumption resulting from the use of these services would lead to unsustainable increases in global CO₂ emissions.&lt;/p&gt;

&lt;h3 id=&quot;renewables-are-not-making-ai-more-sustainable&quot;&gt;Renewables are not making AI more sustainable&lt;/h3&gt;

&lt;p&gt;Using renewables to power AI is not the solution. Large scale adoption of AI would lead to a huge increase in electricity consumption, much more than can be offset even by the fastest possible roll-out of renewables. So even if all AI is powered by renewables, it will not help us reduce global emissions and we will still miss the global climate targets.&lt;/p&gt;

&lt;h3 id=&quot;reducing-the-climate-cost-of-ai&quot;&gt;Reducing the climate cost of AI&lt;/h3&gt;

&lt;p&gt;Technological solutions to increase energy efficiency of AI are likely to lead to a more than proportionate increase in demand for AI - as history has shown us with other technology adoptions. As with any activity that consumes energy, the best way to limit energy consumption is to limit the activity. As a society we need to treat AI resources as finite and precious, to be utilised only when necessary, and as effectively as possible. We need &lt;em&gt;frugal AI&lt;/em&gt;.&lt;/p&gt;

&lt;h2 id=&quot;carbon-cost-of-electricity-usage&quot;&gt;Carbon cost of electricity usage&lt;/h2&gt;

&lt;p&gt;Many tech companies make much of their use of renewables to power their services. What metric should we use for the electricity usage of ICT in general and Large Language Models in particular? I argue that the metric to use is the carbon intensity of the geographical area in which the electricity is generated and can be traded. In practice, that means the country or group of countries of generation. If the usage is globally distributed, we should use the weighted average intensity.&lt;/p&gt;

&lt;p&gt;This matters because electricity is still predominantly generated from fossil fuels. In the US, where most of the data centres for AI training and use are located, fossil fuels &lt;a href=&quot;https://www.eia.gov/tools/faqs/faq.php?id=427&amp;amp;t=3&quot;&gt;according to EIA data&lt;/a&gt; account for 60% of generation; only 21% is truly renewable. According to &lt;a href=&quot;https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Renewable_energy_statistics&quot;&gt;EuroStat&lt;/a&gt;, in the EU renewables account for 22% and according to &lt;a href=&quot;https://www.nationalgrid.com/stories/energy-explained/how-much-uks-energy-renewable&quot;&gt;National Grid&lt;/a&gt;, in the UK it is 38%. Therefore, using renewables for increased electricity usage simply results in displacement of the emissions. &lt;a href=&quot;https://www.weforum.org/agenda/2021/07/renewables-cheapest-energy-source/&quot;&gt;Renewables are already the cheapest form of generation&lt;/a&gt;, so generators do not need market pull to install more capacity: to maximise their profit, they will maximise their renewables capacity. Even when the generation is on-site, the argument still stands: the electricity used to power AI could be traded on the grid. So in the case of GPT-3 we should use the overall carbon intensity of the US, which is 371 gCO₂/kWh according to &lt;a href=&quot;https://carbonfund.org/calculation-methods/&quot;&gt;the Carbon Fund&lt;/a&gt;, or alternatively the global carbon intensity, 476 gCO₂/kWh according to &lt;a href=&quot;https://www.iea.org/reports/world-energy-outlook-2019&quot;&gt;the IEA World Energy Outlook 2019&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;carbon-emissions-from-ict&quot;&gt;Carbon emissions from ICT&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://limited.systems/articles/frugal-computing/&quot;&gt;There is an imperative to &lt;em&gt;reduce&lt;/em&gt; emissions from information and communication technologies&lt;/a&gt; (ICT): purely to keep to the Paris agreement, they should drop to a quarter of current emissions in the next 20 years, from about &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S2666389921001884&quot;&gt;2 GtCO₂e/y (including embodied emissions)&lt;/a&gt; to 500 MtCO₂e/y. So this is the global ICT carbon budget for the future.&lt;/p&gt;

&lt;p&gt;This is not going to happen through renewables: with business as usual, by 2040 renewables will be &lt;a href=&quot;https://iea.blob.core.windows.net/assets/deebef5d-0c34-4539-9d0c-10b13d840027/NetZeroby2050-ARoadmapfortheGlobalEnergySector_CORR.pdf&quot;&gt;at best 70% of all generated electricity&lt;/a&gt;, and crucially, &lt;a href=&quot;https://iea.blob.core.windows.net/assets/deebef5d-0c34-4539-9d0c-10b13d840027/NetZeroby2050-ARoadmapfortheGlobalEnergySector_CORR.pdf&quot;&gt;generation from fossil fuels will largely remain constant&lt;/a&gt;. So even though we will have more electricity, we will not have less emissions. Therefore, reducing global electricity consumption from ICT is critical.&lt;/p&gt;

&lt;h2 id=&quot;training-large-language-models&quot;&gt;Training Large Language Models&lt;/h2&gt;

&lt;p&gt;Although the large amount of CO₂ emissions resulting from AI training has received a lot of attention, I would argue that training of LLMs is not the main problem.&lt;/p&gt;

&lt;p&gt;According to a peer-reviewed paper by &lt;a href=&quot;https://arxiv.org/pdf/2104.10350.pdf&quot;&gt;Patterson et al.&lt;/a&gt;, training GPT-3 generates 552 tonnes of CO₂ (tCO₂e). (Using the yearly average carbon intensity, it is 477 tCO₂e; the paper used the actual intensity during the period of training, which was slightly higher.) This is not much compared to the &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S2666389921001884&quot;&gt;total emissions from ICT&lt;/a&gt;, but it is still the same amount of CO₂ as produced by &lt;a href=&quot;https://heatable.co.uk/boiler-advice/average-carbon-footprint&quot;&gt;heating 250 average UK homes for one year&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, on the one hand, this is just a single model. On the other hand, one of the problems with those large language models is that they need to be kept up-to-date. People will expect the chat bot to know who this week’s Prime Minister is, or the current hit songs, games and movies. Which means that training will need to be an ongoing process. So the carbon footprint will become many times larger than it already is. For the sake of argument, let’s assume as a worst case that the model would need to be retrained fully every day. Then emissions from training would be 365 times larger, so about 175 ktCO₂e/y. If globally there would be a hundred such models from competing companies in different countries, that would mean an increase in global emissions of about 20 MtCO₂e/y.&lt;/p&gt;
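&lt;p&gt;The back-of-the-envelope arithmetic behind these figures, as a quick check (using the yearly-average 477 tCO₂e per training run from the previous paragraph):&lt;/p&gt;

```python
tco2e_per_run = 477                  # one full GPT-3 training run, yearly-average intensity
# worst case: full retraining every day of the year
kt_per_model_year = tco2e_per_run * 365 / 1000
models = 100                         # competing large models worldwide
mt_global_year = kt_per_model_year * models / 1000

print(round(kt_per_model_year))      # ~174 ktCO2e/y per model
print(round(mt_global_year, 1))      # ~17.4 MtCO2e/y globally, i.e. about 20 Mt
```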

&lt;p&gt;This assumes full retraining; it will likely become possible to split the training data into a large static pre-trained set which needs to be updated infrequently, and a dynamic set which needs to be updated weekly or even daily. It is also likely that computational efficiency gains will be found.&lt;/p&gt;

&lt;p&gt;In any case, even if this was not so, and even if updates were daily, emissions from electricity generation used for training would not exceed 2% of global ICT CO₂ budget — assuming that the number of such large models does not rise to thousands.&lt;/p&gt;

&lt;p&gt;What about the embodied carbon? &lt;a href=&quot;https://news.microsoft.com/source/features/ai/openai-azure-supercomputer/&quot;&gt;Microsoft claims&lt;/a&gt; that the supercomputer used to train GPT-3 hosts 10,000 GPUs and 285,000 CPU cores. Assuming these were similar to &lt;a href=&quot;https://learn.microsoft.com/en-us/azure/virtual-machines/ndv2-series&quot;&gt;NDv2 instances&lt;/a&gt;, we can estimate their embodied carbon &lt;a href=&quot;https://www.boavizta.org/en/blog/empreinte-de-la-fabrication-d-un-serveur&quot;&gt;starting from work by Boavizta&lt;/a&gt; in the order of 2 tCO₂e per node, for 1250 nodes (8 GPUs per node). This is based on the assumptions that each node has a 3TB SSD, 512GB RAM and 2 Xeon CPUs, plus 8 V100 GPUs with each 32GB RAM.&lt;/p&gt;

&lt;p&gt;So the total embodied carbon is of the order of 2.5 ktCO₂ per machine. However, it takes 14 days to train GPT-3, so to manage daily retraining, 14 such machines are needed. Assuming 4 years useful life, that would result in embodied carbon of the order of 35 ktCO₂e/y. So the embodied carbon is of the order of 20% of the total emissions, and we can put the total between 10 and 100 MtCO₂e/y to account for uncertainties in the estimates.&lt;/p&gt;
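&lt;p&gt;The per-machine and fleet arithmetic, as a quick check of the figures above:&lt;/p&gt;

```python
nodes = 10_000 // 8          # 10,000 GPUs at 8 per node: 1250 nodes
tco2e_per_node = 2           # Boavizta-based estimate per node
kt_per_machine = nodes * tco2e_per_node / 1000
machines = 14                # 14-day training time, retrained daily
kt_fleet = machines * kt_per_machine

print(kt_per_machine, kt_fleet)   # 2.5 ktCO2e per machine, 35 ktCO2e for the fleet
```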

&lt;p&gt;This means that the increased consumption from training of LLMs could, at the highest estimate, amount to 20% of the global ICT CO₂ budget. Nevertheless, as the cost of this amount of energy for training would probably be prohibitive, and there are clearly technical options to reduce the energy consumption, I don’t think it is likely that training of LLMs will lead to more than a few percent increase in ICT CO₂ emissions. So far, so good.&lt;/p&gt;

&lt;h2 id=&quot;using-large-language-models&quot;&gt;Using Large Language Models&lt;/h2&gt;

&lt;p&gt;Next, let’s consider the use of LLMs. An estimate for the footprint of ChatGPT is given by &lt;a href=&quot;https://medium.com/@chrispointon/the-carbon-footprint-of-chatgpt-e1bc14e4cc2a&quot;&gt;Chris Pointon&lt;/a&gt; as 77,160 kWh per day assuming 13 million users per day with 5 questions each, so 65 million queries. This would generate 30 tCO₂e per day or 0.5 gCO₂e per query.&lt;/p&gt;
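&lt;p&gt;The per-query figures follow directly from Pointon’s estimate; this small sketch just reproduces the division:&lt;/p&gt;

```python
kwh_per_day = 77160
queries_per_day = 13e6 * 5      # 13 million users, 5 questions each
tco2e_per_day = 30

wh_per_query = kwh_per_day * 1000 / queries_per_day   # ~1.2 Wh
g_per_query = tco2e_per_day * 1e6 / queries_per_day   # ~0.46, rounded to 0.5

print(round(wh_per_query, 2), "Wh per query")
print(round(g_per_query, 2), "gCO2e per query")
```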

&lt;p&gt;Just to be clear, in the big picture (&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S2666389921001884&quot;&gt;total global ICT emissions of 2 GtCO₂e&lt;/a&gt;) this kind of footprint is still very small. But with billions of queries per day, that means tens of MtCO₂e/y.&lt;/p&gt;

&lt;p&gt;If that sounds improbable, consider this: currently, Bing and Google each process 10 billion queries per day. So already, we would be looking at an electricity consumption of 8.7 TWh/y or emissions of 4 MtCO₂e/y from Bing and Google searches alone, without any growth or any other applications.&lt;/p&gt;

&lt;p&gt;If there were 100 such models, as assumed above, that would mean 435 TWh/y or 200 MtCO₂e/y, even without taking into account the embodied carbon; recall that ICT has a proportional carbon budget of 500 MtCO₂e/y by 2040, so that would be 40% of that budget; and this is a budget that can’t be exceeded without letting global warming get out of control.&lt;/p&gt;
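&lt;p&gt;Scaling to search-engine volumes is a straightforward multiplication; the sketch below reproduces it from the per-query figures (the article works with the rounded values 8.7 TWh/y and 4 MtCO₂e/y):&lt;/p&gt;

```python
wh_per_query = 77160 * 1000 / 65e6     # from the ChatGPT estimate above
g_per_query = 0.5                      # rounded figure used in the article

queries_per_day = 2 * 10e9             # Bing + Google, 10 billion each
twh_per_year = queries_per_day * wh_per_query * 365 / 1e12
mtco2e_per_year = queries_per_day * g_per_query * 365 / 1e12

print(round(twh_per_year, 1), "TWh/y from search alone")
print(round(mtco2e_per_year), "MtCO2e/y from search alone")
# the article uses the rounded 8.7 x 50 = 435 TWh/y for 100 applications
print(round(twh_per_year * 50), "TWh/y for 100 such applications")
```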

&lt;p&gt;There is probably no room for a hundred major search engines, but there are many other use cases for ChatGPT and other LLMs, e.g. dynamic content generation for SEO spamming, better email spam, etc.; and there are of course also the image-based generative models. So the coexistence of just a hundred large applications of LLMs in the whole world is entirely plausible.&lt;/p&gt;

&lt;p&gt;From this it is clear that large-scale adoption of LLMs would lead to unsustainable increases in ICT CO₂ emissions.&lt;/p&gt;

&lt;h2 id=&quot;reducing-the-climate-cost&quot;&gt;Reducing the climate cost&lt;/h2&gt;

&lt;p&gt;Assuming that ChatGPT-style LLMs make it through the &lt;a href=&quot;https://www.gartner.com/en/research/methodologies/gartner-hype-cycle&quot;&gt;Gartner hype cycle&lt;/a&gt; and are here to stay, their energy consumption will become a major concern. In the big picture, the best way to address this would of course be not to use this highly polluting technology. After all, there is no burning need for LLMs; it is currently very much a case of a solution looking for a problem.&lt;/p&gt;

&lt;p&gt;Second best would be to put a carbon tax on electricity usage. That might seem like a good way to curb electricity use in general. However, companies would likely resort to on-site generation. Considering the scale of the required electricity generation, it is more precise to say “private generation” than “on-site” or “local”:
435 TWh/y is 15% of the global ICT electricity consumption. This can’t simply be generated by putting a few solar panels on the roofs of the data centres. It would require a 500 MW wind farm per application. For example, the &lt;a href=&quot;https://en.wikipedia.org/wiki/Whitelee_Wind_Farm&quot;&gt;Whitelee wind farm&lt;/a&gt; near Glasgow, the largest on-shore wind farm in the UK, has a maximum generating capacity of 539 MW and covers an area of 55 km² (about the size of Manhattan). Solar power in hot countries has a higher energy density, but is still of the same order: e.g. the &lt;a href=&quot;https://www.theecoexperts.co.uk/solar-panels/biggest-solar-farms&quot;&gt;Bhadla Solar Park&lt;/a&gt; in India, one of the largest solar farms in the world, has 2.7 GW capacity and covers 160 km², so 500 MW would require 30 km².&lt;/p&gt;
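&lt;p&gt;The generation figures can be cross-checked as well. Note that, like the rough comparison above, this matches average demand to nameplate capacity and ignores capacity factors:&lt;/p&gt;

```python
twh_per_app = 435 / 100                   # ~4.35 TWh/y per large application
hours_per_year = 8760
avg_mw = twh_per_app * 1e12 / hours_per_year / 1e6   # continuous average power

# solar land area, scaled linearly from Bhadla (2.7 GW on 160 km2)
solar_km2 = 500 / 2700 * 160

print(round(avg_mw), "MW of continuous generation per application")
print(round(solar_km2), "km2 of solar at Bhadla's energy density")
```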

&lt;p&gt;So to provide “on-site” generation for an LLM application such as ChatGPT-enabled search, a company would have to buy a similar area of land for its private wind or solar farm, thereby reducing the area available for replacing generation from fossil fuels.&lt;/p&gt;

&lt;p&gt;There is however a lot of scope for energy savings. To start with, for many applications there is no need for a model of the size of GPT-3. Something 10x smaller will do the job just fine, at a fraction of the cost. For example, many of Google’s current models are of that size. Of course, if there are many such models, their combined footprint will be of a similar order.&lt;/p&gt;

&lt;p&gt;Then there are potential efficiency savings, e.g. through use of energy-efficient hardware accelerators such as FPGAs, Google’s TPU chips or Cerebras’ Wafer-Scale Engine. All of these have in principle the potential to be an order of magnitude more efficient for both the training tasks and queries.&lt;/p&gt;

&lt;p&gt;In fact, for many tasks an LLM or other large-scale model is at best total overkill, and at worst unsuitable, and a conventional Machine Learning or Information Retrieval technique will be orders of magnitude more energy efficient and cost effective to run. Especially in the context of chat-based search, the energy consumption could be reduced significantly through generalised forms of caching, replacement of the LLM with a rule-based engine for frequently posed queries, or of course simply defaulting to non-AI search.&lt;/p&gt;
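&lt;p&gt;As a purely hypothetical sketch of the caching idea (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expensive_llm_call&lt;/code&gt; is a made-up stand-in, not a real API): normalise incoming queries and answer repeats from a cache, so that only novel queries reach the model:&lt;/p&gt;

```python
# Hypothetical sketch only: expensive_llm_call stands in for the costly model.
cache = {}

def expensive_llm_call(query):
    return "answer to: " + query        # placeholder, not a real API

def cached_answer(query):
    key = " ".join(query.lower().split())     # crude query normalisation
    if key not in cache:                      # only novel queries hit the model
        cache[key] = expensive_llm_call(key)
    return cache[key]

cached_answer("Who is the Prime Minister?")
cached_answer("who is  the PRIME minister?")  # served from the cache
```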

&lt;p&gt;However, improved resource usage efficiency also lowers the relative cost of using a resource, which leads to increased demand [1]. This is known as &lt;a href=&quot;https://www.oecd-forum.org/posts/the-jevons-paradox-and-rebound-effect-are-we-implementing-the-right-energy-and-climate-change-policies&quot;&gt;Jevons paradox&lt;/a&gt; or the “rebound effect”. Jevons described in 1865 how energy efficiency improvements increased the consumption of coal. What this means is that the way to reduce the climate cost of the AI revolution can’t be purely technological. As with any activity that consumes energy, the best way to limit energy consumption is to limit the activity.&lt;/p&gt;

&lt;p&gt;As a society we need to treat AI resources as finite and precious, to be utilised only when necessary, and as effectively as possible. We need &lt;em&gt;frugal AI&lt;/em&gt;.&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>Immutable data structures and reduction in Raku</title>
        <link href="https://limited.systems/articles/immutable-datastructures-reduction/"/>
        <updated>2022-11-20T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/immutable-datastructures-reduction</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/immutable-datastructures-reduction_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;For &lt;a href=&quot;https://wimvanderbauwhede.github.io/articles/uxntal-to-C/&quot;&gt;a little compiler&lt;/a&gt; I’ve been writing, I felt increasingly the need for immutable data structures to ensure that nothing was passed by references between passes. I love Perl and Raku but I am a functional programmer at heart, so I prefer map and reduce over loops. It bothered me to run reductions on a mutable data structure. So I made &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/src/branch/main/lib/ImmutableDatastructureHelpers.rakumod&quot;&gt;a small library&lt;/a&gt; to make it easier to work with immutable maps and lists.&lt;/p&gt;

&lt;p&gt;A reduction combines all elements of a list into a result. A typical example is the sum of all elements in a list. According to the Raku docs, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce()&lt;/code&gt; has the following signature&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;multi sub reduce (&amp;amp;with, +list)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In general, if we have a list of elements of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T1&lt;/code&gt; and a result of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T2&lt;/code&gt;, Raku’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce()&lt;/code&gt; function takes as first argument a function of the form&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;-&amp;gt; T2 \acc, T1 \elt --&amp;gt; T2 { ... }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I use the form of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce&lt;/code&gt; that takes three arguments: the reducing function, the accumulator (what the Raku docs call the initial value) and the list. As explained in the docs, Raku’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce&lt;/code&gt; operates from left to right. (In Haskell speak, it is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foldl :: (b -&amp;gt; a -&amp;gt; b) -&amp;gt; b -&amp;gt; [a] -&amp;gt; b&lt;/code&gt;.)&lt;/p&gt;

&lt;p&gt;The use case is the traversal of a role-based datastructure &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ParsedProgram&lt;/code&gt; which contains a map and an ordered list of keys. The map itself contains elements of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ParsedCodeBlock&lt;/code&gt; which is essentially a list of tokens.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;role ParsedProgram {
    has Map $.blocks = Map.new; # {String =&amp;gt; ParsedCodeBlock}
    has List $.blocks-sequence = List.new; # [String]
	...
}

role ParsedCodeBlock {
    has List $.code = List.new; # [Token]
	...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;List and Map are immutable, so we have immutable datastructures. What I want to do is update these datastructures using a nested reduction where I iterate over all the keys in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blocks-sequence&lt;/code&gt; List and then modify the corresponding ParsedCodeBlock. For that purpose, I wrote a small API, and in the code below, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;append&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;insert&lt;/code&gt; are part of that API. What they do is create a fresh List or Map respectively, rather than updating in place.&lt;/p&gt;
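&lt;p&gt;The same idea can be sketched outside Raku. In Python, for example (an analogy only, not the library itself), each “update” returns a fresh dict or tuple:&lt;/p&gt;

```python
def insert(m, k, v):
    # Map insert/update: return a new dict, leave the original unchanged
    fresh = dict(m)
    fresh[k] = v
    return fresh

def append(lst, e):
    # List append: return a new tuple, leave the original unchanged
    return lst + (e,)

blocks = {"main": ("LIT", "ADD")}
blocks2 = insert(blocks, "main", append(blocks["main"], "BRK"))

# blocks["main"] is still ("LIT", "ADD"); blocks2 holds the new version
```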

&lt;p&gt;I prefer to use sigil-less variables for immutable data, so that sigils in my code show where I use mutable variables.&lt;/p&gt;

&lt;p&gt;The code below is an example of a typical traversal. We iterate over a list of code blocks in a program, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parsed_program.blocks-sequence&lt;/code&gt;; on every iteration, we update the program &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parsed_program&lt;/code&gt; (the accumulator).
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce()&lt;/code&gt; call takes a lambda function with the accumulator  (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ppr_&lt;/code&gt;) and a list element (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;code_block_label&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;We get the code blocks from the program’s map of blocks, and use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce()&lt;/code&gt; again to update the tokens in the code block. So we iterate over the original list of tokens (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parsed_block.code&lt;/code&gt;) and build a new list. The lambda function therefore has as accumulator the updated list (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mod_block_code_&lt;/code&gt;) and as element a token (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;token_&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The inner reduce creates a modified token and puts it in the updated list using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;append&lt;/code&gt;. Then the outer reduce updates the block code using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;clone&lt;/code&gt; and updates the map of code blocks in the program using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;insert&lt;/code&gt;, which updates the entry if it was present. Finally, we update the program using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;clone&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;reduce(
    -&amp;gt; ParsedProgram \ppr_, String \code_block_label {
        my ParsedCodeBlock \parsed_block =
            ppr_.blocks{code_block_label};

        my List \mod_block_code = reduce(
            -&amp;gt; \mod_block_code_,\token_ {
                my Token \mod_token_ = ...;
                append(mod_block_code_,mod_token_);
            },
            List.new,
            |parsed_block.code
        );
        my ParsedCodeBlock \mod_block_ =
            parsed_block.clone(code=&amp;gt;mod_block_code);
        my Map \blocks_ = insert(
            ppr_.blocks,code_block_label,mod_block_);
        ppr_.clone(blocks=&amp;gt;blocks_);
    },
    parsed_program,
    |parsed_program.blocks-sequence
);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The entire library is only a handful of functions. The naming of the functions is based on Haskell’s, except where Raku already claimed a name as a keyword.&lt;/p&gt;

&lt;h2 id=&quot;map-manipulation&quot;&gt;Map manipulation&lt;/h2&gt;

&lt;p&gt;Insert, update and remove entries in a Map. Given an existing key, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;insert&lt;/code&gt; will update the entry.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;sub insert(Map \m_, Str \k_, \v_ --&amp;gt; Map )
sub update(Map \m_, Str \k_, \v_ --&amp;gt; Map )
sub remove(Map \m_, Str \k_ --&amp;gt; Map )
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&quot;list-manipulation&quot;&gt;List manipulation&lt;/h2&gt;

&lt;p&gt;There are more list manipulation functions because reductions operate on lists.&lt;/p&gt;

&lt;h3 id=&quot;addremove-an-element-at-the-front&quot;&gt;Add an element at the front or the back:&lt;/h3&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;# push
sub append(List \l_, \e_ --&amp;gt; List)
# unshift
sub prepend(List \l_, \e_ --&amp;gt; List)
&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id=&quot;split-a-list-into-its-first-element-and-the-rest&quot;&gt;Split a list into its first element and the rest:&lt;/h3&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;# return the first element, like shift
sub head(List \l_ --&amp;gt; Any)
# drops the first element
sub tail(List \l_ --&amp;gt; List)

# This is like head:tail in Haskell
sub headTail(List \l_ --&amp;gt; List) # List is a tuple (head, tail)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The typical use of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;headTail&lt;/code&gt; is something like:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;my (Str \leaf, List \leaves_) = headTail(leaves);
&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id=&quot;similar-operations-but-for-the-last-element&quot;&gt;Similar operations but for the last element:&lt;/h3&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;# drop the last element
sub init(List \l_ --&amp;gt; List)
# return the last element, like pop.
sub top(List \l_ --&amp;gt; Any)
# Split the list on the last element
sub initLast(List \l_ --&amp;gt; List) # List is a tuple (init, top)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The typical use of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initLast&lt;/code&gt; is something like:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;my (List \leaves_, Str \leaf) = initLast(leaves);
&lt;/code&gt;&lt;/pre&gt;


        </content>
    </entry>
    
    <entry>
        <title>Compiling stack-based assembly to C</title>
        <link href="https://limited.systems/articles/uxntal-to-C/"/>
        <updated>2022-10-15T00:00:00+01:00</updated>
        <id>https://limited.systems/articles/uxntal-to-C</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/uxntal-to-C_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;I wrote a proof-of-concept compiler from &lt;a href=&quot;https://wiki.xxiivv.com/site/uxntal.html&quot;&gt;Uxntal&lt;/a&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;. The generated code is linked with a slightly modified version of the &lt;a href=&quot;https://git.sr.ht/~rabbits/uxn&quot;&gt;Uxn VM/Varvara code&lt;/a&gt; to provide stand-alone applications.&lt;/p&gt;

&lt;h2 id=&quot;uxntal-and-uxn&quot;&gt;Uxntal and Uxn&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://wiki.xxiivv.com/site/uxntal.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Uxntal&lt;/code&gt;&lt;/a&gt; is the programming language for the &lt;a href=&quot;https://wiki.xxiivv.com/site/uxn.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Uxn&lt;/code&gt;&lt;/a&gt; virtual machine which forms the heart of the &lt;a href=&quot;https://wiki.xxiivv.com/site/varvara.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Varvara&lt;/code&gt;&lt;/a&gt; clean-slate computing stack. As Uxn is a stack machine, Uxntal is a stack language, similar to e.g. &lt;a href=&quot;https://forth-standard.org/&quot;&gt;Forth&lt;/a&gt; or &lt;a href=&quot;https://dev.to/palm86/church-encoding-in-the-concatenative-language-joy-3nd8&quot;&gt;Joy&lt;/a&gt; in that it uses reverse Polish notation (postfix). It is an assembly language with opcodes for 8-bit and 16-bit operations on the stack and memory. To get the most out of this article, it is best if you have basic knowledge of Uxntal, either from the above resources or for example &lt;a href=&quot;https://compudanzas.net/uxn_tutorial.html&quot;&gt;the great tutorial at Compudanzas&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;an-uxntal-to-c-compiler&quot;&gt;An Uxntal-to-C compiler&lt;/h2&gt;

&lt;p&gt;What I call an Uxntal-to-C compiler is a program that converts an Uxntal program into a C program that, when compiled with a C compiler and executed, has the same functionality as the Uxntal program has when assembled and run on the Uxn emulator.&lt;/p&gt;

&lt;p&gt;Why compile Uxntal to C? Mainly for fun, and out of curiosity. I was curious about the challenges involved and the limitations. Compiling Uxntal programs does result in speed-ups for compute-intensive applications. However, the fact that Uxn is computationally not very efficient is to my mind a bit of a red herring. As the main purpose of Uxn is to create interactive applications, the behaviour of these is dominated by the I/O activity, including the display and audio which are managed by SDL. The effect of the Uxntal code being compiled or interpreted will therefore be small for typical Uxn targets, because either the program run time will be dominated by I/O waits or the computations will take place in the SDL layer. And that was another reason behind this experiment: it provides evidence. I verified my assumptions on a number of examples (see below), and the total power saving of the compiled version is negligible.&lt;/p&gt;

&lt;p&gt;I initially considered &lt;a href=&quot;https://llvm.org&quot;&gt;LLVM&lt;/a&gt; and &lt;a href=&quot;https://webassembly.github.io&quot;&gt;WASM&lt;/a&gt; as targets. WASM seems attractive at first because it is purportedly stack based, but it turns out that loops are not stack based, nor is function argument passing. Both WASM and LLVM are typed assembly languages and assume that code and data are in separate memory spaces and that code is read-only, so they offer no additional benefit as a compilation target for Uxntal over C.&lt;/p&gt;

&lt;h2 id=&quot;limitations&quot;&gt;Limitations&lt;/h2&gt;

&lt;p&gt;There are two aspects of Uxntal that can’t be supported in an ahead-of-time C compiler with static code analysis.&lt;/p&gt;

&lt;h3 id=&quot;jumps-to-computed-addresses&quot;&gt;Jumps to computed addresses&lt;/h3&gt;

&lt;p&gt;The first is jumps to computed addresses, because that is a concept that is not supported in C (nor in LLVM or WASM). A jump to a constant relative address can be resolved at compile time and supported, but run-time computed jumps have no equivalent. Fortunately, in practice Uxntal’s linter discourages this for jumps longer than one instruction, and the allowed case of a run-time computed binary value is supported.&lt;/p&gt;

&lt;h3 id=&quot;self-modifiable-code&quot;&gt;Self-modifiable code&lt;/h3&gt;

&lt;p&gt;The second is self-modifiable code. In C, LLVM and WASM, code and data are separated and a program can’t modify its own code. Fortunately, in practice the use of self-modification in Uxntal is limited to storing of local variables through code such as&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;LIT &amp;amp;x $1 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and evaluation of instructions from the stack using code such as&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;LIT MUL
... 
#00 STR $1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run-time evaluation through self-modification of the instructions is only supported for a specific case:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#06 #07 LIT ADD 
...
#00 STR BRK 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In principle, any store of a byte in the code memory results in self-modification, but the above is the most common pattern used to evaluate a byte on the stack as an instruction.&lt;/p&gt;

&lt;p&gt;More fundamentally, the compiler expects human-readable Uxntal code; in particular, it relies on the mnemonics to identify instructions. So while this is valid Uxntal code, it will not work:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;80 06 80 07 1a
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It is in general impossible for a compiler to distinguish between opcodes and data because of Uxntal’s dynamic nature. In fact, a value can be used as both depending on a run-time condition. So the compiler needs the meta-information provided by the mnemonic notation.&lt;/p&gt;

&lt;h2 id=&quot;design&quot;&gt;Design&lt;/h2&gt;

&lt;p&gt;The overall approach is to use the runtime data structures used in the Uxn emulator, i.e. arrays that represent the RAM, stacks and devices. Rather than reading bytes from the ROM file and evaluating them using a case statement, we generate C code with subroutines corresponding to the instructions. The control flow is purely based on subroutine calls.&lt;/p&gt;

&lt;p&gt;The design is quite simple. There is a Token sum type for all token variants and some record types for code blocks and the full program. All of these are in &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/lib/UxntalTypes.rakumod&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UxntalTypes&lt;/code&gt;&lt;/a&gt;. Definitions of the Uxntal operations are in &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/lib/UxntalDefs.rakumod&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UxntalDefs&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Because of Uxntal’s very regular structure, the &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/lib/UxntalTokeniser.rakumod&quot;&gt;tokeniser&lt;/a&gt; is trivial (split on whitespace and newlines); the &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/lib/UxntalParser.rakumod&quot;&gt;parser&lt;/a&gt; uses regular expressions and is also straightforward. Because Uxntal programs are simply sequences of instructions, labels and data of one or two bytes, the parser needs only limited context. We parse the code into a data type that reflects the different types of tokens as described on the &lt;a href=&quot;https://wiki.xxiivv.com/site/uxntal.html&quot;&gt;Uxntal page of the XXIIVV wiki&lt;/a&gt;:&lt;/p&gt;

&lt;table border=&quot;1&quot;&gt;
	&lt;tr&gt;&lt;th colspan=&quot;4&quot;&gt;Padding&lt;/th&gt;&lt;th colspan=&quot;4&quot;&gt;Literals&lt;/th&gt;&lt;/tr&gt;
	&lt;tr&gt;&lt;td&gt;&lt;code&gt;|&lt;/code&gt;&lt;/td&gt;&lt;td&gt;absolute&lt;/td&gt;&lt;td&gt;&lt;code&gt;$&lt;/code&gt;&lt;/td&gt;&lt;td&gt;relative&lt;/td&gt;&lt;td&gt;&lt;code&gt;#&lt;/code&gt;&lt;/td&gt;&lt;td colspan=&quot;3&quot;&gt;literal hex&lt;/td&gt;&lt;/tr&gt;
	&lt;tr&gt;&lt;th colspan=&quot;4&quot;&gt;Labels&lt;/th&gt;&lt;th colspan=&quot;4&quot;&gt;Ascii&lt;/th&gt;&lt;/tr&gt;
	&lt;tr&gt;&lt;td&gt;&lt;code&gt;@&lt;/code&gt;&lt;/td&gt;&lt;td&gt;parent&lt;/td&gt;&lt;td&gt;&lt;code&gt;&amp;amp;&lt;/code&gt;&lt;/td&gt;&lt;td&gt;child&lt;/td&gt;&lt;td&gt;&lt;code&gt;&amp;quot;&lt;/code&gt;&lt;/td&gt;&lt;td&gt;raw word&lt;/td&gt;&lt;td&gt;&lt;code&gt;&amp;#39;&lt;/code&gt;&lt;/td&gt;&lt;td&gt;raw char&lt;/td&gt;&lt;/tr&gt;
	&lt;tr&gt;&lt;th colspan=&quot;4&quot;&gt;Addressing&lt;/th&gt;&lt;th colspan=&quot;4&quot;&gt;Pre-processor&lt;/th&gt;&lt;/tr&gt;
	&lt;tr&gt;&lt;td&gt;&lt;code&gt;,&lt;/code&gt;&lt;/td&gt;&lt;td&gt;literal relative&lt;/td&gt;&lt;td&gt;.&lt;/td&gt;&lt;td&gt;literal zero-page&lt;/td&gt;&lt;td&gt;&lt;code&gt;%&lt;/code&gt;&lt;/td&gt;&lt;td&gt;macro-define&lt;/td&gt;&lt;td&gt;&lt;code&gt;~&lt;/code&gt;&lt;/td&gt;&lt;td&gt;include&lt;/td&gt;&lt;/tr&gt;
	&lt;tr&gt;&lt;td&gt;&lt;code&gt;:&lt;/code&gt;&lt;/td&gt;&lt;td&gt;raw absolute&lt;/td&gt;&lt;td&gt;&lt;code&gt;;&lt;/code&gt;&lt;/td&gt;&lt;td&gt;literal absolute&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Then we perform two transformation passes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Replace all relative addressing by absolute addressing.&lt;/p&gt;

    &lt;p&gt;So after this step, only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;;&lt;/code&gt; remain.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Split the code into blocks and identify them as subroutines or data.&lt;/p&gt;

    &lt;p&gt;This is only slightly more complex because we need to add an explicit jump to the next block for blocks that do not end in a jump. Blocks that do not contain operations are considered data, all other blocks are subroutines.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This transformation allows us to create equivalent C subroutines and store the data in RAM. There are a few special cases, see the &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/lib/UxntalParser.rakumod&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UxntalParser&lt;/code&gt;&lt;/a&gt; code for details.&lt;/p&gt;

&lt;p&gt;After this step we &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/lib/UxntalAnalyser.rakumod&quot;&gt;analyse&lt;/a&gt; the code to determine which labels refer to subroutines and which to data.&lt;/p&gt;

&lt;p&gt;The main advantage of this approach is that it simplifies control flow handling: there is no need for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;goto&lt;/code&gt; statements or labels because Uxntal’s label-based loops have been turned into recursive subroutines. So all we need to do is generate the code for those subroutines and the code to call them.&lt;/p&gt;

&lt;p&gt;The actual &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/lib/UxntalStackBasedCEmitter.rakumod&quot;&gt;emitter&lt;/a&gt; is straightforward because it relies on &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/uxn-runtime-libs/&quot;&gt;a runtime library&lt;/a&gt; which contains a subroutine for every Uxntal instruction.&lt;/p&gt;

&lt;p&gt;In practice there are a few additional complications, specifically to handle the limited use of self-modification, and to handle conditional jumps. Also, the memory allocations and subroutine declarations need to be collected and grouped at the start of the source code, so there is quite a bit of state to be maintained.&lt;/p&gt;

&lt;p&gt;Each instruction subroutine takes the required arguments from the stack and pushes its result on the stack, if any. Consequently, all subroutines have a signature &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;void f(void)&lt;/code&gt;. This means all function pointers are of the same type. Because function pointers in C are machine-sized, we can’t put them directly on the Uxn stack. Instead, we store them in a separate array and put the index into that array on the stack.&lt;/p&gt;
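&lt;p&gt;This indirection can be sketched as follows (a Python analogy of the generated C, with made-up subroutine names):&lt;/p&gt;

```python
# All "subroutines" share one no-argument signature and live in a table;
# a small index into the table, not the pointer itself, goes on the stack.
stack = []

def push_two():   stack.append(2)
def push_three(): stack.append(3)
def add():
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

funcs = [push_two, push_three, add]    # the "function pointer" array

stack.append(2)        # the index of add() fits in a byte on the Uxn stack
idx = stack.pop()      # later: fetch the index ...
push_two()
push_three()
funcs[idx]()           # ... and call through the table; stack is now [5]
```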

&lt;p&gt;With this approach, we can handle most of the dynamic nature of Uxntal.&lt;/p&gt;

&lt;h2 id=&quot;optimisations&quot;&gt;Optimisations&lt;/h2&gt;

&lt;h3 id=&quot;inlining-operations&quot;&gt;Inlining operations&lt;/h3&gt;

&lt;p&gt;The generated C code at this stage has a very large number of subroutine calls, and disappointingly the C compiler (gcc) does not inline most of them. So we do this ourselves in a &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/lib/UxntalStackBasedCOpsEmitter.rakumod&quot;&gt;first optimisation pass&lt;/a&gt;. This is easy (if cumbersome) because the subroutines don’t take or return arguments and are guaranteed non-recursive, so we can simply replace the call with the definition.&lt;/p&gt;

&lt;h3 id=&quot;stack-to-register&quot;&gt;Stack to register&lt;/h3&gt;

&lt;p&gt;A &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/lib/UxntalStackBasedCOpsEmitter.rakumod&quot;&gt;second optimisation&lt;/a&gt; is more complicated but also has a much bigger effect on performance: we replace stack operations with register operations wherever possible. This is a little more complicated than it appears at first sight. I might write a separate post about the algorithm.&lt;/p&gt;

&lt;h3 id=&quot;further-optimisations&quot;&gt;Further optimisations&lt;/h3&gt;

&lt;p&gt;There are a number of further optimisations that could be considered, but all of them are more complex and would not result in a dramatic additional performance improvement. Some of them I have implemented but they are not enabled:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/lib/UxntalInliner.rakumod&quot;&gt;Inline subroutines&lt;/a&gt;. This is only partially implemented. It is rather tricky because recursive subroutines can’t be inlined, so we need an analysis to detect recursion. For simple, in-routine recursion that is easy, but recursion can occur through a chain of tail calls, so we need to identify tail calls and follow those through.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;goto&lt;/code&gt; instead of function call. This is a simple optimisation which does not require a separate pass so it’s done directly in the &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/lib/UxntalStackBasedCEmitter.rakumod&quot;&gt;emitter&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
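&lt;p&gt;For the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;goto&lt;/code&gt; variant, a hypothetical sketch (using the GNU C labels-as-values extension, which gcc provides; the return-stack helpers are assumed) of what a Uxntal subroutine call becomes:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* as a C function call */
f();

/* as a goto: push the return address, then jump to the label */
push_ret(&amp;amp;&amp;amp;ret_0);   /* GNU C labels-as-values */
goto f_body;
ret_0: ;
...
f_body:
    /* ... body of f ... */
    goto *pop_ret();    /* JMP2r as a computed goto */
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;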

&lt;p&gt;Both of these optimisations make little or no difference for most applications I tested, so I don’t enable them. Some other optimisations I have not finished implementing:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Stack to register for subroutine calls. This is quite complicated, mostly because of recursion, but also because it is (in general) not possible to infer the type of a function called via a function pointer.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Stack to register for conditional blocks. This is the case where a condition is created through a computed jump, a typical example would be&lt;/p&gt;

    &lt;p&gt;… EQU JMP [ INC2 ] …&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EQU&lt;/code&gt; returns 0, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INC2&lt;/code&gt; will be executed, otherwise it will be jumped over. The instruction has to be idempotent, and I think most commonly this is used with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JMP2r&lt;/code&gt;. The effect on performance will generally be minimal.&lt;/p&gt;
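&lt;p&gt;In the generated C, such a conditional block would collapse to an ordinary &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; (a hypothetical sketch with assumed stack helpers):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* ... EQU JMP [ INC2 ] ... */
uint8_t eq = (pop() == pop());  /* EQU */
if (!eq) {                      /* JMP skips the next instruction when eq is 1 */
    push2(pop2() + 1);          /* INC2 */
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;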

&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;

&lt;h3 id=&quot;code-used-for-testing&quot;&gt;Code used for testing&lt;/h3&gt;

&lt;p&gt;I did some limited performance evaluation using five command-line programs: three versions of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fib*&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;primes&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;stencil&lt;/code&gt;. With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-O=2&lt;/code&gt; (inlined ops and stack-to-reg), the compiled version is up to 12x faster than the original version. With the additional optimisations this might increase a bit, maybe to 15x, but not more.&lt;/p&gt;

&lt;p&gt;The three versions of the Fibonacci calculation are a modified version of &lt;a href=&quot;https://git.sr.ht/~rabbits/uxn/tree/main/item/projects/examples/exercises/fib.tal&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uxn/projects/examples/exercises/fib.tal&lt;/code&gt;&lt;/a&gt; and the two versions used in &lt;a href=&quot;https://applied-langua.ge/posts/i-dont-want-to-go-to-chel-c.html&quot;&gt;an article&lt;/a&gt; that criticised Uxn for being inefficient.&lt;/p&gt;

&lt;p&gt;The original &lt;a href=&quot;https://git.sr.ht/~rabbits/uxn/tree/main/item/projects/examples/exercises/fib.tal&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fib.tal&lt;/code&gt;&lt;/a&gt; is very terse (ignoring the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;print&lt;/code&gt; function):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100 ( -&amp;gt; ) @reset

    #0000 INC2k ADD2k
    &amp;amp;loop
        ( print ) DUP2 ,print JSR
        ( linebreak ) #0a18 DEO
        ADD2k LTH2k ,&amp;amp;loop JCN
    ( halt ) #010f DEO

BRK
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/src/branch/main/demos/fib.tal&quot;&gt;modified it&lt;/a&gt; by writing a loop around it to repeat the calculations 2^16 times:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#ffff #0000 &amp;amp;iterate
...
INC2 GTH2k ,&amp;amp;iterate JCN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;What I call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fib2.tal&lt;/code&gt; is taken from the article:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100

#0020 ;fib JSR2
#01 #0f DEO BRK

@fib ( N -- fib[N] )
    DUP2 #0001 GTH2 ,&amp;amp;inductive-case JCN JMP2r
    &amp;amp;inductive-case
    DUP2 #0001 SUB2 ;fib JSR2 ( stack now N fib[N-1] )
    SWP2 #0002 SUB2 ;fib JSR2 ( stack now fib[N-1] fib[N-2] )
    ADD2 JMP2r
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fib32.tal&lt;/code&gt; is a 32-bit version of this code, also from that article:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100

#0020 ;fib JSR2
#01 #0f DEO BRK

@fib ( N -- fib[N] )
( not[n &amp;lt; 2] equivalent to n &amp;gt; 1 )
    DUP2 #0001 GTH2 ,&amp;amp;inductive-case JCN #0000 SWP JMP2r
    &amp;amp;inductive-case
    DUP2 #0001 SUB2 ;fib JSR2 ( stack now N fib[N-1] )
    ROT2 #0002 SUB2 ;fib JSR2 ( stack now fib[N-1] fib[N-2] )
    ;add32 JSR2 JMP2r
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This uses the &lt;a href=&quot;http://plastic-idolatry.com/erik/nxu/math32.tal&quot;&gt;32-bit math library&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I also tested &lt;a href=&quot;https://git.sr.ht/~rabbits/uxn/tree/main/item/projects/examples/exercises/primes.tal&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uxn/projects/examples/exercises/primes.tal&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100 ( -&amp;gt; ) @reset

    #0000 INC2k
    &amp;amp;loop
        DUP2 ,is-prime JSR #00 EQU ,&amp;amp;skip JCN
            DUP2 DUP2 ;mem STA2
            ,print/short JSR
            ( space ) #2018 DEO
            &amp;amp;skip
        INC2 NEQ2k ,&amp;amp;loop JCN
    POP2 POP2
    ;mem LDA2 ,print/short JSR
    ( halt ) #010f DEO

BRK

@is-prime	
    DUP2
    #0001 EQU2 ,&amp;amp;fail JCN
    STH2k
    #01 SFT2 #0002
    &amp;amp;loop
        STH2kr OVR2 ( mod2 ) [ DIV2k MUL2 SUB2 ] ORA ,&amp;amp;continue JCN
            POP2 POP2 POP2r #00 JMP2r
            &amp;amp;continue
        INC2  GTH2k ,&amp;amp;loop JCN
    POP2 POP2 POP2r #01
JMP2r
    &amp;amp;fail POP2 #00 JMP2r
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, I wrote &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito/src/branch/main/demos/stencil.tal&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;stencil.tal&lt;/code&gt;&lt;/a&gt;, a 3-D 6-point stencil code: the value of each point in a 3-D space is calculated from the values of its six neighbours (i+1,i-1),(j+1,j-1),(k+1,k-1). This is a very common pattern in scientific computing and a good number-crunching test. The code is a bit too long to list here. It is a quadruple-nested loop: a time loop containing loops over the x, y and z directions of the 3-D space. At each point, the calculation is simply the weighted average of the current value and the sum of the six neighbours:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;.idx LDZ2 #0001 ADD2 LDA2 ( p(x-1,y,z) )
.idx LDZ2 #0001 SUB2 LDA2 ( p(x+1,y,z) )
ADD2
.idx LDZ2 #0010 ADD2 LDA2 ( p(x,y-1,z) )
.idx LDZ2 #0010 SUB2 LDA2 ( p(x,y+1,z) )
ADD2
ADD2
.idx LDZ2 #0100 ADD2 LDA2 ( p(x,y,z-1) )
.idx LDZ2 #0100 SUB2 LDA2 ( p(x,y,z+1) )
ADD2 
ADD2
#0003 MUL2 

.idx LDZ2 LDA2 ( p(x,y,z) ) 
ADD2
#0004 DIV2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The time loop repeats this calculation 2^16 times.&lt;/p&gt;
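&lt;p&gt;For reference, the same update written in C would look as follows (a sketch assuming the row-major layout implied by the offsets &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#0001&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#0010&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#0100&lt;/code&gt;; the array name is hypothetical):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* weighted average of the current point and its six neighbours */
uint16_t sum = p[i - 0x001] + p[i + 0x001]
             + p[i - 0x010] + p[i + 0x010]
             + p[i - 0x100] + p[i + 0x100];
p[i] = (3 * sum + p[i]) / 4;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;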

&lt;h3 id=&quot;performance-results&quot;&gt;Performance results&lt;/h3&gt;

&lt;p&gt;I compiled these examples with the optimisations enabled: inlining of operations and the stack-to-register transformation. I used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;time&lt;/code&gt; to obtain the timings.&lt;/p&gt;

&lt;table&gt;
	&lt;tr&gt;&lt;th&gt;Code&lt;/th&gt;&lt;th&gt;Emulated (s)&lt;/th&gt;&lt;th&gt;Compiled (s)&lt;/th&gt;&lt;th&gt;Speed-up&lt;/th&gt;&lt;/tr&gt;
	&lt;tr&gt;&lt;td&gt;&lt;code&gt;fib.tal&lt;/code&gt;&lt;/td&gt;&lt;td&gt;1.57&lt;/td&gt;&lt;td&gt;0.96&lt;/td&gt;&lt;td&gt;1.6x&lt;/td&gt;&lt;/tr&gt;
	&lt;tr&gt;&lt;td&gt;&lt;code&gt;fib2.tal&lt;/code&gt;&lt;/td&gt;&lt;td&gt;0.471&lt;/td&gt;&lt;td&gt;0.047&lt;/td&gt;&lt;td&gt;10x&lt;/td&gt;&lt;/tr&gt;
	&lt;tr&gt;&lt;td&gt;&lt;code&gt;fib32.tal&lt;/code&gt;&lt;/td&gt;&lt;td&gt;1.86&lt;/td&gt;&lt;td&gt;0.151&lt;/td&gt;&lt;td&gt;12.3x&lt;/td&gt;&lt;/tr&gt;
	&lt;tr&gt;&lt;td&gt;&lt;code&gt;primes.tal&lt;/code&gt;&lt;/td&gt;&lt;td&gt;6.9&lt;/td&gt;&lt;td&gt;0.93&lt;/td&gt;&lt;td&gt;7.4x&lt;/td&gt;&lt;/tr&gt;
	&lt;tr&gt;&lt;td&gt;&lt;code&gt;stencil.tal&lt;/code&gt;&lt;/td&gt;&lt;td&gt;96.9&lt;/td&gt;&lt;td&gt;7.8&lt;/td&gt;&lt;td&gt;12.4x&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;For the examples with low speed-ups, the reason is that much of the time is spent on I/O, which takes the same amount of time for the emulated and compiled versions:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ time uxncli fib.rom &amp;gt;/dev/null
real	0m1.568s
user	0m0.924s
sys	0m0.644s

$ time ./fib &gt;/dev/null
real	0m0.964s
user	0m0.384s
sys	0m0.580s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Even if we print to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/dev/null&lt;/code&gt;, that still takes considerable I/O time.&lt;/p&gt;

&lt;p&gt;For the tests without I/O (fib2, fib32, stencil), the speed-up is between 10x and 12.4x. I think with the additional optimisations, it might increase to maybe 15x but not more than that.&lt;/p&gt;

&lt;h2 id=&quot;power-consumption&quot;&gt;Power consumption&lt;/h2&gt;

&lt;p&gt;Finally, I had a look at the power consumption of a typical GUI-based Uxn app. I used powertop with the &lt;a href=&quot;https://git.sr.ht/~rabbits/uxn/tree/main/item/projects/examples/demos/bunnymark.tal&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bunnymark&lt;/code&gt;&lt;/a&gt; benchmark, running 10,000 rabbits. I used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;htop&lt;/code&gt; for the CPU utilisation.&lt;/p&gt;

&lt;p&gt;My laptop is nearly three years old; it is a &lt;a href=&quot;https://www.pcspecialist.co.uk&quot;&gt;PCSpecialist&lt;/a&gt; Fusion IV, which is really a &lt;a href=&quot;http://www.hk.tongfangpc.com/&quot;&gt;TongFang&lt;/a&gt; PF4MN2F. The CPU is an Intel Core i7-10510U at 1.80GHz and it has 16GB of DDR4 memory.&lt;/p&gt;

&lt;h3 id=&quot;baseline&quot;&gt;Baseline&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;System baseline power is estimated at 766 mW

Power est.    Usage     Device name
564 mW      3.1%        DRAM
157 mW      3.1%        CPU core
44.9 mW      3.1%        CPU misc
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;emulated&quot;&gt;Emulated&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;15% CPU

System baseline power is estimated at 7.98 W

Power est.    Usage     Device name
5.03 W     57.7%        CPU core
1.43 W     57.7%        CPU misc
1.02 W     57.7%        DRAM
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;compiled&quot;&gt;Compiled&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;9% CPU

System baseline power is estimated at 8.17 W

Power est.    Usage     Device name
5.09 W     55.4%        CPU core
1.59 W     55.4%        CPU misc
978 mW     55.4%        DRAM    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;What we see is that most of the power is drawn by the CPU and that there is effectively no difference between the emulated and compiled versions. I measured this a few times and the error margin is about 0.5 W, so within that margin the results are identical. I also measured the power consumption for emulated and compiled versions of other applications in the &lt;a href=&quot;https://git.sr.ht/~rabbits/uxn/tree/main/item/projects/examples/demos&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;demos&lt;/code&gt;&lt;/a&gt; folder. They all consume considerably less power than bunnymark, but there was no significant difference between emulated and compiled versions.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The main takeaway from this experiment in compiling Uxntal to C is that, for most Uxn applications, there is no need to do so: it will have no significant effect on performance or power consumption.&lt;/p&gt;

&lt;p&gt;On the other hand, if you need or want to, you can now compile Uxntal to C, and this can give speed-ups of the order of 10x for compute-heavy applications.&lt;/p&gt;

&lt;h2 id=&quot;code&quot;&gt;Code&lt;/h2&gt;

&lt;p&gt;The compiler code and demos can be found &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/nito&quot;&gt;in my &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nito&lt;/code&gt; repo on Codeberg&lt;/a&gt;. The README has the instructions and a list of the code on which I tested the compiler.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The banner picture shows a row of bronze prayer wheels in a temple in Kyoto at night.&lt;/em&gt;&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>Functional programming in stack-based assembly</title>
        <link href="https://limited.systems/articles/uxntal-quoting/"/>
        <updated>2022-10-09T00:00:00+01:00</updated>
        <id>https://limited.systems/articles/uxntal-quoting</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/uxntal-quoting_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;h2 id=&quot;quoting-in-uxntal-lambda-functions-tuples-and-lists-for-free&quot;&gt;Quoting in Uxntal: lambda functions, tuples and lists for free&lt;/h2&gt;

&lt;p&gt;What does it take to bring functional programming to a stack-based assembly language? &lt;em&gt;tl;dr&lt;/em&gt;: not all that much. Uxntal has everything it takes to build a basic mechanism (“quoting”) that lets us create lambda functions, tuples, cons lists and more.&lt;/p&gt;

&lt;h2 id=&quot;uxntal-and-uxn&quot;&gt;Uxntal and Uxn&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://wiki.xxiivv.com/site/uxntal.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Uxntal&lt;/code&gt;&lt;/a&gt; is the programming language for the &lt;a href=&quot;https://wiki.xxiivv.com/site/uxn.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Uxn&lt;/code&gt;&lt;/a&gt; virtual machine. As Uxn is a stack machine, Uxntal is a stack language, similar to e.g. &lt;a href=&quot;https://forth-standard.org/&quot;&gt;Forth&lt;/a&gt; or &lt;a href=&quot;https://dev.to/palm86/church-encoding-in-the-concatenative-language-joy-3nd8&quot;&gt;Joy&lt;/a&gt; in that it uses reverse Polish notation (postfix). It is an assembly language with opcodes for 8-bit and 16-bit operations on the stack and memory. To get the most out of this article, it is best if you have basic knowledge of Uxntal, either from the above resources or for example &lt;a href=&quot;https://compudanzas.net/uxn_tutorial.html&quot;&gt;the great tutorial at Compudanzas&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Concepts such as lambda functions, quoting, partial application and closures are common in functional programming languages, but if you are not familiar with these you should still be able to follow most of the explanation. My article &lt;a href=&quot;https://limited.systems/articles/decluttering-with-functional-programming/&quot;&gt;“Cleaner code with functional programming”&lt;/a&gt;, explains the basics of functional programming.&lt;/p&gt;

&lt;p&gt;Although Uxn is a stack machine and Uxntal a stack language, it is quite easy to do register-based programming by using labels as variables: the purely stack-based&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100
#06 #07 MUL
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;can be written as&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0000
@r1 $1 @r2 $1 @r3 $1
|0100
#06 .r1 STZ
#07 .r2 STZ
.r1 LDZ .r2 LDZ MUL .r3 STZ
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;where we store the values in memory and load them when needed. My implementation of lambda functions makes use of this approach.&lt;/p&gt;

&lt;p&gt;Uxntal also has a simple but powerful macro mechanism which just creates short names for groups of tokens. I make heavy use of macros in what follows.&lt;/p&gt;
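&lt;p&gt;For example, a macro simply names a sequence of tokens (this particular macro is just an illustration):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;%emit { #18 DEO } ( write a byte to stdout )
...
#2a emit ( expands to #2a #18 DEO, printing the character * )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;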

&lt;p&gt;Because of the conciseness of its syntax, I use the venerable functional language &lt;a href=&quot;https://haskell.org/&quot;&gt;Haskell&lt;/a&gt; for some of the examples below. For a short primer on Haskell, see my article &lt;a href=&quot;https://limited.systems/articles/everything-is-a-function/&quot;&gt;“Everything is a function”&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;anonymous-functions&quot;&gt;Anonymous functions&lt;/h2&gt;

&lt;p&gt;Uxntal supports variables and named function calls through labels. And as the program is stored in writeable memory, it can be overwritten or modified in place.&lt;/p&gt;

&lt;p&gt;I wanted to see if I could implement or emulate the behaviour of anonymous functions (called “lambda functions” in functional programming). For example, I’d like to be able to write something similar to the following Haskell code:&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;n&quot;&gt;map&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lst&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;or equivalent in Python:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which would square all elements in the list &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lst&lt;/code&gt;. And I would like to be able to use lambda functions as arguments and as return values:&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I also want to be able to combine lambdas and named functions:&lt;/p&gt;

&lt;div class=&quot;language-haskell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The reason I want to do this is mostly curiosity, but there are some practical advantages because the lambda functions can be generated dynamically based on runtime values. Also, the “quoting” mechanism used to build lambdas is more general and allows for “lazy” or delayed evaluation.&lt;/p&gt;

&lt;p&gt;From the above examples, the clear feature of a lambda function is that it identifies by name the variables used as its arguments. I want to reflect this closely in Uxntal. The other key feature is that the lambda functions are values, and we need to apply them to an argument to get a computation. That is very similar to calling a function in Uxntal. Suppose we have&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100
#0003 #0004 ;f JSR2
BRK
@f
    .x STZ2
    .y STZ2
    .x LDZ2 .x LDZ2 MUL2 .y LDZ2 .y LDZ2 MUL2 ADD2
    .x LDZ2 .y LDZ2 MUL2 #0002 MUL2 ADD2
    JMP2r
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Using some macros I have defined for convenience, I can write this as&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100
3 4 ;f call
BRK
@f
    -&amp;gt;x
    -&amp;gt;y
    x x * y y * +
    x y * 2 * +
    return
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With a lambda notation, this would become&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100
3 4  [&apos; \x. \y.
        x&apos; x&apos; *&apos; y&apos; y&apos; *&apos; +&apos;
        x&apos; y&apos; *&apos; 2&apos; *&apos; +&apos;
    ]&apos; apply
BRK
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The nested example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(\x -&amp;gt; (\y -&amp;gt; x+y)) 2 3&lt;/code&gt; would be with named functions:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100
3 2 ;f call
BRK
@f
    -&amp;gt;x
    ;g call
    return
@g
    -&amp;gt;y
    x y +
    return
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and with lambdas&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100
3 2 [&apos; \x.
        [&apos; \y.
            x&apos; y&apos; +&apos;
        ]&apos;
    ]&apos; apply
    apply
BRK
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In other words, the function is defined inside the quoted brackets and called using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apply&lt;/code&gt; rather than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;call&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;implementation&quot;&gt;Implementation&lt;/h2&gt;

&lt;p&gt;So how do we do this? There are several components that need to be brought together to have named variables, nesting, and functions as values.&lt;/p&gt;

&lt;h3 id=&quot;uxntal-quoting-and-unquoting&quot;&gt;Uxntal quoting and unquoting&lt;/h3&gt;

&lt;h4 id=&quot;quoting&quot;&gt;Quoting&lt;/h4&gt;

&lt;p&gt;First of all, we need some mechanism to defer evaluation of an operation, which I call “quoting” for short.
Luckily, Uxntal has the essential feature: it is possible to quote an operation and unquote it later. For example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#06 #07 MUL
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;would immediately compute 6*7; but if we write&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#06 #07 LIT MUL
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;then we have the MUL operation as a symbol on the stack. This is what I call “quoting” for opcodes.&lt;/p&gt;

&lt;h4 id=&quot;unquoting-through-self-modification&quot;&gt;Unquoting through self-modification&lt;/h4&gt;

&lt;p&gt;To unquote the symbol and so evaluate the expression, we can do&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#00 STR $1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which is a bit of Uxn magic: it is a relative store with a relative address of 0, and effectively it takes the symbol from the stack and puts it as the next instruction to be executed. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$1&lt;/code&gt; is just a placeholder on the stack to create the space for the store. So the following would calculate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;6*7&lt;/code&gt; and print out &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100
    #06 #07 LIT MUL
    #00 STR $1
    #18 DEO ( prints the character to stdout )
BRK
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There isn’t really anything magical going on here: an equivalent program would be&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100
    #06 #07 LIT MUL
    ;eval STA @eval $1
    #18 DEO ( prints the character to stdout )
BRK
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;or even&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;|0100
    #06 #07 LIT MUL
    ;eval STA ;eval JSR2
    #18 DEO ( prints the character to stdout )
BRK

@eval $1
JMP2r
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The key mechanism is that Uxntal allows overwriting the program code, so the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$1&lt;/code&gt; placeholder can at runtime be replaced by any byte, and all bytes are valid instructions.&lt;/p&gt;

&lt;h4 id=&quot;unquoting-without-self-modification&quot;&gt;Unquoting without self-modification&lt;/h4&gt;

&lt;p&gt;Even if Uxntal did not have modifiable code, we could still quote and unquote. After all, e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIT MUL&lt;/code&gt; is exactly the same as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIT 1a&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#1a&lt;/code&gt;, so we can always put opcodes on the stack; they are just bytes. And to unquote them, we could use conditional jumps, for example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#06 #07 LIT MUL
( opcode on the stack )
#1a EQU ,&amp;amp;eval-mul JCN
...
&amp;amp;eval-mul
MUL
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So the self-modification is purely a more efficient way of unquoting.&lt;/p&gt;

&lt;h3 id=&quot;kinds-of-symbols&quot;&gt;Kinds of symbols&lt;/h3&gt;

&lt;p&gt;Apart from the opcodes, there are several other types of symbols we need to be able to quote and unquote: variables, constants and function calls.&lt;/p&gt;

&lt;p&gt;For the variables, we need to handle declaration and use: the declaration results in the argument being stored at the location referenced by the variable, and the use results in the value stored at the referenced location being read. I store the variables in the zero-page memory:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;@x $2 @y $2 @z $2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The declaration macro &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;\x.&lt;/code&gt; should, when unquoted, result in&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;.x STZ2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The variable-use macro &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&apos;&lt;/code&gt; should, when unquoted, result in&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;.x LDZ2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Unquoting a constant simply means putting it on the stack.&lt;/p&gt;

&lt;p&gt;Finally, a named function call should, when unquoted, result in&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;;f JSR2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;grouping-symbols&quot;&gt;Grouping symbols&lt;/h3&gt;

&lt;p&gt;As we want to be able to nest lambdas, we need some delimiters to group the quoted symbols. That is what the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[&apos;&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;]&apos;&lt;/code&gt; bracket macros do. A quoted sequence pushes its address onto the working stack. In this way we can pass lambdas around as values.&lt;/p&gt;

&lt;h3 id=&quot;building-the-lambda&quot;&gt;Building the lambda&lt;/h3&gt;

&lt;p&gt;To build the lambda function, I need to store the quoted symbols. Crucially, I need to be able to identify the kind of each symbol. I encode each symbol using three bytes; the third byte is a label identifying the kind of symbol. The opening and closing brackets are also labelled and stored in this way. The opening bracket symbol contains the size of the lambda; the closing bracket is only there as a jump target. For example,&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[&apos; \x. x&apos; 1&apos; +&apos; ;f call&apos; ]&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;is encoded as&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Value   Label
------  ------
00 07   LAMBDA
.x __   BIND
.x __   ARG
00 01   CONST
ADD __  OPCODE
;f      CALL
__ __   END
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__&lt;/code&gt; are unused slots because for simplicity I have currently made all symbols the same size (this will probably change as I don’t like inefficiency). For an 8-bit version, it would be possible to encode everything in two bytes.&lt;/p&gt;

&lt;p&gt;Each quoting operation is implemented as a function and those functions keep track of where to write the symbols in memory. The memory for the lambdas starts from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;|0100&lt;/code&gt;, so I effectively overwrite the program. This is OK because the lambda definition in the program takes up more space than the encoding of the lambda, so there is no risk of overwriting named functions.&lt;/p&gt;

&lt;h3 id=&quot;the-lambda-stack&quot;&gt;The lambda stack&lt;/h3&gt;

&lt;p&gt;Because I want to be able to nest lambdas, I need a stack. I could abuse the return stack for this purpose, but I don’t think using either of the Uxn stacks for persistent state is a good idea. So I build a stack in the second half of the zero-page memory. This stack stores tuples of the starting address and the size (in 3-byte words) of each lambda. Each quoting operation manipulates that stack to create the memory encoding for each lambda.&lt;/p&gt;
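
&lt;p&gt;As an illustration, the zero page could be laid out roughly as follows (the names and sizes here are hypothetical, not the actual layout in my code):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;@x $2 @y $2 @z $2 ( variables in the lower half, from |0000 )
|0080
@lstack-top $1    ( number of entries on the lambda stack )
@lstack $7e       ( entries: 16-bit start address plus size )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;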

&lt;h3 id=&quot;applying-the-lambda&quot;&gt;Applying the lambda&lt;/h3&gt;

&lt;p&gt;The unquoting operation (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apply&lt;/code&gt; call) uses the same lambda stack. It takes the address of the lambda as argument, and loops over all symbols in the stored representation. The interesting case is that of nested lambdas: when the symbol represents an opening bracket, the evaluator puts the address of the nested lambda on the stack and jumps to the closing bracket, which acts as a no-op. For non-nested lambdas, the closing bracket returns the lambda’s address. In this way I can evaluate nested lambdas as part of an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apply&lt;/code&gt; call, and I can also return lambdas.&lt;/p&gt;
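
&lt;p&gt;In pseudocode, the evaluator loop behaves roughly like this (a sketch of the logic, not the actual implementation):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;for each 3-byte symbol of the lambda:
  BIND   : pop a value from the stack, store it at the variable address
  ARG    : load the value at the variable address, push it
  CONST  : push the constant
  OPCODE : unquote the opcode (write it into the placeholder and execute)
  CALL   : call the named function
  LAMBDA : push the address of the nested lambda, jump to its END
  END    : no-op when nested; for the outermost lambda, evaluation ends
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;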

&lt;h2 id=&quot;more-uses-of-quoting&quot;&gt;More uses of quoting&lt;/h2&gt;

&lt;p&gt;The quoting mechanism can be used for purposes other than creating lambda functions. Or to look at it another way, lambda functions that take no arguments (“blocks”) are valid:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[&apos; ;f call&apos; ]&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This gives us a way to defer calls.&lt;/p&gt;

&lt;h3 id=&quot;a-lazy-if&quot;&gt;A lazy &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;Lazy here means that we evaluate the true or false branch only after evaluating the condition, instead of evaluating both and selecting the result based on the condition. We can create a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lazy-if&lt;/code&gt; function using quoting as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[&apos; &amp;lt;false-expr&amp;gt;&apos; ]&apos; [&apos; &amp;lt;true-expr&amp;gt;&apos; ]&apos; &amp;lt;cond&amp;gt;  ;lazy-if call
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lazy-if&lt;/code&gt; function is quite simple:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;@lazy-if
    ,&amp;amp;if-true JCN
    ( if-false )
    POP2 apply-tc
    &amp;amp;if-true
    NIP2 apply-tc
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apply-tc&lt;/code&gt; is a tail call version of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apply&lt;/code&gt;, so equivalent to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apply return&lt;/code&gt;. We could of course create a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;case&lt;/code&gt; expression in this way too.&lt;/p&gt;
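
&lt;p&gt;One plausible way to define such a tail call (a sketch; the actual definition in my code may differ) is to jump to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apply&lt;/code&gt; instead of calling it, so that it returns directly to the caller of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lazy-if&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;%apply-tc { ;apply JMP2 } ( jump rather than call: apply returns to our caller )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;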

&lt;h3 id=&quot;tuples&quot;&gt;Tuples&lt;/h3&gt;

&lt;p&gt;The quoting mechanism can also be used to create immutable lists, or rather tuples (a tuple is a generalisation of a pair: it can have any number of values of any type, but it can’t be modified):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[&apos; 2 4 + \&apos; 3 5 + \&apos; ]&apos; ( a tuple (6,8) )
[&apos; 1 2 + \&apos; 4 \&apos; ]&apos;  ( a tuple (3,4) )
SWP2
apply * apply * / ( 4 )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[&apos; ... ]&apos;&lt;/code&gt; first creates two tuples and stores them somewhere;
then &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apply&lt;/code&gt; puts the values on the stack. It would be quite easy to write an indexing function to access an element by index.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[&apos; 2 4 + \&apos; 3 5 + \&apos; ]&apos;  ( creates (6,8) )
[&apos; 1 2 + \&apos; 4 2 /  2&apos; ]&apos;  ( creates (3,2,2) )
SWP2
apply * ( 6*8 )
SWP2
apply [ * * ]  ( 3*2*2 )
/ ( 48/12 )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Another example, to illustrate the functions for working on tuples:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[&apos; [ 1 2 + ] \&apos; [ 3 4 * ] \&apos; 5&apos; ]&apos; -&amp;gt;l ( store the tuple in l )
l fst  print16-nl ( first element )
l snd print16-nl ( second element )
l 2 at print16-nl ( at takes the index (base 0) and returns the element )
[  + + ] POP2
l empty print8-nl ( test if the tuple is empty, returns #00 here )
[&apos; ]&apos; empty print8-nl ( test if the tuple is empty, returns #01 here )
l size print16-nl ( returns the number of elements in the tuple, i.e. 3 )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Because Uxntal is not statically typed, there is no difference between immutable lists and tuples.&lt;/p&gt;

&lt;h3 id=&quot;cons-lists&quot;&gt;Cons lists&lt;/h3&gt;

&lt;p&gt;What I call a “cons list” is a list constructed by starting from an empty list and prepending one element at a time. The function that prepends an element to such a list is typically called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cons&lt;/code&gt; in functional programming languages, and in Haskell it has a corresponding operator &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;:&lt;/code&gt;. So the list&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[ 1 2 3 4 ]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;is really syntactic sugar for&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;1:2:3:4:()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which is a shorter notation for&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(cons 1 (cons 2 (cons 3 (cons 4 ()))))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can use the tuples in combination with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cons&lt;/code&gt; function in Uxntal to create cons lists:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[&apos; ]&apos; 7 cons 6 cons 5 cons ( [ 5 6 7 ] )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are a few functions to manipulate such lists; the most common ones are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;head&lt;/code&gt;, which returns the first element, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tail&lt;/code&gt;, which returns the rest of the list (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;car&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cdr&lt;/code&gt; in Scheme).&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[&apos; ]&apos; 7 cons 6 cons 5 cons -&amp;gt;l ( (5:6:(7:[])) )
l tail tail head ( 7 )
l tail head ( 6 )
* ( 42 )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is of course also a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;length&lt;/code&gt; function, and a function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt; to check if the list is empty:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[&apos; ]&apos; 7 cons 6 cons 5 cons -&amp;gt;l ( (5:6:(7:[])) )
l 4 cons length ( 4 )
l tail tail tail null  ( #01 )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;partial-application-and-closures&quot;&gt;Partial application and closures&lt;/h3&gt;

&lt;p&gt;The nested lambdas allow partial application:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;5 [&apos; \x.  [&apos; \y. x&apos; y&apos; +&apos; ]&apos;  ]&apos; lambda-call
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This means that we don’t have to provide values for all arguments; what we return is a function that has been specialised with the arguments that have been provided. In the example, we effectively obtain a function that calculates &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y+5&lt;/code&gt;. This technique can be used to generate specialised functions from a template.&lt;/p&gt;

&lt;p&gt;This looks a lot like a proper &lt;a href=&quot;https://en.wikipedia.org/wiki/Closure_(computer_programming)&quot;&gt;closure&lt;/a&gt;, but while it seems to work, the value is not really captured. We simply store 5 in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@x&lt;/code&gt;; if I modify &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; between the lambda calls, the lambda will use the modified value. With a proper closure, once it has been created, it does not matter that the original value gets modified. That is not the case in my approach, because &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; are globals, not locals. It is possible to address this, but it would be very expensive:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;In &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;\x.&lt;/code&gt;, we store the address of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.x&lt;/code&gt; somewhere;&lt;/li&gt;
  &lt;li&gt;Then we check the entire downstream lambda definition for occurrences of that address;&lt;/li&gt;
  &lt;li&gt;We need to take into account that any further occurrence of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;\x.&lt;/code&gt; resets this;&lt;/li&gt;
  &lt;li&gt;Then we could replace &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.x LIT LDZ2 OPCODE&apos;&lt;/code&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.x LDZ2 CONST&apos;&lt;/code&gt;, so the value would become embedded in the lambda.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to address this issue, please let me know.&lt;/p&gt;

&lt;h3 id=&quot;desugaring&quot;&gt;Desugaring&lt;/h3&gt;

&lt;p&gt;Consider the following example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;5 \[ \x. x&apos; x&apos; *&apos; \[ \y. y&apos; 1&apos; +&apos; \] lambda-call&apos; \] lambda-call ( returns #001a )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;\[&lt;/code&gt; … &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;\]&lt;/code&gt; brackets indicate the start and end of a quoting region. Within a quoting region, all quote symbols make up the anonymous function. I use macros to make it a bit nicer. Desugaring the example one layer, we get the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#0005
[&apos;
    .x bind&apos;
    .x arg&apos;
    .x arg&apos;
    LIT MUL2 opcode&apos;
    [&apos;
        .y bind&apos;
        .y arg&apos;
        #0001 const&apos;
        LIT ADD2 opcode&apos;
    ]&apos;
    ;lambda-call call&apos;
]&apos;
;lambda-call call
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Although it may seem at first sight that a stack-based assembly language is quite far removed from a high-level functional language, in Uxntal we can implement many fundamental functional programming concepts, such as lambdas, lazy conditionals, tuples and lists, as well as partial application and closures, simply by introducing the concept of quoting and unquoting symbols. Uxntal’s simple macro mechanism provides sufficient abstraction to create readable functional programs.&lt;/p&gt;

&lt;h2 id=&quot;code&quot;&gt;Code&lt;/h2&gt;

&lt;p&gt;The code implementing the constructs described in this article is available &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/hyakuwa/src/branch/main/quoting-lambdas&quot;&gt;in my &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hyakuwa&lt;/code&gt; repo on Codeberg&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;8-bit-vs-16-bit&quot;&gt;8-bit vs 16-bit&lt;/h3&gt;

&lt;p&gt;By default, my implementation uses 16-bit words as values. It is possible to use 8-bit constants, arguments and operations. The macro files &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;quote-lambda_macros_8bit.tal&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lambda_decls_8bit.tal&lt;/code&gt; have the appropriate definitions; or, with less sugar, you can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bind8&apos;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;arg8&apos;&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;const8&apos;&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h2&gt;

&lt;p&gt;As is the nature of such projects, there is always a lot more that could be done. There are two main drawbacks to the current approach: the macro mechanism is not expressive enough and the computational overhead is very high.&lt;/p&gt;

&lt;p&gt;To address the former, we could create a custom assembler, which effectively means we have a new functional language that assembles into Uxntal, either to source code or to a rom. If we did that, we could write&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;5 \[ \x. x&apos; x&apos; *&apos; \[ \y. y&apos; 1&apos; +&apos; \] lambda-call&apos; \] lambda-call
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;as, for example,&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;5 (\x -&amp;gt; x x * (\y -&amp;gt; y 1 +))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This would be a lot more readable, and it would also allow us to tailor the memory allocation for variables.&lt;/p&gt;

&lt;p&gt;We can’t fundamentally address the computational overhead. It can definitely be reduced, as the current implementation is not optimised. But effectively, the quoting mechanism is a kind of interpreter, so it always incurs the read-eval-write overhead. What we could do instead is compile the Uxntal code itself, rather than emulating it. But that will be the topic of another article.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The banner picture shows a row of small rabbit statues at a Shinto shrine in Kyoto.&lt;/em&gt;&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>The politics of writing compilers</title>
        <link href="https://limited.systems/articles/politics-of-compiler-writing/"/>
        <updated>2021-12-20T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/politics-of-compiler-writing</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/politics-of-compiler-writing_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;Compilers are pieces of software that convert program code from one format into another. Typically, they convert source code into a binary format for execution on specific hardware. Compilers can also target virtual machines instead of physical hardware, or they can convert source code into different source code.&lt;/p&gt;

&lt;p&gt;(Some programming languages are not compiled but interpreted, which means they require another piece of software, an interpreter, to run them. An interpreter is effectively a compiler and virtual machine combined, because it transforms the source code into some internal representation that it can execute. In this article, I focus on compilers but the same issues apply to interpreters.)&lt;/p&gt;

&lt;p&gt;Most end users never deal with compilers because they simply run the compiled applications. Some users have the need, know-how and skills to compile code written by others; fewer again have the need, know-how and skills to write their own code and compile it. And very few people have the know-how and skills to write a compiler. And yet, compilers are crucially important, as without a compiler, programs can’t run.&lt;/p&gt;

&lt;p&gt;This brings me to the question of politics. Whoever controls the compiler has some power over the users (both the programmers and the end users), and therefore compilers are political objects.&lt;/p&gt;

&lt;p&gt;But what about compiler research, the field of computing science which investigates new theories, formalisms and techniques to advance the knowledge on compilers? Generally speaking, compiler researchers will not consider their work political. A compiler is a tool that can be used regardless of political convictions, and research into better compilers just leads to better tools. Compiler writers can make a similar argument: we just make them.&lt;/p&gt;

&lt;p&gt;There are many different aspects to this. As a compiler researcher or compiler writer, you could ask yourself the following questions:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What are the reasons for doing this work?
    &lt;ul&gt;
      &lt;li&gt;Why are you writing this compiler? Or what is the target compiler for your research?&lt;/li&gt;
      &lt;li&gt;Who is going to benefit from your work? You? Your employer? The community? What community?&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;What are your assumptions?
    &lt;ul&gt;
      &lt;li&gt;Assumptions on the programmer:
        &lt;ul&gt;
          &lt;li&gt;What do they need to know?&lt;/li&gt;
          &lt;li&gt;What education level do they need?&lt;/li&gt;
          &lt;li&gt;How wealthy do they need to be?&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;Assumptions on the user:
        &lt;ul&gt;
          &lt;li&gt;Who is the user of your compiler?&lt;/li&gt;
          &lt;li&gt;What are their required skills and background?&lt;/li&gt;
          &lt;li&gt;How usable and accessible is your compiler?&lt;/li&gt;
          &lt;li&gt;Which users can afford to use your compiler?&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;Assumptions on the computer:
        &lt;ul&gt;
          &lt;li&gt;If your compiler targets a specific hardware architecture, who has access to this hardware?&lt;/li&gt;
          &lt;li&gt;Does your compiler need the latest hardware and/or latest operating system to run?&lt;/li&gt;
          &lt;li&gt;How much memory and disk space does it need?&lt;/li&gt;
          &lt;li&gt;Does it need internet access?&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;Assumptions on the availability:
        &lt;ul&gt;
          &lt;li&gt;Is it available free of charge?&lt;/li&gt;
          &lt;li&gt;Does it work on many operating systems?&lt;/li&gt;
          &lt;li&gt;Is it available in a readily-useable form?&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The answers to these questions are inherently political.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The banner picture shows activists and a candidate of the Communist Party of Japan.&lt;/em&gt;&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>Frugal computing: developer perspective</title>
        <link href="https://limited.systems/articles/frugal-computing-developer/"/>
        <updated>2021-12-20T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/frugal-computing-developer</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/frugal-computing-developer_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;On the need for low-carbon and sustainable computing and what developers can do about it.&lt;/p&gt;

&lt;p&gt;This is a follow-up on my &lt;a href=&quot;https://limited.systems/articles/frugal-computing&quot;&gt;article about Frugal Computing&lt;/a&gt;, focusing on what developers can do to help reduce the carbon emissions from computing.&lt;/p&gt;

&lt;h2 id=&quot;key-points&quot;&gt;Key points&lt;/h2&gt;

&lt;h3 id=&quot;the-problem&quot;&gt;The problem:&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;The current emissions from computing are about 2% of the world total but are projected to rise steeply over the next two decades. By 2040 emissions from computing alone will be more than half the emissions level acceptable to keep global warming below 1.5°C. This growth in computing emissions is unsustainable: it would make it virtually impossible to keep global warming below that limit.&lt;/li&gt;
  &lt;li&gt;The emissions from production of computing devices far exceed the emissions from their electricity usage, so even if devices are more energy efficient, producing more of them will make the emissions problem worse.&lt;/li&gt;
  &lt;li&gt;The CO₂ emissions from the internet infrastructure resulting from individual internet usage are also very large and growing steeply because of the increased use of higher-resolution video and VR/AR.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;the-solution&quot;&gt;The solution:&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;As a society we need to start treating computational resources as finite and precious, to be utilised only when necessary, and as effectively as possible. We need &lt;em&gt;frugal computing&lt;/em&gt;: achieving the same results for less energy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;developer-actions&quot;&gt;Developer actions:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Make software that works on older devices, the older the better.&lt;/li&gt;
  &lt;li&gt;Make software that will keep on working for a very long time.&lt;/li&gt;
  &lt;li&gt;Make software that uses the least amount of total energy to achieve its results.&lt;/li&gt;
  &lt;li&gt;Make software that also uses the least amount of network data transfer, memory and storage.&lt;/li&gt;
  &lt;li&gt;Make software that encourages the user to use it in a frugal way.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;extending-the-useful-life-of-computing-devices-is-key&quot;&gt;Extending the useful life of computing devices is key&lt;/h2&gt;

&lt;p&gt;End-user computing devices (phones, laptops, desktops) &lt;a href=&quot;https://reboxed.co/blogs/outsidethebox/the-carbon-footprint-of-your-phone-and-how-you-can-reduce-it&quot;&gt;create more emissions during their manufacturing than during their useful life, and this is not likely to change significantly in the next two decades&lt;/a&gt;. Therefore, we must extend the useful life of our computing devices. This is the top priority.&lt;/p&gt;

&lt;h3 id=&quot;make-software-that-works-on-older-devices-the-older-the-better&quot;&gt;Make software that works on older devices, the older the better&lt;/h3&gt;

&lt;p&gt;One of the main reasons why users upgrade their devices is that the device is no longer capable of supporting the needs of new software. This can be because the new software requires&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;more resources than the device has (memory, CPU speed, network bandwidth, screen resolution);&lt;/li&gt;
  &lt;li&gt;a more recent version of other software than the device can support, including the operating system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why when developing software, you should make it work on older devices by design. That way, users with older devices can use your software without having to upgrade. That also means your software should use the least amount of resources (CPU, memory, storage etc) possible to achieve its results, as older devices have fewer resources.&lt;/p&gt;

&lt;h3 id=&quot;make-software-that-will-keep-on-working-for-a-very-long-time&quot;&gt;Make software that will keep on working for a very long time&lt;/h3&gt;

&lt;p&gt;It is also important that the software you write will keep on working for as long as possible, ideally forever. One reason why your software might stop working is that its resource utilisation grows over time. This can for example be the case if it needs increasingly more memory or disk space the longer it gets used. Another reason is that bugs and vulnerabilities that are discovered only after a long time might not get fixed.&lt;/p&gt;

&lt;p&gt;The software needs to be supported for as long as the device lasts. So frugal software requires a long-term commitment in terms of updates for security and bugfixing.&lt;/p&gt;

&lt;h2 id=&quot;being-frugal-with-resources&quot;&gt;Being frugal with resources&lt;/h2&gt;

&lt;p&gt;Whereas for mobile phones the emissions from usage are much lower than the emissions from manufacturing, for laptops and desktop computers, emissions from usage are still significant.&lt;/p&gt;

&lt;h3 id=&quot;make-software-that-uses-the-least-amount-of-total-energy-to-achieve-its-results&quot;&gt;Make software that uses the least amount of total energy to achieve its results&lt;/h3&gt;

&lt;p&gt;Not only do older devices have fewer resources; resource consumption also eventually means emissions, because all resources on a device consume energy. In practice, a large source of emissions resulting from end user device activity is the local WiFi, because transferring the data (e.g. video) consumes a lot of energy. However, on laptops, desktops and servers, CPU and GPU power consumption is also a significant factor.&lt;/p&gt;

&lt;p&gt;The consequence is that as a developer, you need to be aware of all factors that contribute to the total energy consumption of a task performed by your software. For apps and web sites, the dominant sources of emissions are &lt;a href=&quot;https://www.carbontrust.com/our-work-and-impact/guides-reports-and-tools/carbon-impact-of-video-streaming&quot;&gt;in the home&lt;/a&gt;. For &lt;a href=&quot;https://www.researchgate.net/publication/336909520_Toward_Greener_Gaming_Estimating_National_Energy_Use_and_Energy_Efficiency_Potential&quot;&gt;non-networked games, the power consumption of the CPU and GPU&lt;/a&gt; is the main source of emissions.&lt;/p&gt;

&lt;h3 id=&quot;make-software-that-encourages-the-user-to-be-frugal&quot;&gt;Make software that encourages the user to be frugal&lt;/h3&gt;

&lt;p&gt;For some applications, the behaviour of the user can have a major effect on the resources it uses. If you are developing such an application, consider if you can encourage or nudge the user to use fewer resources.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Web browsers consume resources depending on the number of sites the user accesses concurrently, as well as on the design of those sites;&lt;/li&gt;
  &lt;li&gt;for video-based applications, energy consumption depends on the resolution of the video;&lt;/li&gt;
  &lt;li&gt;if the user experiences the app as sluggish or erratic, they might be more inclined to upgrade their device.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[Note: post edited on 2022-12-07 because the original post assumed that internet network emissions are proportional to the traffic volume, and more recent research shows this is not the case.]&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>Generic datastructure traversals with roles and introspection</title>
        <link href="https://limited.systems/articles/generic-traversals-in-raku/"/>
        <updated>2021-12-13T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/generic-traversals-in-raku</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/generic-traversals-in-raku_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;I am a &lt;a href=&quot;https://andrewshitov.com/2015/05/05/interview-with-audrey-tang/&quot;&gt;lambdacamel&lt;/a&gt; and therefore I like to adapt concepts and techniques from functional programming, and in particular from the &lt;a href=&quot;https://www.haskell.org/&quot;&gt;Haskell&lt;/a&gt; language, to Raku. One of the techniques that I use a lot is &lt;em&gt;generic traversals&lt;/em&gt;, also known as “Scrap Your Boilerplate” after the title of &lt;a href=&quot;https://archive.alvb.in/msc/02_infogp/papers/SYB1.pdf&quot;&gt;the paper by Simon Peyton Jones and Ralf Lämmel&lt;/a&gt; that introduced this approach. In their words:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Many programs traverse data structures built from rich mutually-recursive data types. Such programs often have a great deal of “boilerplate” code that simply walks the structure, hiding a small amount of “real” code that constitutes the reason for the traversal. “Generic programming” is the umbrella term to describe a wide variety of programming technology directed at this problem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To save you from having to write your own custom traversal, this approach gives you generic functions that perform traversals on arbitrary data structures. In this article, I will explain how you can easily implement such generics in Raku for arbitrary role-based datastructures. There is no Haskell in this article.&lt;/p&gt;

&lt;h2 id=&quot;roles-as-datatypes-by-example&quot;&gt;Roles as datatypes by example&lt;/h2&gt;

&lt;p&gt;I implemented these generics for use with role-based datatypes. Raku’s &lt;a href=&quot;https://docs.raku.org/language/objects#index-entry-Parameterized_Roles&quot;&gt;parameterised roles&lt;/a&gt; make creating complex datastructures very easy. I use the roles purely as datatypes, so they have no associated methods.&lt;/p&gt;

&lt;p&gt;Here is an example code snippet in a little language that I use in my research.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;map (f1 . f2) (map g (zipt (v1,map h v2)))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The primitives are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.&lt;/code&gt; (function composition), &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;zipt&lt;/code&gt; and the tuple &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(...)&lt;/code&gt;, and the names of functions and vectors. The datatype for the abstract syntax of this little language is called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Expr&lt;/code&gt; and looks as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;# Any expression in the language
role Expr {}
# map f v
role MapV[Expr \f_,Expr \v_] does Expr {
    has Expr $.f = f_;
    has Expr $.v = v_;
}
# function composition f . g
role Comp[Expr \f_, Expr \g_] does Expr {
    has Expr $.f = f_;
    has Expr $.g = g_;
}
# zipt t turns a tuple of vectors into a vector of tuples
role ZipT[Expr \t_] does Expr {
    has Expr $.t = t_
}
# tuples are just arrays of Expr
role Tuple[Array[Expr] \e_] does Expr {
    has Array[Expr] $.e = e_
}
# names of functions and vectors are just string constants
role Name[Str \n_] does Expr {
    has Str $.n = n_
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Expr&lt;/code&gt; role is the toplevel datatype. It is empty because it is implemented entirely in terms of the other roles, which thanks to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;does&lt;/code&gt; are all of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Expr&lt;/code&gt;. And most of the roles have attributes that are also of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Expr&lt;/code&gt;. So we have a recursive datatype, a tree with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Name&lt;/code&gt; node as leaves.&lt;/p&gt;

&lt;p&gt;We can now write the abstract syntax tree (AST) of the example code using this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Expr&lt;/code&gt; datatype:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;my \ast = MapV[ 
    Comp[
        Name[&apos;f1&apos;].new,
        Name[&apos;f2&apos;].new
    ].new,
    MapV[
        Name[&apos;g&apos;].new,
        ZipT[
            Tuple[
                Array[Expr].new(
                    Name[&apos;v1&apos;].new,
                    MapV[
                        Name[&apos;h&apos;].new,
                        Name[&apos;v2&apos;].new
                    ].new
                )
            ].new
        ].new
    ].new
].new;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The typical way to work with such a datastructure is using a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;given/when&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;sub worker(Expr \expr) {
    given expr {
        when MapV {...}
        when Comp {...}
        when ZipT {...}
        ...        
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Alternatively, you can use a multi sub:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;multi sub worker(MapV \expr) {...}
multi sub worker(Comp \expr) {...}
multi sub worker(ZipT \expr) {...}
...        
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In both cases, we use the roles as the types to match against for the actions we want to take.&lt;/p&gt;

&lt;p&gt;(For more details about algebraic datatypes see my earlier article &lt;a href=&quot;https://wimvanderbauwhede.github.io/articles/roles-as-adts-in-raku/&quot;&gt;Roles as Algebraic Data Types in Raku&lt;/a&gt;.)&lt;/p&gt;

&lt;h2 id=&quot;generics&quot;&gt;Generics&lt;/h2&gt;

&lt;p&gt;If I want to traverse the AST above, what I would normally do is write a worker as above, where for every node except the leaf nodes, I would call the worker recursively, for example:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;sub worker(Expr \expr) {
    given expr {
        when MapV {
            my \f_ = worker(expr.f);
            my \v_ = worker(expr.v);
            ...
        }
        ...        
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But wouldn’t it be nice if I did not have to write that code at all? Enter generics.&lt;/p&gt;

&lt;p&gt;I base my naming and function arguments on that of the &lt;a href=&quot;https://hackage.haskell.org/package/syb-0.7.2.1/docs/Data-Generics.html&quot;&gt;Haskell library &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Data.Generics&lt;/code&gt;&lt;/a&gt;. It provides many schemes for traversals, but the most important ones are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everything&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everywhere&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everything&lt;/code&gt; is a function which takes a datastructure, a matching function, an accumulator and an update function for the accumulator. The matching function defines what you are looking for in the datastructure. The result is put into the accumulator using the update function.&lt;/p&gt;

    &lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;  sub everything(
      Any \datastructure, 
      Any \accumulator, 
      &amp;amp;joiner, 
      &amp;amp;matcher 
      --&amp;gt; Any){...}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everywhere&lt;/code&gt; is a function which takes a datastructure and a modifier function. The modifier function defines which parts of the datastructure you want to modify. The result of the traversal is a modified version of the datastructure.&lt;/p&gt;

    &lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;  sub everywhere(
      Any \datastructure, 
      &amp;amp;modifier 
      --&amp;gt; Any){...}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most common case for the accumulator is to use a list, so the update function appends lists to the accumulator:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;sub append(\acc, \res) {
    return (|acc, |res);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As an example of a matching function, let’s find all the function and vector names in our AST above:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;sub matcher(\expr) {
    given expr {
        when Name {
            return [expr.n]
        } 
    }
    return []
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So if we find a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Name&lt;/code&gt; node, we return its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; attribute as a single-element list; otherwise we return an empty list.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;my \names = everything(ast,[],&amp;amp;append,&amp;amp;matcher); 
# =&amp;gt; returns (f1 f2 g h v1 v2)
&lt;/code&gt;&lt;/pre&gt;
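
&lt;p&gt;The accumulator does not have to be a list. As a small variation (not part of the original example), we could count the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Name&lt;/code&gt; nodes by using an integer accumulator with addition as the update function:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;sub add(\acc, \res) { acc + res }
sub counter(\expr) { expr ~~ Name ?? 1 !! 0 }
my \count = everything(ast, 0, &amp;amp;add, &amp;amp;counter);
# count is the number of Name nodes: 6 for the example AST
&lt;/code&gt;&lt;/pre&gt;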

&lt;p&gt;Or let’s say we want to change the names in this AST:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;sub modifier(\t) {
    given t {
        when Name {
            Name[t.n~&apos;_updated&apos;].new 
        }
        default {t}
    }
}

my \ast_ = everywhere(ast,&amp;amp;modifier); 
# =&amp;gt; returns the AST with all names appended with &quot;_updated&quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&quot;implementing-generics&quot;&gt;Implementing Generics&lt;/h2&gt;

&lt;p&gt;So how do we implement these magic &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everything&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everywhere&lt;/code&gt; functions? The problem to solve is that we want to iterate through the attributes of every role without having to name them. The solution for this is to use Raku’s &lt;a href=&quot;https://docs.raku.org/language/mop&quot;&gt;Metaobject protocol (MOP)&lt;/a&gt; for introspection. In practice, we use the Rakudo-specific &lt;a href=&quot;https://docs.raku.org/type/Metamodel::ClassHOW&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Metamodel&lt;/code&gt;&lt;/a&gt;. We need only three methods: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;attributes&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get_value&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;set_value&lt;/code&gt;. With these, we can iterate through the attributes and visit them recursively.&lt;/p&gt;
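
&lt;p&gt;To illustrate these methods, here is a minimal sketch using a toy &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Point&lt;/code&gt; class (not part of the AST):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;class Point { has $.x = 1; has $.y = 2 }
my $p = Point.new;
for Point.^attributes -&amp;gt; \attr {
    # attr.name is e.g. &apos;$!x&apos;
    say attr.name, &apos; = &apos;, attr.get_value($p); # $!x = 1, $!y = 2
    attr.set_value($p, 42); # writes straight into the object
}
&lt;/code&gt;&lt;/pre&gt;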

&lt;p&gt;Attributes can be &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;%&lt;/code&gt; (and even &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;amp;&lt;/code&gt; but I will skip this). What this means in terms of Raku’s type system is that they can be scalar, Iterable or Associative, and we need to distinguish these cases. With that, we can write &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everything&lt;/code&gt; as follows:&lt;/p&gt;
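
&lt;p&gt;These type tests are ordinary smartmatches against the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Associative&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Iterable&lt;/code&gt; roles. For example:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;my @a = 1,2,3;
my %h = a =&amp;gt; 1;
say @a ~~ Iterable;      # True
say %h ~~ Associative;   # True
say &apos;hello&apos; ~~ Iterable; # False: a Str is a scalar value
&lt;/code&gt;&lt;/pre&gt;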

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;sub everything (\t, \acc,&amp;amp;update,&amp;amp;match) {
    # Arguments are immutable, so copy acc to $acc_
    my $acc_ = acc;
    # Match and update $acc_
    $acc_ = update($acc_,match(t));
    # Test the type of the node
    if t ~~ Associative {
        # Iterate over the values
        for t.values -&amp;gt; \t_elt  {
            $acc_ = everything(t_elt,$acc_,&amp;amp;update,&amp;amp;match)
        }
        return $acc_; 
    }     
    elsif t ~~ Iterable {
        # Iterate
        for |t -&amp;gt; \t_elt  {
            $acc_ = everything(t_elt,$acc_,&amp;amp;update,&amp;amp;match)
        }
        return $acc_; 
    }

    else { 
        # Go through all attributes
        for t.^attributes -&amp;gt; \attr {
            # Not everything returned by ^attributes
            # is of type Attribute
            if attr ~~ Attribute {
                # Get the attribute value
                my \expr = attr.get_value(t);
                if not expr ~~ Any  { # for ContainerDescriptor::Untyped
                    return $acc_;
                }
                # Descend into this expression
                $acc_ = everything(expr,$acc_,&amp;amp;update, &amp;amp;match);
            }
        }
    }
    return $acc_
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So what we do here essentially is:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;%&lt;/code&gt;, we iterate through the values and call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everything&lt;/code&gt; on each of them;&lt;/li&gt;
  &lt;li&gt;for any other node, we iterate through the attributes using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;^attributes&lt;/code&gt;;&lt;/li&gt;
  &lt;li&gt;for each attribute, we get its value using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get_value&lt;/code&gt;;&lt;/li&gt;
  &lt;li&gt;we call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everything&lt;/code&gt; recursively on that value;&lt;/li&gt;
  &lt;li&gt;the first thing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everything&lt;/code&gt; does is update the accumulator.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everywhere&lt;/code&gt; is similar:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;sub everywhere (\t_,&amp;amp;modifier) {
    # Modify the node
    my \t = modifier(t_);
    # Test the type for Iterable or Associative
    if t ~~ Associative {
        # Build the updated map
        my %t_;
        for t.keys -&amp;gt; \t_k  {
            my \t_v = t{t_k};
            %t_{t_k} = everywhere (t_v,&amp;amp;modifier);
        }
        return %t_; 
    }     
    elsif t ~~ Iterable {
        # Build the updated list
        my @t_=[];
        for |t -&amp;gt; \t_elt  {
            @t_.push( everywhere(t_elt,&amp;amp;modifier) );
        }
        return @t_; 
    }

    else {
        # t is immutable, so copy to $t_
        my $t_ = t;
        for t.^attributes -&amp;gt; \attr {            
            if attr ~~ Attribute {
                my \expr = attr.get_value(t);
                if not expr ~~ Any  { # for ContainerDescriptor::Untyped
                    return $t_;
                }
                my \expr_ = everywhere(expr,&amp;amp;modifier);                
                attr.set_value($t_,expr_);
            }
        }
        return $t_;
    }
    return t;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So what we do here essentially is:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;%&lt;/code&gt;, we iterate through the values and rebuild the array or hash from the modified results;&lt;/li&gt;
  &lt;li&gt;for any other node, we iterate through the attributes using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;^attributes&lt;/code&gt;;&lt;/li&gt;
  &lt;li&gt;for each attribute, we get its value using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get_value&lt;/code&gt;;&lt;/li&gt;
  &lt;li&gt;we call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everywhere&lt;/code&gt; recursively on that value;&lt;/li&gt;
  &lt;li&gt;we update the attribute using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;set_value&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;this-works-without-roles-too&quot;&gt;This works without roles too&lt;/h2&gt;

&lt;p&gt;First of all, the above works for classes too, because the Metamodel methods are not specific to roles. Furthermore, because we test for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;%&lt;/code&gt;, the generics above work just fine for data structures without roles, built from hashes and arrays:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;my \lst = [1,[2,3,4,[5,6,7]],[8,9,[10,11,[12]]]];

sub matcher (\expr) {
    given expr {
        when List {
            if expr[0] % 2 == 0 {                
                    return [expr]                
            }            
        }
    }
    return []
}

my \res = everything(lst,[],&amp;amp;append,matcher);
say res;
# ([2 3 4 [5 6 7]] [8 9 [10 11 [12]]] [10 11 [12]] [12])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or for hashes:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;my %hsh = 
    a =&amp;gt; {
        b =&amp;gt; {
            c =&amp;gt; 1,
            a =&amp;gt; {
                b =&amp;gt;1,c=&amp;gt;2
            } 
        },
        c =&amp;gt; {
            a =&amp;gt;3
        }
    },
    b =&amp;gt; 4,
    c =&amp;gt; {d=&amp;gt;5,e=&amp;gt;6}
;

sub hmatcher (\expr) {
    given (expr) {
        when Map {
            my $acc=[];
            for expr.keys -&amp;gt; \k {                
                if k eq &apos;a&apos; {
                    $acc.push(expr{k})
                }
            }
            return $acc;
        }
    }
    return []
}

my \hres = everything(%hsh,[],&amp;amp;append,&amp;amp;hmatcher);
say hres;
# ({b =&amp;gt; {a =&amp;gt; {b =&amp;gt; 1, c =&amp;gt; 2}, c =&amp;gt; 1}, c =&amp;gt; {a =&amp;gt; 3}} {b =&amp;gt; 1, c =&amp;gt; 2} 3)
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Generic datastructure traversals are a great way to reduce boilerplate code and focus on the actual purpose of the traversal. And now you can have them in Raku too. I have shown the implementation of the two main schemes, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everything&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;everywhere&lt;/code&gt;, and demonstrated that they work for role-based datastructures as well as traditional hash- or array-based ones.&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>How to reduce the carbon footprint of your digital lifestyle</title>
        <link href="https://limited.systems/articles/frugal-computing-consumer/"/>
        <updated>2021-11-19T00:00:00+00:00</updated>
        <id>https://limited.systems/articles/frugal-computing-consumer</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/frugal-computing-consumer_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;The CO₂ emissions from &lt;a href=&quot;https://limited.systems/articles/frugal-computing&quot;&gt;manufacturing and use of digital devices (laptops, phones, tablets, TVs, …) are huge and rising steeply&lt;/a&gt;. Here is what you as a consumer can do to help reduce your digital carbon footprint.&lt;/p&gt;

&lt;h2 id=&quot;key-points&quot;&gt;Key points&lt;/h2&gt;

&lt;h3 id=&quot;the-problem&quot;&gt;The problem:&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;The current CO₂ emissions from the internet and digital devices form about 2% of the world CO₂ emissions but are expected to rise steeply over the next two decades. By 2040 this “digital carbon footprint” alone will make up more than half of the acceptable global carbon footprint to keep global warming below 1.5°C. So the growth in digital carbon footprint is unsustainable: it would make it virtually impossible to keep global warming below the safe limit.&lt;/li&gt;
  &lt;li&gt;What is little known is that the CO₂ emissions from production of digital devices exceed the emissions from the electricity they use over their lifetime. So even if newer devices are more energy efficient, producing more of them will make the emissions problem worse.&lt;/li&gt;
  &lt;li&gt;Apart from the carbon footprint of production and energy consumption of our digital devices, their main purpose — accessing the internet — also causes increasing amounts of CO₂ emissions. As an end user, you only have direct control over a small fraction of this: the energy consumption of your home (or office etc) network. The core network consumes a huge amount of energy but it does this regardless of how much data you transfer. But there is indirect consumer control: if we collectively use more and more data, the network operators have to install additional capacity. Conversely, if we collectively used a lot less data, some of the networking infrastructure could be powered down and even decommissioned. So it still makes sense to be frugal with your internet data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;the-solution--what-you-as-a-consumer-can-do&quot;&gt;The solution — what you as a consumer can do:&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Buy fewer new devices (laptop/desktop/phone/tablet/TV):
    &lt;ul&gt;
      &lt;li&gt;Keep using your existing devices for as long as possible.&lt;/li&gt;
      &lt;li&gt;Have your devices repaired rather than replacing them.&lt;/li&gt;
      &lt;li&gt;Only buy devices you really need.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;When you use your devices, use them as little as possible and in the most energy-efficient manner:
    &lt;ul&gt;
      &lt;li&gt;Reduce your internet usage.&lt;/li&gt;
      &lt;li&gt;Reduce your resolution when watching streaming video.&lt;/li&gt;
      &lt;li&gt;Turn off your camera in online meetings.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;buy-fewer-new-devices&quot;&gt;Buy fewer new devices&lt;/h2&gt;

&lt;p&gt;The CO₂ emissions from production of digital devices far exceed the emissions from the electricity they use. Therefore the best possible action from a consumer perspective is not to buy new devices.&lt;/p&gt;

&lt;p&gt;For digital devices such as mobile phones, tablets, laptops, desktop computers and televisions, the carbon footprint from manufacturing is so large that no amount of energy efficiency savings during use can compensate for it. So buying a new device effectively results in more emissions, even if it is a more energy-efficient model. It is better to keep using an older device for longer, even if it is less energy-efficient. For the same reason it is better to have your devices repaired than to replace them.&lt;/p&gt;

&lt;p&gt;Companies are starting to realise that increasing numbers of consumers want to use their devices for longer. Apple has recently introduced &lt;a href=&quot;https://www.apple.com/newsroom/2021/11/apple-announces-self-service-repair/&quot;&gt;a repair self-service&lt;/a&gt; and several brands of Android phones have started to offer up to &lt;a href=&quot;https://www.techadvisor.com/buying-advice/google-android/best-brands-for-android-updates-3798154/&quot;&gt;four years of support&lt;/a&gt; on some models.&lt;/p&gt;

&lt;h2 id=&quot;when-you-use-your-devices-use-them-frugally&quot;&gt;When you use your devices, use them frugally&lt;/h2&gt;

&lt;p&gt;The internet network infrastructure also generates huge amounts of CO₂ emissions. This is the “invisible” carbon footprint of the physical infrastructure required to serve, store and transfer video, voice and data.&lt;/p&gt;

&lt;p&gt;As an end user, you only have direct control over a small fraction of this: the energy consumption of your home (or office etc) network. The core network consumes a huge amount of energy but it does this regardless of how much data you transfer. But there is indirect consumer control: if we collectively use more and more data, the network operators have to install additional capacity. Conversely, if we collectively used a lot less data, some of the networking infrastructure could be powered down and even decommissioned. So it still makes sense to be frugal with your internet data. The same argument holds for cloud storage: manufacturing storage devices has a huge carbon footprint. If we collectively store less data in the cloud, fewer storage devices are needed, and that reduces emissions.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;For video calls, the most effective action is to switch off your camera whenever possible, even if this does not save a lot of energy in the immediate sense.&lt;/li&gt;
  &lt;li&gt;For watching streaming video (including TV), reducing the resolution has a similar effect.&lt;/li&gt;
  &lt;li&gt;When you use a search engine or browse the web you also increase your footprint. Search requires a lot of storage hardware as well as compute power to calculate the ranking of results. The cost of browsing the web is also high in a less intuitive way: the machinery for brokering and serving ads and tracking user behaviour requires a lot of compute power.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In a nutshell, the key actions you can take to reduce your digital carbon footprint are to buy fewer new devices, keep using your existing devices for as long as possible and only buy devices you really need: this reduces the carbon footprint of manufacturing. And when using your device, the most effective action is to be frugal: reduce your internet usage, reduce the resolution when watching streaming video and turn off your camera in video calls. Store as little data as possible, especially in the cloud.&lt;/p&gt;

&lt;h3 id=&quot;postscript-video-calls-are-still-better-than-cars-even-electric-ones&quot;&gt;Postscript: video calls are still better than cars, even electric ones&lt;/h3&gt;

&lt;p&gt;Even with your camera on, it is still better to have a video conference than to drive to a face-to-face meeting. Assuming average estimates for the emissions from internet usage and car journeys, driving just 500 m causes the same emissions as a video call of one hour. And it might come as a surprise that even if you drive a new electric car, that distance only increases to 1 km.&lt;/p&gt;

&lt;p&gt;[Note: post edited on 2022-12-07 because the original post assumed that internet network emissions are proportional to the traffic volume, and more recent research shows this is not the case.]&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>Why I wrote Haku</title>
        <link href="https://limited.systems/articles/why-i-wrote-haku/"/>
        <updated>2021-10-17T00:00:00+01:00</updated>
        <id>https://limited.systems/articles/why-i-wrote-haku</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/why-i-wrote-haku_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;A few weeks ago I released &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/haku&quot;&gt;Haku&lt;/a&gt;, a Japanese natural-language programming language. Haku is a strict functional language with implicit typing, and &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/haku/src/branch/main/examples/pwc131-t1.haku&quot;&gt;an example program&lt;/a&gt; looks like this:&lt;/p&gt;

&lt;div style=&quot;writing-mode: vertical-rl&quot;&gt;
&lt;pre&gt;
裂くとはパーツとヨウソで
若しパーツが空に等しいなら
［［ヨウソ］］ですけど
そうでなければ、
一パートはパーツの頭、
一パーツ一はパーツの尻尾、
一マエはパートの頭、
では
若しマエが〈ヨウソ引く一〉に等しいなら
［ヨウソ・パート］・パーツ一ですが
そうでなければ
［ヨウソ］・パーツ
の事です。

本とは
列は壱と弐と三と四と五と七と八と十一と十三と十六と十七、
仮一は列と空を裂くので畳み込む、
仮二は逆な仮一、
魄は仮二を逆で写像する、
魄を見せる
の事です。
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/haku#haku&quot;&gt;repository README&lt;/a&gt; explains the language and gives some background, as does this &lt;a href=&quot;https://www.slideshare.net/WimVanderbauwhede/haku-a-toy-functional-language-based-on-literary-japanese&quot;&gt;presentation&lt;/a&gt;. I have also written a separate post about the &lt;a href=&quot;https://limited.systems/articles/haku-in-raku&quot;&gt;implementation of Haku in Raku&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This article is about my motivation for creating Haku.&lt;/p&gt;

&lt;p&gt;I am interested in how programming languages influence the programmer’s thinking (the old adage of “to the programmer with a hammer, everything looks like a thumb”).&lt;/p&gt;

&lt;p&gt;From personal experience, I observe that my thinking patterns are quite different when I program in a functional language, an imperative one or an object-oriented one. There is also a marked difference between programming in a statically or dynamically typed language.&lt;/p&gt;

&lt;p&gt;But what about the influence of the programmer’s native language? Most programming languages are based on English, and in particular function calls typically use English word order.&lt;/p&gt;

&lt;h2 id=&quot;arithmetic-in-english-and-flemish&quot;&gt;Arithmetic in English and Flemish&lt;/h2&gt;

&lt;p&gt;For example, let’s consider the common arithmetic operations +, -, *, /. If we use named functions rather than operators for these, and use the common parentheses-and-commas syntax for function calls, we get something like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;A+B: add(A,B) 
A-B: subtract(A,B)
A*B: multiply(A,B)
A/B: divide(A,B)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In English we would express this most commonly as an infinitive or as an imperative. For the infinitive, we have:&lt;/p&gt;

&lt;!-- &lt;div class=&quot;highlight&quot; style=&quot;background-color: #eadcb2&quot;&gt;--&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;to add A and/to B
to subtract A and/from B
to multiply A and B
to divide A and/by B
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The pattern is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;to&apos; &amp;lt;verb&amp;gt; A &apos;and&apos; B&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For the imperative, we have:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;add A and/to B
subtract A and/from B
multiply A and B
divide A and/by B
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The pattern is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;verb&amp;gt; A &apos;and&apos; B&lt;/code&gt;, so the same pattern apart from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to&lt;/code&gt;. And it is easy to see how this pattern informed the typical function call syntax &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;verb&amp;gt; &apos;(&apos; A &apos;,&apos; B &apos;)&apos;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;However, in Flemish (or Dutch), the order of the arguments is quite different. For the infinitive, we have:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;A en/bij B optellen; A optellen bij B
A en/van B aftrekken; A aftrekken van B
A en/met B vermenigvuldigen; A vermenigvuldigen met B
A en/door B delen; A delen door B
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The first variant has the pattern &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A &apos;en&apos; B &amp;lt;verb&amp;gt;&lt;/code&gt;; the second variant needs a different preposition for each verb but the pattern is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A &amp;lt;verb&amp;gt; &amp;lt;preposition&amp;gt; B&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For the imperative, we have:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;tel A op bij B; tel A en/bij B op
trek A af van B; trek A en/van B af
vermenigvuldig A en/met B
deel A en/door B
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is not just one single general pattern but three: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;verb&amp;gt; A &apos;en&apos; B&lt;/code&gt;, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;verb&amp;gt; A &amp;lt;preposition&amp;gt; &amp;lt;preposition&amp;gt; B&lt;/code&gt; and  &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;verb&amp;gt; A &amp;lt;preposition&amp;gt; B &amp;lt;preposition&amp;gt;&lt;/code&gt;, depending on whether the verb has a preposition as part of it or not, and on the position we choose for that preposition.&lt;/p&gt;

&lt;p&gt;So not only are the word orders for the infinitive and the imperative quite different, there is also no simple rule for the imperative word order. This makes me wonder what programming languages would look like if their developers had not been native English speakers.&lt;/p&gt;

&lt;p&gt;That question becomes even more interesting for non-Indo-European languages, because despite the example above, there are still lots of grammatical similarities between languages such as English, French and German.&lt;/p&gt;

&lt;h2 id=&quot;japanese-natural-language-programming-languages&quot;&gt;Japanese natural-language programming languages&lt;/h2&gt;

&lt;p&gt;There is one such language that particularly interests me and that is Japanese. I have been learning it for a long time and have written &lt;a href=&quot;https://quickandtastycooking.org.uk/articles/&quot;&gt;several posts on Japanese language related topics&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Besides having a very different grammar, Japanese has a writing system that is very different from the Latin alphabet, as well as its own number system.&lt;/p&gt;

&lt;p&gt;So I decided to create a natural-language programming language based on Japanese.&lt;/p&gt;

&lt;p&gt;There are already several Japanese natural-language programming languages, all made by Japanese native speakers. &lt;a href=&quot;https://en.wikipedia.org/wiki/Non-English-based_programming_languages&quot;&gt;Wikipedia lists eight&lt;/a&gt; but there are actually only four that are still under active development: &lt;a href=&quot;https://dolittle.eplang.jp/&quot;&gt;Dolittle&lt;/a&gt;, &lt;a href=&quot;https://rdr.utopiat.net/&quot;&gt;Produire&lt;/a&gt;, &lt;a href=&quot;https://nadesi.com/top/&quot;&gt;Nadeshiko&lt;/a&gt; and &lt;a href=&quot;https://www.scripts-lab.co.jp/mind/whatsmind.html&quot;&gt;Mind&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Dolittle&lt;/em&gt; ドリトル is an object-oriented language specifically designed for teaching children to program and follows the &lt;a href=&quot;http://people.eecs.berkeley.edu/~bh/logo.html&quot;&gt;Logo&lt;/a&gt; tradition with a turtle to draw shapes.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Produire&lt;/em&gt; プロデル is an imperative and object-oriented language but more general purpose. It also has a turtle library, so education is definitely one of the main design purposes.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Nadeshiko&lt;/em&gt; なでしこ (meaning “pink”, the flower) is an open source general purpose imperative language.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Mind&lt;/em&gt; is also imperative. Although it is actually a &lt;a href=&quot;https://www.forth.com/&quot;&gt;Forth&lt;/a&gt;-style stack-based language, in general structure it is similar to the other three.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All these languages are complete, with support for graphics, networking etc. and their own IDE and/or web-based editor. They are practical programming languages, so they all support the use of Arabic numerals as well as operators for arithmetic, logic and comparison operations.&lt;/p&gt;

&lt;h2 id=&quot;haku&quot;&gt;Haku&lt;/h2&gt;

&lt;p&gt;My motivation to create Haku was not to create a practical language. I wanted to explore what the result is of creating a programming language based on a non-English language, in terms of syntax, grammar and vocabulary. In particular, I wanted to allow the programmer to control the register of the language to some extent (informal/polite/formal). &lt;a href=&quot;https://nadesi.com/v3/doc/index.php?%E6%96%87%E6%B3%95%2F%E6%95%AC%E8%AA%9E&amp;amp;show&quot;&gt;Nadeshiko&lt;/a&gt; and &lt;a href=&quot;https://www.scripts-lab.co.jp/mind/ver8/doc/02-Program-Hyoki.html#okurigana&quot;&gt;Mind&lt;/a&gt; allow this to some extent, but I wanted even more flexibility.&lt;/p&gt;

&lt;h3 id=&quot;grammar&quot;&gt;Grammar&lt;/h3&gt;

&lt;p&gt;My main motivation for creating Haku is the difference in grammar between Japanese and most Indo-European languages.&lt;/p&gt;

&lt;p&gt;Notions such as “noun”, “adjective”, “adverb” and “verb” are not quite so clearly defined in Japanese. For example, consider the word &lt;em&gt;yasashii&lt;/em&gt;, “kind”. A person who is kind is a &lt;em&gt;yasashii hito&lt;/em&gt;. A person who is not kind is a &lt;em&gt;yasashikunai hito&lt;/em&gt;. But in its own right, &lt;em&gt;yasashiku&lt;/em&gt; is an adverb, and &lt;em&gt;~nai&lt;/em&gt; is the plain negative verb ending, e.g. “I don’t understand” is &lt;em&gt;wakaranai&lt;/em&gt;. And this “adjective” can get a past tense: “the person who was not kind” is &lt;em&gt;yasashikunakatta hito&lt;/em&gt;. And if we chain adjectives, e.g. “a kind and clever person”, we get &lt;em&gt;yasashikute kashikoi hito&lt;/em&gt;. And indeed we can have &lt;em&gt;yasashikunakute kashikokunakatta hito&lt;/em&gt;, “the person who was neither kind nor clever”. It is also very easy to nominalise a verb or verbalise a noun by adding a suffix.&lt;/p&gt;

&lt;p&gt;The word order in a sentence is also quite different from most Indo-European languages. The typical order is main topic, secondary topic(s), verb. The function of the topics is indicated with what is called a “particle”, a kind of suffix. For example, “I ate the pudding with a spoon” is &lt;em&gt;purin wo supuun de tabeta&lt;/em&gt;. In this example, the main topic “I” is implied. Japanese is quite a parsimonious language: whenever possible, implied topics are left out, to be inferred from the context.&lt;/p&gt;

&lt;p&gt;Finally, compared to Indo-European languages, verb conjugation serves a different purpose in Japanese. For example in English, French and German, tenses are mainly used to give precise indications of the time and duration of the action: simple past, present continuous, future perfect continuous etc. Japanese has essentially two tenses: the past and the non-past; and a form similar to the -ing form in English to indicate an ongoing action, although that is again a loose approximation. However, there are many verb forms that act as modifiers, to say e.g. that something is possible, that the speaker wants something, that a third party wants something, that the speaker has begun to do something, that someone is doing someone a favour and of course to express the level of politeness. For example, &lt;em&gt;shachou ha kiite kuremashita&lt;/em&gt; “the boss did me the favour of listening to me” (the &lt;em&gt;~mashita&lt;/em&gt; is a polite verb form), or &lt;em&gt;wasurekaketeita&lt;/em&gt; “I had begun to forget” (&lt;em&gt;~ta&lt;/em&gt; is a plain past, rather than polite).&lt;/p&gt;

&lt;p&gt;Putting at least some of this grammar in the programming language seemed like an interesting challenge to me. In particular, I was interested in how programmers perceive function calls. Some time ago I ran a poll about this, and 3/4 of respondents answered “imperative” (the other options were infinitive, noun and -ing form).&lt;/p&gt;

&lt;p&gt;In Japanese, the imperative (&lt;em&gt;meireikei&lt;/em&gt;, “command form”) is rarely used. Therefore in Haku you can’t use this form. Instead, you can use the plain form, the polite &lt;em&gt;-masu&lt;/em&gt; form or the &lt;em&gt;-te&lt;/em&gt; form (like “-ing”), including &lt;em&gt;-te kudasai&lt;/em&gt; (similar to “please”). Whether a function is perceived as a verb or a noun is up to you, and the difference is clear from the syntax. If it is a noun, you can turn it into a verb by adding &lt;em&gt;suru&lt;/em&gt;, and if it is a verb, you can add the &lt;em&gt;no&lt;/em&gt; or &lt;em&gt;koto&lt;/em&gt; nominalisers. And you can conjugate the verb forms in many different ways, although in practice the verb ending has no semantic function in the Haku language.&lt;/p&gt;

&lt;h3 id=&quot;naming-and-giving-meaning&quot;&gt;Naming and giving meaning&lt;/h3&gt;

&lt;p&gt;In principle, a programming language does not need to be based on natural language at all. The notorious example is APL, which uses symbols for everything. Agda programmers also tend to use lots of mathematical symbols. It works because they are very familiar with those symbols. An interesting question is whether an experienced programmer who does not know Japanese could understand a Haku program; or if not, what the minimal changes would be to make it understandable.&lt;/p&gt;

&lt;p&gt;To make it possible to investigate that question, the Scheme and Raku emitters for Haku support (limited) transliteration to Romaji. I have the intention (but maybe not the time) to create a Romaji version of Haku as well as a version that does not use any Japanese but keeps the word order.&lt;/p&gt;

&lt;h3 id=&quot;syntax-and-parsing&quot;&gt;Syntax and parsing&lt;/h3&gt;

&lt;p&gt;I also wanted the language to be closer, at least visually, to literary Japanese. Therefore Haku does not use Roman letters, Arabic digits or the common arithmetic, logical and comparison operators. It also supports top-to-bottom, right-to-left writing.&lt;/p&gt;

&lt;p&gt;Literary Japanese does not use spaces. So another question of interest to me was how to tokenise a string of Japanese.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;There are three writing systems: &lt;em&gt;katakana&lt;/em&gt; (angular), &lt;em&gt;hiragana&lt;/em&gt; (squiggly) and &lt;em&gt;kanji&lt;/em&gt; (complicated).&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;katakana&lt;/em&gt; is used in a similar way as italics, and also for loanwords and names of plants and animals.&lt;/li&gt;
  &lt;li&gt;Nouns, verbs, adjectives and adverbs normally start with a &lt;em&gt;kanji&lt;/em&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;hiragana&lt;/em&gt; is used for verb/adjective/adverb endings and “particles”, small words or suffixes that help identify the words in a sentence.&lt;/li&gt;
  &lt;li&gt;A verb/adjective/adverb can’t end with a &lt;em&gt;hiragana&lt;/em&gt; character that represents a particle.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we have some simple tokenisation rules:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;a sequence of &lt;em&gt;katakana&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;a &lt;em&gt;kanji&lt;/em&gt; followed by more &lt;em&gt;kanji&lt;/em&gt; or &lt;em&gt;hiragana&lt;/em&gt; that do not represent particles&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;hiragana&lt;/em&gt; that represent particles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is in fact a formalisation of the rules a human uses when reading Japanese.&lt;/p&gt;

&lt;p&gt;Where that fails, we can introduce parentheses. A human reader uses context, and a considerable amount of look-ahead parsing and backtracking, but implementing that would make the parser very complex and slow.&lt;/p&gt;

&lt;p&gt;In practice, only specific adverbs and adjectives are used in Haku. For example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ラムダ|は|或|エクス|で|エクス|掛ける|エクス|です

ラムダ: katakana word
は: particle
或: pre-noun adjective
エクス: katakana word
で: particle
エクス: katakana word
掛ける: verb
エクス: katakana word　
です: verb (copula)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
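The segmentation above can be sketched as a simple rule-based scanner. This is a hedged illustration, not Haku's actual tokeniser: the character classes are approximated by basic Unicode ranges, the particle set is a small illustrative subset, and only the copula です is special-cased.

```python
# Illustrative sketch of the tokenisation rules above; NOT Haku's real
# tokeniser. Character classes are basic Unicode ranges and PARTICLES
# is a small, incomplete sample.
def is_katakana(c): return '\u30a0' <= c <= '\u30ff'
def is_hiragana(c): return '\u3040' <= c <= '\u309f'
def is_kanji(c):    return '\u4e00' <= c <= '\u9fff'

PARTICLES = set('はをでのと')  # illustrative subset only

def tokenise(text):
    tokens, i = [], 0
    while i < len(text):
        if text.startswith('です', i):      # reserved copula, checked first
            j = i + 2
        elif is_katakana(text[i]):          # rule 1: a run of katakana
            j = i
            while j < len(text) and is_katakana(text[j]):
                j += 1
        elif is_kanji(text[i]):             # rule 2: a kanji, then more kanji
            j = i + 1                       # or non-particle hiragana
            while j < len(text) and (is_kanji(text[j]) or
                    (is_hiragana(text[j]) and text[j] not in PARTICLES)):
                j += 1
        else:                               # rule 3: a particle (or any
            j = i + 1                       # other single character)
        tokens.append(text[i:j])
        i = j
    return tokens

print(tokenise('ラムダは或エクスでエクス掛けるエクスです'))
# ['ラムダ', 'は', '或', 'エクス', 'で', 'エクス', '掛ける', 'エクス', 'です']
```

On the example string this reproduces exactly the segmentation shown above; a real tokeniser needs a complete particle list and the disambiguation rules discussed in the text.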

&lt;h3 id=&quot;number-system&quot;&gt;Number system&lt;/h3&gt;

&lt;p&gt;For large numbers, Japanese uses a number system based on multiples of ten thousand (called &lt;em&gt;myriads&lt;/em&gt;) rather than a thousand. A peculiar feature of this system is that there are &lt;em&gt;kanji&lt;/em&gt; for all powers of 10,000 up to 10&lt;sup&gt;48&lt;/sup&gt;. For more background on this, please read &lt;a href=&quot;https://quickandtastycooking.org.uk/articles/japanese-large-numbers/&quot;&gt;my article on this topic&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The consequence is that a number such as&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;1,234,567,890 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;is composed as&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  (10 + 2) * 100,000,000 
+ (3 * 1000 + 4 * 100 + 5 * 10 + 6) * 10,000
+  7 * 1000 + 8 * 100 + 9 * 10
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which is written in &lt;em&gt;kanji&lt;/em&gt; as&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;十二億三千四百五十六万七千八百九十
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
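The composition above can be sketched in code: split the number into four-digit groups and name each group with its power-of-10,000 kanji. A minimal sketch, not Haku's implementation; it glosses over stylistic variation (e.g. 一千 vs 千) by always dropping 一 before 十, 百 and 千, as in the example above.

```python
# Sketch of the myriad (10,000-based) number system; NOT Haku's actual
# number formatting. Only powers up to 10^12 are handled here.
DIGITS  = '〇一二三四五六七八九'
UNITS   = ['', '十', '百', '千']   # place values within a 4-digit group
MYRIADS = ['', '万', '億', '兆']   # powers of 10,000

def group_to_kanji(n):             # 0 < n < 10000
    out = ''
    for pos in (3, 2, 1, 0):
        d = (n // 10**pos) % 10
        if d == 0:
            continue
        if not (d == 1 and pos > 0):   # simplification: drop 一 before 十/百/千
            out += DIGITS[d]
        out += UNITS[pos]
    return out

def to_kanji(n):
    groups, out = [], ''
    while n:                       # split into groups of 4 decimal digits
        groups.append(n % 10000)
        n //= 10000
    for i in reversed(range(len(groups))):
        if groups[i]:
            out += group_to_kanji(groups[i]) + MYRIADS[i]
    return out

print(to_kanji(1234567890))  # 十二億三千四百五十六万七千八百九十
```

Running this on 1,234,567,890 reproduces the kanji spelling shown above.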

&lt;p&gt;There are also &lt;em&gt;kanji&lt;/em&gt; for numbers smaller than one. They go down to 10&lt;sup&gt;-12&lt;/sup&gt; in powers of 10 and rational numbers are indicated with the &lt;em&gt;kanji&lt;/em&gt; 点 (&lt;em&gt;ten&lt;/em&gt;, “dot”). So&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;3.14159
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;can be written as&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; 三点一分四厘一毛五糸九忽 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Apart from this format, the decimal format is also used, and is indeed more common for rational numbers and also for years (and dates in general), e.g. 2021 is written 二〇二一 instead of 二千二十一. Haku supports all these formats.&lt;/p&gt;
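The positional (digit-wise) format is even simpler to sketch: each decimal digit maps directly to a kanji numeral, with 〇 for zero and no place-value kanji. Again a minimal illustration, not Haku's implementation.

```python
# Digit-wise format used for years and dates: each decimal digit maps
# straight to a kanji numeral; 〇 stands for zero.
DIGITS = '〇一二三四五六七八九'

def positional_kanji(n):
    return ''.join(DIGITS[int(d)] for d in str(n))

print(positional_kanji(2021))  # 二〇二一
```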

&lt;h2 id=&quot;poetry&quot;&gt;Poetry&lt;/h2&gt;

&lt;p&gt;The expressiveness of Haku as a programming language is deliberately rather spartan. It is after all a “toy language”, an experimental rather than a general-purpose language.&lt;/p&gt;

&lt;p&gt;I am more interested in the natural-language expressiveness of Haku, and for that my criterion is: Can the programmer write poetry in it? Several of Haku’s features such as adjectives and verb conjugation (&lt;em&gt;okurigana&lt;/em&gt;) are there entirely to make Haku programs sufficiently expressive on the natural-language level to support this idea. For that reason, my &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/haku/src/branch/main/examples/yuki.haku&quot;&gt;favourite Haku program&lt;/a&gt; is one that demonstrates this ability:&lt;/p&gt;

&lt;div style=&quot;writing-mode: vertical-rl&quot;&gt;
&lt;pre&gt;
忘れるとは件で空のことです。

遠いとは物で物を見せるのことです。

本とは
記憶は「忘れられないあの冬の the new fallen snow」、
忘れかけてた遠い記憶
の事です
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;When run, this program prints out the string 「忘れられないあの冬の the new fallen snow」. The line that causes this string to be printed is&lt;/p&gt;

&lt;p&gt;忘れかけてた遠い記憶&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Wasurekaketeta tooi kioku&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is on the one hand an example of some of the Japanese grammar features that Haku supports:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;adjectives as functions: &lt;em&gt;tooi&lt;/em&gt; is a so-called “i-adjective”;&lt;/li&gt;
  &lt;li&gt;adjectival verbs: &lt;em&gt;wasurekaketeta&lt;/em&gt; is a verb used as an adjective;&lt;/li&gt;
  &lt;li&gt;complex verb conjugations: the plain form, used to define the function, is &lt;em&gt;wasureru&lt;/em&gt;. The form &lt;em&gt;~kakeru&lt;/em&gt; means “starting to” and the final ending &lt;em&gt;~ta&lt;/em&gt; is a plain past.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But on the other hand, it is also poetry.&lt;/p&gt;

&lt;h2 id=&quot;why-haku&quot;&gt;Why &lt;em&gt;haku&lt;/em&gt;?&lt;/h2&gt;

&lt;p&gt;I decided to call my language &lt;em&gt;haku&lt;/em&gt; because I like the sound of it, and also because that word can be written in many ways and mean many things in Japanese (in my dictionary there are 89 &lt;em&gt;kanji&lt;/em&gt; that have &lt;em&gt;haku&lt;/em&gt; as one of their possible pronunciations). I was definitely thinking about the character Haku from the &lt;a href=&quot;https://ghiblicollection.com/product/spirited-away-collector-s-edition?product_id=7231&quot;&gt;Studio Ghibli movie “Spirited Away”&lt;/a&gt;. Also, I like the resemblance with &lt;a href=&quot;https://raku.org&quot;&gt;Raku&lt;/a&gt;, the implementation language.&lt;/p&gt;

&lt;p&gt;If I had to pick a &lt;em&gt;kanji&lt;/em&gt;, I would write it 珀 (amber) or 魄 (soul, spirit).&lt;/p&gt;

        </content>
    </entry>
    
    <entry>
        <title>Haku: a Japanese programming language</title>
        <link href="https://limited.systems/articles/haku-in-raku/"/>
        <updated>2021-09-20T00:00:00+01:00</updated>
        <id>https://limited.systems/articles/haku-in-raku</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/haku-in-raku_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;Haku is a natural language functional programming language based on literary Japanese. This article is about the implementation of Haku in &lt;a href=&quot;https://raku.org&quot;&gt;Raku&lt;/a&gt;. You don’t need to know Japanese or &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/haku&quot;&gt;have read the Haku documentation&lt;/a&gt;. If you are not familiar with Raku, you might want to read my &lt;a href=&quot;https://limited.systems/articles/roles-as-adts-in-raku/#raku-intro&quot;&gt;quick introduction&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I do assume familiarity with the concepts of parsing, syntax trees and code generation. If you find you lack background for what follows, I recommend Andrew Shitov’s series of posts &lt;a href=&quot;https://andrewshitov.com/creating-a-compiler-with-raku/&quot;&gt;Creating a Compiler with Raku&lt;/a&gt;, which takes a step-by-step approach.&lt;/p&gt;

&lt;h2 id=&quot;haku&quot;&gt;Haku&lt;/h2&gt;

&lt;p&gt;Haku aims to be close to written Japanese, so it is written in a combination of the three Japanese writing systems &lt;em&gt;kanji&lt;/em&gt; (Chinese characters), &lt;em&gt;hiragana&lt;/em&gt; and &lt;em&gt;katakana&lt;/em&gt;, and Japanese punctuation. There are no spaces, and Haku does not use Arabic (or even Roman) digits nor any operators. The design of the language is explained &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/haku&quot;&gt;in more detail in the documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here is an example of a small Haku program (for more examples see &lt;a href=&quot;https://codeberg.org/wimvanderbauwhede/haku/src/branch/main/examples&quot;&gt;the repo&lt;/a&gt;):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;本とは
「魄から楽まで」を見せる
の事です。
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This translates as&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“main is: to show ‘From Haku to Raku’”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And the Raku version would be&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;say &apos;From Haku to Raku&apos;;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The strings “本とは” and “の事です。” indicate the start and end of the main program. “「魄から楽まで」” is a string constant. “見せる” is the print function. The ‘を’ indicates that anything before it is an argument of the function. The newlines in the example code are optional and purely there for readability. A Haku program is a single string without whitespace or newlines.&lt;/p&gt;

&lt;p&gt;The actual generated Raku code for this example is&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;use v6;
use HakuPrelude;

sub main() {
    show(&apos;魄から楽まで&apos;)
}

main();
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To be even closer to literary Japanese, Haku programs can be written vertically from right to left:&lt;/p&gt;

&lt;div class=&quot;highlight&quot; style=&quot;writing-mode: vertical-rl&quot;&gt;
&lt;pre&gt;
忘れるとは
物で空
のことです。

遠いとは
条で条を見せる
のことです。

本とは
記憶は無、
忘れかけてた遠い記憶
の事です。
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The generated Raku code for this Haku program is again quite simple:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;use v6;
use HakuPrelude;

sub wasureru( \mono) {[]}

sub tooi( \jou) {show(jou)}

sub hon() {
    my \kioku = Nil;
    wasureru(tooi(kioku))
}

hon();
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Haku is implemented in Raku. The Haku compiler is a source-to-source compiler (sometimes called &lt;em&gt;transpiler&lt;/em&gt;) which generates Raku source from the Haku source and executes it. Raku makes writing such a compiler easy in many ways:&lt;/p&gt;

&lt;h2 id=&quot;parsing-using-grammars&quot;&gt;Parsing using Grammars&lt;/h2&gt;

&lt;p&gt;I decided to implement Haku in Raku mostly because I wanted to use Raku’s &lt;a href=&quot;https://docs.raku.org/language/grammars&quot;&gt;Grammars&lt;/a&gt; feature, and it did not disappoint. A grammar is like a class, but instead of methods it has rules or tokens, which are the building blocks of the parser. Any token can be used in the definition of another token by enclosing it in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;...&amp;gt;&lt;/code&gt;, for example:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;token adjective {
    &amp;lt;i-adjective&amp;gt; | &amp;lt;na-adjective&amp;gt;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The tokens &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i-adjective&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;na-adjective&lt;/code&gt; have been defined separately and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;adjective&lt;/code&gt; matches one or the other.&lt;/p&gt;

&lt;p&gt;I have always liked parser combinators (like &lt;a href=&quot;https://www.futurelearn.com/info/courses/functional-programming-haskell/0/steps/27222&quot;&gt;Parsec&lt;/a&gt; in Haskell) and from a certain angle, Raku’s Grammars are quite similar. They are both scannerless, i.e. there is no separate tokenisation step, and highly composable. Many of the features offered by Parsec (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;many&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;oneOf&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sepBy&lt;/code&gt;) are available courtesy of Raku’s regexes.&lt;/p&gt;

&lt;p&gt;There are several features of Raku’s Grammars that helped to make the parser for Haku easy to implement.&lt;/p&gt;

&lt;h3 id=&quot;excellent-unicode-support&quot;&gt;Excellent Unicode support&lt;/h3&gt;

&lt;p&gt;I think Raku’s Unicode support is really excellent. For example, thanks to the support for Unicode blocks, I can simply write&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;token kanji {  
    &amp;lt;:Block(&apos;CJK Unified Ideographs&apos;)&amp;gt;
}  
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;rather than having to enumerate them all (that block alone contains 20,992 code points, and Unicode 13 has 92,865 unified ideographs in total!). In fact, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;:...&amp;gt;&lt;/code&gt; syntax works for any Unicode property, not just for Blocks.&lt;/p&gt;

&lt;p&gt;Even better: I have some kanji that are reserved as keywords:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;token reserved-kanji { &apos;本&apos; | &apos;事&apos; | ... }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To make sure these are excluded from the valid kanji for Haku, I can simply use a set difference:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;token kanji {  
    &amp;lt;:Block(&apos;CJK Unified Ideographs&apos;) - reserved-kanji &amp;gt;
}  
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(One detail that bit me is that the equivalent syntax for a user-defined character class requires an explicit ‘+’ : &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;token set-difference { &amp;lt; +set1 -set2&amp;gt; }&lt;/code&gt; )&lt;/p&gt;

&lt;h3 id=&quot;tokens-and-rules&quot;&gt;Tokens and rules&lt;/h3&gt;

&lt;p&gt;Luckily, Raku does not assume by default that you want to parse something where whitespace can be ignored, or that you want to tokenise on whitespace. If you want to ignore whitespace, you can use a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rule&lt;/code&gt;. But in Haku, extraneous whitespace is not allowed (except for newlines at certain locations). So I use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;token&lt;/code&gt; everywhere. (There is also &lt;a href=&quot;https://docs.raku.org/language/grammars#index-entry-declarator_token-Named_Regexes&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regex&lt;/code&gt;, which backtracks&lt;/a&gt;. In Haku’s grammar I have not needed it.)&lt;/p&gt;

&lt;h3 id=&quot;very-powerful-regexes&quot;&gt;Very powerful regexes&lt;/h3&gt;

&lt;p&gt;As a lambdacamel, I’ve always been fond of Perl’s regexes, the now ubiquitous &lt;a href=&quot;https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions&quot;&gt;PCREs&lt;/a&gt;. Yet, &lt;a href=&quot;https://docs.raku.org/language/regexes&quot;&gt;Raku’s regexes&lt;/a&gt; go way beyond that in power, expressiveness and readability.&lt;/p&gt;

&lt;p&gt;For one thing, they are composable: you can define a named regex with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regex&lt;/code&gt; type and use it in subsequent regexes with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;...&amp;gt;&lt;/code&gt; syntax. Also, the care with which they have been designed makes them very easy to use. For example, a negative look-ahead assertion is simply &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;no&amp;gt; &amp;lt;!before &amp;lt;koto&amp;gt; &amp;gt;&lt;/code&gt;; and the availability of both a try-in-order alternation (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;||&lt;/code&gt;) and longest-token match alternation (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;|&lt;/code&gt;) is a huge boon. Another thing I like very much is the ability to make a subrule call non-capturing with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;.token&amp;gt;&lt;/code&gt; syntax:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;    token lambda-expression { 
        &amp;lt;.aru&amp;gt; &amp;lt;variable-list&amp;gt; &amp;lt;.de&amp;gt; &amp;lt;expression&amp;gt; 
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;variable-list&amp;gt;&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;expression&amp;gt;&lt;/code&gt; will be captured, so a lot of the concrete syntax can be removed at parse time.&lt;/p&gt;

&lt;h3 id=&quot;grammar-composition-via-roles&quot;&gt;Grammar composition via roles&lt;/h3&gt;

&lt;p&gt;Roles (‘mixins’ in Ruby, ‘traits’ in Rust) define interfaces and/or implementation of those interfaces.&lt;br /&gt;
I found this a better fit for my purpose than the also-supported class inheritance. For example:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;role Nouns does Characters {
    token sa { &apos;さ&apos; }
    token ki { &apos;き&apos; }
    # 一線 is OK,  一 is not OK, 線 is OK
    token noun { 
        &amp;lt;number-kanji&amp;gt;? &amp;lt;non-number-kanji&amp;gt; &amp;lt;kanji&amp;gt;* 
        [&amp;lt;sa&amp;gt;|&amp;lt;ki&amp;gt;]?
    }
}

role Identifiers 
does Verbs 
does Nouns 
does Adjectives 
does Variables 
{
    token nominaliser {
        | &amp;lt;no&amp;gt; &amp;lt;!before &amp;lt;koto&amp;gt; &amp;gt; 
        | &amp;lt;koto&amp;gt; &amp;lt;!before &amp;lt;desu&amp;gt; &amp;gt; 
    }
    # Identifiers are variables,
    # noun-style, verb-style
    # and adjective-style function names
    token identifier { 
        | &amp;lt;variable&amp;gt; 
        | &amp;lt;verb&amp;gt; &amp;lt;nominaliser&amp;gt;? 
        | &amp;lt;noun&amp;gt; &amp;lt;.sura&amp;gt;? 
        | &amp;lt;adjective&amp;gt; }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Although I would like a list syntax for this, something like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;role Identifiers does Verbs, Nouns, Adjectives, Variables {...}&lt;/code&gt;.)&lt;/p&gt;

&lt;p&gt;There is a lot more to grammars and regexes. The nice Raku folks on Twitter recommended the book &lt;a href=&quot;https://link.springer.com/book/10.1007/978-1-4842-3228-6&quot;&gt;“Parsing with Perl 6 Regexes and Grammars” by Moritz Lenz&lt;/a&gt; to me, and it was very useful, in particular for debugging the grammar and handling error messages.&lt;/p&gt;

&lt;h2 id=&quot;abstract-syntax-tree-using-roles&quot;&gt;Abstract syntax tree using roles&lt;/h2&gt;

&lt;p&gt;I like to implement the abstract syntax tree (AST) as an algebraic data type, the way it is usually done in Haskell. In Raku, one way to do this is to use parametrised Roles &lt;a href=&quot;https://limited.systems/articles/roles-as-adts-in-raku/&quot;&gt;as I explained in an earlier post&lt;/a&gt;. Most of the AST maps directly to the toplevel parser for each role in my grammar, for example the lambda expression:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;role LambdaExpr[ @lambda-args, $expr] does HakuExpr {
    has Variable @.args = @lambda-args;
    has HakuExpr $.expr = $expr;
} 
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&quot;from-parse-tree-to-abstract-syntax-tree&quot;&gt;From parse tree to abstract syntax tree&lt;/h2&gt;

&lt;p&gt;Raku’s grammars provide a very convenient mechanism for turning the parse tree into an AST, called &lt;a href=&quot;https://docs.raku.org/language/grammars#index-entry-Actions&quot;&gt;Actions&lt;/a&gt;. Essentially, you create a class whose methods have the same names as the tokens and rules in the grammar. Each method gets the &lt;a href=&quot;https://docs.raku.org/type/Match&quot;&gt;Match object&lt;/a&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$/&lt;/code&gt;) created by the corresponding token as a positional argument.&lt;/p&gt;

&lt;p&gt;For example, to populate the AST node for a lambda expression from the parse tree:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;method lambda-expression($/) {
        my @args = $&amp;lt;variable-list&amp;gt;.made;
        my $expr = $&amp;lt;expression&amp;gt;.made;
        make LambdaExpr[@args,$expr].new;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The capturing tokens used in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lambda-expression&lt;/code&gt; token are accessible via the notation &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$&amp;lt;...&amp;gt;&lt;/code&gt; which is shorthand for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$/&amp;lt;...&amp;gt;&lt;/code&gt;, i.e. they are named attributes of the current match object.&lt;/p&gt;

&lt;p&gt;In the Haku grammar, there are several tokens where the match is one from a list of alternatives, for example the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expression&lt;/code&gt; token, which enumerates anything that is an expression in Haku. For such tokens I use the following code to “inherit” from the constituent tokens:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;method expression($/) { 
        make $/.values[0].made;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because every match is a map whose keys are the names of the capturing tokens, and because we know that in this case only one alternative is selected, the first element in the corresponding &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;values&lt;/code&gt; list will be the match for that particular token.&lt;/p&gt;
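
&lt;p&gt;As a toy illustration (a standalone example, not part of the Haku grammar): a match with a single selected alternative has exactly one named capture, so &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.values[0]&lt;/code&gt; retrieves it:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;grammar Greeting {
    token TOP { &amp;lt;hello&amp;gt; | &amp;lt;bye&amp;gt; }
    token hello { &apos;hello&apos; }
    token bye { &apos;bye&apos; }
}

# Only the &amp;lt;hello&amp;gt; alternative matches, so it is
# the sole named capture in the TOP match object
my $m = Greeting.parse(&apos;hello&apos;);
say $m.values[0].Str; # hello
&lt;/code&gt;&lt;/pre&gt;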

&lt;h2 id=&quot;code-generation&quot;&gt;Code generation&lt;/h2&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;haku.raku&lt;/code&gt; main program essentially does this:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;my $hon_parse = 
    Haku.parse($program_str, :actions(HakuActions));
my $hon_raku_code =  
    ppHakuProgram($hon_parse.made);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The Haku program string is parsed using the Haku grammar and the methods defined in the corresponding HakuActions class are used to populate the AST. The toplevel parse tree node must be &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$&amp;lt;haku-program&amp;gt;&lt;/code&gt;, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;made&lt;/code&gt; method of this node returns the AST node &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HakuProgram&lt;/code&gt;.  The routine &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ppHakuProgram&lt;/code&gt; is the toplevel routine in the module &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Raku&lt;/code&gt;, which is the Raku emitter for Haku. (There is also a Scheme emitter, in the module &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Scheme&lt;/code&gt;.)&lt;/p&gt;

&lt;p&gt;So &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ppHakuProgram($hon_parse.made)&lt;/code&gt; pretty-prints the HakuProgram AST node and thus the entire Haku program as Raku code.&lt;/p&gt;

&lt;p&gt;What I like about the role-based AST is that you can pattern match against the variants of a type using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;given/when&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;sub ppHakuExpr(\h) {            
    given h {
        when BindExpr { ... }
        when FunctionApplyExpr { ... }
        when ListExpr { ... }
        when MapExpr { ... }        
        when  IfExpr { ... }   
        when LetExpr { ... }
        when LambdaExpr { ... }        
        ...
        default {
            die &quot;TODO:&quot; ~ h.raku;
        }        
    }
} 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The Raku code corresponding to the Haku AST is quite straightforward, but there are a few things worth noting:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Because Haku’s variables are immutable, I use the sigilless &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;\&lt;/code&gt; notation, which means I don’t have to build a variable table to keep track of sigils.&lt;/li&gt;
  &lt;li&gt;Because Haku is functional, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;let&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; are expressions, so in Raku I wrap them in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;do {}&lt;/code&gt; block.&lt;/li&gt;
  &lt;li&gt;For partial application I use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.assuming()&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;In Haku, strings are lists. In Raku they aren’t. I created a small Prelude of functions, and the list manipulation functions in that Prelude use pattern matching on the type with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;given/when&lt;/code&gt; to see if the argument is a string or a list.&lt;/li&gt;
&lt;/ul&gt;
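
&lt;p&gt;As a rough sketch of what the first three points look like in plain Raku (the names below are illustrative, not actual output of the Haku compiler):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;# Sigilless, immutable bindings, so no sigil bookkeeping
my \limit = 42;

# An if-expression, wrapped in a do {} block
my \parity = do {
    if limit %% 2 { &apos;even&apos; } else { &apos;odd&apos; }
};

# Partial application via .assuming()
sub add(\x, \y) { x + y }
my &amp;amp;increment = &amp;amp;add.assuming(1);

say parity;           # even
say increment(limit); # 43
&lt;/code&gt;&lt;/pre&gt;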

&lt;h2 id=&quot;running-the-generated-raku-code&quot;&gt;Running the generated Raku code&lt;/h2&gt;

&lt;p&gt;Running the generated Raku code is simple: I write the generated Raku code to a module and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;require&lt;/code&gt; it. The generated code ends with a call to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hon()&lt;/code&gt;, the main function in a Haku program, so this automatically executes the program.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;# Write the parsed program to a module 
&apos;Hon.rakumod&apos;.IO.spurt($hon_raku_code);

# Require the module. This will execute the program
require Hon;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Another thing Raku makes really easy is creating command-line flags and documenting their usage:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-perl6&quot;&gt;sub USAGE() {
    print Q:to/EOH/;
    Usage: haku &amp;lt;Haku program, written horizontally or vertically, utf-8 text file&amp;gt;
        [--tategaki, -t] : do not run the program but print it vertically.
        [--miseru, -m] : just print the Raku source code, don&apos;t execute.
        ...
    EOH
}

unit sub MAIN(
          Str $src_file,
          Bool :t($tategaki) = False,   
          Bool :m($miseru) = False,
          ...
        );  

&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;USAGE&lt;/code&gt; is called when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAIN&lt;/code&gt; is called with the wrong (or no) arguments. Arguments of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAIN&lt;/code&gt; prefixed with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;:&lt;/code&gt; are flags. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unit sub&lt;/code&gt; means that everything after this declaration forms the body of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAIN&lt;/code&gt;, so there is no need for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{...}&lt;/code&gt; block.&lt;/p&gt;

&lt;h2 id=&quot;to-conclude&quot;&gt;To conclude&lt;/h2&gt;

&lt;p&gt;This article shows the lazy programmer’s way of creating your own programming language: let Raku do all the hard work.&lt;/p&gt;

&lt;p&gt;Or to express it with a Haku program:&lt;/p&gt;

&lt;div style=&quot;writing-mode: vertical-rl&quot;&gt;
&lt;pre&gt;
本真とは
コンパイラを書いて、
プログラムを書いて、
プログラムを走らす
と言う事です。

&lt;/pre&gt;
&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;the truth:&lt;br /&gt;
write the compiler,&lt;br /&gt;
write the program,&lt;br /&gt;
run the program.&lt;/p&gt;
&lt;/blockquote&gt;


        </content>
    </entry>
    
    <entry>
        <title>Frugal computing </title>
        <link href="https://limited.systems/articles/frugal-computing/"/>
        <updated>2021-06-29T00:00:00+01:00</updated>
        <id>https://limited.systems/articles/frugal-computing</id>
        <author>
					<name>Wim Vanderbauwhede</name>
					<uri>https://limited.systems/</uri>
					
				</author>
        <content type="html">
        	&lt;img src=&quot;https://limited.systems/images/frugal-computing_1600x600.avif&quot;&gt;&lt;br/&gt;
        	&lt;p&gt;On the need for low-carbon and sustainable computing and the path towards zero-carbon computing.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://limited.systems/translations/fr&quot;&gt;Lisez en Français&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://limited.systems/translations/es&quot;&gt;Lea en Español&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;key-points&quot;&gt;Key points&lt;/h2&gt;

&lt;h3 id=&quot;the-problem&quot;&gt;The problem:&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;The current emissions from computing are about 2% of the world total but are projected to rise steeply over the next two decades. By 2040 emissions from computing alone will be more than half of the emissions level acceptable to keep global warming below 1.5°C. This growth in computing emissions is unsustainable: it would make it virtually impossible to stay within the warming limit.&lt;/li&gt;
  &lt;li&gt;The emissions from production of computing devices far exceed the emissions from operating them, so even if devices become more energy efficient, producing more of them will make the emissions problem worse. Therefore we must extend the useful life of our computing devices.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;the-solution&quot;&gt;The solution:&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;As a society we need to start treating computational resources as finite and precious, to be utilised only when necessary, and as effectively as possible. We need &lt;em&gt;frugal computing&lt;/em&gt;: achieving the same results for less energy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;the-vision&quot;&gt;The vision:&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Imagine we can extend the useful life of our devices and even increase their capabilities without any increase in energy consumption, purely by improving the software.&lt;/li&gt;
  &lt;li&gt;Meanwhile, we will develop the technologies for the next generation of devices, designed for energy efficiency as well as long life.&lt;/li&gt;
  &lt;li&gt;Every subsequent cycle will last longer, until finally the world will have computing resources that last forever and hardly use any energy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;defining-computational-resources&quot;&gt;Defining computational resources&lt;/h2&gt;

&lt;p&gt;Computational resources are all resources of energy and material that are involved in any given task that requires computing. For example, when you perform a web search on your phone or participate in a video conference on your laptop, the computational resources involved are those for production and running of your phone or laptop, the mobile network or WiFi you are connected to, the fixed network it connects to, the data centres that perform the search or video delivery operations. If you are a scientist running a simulator in a supercomputer, then the computational resources involved are your desktop computer, the network and the supercomputer. For an industrial process control system, it is the production and operation of the Programmable Logic Controllers.&lt;/p&gt;

&lt;h2 id=&quot;computational-resources-are-finite&quot;&gt;Computational resources are finite&lt;/h2&gt;

&lt;p&gt;Since the start of general purpose computing in the 1970s, our society has been using increasing amounts of computational resources.&lt;/p&gt;

&lt;p&gt;For a long time the growth in computational capability as a function of device power consumption has literally been exponential, a trend expressed by &lt;a href=&quot;https://www.britannica.com/technology/Moores-law&quot;&gt;Moore’s law&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With this growth in computational capability, increasing use of computational resources has become pervasive in today’s society. Until recently, the total energy budget and carbon footprint resulting from the use of computational resources has been small compared to the world total. As a result, computational resources have until recently effectively been treated as unlimited.&lt;/p&gt;

&lt;p&gt;Because of this, the economics of hardware and software development have been built on the assumption that with every generation, performance would double for free. Now, this unlimited growth is no longer sustainable because of a combination of technological limitations and the climate emergency. Therefore, we need to do more with less.&lt;/p&gt;

&lt;p&gt;Moore’s law has effectively come to an end as integrated circuits can’t be scaled down any more. As a result, the improvement in performance per Watt is slowing down continuously. On the other hand, the demand for computational resources is set to increase considerably.&lt;/p&gt;

&lt;p&gt;The consequence is that at least for the next decades, growth in demand for computational resources will not be offset by increased power efficiency. Therefore with business as usual, the total energy budget and carbon footprint resulting from the use of computational resources will grow dramatically to become a major contributor to the world total.&lt;/p&gt;

&lt;p&gt;Furthermore, the resources required to create the compute devices and infrastructure are also finite, and the total energy budget and carbon footprint of production of compute devices is huge. Moore’s Law has conditioned us to a doubling of performance every two years, which has led to very short effective lifetimes of compute hardware. This rate of obsolescence of compute devices and software is entirely unsustainable.&lt;/p&gt;

&lt;p&gt;Therefore, as a society we need to start treating computational resources as finite and precious, to be utilised only when necessary, and as frugally as possible. And as computing scientists, we need to ensure that computing has the lowest possible energy consumption. And we should achieve this with the currently available technologies because the lifetimes of compute devices needs to be extended dramatically.&lt;/p&gt;

&lt;p&gt;I would like to call this “frugal computing”: achieving the same results for less energy by being more frugal with our computing resources.&lt;/p&gt;

&lt;h2 id=&quot;the-scale-of-the-problem&quot;&gt;The scale of the problem&lt;/h2&gt;

&lt;h3 id=&quot;meeting-the-climate-targets&quot;&gt;Meeting the climate targets&lt;/h3&gt;

&lt;p&gt;To limit global warming to 1.5°C, global emissions must fall within the next decade from 55 gigatonnes CO₂ equivalent (GtCO₂e) per year to 23 GtCO₂e per year, a reduction of 32 GtCO₂e &lt;a href=&quot;#5&quot;&gt;[5]&lt;/a&gt;. So by 2030 that would mean a necessary reduction in overall CO₂ emissions of more than 50%. By 2040, a further reduction to 13 GtCO₂e per year is necessary. According to the International Energy Agency &lt;a href=&quot;#10&quot;&gt;[10]&lt;/a&gt;, emissions from electricity are currently estimated at about 10 GtCO₂e. The global proportion of electricity from renewables is projected to rise from the current figure of 22% to slightly more than 30% by 2040 &lt;a href=&quot;#15&quot;&gt;[15]&lt;/a&gt;. A more optimistic scenario by the International Energy Agency &lt;a href=&quot;#17&quot;&gt;[17]&lt;/a&gt; projects 70% of electricity from renewables, but even in that scenario, generation from fossil fuels reduces only slightly, so there is only a slight reduction in emissions as a result.&lt;/p&gt;

&lt;p&gt;In other words, we cannot count on renewables to eliminate CO₂ emissions from electricity in time to meet the climate targets. Reducing the energy consumption is the only option.&lt;/p&gt;

&lt;h3 id=&quot;emissions-from-consumption-of-computational-resources&quot;&gt;Emissions from consumption of computational resources&lt;/h3&gt;

&lt;p&gt;The consequence of the end of Moore’s law was expressed most dramatically in a 2015 report by the Semiconductor Industry Association (SIA) “Rebooting the IT Revolution: a call to action” &lt;a href=&quot;#1&quot;&gt;[1]&lt;/a&gt;, which calculated that, based on projected growth rates and on the 2015 ITRS roadmap for CMOS chip engineering technologies &lt;a href=&quot;#16&quot;&gt;[16]&lt;/a&gt;,&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;computing will not be sustainable by 2040, when the energy required for computing will exceed the estimated world’s energy production.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It must be noted that this is purely the energy of the computing device, as explained in the report. The energy required by e.g. the data centre infrastructure and the network is not included.&lt;/p&gt;

&lt;p&gt;The SIA has reiterated this in their 2020 “Decadal Plan for Semiconductors” &lt;a href=&quot;#2&quot;&gt;[2]&lt;/a&gt;, although they have revised the projection based on a “market dynamics argument”:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If the exponential growth in compute energy is left unchecked, market dynamics will limit the growth of the computational capacity which would cause a flattening out the energy curve.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is merely an acknowledgement of the reality that the world’s energy production is not set to rise dramatically, and therefore increased demand will result in higher prices which will damp the demand. So computation is not actually going to exceed the world’s energy production.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Ever-rising energy demand for computing vs. global energy production is creating new risk, and new computing paradigms offer opportunities to dramatically improve energy efficiency.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the countries where most of the computational resources are consumed (US and EU), electricity production accounts currently for 25% of the total emissions &lt;a href=&quot;#4&quot;&gt;[4]&lt;/a&gt;. According to the SIA’s estimates, computation accounts currently for a little less than 10% of the total electricity production but is set to rise to about 30% by 2040. This would mean that, with business as usual, computational resources would be responsible for at least 10% of all global CO₂ emissions by 2040.&lt;/p&gt;

&lt;p&gt;The independent study “Assessing ICT global emissions footprint: Trends to 2040 &amp;amp; recommendations” &lt;a href=&quot;#3&quot;&gt;[3]&lt;/a&gt; corroborates the SIA figures: they estimate the computing greenhouse gas emissions for 2020 between 3.0% and 3.5% of the total, which is a bit higher than the SIA estimate of 2.5% because it does take into account networks and datacentres. Their projection for 2040 is 14% rather than 10%, which means a growth of 4x rather than 3x.&lt;/p&gt;

&lt;p&gt;To put it in absolute values, based on the above estimate, by 2040 energy consumption of compute devices would be responsible for 5 GtCO₂e, whereas the target for world total emissions from all sources is 13 GtCO₂e.&lt;/p&gt;

&lt;h3 id=&quot;emissions-from-production-of-computational-resources&quot;&gt;Emissions from production of computational resources&lt;/h3&gt;

&lt;p&gt;To make matters worse, the carbon emissions resulting from the production of computing devices exceed those incurred during operation. This is a crucial point, because it means that we can’t rely on next-generation hardware technologies to save energy: the production of this next generation of devices will create more emissions than any operational gains can offset. It does not mean research into more efficient technologies should stop. But their deployment cycles should be much slower. Extending the useful life of compute technologies by improving the way we design and use software must become our priority.&lt;/p&gt;

&lt;p&gt;The report about the cost of planned obsolescence by the European Environmental Bureau &lt;a href=&quot;#7&quot;&gt;[7]&lt;/a&gt; makes the scale of the problem very clear. For laptops and similar computers, manufacturing, distribution and disposal account for 52% of their &lt;a href=&quot;https://www.sciencedirect.com/topics/earth-and-planetary-sciences/global-warming-potential&quot;&gt;Global Warming Potential&lt;/a&gt; (i.e. the amount of CO₂-equivalent emissions caused). For mobile phones, this is 72%. The report calculates that the lifetime of these devices should be at least 25 years to limit their Global Warming Potential. Currently, for laptops it is about 5 years and for mobile phones 3 years. According to &lt;a href=&quot;#8&quot;&gt;[8]&lt;/a&gt;, the typical lifetime for servers in data centres is also 3-5 years, which again falls short of these minimal requirements. According to this paper, the impact of manufacturing of the servers is 20% of the total, which would require an extension of the useful life to 11-18 years.&lt;/p&gt;

&lt;h3 id=&quot;the-total-emissions-cost-from-computing&quot;&gt;The total emissions cost from computing&lt;/h3&gt;

&lt;p&gt;Taking into account the carbon cost of both operation and production, computing would be responsible for 10 GtCO₂e by 2040, almost 80% of the acceptable CO₂ emissions budget &lt;a href=&quot;#2&quot;&gt;[2,3,14]&lt;/a&gt;.&lt;/p&gt;

&lt;figure&gt;
&lt;img src=&quot;https://limited.systems/images/computing-emissions.avif&quot; alt=&quot;A graph with two bars: world emissions (55) and emissions from computing (0.1) in 2020; and for 2040, the world emissions target to limit warming to 1.5°C (13), and the projected emissions from computing (10)&quot; title=&quot;A graph with two bars: world emissions (55) and emissions from computing (0.1) in 2020; and for 2040, the world emissions target to limit warming to 1.5°C (13), and the projected emissions from computing (10)&quot; /&gt;
&lt;figcaption&gt;Actual and projected emissions from computing (production+operation), and 2040 emission target to limit warming to &amp;lt;1.5&amp;deg;C&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h3 id=&quot;a-breakdown-per-device-type&quot;&gt;A breakdown per device type&lt;/h3&gt;

&lt;p&gt;To decide on the required actions to reduce emissions, it is important to look at the numbers of different types of devices and their energy usage. If we consider mobile phones as one category, laptops and desktops as another and servers as a third, the questions are: how many devices are there in each category, and what is their energy consumption? The absolute numbers of devices in use are quite difficult to estimate, but the yearly sales figures &lt;a href=&quot;#10&quot;&gt;[10]&lt;/a&gt; and estimates for the energy consumption for each category &lt;a href=&quot;#11&quot;&gt;[11,12,13,14]&lt;/a&gt; are readily available from various sources. The tables below show the 2020 sales and yearly energy consumption estimates for each category of devices. A detailed analysis is presented in &lt;a href=&quot;#14&quot;&gt;[14]&lt;/a&gt;.&lt;/p&gt;

&lt;table&gt;
&lt;caption&gt;Number of devices sold worldwide in 2020&lt;/caption&gt;
&lt;tr&gt;&lt;th&gt;Device type&lt;/th&gt;&lt;th&gt;2020 sales&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Phones&lt;/td&gt;&lt;td&gt; 3000M&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Servers&lt;/td&gt;&lt;td&gt; 13M&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Tablets&lt;/td&gt;&lt;td&gt; 160M&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Displays&lt;/td&gt;&lt;td&gt; 40M&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Laptops&lt;/td&gt;&lt;td&gt; 280M&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Desktops&lt;/td&gt;&lt;td&gt; 80M&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;TVs&lt;/td&gt;&lt;td&gt;220M&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;IoT devices&lt;/td&gt;&lt;td&gt; 2000M&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;The energy consumption of all communication and computation technology in use in the world is currently around 3,000 TWh/y, about 11% of the world’s electricity consumption, projected to rise by 3-4 times by 2040 with business as usual according to &lt;a href=&quot;#2&quot;&gt;[2]&lt;/a&gt;. This is a conservative estimate: the study in &lt;a href=&quot;#14&quot;&gt;[14]&lt;/a&gt; includes a worst-case projection of a rise to 30,000 TWh (exceeding the current world electricity consumption) by 2030.&lt;/p&gt;

&lt;table&gt;
&lt;caption&gt;Yearly energy consumption estimates in TWh&lt;/caption&gt;
&lt;tr&gt;&lt;th&gt;Device type&lt;/th&gt;&lt;th&gt;TWh&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;TVs&lt;/td&gt;&lt;td&gt; 560&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Other Consumer devices&lt;/td&gt;&lt;td&gt; 240&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Fixed access network (wired+WiFi)&lt;/td&gt;&lt;td&gt; 900 + 500&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Mobile network&lt;/td&gt;&lt;td&gt; 100&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Data centres&lt;/td&gt;&lt;td&gt; 700&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Total&lt;/td&gt;&lt;td&gt; 3000&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;The above data make it clear which actions are necessary: the main carbon cost of phones, tablets and IoT devices is their production and the use of the mobile network, so we must extend their useful life very considerably and reduce network utilisation. Extending the lifetime is also the key action for datacentres and desktop computers, but their energy consumption also needs to be reduced considerably, as does the energy consumption of the wired, WiFi and mobile networks.&lt;/p&gt;

&lt;p&gt;From the technical side, these are primarily software issues: the hardware exists because of the software.&lt;/p&gt;

&lt;h2 id=&quot;a-vision-for-low-carbon-and-sustainable-computing&quot;&gt;A vision for low carbon and sustainable computing&lt;/h2&gt;

&lt;p&gt;It is clear that urgent action is needed: in less than two decades, the global use of computational resources needs to be transformed radically. Otherwise, the world will fail to meet its climate targets, even with significant reductions in other emission areas. The carbon cost of both production and operation of the devices must be considerably reduced.&lt;/p&gt;

&lt;p&gt;To use devices for longer, a change in business models as well as consumer attitudes is needed. This requires raising awareness and education but also providing incentives for behavioural change. And to support devices for a long time, an infrastructure for repair and maintenance is needed, with long-term availability of parts, open repair manuals and training. To make all this happen, economic incentives and policies will be needed (e.g. taxation, regulation). Therefore we need to convince key decision makers in society, politics and business.&lt;/p&gt;

&lt;p&gt;Imagine that we can extend the useful life of our devices and even increase their capabilities, purely by improving the software. With every improvement, the computational capacity will in effect increase without any increase in energy consumption. Meanwhile, we will develop the technologies for the next generation of devices, designed for energy efficiency as well as long life. Every subsequent cycle will last longer, until finally the world will have computing resources that last forever and hardly use any energy.&lt;/p&gt;

&lt;figure&gt;
&lt;img src=&quot;https://limited.systems/images/towards-zero-carbon-computing.avif&quot; alt=&quot;A graph with four trends: emissions from production, emissions in total, performance and emissions/performance.&quot; title=&quot;A graph with four trends: emissions from production, emissions in total, performance and emissions/performance.&quot; /&gt;
&lt;figcaption&gt;Towards zero carbon computing: increasing performance and lifetime and reducing emissions. Illustration with the following assumptions: every new generation lasts twice as long as the previous one and costs half as much energy to produce; energy efficiency improves by 5% per year.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;This is a very challenging vision, spanning all aspects of computing science. To name just a few challenges:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;We must design software so that it supports devices with extended lifetimes.&lt;/li&gt;
  &lt;li&gt;We need software engineering strategies to handle the extended software life cycles, and in particular deal with &lt;a href=&quot;https://en.wikipedia.org/wiki/Technical_debt&quot;&gt;technical debt&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Longer life means more opportunities to exploit vulnerabilities, so we need better cyber security.&lt;/li&gt;
  &lt;li&gt;We need to develop new approaches to reduce overall energy consumption across the entire system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To address these challenges, action is needed on many fronts. What will you do to make frugal computing a reality?&lt;/p&gt;

&lt;h4 id=&quot;edits&quot;&gt;Edits&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;2023-03-06: edits to make it more clear that frugal computing is primarily a software issue.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;p&gt;&lt;small&gt;
  &lt;span id=&quot;1&quot;&gt;[1] &lt;a href=&quot;https://www.semiconductors.org/resources/rebooting-the-it-revolution-a-call-to-action-2/&quot;&gt;&lt;em&gt;“Rebooting the IT revolution: a call to action”&lt;/em&gt;, Semiconductor Industry Association/Semiconductor Research Corporation, Sept 2015&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;2&quot;&gt;[2] &lt;a href=&quot;https://www.src.org/about/decadal-plan/decadal-plan-full-report.pdf&quot;&gt;&lt;em&gt;“Full Report for the Decadal Plan for Semiconductors”&lt;/em&gt;, Semiconductor Industry Association/Semiconductor Research Corporation, Jan 2021&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;3&quot;&gt;[3] &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S095965261733233X&quot;&gt;&lt;em&gt;“Assessing ICT global emissions footprint: Trends to 2040 &amp;amp; recommendations”&lt;/em&gt;, Lotﬁ Belkhir, Ahmed Elmeligi, Journal of Cleaner Production 177 (2018) 448–463&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;4&quot;&gt;[4] &lt;a href=&quot;https://www.epa.gov/ghgemissions/sources-greenhouse-gas-emissions&quot;&gt;&lt;em&gt;“Sources of Greenhouse Gas Emissions”&lt;/em&gt;, United States Environmental Protection Agency&lt;/a&gt;, Last updated on April 14, 2021&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;5&quot;&gt;[5] &lt;a href=&quot;https://www.unep.org/emissions-gap-report-2020&quot;&gt;&lt;em&gt;“Emissions Gap Report 2020”&lt;/em&gt;, UN Environment Programme, December 2020&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;6&quot;&gt;[6] &lt;a href=&quot;https://onlinelibrary.wiley.com/doi/full/10.1111/jiec.13123&quot;&gt;&lt;em&gt;“The link between product service lifetime and GHG emissions: A comparative study for different consumer products”&lt;/em&gt;, Simon Glöser-Chahoud, Matthias Pfaff, Frank Schultmann,  Journal of Industrial Ecology, 25 (2), pp 465-478, March 2021&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;7&quot;&gt;[7] &lt;a href=&quot;https://eeb.org/library/coolproducts-report/&quot;&gt;&lt;em&gt;“Cool products don’t cost the Earth – Report”&lt;/em&gt;, European Environmental Bureau, September 2019&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;8&quot;&gt;[8] &lt;a href=&quot;https://link.springer.com/article/10.1007/s11367-014-0838-7&quot;&gt;&lt;em&gt;“The life cycle assessment of a UK data centre”&lt;/em&gt;, Beth Whitehead, Deborah Andrews, Amip Shah, Graeme Maidment, Building and Environment 93 (2015) 395–405, January 2015&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;9&quot;&gt;[9] &lt;a href=&quot;https://www.statista.com&quot;&gt;Statista&lt;/a&gt;, retrieved June 2021&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;10&quot;&gt;[10] &lt;a href=&quot;https://www.iea.org/reports/global-energy-co2-status-report-2019/emissions&quot;&gt;&lt;em&gt;“Global Energy &amp;amp; CO₂ Status Report”&lt;/em&gt;, International Energy Agency, March 2019&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;11&quot;&gt;[11] &lt;a href=&quot;https://link.springer.com/article/10.1007/s11367-015-0909-4&quot;&gt;&lt;em&gt;“Redefining scope: the true environmental impact of smartphones?”&lt;/em&gt;, James Suckling, Jacquetta Lee, The International Journal of Life Cycle Assessment volume 20, pages 1181–1196 (2015)&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;12&quot;&gt;[12] &lt;a href=&quot;https://www.racksolutions.com/news/blog/server-rack-power-consumption-calculator/&quot;&gt;&lt;em&gt;“Server Rack Power Consumption Calculator”&lt;/em&gt;, Rack Solutions, Inc., July 2019&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;13&quot;&gt;[13] &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S111001682030524X&quot;&gt;&lt;em&gt;“Analysis of energy consumption and potential energy savings of an institutional building in Malaysia”&lt;/em&gt;, Siti Birkha Mohd Ali, M. Hasanuzzaman, N. A. Rahim, M. A. A. Mamun, U. H. Obaidellah, Alexandria Engineering Journal, Volume 60, Issue 1, February 2021, Pages 805-820&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;14&quot;&gt;[14] &lt;a href=&quot;https://doi.org/10.3390/challe6010117&quot;&gt;&lt;em&gt;“On Global Electricity Usage of Communication Technology: Trends to 2030”&lt;/em&gt;, Anders S. G. Andrae, Tomas Edler, Challenges 2015, 6(1), 117-157 &lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;15&quot;&gt;[15] &lt;a href=&quot;https://www.bp.com/en/global/corporate/energy-economics/energy-outlook.html&quot;&gt;&lt;em&gt;“BP Energy Outlook: 2020 Edition”&lt;/em&gt;, BP plc&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;16&quot;&gt;[16] &lt;a href=&quot;https://www.semiconductors.org/resources/2015-international-technology-roadmap-for-semiconductors-itrs/&quot;&gt;&lt;em&gt;“2015 International Technology Roadmap for Semiconductors (ITRS)”&lt;/em&gt;, Semiconductor Industry Association, June 2015&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
  &lt;span id=&quot;17&quot;&gt;[17] &lt;a href=&quot;https://iea.blob.core.windows.net/assets/deebef5d-0c34-4539-9d0c-10b13d840027/NetZeroby2050-ARoadmapfortheGlobalEnergySector_CORR.pdf&quot;&gt;&lt;em&gt;“Net Zero by 2050 — A Roadmap for the Global Energy Sector”&lt;/em&gt;, International Energy Agency, October 2021&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;/small&gt;&lt;/p&gt;

        </content>
    </entry>
    
</feed>