Openwashing

Jul 26, 2025

Many of you who have been following my work since the beginning will know that I have a strong affiliation with open source and have been teaching creative coding for over fourteen years. I remain committed to this work and often share my teachings through small open-source projects, mostly archived on BitBucket. Documenting and maintaining this work is not always easy, but it is important to me. It is a commitment to sharing with others, and perhaps more importantly, it is an activity guided by values that are inherently unmotivated by selfish gain.

When we create tools in this field and give permission for anyone to access, study, modify, and even share them with others, we are essentially making an important decision. This decision empowers others with shared knowledge and enriches everyone culturally and intellectually.

Recently, I was slightly worried to learn of a new term that has appeared on the tech radar: openwashing. Openwashing is a practice whereby companies claim to be open source while keeping critical parts of their services proprietary. Unfortunately, this appears to be happening at scale and particularly within the current AI tech race.

I have a vivid memory of visiting OpenAI’s website, long before it began rolling out today’s range of AI products. It was a simple website with a compelling mission statement. The research project appeared to tick all the boxes of a legitimate and inspiring open-source project, clearly positioning itself as a non-profit organisation dedicated to the safe and ethical development and use of AI technologies. This was an ambitious project committed to working openly on AI advancements that would clearly benefit humanity. So it is no surprise that OpenAI caught my attention, no doubt mostly because of its open-source ethos of transparency and collaboration.

“Discovering and enacting the path to safe artificial general intelligence.”

Many things have changed since then, as you may well know. OpenAI became a business fuelled by huge capital investment, and while it did attempt to uphold its good intentions, there have been concerns about how the organisation has changed and how it conducts its business behind closed doors. My question about all this is simple: why is it still called OpenAI?

Many AI technologies today are pitched with a particular strain of ‘openness’ and present themselves as open-source projects. Until recently, I hadn’t really thought much about this. In fact, I trusted these technologies more because of their open-source ethos. However, there are concerns about the varying degrees of access to these tools, which raise questions about how transparent and open these organisations truly are. You only need to look at OpenAI again to realise that the only means of fully accessing its APIs is through paid contracts and online interfaces.

For a piece of software to be open source, anyone should be able to: use it for any purpose without permission; study its workings and inspect its components; modify it for any purpose, including changing its outputs; and share it with others, with or without modifications, for any purpose. In 2024, the Open Source Initiative launched a new definition, OSAID 1.0, specifically for AI projects and, I guess, partly to deter openwashing. While this definition is a welcome contribution to the regulatory debate surrounding current AI developments, it is evident that most AI companies do not comply with it. To add insult to injury, reputable sources challenge the definition, claiming it is too loose and permissive.
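For those of you who think in code, the four freedoms above can be sketched as a simple checklist. This is only an illustration in Python, not an official compliance test from the Open Source Initiative: the `Freedoms` class, the `missing_freedoms` function, and the example release are all hypothetical names I've made up for the purpose.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Freedoms:
    """Hypothetical checklist of the four freedoms described above."""
    use: bool     # anyone may use it for any purpose, without permission
    study: bool   # anyone may inspect how it works and what it is made of
    modify: bool  # anyone may change it, including its outputs
    share: bool   # anyone may pass it on, with or without modifications


def missing_freedoms(freedoms: Freedoms) -> list[str]:
    """Return the names of the freedoms that are NOT granted."""
    return [name for name, granted in asdict(freedoms).items() if not granted]


# An imagined release that can be used, modified, and shared,
# but whose inner workings cannot be fully inspected.
imagined_release = Freedoms(use=True, study=False, modify=True, share=True)
print(missing_freedoms(imagined_release))
```

The point of the sketch is that openness is all-or-nothing under this reading: a single missing freedom is enough for the list to come back non-empty.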

Indeed, there’s a complicated issue with this new definition: it seems to overlook the dataset used to train a model. While the definition stipulates that the training dataset must be listed as a source, it doesn’t require the data itself to be distributed. This is a significant omission for an open-source project, as a model without a dataset is essentially useless. Moreover, using one’s own dataset would produce unusual results because of the weights already baked into the model.

This omission of the training dataset is problematic and highlights how many companies are also obscuring access to these essential resources. A few months ago I shared some links about the infamous Pile dataset, only to find that it’s no longer hosted. While my tweeting wasn’t the cause, I do recall that some of the data’s contents were eye-opening. On another recent search for datasets, I discovered that none of the major AI services provide access to prompt data. This brings to the forefront the contentious issue of copyright, which is a hot topic among AI sceptics, and rightly so. If these companies claim transparency and openness, then surely we should also be able to access the data they’ve scraped from us? *(ref Note1)

So, how should I sign off on this one? You might not think this is significant news, or you might not care about third-party organisations like the Open Source Initiative. Personally, I do, and I firmly believe that the future of these technologies will increasingly rely on compliance with regulation. However, as things stand, regulation isn’t particularly strict when it comes to accepting or rejecting a company’s ‘openness’. In my search for a definition of open source in the EU AI Act, I couldn’t find one, and certainly no mention of the Open Source Initiative, despite its relatively permissive proposal. This lack of clear directive is problematic, as it undermines the entire open-source ethos and paves the way for future generations to prioritise profit over the sharing of wealth. In a world where anything goes, it becomes increasingly challenging to steer the ship.

On a more positive note, I do believe we can make a difference. I think regulation will find solutions, and I firmly believe that governance of information through open-source initiatives will remain an important means for steering the ship towards a fairer future for all.

* Note1

An interesting move by the web giant Cloudflare is worth noting on this topic. On July 1st, 2025, Cloudflare decided to block AI bots by default. This is a significant move for a company whose network touches roughly 20% of the web. By closing down automated data extraction, Cloudflare has single-handedly put the spotlight on the contentious issue of who owns the value of these systems, which are clearly built on human expression.

Source: AI Ethics Brief 169

Please consider supporting my work by purchasing a copy of my first publication of artworks entitled Hyper & Cosmic released with Vetro Editions.

…fin
