MySQL’s OLD_PASSWORD() uses bytes, not characters

This is all ancient history, but sometimes you have to deal with ancient systems.

Way back in the MySQL 4.x days, MySQL had a PASSWORD() function used to set MySQL-managed user credentials. You gave it a string and it returned a hex string. It was never intended for clients to hash passwords for their own use (indeed, the 5.7 docs tell you not to), but nothing prevented it.

Later, MySQL changed the hashing algorithm and added the old_passwords system variable to toggle whether PASSWORD() used the old or the new algorithm. Somewhere along the way they also added an OLD_PASSWORD() function that always used the old algorithm.

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.5.62-0ubuntu0.14.04.1 (Ubuntu)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select PASSWORD('123456');
+-------------------------------------------+
| PASSWORD('123456')                        |
+-------------------------------------------+
| *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+-------------------------------------------+
1 row in set (0.00 sec)

mysql> select OLD_PASSWORD('123456');
+------------------------+
| OLD_PASSWORD('123456') |
+------------------------+
| 565491d704013245       |
+------------------------+
1 row in set (0.00 sec)

Then MySQL 8.0 came along and those functions were gone entirely.

But users had done Bad Things: they used those functions to generate hashes, stored those hashes in their own databases, and needed ways to replicate the function once it was gone. The internet provided solutions in various languages, including python, PHP, and a replacement SQL function.

The devil is in the details though, because depending on the language, it matters whether the password is 7-bit ASCII or contains multibyte Unicode characters.

The PASSWORD() and OLD_PASSWORD() functions both treat their input as a string of bytes, not a string of Unicode characters. If the input is 7-bit ASCII those are the same and it doesn’t matter. If it’s Unicode, however, multibyte characters are hashed as individual bytes rather than as characters.
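
To make the distinction concrete, here is a quick python3 check (the test string is arbitrary): character counts and UTF-8 byte counts diverge as soon as a multibyte character appears.

```python
s = "Allô"
print(len(s))                        # 4 characters
print(len(s.encode("utf8")))         # 5 bytes: 'ô' is two bytes in UTF-8
print(list(s.encode("utf8"))[-2:])   # the two bytes of 'ô': [195, 180]
```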

With a little python3 code and a handy Dockerfile we can demonstrate this:

# old_password.py
import sys


def mysql_old_password_chars(password):
    """Treat the password as a string of Unicode characters -- WRONG"""
    password = password.replace(" ", "").replace("\t", "")

    # build the old password in nr and nr2
    nr = 1345345333
    add = 7
    nr2 = 0x12345671

    for c in (ord(x) for x in password):
        nr ^= (((nr & 63)+add)*c) + (nr << 8) & 0xFFFFFFFF
        nr2 = (nr2 + ((nr2 << 8) ^ nr)) & 0xFFFFFFFF
        add = (add + c) & 0xFFFFFFFF

    return "%08x%08x" % (nr & 0x7FFFFFFF, nr2 & 0x7FFFFFFF)


def mysql_old_password_bytes(password):
    """Treat the password as a string of bytes -- CORRECT"""
    password = password.replace(" ", "").replace("\t", "")

    # build the old password in nr and nr2
    nr = 1345345333
    add = 7
    nr2 = 0x12345671

    for c in password.encode('utf8'):
        nr ^= (((nr & 63)+add)*c) + (nr << 8) & 0xFFFFFFFF
        nr2 = (nr2 + ((nr2 << 8) ^ nr)) & 0xFFFFFFFF
        add = (add + c) & 0xFFFFFFFF

    return "%08x%08x" % (nr & 0x7FFFFFFF, nr2 & 0x7FFFFFFF)


if __name__ == '__main__':
    password = sys.argv[1]
    print("chars: " + mysql_old_password_chars(password))
    print("bytes: " + mysql_old_password_bytes(password))

# Dockerfile
FROM ubuntu:14.04

RUN apt-get update && apt-get install -y mysql-server

# start mysqld in the foreground
CMD mysqld

Let’s get our MySQL server going:

docker build --tag mysql-5.7 .
docker run --rm -d --name=mysql57 mysql-5.7

Now run some tests. Start with some simple 7-bit ASCII strings:

$ docker exec mysql57 mysql --default-character-set=utf8 --skip-column-names -e 'select old_password("123456");'
565491d704013245
$ python3 old_password.py '123456'
chars: 565491d704013245
bytes: 565491d704013245

$ docker exec mysql57 mysql --default-character-set=utf8 --skip-column-names -e 'select old_password("Pa$$ W0rD");'
69d9eae853c7ddf5
$ python3 old_password.py 'Pa$$ W0rD'
chars: 69d9eae853c7ddf5
bytes: 69d9eae853c7ddf5

So far so good. Now throw in something above 7-bit ASCII, like a simple ‘ô’:

$ docker exec mysql57 mysql --default-character-set=utf8 --skip-column-names -e 'select old_password("Allô");'
4ae9b3f6595c3f70
$ python3 old_password.py 'Allô'
chars: 3ff55a3d63bf3485
bytes: 4ae9b3f6595c3f70

The well-known python solution that uses characters fails here. To be fair, it probably worked in python2, whose default strings were byte strings rather than Unicode strings.

The “nice” thing about PHP here is that its strings are byte strings with no native Unicode handling, so a simple port of the python version just works.

<?php

function mysql_old_password($password)
{
    # build the old password in nr and nr2
    $nr = 1345345333;
    $add = 7;
    $nr2 = 0x12345671;

    $password = str_replace([' ', "\t"], '', $password);

    for ($index = 0; $index < strlen($password); $index++) {
        $c = ord($password[$index]);

        $nr ^= ((($nr & 63) + $add) * $c) + ($nr << 8) & 0xFFFFFFFF;
        $nr2 = ($nr2 + (($nr2 << 8) ^ $nr)) & 0xFFFFFFFF;
        $add = ($add + $c) & 0xFFFFFFFF;
    }
    return sprintf("%08x%08x", $nr & 0x7FFFFFFF, $nr2 & 0x7FFFFFFF);
}

echo mysql_old_password($argv[1]) . "\n";

$ php --version
PHP 8.1.32 (cli) (built: May 21 2025 23:22:09) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.32, Copyright (c) Zend Technologies
$ php old_password.php '123456'
565491d704013245
$ php old_password.php 'Pa$$ W0rD'
69d9eae853c7ddf5
$ php old_password.php 'Allô'
4ae9b3f6595c3f70

The well-known SQL replacement for a user-defined OLD_PASSWORD() function also breaks on non-7-bit-ASCII input because it uses LENGTH(), which counts bytes, as its loop bound, while using MID(), which is character-aware, to extract each character. Any attempt to use the function with a multibyte string will fail.
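
The failure mode can be mimicked in python (a sketch of the mismatch, not a port of the SQL function): loop over a byte count while indexing characters and the two run out of sync on multibyte input.

```python
s = "Allô"
byte_len = len(s.encode("utf8"))  # what LENGTH() reports: 5
char_len = len(s)                 # what MID() indexes over: 4
print(byte_len, char_len)

try:
    [s[i] for i in range(byte_len)]  # a byte-count loop over character indexes
except IndexError:
    print("byte-length loop ran past the last character")
```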

All of this is dealing with decades-old technology and hopefully you never have to encounter it. But if you have come across this blog post you are probably working with an ancient system, and I hope this helps you.

Faster clamav scans for archives

clamav got much slower in 0.105, which we discovered at DProofreaders after upgrading from Ubuntu 20.04 to 24.04. It became so slow that the scans for our new content uploads and post-processing artifacts — all zip files — timed out, resulting in failed uploads since the AV check is a gate. No amount of futzing with configuration options, RAM disks, --fdpass, and other things would make it faster.

clamav has a “multiscan” mode that will make the clamd service scan multiple files concurrently, which is great for modern systems with multiple processors. Except that mode does not work with files inside archives.

We solved this for our needs by creating a wrapper script that detects whether the scanned file is a zip file and, if so, extracts a copy and scans that with --multiscan instead. We went one step further: if the zip file contains other top-level zip files, or epubs (which are effectively zip files), we extract those as well. With this we’re able to successfully scan large archives within our system timeout.

However, if an upload does still time out, it’s likely to succeed without a timeout the second time the user attempts to upload the file. This is because clamav caches the results of the last 65536 files (see CacheSize) based on the file’s hash and if it has passed before, clamav doesn’t need to scan it again. In this way content scans of extracted archives can make incremental progress on retries.

The following is an example for how this might be done:

#!/usr/bin/bash
# To use multiple threads to scan archives, we need to extract them
# first. We also extract included epubs to get them parallelized too.

if [ "$1" = "--" ]; then
    shift
fi

if [ $# -ne 1 ]; then
    echo "Script only takes a single argument: filename";
    exit 255
fi

FILENAME=$1

if [ ! -f "$FILENAME" ]; then
    echo "Filename '$FILENAME' is not a valid filename";
    exit 255
fi

TEMPDIR=$(mktemp --dry-run)
if ! unzip -q -d "$TEMPDIR" "$FILENAME" >/dev/null 2>&1; then
    # if it didn't extract, just try to scan it
    rm -rf "$TEMPDIR"
    echo "Error extracting file '$FILENAME', will try to just scan it"
    clamdscan --fdpass "$FILENAME"
    exit $?
fi

# now try to extract any top-level zip-compressed files into subfolders
# if extraction of one of them fails, keep the original for scanning
for ext in epub zip; do
    NUM_COMP=$(ls "$TEMPDIR" | grep -c "\.$ext\$")
    if [ "$NUM_COMP" -gt 0 ]; then
        for compfile in "$TEMPDIR"/*."$ext"; do
            COMPDIR="$TEMPDIR/$(basename "$compfile")-extract"
            mkdir "$COMPDIR"
            if ! unzip -q -d "$COMPDIR" "$compfile"; then
                rm -rf "$COMPDIR"
            else
                rm "$compfile"
            fi
        done
    fi
done

clamdscan --fdpass --multiscan "$TEMPDIR"
RESULT=$?

rm -rf "$TEMPDIR"

exit $RESULT

Note that decompressing archives can be fraught with problems like zip bombs, so you must account for these, for example by extracting to a temporary partition with a fixed size, limiting the extraction runtime, etc.
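
One possible guard, sketched in python (the size cap is arbitrary, and trusting the zip metadata is itself an assumption): check the archive's declared uncompressed size before extracting anything.

```python
import io
import zipfile

MAX_UNCOMPRESSED = 100 * 1024 * 1024  # arbitrary 100 MB cap

def declared_size_ok(data: bytes) -> bool:
    """Reject archives whose declared uncompressed size exceeds the cap."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return sum(info.file_size for info in zf.infolist()) <= MAX_UNCOMPRESSED

# Exercise it with a tiny in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
print(declared_size_ok(buf.getvalue()))  # True
```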

Android build failures over NFS

I was recently doing Android builds (aosp) over NFS which kept failing with errors like:

FAILED: out/target/product/generic_x86_64/obj/FAKE/treble_sepolicy_tests_31.0_intermediates/treble_sepolicy_tests_31.0
Traceback (most recent call last):
  File "internal/stdlib/runpy.py", line 196, in _run_module_as_main
  File "internal/stdlib/runpy.py", line 86, in _run_code
  File "/mnt/cpeel/aosp-a14r1/out/host/linux-x86/bin/treble_sepolicy_tests/__main__.py", line 12, in <module>
  File "internal/stdlib/runpy.py", line 196, in _run_module_as_main
  File "internal/stdlib/runpy.py", line 86, in _run_code
  File "treble_sepolicy_tests.py", line 526, in <module>
  File "internal/stdlib/shutil.py", line 728, in rmtree
  File "internal/stdlib/shutil.py", line 726, in rmtree
OSError: [Errno 39] Directory not empty: '/mnt/cpeel/aosp-a14r1/out/soong/.temp/tmpwjulet5z'

The build worked perfectly locally but failed over both NFSv3 and NFSv4.1. I suspect this is related to NFS sillyrename: when a program deletes a file that another process still has open, NFS renames it to .nfsXXXX instead of removing it, so the file is still around when rmtree tries to delete the directory.

With some help from a coworker I got around this with the changes in this patch:

diff -u orig/system/sepolicy/tests/apex_sepolicy_tests.py system/sepolicy/tests/apex_sepolicy_tests.py
--- orig/system/sepolicy/tests/apex_sepolicy_tests.py  2024-03-09 19:41:09.000000000 +0000
+++ system/sepolicy/tests/apex_sepolicy_tests.py        2024-03-11 20:21:21.212288445 +0000
@@ -161,5 +161,5 @@


 if __name__ == '__main__':
-    with tempfile.TemporaryDirectory() as temp_dir:
+    with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as temp_dir:
         do_main(temp_dir)
diff -u orig/system/sepolicy/tests/apex_sepolicy_tests_test.py system/sepolicy/tests/apex_sepolicy_tests_test.py
--- orig/system/sepolicy/tests/apex_sepolicy_tests_test.py     2024-03-09 19:41:09.000000000 +0000
+++ system/sepolicy/tests/apex_sepolicy_tests_test.py   2024-03-11 21:23:42.598506522 +0000
@@ -34,7 +34,7 @@

     @classmethod
     def tearDownClass(cls) -> None:
-        shutil.rmtree(cls.temp_dir)
+        shutil.rmtree(cls.temp_dir, ignore_errors=True)

     # helpers

diff -u orig/system/sepolicy/tests/sepolicy_tests.py system/sepolicy/tests/sepolicy_tests.py
--- orig/system/sepolicy/tests/sepolicy_tests.py       2024-03-09 19:41:09.000000000 +0000
+++ system/sepolicy/tests/sepolicy_tests.py     2024-03-11 21:23:46.177726884 +0000
@@ -222,4 +222,4 @@
             f.write(blob)
         do_main(libpath)
     finally:
-        shutil.rmtree(temp_dir)
+        shutil.rmtree(temp_dir, ignore_errors=True)
diff -u orig/system/sepolicy/tests/treble_sepolicy_tests.py system/sepolicy/tests/treble_sepolicy_tests.py
--- orig/system/sepolicy/tests/treble_sepolicy_tests.py        2024-03-09 19:41:09.000000000 +0000
+++ system/sepolicy/tests/treble_sepolicy_tests.py      2024-03-11 21:14:52.681351832 +0000
@@ -523,4 +523,4 @@
             f.write(blob)
         do_main(libpath)
     finally:
-        shutil.rmtree(temp_dir)
+        shutil.rmtree(temp_dir, ignore_errors=True)
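
The technique the patch applies, shown in isolation: ignore_errors makes cleanup a best-effort operation instead of a build-failing one. (tempfile.TemporaryDirectory grew an equivalent ignore_cleanup_errors flag in Python 3.10.)

```python
# shutil.rmtree with ignore_errors=True swallows errors such as the
# ENOTEMPTY that a lingering .nfsXXXX file causes, rather than raising.
import shutil

# Removing a directory that can't be removed (here: one that doesn't
# exist) is silently a no-op instead of an OSError.
shutil.rmtree("/nonexistent/nfs-temp-dir", ignore_errors=True)
print("cleanup is best-effort")
```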

I couldn’t find anything about this online when I was searching, so hopefully this helps the next person.

Tips on migrating to mailman3

Earlier this month I upgraded the mailman 2.1 installation for Distributed Proofreaders to 3.x (i.e. mailman3). I was asked by Project Gutenberg for any tips for their upcoming migration and realized this might be more useful to others as well. As mailman2 is no longer available after Ubuntu 20.04, I expect there will be more migrations happening in the next 6 months as people upgrade to a newer LTS.

The following is very Ubuntu-centric and focuses on migrating 2.x installations to 3.x but some of this might be more broadly useful as well.

This post assumes you’ve at least read about the mailman3 architecture and looked over the very useful migration documentation already.

Installing mailman3 alongside mailman2

mailman2 and mailman3 can coexist on a system just fine. They have different package names & dependencies, wire into postfix in different ways, and are accessed at different URLs. This is great because it means you can get mailman3 installed, configured, and tested before migrating any lists.

The core of mailman3 is available in Ubuntu 20.04 as the mailman3 package. The mailman3-web package includes the web administration. python3-mailman-hyperkitty is the package needed for archiving. mailman3-full is a meta package that pulls in all of this and other dependencies. Interestingly, those packages don’t exist for Ubuntu 22.04 but do for the planned 24.04 release. Ubuntu 22.04 is based on Debian bookworm which does have the packages and you might be able to use the Debian packages directly but I haven’t tested that.

Versions used in this doc for reference:

  • Ubuntu 20.04
  • Apache2 2.4.41-4ubuntu3.14
  • mailman3 3.2.2-1
  • mailman3-full 3.2.2-1
  • mailman3-web 0+20180916-10
  • python3-django-mailman3 1.3.2-1
  • python3-django-hyperkitty 1.3.2-1
  • python3-mailman-hyperkitty 1.1.0-9

After installing the packages I used the excellent mailman3 docs on how to configure the core and the web side of things. However, after wiring mailman3 into Apache I hit a frustrating bug where the Postorius URLs weren’t working. Turns out there’s a bug in the apache.conf file from the base Debian package and I wrote up a separate blog post on how to fix that.

At this point I was able to fully access and administer mailman3. I set up a test list in mailman3, played with the configuration, and sent email to it, all successfully without impacting the live mailman2 lists.

Migrating lists

The migration itself went fairly well, again thanks to the good mailman3 migration documentation. I did encounter a couple of issues that took some research to resolve and I want to include those here in case it helps others.

Importing one list resulted in a schema violation exception in python. Unfortunately I don’t have the actual exception, but it included a message about the “info” column not being large enough for the data it was trying to insert. This is because the maximum length of the “info” setting in mailman3 is smaller than in mailman2. To resolve this I shortened the info field in the mailman2 web UI (retaining the original for later) and the import worked. After the migration completed I let the list owner know, gave them a copy of the original text, and let them deal with it.

The second error was a python exception with this in the stacktrace:

ModuleNotFoundError: No module named 'Mailman'

Thankfully there was a solution in this mailman3 thread that involves fixing an invalid bounce_info setting. This appears to have been fixed in a later version of mailman3 but Ubuntu 20.04 users may still encounter it.

Importing archives also worked pretty well. We have low-volume lists and only a few of them have archives so we’re probably not a great representation. The importer encountered 6 or so messages that it was unable to import due to what looked like encoding issues. Only the failed messages were skipped and the rest went in.

Avoiding failure deliveries during migration

The best practice as recommended by the mailman3 docs is to stop the MTA (eg postfix) before doing a migration. This is because the process of migrating results in a window where both mailman2 and mailman3 are configured to receive mail for the same address. I don’t know what would happen if mail were delivered during this window, but you probably don’t want to find out.

That said, at pgdp.net the web code connects directly to postfix to send email, and taking the MTA down would have resulted in errors shown to users, on top of non-list emails not getting sent. Our lists are low enough volume that I just minimized the time between step 1 (create the list with the same email in mailman3) and the last step (rmlist in mailman2), and that seemed to be successful.

Unless you really can’t avoid it, stopping your MTA during the migration and allowing external SMTP servers to retry when it comes back online would be my strong recommendation.

DMARC Mitigations

Our driving focus for migrating to mailman3 was improving mail delivery for our lists. Specifically, with Google and others increasing their spam enforcement and encouraging DMARC, having our mail server send email to lists that weren’t From our server was problematic. mailman3 supports different DMARC mitigation strategies, and applying munge_from unconditionally has seemed to improve our delivery rate.
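
What From-munging does, in toy form (the names and addresses below are made up, and this is an illustration of the header rewrite, not mailman's code): the list address goes in From, and the original author moves to Reply-To so the message passes the list server's own DMARC alignment.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "Alice <alice@example.com>"
msg["To"] = "list@lists.example.com"
msg.set_content("hi")

# Munge: move the author into Reply-To and put the list address in From.
author = str(msg["From"])
del msg["From"]
msg["From"] = "Alice via Example List <list@lists.example.com>"
msg["Reply-To"] = author

print(msg["From"])
```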

The version of mailman3 included with the Ubuntu 20.04 package doesn’t support ARC Signing but I hope to add that when we eventually upgrade.

Postorius URLs broken in Ubuntu

Postorius is the web interface to mailman3, the Django-based rewrite of the web UI for the old, reliable mailman. On Ubuntu it can be installed with the Debian mailman3-web package and wired up to Apache via mod_proxy_uwsgi using the provided /etc/mailman3/apache.conf file.

Except it doesn’t work.

Apache happily routes traffic from http://<host>/mailman3 to Postorius via proxy_uwsgi, but Postorius returns a 301 directing you to /mailman/postorius/lists/ instead:

$ curl --head http://example.com/mailman3
HTTP/1.1 301 Moved Permanently
Date: Tue, 07 Nov 2023 22:53:12 GMT
Server: Apache
Upgrade: h2
Connection: Upgrade
Content-Type: text/html; charset=utf-8
Location: /mailman/postorius/lists/
X-Frame-Options: SAMEORIGIN
Vary: Accept-Language,Cookie,Accept-Encoding,User-Agent
Content-Language: en

In the /var/log/mailman3/web/mailman-web.log log file there are errors like:

WARNING 2023-11-07 21:57:40,423 820310 django.request Not Found: /mailman//postorius/lists/

Head, meet desk. After hours of searching and trying different things I finally discovered the fix.

This apparently worked up until Sept 2021 when an Apache security update broke it. It was reported as an Ubuntu bug, but then closed as invalid. That bug points to this mailman3 mailing list thread which includes … let’s call it a workaround:

In the apache.conf file there is the line:

ProxyPass /mailman3 unix:/run/mailman3-web/uwsgi.sock|uwsgi://localhost/

and the problem was caused by the final trailing slash. Changing it to

ProxyPass /mailman3 unix:/run/mailman3-web/uwsgi.sock|uwsgi://localhost

fixed the problem.

After the change you’ll need to restart Apache. Be sure to clear your browser cache so you’re getting the actual results and not a cached 301 from the browser too. A hearty thank you to Simon Brown for the fix above and posting it to the list.

Is it a Postorius problem? A problem with the Debian package? I don’t know, but it’s an incredibly frustrating problem in an already complex installation.

Versions in use:

  • Ubuntu 20.04
  • Apache2 2.4.41-4ubuntu3.14
  • mailman3-web 0+20180916-10
  • python3-django-mailman3 1.3.2-1

python, bytecode, and read-only containers

Upon first access, python compiles .py code into bytecode and stores it in .pyc files. Subsequent uses of those python sources are read from the .pyc files without needing to re-compile. This makes startup time, but not runtime, faster.

But what about read-only filesystems? If python is running on a read-only filesystem, no .pyc files are written and every use of a .py file involves compiling the code afresh. Everything still works; the startup of a script is just a little slower.

Read-only filesystems within a container are a security best practice in production environments. Often in kubernetes deployment manifests you might see something like:

securityContext:
  readOnlyRootFilesystem: true

And if you’re running python only once within the container, the container has a little bit of overhead at startup as it compiles the code into bytecode, then everything is in memory and off it goes. But if you’re running python multiple times, or want to make even a single run start faster, we can pre-compile the code when we build the container with the built-in python compileall module.

# Compile code on sys.path (includes system packages and CWD):
RUN python -c "import compileall; compileall.compile_path(maxlevels=10)"

# Compile code in a directory:
RUN python -m compileall /path/to/code

This moves the compilation overhead to the container build where it happens once, and out of the startup.
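
As a quick sanity check of what that build step produces, you can compile a throwaway module and look for the resulting .pyc (the module name and temp directory here are made up, standing in for a real image's site-packages):

```python
import compileall
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as d:
    (pathlib.Path(d) / "hello.py").write_text("GREETING = 'hi'\n")
    ok = compileall.compile_dir(d, quiet=1)          # what the RUN step does
    pycs = list((pathlib.Path(d) / "__pycache__").glob("hello.*.pyc"))
    print(bool(ok), len(pycs))                       # True 1
```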

Thanks to Itamar Turner-Trauring at https://pythonspeed.com/ for their excellent Production-ready Docker packaging for Python slide deck with this gem.

poetry auth via .netrc

poetry, the python package manager, provides several ways of authenticating against a repository. What isn’t explicitly documented, because the behavior comes from an underlying dependency, is that poetry can also use the ~/.netrc file for authentication when fetching packages.

poetry uses requests under the covers, and requests falls back to the ~/.netrc file. pip falls back to ~/.netrc in the same way, for the same reason.
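
As a sketch of what that fallback reads, the stdlib netrc module parses the same file format requests consults (the host and credentials below are made up):

```python
import netrc
import os
import tempfile

# A throwaway ~/.netrc-style file; host and credentials are invented.
content = "machine pypi.example.com\nlogin deploy\npassword s3cret\n"
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write(content)

login, account, password = netrc.netrc(f.name).authenticators("pypi.example.com")
print(login, password)  # deploy s3cret
os.unlink(f.name)
```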

There are several (probably bad) reasons why someone would want to do this vs one of the explicit methods given by poetry. One that comes to mind is needing to install python packages from a private repository from inside a docker container by simply volume mapping the host’s ~/.netrc file to have poetry use the right creds.

This approach probably won’t work when publishing packages — caveat emptor.

While I’m not suggesting that this is a best practice, it’s good to know that it’s an available method in some extreme edge cases.

Accessing Ubuntu desktop UI over SSH+VNC

During this pandemic I’m working from home on my Mac laptop and accessing things on my Ubuntu 18.04-based Linux desktop in the office. For most things this is fine via SSH or sshfs, but there are times you just need access to the desktop UI to get things done.

Specifically I had a 500 MB OVA that I needed to upload to an ESXi system — both of which are in the office. I could have downloaded the OVA to my laptop over the VPN, then uploaded it back over the VPN to ESXi, but that is slow, tedious, and wasteful. Instead, after a bit of googling, I figured out how to get a VNC client on my Mac securely accessing my work X11 display and do it all local to the office:

On your desktop, install x11vnc:

sudo apt install x11vnc

On your home computer, open an SSH tunnel and start the x11vnc server on your remote system (below as $HOSTNAME):

ssh -t -L 5900:localhost:5900 $HOSTNAME 'x11vnc -localhost -display :0'

Then start a VNC viewer on your home computer (on MacOS I recommend RealVNC) and connect to localhost:5900

Security advisory: when accessing your desktop like this your computer is unlocked and accessible by keyboard and mouse to users who wander by your desk. Granted, in a pandemic when everyone is working from home is this really a problem? Lock your computer when you’re done as if you were walking away from your desk and you’ll be fine.

Creating aspell dictionary packages for Ubuntu

There are many aspell dictionary packages available for Ubuntu, but not all of them. If you’re a somewhat esoteric project like Distributed Proofreaders, you may discover that you need things like the Latin aspell dictionary (aspell-la) which I can’t seem to find packaged anywhere.

Installing from source

It’s super easy and perfectly possible to install any of the aspell dictionaries directly. Just fetch the file, configure, make, and make install and you’re golden:

wget https://ftp.gnu.org/gnu/aspell/dict/la/aspell6-la-20020503-0.tar.bz2
tar xvfj aspell6-la-20020503-0.tar.bz2
cd aspell6-la-20020503-0
./configure
make
make install

The quick-and-dirty approach works, but for systems maintained by multiple people it’s a recipe for disaster without a lot of documentation. How will someone remember that this needs to be done again for the next server upgrade or server migration? In these cases it’s usually best to create a system package and install the package.

Building & installing a package

Building a package for Ubuntu / Debian can be mind-bogglingly complicated when all you want to do is package up a few files to lay down on the filesystem. Luckily for aspell dictionaries we can easily borrow the template used by the aspell-en package.

Start by finding and downloading the aspell dictionary that you want to install from the list available, then extract it.

wget https://ftp.gnu.org/gnu/aspell/dict/la/aspell6-la-20020503-0.tar.bz2
tar xvfj aspell6-la-20020503-0.tar.bz2

Configure and build it to create the .rws file:

cd aspell6-la-20020503-0
./configure
make

Now head over to the aspell-en package on LaunchPad to find and download the aspell-en_*.debian.tar.xz file from the Ubuntu version that most closely matches your own, then extract it into the dictionary directory. This is the source for the debian/ control directory used to build the aspell-en package, which we’ll use as a template for our own.

# from within aspell6-la-20020503-0/
wget https://launchpad.net/ubuntu/+archive/primary/+files/aspell-en_2017.08.24-0-0.1.debian.tar.xz
tar xvfJ aspell-en_2017.08.24-0-0.1.debian.tar.xz

This contains several files that we don’t need for our simple dictionary, so we can clean things up a bit. Keep in mind that we’re not creating a dictionary for distribution, just for ourselves, so this doesn’t have to be perfect.

cd debian
rm aspell-en.info-aspell changelog copyright extrawords.txt
cp ../COPYING copyright

You’ll need to update some of the files to reference your language, most of these are fairly straightforward:

  • control – Update references to aspell-en to your aspell dictionary; also update Maintainer and Description. You might need to change the debhelper version to whatever is installed on your system (Ubuntu 16.04 uses v9 not v10). If you change this, you should change it in compat too.
  • watch – Update the last line to point to where you got your aspell dictionary from — you probably just need to change the two instances of ‘en’ to your language’s code.

Three files require a little more finessing: install, rules, and source/format.

The install file specifies which files should be copied into the package for installation. For reasons that I, frankly, just don’t understand, we need to specify that the .rws file needs to be installed. Your install file should look like this:

*.multi         usr/lib/aspell
*.alias         usr/lib/aspell
*.dat           usr/lib/aspell
*.rws           var/lib/aspell

The rules file is a makefile that does all of the heavy lifting for building the package. The version for aspell-en includes bits that we don’t care about, namely everything related to docs and extrawords. We can remove those and update DICT_LANG, which leaves us with:

#!/usr/bin/make -f

include /usr/share/cdbs/1/rules/debhelper.mk

DICT_LANG := la

DEB_DH_MD5SUMS_ARGS += -Xvar/lib/aspell

install/aspell-$(DICT_LANG)::
        for f in `LC_ALL=C ls *.cwl`; do \
            gzip -9 -n -c "$$f" > "$(DEB_DESTDIR)/usr/share/aspell/"$$f".gz"; \
            WL=`echo $$f | sed 's/\.cwl$$//'`; \
            touch "$(DEB_DESTDIR)/var/lib/aspell/$$WL.rws"; \
            dh_link "var/lib/aspell/$$WL.rws" "usr/lib/aspell/$$WL.rws"; \
            echo "$$WL" >> "$(DEB_DESTDIR)/usr/share/aspell/$(DICT_LANG).contents"; \
        done

        touch $(DEB_DESTDIR)/var/lib/aspell/$(DICT_LANG).compat

        installdeb-aspell

Note that the 8-space indents above should be tabs in your version — this is a makefile!

The final thing to do is change source/format to say we want to use the 1.0 version:

1.0

Then create the changelog file using dch. This file is used by the packager to determine the name and version of the package file. To keep things simple, I recommend sticking with the version from the source file itself, even if that differs from the normal Debian version format.

# from within aspell6-la-20020503-0/
dch --create -v 20020503-0 --package aspell-la

Now all that’s left is building the package:

# from within aspell6-la-20020503-0/
debuild -us -uc

If successful, this will put an aspell-la_20020503-0_all.deb file in the parent directory.

$ ls -1
aspell-la_20020503-0.dsc
aspell-la_20020503-0.tar.gz
aspell-la_20020503-0_all.deb
aspell-la_20020503-0_amd64.build
aspell-la_20020503-0_amd64.changes
aspell6-la-20020503-0
aspell6-la-20020503-0.tar.bz2

You can now install this via:

sudo apt install ./aspell-la_20020503-0_all.deb

Note, the ./ is required; otherwise apt will look in the package catalog instead of on disk for the package.

You can test that your new dictionary works via:

$ echo hello | aspell list --lang=la

If that returns “hello” as a misspelled word, it worked. If you have problems, you can remove the package (sudo apt remove aspell-la), futz with some of the files, and try rebuilding it again. Things to watch out for: ensure you’ve configured and make’d the package and that your changes to the install and rules files are correct.

Installing yaz for PHP on Ubuntu

tl;dr

Here’s how to install yaz on Ubuntu 20.04 with PHP 7.4 (or Ubuntu 22.04 with PHP 8.1):

sudo apt install yaz libyaz-dev php-dev php-pear pkg-config
sudo pecl install yaz

The libyaz-dev package is the important, and oft-overlooked, part.

Then add the following line to /etc/php/<version>/apache2/php.ini:

extension=yaz.so

And restart apache:

sudo systemctl restart apache2

If you’ve upgraded Ubuntu and gotten a new version of PHP you will probably need to uninstall and reinstall from pecl:

sudo pecl uninstall yaz
sudo pecl install yaz

Original post from 4 years ago follows.

Numerous sites on the internet have answered the basic question of “how do I install yaz for PHP on Ubuntu”, which basically boils down to some flavor of:

PHP 5.x

sudo apt-get install yaz
sudo apt-get install pecl      # Ubuntu pre-16.04
sudo apt-get install php-pear  # Ubuntu 16.04 and later
sudo pecl install yaz

Then add the following line to /etc/php5/apache2/php.ini:

extension=yaz.so

PHP 7.0

sudo apt-get install yaz
sudo apt-get install php7.0-dev php7.0-pear
# might just be php-dev and php-pear on your OS (eg: Ubuntu 16.04)
sudo pecl install yaz

Then add the following line to /etc/php/7.0/apache2/php.ini:

extension=yaz.so

But wait, that fails

Sadly, the pecl install will fail with the error:

checking for yaz-config... NONE
configure: error: YAZ not found (missing NONE)
ERROR: `/tmp/pear/temp/yaz/configure --with-yaz' failed

All the search results for this error solve it by downloading the yaz source code and compiling and installing it outside the package manager, which is non-ideal.

The missing piece is that yaz-config is included with the libyaz4-dev package:

sudo apt-get install libyaz4-dev

Interestingly, this yaz install blog post does explicitly call out the need for the -dev packages, but doesn’t include the error you get when you don’t have them. Hopefully this blog post will tie the two bits together for future people perplexed by this.

Updates:

  • 2018-06-03: include PHP 7.0 instructions for Ubuntu 16.04.
  • 2020-12-05: include PHP 7.4 instructions for Ubuntu 20.04.