A new server – Setting it up

Moving country means that my previous VM-on-the-Internet (AKA a VPS) has terrible latency. Like 280 ms terrible. A closer server would be about 60 ms, which is way better.

Thus it’s time for a new server. But I learn from bad decisions, so here are my new requirements:

  1. Everything that can be in a container should be in a container. Exceptions are low-level network things (VPN, firewall, routing, load balancing/reverse proxy). Containers are easy to install, update, and delete. No dependencies.
  2. Applications I need:
    • A git server to put my source code on. Don’t wanna use GitHub for my own private code. It also keeps accidental leaks of passwords/keys from happening.
    • A note-taking application. I use Trilium. Maybe there is something better now?
    • An S3 server with a simple web UI. I could use B2 too. So this is optional.
    • Tailscale as VPN
  3. Preferably no complex orchestration. I used k3s on my last machine. It was a good learning experience, but it’s overkill: my containers don’t need to scale, and if the server goes down, everything is down anyway. Before k3s I used docker-compose, and that was perfect back then: simple, it works, easy to administer.

So here is what I settled on:

  • Hetzner, 2 CPUs, 8 GB RAM, 150 GB SSD, in Germany (about 55 ms latency). Strangely, I could not find a good provider in Dublin; that might have reduced the latency a bit more.
  • Forgejo for git. It’s way more than a simple git server, but it does git just fine and does much more. Forgejo is a fork of Gitea and it looks/feels a bit like GitHub/GitLab.
  • Trilium is being further developed and is now called TriliumNext. That’s a solid choice since I used Trilium before and I like it.
  • I used MinIO for my S3 “needs” at home: it’s convenient for copying files via web browser and via command line tools. However, MinIO removed the Docker images for their community (non-commercial) product, so I changed to Garage. And as a simple web UI: Garage WebUI.
  • To make HTTPS work for everything, either every service speaks HTTPS itself or I need a reverse proxy. I used Nginx and HAProxy before. This time it’s Caddy, as it can natively handle ACME certificates.
  • Podman, not Docker, running rootless where possible. Podman does not need a daemon running as root: everything is handled as a normal user (except some file permissions). Caddy is the only exception as it needs to bind to ports 80 and 443.
  • To make Podman container services run as a normal user under systemd, Quadlets come into play.

Note: I use qw.org as my domain example. Replace it with your DNS domain.

Step 0: Podman

I used Docker for many years. docker-compose was even better since it allowed multi-container setups and automatic restarts of containers after a reboot. But the Docker daemon was always a bit of an eyesore when it comes to security.

Podman is daemonless, and with a bit of help from systemd, a user can run services which automatically start on user login or after a reboot. Exactly what I need.

So: Docker/docker-compose out, Podman/Quadlet in.

I intend to run my services as myself as much as possible. Exceptions are when the service needs permissions I do not have as a normal user, e.g. binding to ports below 1024.

Podman’s user config files for Quadlets are in ~/.config/containers/systemd

Git – Forgejo

Create ~/.config/containers/systemd/forgejo.container

[Container]
ContainerName=forgejo13
HostName=forgejo
Image=codeberg.org/forgejo/forgejo:13-rootless
AutoUpdate=registry
Volume=%h/forgejo/data:/var/lib/gitea
Volume=%h/forgejo/config:/etc/gitea
Volume=/etc/timezone:/etc/timezone:ro
Volume=/etc/localtime:/etc/localtime:ro
PublishPort=127.0.0.1:3000:3000
PublishPort=3022:2222

[Unit]
Description=Forgejo
After=local-fs.target

[Install]
WantedBy=default.target

[Service]
Restart=on-failure

Note that port 3000 is bound to localhost only, as I do not want anyone to use plain HTTP on port 3000. Caddy will receive HTTPS and then connect to Forgejo on http://localhost:3000.

SSH, on the other hand, connects directly to Forgejo.

Set up the data directories. I don’t know why the user and group ID are 100999; it does not seem to be adjustable either.

$ mkdir -p ~/forgejo/{config,data}
$ sudo chown 100999:100999 ~/forgejo/{config,data}
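
My guess: this is the rootless user-namespace mapping. The rootless Forgejo image runs as container UID 1000, and with the default subordinate UID range starting at 100000, container UID 1000 maps to host UID 100000 + 1000 - 1 = 100999. The range is visible here (example output):

$ grep $USER /etc/subuid
harald:100000:65536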

To start the container:

$ systemctl --user daemon-reload
$ systemctl --user start forgejo
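
If the unit does not show up after the daemon-reload, Quadlet has a dry-run mode that prints the generated units and any parse errors. A quick check (the generator’s path may differ per distribution):

$ /usr/libexec/podman/quadlet --user --dryrun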

To start Forgejo upon reboot without me having to log in:

$ loginctl enable-linger
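
To verify that lingering is on:

$ loginctl show-user $USER --property=Linger
Linger=yes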

Test:

  • podman ps should show the container running
  • on the host: curl http://localhost:3000 should return the Forgejo page
  • from outside: ssh -p 3022 YOURHOST should start an SSH session. There is no account yet, of course.

Reverse Proxy – Caddy

Forgejo is not yet reachable from the outside. We need a reverse proxy for this. Caddy can do that, plus it handles ACME natively, so no fiddling with acme.sh or similar.

Caddy is not running as a container since non-root containers cannot bind to port 80 or 443. Thus Caddy is a normal daemon on the base OS. Yes, I could run it as a container with elaborate nftables rules. Or run it as root. Which is about the same as running Caddy as a normal service.

Simple is good. Thus Caddy runs as a normal service.

$ sudo apt install caddy

The config file is /etc/caddy/Caddyfile:

# The Caddyfile is an easy way to configure your Caddy web server.
#
# Unless the file starts with a global options block, the first
# uncommented line is always the address of your site.
#
# To use your own domain name (with automatic HTTPS), first make
# sure your domain's A/AAAA DNS records are properly pointed to
# this machine's public IP, then replace ":80" below with your
# domain name.

{
        http_port       80
        https_port      443
        email           "harald.kubota@gmail.com"
        default_sni     de.qw.org

        log default {
                output stdout
                level INFO
        }
}

:80 {
        # Set this path to your site's directory.
        root * /usr/share/caddy

        # Enable the static file server.
        file_server
}

git.qw.org {
        reverse_proxy localhost:3000
        tls {
          ca https://acme-v02.api.letsencrypt.org/directory
        }
}

# Refer to the Caddy docs for more information:
# https://caddyserver.com/docs/caddyfile

Start it the usual systemd way:

# systemctl enable caddy
# systemctl start caddy

When connecting explicitly to the host on port 80 via HTTP, you get Caddy’s slanted welcome page. When connecting to https://git.qw.org, you reach the Forgejo instance.

DNS needs to be set up, of course: git.qw.org should point to the IP of the server (A record or CNAME does not matter).
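
A quick check; 203.0.113.10 stands in for your server’s real IP:

$ dig +short git.qw.org
203.0.113.10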

Test:

  • Connect to http://IPADDRESS and you should get the slanted Caddy page. This needs to work, as ACME uses port 80.
  • Connecting to https://git.qw.org should give you the Forgejo initial page.
  • Connecting to http://git.qw.org should redirect you to the https page (via a 307 redirect).
  • systemctl status caddy should show it’s enabled and running.
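
The first three checks, scripted with curl (expected output: 200, 200, and a 3xx, in that order):

$ curl -s -o /dev/null -w '%{http_code}\n' http://IPADDRESS
$ curl -s -o /dev/null -w '%{http_code}\n' https://git.qw.org
$ curl -s -o /dev/null -w '%{http_code}\n' http://git.qw.org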

Note-Taking App – TriliumNext

I used Trilium and it was good. I should use it more though. TriliumNext is the “successor”. It is compatible with old backups, so the migration from Trilium to TriliumNext should be super simple. Spoiler: it was.

After adding a DNS entry for trilium.qw.org pointing to the Hetzner server, the Caddyfile needs one extra entry (add it at the end):

trilium.qw.org {
        reverse_proxy localhost:8080
        tls {
          ca https://acme-v02.api.letsencrypt.org/directory
        }
}

The ~/.config/containers/systemd/trilium.container file:

[Container]
ContainerName=trilium
HostName=trilium
Image=docker.io/triliumnext/trilium:main
AutoUpdate=registry
Volume=%h/trilium/data:/home/node/trilium-data
Volume=/etc/timezone:/etc/timezone:ro
Volume=/etc/localtime:/etc/localtime:ro
PublishPort=127.0.0.1:8080:8080

[Unit]
Description=Trilium
After=local-fs.target

[Install]
WantedBy=default.target

[Service]
Restart=on-failure

To start Trilium:

$ mkdir -p ~/trilium/data
$ sudo chown 100999:100999 ~/trilium/data
$ systemctl --user daemon-reload
$ systemctl --user start trilium

After starting it, it’ll create some directories and files. To restore a backup from a previous Trilium instance, stop the container, then copy the backup file (e.g. backup-2025-10-20.db) to ~/trilium/data/document.db.
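
In commands, roughly (file name from the example above, ownership as set up earlier):

$ systemctl --user stop trilium
$ sudo cp backup-2025-10-20.db ~/trilium/data/document.db
$ sudo chown 100999:100999 ~/trilium/data/document.db
$ systemctl --user start trilium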

Test by connecting to https://trilium.qw.org, which should show the TriliumNext login screen.

S3 – Garage

Here is the container definition, ~/.config/containers/systemd/garage.container:

[Container]
ContainerName=garage
HostName=garage
Image=docker.io/dxflrs/garage:v2.1.0
AutoUpdate=registry
Volume=%h/garage/garage.toml:/etc/garage.toml
Volume=%h/garage/meta:/var/lib/garage/meta
Volume=%h/garage/data:/var/lib/garage/data
Volume=/etc/timezone:/etc/timezone:ro
Volume=/etc/localtime:/etc/localtime:ro
PublishPort=127.0.0.1:3900:3900
PublishPort=127.0.0.1:3901:3901
PublishPort=127.0.0.1:3902:3902
PublishPort=127.0.0.1:3903:3903

[Unit]
Description=Garage
After=local-fs.target

[Install]
WantedBy=default.target

[Service]
Restart=on-failure

Create the directories and the config file ~/garage/garage.toml:

$ mkdir -p ~/garage/{meta,data}
$ cat >~/garage/garage.toml <<_EOF_
metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
db_engine = "sqlite"

replication_factor = 1

rpc_bind_addr = "[::]:3901"
rpc_public_addr = "127.0.0.1:3901"
rpc_secret = "$(openssl rand -hex 32)"


[s3_api]
s3_region = "garage"
api_bind_addr = "[::]:3900"
root_domain = ".s3.qw.org"

[s3_web]
bind_addr = "[::]:3902"
root_domain = ".s3.qw.org"
index = "index.html"

[k2v_api]
api_bind_addr = "[::]:3904"

[admin]
api_bind_addr = "[::]:3903"
admin_token = "$(openssl rand -base64 32)"
metrics_token = "$(openssl rand -base64 32)"
_EOF_
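
Since the heredoc delimiter is unquoted, the $(openssl …) command substitutions are expanded while the file is written, so real random values end up in the config. A quick sanity check:

$ grep -E 'rpc_secret|_token' ~/garage/garage.toml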

Start Garage with:

$ systemctl --user daemon-reload
$ systemctl --user start garage

podman exec -ti garage /garage status should show the node with “NO ROLE ASSIGNED”.

Follow https://garagehq.deuxfleurs.fr/documentation/quick-start/ to set up a layout, a bucket, and a key.
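
For reference, the quick-start boils down to something like the following; the zone name, capacity, bucket name, and key name are my placeholders, so check the linked documentation for the exact syntax of your Garage version:

$ podman exec -ti garage /garage status
$ podman exec -ti garage /garage layout assign -z dc1 -c 100G <node_id>
$ podman exec -ti garage /garage layout apply --version 1
$ podman exec -ti garage /garage bucket create dump
$ podman exec -ti garage /garage key create dump-key
$ podman exec -ti garage /garage bucket allow --read --write --owner dump --key dump-key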

/etc/caddy/Caddyfile needs some sections added too:

s3.qw.org:3800 {
        reverse_proxy localhost:3900
}
s3.qw.org:3801 {
        reverse_proxy localhost:3901
}
s3.qw.org:3803 {
        reverse_proxy localhost:3903
}

Test:

I have the MinIO client (mc) working, so I just added this section to the ~/.mc/config.json config:

        "garage": {
            "url": "https://s3.qw.org:3800",
            "accessKey": "GK36....................01",
            "secretKey": "5f...........................................................f8",
            "api": "S3v4"
         },

And you can do things like:

$ mc cp .mc/config.json garage/dump/
.../.mc/config.json: 1.85 KiB / 1.85 KiB ━━━━━━ 2.58 KiB/s 0s

Files are stored in ~/garage/data/; however, they are broken up into small chunks.

To make S3 buckets available as web files, enable web access on the bucket, and then you can access those files as https://web.s3.qw.org:3802/codemonster/index.html
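
Enabling web access on a bucket goes through the Garage CLI; something like this, assuming the bucket from the example:

$ podman exec -ti garage /garage bucket website --allow web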

web is the bucket. You also need a DNS entry for web.s3 and Caddy needs to know too:

web.s3.qw.org:3802 {
        reverse_proxy localhost:3902
}

Garage WebUI

MinIO has a nice built-in admin web page to check the server status, create buckets, and upload/download files via a web browser. It’s nice to have, but Garage does not include one. Garage WebUI fixes that.

Here is the ~/.config/containers/systemd/garage-ui.container file:

[Container]
ContainerName=garageui
HostName=garageui
Image=docker.io/khairul169/garage-webui:1.1.0
AutoUpdate=registry
Volume=%h/garage/garage.toml:/etc/garage.toml:ro
Volume=%h/garage/meta:/var/lib/garage/meta
Volume=%h/garage/data:/var/lib/garage/data
Volume=/etc/timezone:/etc/timezone:ro
Volume=/etc/localtime:/etc/localtime:ro
Environment=API_BASE_URL="https://s3.qw.org:3803"
Environment=S3_ENDPOINT_URL="https://s3.qw.org:3800"
PublishPort=127.0.0.1:3909:3909

[Unit]
Description=Garage WebUI
After=local-fs.target garage.service

[Install]
WantedBy=default.target

[Service]
Restart=on-failure

There is no authentication by default, so set up basic authentication via Caddy. Add this to /etc/caddy/Caddyfile:

s3.qw.org:3809 {
        basicauth / {
                s3admin $2a$14$WdT9d7/.mBI.....................cz
        }
        reverse_proxy localhost:3909
}

To get a hashed password, see https://caddyserver.com/docs/command-line#caddy-hash-password
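
In short; the password here is inline only for illustration, running the command without --plaintext prompts for it instead:

$ caddy hash-password --plaintext 'use-a-better-password'
$2a$14$...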

After restarting Caddy, connect to https://s3.qw.org:3809 and you should be prompted for an account and password.

Tailscale

To install, follow the rather simple instructions at https://tailscale.com/download/linux. Start the daemon with systemctl start tailscaled.

To enable this machine as an exit node, with Tailscale running, execute tailscale set --advertise-exit-node.

Adding that same flag to /etc/default/tailscaled does not work; the daemon does not even start anymore. Which makes sense: --advertise-exit-node is a flag of the tailscale CLI, not of the tailscaled daemon.
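
On a client, using this machine as the exit node then looks something like this (the machine name is a placeholder from my tailnet):

$ tailscale set --exit-node=de-server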

Appendix

The complete /etc/caddy/Caddyfile:

# The Caddyfile is an easy way to configure your Caddy web server.
#
# Unless the file starts with a global options block, the first
# uncommented line is always the address of your site.
#
# To use your own domain name (with automatic HTTPS), first make
# sure your domain's A/AAAA DNS records are properly pointed to
# this machine's public IP, then replace ":80" below with your
# domain name.

{
        http_port       80
        https_port      443
        email           "harald.k@gmail.com"
        default_sni     de.qw.org

        log default {
                output stdout
                level INFO
        }
}

:80 {
        # Set this path to your site's directory.
        root * /usr/share/caddy

        # Enable the static file server.
        file_server
}

s3.qw.org:3800 {
        reverse_proxy localhost:3900
}
s3.qw.org:3801 {
        reverse_proxy localhost:3901
}
web.s3.qw.org:3802 {
        reverse_proxy localhost:3902
}
s3.qw.org:3803 {
        reverse_proxy localhost:3903
}
s3.qw.org:3809 {
        basicauth / {
                s3admin $2a$14$WdT9d7/.mBI.....................cz
        }
        reverse_proxy localhost:3909
}

git.qw.org {
        reverse_proxy localhost:3000
        tls {
          ca https://acme-v02.api.letsencrypt.org/directory
        }
}

trilium.qw.org {
        reverse_proxy localhost:8080
        tls {
          ca https://acme-v02.api.letsencrypt.org/directory
        }
}

# Refer to the Caddy docs for more information:
# https://caddyserver.com/docs/caddyfile

Next Steps

Currently all my needs are fulfilled with this installation. A bit of monitoring would be nice though: Telegraf + InfluxDB + Grafana.

Train Delays (Deutsche Bahn)

This vacation I’m traveling by train with my son. Driving a car after a flight is no longer my thing, and a train ride without transfers…that’s comfortable and often faster than a car. And I don’t barrel down the Autobahn at 200 km/h…

At first everything went like clockwork: the ICE arrived on time at the airport long-distance station, and ended up with only 5 minutes of delay. No problem. I don’t expect 1.6 minutes like with certain other trains.

Then yesterday we took the bus to Dortmund. The bus arrived one minute later than scheduled and was on time at the station in Dortmund. Great.

But then it all went downhill…

For the return trip from Dortmund there are two options: a regional train or the S-Bahn. Or the bus, but we had done that already. The RE3 is faster, but runs only once per hour. The S-Bahn takes longer, but runs three times per hour. We arrived at the station 10 minutes before the RE3 departs. Perfect!

Then the RE3 was 10 minutes late.

Then it switched platforms, and about 50 people trudged from one track to the new one.

Then another 5 minutes of delay were added.

That was too much for us, so we took the S-Bahn, which by then arrived almost on time. Switching tracks yet again, but at least for the last time.

Next day, next attempt:

The RE3 was 10 minutes late.

Then the train was cancelled entirely. How does that work? A train doesn’t just vanish into thin air?! What about the people sitting in such a train? The solution to the riddle: the RE3 simply drove past us. Without us.

So, the S-Bahn again. Almost on time this time.

Then the next problem: ICEs can be late too…and today they were:

  1. All ICEs that morning were delayed.
  2. ICE616 (the first one in the picture above) started out 20 minutes late, but that grew to the 69 minutes you can see in the picture, and increased further to 75 minutes at the end.
  3. We had a 1 h buffer for our connecting train in Hamburg. But 1 h was not enough. We would miss the connection.
  4. ICE616 was supposed to serve 2 more stations after Hamburg Hbf. Those were cancelled on short notice: the final stop was Hamburg Hbf instead of Hamburg-Altona.
  5. Our connecting train to Hannover (ICE579) was 24 minutes late.
  6. The return trip on ICE70 from Hannover to Hamburg was 29 minutes late.

I seriously wonder how things can work this badly. Or how Deutsche Bahn is going to fix this. Or whether it even wants to.

Update: Trip to Paris

A few days later we continued by train from Hamburg to Paris, with a transfer in Mannheim. 54 minutes in Mannheim should be enough, right?

Everything ran smoothly until Frankfurt. Shortly before Frankfurt, delays came in for the stations following Frankfurt. First 15 minutes, then over 60. Uh oh…

The problem was a failed signal between Frankfurt and Mannheim. It had to be bypassed, and that is of course slower. The ICE driver gave us frequent and detailed updates, and that helped a lot: since the TGV has to run the same stretch from Frankfurt to Mannheim, it would be delayed accordingly. And so it was: the ICE was 1h3m late, but the TGV “only” 33m, which still left 24 minutes for the transfer. Since it was on the same platform, no problem at all.

In the end everything worked out, but it was nerve-racking. Without the train driver and his updates, I would not have known whether I could catch the connecting train or not. Had the TGV come to Mannheim from somewhere else, I would have missed it.

I don’t expect an ICE to be as punctual as a Shinkansen, but I did expect a bit more punctuality. In any case, I won’t book train trips with connections anymore: the chance of missing one is simply too high.

Google minus the Crap

I don’t like the AI-generated answers/summaries from Google. I can read for myself, and I actually prefer to read the original information to get an idea whether it’s coherent and correct (as much as I can tell, since I usually look for information where I do not know the answer yet).

An AI summary gives me a coherent text with an unknown degree of truth. But it always reads like it was written by someone who can form a coherent sentence.

There’s a “hack” of adding some swear words to the query, which removes the AI summary, but it’s a clunky workaround. Maybe 10-year-olds get a kick out of adding “fuck” to every Google search, but I am too old for that shit.

This is a good solution though: https://udm14.com/

No more AI summary. Feels like the old and useful Google. Love it!

To make your default search engine use this switch, follow the instructions at https://tenbluelinks.org/ and your normal Google searches will use the udm=14 parameter.
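
Manually, it’s just an extra query parameter, e.g. (example query):

https://www.google.com/search?q=forgejo+quadlet&udm=14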

Of course you could also use https://duckduckgo.com/ as your default search engine…

A Good LLM Use

I find a lot of LLM uses which are anywhere from pointless to futile, but once in a while I find an example of LLM use which adds value and which would have been much harder without an LLM.

Accelerating Large-Scale Test Migration with LLMs from AirBnB is one of those:

  • Tests are short snippets of code which don’t contain complex logic: they set up a test, run it, and check the result.
  • Tests are supposed to pass, thus it’s easy and quick to verify the results created by the LLM.
  • LLMs have seen enough JavaScript to know how to create valid code. Same for the two testing frameworks involved: Enzyme and the React Testing Library.
  • If the generated tests don’t run at all (e.g. syntax errors or severe logical errors), then just try again. Not much harm done.
  • Adding context works, so use an LLM which can handle a large enough context window, and make sure your context is relevant and helpful. In the above example they went up to 100k tokens of input.

Reading the above list, it’s easy to imagine where LLMs will be much less successful:

  • Unusual languages or frameworks
  • Tasks where it’s hard to verify that the LLM did a good job

Costs

I wish the authors had mentioned the costs of this migration. I am confident it was way cheaper than doing it manually: they estimated 1.5 years, vs. the 6 weeks it took with the LLM.

From the article, I am assuming 3600 files in total, of which 75% were solved with a prompt of 50k tokens on average. Assuming 5 tries per file, Claude 3.7 Sonnet at $3/1M input tokens, and ignoring the output token count since it’s relatively small code, that’s about $2000 for the 2700 files which worked, and $1350 for the 900 which failed even after 10 tries.
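
The back-of-the-envelope arithmetic, with the same assumed numbers:

$ echo $(( 2700 * 5 * 50000 * 3 / 1000000 ))  # 2700 files x 5 tries x 50k tokens at $3/1M
2025
$ echo $(( 900 * 10 * 50000 * 3 / 1000000 ))  # 900 files x 10 tries x 50k tokens at $3/1M
1350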

Add in the attempts to fix those 900 remaining files with different prompts, different examples or context files etc., rerunning the LLM prompt 50-100 times, and that’s another $6000. And then manually fixing the few remaining files (3%, about 100 files) “in another week of work”.

So roughly $10,000 for the LLM and 1.5 months for a small team, compared to 18 months for a relatively simple programming task. Since tests are independent of one another, the manual migration could also be done in 1.5 months by hiring 12 people instead of 1. Same price, but 12 times faster.

Cheesecake

From the knowledge-that-should-not-be-lost department

Found at https://www.gourmet.com.s3-website-us-east-1.amazonaws.com/recipes/1990s/1999/11/three_cities_spain_cheesecake.html before it gets deleted.

Generally I like all kinds of cheesecake, but this one caught my eye for looking exceptionally tasty.

Photograph by Anna Williams

Three Cities of Spain Cheesecake

Serves 8 to 10

  • Active time: 30 min
  • Start to finish: 9 1/2 hr

ADAPTED FROM THREE CITIES OF SPAIN COFFEEHOUSE, SANTA FE, NM

November 1999

No cheesecake roundup would be complete without this one, created by Santa Fe’s Three Cities of Spain coffeehouse (which closed in the mid-1970s) and our absolute favorite in the creamy category.

  • crumb-crust recipe (see below), made with finely ground graham crackers.
  • 3 (8-oz, 250g) packages of cream cheese, softened
  • 4 large eggs
  • 1 teaspoon vanilla
  • 1 cup (200g) sugar

For Topping

  • 16 oz (500ml) sour cream
  • 1 tablespoon sugar
  • 1 teaspoon vanilla

Make the crumb crust as directed in the separate recipe below. Preheat oven to 350ºF (180ºC).

Make filling and bake cake:

  • Beat cream cheese with an electric mixer until fluffy and add eggs, 1 at a time, then vanilla and sugar, beating on low speed until each ingredient is incorporated and scraping down bowl between additions.
  • Put springform pan with crust in a shallow baking pan. Pour filling into crust and bake in a baking pan (to catch drips) in middle of oven 45 minutes, or until cake is set 3 inches from edge but center is still slightly wobbly when pan is gently shaken. Let stand in baking pan on a rack 5 minutes. Leave oven on.

Make topping:

  • Stir together sour cream, sugar, and vanilla. Drop spoonfuls of topping around edge of cake and spread gently over center, smoothing evenly. Bake cake with topping 10 minutes.
  • Run a knife around top edge of cake to loosen and cool completely in springform pan on rack. (Cake will continue to set as it cools.) Chill cake, loosely covered, at least 6 hours. Remove side from pan and transfer cake to a plate. Bring to room temperature before serving.

Cooks’ note: Cheesecake keeps, covered and chilled, 3 days.


Crumb Crust

Makes enough for a 24-centimeter cheesecake

  • Active time: 10 min
  • Start to finish: 10 min

November 1999

  • 1 1/2 cups (5 oz, 150g) finely ground graham crackers or cookies such as chocolate or vanilla wafers or gingersnaps
  • 5 tablespoons unsalted butter, melted
  • 1/3 cup (70g) sugar
  • 1/8 teaspoon salt

Stir together the crust ingredients and press onto the bottom and 1 inch up the side of a buttered 24-centimeter springform pan. Fill right away or chill up to 2 hours.

My First Broken SSD

I work in a data center and we replace a ton of SSDs. And those are the good ones, not the cheaper consumer models. Thus I know SSDs can break.

A long time ago I had an HDD die on me (Seagate ST 1162N, 142 MB, SCSI). The model and the capacity give away that this was long ago. I learned my lesson, and since then backups are done semi-regularly. My NAS has RAID1 too. No severe data loss since then.

Ignoring microSD cards, which I do not count, I had yet to see one of my SSDs fail. Today I got my first one:

[  145.916941] sd 1:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 145.916948] sd 1:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current]
[ 145.916954] sd 1:0:0:0: [sda] tag#0 Add. Sense: Invalid command operation code
[ 145.916960] sd 1:0:0:0: [sda] tag#0 CDB: Write(10) 2a 00 06 ce f0 38 00 00 18 00
[ 145.916963] critical target error, dev sda, sector 114225208 op 0x1:(WRITE) flags 0x800 phys_seg 3 prio class 2
[ 147.061324] JBD2: recovery failed
[ 147.061333] EXT4-fs (dm-7): error loading journal

The culprit was a Klevv Cras C700 NVMe SSD with 480 GB and a Silicon Motion SM2263EN controller. It worked well. Until it stopped working.

smartctl shows nothing obviously odd:

root@m75q:~# smartctl --all /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-32-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: KLEVV CRAS C700 M.2 NVMe SSD 480GB
Serial Number: E201908210020546
Firmware Version: R0801L2
PCI Vendor/Subsystem ID: 0x126f
IEEE OUI Identifier: 0x005cd2
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 480,103,981,056 [480 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 000000 0000000028
Local Time is: Sun Mar 23 22:04:58 2025 JST
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0006): Format Frmw_DL
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f): S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 70 Celsius
Critical Comp. Temp. Threshold: 80 Celsius

Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 0 0
1 + 4.60W - - 1 1 1 1 0 0
2 + 3.80W - - 2 2 2 2 0 0
3 - 0.0450W - - 3 3 3 3 2000 2000
4 - 0.01W - - 4 4 4 4 8000 1015936

Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
1 - 4096 0 0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 37 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 3,035,689 [1.55 TB]
Data Units Written: 10,340,813 [5.29 TB]
Host Read Commands: 62,049,267
Host Write Commands: 193,235,592
Controller Busy Time: 4,569
Power Cycles: 2,806
Power On Hours: 6,365
Unsafe Shutdowns: 29
Media and Data Integrity Errors: 123
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0

Warning: NVMe Get Log truncated to 0x200 bytes, 0x200 bytes zero filled
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Interesting info from above:

  • There are no errors logged.
  • There are 123 media and data integrity errors.
  • While I used that SSD for years of software development, I only wrote 5.29 TB. That seems low, but it might be right.
  • Reading only 1.55 TB…that seems very low. It makes me wonder how correct this and all the other reported numbers are.
  • 100% of the spare blocks still exist. Why were they not used before it was too late?
  • The computer was not turned on for almost a year. Maybe that’s the problem. That would be an issue when using SSDs as offline backup storage, and it makes a good case for HDDs for long-term offline storage. Optical media makes even more sense.

Conclusion

  • Have backups. Everything, including SSDs, can fail suddenly and fatally.
  • Don’t trust offline SSDs. Like every backup, test regularly that it’s still good.
  • I don’t have HDDs except in my NAS. I was planning to replace them with SSDs, but I’ll have to rethink that.

Unusual Treats at Gyomu Super

From the team “Keeping the Random in Harald’s Random Stuff”

I am lucky to have a Gyomu Super close to home: they are a normal supermarket with a lot of unusual items which usually cater to professional food businesses, e.g. restaurants. Want 1 kg of roasted onions? Or 2.5 kg of canned whole Italian tomatoes?

They also have normal sizes, like 400 g of chocolate spread or marmalade. Not everything is super-sized.

If you order those the way actual businesses do, you’ll get multiples delivered: a box with 12 packets of onions, or 6 of those tomato cans, or 12 jars of chocolate spread.

In their public supermarkets, however, they sell single units. I love to go there once in a while, if only to see what interesting things they currently sell.

Recently I found something I didn’t know existed: Panna Cotta or Chocolate Babaloa in the unusual shape of typical 1 l milk cartons:

This is not a drink, but exactly what it says on the outside: 1 kg of pudding. In the shape of a 1 l milk carton. Shape aside, this stuff is really good and not even very expensive (500 yen for the Babaloa), especially considering the quality. There is also Japanese pudding (プリン), coffee jelly, and some more.

If you have a sweet tooth and are able to go to a Gyomu Super, get one! And get a frozen cheesecake or lemon cake too:

Those are a nice treat for weekends when you’d like a cake for “Kaffee und Kuchen”.