Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main     #926      +/-   ##
==========================================
- Coverage   99.63%   99.63%   -0.01%
==========================================
  Files         103      103
  Lines        8238     8225      -13
==========================================
- Hits         8208     8195      -13
  Misses         30       30
```
Selecting the PyTorch variant (CPU or CUDA x.y or ROCm or ...) when setting up the development environment with uv is not straightforward. The problem is that uv normally resolves a single universal dependency tree, so only one variant of the torch dependency could be chosen. But fortunately, it is possible to have some degree of control over the resolution by setting up "extras" and then declaring a "conflict" between them. This causes uv to "fork" the resolution into different "branches", each having their own dependency tree. So in commit e629963, I added two new extras: `torch-cpu` and `torch-cu128`. The end result is that these two extras can be used to select the PyTorch variant at installation time. For example:

1. `uv sync --group all --extra torch-cpu` installs the CPU-only variant
2. `uv sync --group all --extra torch-cu128` installs the CUDA 12.8 variant
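For reference, a hypothetical sketch of what this extras/conflicts mechanism looks like in `pyproject.toml` (the index names and URLs here are illustrative assumptions based on uv's documented PyTorch integration, not necessarily Annif's actual configuration):

```toml
[project.optional-dependencies]
torch-cpu = ["torch"]
torch-cu128 = ["torch"]

[tool.uv]
# Declaring the extras as conflicting makes uv fork the resolution,
# so each extra gets its own dependency tree.
conflicts = [
  [{ extra = "torch-cpu" }, { extra = "torch-cu128" }],
]

[tool.uv.sources]
# Each extra pulls torch from a different PyTorch index.
torch = [
  { index = "pytorch-cpu", extra = "torch-cpu" },
  { index = "pytorch-cu128", extra = "torch-cu128" },
]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true
```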
I refined the above solution by adding another extra. Maybe not ideal, but it works.
I ran benchmarking runs using the Annif-tutorial YSO-NLF dataset on the annif-data-kk server (it has 6 CPUs). The script that was used and the output data are in the benchmarking branch.

(train and eval result tables omitted)

Compared to the TensorFlow implementation, PyTorch requires twice as much memory in training and is slightly slower (107% in user time); but in inference the situation is the opposite: PyTorch is faster (~98% user time) and takes less memory.
Thanks @juhoinkinen! The RAM usage doubling is interesting. First hypothesis: maybe PyTorch uses higher precision floats than TensorFlow? I'll investigate.
I've now implemented all the changes I had in mind. The model has been changed to a much smaller variant that seems to perform better than the original TensorFlow based model (which, it turns out, was heavily overengineered; my bad!). @juhoinkinen @mjsuhonos @mfakaehler Please try this out if you have a chance! If there are no big problems, I think this could soon be merged.
Using the code before today's commits, I ran the same benchmarks as above; full output here. (The early stopping is a very nice feature! The decrease of nDCG stopped training after epoch 23 (with -j1) and 22 (with -j6).)

(train and eval result tables omitted)
juhoinkinen left a comment:

I read through the code and have no complaints. :)
There could be a short explanation of the model architecture somewhere, maybe in the NN ensemble Wiki page.
We could try out whether online learning works better with the new implementation.
@juhoinkinen Thanks for the new benchmark and the approval! It's a pity that nDCG seems to have decreased. With the Finto AI (Finnish) training data set that I used for most tests, there was a nice increase in F1@5 scores of more than 0.02. You reported that the model size is almost unchanged, but this number seems to include the LMDB, which contains all the preprocessed training data. The actual model file itself is much smaller.
I did some further testing on YSO and KOKO based projects. Sadly, the evaluation results have in most cases decreased compared to the old NN ensemble (around 0.02 in F1@5 scores). I think the model architecture still needs some work; the current one may be too simple after all, even if it performs better than the old one in certain cases (e.g. JYU-theses/fi). Also, the LMDB size now seems to grow faster than earlier. All the KOKO models required more than 2GB of disk space, whereas previously they didn't hit the default 1GB limit. This needs to be investigated; it was not intended.
Sandro and I have tested the new nn-ensemble backend. As a small disclaimer ahead: even with the old implementation, we have not yet found a configuration that has produced better results than the plain ensemble. Maybe the GND with its >200K entities is too hard a problem for the architecture. We tested the following config with the new and the old branches and trained the ensemble on ~250K tables of content.

projects.cfg: (configuration omitted)

While the TensorFlow implementation ran through and produced reasonable results, the PyTorch version had serious trouble with the LMDB and aborted during document processing with the following error:

`lmdb.MapFullError: mdb_put: MDB_MAP_FULL: Environment mapsize limit reached`

We restarted the process, increasing the LMDB size to 70GB and then 400GB, both runs eventually aborting. Only after restricting the training documents to 25K (1/10 of the original) did the actual PyTorch training start, and the process could finish. The final size of the resulting file is given in the results below. We will report any other observations later.
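For anyone reproducing this: the LMDB size limit can be raised per project. A sketch of the relevant `projects.cfg` fragment, assuming Annif's `lmdb_map_size` backend parameter (the project name, sources and the 70GB value here are illustrative, not the actual DNB configuration):

```ini
; Illustrative projects.cfg fragment for an nn_ensemble project.
; lmdb_map_size is given in bytes; 70000000000 is roughly 70 GB.
[gnd-nn-ensemble-de]
name=GND NN ensemble German
backend=nn_ensemble
sources=project-a,project-b
limit=100
lmdb_map_size=70000000000
```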
Here are the further results from a test that @mfakaehler and I carried out. Since the training data set of 260K could not be used (see above), we limited the training data to 25K.

Settings

Test case settings: single models, nn-ensemble parameters for nn classic (Keras/TensorFlow) and nn new (PyTorch), and technical settings (details omitted).

Results

Below are some observations and analysis of the results. Information on memory and CPU usage can be found in the graphs and figures in Appendix 926-test-nn-cpu-memory.pdf.

train (25,000 tocs)

The mdb-file sizes differ enormously: despite the same training volume, the new nn PyTorch database grows many times faster than nn classic's. The model files show quite the opposite behaviour.

Note: the new nn PyTorch ensemble stopped after the 6th epoch due to the early stopping functionality (log message omitted).

A suggestion for the early stopping feature: depending on the size of the vocabulary and, in particular, the training data set, it might be useful to have an option to individually specify the size of the random subset used (currently n=512 documents). In the DNB test case with a training data set of 25,000 documents, a higher number (e.g., 20% of the total) could be useful and lead to more robust statements. So you could either set EVAL_BATCH_SIZE proportional to the training data size, or make it configurable with the annif train call.

eval (40,885 tocs)

The new nn PyTorch is 2 minutes faster in evaluation than nn classic. There is hardly any difference in performance: nn PyTorch and nn classic achieve (almost) identical values for F1@5. For F1@10, nn classic is slightly ahead, while the new nn PyTorch offers a slightly more optimized ranking for the top 10 suggestions (see NDCG@10).

index (40,885 tocs)

nn classic requires 0.097 seconds per document. The new nn PyTorch requires 0.156 seconds per document.
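The EVAL_BATCH_SIZE suggestion above could be sketched as follows. This is a hypothetical helper, not Annif's actual code; `fixed_size=512` mirrors the current fixed default mentioned in the thread.

```python
import random

def eval_subset(doc_ids, fixed_size=512, fraction=None, seed=42):
    """Pick the random subset used for the early-stopping nDCG metric.

    If `fraction` is given, the subset size is proportional to the
    training set size instead of the fixed default.
    """
    size = int(len(doc_ids) * fraction) if fraction is not None else fixed_size
    size = min(size, len(doc_ids))  # never ask for more docs than exist
    return random.Random(seed).sample(doc_ids, size)

docs = list(range(25_000))  # e.g. the 25K DNB training documents
print(len(eval_subset(docs)))                # 512 (current behaviour)
print(len(eval_subset(docs, fraction=0.2)))  # 5000 (20% of the training set)
```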
Thanks a lot for the detailed report @san-uh! It is clear that there is something wrong with the LMDB database (nn-train.mdb) growing much faster than before. That obviously needs fixing. After that, I will try to rework the model architecture to try to achieve better results with at least some of our data sets (e.g. KOKO based) where the outcome was much worse than for the old NN ensemble. Regarding the early stopping heuristic and the set of 512 documents that is used to calculate the metric: you are right that this is a bit rigid. However, in my tests it didn't seem to matter that much; it might happen that choosing a non-ideal subset would cause the early stopping to happen one epoch earlier or later than ideal, but it should be pretty close to optimal regardless. Still, I can try to make this more flexible.
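For readers following along, the early-stopping heuristic described in this thread (track a metric per epoch, stop after it fails to improve for patience=2 epochs) can be sketched like this. This is a minimal illustration, not Annif's actual training loop, and the scores are made up.

```python
def train_with_early_stopping(epoch_scores, patience=2):
    """Return the epoch (1-based) after which training would stop.

    epoch_scores: the nDCG measured on the evaluation sample after
    each epoch. Training stops once the score has not improved for
    `patience` consecutive epochs.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(epoch_scores, start=1):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(epoch_scores)  # ran through all epochs without triggering

# Made-up scores that peak at epoch 4; with patience=2, training stops
# after epoch 6.
print(train_with_early_stopping([0.60, 0.65, 0.68, 0.70, 0.69, 0.66, 0.64]))
```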
It took a while to hunt this down, but I found the cause of the massive increase in LMDB size. The code was using a suboptimal sparse matrix type (CSC instead of CSR). CSC used to be the right choice in the old code, but the new code flips some matrix dimensions around, and now CSR is needed. The most recent commit 211fde9 fixes this. The next step is further iteration on the model itself; I think some alternative approaches must be evaluated, as the current one may be too simple.
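As an illustration of how the CSC/CSR choice can blow up storage for wide matrices (this is a general scipy.sparse property, not Annif's actual code): for a 1 × n_labels vector, CSC keeps an index pointer entry per column while CSR keeps one per row, so the CSC serialization carries an index array the size of the whole vocabulary.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

n_labels = 100_000  # e.g. the size of a large vocabulary
vec = np.zeros((1, n_labels), dtype=np.float32)
vec[0, [5, 42, 999]] = 1.0  # only three nonzero scores

csr = csr_matrix(vec)
csc = csc_matrix(vec)

# indptr has one entry per row (CSR) or per column (CSC), plus one.
print(len(csr.indptr))  # 2
print(len(csc.indptr))  # 100001
```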



This PR reimplements the NN ensemble using PyTorch instead of Keras/TensorFlow.

To test this, you will have to use `uv sync --group all --extra torch-cpu` or similar (see comments below).

Some notes about the implementation:

- The old implementation used the `top_k_categorical_accuracy` metric, but this was not easily available in PyTorch, so I switched to the nDCG metric, computed for a random subset (n=512 documents) of the given train set; this metric is used for early stopping.
- Training runs for at most the given number of epochs (`max-epochs` parameter), but tracks nDCG on a small sample (n=512) of the train set and stops when scores start to decline (with patience=2).
- I added a dependency group `all` for installing all extras (a substitute for `--all-extras`, which won't work anymore) as well as special extras for selecting the PyTorch variant. I only defined `torch-cpu` and `torch-cu128` extras for now, but I think the setup could quite easily be extended to other PyTorch variants such as CUDA 12.6 or 13.0, ROCm or Intel XPU, though obviously these would require more configuration in `pyproject.toml`.

Fixes #895