Sebastian Wild's Site

Powersort's Pursuit

Sun, 16 Jul 2023 00:00:00 +0200

My colleague Tony McCabe put together a game implementation of merging policies, in which you can try to find the order of merges with minimal total merge cost.

Below are a few example inputs to try. Drag & drop one run onto one of its neighbors to merge them; this costs you the sum of their lengths. The goal is to merge all runs into a single one with minimal total cost.

Connection to sorting

The game captures exactly the optimization problem that Timsort and Powersort face: The boxes are existing sorted runs in the data that we have to merge in pairs until we eventually have a single sorted run. The cost of merging is (slightly simplistically) set to the size of the output. For a stable sort, we can only merge adjacent runs.
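For small inputs like the ones in the game, the optimal merge order can be computed exactly. The following is a minimal Python sketch, assuming positive run lengths and the cost model above (a merge costs the size of its output); the function name optimal_merge_cost is made up for this example and is not part of the game or of CPython. It uses a cubic-time interval dynamic program, similar to the one for matrix-chain multiplication: whichever way a range of adjacent runs is merged, its final merge costs the total length of that range. This is an exact but slow baseline, not what Timsort or Powersort do.

```python
def optimal_merge_cost(runs):
    """Minimal total cost to merge adjacent runs into one (each merge costs its output size)."""
    n = len(runs)
    prefix = [0]                      # prefix[i] = total length of runs[0..i-1]
    for r in runs:
        prefix.append(prefix[-1] + r)
    # best[i][j] = minimal cost to merge runs[i..j] (inclusive) into a single run
    best = [[0] * n for _ in range(n)]
    for width in range(1, n):         # width = j - i
        for i in range(n - width):
            j = i + width
            total = prefix[j + 1] - prefix[i]   # the last merge outputs all of runs[i..j]
            best[i][j] = total + min(best[i][k] + best[k + 1][j] for k in range(i, j))
    return best[0][n - 1] if n else 0


print(optimal_merge_cost([1, 2, 3]))     # → 9
print(optimal_merge_cost([1, 1, 1, 1]))  # → 8
```

For [1, 2, 3], merging the two short runs first (cost 3) and then the result with the last run (cost 6) is optimal; merging the right pair first would cost 5 + 6 = 11.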

My PyCon US 2023 Powersort Talk

Sat, 22 Apr 2023 00:00:00 +0200

I presented Powersort and its story at PyCon US 2023, the largest Python community conference. Here are some resources and impressions from the last couple of days here in Salt Lake City.

View of downtown Salt Lake City (with the conference venue!) and the Utah State Capitol from Ensign Peak.

Resources

Official Talk recording
My reupload with the antiphase audio fixed
Talk slides
(Speakerdeck seems to have an issue with my fancy transparency patterns, but you can download the pdf there.)
Colab notebook with the (educational) implementations of Timsort and Powersort
How-to for donating data to the Adaptive Sorting Benchmark

Here’s the blurb from the talk submission:

Quicksort, Timsort, Powersort - Algorithmic ideas, engineering tricks, and trivia behind CPython’s new sorting algorithm
Writing a sorting function is easy - coding a fast and reliable reference implementation less so. In this talk, I tell the story behind CPython’s latest updates of the list sort function.

Aims: entertain people with twists of history and algorithmic puzzles, which tell a lovely story of how a seemingly useless piece of theory led to the fastest and most elegant solution of a practical challenge.

Target audience: geeks believing in the power of solid algorithmic thinking; programmers interested in engineering performance-critical code; all Python enthusiasts curious about what makes (sorting lists in) Python fast.

Content: After using Quicksort for a long while, Tim Peters invented Timsort, a clever Mergesort variant, for the CPython reference implementation of Python. Timsort is both effective in Python and a popular export product: it is used in many languages and frameworks, notably OpenJDK, the Android runtime, and the V8 JavaScript engine.

Despite this success, algorithms researchers eventually pinpointed two flaws in Timsort’s underlying algorithm: The first could lead to a stack overflow in CPython (and Java); although it has since been fixed, it is curious that 10 years of widespread use didn’t bring it to the surface. The second flaw is related to performance: the order in which detected sorted segments, the “runs” in the input, are merged can be 50% more costly than necessary. Based on ideas from the little-known puzzle of optimal alphabetic trees, the Powersort merge policy finds nearly optimal merging orders with negligible overhead, and is now (since Python 3.11.0) part of the CPython implementation.

Impressions

PyCon US is huge; there were 2200 attendees on site, plus over 400 online participants; the industrial sponsors alone donated over $1 million (!) towards the event. (Cheers to JetBrains for bringing proper coffee to the US and giving it away for free (cappuccinos throughout the conference, yay ☕), and AWS & Superblocks for inviting the whole conference to their party 🍺).

Talk topics were very mixed and broad; some talks presented technical details on changes to the CPython implementation (like the two talks by Mark and Brandt from the Faster CPython initiative, or the flurry of talks on PyScript, the attempts to make Python run in the browser(!)), whereas others provided an overview of an area or reported on particular projects (such as games on the micro:bit or algorithmic embroidery).

I was also deeply impressed by how much PyCon has going on in terms of community building at the conference (and outside of it), and how much personal appreciation people showed for each other. It is a remarkable achievement to get not only the technical aspects of the language right, but also the community around it. While one first needs a thing to gather around before a community can evolve, it makes me wonder whether a healthy community does not merely follow from the success of Python, but causes it.

I was very happy that Guido van Rossum made it to my talk. Of course, I couldn’t let the opportunity pass to interview him on sorting in CPython afterwards:

“I don’t remember the exact reasoning, but the mere fact that we used qsort for sorting lists shows you that I didn’t care much about sorting.”

Fair enough 😉
Had that been any different, who knows whether I would have had an excuse to enjoy PyCon US today?

How to contribute your inputs to the Adaptive Sorting Benchmark

Fri, 21 Apr 2023 00:00:00 +0200

If you are considering contributing sorted lists from your own Python application to our benchmark for adaptive sorting, the steps below show you how to collect this data. Note: For every list sorted while running Python code through our custom CPython, our instrumentation stores a list of integers with equivalent comparison behavior.
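The exact encoding used by our instrumentation is not shown here; to illustrate what “equivalent comparison behavior” means, here is a hypothetical Python sketch (comparison_equivalent is a name made up for this example) that replaces each element by its dense rank. It assumes that < behaves like a proper strict weak order, as list.sort needs for meaningful results, and, like list.sort, it only ever uses <. Any two positions then compare the same way before and after the mapping, while the actual values are gone.

```python
def comparison_equivalent(xs):
    """Map xs to integers (dense ranks) so that xs[i] < xs[j] iff result[i] < result[j]."""
    order = sorted(range(len(xs)), key=xs.__getitem__)  # indices in stable sorted order
    ranks = [0] * len(xs)
    rank = 0
    for prev, cur in zip(order, order[1:]):
        if xs[prev] < xs[cur]:   # strictly larger: next rank; otherwise a tie keeps the rank
            rank += 1
        ranks[cur] = rank
    return ranks


print(comparison_equivalent(["b", "a", "c", "a"]))  # → [1, 0, 2, 0]
```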

Background

The goal of the benchmark is to collect real-world data from Python applications to better understand the effectiveness of adaptive features in list sort functions. In my PyCon US 2023 talk, I reached out to Pythonistas to contribute their sorting inputs. If the lists being sorted were completely random, we would never see (significant) improvements from adaptive features, but real-world data is hardly ever completely random.
How much pre-sortedness is there in your use case? Let’s find out!

Step 1: Build instrumented CPython

Clone the instrumented branch of CPython; currently, we support 3.11 and 3.10. (If you dearly need another version, drop me a line and we can add it.)

git clone https://github.com/sebawild/cpython --branch 3.11-instrumented --single-branch cpython-sorting
cd cpython-sorting

The steps below assume Linux and an already set-up development environment (check the official instructions). For a core installation, only standard C build tools are needed, plus the OpenSSL headers. (On Ubuntu, you get the latter via sudo apt-get install libssl-dev.)

./configure --enable-optimizations && make -j
make test 

Step 2: Set up your project

First, we create a venv (a virtual environment to keep installed packages local). Inside cpython-sorting, call

./python -m venv sorting-python
source sorting-python/bin/activate

to create and activate the sorting-python venv. Now you can use pip in the usual way to install any needed packages.

Step 3: Run your application and submit `arrays.txt`

You run your application as normal: python your-awesome-script.py.

To collect the benchmark data, first delete arrays.txt (results are otherwise appended) and run your application. Then store arrays.txt and send it over, with a quick description of your application.

Afterwards arrays.txt will contain all sorted lists (and some stats). Note that even during the process of starting python, a few dozen calls to list sort are made (mostly on tiny lists); for the benchmark, we are mostly interested in big lists.

A rudimentary script to read an arrays.txt file and compute some presortedness metrics is implemented in run-information.py. Simply running python run-information.py (in the same folder) will print stats on the longest sorted list (by default). This is sufficient to check whether your application sorted any substantially long lists at all. If so, please send your arrays.txt to me.
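Independently of run-information.py (whose metrics are not reproduced here), a quick way to gauge the presortedness of a single Python list is to split it into maximal ascending runs and compute the entropy of the run lengths, the quantity in which Powersort’s merge cost is near-optimal. The following is a hypothetical sketch; the function names are made up, and for simplicity it only detects weakly ascending runs, whereas Timsort and Powersort also detect strictly descending runs and reverse them.

```python
import math


def ascending_runs(xs):
    """Lengths of the maximal weakly ascending runs of xs, from left to right."""
    if not xs:
        return []
    lengths, current = [], 1
    for prev, cur in zip(xs, xs[1:]):
        if cur < prev:        # a descent ends the current run
            lengths.append(current)
            current = 1
        else:
            current += 1
    lengths.append(current)
    return lengths


def run_entropy(lengths):
    """Entropy H = sum of (L/n) * log2(n/L) over run lengths L, in bits."""
    n = sum(lengths)
    return sum(L / n * math.log2(n / L) for L in lengths)


print(ascending_runs([1, 2, 3, 2, 5, 1]))  # → [3, 2, 1]
print(run_entropy([2, 2]))                 # → 1.0
```

Few, long runs give a small entropy (much presortedness); a random list of length n has many short runs and an entropy close to log2 of the number of runs.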

Limitations

The instrumentation is a quick hack at this point, not production-ready code. It is hence best to run code via our python in a sandbox environment.

Known limitations:

The output file arrays.txt is appended to each time you run python, so it can grow large.
Our instrumentation is not ready for multi-threading. The instrumentation may crash python in obscure scenarios such as comparison functions that modify the sorted list.

Build CPython from source and install packages

Tue, 07 Feb 2023 00:00:00 +0100

For experimenting with novel CPython features, you can quickly set up an isolated environment. This post shows you how to do that.

I did this on Ubuntu 20.04 LTS with standard build tools installed, but the same instructions probably work more generally.

Compile python

Download latest CPython sources

git clone git@github.com:python/cpython.git

Change to a stable branch instead of main (so that we don’t have to build all libraries from source); here we’re using 3.11, the latest stable branch:

cd cpython
git checkout 3.11

To run the build, use the following (standard) commands.

./configure --enable-optimizations
make
make test

--enable-optimizations first builds an instrumented interpreter, runs a training workload to collect a profile, and then compiles again, using that profile to guide the compiler's optimizations (profile-guided optimization).

make test may not be necessary, but is probably not a bad idea. For me, test_ssl fails, but I’ll ignore that for now.

Note: If you want several builds to compare, you need a full copy of the source (cpython root) folder for each of them; you can build in a subfolder, but all builds still share the Lib folder, and hence only the latest build works correctly. This seems to remain the case even with a venv that isolates the installed packages. Likewise, a build stops working correctly once you switch the git checkout to a different version, because it still uses the Lib subfolder from the cpython repo.

pip bootstrap

The better option: Create a virtual environment, see below.

So far, the compilation generated a naked python executable that is just the Python interpreter. For almost anything interesting, we will have to install packages, and the most convenient way for that is pip.

Python already comes with a bootstrap module to do that (https://pip.pypa.io/en/stable/installation/):

./python -m ensurepip --upgrade

That’s it! Now you can run

./python -m pip install numpy pandas

etc. to install packages. These all get installed into the system-wide folder, as

./python -m pip show pandas

reveals.

Create a venv

A virtual environment is a folder with everything Python needs, isolated from other installations.

./python -m venv my-python

generates a virtual environment in the subfolder my-python.

source my-python/bin/activate

makes this the active venv for the currently running shell. Check python --version to see if it worked.

From now on, you can use python instead of ./python and pip directly instead of ./python -m pip etc.

Moreover, a call to

python -m pip show pandas

reveals that the packages are now local to your project (really, to the venv my-python), which is much better isolation.
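To double-check from inside Python whether a venv is active (for example, at the start of your own script), a small sketch can compare sys.prefix, which points to the venv folder when one is active, with sys.base_prefix, which points to the Python installation the venv was created from. The helper name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv():
    """True if this interpreter runs inside a venv (sys.prefix differs from sys.base_prefix)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("prefix:     ", sys.prefix)       # the venv folder, e.g. .../my-python, if active
    print("executable: ", sys.executable)   # the python binary actually running
```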

Why a stable branch?

CPython is reasonably easy and quick to compile, so why not simply work with the current main branch? The main reason (no pun intended) is to be able to install any Python packages with pip without much hassle. For major releases (like 3.11), PyPI has precompiled “wheels” of many popular packages, so installing them is very quick and does not require all their build dependencies.

Since Python version jumps often affect the C API, many libraries also lag a bit behind CPython main and will not easily be usable with the development branch.

Powersort in official Python 3.11 release

Mon, 24 Oct 2022 00:00:00 +0200

Our sorting method Powersort is now the default list.sort() algorithm in CPython, the reference implementation of the Python programming language.

Join the Powersort Competition
Help us study Timsort and Powersort and win substantial prizes!

See my PyCon US talk for the full story.
Here’s the entry from the official Python changelog:

bpo-34561: List sorting now uses the merge-ordering strategy from Munro and Wild’s powersort(). Unlike the former strategy, this is provably near-optimal in the entropy of the distribution of run lengths. Most uses of list.sort() probably won’t see a significant time difference, but may see significant improvements in cases where the former strategy was exceptionally poor. However, as these are all fast linear-time approximations to a problem that’s inherently at best quadratic-time to solve truly optimally, it’s also possible to contrive cases where the former strategy did better.
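To make “merge-ordering strategy” concrete, here is a minimal Python sketch of the Powersort merge policy that only computes the total merge cost for given (positive) run lengths, without moving any data. It is a simplified reconstruction for illustration, not CPython’s code: it ignores minimum run lengths, descending runs, and galloping. Each boundary between two neighboring runs gets a “power”, the first binary digit in which the relative midpoints of the two runs differ; pending runs on the stack are merged while the power on top of the stack exceeds the power of the new boundary. The names node_power and powersort_merge_cost are hypothetical.

```python
def node_power(n, s1, n1, n2):
    """Power of the boundary between runs A = [s1, s1+n1) and B = [s1+n1, s1+n1+n2)
    in an array of length n: the first binary digit (after the point) in which
    the relative midpoints of A and B differ."""
    a = 2 * s1 + n1          # 2n times the relative midpoint of A
    b = 2 * (s1 + n1) + n2   # 2n times the relative midpoint of B
    power = 0
    # floor(midpoint * 2**power), computed exactly with integers as (a << power) // (2n)
    while (a << power) // (2 * n) == (b << power) // (2 * n):
        power += 1
    return power


def powersort_merge_cost(runs):
    """Total cost (sum of merge output sizes) of the Powersort merge policy."""
    if len(runs) <= 1:
        return 0
    n = sum(runs)
    cost = 0
    stack = []  # pending runs: (start, length, power of the boundary to their right)
    cur_start, cur_len = 0, runs[0]
    for next_len in runs[1:]:
        p = node_power(n, cur_start, cur_len, next_len)
        while stack and stack[-1][2] > p:  # merge pending runs with larger power
            start, length, _ = stack.pop()
            cur_start, cur_len = start, length + cur_len
            cost += cur_len
        stack.append((cur_start, cur_len, p))
        cur_start, cur_len = cur_start + cur_len, next_len
    while stack:  # finally, merge whatever is left, from right to left
        start, length, _ = stack.pop()
        cur_start, cur_len = start, length + cur_len
        cost += cur_len
    return cost


print(powersort_merge_cost([1, 2, 3]))  # → 9
```

For the runs [1, 2, 3], the policy merges the first two runs (cost 3) and then the result with the last run (cost 6), for a total cost of 9.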

The change had been included in the development version of CPython, but with the official release of Python 3.11, Powersort is now en route to being deployed to hundreds of millions of devices, on top of already being in active use in PyPy.

Update (June 2025)

Powersort has also been adopted for numpy, replacing the former Timsort implementation.

The University of Liverpool Powersort Competition is also still underway, with lots of prizes up for grabs!

Powersort is explained in my PyCon US 2023 talk (in my biased opinion, in a much clearer way than in our original publication 😅); more context is given in my Efficient Algorithms module in the unit on sorting, which has an intro to adaptive sorting (34min) and then covers Powersort itself (15min).

We also showed how to extend Powersort to multiway merges, which looks very promising in first experiments.

Coverage

ACM TechNews (2022-12-14)
University of Liverpool News story (2022-12-12)
on LinkedIn post
TechXPlore
London Daily News

Amortized analysis of resizing-array stacks

Thu, 20 Oct 2022 00:00:00 +0200

A rigorous proof that a stack implemented with doubling arrays has constant amortized time operations; written up here since it does not seem to appear in any of the standard algorithms books.

A well-known, fundamental data structure is the implementation of a stack using resizing arrays (a.k.a. doubling arrays), where we maintain an array of $C$ items for the $n$ elements of a stack, and whenever the array becomes full, we double its size, and whenever the array becomes less than one quarter full, we halve its size. This maintains the invariant that $\frac14 C \le n \le C$.

A folklore analysis shows that this achieves constant amortized cost for all stack operations, despite the occasional expensive resizing operations.

This analysis is not a particularly hard or surprising proof by any means, but it makes a great first nontrivial example of amortized analysis, and hence I wanted to show it in my Efficient Algorithms (COMP526) lectures; see Unit 2 – Fundamental Data Structures for the full context.

The goal is to show that while any individual push/pop in a resizing-array based stack might be expensive ($\Omega(n)$ cost), any sequence of operations is necessarily much cheaper, namely $O(1)$ time per operation on average. As the dominant operation, we count array accesses, i.e., any read or write access to an array.
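The following Python sketch makes this cost model concrete: a resizing-array stack that follows the rules analyzed here (double when a push finds the array full, halve to $C = 2n$ when a pop starts at $n = \frac14 C$) and counts every array read and write. Python lists resize themselves, so the fixed-size array is simulated by a preallocated list; the class name and the initial capacity of 1 are choices made for this example. An empty stack with capacity 1 does not satisfy $\frac14 C \le n$ and has $\Phi_0 = -\frac14$, so for $m$ operations ending with $n$ elements the analysis predicts at most $5m + 2.4n + 1$ accesses.

```python
import random


class ResizingArrayStack:
    """Stack on a resizing array that counts every array read and write."""

    def __init__(self):
        self._a = [None]   # simulated fixed-size array; capacity C = len(self._a), starts at 1
        self._n = 0        # number n of elements on the stack
        self.accesses = 0  # array reads + writes so far (the cost measure of the analysis)

    def __len__(self):
        return self._n

    def _resize(self, capacity):
        new = [None] * capacity
        for i in range(self._n):  # read old[i] and write new[i]: 2n accesses
            new[i] = self._a[i]
        self.accesses += 2 * self._n
        self._a = new

    def push(self, x):
        if self._n == len(self._a):      # full (n == C): copying push, double first
            self._resize(2 * len(self._a))
        self._a[self._n] = x             # write the new top element: 1 access
        self.accesses += 1
        self._n += 1

    def pop(self):
        if self._n == 0:
            raise IndexError("pop from empty stack")
        if 4 * self._n == len(self._a):  # n == C/4: copying pop, halve first so that C == 2n
            self._resize(len(self._a) // 2)
        self._n -= 1
        self.accesses += 1               # read the top element: 1 access
        return self._a[self._n]


if __name__ == "__main__":
    random.seed(1)
    s, m = ResizingArrayStack(), 100_000
    for _ in range(m):
        if len(s) > 0 and random.random() < 0.5:
            s.pop()
        else:
            s.push(None)
    print("accesses:", s.accesses, "bound:", 5 * m + 2.4 * len(s) + 1)
```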

Part 1: Amortized costs for all operations

Basically, each operation has two types of costs for the amortized analysis: actual costs (# array accesses) and a change in potential/credits. We define the potential $\Phi = \min\lbrace n-\frac14C,\;C-n\rbrace$, and the amortized cost $a_i$ of an operation is the actual cost plus $-4$ times the change in potential. The intuition behind $\Phi$ is to measure the distance of the current filling mark $n$ from the “expensive boundaries” $\frac14C$ and $C$.

We have to analyze both costs separately.

Actual costs:

cheap push/pop: exactly 1 array access to write/read the topmost element.
copying push: currently there are $n$ elements on the stack, these have to be read from the old array ($n$ accesses) and written to the new array ($n$ accesses); also one more element has to be added (like in cheap push). In total that is $2n+1$ actual cost.
copying pop: actually exactly the same: there are $n$ elements on the stack, these have to be read from the old array ($n$ accesses) and written to the new array ($n$ accesses); also one element has to be read to be returned. In total that is $2n+1$ actual cost.

(One could avoid this very last extra read by not copying the element that we pop right afterwards anyway; but typical implementations do not do this, for convenience. It would clearly not save much either way.)

Credits / Potential change

The credits are given by the change in potential $\Phi = \min\lbrace n-\frac14C,\;C-n\rbrace$.

cheap push: $n$ gets one bigger, but $C$ is unchanged. If $C-n < n-\frac14 C$, then $\Phi$ drops by one (“we lose one credit”).
cheap pop: $n$ gets one smaller, but $C$ is unchanged. If $n-\frac14 C<C-n$, then $\Phi$ drops by one (“we lose one credit”).
copying push: We must have had $n=C$ (i.e. $\Phi_{i-1}=0$) before this push, and we will now set $C=2n$ before the push. Then, the push increments $n$. That means the new potential is $\Phi_i=(n+1)-\frac14\cdot2n=\frac12n+1$ (for $n\ge4$, where this is the smaller of the two terms in the minimum). We have earned $\frac12n+1$ credits.
copying pop: We must have had $n=\frac14C$ (i.e. $\Phi_{i-1}=0$) before this pop, and we will now make $C=2n$ before the pop; the pop itself then decrements $n$. So $\Phi_i=(n-1)-\frac14\cdot 2n = \frac12n-1$, and we have earned $\frac12n-1$ credits.

Adding up

Adding up actual cost and $-4(\Phi_i-\Phi_{i-1})$ shows that in each case the amortized costs are at most 5.
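Spelled out per case, with $n$ the stack size before the operation and using that a cheap push or pop changes $\Phi$ by at most one in either direction:

\[\begin{aligned}
\text{cheap push/pop:}\quad & a_i = 1 - 4(\Phi_i - \Phi_{i-1}) \le 1 + 4 = 5,\\
\text{copying push:}\quad & a_i = (2n+1) - 4\bigl(\tfrac12 n + 1 - 0\bigr) = -3,\\
\text{copying pop:}\quad & a_i = (2n+1) - 4\bigl(\tfrac12 n - 1 - 0\bigr) = 5.
\end{aligned}\]

(The copying-push line uses $\Phi_i = \frac12 n + 1$, which holds for $n \ge 4$; for $n \le 3$, a direct check with $\Phi_i = \min\lbrace \frac12 n + 1,\; n-1\rbrace$ also gives $a_i \le 5$.)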

Part 2: From amortized to total actual costs

The second part is indeed the same for all amortized analyses: The total actual cost over a sequence of $m$ operations is essentially bounded by the sum of their amortized costs, plus initial/final potential; this is shown using a telescoping-sum argument:

\[5m \ge \sum_{i=1}^m a_i = \sum_{i=1}^m c_i - 4 \underbrace{\sum_{i=1}^m(\Phi_i - \Phi_{i-1})}_{=\Phi_m - \Phi_0}\]

Rearranging gives

\[\sum_{i=1}^m c_i \le 5m + 4\Phi_m-4\Phi_0\]

Now, we can also show using the invariant $\frac14 C \le n \le C$, i.e., $n\le C \le 4n$, that $0\le \Phi\le \frac35n$: Since $\Phi$ is piecewise linear, it suffices to consider the endpoints of the linear segments, i.e., $C = 4n$, $C = n$ and $n-\frac14 C = C-n$, i.e., $C = \frac85 n$; at these points $\Phi$ has values $0$, $0$, and $\frac35 n$, respectively.

Hence, with $n$ the final stack size and $\Phi_0 \ge 0$, $\displaystyle\sum_{i=1}^m c_i \le 5m + 4\Phi_m - 4\Phi_0 \le 5m + 2.4n \in \Theta(m+n)$.

Increase the number of recent folders in Thunderbird

Thu, 27 Jan 2022 00:00:00 +0100

Showing more than 15 recent folders in move-to and copy-to context menus is easy in Thunderbird 91.

Since I’m a heavy user of many IMAP folders for organizing email (and of Günter Gersdorf’s brilliant Thunderbird extension Copy Sent to Current), moving emails to folders quickly is important to me.

Thunderbird has long remembered which folders were used most recently, offering to move or copy mails there via a separate menu, but the default number of folders shown there was a miserly 15. Previously, increasing that number required a rather hidden hack, but in the latest version of Thunderbird (91 at the time of writing), it is easy:

Simply open the “Config Editor” in the preferences, and change the key mail.folder_widget.max_recent to your preferred value; 40 for me.

How to move your lecture online – in little time

Tue, 17 Mar 2020 00:00:00 +0100

I describe my solution for online lecturing amid the COVID-19 crisis using youtube livestreams and PINGO.

Although I kind of saw it coming after reading this excellent data analysis (on March 12, before things got really crazy), things did get hectic: The official decision of the University of Liverpool to move all face-to-face classes online with immediate effect came on Saturday evening (March 14), with my class due on Monday, March 16. So what follows is not the well-thought-out, technically sophisticated and educationally up-to-date mode of online teaching I (and you) might dream of, but it is what allowed me to deliver an (according to isolated feedback) effective online lecture with less than one day of prep time.

Where I started: Good old screencasts …

I have been recording screencasts of my lectures for COMP526 and posting them on youtube all term. The methodology for that (on Ubuntu) is basically still as described here, only with an update of my laptop (now an HP Elitebook x360) and Xournal++.

So a reasonable mic, screencasting software (SimpleScreenRecorder) and a website to post videos and lecture notes were already set up.

… and in-class formative assessments (aka clicker questions)

What I also came to like as an effective tool is an in-class response system to quickly ask for opinions and prior knowledge, recap definitions, and test understanding. (I have been using PINGO for that.)

So my initial contingency plan was to record the lectures at home and upload them. But what was missing was a way to keep the clicker questions; and (I know how these things go first hand) I was afraid that without an incentive to keep on track with watching the videos, some students would all too easily fall behind.

I was determined to not let that happen (quite so easily).

Going live!

My solution was to use youtube livestreams for the lectures; how to do that is explained below. That way, we (the students and myself that is) would be seeing the same screen (almost – more on that later) in real time, and I could simply continue with the clicker questions.

Youtube also has a “live chat” that offers a (limited) backchannel for students to ask questions (which quite a few did!), signal technical problems (none yet, luckily!), or give a quick “hands” on who is still following.

Quick how-to for youtube livestreams

(Here is youtube’s detailed manual on that; you want the “encoder streaming”.)

After signing into your youtube account, click on CREATE → Go Live (top right). There you pick “Stream” (the middle tab at the top). I did not change the defaults, except for setting the stream latency to “ultra low”. In the top right, you can get a link that you can share with students even ahead of time.

Now, to stream your screen content (or part of it), you need an encoder. I had good experiences with SimpleScreenRecorder, and indeed you can use it for this, too. The screenshot below shows the settings I used; what goes into the “Save as” box is shown in the youtube stream settings as “Stream URL” and “Stream key”; the entry simply is <Stream URL>/<Stream key>.

Selecting AAC as audio codec is vital! The mp3 encoder (selected by default) does not work with youtube, but the error messages don’t tell you that.

Settings for streaming to youtube.

Then you click Continue and simply start your recording. The youtube stream settings site should now show your screen content (with a few seconds delay).

Clicking on “GO LIVE” (top right) starts the actual livestream.

On my (fairly new) laptop, downscaling the 4K display to 1080p and encoding as x264 did put considerable load on the machine, but I did not experience severe problems, so I did not try to play with the encoder settings at all. Your mileage may vary.

As for all youtube videos, you can configure the visibility of the stream as “unlisted”, then only students with the link can view the video, but no-one can find it through search; “public” videos appear in searches. If you choose “private”, people have to be signed in, and I did not want to force students to do that.

Phone for backchannel

During the lecture, I kept the youtube app on my Android open to see messages coming in on the live chat. (This is also a great way to test if your stream works.)

Aftermath

For consistency, I split the recorded livestream into individual videos for each subsection, using the youtube studio editor, but also keep the livestream itself (as an unlisted video). (Pretend to trim the live stream, but then instead of “SAVE” click the three dots and “SAVE AS NEW”). The nicely cut videos are then linked to from my course website, e.g., here.

Cutting videos takes a few extra minutes that I did not need when locally recording in class, where I could easily start a new recording with one click. But it makes the recordings much easier to navigate and use later on.

First impressions

After using this setup for 3 hours of lectures, I am overall fairly happy. It does not cost much more preparation for me (although a bit extra time for cutting videos) and is clearly superior to only uploading videos. The livestream always had well below 10s of delay, which is totally fine for the interactive questions, and sound and video quality are excellent.

Compared to (my experiences with) video conferencing solutions Cisco WebEx, Skype, and Microsoft Teams, the stream is clearly superior in quality and stability, and the resulting recordings are essentially indistinguishable from the ones I recorded in face-to-face lectures.

A clear downside of my approach is the missing instant feedback from looking around at the audience’s faces. (I usually have 30-50 students in class, so this was very doable.) I used to look around for this instant feedback very frequently (I lecture facing the audience in the lecture rooms), so there is no way to replace this with the same number of PINGO questions.

Ideally, I’d like to have an additional (informal, anonymous) “quick-emotions” backchannel with buzzers for “I’m lost”, “I got it”, and “I need a break” (or so) that students could continuously push (as opposed to questions I have to trigger). So far, I have not found a service for that.

Install pdf2htmlEX on recent Ubuntu

Tue, 01 Jan 2019 00:00:00 +0100

Because of unresolved dependencies, installing pdf2htmlEX has become challenging on recent Ubuntu versions.

Update [2024-11]

For Ubuntu 24.04, the situation seems to again have changed. While the version from pdf2htmlex.github.io still works, it does fail to convert some PDFs for me. I have not yet found a solution for this, but I will update this post when I do.

The old docker image built by bwits is still available and works fine, including all the other steps described below, so for now (and again, until the team at pdf2htmlex.github.io has an updated build), the docker container is the way to go.

Update [2022-09]

Much of the complication below can now be avoided! A few developers – worthy of our collective Thanks! – revived pdf2htmlEX and ported it to new versions of poppler and fontforge. Their effort lives on pdf2htmlex.github.io and they offer various prepackaged releases, including AppImages.

pdf2htmlEX in docker

I use pdf2htmlEX to make pdfs nicely readable in the browser. pdf2htmlEX relies on a custom version of the poppler library, and support for more recent versions of poppler has not been built into it yet. Since no new maintainer has been found, people started to look for alternatives to keep using pdf2htmlEX productively, without being forced to stay on old libraries systemwide. Docker containers are a solution for precisely such use cases.

I here describe the steps that it took me to get pdf2htmlEX running on Ubuntu 18.04.1 LTS; I was fine with a certain overhead (in time and space) for running it, but I wanted direct command-line interaction on individual files. Since docker containers are isolated from the host system, this requires some extra steps.

First install docker; I used the snap version, so I ran:

sudo snap install docker

Next, I pulled the prepackaged docker container by bwits:

sudo docker pull bwits/pdf2htmlex

For running pdf2htmlEX conveniently and (somewhat) securely, you should be able to run docker as user; this is not possible directly since docker uses Unix sockets owned by root for communicating with containers. But if you create a group docker and add yourself to it, the socket will be owned by that group instead. So:

sudo groupadd docker
sudo usermod -aG docker $USER

You probably have to reboot (or at least log out and restart the docker daemon) before this takes effect; you can test it with docker run hello-world.

If everything worked out, we can now run pdf2htmlEX as

docker run -ti --rm -v `pwd`:/pdf bwits/pdf2htmlex pdf2htmlEX [args] file.pdf

to convert file.pdf in the current working directory. Note that the application inside the container only gets access to the folder you map to /pdf using the -v option, i.e., in the above command the current directory.
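For converting many files at once, a small hypothetical Python wrapper can assemble the same docker run command for each PDF in a folder. The helper names are made up for this example; the wrapper drops -ti since no interactive terminal is needed for batch runs, and it assumes, exactly as the command above does, that pdf2htmlEX inside the bwits/pdf2htmlex image finds file.pdf in the mounted /pdf folder.

```python
import os
import subprocess
from pathlib import Path

IMAGE = "bwits/pdf2htmlex"  # the prepackaged image pulled above


def pdf2htmlex_command(pdf_name, extra_args=(), workdir=None):
    """Build the docker command that converts pdf_name, a file directly inside workdir."""
    workdir = os.path.abspath(workdir or os.getcwd())
    # Only workdir is visible inside the container, mounted at /pdf.
    return ["docker", "run", "--rm", "-v", f"{workdir}:/pdf",
            IMAGE, "pdf2htmlEX", *extra_args, pdf_name]


def convert_all(workdir=".", extra_args=()):
    """Convert every *.pdf directly inside workdir, one container run per file."""
    for pdf in sorted(Path(workdir).glob("*.pdf")):
        subprocess.run(pdf2htmlex_command(pdf.name, extra_args, workdir), check=True)
```

Calling convert_all() from inside the folder with your PDFs then runs one short-lived container per file; check=True stops at the first failed conversion.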

Why DOIs Rule

Fri, 13 Jul 2018 00:00:00 +0200

DOIs (digital object identifiers) are much more than a unique id for scientific papers.

Making the style for bibliographies consistent probably ranks among the least favorite tasks of researchers who would like to disseminate their findings. Thanks to LaTeX and BibTeX, the task of citing other research is mostly reduced to curating a high-quality bib-file of references.

But that shifts the problem to getting high-quality bib entries! My experiences with publisher-provided bib-entries and services like Google Scholar were very mixed – most required manual tweaking (and double checking!) of the entries.

This should be much easier. All it needs is a well-curated database of metadata for scientific research (maintained by those who care for consistency: the publishers!), but with a machine-readable, well-defined interface to be used by some tool created by someone who understands BibTeX well (this, it seems, is not exactly the publishers’ strength …).

Luckily, both exist! DOIs (digital object identifiers) are not just an id for papers; they also serve as keys in exactly such a database. And with doi2bib.org, there is a service that produces high-quality bib entries from a DOI.

This brings us one step closer to a system in which the TeX source only gives the DOI and everything else is taken care of automatically (retaining the option for manual tweaking, as with the bbl files of BibTeX).
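As a minimal sketch of the DOI-to-BibTeX step: the doi.org resolver supports content negotiation, so requesting https://doi.org/<DOI> with the header Accept: application/x-bibtex returns a BibTeX entry generated from the registered metadata, at least for DOIs registered with Crossref. This is a different route than doi2bib.org and may produce differently formatted entries; the function names below are hypothetical.

```python
import urllib.parse
import urllib.request


def bibtex_request(doi):
    """Build a request asking the doi.org resolver for a BibTeX rendering of doi."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi, timeout=10):
    """Resolve doi (urllib follows the redirects) and return the BibTeX entry as text."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_bibtex with a DOI string (this needs network access) returns the entry as text that can be appended to a bib file; as the update below shows, such automatically generated entries still deserve a quick check.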

Update: Nothing is infallible

I found a case where I was not happy with the result of doi2bib: For papers in Springer journals that first appear online and later in the printed journal, doi2bib mixes the two entries. Month and year of publication are set to the first online version, but volume and issue number are also filled in, so that the resulting bib entry looks as if the printed issue appeared earlier. This is confusing; I would rather use only the final printed information, and I ended up manually adapting the bib files.