Skip to content

Make musicbrainz plugin talk to musicbrainz directly#6052

Merged
snejus merged 15 commits intomasterfrom
use-musicbrainz-api-directly
Dec 21, 2025
Merged

Make musicbrainz plugin talk to musicbrainz directly#6052
snejus merged 15 commits intomasterfrom
use-musicbrainz-api-directly

Conversation

@snejus
Copy link
Member

@snejus snejus commented Sep 29, 2025

This PR refactors the MusicBrainz plugin implementation by replacing the musicbrainzngs library with direct HTTP API calls using requests and requests-ratelimiter.

Key Changes:

  • New utilities module: Added beetsplug/_utils/requests.py with TimeoutSession class and HTTP error handling (HTTPNotFoundError, CaptchaError)
  • MusicBrainz API rewrite: Replaced musicbrainzngs dependency with custom MusicBrainzAPI class using direct HTTP requests
  • Rate limiting: Integrated requests-ratelimiter for API rate limiting instead of musicbrainzngs.set_rate_limit()
  • Data structure updates: Updated field names to match MusicBrainz JSON API v2 format (e.g., medium-listmedia, track-listtracks)
  • Dependency management:
    • Made musicbrainzngs optional and added it to plugin-specific extras (listenbrainz, mbcollection, missing, parentwork). Updated plugin docs accordingly.
    • Made requests a required dependency to ensure backwards compatibility (ideally, we would make it an optional dependency under musicbrainz extra).
  • Error handling: Simplified error handling by removing MusicBrainzAPIError wrapper class

Benefits:

  • Direct control over HTTP requests
  • Consistent rate limiting across all network requests
  • Better alignment with modern MusicBrainz API responses

The changes maintain backward compatibility while modernizing the underlying implementation.

Fixes #5553
Fixes #5095

@snejus snejus requested a review from a team as a code owner September 29, 2025 00:01
@snejus snejus requested review from JOJ0, Copilot and semohr September 29, 2025 00:01
@github-actions
Copy link

Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces the musicbrainzngs library with direct HTTP API calls to the MusicBrainz API using requests and requests-ratelimiter. The goal is to provide more direct control over HTTP requests while maintaining backward compatibility.

Key changes:

  • Custom MusicBrainzAPI class replaces musicbrainzngs functionality
  • Updated field names to match MusicBrainz JSON API v2 format (e.g., medium-listmedia)
  • Made musicbrainzngs optional and added it to specific plugin extras

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/plugins/test_musicbrainz.py Updates test data to match new JSON API field names and fixes mock paths
pyproject.toml Moves musicbrainzngs to optional dependencies and adds requests as required
docs/plugins/*.rst Adds installation instructions for plugins requiring musicbrainzngs
beetsplug/musicbrainz.py Main refactor replacing musicbrainzngs with custom HTTP implementation
beetsplug/lyrics.py Updates imports to use shared HTTP utilities
beetsplug/_utils/requests.py New shared HTTP utility module
.git-blame-ignore-revs Adds commits to git blame ignore list

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • requests is now a required dependency but could be made optional under the musicbrainz extra to avoid imposing it on users who don't need the plugin.
  • With the removal of MusicBrainzAPIError, HTTP errors will surface as raw exceptions without context; consider wrapping these to include the entity and query for clearer error messages.
  • TimeoutSession registers an atexit callback each time it’s instantiated; to prevent multiple registrations you might move the close handler to module scope or guard against duplicates.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- requests is now a required dependency but could be made optional under the musicbrainz extra to avoid imposing it on users who don't need the plugin.
- With the removal of MusicBrainzAPIError, HTTP errors will surface as raw exceptions without context; consider wrapping these to include the entity and query for clearer error messages.
- TimeoutSession registers an atexit callback each time it’s instantiated; to prevent multiple registrations you might move the close handler to module scope or guard against duplicates.

## Individual Comments

### Comment 1
<location> `beetsplug/musicbrainz.py:125` </location>
<code_context>
+    def browse_recordings(self, **kwargs) -> list[JSONDict]:
+        kwargs.setdefault("limit", BROWSE_CHUNKSIZE)
+        kwargs.setdefault("inc", BROWSE_INCLUDES)
+        return self._get("recording", **kwargs)["recordings"]
+
+
</code_context>

<issue_to_address>
**suggestion:** Assumes 'recordings' key is always present in API response.

Using .get("recordings", []) will prevent KeyError if the response structure changes or an error occurs.

```suggestion
        return self._get("recording", **kwargs).get("recordings", [])
```
</issue_to_address>

### Comment 2
<location> `beetsplug/musicbrainz.py:270` </location>
<code_context>

     for country in preferred_countries:
-        for event in release.get("release-event-list", {}):
+        for event in release.get("release-events", {}):
             try:
-                if country in event["area"]["iso-3166-1-code-list"]:
</code_context>

<issue_to_address>
**issue (bug_risk):** Iterating over release-events with a default of {} may cause issues.

Use [] as the default value to ensure safe iteration and prevent TypeError if 'release-events' is missing.
</issue_to_address>

### Comment 3
<location> `beetsplug/musicbrainz.py:523-526` </location>
<code_context>
-            for medium in release["medium-list"]:
-                for recording in medium["track-list"]:
+            for medium in release["media"]:
+                for recording in medium["tracks"]:
                     recording_info = track_map[recording["recording"]["id"]]
                     recording["recording"] = recording_info
</code_context>

<issue_to_address>
**suggestion:** Assumes all 'tracks' entries have a 'recording' key.

Using recording['recording'] without checking may cause a KeyError if the key is missing. Use recording.get('recording') and handle cases where it is absent.

```suggestion
            for medium in release["media"]:
                for recording in medium["tracks"]:
                    rec = recording.get("recording")
                    if rec is not None and "id" in rec:
                        recording_info = track_map[rec["id"]]
                        recording["recording"] = recording_info
```
</issue_to_address>

### Comment 4
<location> `beetsplug/musicbrainz.py:716-719` </location>
<code_context>

             for source, url in product(wanted_sources, url_rels):
-                if f"{source}.com" in (target := url["target"]):
+                if f"{source}.com" in (target := url["url"]["resource"]):
                     urls[source] = target
                     self._log.debug(
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Assumes 'url' and 'resource' keys are always present in streaming relations.

Using .get() for 'url' and 'resource' can prevent KeyErrors if these keys are missing.

```suggestion
            for source, url in product(wanted_sources, url_rels):
                target = url.get("url", {}).get("resource")
                if target and f"{source}.com" in target:
                    urls[source] = target
                    self._log.debug(
```
</issue_to_address>

### Comment 5
<location> `beetsplug/_utils/requests.py:16-18` </location>
<code_context>
+class TimeoutSession(requests.Session):
+    def __init__(self, *args, **kwargs) -> None:
+        super().__init__(*args, **kwargs)
+        beets_version = importlib.metadata.version("beets")
+        self.headers["User-Agent"] = f"beets/{beets_version} https://beets.io/"
+
</code_context>

<issue_to_address>
**suggestion:** Using importlib.metadata.version may raise PackageNotFoundError if 'beets' is not installed as a package.

If 'beets' might not be installed as a package, handle PackageNotFoundError and set a default version string.

```suggestion
import importlib.metadata

class TimeoutSession(requests.Session):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        try:
            beets_version = importlib.metadata.version("beets")
        except importlib.metadata.PackageNotFoundError:
            beets_version = "unknown"
        self.headers["User-Agent"] = f"beets/{beets_version} https://beets.io/"
```
</issue_to_address>

### Comment 6
<location> `beetsplug/musicbrainz.py:652-655` </location>
<code_context>
        if "secondary-types" in release["release-group"]:
            if release["release-group"]["secondary-types"]:
                for sec_type in release["release-group"]["secondary-types"]:
                    albumtypes.append(sec_type.lower())

</code_context>

<issue_to_address>
**suggestion (code-quality):** Merge nested if conditions ([`merge-nested-ifs`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/merge-nested-ifs))

```suggestion
        if "secondary-types" in release["release-group"] and release["release-group"]["secondary-types"]:
            for sec_type in release["release-group"]["secondary-types"]:
                albumtypes.append(sec_type.lower())

```

<br/><details><summary>Explanation</summary>Too much nesting can make code difficult to understand, and this is especially
true in Python, where there are no brackets to help out with the delineation of
different nesting levels.

Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two `if` conditions can be combined using
`and` is an easy win.
</details>
</issue_to_address>

### Comment 7
<location> `beetsplug/musicbrainz.py:177-180` </location>
<code_context>
def _multi_artist_credit(
    credit: list[JSONDict], include_join_phrase: bool
) -> tuple[list[str], list[str], list[str]]:
    """Given a list representing an ``artist-credit`` block, accumulate
    data into a triple of joined artist name lists: canonical, sort, and
    credit.
    """
    artist_parts = []
    artist_sort_parts = []
    artist_credit_parts = []
    for el in credit:
        alias = _preferred_alias(el["artist"].get("aliases", ()))

        # An artist.
        if alias:
            cur_artist_name = alias["name"]
        else:
            cur_artist_name = el["artist"]["name"]
        artist_parts.append(cur_artist_name)

        # Artist sort name.
        if alias:
            artist_sort_parts.append(alias["sort-name"])
        elif "sort-name" in el["artist"]:
            artist_sort_parts.append(el["artist"]["sort-name"])
        else:
            artist_sort_parts.append(cur_artist_name)

        # Artist credit.
        if "name" in el:
            artist_credit_parts.append(el["name"])
        else:
            artist_credit_parts.append(cur_artist_name)

        if include_join_phrase and (joinphrase := el.get("joinphrase")):
            artist_parts.append(joinphrase)
            artist_sort_parts.append(joinphrase)
            artist_credit_parts.append(joinphrase)

    return (
        artist_parts,
        artist_sort_parts,
        artist_credit_parts,
    )

</code_context>

<issue_to_address>
**suggestion (code-quality):** Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
        cur_artist_name = alias["name"] if alias else el["artist"]["name"]
```
</issue_to_address>

### Comment 8
<location> `beetsplug/musicbrainz.py:271-275` </location>
<code_context>
def _preferred_release_event(
    release: dict[str, Any],
) -> tuple[str | None, str | None]:
    """Given a release, select and return the user's preferred release
    event as a tuple of (country, release_date). Fall back to the
    default release event if a preferred event is not found.
    """
    preferred_countries: Sequence[str] = config["match"]["preferred"][
        "countries"
    ].as_str_seq()

    for country in preferred_countries:
        for event in release.get("release-events", {}):
            try:
                if country in event["area"]["iso-3166-1-codes"]:
                    return country, event["date"]
            except KeyError:
                pass

    return release.get("country"), release.get("date")

</code_context>

<issue_to_address>
**issue (code-quality):** Use `contextlib`'s `suppress` method to silence an error ([`use-contextlib-suppress`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-contextlib-suppress/))
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@codecov
Copy link

codecov bot commented Sep 29, 2025

Codecov Report

❌ Patch coverage is 84.34783% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.36%. Comparing base (ac0b6ec) to head (5785ce3).
⚠️ Report is 168 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
beetsplug/musicbrainz.py 76.61% 21 Missing and 8 partials ⚠️
beetsplug/lyrics.py 83.33% 4 Missing and 1 partial ⚠️
beetsplug/_utils/requests.py 98.50% 0 Missing and 1 partial ⚠️
beetsplug/mbpseudo.py 88.88% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6052      +/-   ##
==========================================
+ Coverage   68.03%   68.36%   +0.32%     
==========================================
  Files         137      138       +1     
  Lines       18708    18773      +65     
  Branches     3163     3172       +9     
==========================================
+ Hits        12728    12834     +106     
+ Misses       5309     5263      -46     
- Partials      671      676       +5     
Files with missing lines Coverage Δ
beetsplug/_utils/requests.py 98.50% <98.50%> (ø)
beetsplug/mbpseudo.py 80.00% <88.88%> (+0.37%) ⬆️
beetsplug/lyrics.py 85.12% <83.33%> (+6.82%) ⬆️
beetsplug/musicbrainz.py 81.15% <76.61%> (+1.98%) ⬆️
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@snejus snejus mentioned this pull request Sep 29, 2025
@snejus snejus force-pushed the use-musicbrainz-api-directly branch 2 times, most recently from b691012 to e11c7f3 Compare September 29, 2025 10:38
@henry-oberholtzer
Copy link
Member

I think there's also potential to be able to use the new request code here and refactor the Discogs plugin to make direct API calls as well, since we end up fetching the whole release object and parsing it like a dictionary anyway.

@Morikko
Copy link
Contributor

Morikko commented Sep 29, 2025

It may replace #5976 implementation over mbzero.

Copy link
Contributor

@semohr semohr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you already wanted a review here. Added some comments but limited myself to the requests logic 🙃

Having basic requests logic unified across plugins could reduce quite some boilerplate. Not sure if we want to go this route tho as we might loose some flexibility.

@Louson
Copy link

Louson commented Oct 8, 2025

It may replace #5976 implementation over mbzero.

Peharps this can be discussed in the original thread #4660

EDIT: I have updated the thread to resume the situation

@snejus snejus force-pushed the use-musicbrainz-api-directly branch 2 times, most recently from f30d96e to e52b7eb Compare October 20, 2025 07:14
@snejus
Copy link
Member Author

snejus commented Oct 20, 2025

It may replace #5976 implementation over mbzero.

I would think so, given that this requires no external dependencies and is simpler.

Copy link
Contributor

@semohr semohr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at the requests layer indepth and think it is quite the nice addition!

For the musicbrainz changes I can't say too much about them, as I haven't spend much time in the plugin beforehand. Nonetheless, the changes seem good to me 👍

We recently had a PR (#5888) adding a pseudo release musicbrainz plugin. Could you have a look and evaluate if there are any changes that might break this proposed new plugin?

@snejus snejus force-pushed the use-musicbrainz-api-directly branch 2 times, most recently from 3f5318e to 9aee0b1 Compare October 20, 2025 20:55
@snejus snejus force-pushed the use-musicbrainz-api-directly branch 2 times, most recently from 7e36ef5 to cd3ad3d Compare December 19, 2025 13:27
@snejus snejus force-pushed the use-musicbrainz-api-directly branch from 437d363 to 001dddc Compare December 20, 2025 01:37
@snejus
Copy link
Member Author

snejus commented Dec 20, 2025

I tested these adjustments by reimporting all musicbrainz tracks/albums in my library

$ beet -vv import -LI data_source:musicbrainz --plugins=musicbrainz,importreplace (and -s)

I found a couple of issues and fixed them, see the diff:

  • Added retry on connection errors (3 attempts, 1s, 2s and 4s later)
  • Fixed inc parameter format
  • Fixed URLs parsing

@henry-oberholtzer
Copy link
Member

That retry on connection might really help cut down on currently the persistent Musicbrainz unreachable error! Nice!

Introduce a new RequestHandler base class to introduce a shared session,
centralize HTTP request management and error handling across plugins.

Key changes:
- Add RequestHandler base class with a shared/cached session
- Convert TimeoutSession to use SingletonMeta for proper resource
  management
- Create LyricsRequestHandler subclass with lyrics-specific error
  handling
- Update MusicBrainzAPI to inherit from RequestHandler
@snejus snejus force-pushed the use-musicbrainz-api-directly branch 3 times, most recently from 8054f2d to 951181d Compare December 21, 2025 00:55
@snejus
Copy link
Member Author

snejus commented Dec 21, 2025

Have also added a test for the retry behaviour. Pretty sure it's ready now, merging!

See this line in https://musicbrainz.org/doc/MusicBrainz_API#Lookups

> To include more than one subquery in a single request, separate the arguments to inc= with a + (plus sign), like inc=recordings+labels.
@snejus snejus force-pushed the use-musicbrainz-api-directly branch from 951181d to 5785ce3 Compare December 21, 2025 01:03
@snejus snejus enabled auto-merge December 21, 2025 01:07
@snejus snejus merged commit c1904b1 into master Dec 21, 2025
19 checks passed
@snejus snejus deleted the use-musicbrainz-api-directly branch December 21, 2025 01:08
@aereaux
Copy link
Contributor

aereaux commented Dec 21, 2025

Hi! I'm getting this error with the parentwork plugin:

$beet parentwork -f
parentwork: No work for 10 String Symphony - 10 String Symphony - Prettiest Girl, add one at https://musicbrainz.org/recording/9cebf2bd-8fef-432a-b360-23d70ab03008
Traceback (most recent call last):
  File "/home/user/.local/bin/beet", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File "/home/user/.local/lib/beets/beets/ui/__init__.py", line 1631, in main
    _raw_main(args)
    ~~~~~~~~~^^^^^^
  File "/home/user/.local/lib/beets/beets/ui/__init__.py", line 1610, in _raw_main
    subcommand.func(lib, suboptions, subargs)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/beets/beetsplug/parentwork.py", line 92, in func
    changed = self.find_work(item, force_parent, verbose=True)
  File "/home/user/.local/lib/beets/beetsplug/parentwork.py", line 193, in find_work
    work_info, work_date = find_parentwork_info(item.mb_workid)
                           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/beets/beetsplug/parentwork.py", line 64, in find_parentwork_info
    parent_id, work_date = work_parent_id(mb_workid)
                           ~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/user/.local/lib/beets/beetsplug/parentwork.py", line 53, in work_parent_id
    new_mb_workid, work_date = direct_parent_id(mb_workid, work_date)
                               ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/beets/beetsplug/parentwork.py", line 29, in direct_parent_id
    work_info = musicbrainzngs.get_work_by_id(
        mb_workid, includes=["work-rels", "artist-rels"]
    )
  File "/usr/lib/python3.13/site-packages/musicbrainzngs/musicbrainz.py", line 904, in get_work_by_id
    return _do_mb_query("work", id, includes)
  File "/usr/lib/python3.13/site-packages/musicbrainzngs/musicbrainz.py", line 728, in _do_mb_query
    return _mb_request(path, 'GET', auth_required, args=args)
  File "/usr/lib/python3.13/site-packages/musicbrainzngs/musicbrainz.py", line 417, in __call__
    return self.fun(*args, **kwargs)
           ~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/musicbrainzngs/musicbrainz.py", line 624, in _mb_request
    raise UsageError("set a proper user-agent with "
                     "set_useragent(\"application name\", \"application version\", \"contact info (preferably URL or email for your application)\")")
musicbrainzngs.musicbrainz.UsageError: set a proper user-agent with set_useragent("application name", "application version", "contact info (preferably URL or email for your application)")

It looks like user-agent information was removed here: https://github.com/beetbox/beets/pull/6052/files#diff-3420b9b69bb3555786b2100c434079c83f948e7f1dc796869f4945c33577d3baL87 . Should this now be set elsewhere, or should the parentwork (and presumably other) plugins be refactored to use a similar approach as to here?

@henry-oberholtzer
Copy link
Member

henry-oberholtzer commented Dec 21, 2025

Looks like we didn't catch that side effect!

I think it'd be nice to get that plugin & others either using our new request system or fold it into the Musicbrainz plugin. In the interim we can probably patch in a user agent?

@aereaux
Copy link
Contributor

aereaux commented Dec 21, 2025

I might be able to do some work on parentwork in the next couple weeks. Or would the better plan be to include it directly in the musicbrainz plugin?

@snejus
Copy link
Member Author

snejus commented Dec 21, 2025

Oops, apologies!

I'm currently working on migrating the rest of plugins away from musicbrainzngs so I will take care of this :)

@aereaux
Copy link
Contributor

aereaux commented Dec 21, 2025

Great! I can test anything out when you have it!

@aereaux
Copy link
Contributor

aereaux commented Dec 28, 2025

It looks like the mbcollection plugin fails as well with this change:

$ beet mbupdate
error: MusicBrainz credentials missing

I assume this is something you're working on fixing with migrating the rest of the plugins, right?

snejus added a commit that referenced this pull request Mar 10, 2026
> This PR add support for aliases to releases, release-groups and
recordings.

> This PR is a must have (IMO at least) for people that listen to
Japanese, Chinese and other songs that has other symbols for letters.
With this, not only the artist name will use the alias if available, but
now the album and track name will also use aliases.

Follow up from #5277 based on the
recent new mechanism to query musicbrainz
(#6052).

## API examples

- Aliases for releases, recordings and release groups:
http://musicbrainz.org/ws/2/release/59211ea4-ffd2-4ad9-9a4e-941d3148024a?inc=recordings+aliases+release-groups&fmt=json
- Aliases for recordings:
https://musicbrainz.org/ws/2/recording/b9ad642e-b012-41c7-b72a-42cf4911f9ff?inc=aliases&fmt=json
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MusicBrainz connection network error handling differences between beets version 1.6 and 2+ urlopen error [Errno 12] Cannot allocate memory

9 participants