Make musicbrainz plugin talk to musicbrainz directly#6052
Conversation
|
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry. |
There was a problem hiding this comment.
Pull Request Overview
This PR replaces the musicbrainzngs library with direct HTTP API calls to the MusicBrainz API using requests and requests-ratelimiter. The goal is to provide more direct control over HTTP requests while maintaining backward compatibility.
Key changes:
- Custom
MusicBrainzAPIclass replacesmusicbrainzngsfunctionality - Updated field names to match MusicBrainz JSON API v2 format (e.g.,
medium-list→media) - Made
musicbrainzngsoptional and added it to specific plugin extras
Reviewed Changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/plugins/test_musicbrainz.py | Updates test data to match new JSON API field names and fixes mock paths |
| pyproject.toml | Moves musicbrainzngs to optional dependencies and adds requests as required |
| docs/plugins/*.rst | Adds installation instructions for plugins requiring musicbrainzngs |
| beetsplug/musicbrainz.py | Main refactor replacing musicbrainzngs with custom HTTP implementation |
| beetsplug/lyrics.py | Updates imports to use shared HTTP utilities |
| beetsplug/_utils/requests.py | New shared HTTP utility module |
| .git-blame-ignore-revs | Adds commits to git blame ignore list |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- requests is now a required dependency but could be made optional under the musicbrainz extra to avoid imposing it on users who don't need the plugin.
- With the removal of MusicBrainzAPIError, HTTP errors will surface as raw exceptions without context; consider wrapping these to include the entity and query for clearer error messages.
- TimeoutSession registers an atexit callback each time it’s instantiated; to prevent multiple registrations you might move the close handler to module scope or guard against duplicates.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- requests is now a required dependency but could be made optional under the musicbrainz extra to avoid imposing it on users who don't need the plugin.
- With the removal of MusicBrainzAPIError, HTTP errors will surface as raw exceptions without context; consider wrapping these to include the entity and query for clearer error messages.
- TimeoutSession registers an atexit callback each time it’s instantiated; to prevent multiple registrations you might move the close handler to module scope or guard against duplicates.
## Individual Comments
### Comment 1
<location> `beetsplug/musicbrainz.py:125` </location>
<code_context>
+ def browse_recordings(self, **kwargs) -> list[JSONDict]:
+ kwargs.setdefault("limit", BROWSE_CHUNKSIZE)
+ kwargs.setdefault("inc", BROWSE_INCLUDES)
+ return self._get("recording", **kwargs)["recordings"]
+
+
</code_context>
<issue_to_address>
**suggestion:** Assumes 'recordings' key is always present in API response.
Using .get("recordings", []) will prevent KeyError if the response structure changes or an error occurs.
```suggestion
return self._get("recording", **kwargs).get("recordings", [])
```
</issue_to_address>
### Comment 2
<location> `beetsplug/musicbrainz.py:270` </location>
<code_context>
for country in preferred_countries:
- for event in release.get("release-event-list", {}):
+ for event in release.get("release-events", {}):
try:
- if country in event["area"]["iso-3166-1-code-list"]:
</code_context>
<issue_to_address>
**issue (bug_risk):** Iterating over release-events with a default of {} may cause issues.
Use [] as the default value to ensure safe iteration and prevent TypeError if 'release-events' is missing.
</issue_to_address>
### Comment 3
<location> `beetsplug/musicbrainz.py:523-526` </location>
<code_context>
- for medium in release["medium-list"]:
- for recording in medium["track-list"]:
+ for medium in release["media"]:
+ for recording in medium["tracks"]:
recording_info = track_map[recording["recording"]["id"]]
recording["recording"] = recording_info
</code_context>
<issue_to_address>
**suggestion:** Assumes all 'tracks' entries have a 'recording' key.
Using recording['recording'] without checking may cause a KeyError if the key is missing. Use recording.get('recording') and handle cases where it is absent.
```suggestion
for medium in release["media"]:
for recording in medium["tracks"]:
rec = recording.get("recording")
if rec is not None and "id" in rec:
recording_info = track_map[rec["id"]]
recording["recording"] = recording_info
```
</issue_to_address>
### Comment 4
<location> `beetsplug/musicbrainz.py:716-719` </location>
<code_context>
for source, url in product(wanted_sources, url_rels):
- if f"{source}.com" in (target := url["target"]):
+ if f"{source}.com" in (target := url["url"]["resource"]):
urls[source] = target
self._log.debug(
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Assumes 'url' and 'resource' keys are always present in streaming relations.
Using .get() for 'url' and 'resource' can prevent KeyErrors if these keys are missing.
```suggestion
for source, url in product(wanted_sources, url_rels):
target = url.get("url", {}).get("resource")
if target and f"{source}.com" in target:
urls[source] = target
self._log.debug(
```
</issue_to_address>
### Comment 5
<location> `beetsplug/_utils/requests.py:16-18` </location>
<code_context>
+class TimeoutSession(requests.Session):
+ def __init__(self, *args, **kwargs) -> None:
+ super().__init__(*args, **kwargs)
+ beets_version = importlib.metadata.version("beets")
+ self.headers["User-Agent"] = f"beets/{beets_version} https://beets.io/"
+
</code_context>
<issue_to_address>
**suggestion:** Using importlib.metadata.version may raise PackageNotFoundError if 'beets' is not installed as a package.
If 'beets' might not be installed as a package, handle PackageNotFoundError and set a default version string.
```suggestion
import importlib.metadata
class TimeoutSession(requests.Session):
def __init__(self, *args, **kwargs) -> None:
super().__init__(*args, **kwargs)
try:
beets_version = importlib.metadata.version("beets")
except importlib.metadata.PackageNotFoundError:
beets_version = "unknown"
self.headers["User-Agent"] = f"beets/{beets_version} https://beets.io/"
```
</issue_to_address>
### Comment 6
<location> `beetsplug/musicbrainz.py:652-655` </location>
<code_context>
if "secondary-types" in release["release-group"]:
if release["release-group"]["secondary-types"]:
for sec_type in release["release-group"]["secondary-types"]:
albumtypes.append(sec_type.lower())
</code_context>
<issue_to_address>
**suggestion (code-quality):** Merge nested if conditions ([`merge-nested-ifs`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/merge-nested-ifs))
```suggestion
if "secondary-types" in release["release-group"] and release["release-group"]["secondary-types"]:
for sec_type in release["release-group"]["secondary-types"]:
albumtypes.append(sec_type.lower())
```
<br/><details><summary>Explanation</summary>Too much nesting can make code difficult to understand, and this is especially
true in Python, where there are no brackets to help out with the delineation of
different nesting levels.
Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two `if` conditions can be combined using
`and` is an easy win.
</details>
</issue_to_address>
### Comment 7
<location> `beetsplug/musicbrainz.py:177-180` </location>
<code_context>
def _multi_artist_credit(
credit: list[JSONDict], include_join_phrase: bool
) -> tuple[list[str], list[str], list[str]]:
"""Given a list representing an ``artist-credit`` block, accumulate
data into a triple of joined artist name lists: canonical, sort, and
credit.
"""
artist_parts = []
artist_sort_parts = []
artist_credit_parts = []
for el in credit:
alias = _preferred_alias(el["artist"].get("aliases", ()))
# An artist.
if alias:
cur_artist_name = alias["name"]
else:
cur_artist_name = el["artist"]["name"]
artist_parts.append(cur_artist_name)
# Artist sort name.
if alias:
artist_sort_parts.append(alias["sort-name"])
elif "sort-name" in el["artist"]:
artist_sort_parts.append(el["artist"]["sort-name"])
else:
artist_sort_parts.append(cur_artist_name)
# Artist credit.
if "name" in el:
artist_credit_parts.append(el["name"])
else:
artist_credit_parts.append(cur_artist_name)
if include_join_phrase and (joinphrase := el.get("joinphrase")):
artist_parts.append(joinphrase)
artist_sort_parts.append(joinphrase)
artist_credit_parts.append(joinphrase)
return (
artist_parts,
artist_sort_parts,
artist_credit_parts,
)
</code_context>
<issue_to_address>
**suggestion (code-quality):** Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
```suggestion
cur_artist_name = alias["name"] if alias else el["artist"]["name"]
```
</issue_to_address>
### Comment 8
<location> `beetsplug/musicbrainz.py:271-275` </location>
<code_context>
def _preferred_release_event(
release: dict[str, Any],
) -> tuple[str | None, str | None]:
"""Given a release, select and return the user's preferred release
event as a tuple of (country, release_date). Fall back to the
default release event if a preferred event is not found.
"""
preferred_countries: Sequence[str] = config["match"]["preferred"][
"countries"
].as_str_seq()
for country in preferred_countries:
for event in release.get("release-events", {}):
try:
if country in event["area"]["iso-3166-1-codes"]:
return country, event["date"]
except KeyError:
pass
return release.get("country"), release.get("date")
</code_context>
<issue_to_address>
**issue (code-quality):** Use `contextlib`'s `suppress` method to silence an error ([`use-contextlib-suppress`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-contextlib-suppress/))
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #6052 +/- ##
==========================================
+ Coverage 68.03% 68.36% +0.32%
==========================================
Files 137 138 +1
Lines 18708 18773 +65
Branches 3163 3172 +9
==========================================
+ Hits 12728 12834 +106
+ Misses 5309 5263 -46
- Partials 671 676 +5
🚀 New features to boost your workflow:
|
b691012 to
e11c7f3
Compare
|
I think there's also potential to be able to use the new request code here and refactor the Discogs plugin to make direct API calls as well, since we end up fetching the whole release object and parsing it like a dictionary anyway. |
|
It may replace #5976 implementation over |
semohr
left a comment
There was a problem hiding this comment.
Not sure if you already wanted a review here. Added some comments but limited myself to the requests logic 🙃
Having basic requests logic unified across plugins could reduce quite some boilerplate. Not sure if we want to go this route tho as we might loose some flexibility.
f30d96e to
e52b7eb
Compare
I would think so, given that this requires no external dependencies and is simpler. |
semohr
left a comment
There was a problem hiding this comment.
I had a look at the requests layer indepth and think it is quite the nice addition!
For the musicbrainz changes I can't say too much about them, as I haven't spend much time in the plugin beforehand. Nonetheless, the changes seem good to me 👍
We recently had a PR (#5888) adding a pseudo release musicbrainz plugin. Could you have a look and evaluate if there are any changes that might break this proposed new plugin?
3f5318e to
9aee0b1
Compare
7e36ef5 to
cd3ad3d
Compare
437d363 to
001dddc
Compare
|
I tested these adjustments by reimporting all $ beet -vv import -LI data_source:musicbrainz --plugins=musicbrainz,importreplace (and -s)I found a couple of issues and fixed them, see the diff:
|
|
That retry on connection might really help cut down on currently the persistent Musicbrainz unreachable error! Nice! |
Introduce a new RequestHandler base class to introduce a shared session, centralize HTTP request management and error handling across plugins. Key changes: - Add RequestHandler base class with a shared/cached session - Convert TimeoutSession to use SingletonMeta for proper resource management - Create LyricsRequestHandler subclass with lyrics-specific error handling - Update MusicBrainzAPI to inherit from RequestHandler
8054f2d to
951181d
Compare
|
Have also added a test for the retry behaviour. Pretty sure it's ready now, merging! |
See this line in https://musicbrainz.org/doc/MusicBrainz_API#Lookups > To include more than one subquery in a single request, separate the arguments to inc= with a + (plus sign), like inc=recordings+labels.
951181d to
5785ce3
Compare
|
Hi! I'm getting this error with the parentwork plugin: It looks like user-agent information was removed here: https://github.com/beetbox/beets/pull/6052/files#diff-3420b9b69bb3555786b2100c434079c83f948e7f1dc796869f4945c33577d3baL87 . Should this now be set elsewhere, or should the parentwork (and presumably other) plugins be refactored to use a similar approach as to here? |
|
Looks like we didn't catch that side effect! I think it'd be nice to get that plugin & others either using our new request system or fold it into the Musicbrainz plugin. In the interim we can probably patch in a user agent? |
|
I might be able to do some work on |
|
Oops, apologies! I'm currently working on migrating the rest of plugins away from |
|
Great! I can test anything out when you have it! |
|
It looks like the I assume this is something you're working on fixing with migrating the rest of the plugins, right? |
> This PR add support for aliases to releases, release-groups and recordings. > This PR is a must have (IMO at least) for people that listen to Japanese, Chinese and other songs that has other symbols for letters. With this, not only the artist name will use the alias if available, but now the album and track name will also use aliases. Follow up from #5277 based on the recent new mechanism to query musicbrainz (#6052). ## API examples - Aliases for releases, recordings and release groups: http://musicbrainz.org/ws/2/release/59211ea4-ffd2-4ad9-9a4e-941d3148024a?inc=recordings+aliases+release-groups&fmt=json - Aliases for recordings: https://musicbrainz.org/ws/2/recording/b9ad642e-b012-41c7-b72a-42cf4911f9ff?inc=aliases&fmt=json
This PR refactors the MusicBrainz plugin implementation by replacing the
musicbrainzngslibrary with direct HTTP API calls usingrequestsandrequests-ratelimiter.Key Changes:
beetsplug/_utils/requests.pywithTimeoutSessionclass and HTTP error handling (HTTPNotFoundError,CaptchaError)musicbrainzngsdependency with customMusicBrainzAPIclass using direct HTTP requestsrequests-ratelimiterfor API rate limiting instead ofmusicbrainzngs.set_rate_limit()medium-list→media,track-list→tracks)musicbrainzngsoptional and added it to plugin-specific extras (listenbrainz,mbcollection,missing,parentwork). Updated plugin docs accordingly.requestsa required dependency to ensure backwards compatibility (ideally, we would make it an optional dependency undermusicbrainzextra).MusicBrainzAPIErrorwrapper classBenefits:
The changes maintain backward compatibility while modernizing the underlying implementation.
Fixes #5553
Fixes #5095