This repository was archived by the owner on Mar 31, 2026. It is now read-only.

Blob.from_string() should be able to parse all valid GCS URIs #1107

@pPanda-beta

Description

Blob.from_string stops parsing the object name at the first hash (#) character, so everything from the # onward is dropped from the blob name.

Test Code (minimal)

from google.cloud import storage
blob = storage.Blob.from_string('gs://my-bucket/my/problamatic/object#name')
print(blob.name)
# my/problamatic/object

Expected

Object name should be my/problamatic/object#name

Actual

Object name found is my/problamatic/object

Temporary Workaround

import re
from google.cloud import storage
GS_PATTERN = re.compile(r"gs://(?P<bucket_name>[^/]+)/(?P<object_name>.+)")

m = GS_PATTERN.match('gs://my-bucket/my/problamatic/object#name')
blob = storage.Bucket(name=m.group('bucket_name'), client=None).blob(m.group('object_name'))
print(blob.name)
# my/problamatic/object#name

Version

google_cloud_storage-2.10.0-py2.py3-none-any.whl

Suggestion

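urlsplit is not a good fit for parsing GCS URIs because it applies generic URL rules: # starts a fragment and ? starts a query string. I have seen problems with the % character, a trailing ?, etc. with urlsplit. Most of those characters are valid in GCS object names. The current implementation splits the URI like this: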

scheme, netloc, path, query, frag = urlsplit(uri)
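
The behavior described above can be reproduced with the standard library alone. The sketch below first shows how urlsplit moves "#name" into the fragment, then shows one possible alternative that splits only on the gs:// prefix and the first "/". The helper name parse_gs_uri is hypothetical and is not part of google-cloud-storage; it also assumes the URI always carries an object name, which may not match every case the library needs to handle.

```python
from urllib.parse import urlsplit


def parse_gs_uri(uri):
    """Hypothetical helper: split a gs:// URI into (bucket_name, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Split only at the first "/" after the scheme. Everything after it,
    # including "#", "?" and "%", is kept verbatim as the object name.
    bucket_name, sep, object_name = uri[len(prefix):].partition("/")
    if not bucket_name or not sep or not object_name:
        raise ValueError(f"URI needs a bucket and an object name: {uri!r}")
    return bucket_name, object_name


uri = "gs://my-bucket/my/problamatic/object#name"

# urlsplit treats "#name" as a URL fragment, so it is missing from the path.
parts = urlsplit(uri)
print(parts.path)      # /my/problamatic/object
print(parts.fragment)  # name

# The plain string split keeps the full object name.
print(parse_gs_uri(uri))  # ('my-bucket', 'my/problamatic/object#name')
```

The returned pair can be passed to storage.Bucket(name=..., client=None).blob(...) in the same way as the regex workaround above.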

Metadata

Labels

api: storage - Issues related to the googleapis/python-storage API.
priority: p2 - Moderately-important priority. Fix may not be included in next release.
type: bug - Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
