In Vector Search 2.0, Collections store data as individual JSON objects called Data Objects. This page describes the validation rules a Data Object must satisfy, and how to create, read, update, import, export, and delete Data Objects either individually or in batch.
Data validation
Every Data Object that Vector Search 2.0 ingests is checked against a fixed
set of rules. If any rule fails for a record, that record is rejected and
emitted to the error sink with code = INVALID_ARGUMENT; later checks do
not run for that record. To avoid the "fix one error, re-ingest, hit the
next error" loop, validate your dataset against all of the data
validation rules before kicking off an ingestion or index build.
The pipeline applies the validations in this order:
| Stage | What it checks |
|---|---|
| 1. Parsing | JSON is well-formed; required top-level fields exist; field types are correct |
| 2. ID validations | The id field is present and matches the documented ID format |
| 3. Data field schema | The data payload conforms to the JSON Schema declared in the Collection's CollectionConfig |
| 4. Embedding validations | Each dense/sparse vector matches the Collection's vector schema (name, type, dimension, finite values, sparse parity and uniqueness) |
| 5. Searchable-fields population | Fields marked searchable in the schema can be successfully extracted from data |
1. Parsing validations
Parsing validations run first, while turning each input line into an internal Data Object.
They apply to both supported JSON shapes, the default format (with a
top-level vectors/data object) and the v1 format (with embedding,
sparse_embedding, restricts, or numeric_restricts). The format is
auto-detected per record.
- JSON must be parseable. Each line must parse as a JSON object. A line
whose top-level keys match neither the default nor v1 format is rejected
with
Unknown JSON format for string: <line>. idis required. Every record must contain a non-nullid. Otherwise:'id' field is missing or null.- Embeddings must be present (v1 format only). A v1-format record must
contain at least one of
embeddingorsparse_embedding. Otherwise:'embedding' or 'sparse_embedding' fields are missing. - Dense embedding type checks. The dense embedding field (
embeddingin v1, or any array value undervectorsin the default format) must be a JSON array of numbers. A value that cannot be coerced to afloatis rejected with'<field>' field contains non-float values. - Sparse embedding structure checks. For every sparse vector:
- Must be a JSON object.
- Must contain both arrays:
values(floats) andindices(longs); in v1 these arevaluesanddimensions. valuesmust not be empty.- Indexes must be non-negative.
values.lengthmust equalindices.length(or v1dimensions.length).
datafield type. If present,datamust be a JSON object, not an array, string, or scalar. Otherwise:'data' field is not a JSON object.numeric_restrictsshape (v1 format).numeric_restrictsmust be a JSON array of objects. Each entry must have a stringnamespaceand must set exactly one ofvalue_int,value_float, orvalue_double.
Most "first failure" issues come from this stage. Common errors include a stringified
number in an embedding array, a missing id, or values/indices of
different lengths.
2. ID validations
Data Object IDs must comply with RFC 1035. In practice this means:
- 1-63 characters long.
- Lowercase letters, digits, and hyphens (
-) only. No uppercase, no underscores, no spaces, no symbols, no Unicode. - Must start with a lowercase letter (
a-z). - Must end with a lowercase letter or a digit (no trailing
-).
Regular expression: [a-z]([-a-z0-9]{0,61}[a-z0-9])?
The following table shows common examples:
| ID | Valid? | Why |
|---|---|---|
doc-123 |
Yes | Starts with a letter, only lowercase + digits + hyphen, ends with a digit |
a |
Yes | A single lowercase letter is the minimum |
product-sku-42 |
Yes | All rules satisfied |
Doc-123 |
No | Uppercase D not allowed |
123-doc |
No | Must start with a letter, not a digit |
doc_123 |
No | Underscore not allowed |
doc-123- |
No | Cannot end with a hyphen |
my doc |
No | Spaces not allowed |
| 64+ characters | No | Maximum length is 63 |
This shape is the DNS label rule (the part between dots in a hostname like
my-service.example.com). It is reused here so that IDs round-trip safely
through URLs, filenames, logs, and CLIs without escaping.
3. Data field validations (JSON Schema)
The data field validation stage runs only if the Collection has a dataSchema declared in its
CollectionConfig. If no schema is configured, this stage is skipped.
- Schema compliance. The Data Object's
datapayload is serialized to JSON and validated against the configured JSON Schema (Draft 7). The validator reports one error per schema violation, so a record with three bad fields produces three error messages.- Message:
DataObject with id <id> failed schema validation: <error>.
- Message:
- Schema processing errors. If the schema validator itself throws (for
example, unsupported draft features), the record is rejected with
DataObject with id <id> failed schema validation processing: <exception>.
4. Embedding field validations
The embedding field validation stage iterates first over dense vectors and then sparse vectors. A single shared set of seen vector names spans both lists, so a name cannot be used twice -- even across the dense and sparse boundary.
Shared rules (apply to both dense and sparse)
| Rule | Why it matters | Error message |
|---|---|---|
| No duplicate vector names across dense and sparse for the same Data Object | Two entries with the same vector name would target the same storage key, producing undefined last-write-wins behavior | ... has duplicate embedding field '<name>' across its dense/sparse vectors; each vector name must appear at most once |
Vector name must be declared in the CollectionConfig vector schema |
An unknown vector name cannot be routed to a column | ... has dense/sparse embedding field '<name>' but this field is not defined in CollectionConfig vector schema |
Dense-vector-only rules
| Rule | Error message |
|---|---|
| The field must be configured as dense in the Collection schema. | ... has dense embedding field '<name>' but CollectionConfig defines it as non-dense |
| Dimension must match the configured dimension. | ... field '<name>': expected dense embedding dimension <expected>, but got <actual> |
All values must be finite -- no NaN, +Infinity, or -Infinity. Non-finite values would corrupt distance computations. |
... field '<name>': dense embedding contains non-finite value <v> at index <i> (NaN/Infinity values are not allowed) |
Sparse-vector-only rules
| Rule | Error message |
|---|---|
| The field must be configured as sparse in the Collection schema. | ... has sparse embedding field '<name>' but CollectionConfig defines it as non-sparse |
Indices/values length parity: indicesCount == valuesCount. |
... field '<name>': sparse embedding has <n> indices but <m> values; indices and values must have the same length |
| Non-negative indices: every index >= 0. | ... field '<name>': sparse embedding contains negative index <i> at position <p> (indices must be non-negative) |
| Unique indices within the same sparse vector. | ... field '<name>': sparse embedding contains duplicate index <i> (each index must appear at most once) |
| All values must be finite. | ... field '<name>': sparse embedding contains non-finite value <v> at position <p> (NaN/Infinity values are not allowed) |
5. Searchable-fields population
After embedding validation succeeds, the pipeline walks the data payload
using the Collection's dataSchema and copies fields that the schema
declares (string, integer/number, boolean, array of string, and nested
objects) into a searchable-fields index. Two failure modes here will also
reject a record:
- A path that should be a struct contains a scalar (for example, the schema
says
author.nameis a string butauthoris itself a string in the document). - An array-of-strings field contains a non-string element.
These typically indicate that the document's shape has drifted from the declared schema and are not always caught at the JSON Schema stage.
Pre-flight checklist
Before kicking off an ingestion or index build, validate the entire dataset against the following rules. This is the same set of checks the pipeline applies, ordered so a single client-side pass surfaces every issue:
- Format -- every line parses as JSON and matches either the default or v1 shape.
- IDs -- match the RFC 1035 regular expression
[a-z]([-a-z0-9]{0,61}[a-z0-9])?and are unique across the dataset. - Embeddings present -- at least one vector per record; vector names are
listed in your
CollectionConfigvector schema with the correct dense or sparse type. - Dense vectors -- correct dimension; no
NaN,+Inf, or-Infvalues. - Sparse vectors --
values.length == indices.length; all indexes greater or equal to 0 and unique. Non-finite values are not allowed. - No duplicate vector names within a record across dense and sparse.
- Data schema -- if a
dataSchemais configured, yourdatapayload validates against it (Draft 7), and every field's actual JSON type matches the declared type (especially for nested objects and arrays of strings). - v1
numeric_restricts-- every entry has a stringnamespaceand exactly one ofvalue_int/value_float/value_double.
Create a Data Object
The following example demonstrates adding a single Data Object to a Collection
with the ID COLLECTION_ID.
REST
Before using any of the request data, make the following replacements:
- DATA_OBJECT_ID: The ID of the data object.
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
HTTP method and URL:
POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects?dataObjectId=DATA_OBJECT_ID
Request JSON body:
{
"data": {
"director": "Frank Darabont",
"genre": "Drama",
"title": "The Shawshank Redemption",
"year": 1994
},
"vectors":{
"genre_embedding": {
"dense": {
"values": [ 0.38638010860523064, 0.739343471733759, 0.16189056837017107, 0.5271366865924485 ]
}
},
"plot_embedding": {
"dense": {
"values": [ 0.4752082440607731, 0.09026746166854707, 0.8752307753619009 ]
}
},
"soundtrack_embedding": {
"dense": {
"values": [ 0.5920451749052875, 0.08301644173787519, 0.1264733498775969, 0.6196429624200321, 0.4925828581737443 ]
}
},
"sparse_embedding": {
"sparse": {
"indices": [ 4065, 13326, 17377, 25918, 28105, 32683, 42998 ],
"values": [ 1, 6, 3, 2, 8, 5, 2 ]
}
}
}
}
To send your request, expand one of these options:
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
"data": {
"director": "Frank Darabont",
"title": "The Shawshank Redemption",
"year": 1994,
"genre": "Drama"
},
"vectors": {
"genre_embedding": {
"dense": {
"values": [
0.3863801,
0.73934346,
0.16189057,
0.5271367
]
}
},
"plot_embedding": {
"dense": {
"values": [
0.47520825,
0.090267465,
0.8752308
]
}
},
"soundtrack_embedding": {
"dense": {
"values": [
0.5920452,
0.08301644,
0.12647335,
0.619643,
0.49258286
]
}
},
"sparse_embedding": {
"sparse": {
"values": [
1,
6,
3,
2,
8,
5,
2
],
"indices": [
4065,
13326,
17377,
25918,
28105,
32683,
42998
]
}
}
}
}
gcloud
Before using any of the command data below, make the following replacements:
-
DATA_FILE: The local path to a JSON file containing the data part of the data object.
Example file content:
{ "director": "Frank Darabont", "genre": "Drama", "title": "The Shawshank Redemption", "year": 1994 }
-
VECTORS_FILE: The local path to a JSON file containing the vectors part of the data object.
Example file content:
{ "genre_embedding": { "dense": { "values": [ 0.38638010860523064, 0.739343471733759, 0.16189056837017107, 0.5271366865924485 ] } }, "plot_embedding": { "dense": { "values": [ 0.4752082440607731, 0.09026746166854707, 0.8752307753619009 ] } }, "soundtrack_embedding": { "dense": { "values": [ 0.5920451749052875, 0.08301644173787519, 0.1264733498775969, 0.6196429624200321, 0.4925828581737443 ] } }, "sparse_embedding": { "sparse": { "indices": [ 4065, 13326, 17377, 25918, 28105, 32683, 42998 ], "values": [ 1, 6, 3, 2, 8, 5, 2 ] } } }
- DATA_OBJECT_ID: The ID of the data object.
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
Execute the following command:
Linux, macOS, or Cloud Shell
gcloud vector-search collections data-objects create DATA_OBJECT_ID \ --data=DATA_FILE \ --vectors=VECTORS_FILE \ --collection=COLLECTION_ID \ --location=LOCATION \ --project=PROJECT_ID
Windows (PowerShell)
gcloud vector-search collections data-objects create DATA_OBJECT_ID ` --data=DATA_FILE ` --vectors=VECTORS_FILE ` --collection=COLLECTION_ID ` --location=LOCATION ` --project=PROJECT_ID
Windows (cmd.exe)
gcloud vector-search collections data-objects create DATA_OBJECT_ID ^ --data=DATA_FILE ^ --vectors=VECTORS_FILE ^ --collection=COLLECTION_ID ^ --location=LOCATION ^ --project=PROJECT_ID
You should receive a response similar to the following:
Created dataObject [DATA_OBJECT_ID].
Python
from google.cloud import vectorsearch_v1
# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()
# Initialize request
data_object = vectorsearch_v1.DataObject(
data={
"title": "The Shawshank Redemption",
"genre": "Drama",
"year": 1994,
"director": "Frank Darabont",
},
vectors={
"plot_embedding": {
"dense": {"values": [0.1, 0.2, 0.3]}
},
"genre_embedding": {
"dense": {"values": [0.4, 0.5, 0.6, 0.7]}
},
"soundtrack_embedding": {
"dense": {"values": [0.8, 0.9, 1.0, 1.1, 1.2]}
},
"sparse_embedding": {
"sparse": {"values": [1.0, 2.0], "indices": [10, 20]}
},
},
)
request = vectorsearch_v1.CreateDataObjectRequest(
parent="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
data_object_id="DATA_OBJECT_ID",
data_object=data_object,
)
# Make the request
response = data_object_service_client.create_data_object(request=request)
# Handle the response
print(response)
Embedding fields that have an auto-embedding specified in the Collection Schema are automatically populated. You can also bring your own embeddings (BYOE) to set vector field values that are not automatically populated.
Batch create Data Objects
For efficient bulk ingestion of small numbers of records (up to 1000 Data
Objects per request), use batchCreate. The whole batch is atomic: either
all Data Objects are created, or the entire request fails. For larger
datasets, prefer importing Data Objects from
Cloud Storage.
REST
Before using any of the request data, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
HTTP method and URL:
POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchCreate
Request JSON body:
{
"requests": [
{
"parent": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
"dataObjectId": "movie-1",
"dataObject": {
"data": {
"title": "The Shawshank Redemption",
"year": 1994
},
"vectors": {
"plot_embedding": {
"dense": { "values": [0.47, 0.09, 0.87] }
}
}
}
},
{
"parent": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
"dataObjectId": "movie-2",
"dataObject": {
"data": {
"title": "The Godfather",
"year": 1972
},
"vectors": {
"plot_embedding": {
"dense": { "values": [0.12, 0.55, 0.31] }
}
}
}
}
]
}
To send your request, expand one of these options:
You should receive a JSON response similar to the following:
{
"dataObjects": [
{
"name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1",
"data": {
"title": "The Shawshank Redemption",
"year": 1994
},
"vectors": {
"plot_embedding": {
"dense": { "values": [0.47, 0.09, 0.87] }
}
}
},
{
"name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2",
"data": {
"title": "The Godfather",
"year": 1972
},
"vectors": {
"plot_embedding": {
"dense": { "values": [0.12, 0.55, 0.31] }
}
}
}
]
}
gcloud
Before using any of the command data below, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
Execute the following command:
Linux, macOS, or Cloud Shell
gcloud vector-search collections data-objects batch-create \ --collection=COLLECTION_ID \ --location=LOCATION \ --project=PROJECT_ID \ --requests='[ { "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID", "dataObjectId":"movie-1", "dataObject":{"data":{"title":"The Shawshank Redemption","year":1994},"vectors":{"plot_embedding":{"dense":{"values":[0.47,0.09,0.87]}}}} }, { "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID", "dataObjectId":"movie-2", "dataObject":{"data":{"title":"The Godfather","year":1972},"vectors":{"plot_embedding":{"dense":{"values":[0.12,0.55,0.31]}}}} } ]'
Windows (PowerShell)
gcloud vector-search collections data-objects batch-create ` --collection=COLLECTION_ID ` --location=LOCATION ` --project=PROJECT_ID ` --requests='[ { "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID", "dataObjectId":"movie-1", "dataObject":{"data":{"title":"The Shawshank Redemption","year":1994},"vectors":{"plot_embedding":{"dense":{"values":[0.47,0.09,0.87]}}}} }, { "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID", "dataObjectId":"movie-2", "dataObject":{"data":{"title":"The Godfather","year":1972},"vectors":{"plot_embedding":{"dense":{"values":[0.12,0.55,0.31]}}}} } ]'
Windows (cmd.exe)
gcloud vector-search collections data-objects batch-create ^ --collection=COLLECTION_ID ^ --location=LOCATION ^ --project=PROJECT_ID ^ --requests='[ { "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID", "dataObjectId":"movie-1", "dataObject":{"data":{"title":"The Shawshank Redemption","year":1994},"vectors":{"plot_embedding":{"dense":{"values":[0.47,0.09,0.87]}}}} }, { "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID", "dataObjectId":"movie-2", "dataObject":{"data":{"title":"The Godfather","year":1972},"vectors":{"plot_embedding":{"dense":{"values":[0.12,0.55,0.31]}}}} } ]'
Python
from google.cloud import vectorsearch_v1
# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()
parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"
# Build per-DataObject create requests.
requests = [
vectorsearch_v1.CreateDataObjectRequest(
parent=parent,
data_object_id="movie-1",
data_object=vectorsearch_v1.DataObject(
data={"title": "The Shawshank Redemption", "year": 1994},
vectors={
"plot_embedding": {"dense": {"values": [0.47, 0.09, 0.87]}},
},
),
),
vectorsearch_v1.CreateDataObjectRequest(
parent=parent,
data_object_id="movie-2",
data_object=vectorsearch_v1.DataObject(
data={"title": "The Godfather", "year": 1972},
vectors={
"plot_embedding": {"dense": {"values": [0.12, 0.55, 0.31]}},
},
),
),
]
request = vectorsearch_v1.BatchCreateDataObjectsRequest(
parent=parent,
requests=requests,
)
# Make the request
response = data_object_service_client.batch_create_data_objects(request=request)
# Handle the response
for data_object in response.data_objects:
print(data_object.name)
Get a Data Object
The following example demonstrates how to get a Data Object with the ID
DATA_OBJECT_ID from a Collection with the ID COLLECTION_ID.
REST
Before using any of the request data, make the following replacements:
- DATA_OBJECT_ID: The ID of the data object.
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
HTTP method and URL:
GET https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID
To send your request, expand one of these options:
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
"createTime": "2026-01-31T20:05:06Z",
"updateTime": "2026-01-31T20:05:06Z",
"data": {
"title": "The Shawshank Redemption",
"director": "Frank Darabont",
"year": 1994,
"genre": "Drama"
},
"vectors": {
"sparse_embedding": {
"sparse": {
"values": [
1,
6,
3,
2,
8,
5,
2
],
"indices": [
4065,
13326,
17377,
25918,
28105,
32683,
42998
]
}
},
"genre_embedding": {
"dense": {
"values": [
0.3863801,
0.73934346,
0.16189057,
0.5271367
]
}
},
"plot_embedding": {
"dense": {
"values": [
0.47520825,
0.090267465,
0.8752308
]
}
},
"soundtrack_embedding": {
"dense": {
"values": [
0.5920452,
0.08301644,
0.12647335,
0.619643,
0.49258286
]
}
}
}
}
gcloud
Before using any of the command data below, make the following replacements:
- DATA_OBJECT_ID: The ID of the data object.
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
Execute the following command:
Linux, macOS, or Cloud Shell
gcloud vector-search collections data-objects describe DATA_OBJECT_ID \ --collection=COLLECTION_ID \ --location=LOCATION \ --project=PROJECT_ID
Windows (PowerShell)
gcloud vector-search collections data-objects describe DATA_OBJECT_ID ` --collection=COLLECTION_ID ` --location=LOCATION ` --project=PROJECT_ID
Windows (cmd.exe)
gcloud vector-search collections data-objects describe DATA_OBJECT_ID ^ --collection=COLLECTION_ID ^ --location=LOCATION ^ --project=PROJECT_ID
You should receive a response similar to the following:
name: projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID
data:
director: Frank Darabont
genre: Drama
title: The Shawshank Redemption
year: 1994
vectors:
genre_embedding:
dense:
values:
- 0.3863801
- 0.73934346
- 0.16189057
- 0.5271367
plot_embedding:
dense:
values:
- 0.47520825
- 0.090267465
- 0.8752308
soundtrack_embedding:
dense:
values:
- 0.5920452
- 0.08301644
- 0.12647335
- 0.619643
- 0.49258286
sparse_embedding:
sparse:
indices:
- 4065
- 13326
- 17377
- 25918
- 28105
- 32683
- 42998
values:
- 1.0
- 6.0
- 3.0
- 2.0
- 8.0
- 5.0
- 2.0
Python
from google.cloud import vectorsearch_v1
# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()
# Initialize request
request = vectorsearch_v1.GetDataObjectRequest(
name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
)
# Make the request
response = data_object_service_client.get_data_object(request=request)
# Handle the response
print(response)
Update a Data Object
The following example demonstrates how to update the title data field
and plot_embedding vector values in the Data Object with the ID DATA_OBJECT_ID
in a Collection with the ID COLLECTION_ID.
REST
Before using any of the request data, make the following replacements:
- DATA_OBJECT_ID: The ID of the data object.
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
HTTP method and URL:
PATCH https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID
Request JSON body:
{
"data": {
"title": "The Shawshank Redemption (updated)"
},
"vectors": {
"plot_embedding": {
"dense": {
"values": [
1.0,
1.0,
1.0
]
}
}
}
}
To send your request, expand one of these options:
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
"data": {
"title": "The Shawshank Redemption (updated)"
},
"vectors": {
"plot_embedding": {
"dense": {
"values": [
1,
1,
1
]
}
}
}
}
gcloud
Before using any of the command data below, make the following replacements:
- DATA_OBJECT_ID: The ID of the data object.
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
Execute the following command:
Linux, macOS, or Cloud Shell
gcloud vector-search collections data-objects update DATA_OBJECT_ID \ --collection=COLLECTION_ID \ --location=LOCATION \ --project=PROJECT_ID \ --data='{"title": "The Shawshank Redemption (updated)"}' \ --update-vectors='{"plot_embedding": {"dense": {"values": [1.0, 1.0, 1.0]}}}'
Windows (PowerShell)
gcloud vector-search collections data-objects update DATA_OBJECT_ID ` --collection=COLLECTION_ID ` --location=LOCATION ` --project=PROJECT_ID ` --data='{"title": "The Shawshank Redemption (updated)"}' ` --update-vectors='{"plot_embedding": {"dense": {"values": [1.0, 1.0, 1.0]}}}'
Windows (cmd.exe)
gcloud vector-search collections data-objects update DATA_OBJECT_ID ^ --collection=COLLECTION_ID ^ --location=LOCATION ^ --project=PROJECT_ID ^ --data='{"title": "The Shawshank Redemption (updated)"}' ^ --update-vectors='{"plot_embedding": {"dense": {"values": [1.0, 1.0, 1.0]}}}'
You should receive a response similar to the following:
Updated dataObject [DATA_OBJECT_ID].
Python
from google.cloud import vectorsearch_v1
# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()
# Initialize request
data_object = vectorsearch_v1.DataObject(
name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
data={"title": "The Shawshank Redemption (updated)"},
vectors={
"plot_embedding": {
"dense": {"values": [1., 1., 1.]}
},
},
)
request = vectorsearch_v1.UpdateDataObjectRequest(
data_object=data_object,
)
# Make the request
response = data_object_service_client.update_data_object(request=request)
# Handle the response
print(response)
Batch update Data Objects
To update many Data Objects at once, use batchUpdate. A maximum of 1000
Data Objects can be updated in a single batch. Each per-record request
specifies the dataObject (which must include its full resource name plus
the fields you want to change) and an updateMask listing which fields to
overwrite. Fields not listed in the mask are left unchanged.
REST
Before using any of the request data, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
HTTP method and URL:
POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchUpdate
Request JSON body:
{
"requests": [
{
"dataObject": {
"name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1",
"data": { "genre": "Thriller" }
},
"updateMask": "data.genre"
},
{
"dataObject": {
"name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2",
"vectors": {
"plot_embedding": {
"dense": { "values": [0.21, 0.34, 0.55] }
}
}
},
"updateMask": "vectors.plot_embedding"
}
]
}
To send your request, expand one of these options:
You should receive a JSON response similar to the following:
{}
gcloud
Before using any of the command data below, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
Execute the following command:
Linux, macOS, or Cloud Shell
gcloud vector-search collections data-objects batch-update \ --collection=COLLECTION_ID \ --location=LOCATION \ --project=PROJECT_ID \ --requests='[ { "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1","data":{"genre":"Thriller"}}, "updateMask":"data.genre" }, { "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2","vectors":{"plot_embedding":{"dense":{"values":[0.21,0.34,0.55]}}}}, "updateMask":"vectors.plot_embedding" } ]'
Windows (PowerShell)
gcloud vector-search collections data-objects batch-update ` --collection=COLLECTION_ID ` --location=LOCATION ` --project=PROJECT_ID ` --requests='[ { "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1","data":{"genre":"Thriller"}}, "updateMask":"data.genre" }, { "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2","vectors":{"plot_embedding":{"dense":{"values":[0.21,0.34,0.55]}}}}, "updateMask":"vectors.plot_embedding" } ]'
Windows (cmd.exe)
gcloud vector-search collections data-objects batch-update ^ --collection=COLLECTION_ID ^ --location=LOCATION ^ --project=PROJECT_ID ^ --requests='[ { "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1","data":{"genre":"Thriller"}}, "updateMask":"data.genre" }, { "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2","vectors":{"plot_embedding":{"dense":{"values":[0.21,0.34,0.55]}}}}, "updateMask":"vectors.plot_embedding" } ]'
Python
from google.cloud import vectorsearch_v1
from google.protobuf import field_mask_pb2
# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()
parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"
# Each entry specifies the DataObject to update (with its full resource
# name) and an update_mask listing the fields to overwrite. Fields not
# listed in the mask are left unchanged.
requests = [
vectorsearch_v1.UpdateDataObjectRequest(
data_object=vectorsearch_v1.DataObject(
name=f"{parent}/dataObjects/movie-1",
data={"genre": "Thriller"},
),
update_mask=field_mask_pb2.FieldMask(paths=["data.genre"]),
),
vectorsearch_v1.UpdateDataObjectRequest(
data_object=vectorsearch_v1.DataObject(
name=f"{parent}/dataObjects/movie-2",
vectors={
"plot_embedding": {"dense": {"values": [0.21, 0.34, 0.55]}},
},
),
update_mask=field_mask_pb2.FieldMask(paths=["vectors.plot_embedding"]),
),
]
request = vectorsearch_v1.BatchUpdateDataObjectsRequest(
parent=parent,
requests=requests,
)
# Make the request
data_object_service_client.batch_update_data_objects(request=request)
Import Data Objects
The following example demonstrates how to import Data Objects from Cloud Storage
into a Collection with the ID COLLECTION_ID. Use import for
large datasets; for smaller bulk ingests (up to 1000 records) consider
creating Data Objects in batch.
REST
Before using any of the request data, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
HTTP method and URL:
POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:importDataObjects
Request JSON body:
{
"gcsImport": {
"contentsUri": "gs://your-bucket/path/to/your-data.json",
"errorUri": "gs://your-bucket/path/to/import-errors/",
"outputUri": "gs://your-bucket/path/to/import-output/"
}
}
To send your request, expand one of these options:
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/operations/operation-1770039043815-649d75471f76e-08de3049-276a02be",
"metadata": {
"@type": "type.googleapis.com/google.cloud.vectorsearch.v1.ImportDataObjectsMetadata",
"createTime": "2026-02-02T13:30:43.874527852Z"
},
"done": false
}
gcloud
Before using any of the command data below, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
Execute the following command:
Linux, macOS, or Cloud Shell
gcloud vector-search collections import-data-objects COLLECTION_ID \ --location=LOCATION \ --project=PROJECT_ID \ --gcs-import-contents-uri="gs://your-bucket/path/to/your-data.json" \ --gcs-import-error-uri="gs://your-bucket/path/to/import-errors/" \ --gcs-import-output-uri="gs://your-bucket/path/to/import-output/" \ --async
Windows (PowerShell)
gcloud vector-search collections import-data-objects COLLECTION_ID ` --location=LOCATION ` --project=PROJECT_ID ` --gcs-import-contents-uri="gs://your-bucket/path/to/your-data.json" ` --gcs-import-error-uri="gs://your-bucket/path/to/import-errors/" ` --gcs-import-output-uri="gs://your-bucket/path/to/import-output/" ` --async
Windows (cmd.exe)
gcloud vector-search collections import-data-objects COLLECTION_ID ^ --location=LOCATION ^ --project=PROJECT_ID ^ --gcs-import-contents-uri="gs://your-bucket/path/to/your-data.json" ^ --gcs-import-error-uri="gs://your-bucket/path/to/import-errors/" ^ --gcs-import-output-uri="gs://your-bucket/path/to/import-output/" ^ --async
Python
from google.cloud import vectorsearch_v1
# Create the client
vector_search_service_client = vectorsearch_v1.VectorSearchServiceClient()
# Initialize request
request = vectorsearch_v1.ImportDataObjectsRequest(
name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
gcs_import={
"contents_uri": "gs://your-bucket/path/to/your-data/",
"error_uri": "gs://your-bucket/path/to/import-errors/",
},
)
# Make the request
operation = vector_search_service_client.import_data_objects(request=request)
# Wait for the result (note this may take up to several minutes)
operation.result()
Folder gs://your-bucket/path/to/your-data/ can contain one or more files
each containing multiple Data Objects.
Use this structure for large datasets that are spread across multiple files.
The following file formats are supported in Vector Search 2.0:
- JSONL, where each line is a JSON object having three top level properties:
id,data, andvectors. Use this format for new Vector Search 2.0 datasets when you want human-readable input for inspecting and editing by hand. - AVRO: Use this format for new Vector Search 2.0 datasets when you need a compact, schema-validated binary format -- typically for large datasets produced by data-pipeline tools such as Dataflow, Beam, or Spark.
- Vector Search JSON: Use this format only when you are migrating an existing Vector Search (Vector Search 1.0) JSON dataset and want to reuse it as-is.
- Vector Search AVRO: Use this format only when you are migrating an existing Vector Search (Vector Search 1.0) AVRO dataset and want to reuse it as-is.
The following provides an example of the JSONL with the required properties.
{
"id": "movie-789",
"data": {
"title":"The Shawshank Redemption",
"plot": "...",
"year":1994,
"avg_rating": 8.5,
"movie_runtime_info": {
"hours": 2,
"minutes": 5
},
},
"vectors": {
"title_embedding": [-0.23, 0.88, 0.11, ...],
"sparse_embedding": {
"values": [0.01, -0.93, 0.27, ...],
"indices": [23, 83, 131, ...]
}
}
}
AVRO
For AVRO files, each record must conform to the DataObject Avro schema shown.
The fields mirror the JSONL format:
id(requiredstring).vectors(map, default{}). Each entry is keyed by the vector name and its value is either anarrayoffloat(dense vector) or aSparseVectorrecord withvalues(array offloat) andindices(array oflong).data(nullablemap, defaultnull). Keys are data field names. Each value is aDataValuerecord whosevaluefield is a union over the supported primitive types (boolean,int,long,float,double,string) plusarrayofDataValueandmapofstringtoDataValuefor nested structures.etag(nullablestring, defaultnull).
{
"namespace": "com.google.cloud.ai.vectorsearch",
"type": "record",
"name": "DataObject",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "vectors",
"type": {
"type": "map",
"values": [
{
"type": "array",
"items": "float"
},
{
"type": "record",
"name": "SparseVector",
"fields": [
{
"name": "values",
"type": { "type": "array", "items": "float" }
},
{
"name": "indices",
"type": { "type": "array", "items": "long" }
}
]
}
]
},
"default": {}
},
{
"name": "data",
"type": [
"null",
{
"type": "map",
"values": {
"type": "record",
"name": "DataValue",
"fields": [
{
"name": "value",
"type": [
"boolean",
"int",
"long",
"float",
"double",
"string",
{
"type": "array",
"items": "DataValue"
},
{
"type": "map",
"values": "DataValue"
}
]
}
]
}
}
],
"default": null
},
{
"name": "etag",
"type": [
"null",
"string"
],
"default": null
}
]
}
The following snippet shows the conceptual contents of a single AVRO record
that matches the preceding JSONL example. Note that under this schema each
entry in data is wrapped in a DataValue record (with a single value
field), which is how AVRO represents the heterogeneous types in
data:
{
"id": "movie-789",
"vectors": {
"title_embedding": [-0.23, 0.88, 0.11],
"sparse_embedding": {
"values": [0.01, -0.93, 0.27],
"indices": [23, 83, 131]
}
},
"data": {
"title": { "value": "The Shawshank Redemption" },
"plot": { "value": "..." },
"year": { "value": 1994 },
"avg_rating": { "value": 8.5 },
"movie_runtime_info": {
"value": {
"hours": { "value": 2 },
"minutes": { "value": 5 }
}
}
}
}
Export Data Objects
The following example demonstrates how to export every Data Object in a Collection to Cloud Storage in JSONL format. The destination bucket must be in the same region as the Collection. Export is a long-running operation.
REST
Before using any of the request data, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
HTTP method and URL:
POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:exportDataObjects
Request JSON body:
{
"gcsDestination": {
"exportUri": "gs://your-bucket/path/to/export-dir/",
"format": "JSONL"
}
}
To send your request, expand one of these options:
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/operations/operation-1770039043815-649d75471f76e-08de3049-276a02be",
"metadata": {
"@type": "type.googleapis.com/google.cloud.vectorsearch.v1.ExportDataObjectsMetadata",
"createTime": "2026-02-02T13:30:43.874527852Z"
},
"done": false
}
gcloud
Before using any of the command data below, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
Execute the following command:
Linux, macOS, or Cloud Shell
gcloud vector-search collections export-data-objects COLLECTION_ID \ --location=LOCATION \ --project=PROJECT_ID \ --gcs-destination-export-uri="gs://your-bucket/path/to/export-dir/" \ --gcs-destination-format="jsonl" \ --async
Windows (PowerShell)
gcloud vector-search collections export-data-objects COLLECTION_ID ` --location=LOCATION ` --project=PROJECT_ID ` --gcs-destination-export-uri="gs://your-bucket/path/to/export-dir/" ` --gcs-destination-format="jsonl" ` --async
Windows (cmd.exe)
gcloud vector-search collections export-data-objects COLLECTION_ID ^ --location=LOCATION ^ --project=PROJECT_ID ^ --gcs-destination-export-uri="gs://your-bucket/path/to/export-dir/" ^ --gcs-destination-format="jsonl" ^ --async
Python
from google.cloud import vectorsearch_v1
# Create the client
vector_search_service_client = vectorsearch_v1.VectorSearchServiceClient()
# Initialize request
request = vectorsearch_v1.ExportDataObjectsRequest(
name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
gcs_destination={
"export_uri": "gs://your-bucket/path/to/export-dir/",
"format": vectorsearch_v1.ExportDataObjectsRequest.GcsExportDestination.Format.JSONL,
},
)
# Make the request
operation = vector_search_service_client.export_data_objects(request=request)
# Wait for the result (note this may take up to several minutes)
operation.result()
Delete a Data Object
The following example shows how to delete a single Data Object
DATA_OBJECT_ID from a Collection with the ID
COLLECTION_ID.
REST
Before using any of the request data, make the following replacements:
- DATA_OBJECT_ID: The ID of the data object.
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
HTTP method and URL:
DELETE https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID
To send your request, expand one of these options:
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/operations/operation-1770039043815-649d75471f76e-08de3049-276a02be",
"metadata": {
"@type": "type.googleapis.com/google.cloud.vectorsearch.v1.ExportDataObjectsMetadata",
"createTime": "2026-02-02T13:30:43.874527852Z"
},
"done": false
}
gcloud
Before using any of the command data below, make the following replacements:
- DATA_OBJECT_ID: The ID of the data object.
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
Execute the following command:
Linux, macOS, or Cloud Shell
gcloud vector-search collections data-objects delete DATA_OBJECT_ID \ --collection=COLLECTION_ID \ --location=LOCATION \ --project=PROJECT_ID
Windows (PowerShell)
gcloud vector-search collections data-objects delete DATA_OBJECT_ID ` --collection=COLLECTION_ID ` --location=LOCATION ` --project=PROJECT_ID
Windows (cmd.exe)
gcloud vector-search collections data-objects delete DATA_OBJECT_ID ^ --collection=COLLECTION_ID ^ --location=LOCATION ^ --project=PROJECT_ID
You should receive a response similar to the following:
Deleted dataObject [DATA_OBJECT_ID].
Python
from google.cloud import vectorsearch_v1
# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()
# Initialize request
request = vectorsearch_v1.DeleteDataObjectRequest(
name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
)
# Make the request
data_object_service_client.delete_data_object(request=request)
Batch delete Data Objects
To delete many Data Objects at once, use batchDelete with a list of
fully-qualified Data Object resource names. A maximum of 1000 Data Objects
can be deleted in a single batch.
REST
Before using any of the request data, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
HTTP method and URL:
POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchDelete
Request JSON body:
{
"requests": [
{ "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1" },
{ "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2" }
]
}
To send your request, expand one of these options:
You should receive a JSON response similar to the following:
{}
gcloud
Before using any of the command data below, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
Execute the following command:
Linux, macOS, or Cloud Shell
gcloud vector-search collections data-objects batch-delete \ --collection=COLLECTION_ID \ --location=LOCATION \ --project=PROJECT_ID \ --requests='[ {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1"}, {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2"} ]'
Windows (PowerShell)
gcloud vector-search collections data-objects batch-delete ` --collection=COLLECTION_ID ` --location=LOCATION ` --project=PROJECT_ID ` --requests='[ {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1"}, {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2"} ]'
Windows (cmd.exe)
gcloud vector-search collections data-objects batch-delete ^ --collection=COLLECTION_ID ^ --location=LOCATION ^ --project=PROJECT_ID ^ --requests='[ {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1"}, {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2"} ]'
Python
from google.cloud import vectorsearch_v1
# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()
parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"
requests = [
vectorsearch_v1.DeleteDataObjectRequest(
name=f"{parent}/dataObjects/movie-1",
),
vectorsearch_v1.DeleteDataObjectRequest(
name=f"{parent}/dataObjects/movie-2",
),
]
request = vectorsearch_v1.BatchDeleteDataObjectsRequest(
parent=parent,
requests=requests,
)
# Make the request
data_object_service_client.batch_delete_data_objects(request=request)
Count Data Objects
To count how many Data Objects a Collection contains, use the
aggregate operation with the COUNT aggregation method. The same call
accepts an optional JSON filter expression so you can count only the
Data Objects that match a predicate (for example, genre == "sci-fi").
To count every Data Object in the Collection, omit the filter.
REST
Before using any of the request data, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
HTTP method and URL:
POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:aggregate
Request JSON body:
{
"aggregate": "COUNT",
"filter": { "genre": { "$eq": "sci-fi" } }
}
To send your request, expand one of these options:
You should receive a JSON response similar to the following:
{
"aggregateResults": [
{ "count": "42" }
]
}
gcloud
Before using any of the command data below, make the following replacements:
- COLLECTION_ID: The ID of the collection.
- LOCATION: The region where you are using Agent Platform.
- PROJECT_ID: Your Google Cloud project ID.
Execute the following command:
Linux, macOS, or Cloud Shell
gcloud vector-search collections data-objects aggregate \ --collection=COLLECTION_ID \ --location=LOCATION \ --project=PROJECT_ID \ --aggregation-method=count \ --json-filter='{"genre": {"$eq": "sci-fi"}}'
Windows (PowerShell)
gcloud vector-search collections data-objects aggregate ` --collection=COLLECTION_ID ` --location=LOCATION ` --project=PROJECT_ID ` --aggregation-method=count ` --json-filter='{"genre": {"$eq": "sci-fi"}}'
Windows (cmd.exe)
gcloud vector-search collections data-objects aggregate ^ --collection=COLLECTION_ID ^ --location=LOCATION ^ --project=PROJECT_ID ^ --aggregation-method=count ^ --json-filter='{"genre": {"$eq": "sci-fi"}}'
Python
from google.cloud import vectorsearch_v1
from google.protobuf import struct_pb2
from google.protobuf import json_format
# Create the client
search_client = vectorsearch_v1.DataObjectSearchServiceClient()
parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"
# Optional: build a JSON filter. Omit `filter=` to count everything.
filter_struct = json_format.ParseDict(
{"genre": {"$eq": "sci-fi"}}, struct_pb2.Struct()
)
request = vectorsearch_v1.AggregateDataObjectsRequest(
parent=parent,
aggregate=vectorsearch_v1.AggregationMethod.COUNT,
filter=filter_struct,
)
# Make the request
response = search_client.aggregate_data_objects(request=request)
# The count value is returned in aggregate_results[0].
for result in response.aggregate_results:
print(result)
What's next?
- Discover how to use ETags for Data Object concurrency control.
- Learn about Collection Indexes.
- See how to query and search for Data Objects.