feat: data connector sync and source metadata (API v0.7.2) #96

Merged: kdr merged 1 commit into main from kdr-v072 on Jun 3, 2026

@kdr kdr commented Jun 3, 2026

Summary

  • Bump OpenAPI spec version to 0.7.2.
  • Add POST /data-connectors/{id}/sync to materialize a connector URI (e.g. grain://recording/<id>) into a Cloudglue file without starting a downstream job; idempotent for the same URI.
  • Add GET /data-connectors/{id}/source-metadata to fetch upstream source metadata for a connector URI without creating a file (Grain supported; other types return 501).
  • Add source_metadata on the File schema plus GrainSourceMetadata, SourceMetadata, and SourceMetadataResponse schemas for Grain recording provenance (participants, AI summary, action items, HubSpot links, etc., where Grain returns them).
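
As a rough illustration of the two new endpoints, the sketch below builds (but does not send) the corresponding HTTP requests. Only the paths and methods come from this PR. The base URL, the bearer-token header, and the `url` body field and query parameter are assumptions made for the example, so check them against spec/openapi.json before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host, not the real API base URL.
BASE_URL = "https://api.example.com/v1"


def build_sync_request(connector_id: str, uri: str, api_key: str) -> urllib.request.Request:
    """Build POST /data-connectors/{id}/sync for a connector URI (not sent)."""
    connector = urllib.parse.quote(connector_id, safe="")
    # Assumption: the connector URI travels in a JSON body field named "url".
    body = json.dumps({"url": uri}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/data-connectors/{connector}/sync",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: bearer auth
            "Content-Type": "application/json",
        },
    )


def build_source_metadata_request(connector_id: str, uri: str, api_key: str) -> urllib.request.Request:
    """Build GET /data-connectors/{id}/source-metadata for a connector URI (not sent)."""
    connector = urllib.parse.quote(connector_id, safe="")
    # Assumption: the connector URI is passed as a "url" query parameter.
    query = urllib.parse.urlencode({"url": uri})
    return urllib.request.Request(
        f"{BASE_URL}/data-connectors/{connector}/source-metadata?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing either request to `urllib.request.urlopen` would perform the call. Per the summary above, a lookup against a connector type other than Grain is expected to answer 501.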

Test plan

  • Validate spec/openapi.json parses as valid OpenAPI 3.0
  • Confirm new paths and schemas match implemented API behavior in the backend
  • Regenerate or verify SDK/docs consumers pick up v0.7.2
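
The first two test-plan items could be partly automated with a check like the one below. It is a minimal sketch, not a full OpenAPI validator: it only confirms the version string and that the paths and schemas named in this PR are present. The helper name `check_spec` is hypothetical, and the path keys assume the spec spells the path parameter as `{id}`.

```python
EXPECTED_VERSION = "0.7.2"
# Paths and methods as written in the PR summary.
EXPECTED_OPERATIONS = {
    "/data-connectors/{id}/sync": "post",
    "/data-connectors/{id}/source-metadata": "get",
}
EXPECTED_SCHEMAS = {"GrainSourceMetadata", "SourceMetadata", "SourceMetadataResponse"}


def check_spec(spec: dict) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not str(spec.get("openapi", "")).startswith("3.0"):
        problems.append("document does not declare OpenAPI 3.0")
    version = spec.get("info", {}).get("version")
    if version != EXPECTED_VERSION:
        problems.append(f"info.version is {version!r}, expected {EXPECTED_VERSION!r}")
    paths = spec.get("paths", {})
    for path, method in EXPECTED_OPERATIONS.items():
        if method not in paths.get(path, {}):
            problems.append(f"missing operation {method.upper()} {path}")
    schemas = spec.get("components", {}).get("schemas", {})
    for name in sorted(EXPECTED_SCHEMAS - set(schemas)):
        problems.append(f"missing schema {name}")
    if "source_metadata" not in schemas.get("File", {}).get("properties", {}):
        problems.append("File schema has no source_metadata property")
    return problems
```

To run it against the real file, load spec/openapi.json with `json.load` and pass the resulting dict to `check_spec`. A dedicated OpenAPI validator would still be needed to cover the "parses as valid OpenAPI 3.0" item properly.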

Made with Cursor

Summary by CodeRabbit

  • New Features
    • Data connectors can now be synced independently without triggering downstream jobs
    • Added ability to retrieve and view source metadata and provenance information for data connectors

Document Grain source provenance on files, connector URI sync, and upstream metadata lookup endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot commented Jun 3, 2026

Walkthrough

This PR updates the OpenAPI specification to version 0.7.2, adding two new Data Connector endpoints for syncing connector URIs and retrieving source provenance metadata. The File schema is extended with a source_metadata field, and new Grain-specific provenance types are introduced. Multiple endpoint descriptions are refined for clarity around URI handling and TikTok charges.

Changes

Data Connector API and Provenance

| Layer / File(s) | Summary |
| --- | --- |
| Version bump and File provenance field (spec/openapi.json) | OpenAPI spec incremented to 0.7.2; File schema extended with nullable source_metadata field typed as SourceMetadata reference. |
| Grain provenance type definitions (spec/openapi.json) | New GrainSourceMetadata schema introduced; SourceMetadata and SourceMetadataResponse expanded with discriminator-backed Grain support (source_type: "grain"). |
| Data Connector sync and metadata endpoints (spec/openapi.json) | Added POST /data-connectors/{id}/sync (idempotently materializes connector URI to File) and GET /data-connectors/{id}/source-metadata (fetches upstream provenance; returns 501 for unsupported types). |
| Endpoint and schema description refinements (spec/openapi.json) | Updated description strings across NewSegments, NewExtract, AddCollectionFile, NewTranscribe, NewDescribe, and KnowledgeBaseCollections for URI type clarity, TikTok handling, and backward compatibility notes. |
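
For a consumer, the nullable, discriminator-backed `source_metadata` field might be handled along these lines. This is a sketch only: the `source_type` discriminator and its `"grain"` value come from the summary above, while the function name and the returned strings are made up for illustration, and no Grain-specific field names are assumed.

```python
from typing import Any, Dict


def describe_source(file_obj: Dict[str, Any]) -> str:
    """Summarize a File object's provenance by dispatching on source_type."""
    meta = file_obj.get("source_metadata")
    if meta is None:
        # The field is nullable, so a file may carry no provenance at all.
        return "no provenance recorded"
    source_type = meta.get("source_type")
    if source_type == "grain":
        return "Grain recording provenance"
    # Unknown discriminator values are reported rather than raising an error,
    # so the client keeps working if new source types are added later.
    return f"unrecognized source_type: {source_type!r}"
```

Treating unknown discriminator values as a soft case is a design choice for forward compatibility, not something the spec mandates.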

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • cloudglue/cloudglue-api-spec#92: Adds Grain to DataConnector.type and File.source enums; this PR expands Grain support in provenance metadata schemas and endpoint handling.
  • cloudglue/cloudglue-api-spec#86: Adds initial data-connectors API listing and DataConnector schema; this PR extends that surface with sync and metadata retrieval endpoints.

Suggested reviewers

  • amyxst

Poem

🐰 A hop through Cloudglue's provenance,
Where Grain now tracks its source with eloquence!
Sync and metadata dance in tandem true,
Files now remember whence they once flew. ✨
Version bumped to 0.7.2's delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The PR description covers all key changes (version bump, new endpoints, new schemas) but is missing explicit test results. | Clarify which test plan items were completed and which remain pending; provide evidence or update checkboxes to reflect actual status. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: adding data connector sync and source metadata endpoints, and bumping to API v0.7.2. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@kdr kdr requested a review from amyxst June 3, 2026 13:45

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
spec/openapi.json (1)

4443-4494: ⚡ Quick win

Add the standard 500 response to this endpoint too.

The sibling Data Connector endpoints on Lines 4271-4322 and Lines 4361-4412 both document unexpected server failures, but this new lookup endpoint omits that contract. That leaves generated SDKs/docs with inconsistent error handling for the same resource family.

📄 Suggested addition
         "responses": {
           "200": {
             "description": "Source metadata for the URI",
             "content": {
               "application/json": {
                 "schema": {
                   "$ref": "#/components/schemas/SourceMetadataResponse"
                 }
               }
             }
           },
           "400": {
             "description": "Bad request (e.g. URL source does not match the connector type)",
             "content": {
               "application/json": {
                 "schema": {
                   "$ref": "#/components/schemas/Error"
                 }
               }
             }
           },
           "404": {
             "description": "Data connector not found",
             "content": {
               "application/json": {
                 "schema": {
                   "$ref": "#/components/schemas/Error"
                 }
               }
             }
           },
+          "500": {
+            "description": "An unexpected error occurred on the server",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/Error"
+                }
+              }
+            }
+          },
           "501": {
             "description": "Source metadata lookup is not implemented for this connector type",
             "content": {
               "application/json": {
                 "schema": {
                   "$ref": "#/components/schemas/Error"
                 }
               }
             }
           },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@spec/openapi.json` around lines 4443 - 4494, Add a standard 500 response
object to the responses block for the "Source metadata for the URI" endpoint so
it matches the sibling Data Connector endpoints: include "500" with description
"Unexpected server error" (or similar) and the same application/json content
referencing the existing "`#/components/schemas/Error`" schema; update the
responses object that currently contains 200, 400, 404, 501, 502 (the Source
metadata endpoint using SourceMetadataResponse) to also include this 500 entry.
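
The inconsistency flagged in this nitpick is the kind of thing a small spec lint can catch. The sketch below scans a parsed spec for operations under a path prefix that do not document a given status code. The function name and the prefix-based grouping are illustrative assumptions, not part of the repository's tooling.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def operations_missing_status(spec: dict, path_prefix: str, status: str) -> list:
    """List 'METHOD path' entries under path_prefix whose responses omit the status code."""
    missing = []
    for path, item in spec.get("paths", {}).items():
        if not path.startswith(path_prefix):
            continue
        for method, operation in item.items():
            # Path items can also hold non-operation keys such as "parameters".
            if method not in HTTP_METHODS:
                continue
            if status not in operation.get("responses", {}):
                missing.append(f"{method.upper()} {path}")
    return sorted(missing)
```

Called as `operations_missing_status(spec, "/data-connectors", "500")` on the spec under review, it would be expected to report the source-metadata lookup if the nitpick still applies.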
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b53bb99b-2015-4563-ad10-66e8cd7fb9a4

📥 Commits

Reviewing files that changed from the base of the PR and between a694f4f and 667d987.

📒 Files selected for processing (1)
  • spec/openapi.json

Comment thread spec/openapi.json
"tags": ["Data Connectors"],
"summary": "Sync a data connector URI into a file",
"operationId": "syncDataConnectorFile",
"description": "Materialize a connector URI (e.g. `grain://recording/<id>`) into a Cloudglue file without starting a downstream job. Idempotent: syncing the same URI returns the existing file. For Grain, the file's `source_metadata` is populated from the recording.",


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't guarantee source_metadata on every idempotent sync result.

Because this operation can return an already-existing file, older Grain imports can still come back with source_metadata: null per Line 8492. The description should scope that guarantee to newly materialized files or call out the legacy-null case explicitly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@spec/openapi.json` at line 4330, Update the operation description string that
currently begins "Materialize a connector URI (e.g. `grain://recording/<id>`)
into a Cloudglue file..." to avoid promising that `source_metadata` is always
populated; either limit the guarantee to "newly materialized files" or
explicitly note that idempotent returns for pre-existing/legacy Grain imports
may have `source_metadata: null`. Edit the JSON "description" value to include
that clarification so callers know that `source_metadata` may be null for
existing files.
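
On the client side, the legacy-null case raised here could be handled defensively, for example as in the sketch below. It assumes the sync response is a File object as a dict, and that the caller supplies a `fetch_source_metadata` callable (hypothetical) that wraps the GET /data-connectors/{id}/source-metadata lookup and unwraps its response.

```python
from typing import Any, Callable, Dict, Optional


def provenance_for(
    synced_file: Dict[str, Any],
    fetch_source_metadata: Callable[[], Optional[Dict[str, Any]]],
) -> Optional[Dict[str, Any]]:
    """Return provenance for a synced file, falling back to an upstream lookup."""
    meta = synced_file.get("source_metadata")
    if meta is not None:
        # Newly materialized files are expected to carry provenance already.
        return meta
    # An idempotent sync may return an older file whose source_metadata is null,
    # so ask the upstream source instead of assuming the field is populated.
    return fetch_source_metadata()
```

The fallback is only invoked when the field is missing or null, so the common case costs no extra request.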

@kdr kdr merged commit 8ba5b28 into main Jun 3, 2026
1 check passed
@kdr kdr deleted the kdr-v072 branch June 3, 2026 17:06