feat(data-warehouse): add WorkOS source#60890
Add a WorkOS data warehouse source that syncs organizations, users, SSO connections, directories, directory users, and directory groups. The source is resumable via WorkOS's `after` cursor pagination and is full-refresh only: WorkOS list endpoints expose no server-side timestamp filter. Outbound HTTP goes through the tracked `rest_source` transport. Marked alpha / unreleased pending testing against a live account.
Generated-By: PostHog Code
Task-Id: 2b53eabc-25d7-4ff9-a2c2-e47bcfa94b80
Pull request overview
Adds a new WorkOS external data warehouse source to the Temporal data import framework, allowing PostHog to ingest WorkOS directory/identity data (full-refresh) via the shared REST source transport with resumable cursor pagination.
Changes:
- Introduces a WorkOS `ResumableSource` implementation (cursor-based `after` pagination persisted via `ResumableSourceManager`) plus endpoint settings and schema surface.
- Registers WorkOS across backend enums/config plumbing and frontend schema lists so it appears as a selectable warehouse source.
- Adds a Django choices-only migration updating `ExternalDataSource.source_type` choices to include WorkOS.
Reviewed changes
Copilot reviewed 14 out of 18 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| products/warehouse_sources/backend/migrations/max_migration.txt | Bumps recorded latest migration to 0003. |
| products/warehouse_sources/backend/migrations/0003_alter_externaldatasource_source_type.py | Adds WorkOS to source_type choices via AlterField. |
| products/data_warehouse/backend/types.py | Adds WORKOS to ExternalDataSourceType (backend enum). |
| posthog/temporal/data_imports/sources/workos/workos.py | Implements WorkOS REST resource config, cursor paginator, credential validation, and resumable checkpointing. |
| posthog/temporal/data_imports/sources/workos/source.py | Registers WorkOSSource and exposes config UI fields, schemas, retryability rules, and pipeline binding. |
| posthog/temporal/data_imports/sources/workos/settings.py | Declares WorkOS endpoints, paths, and partitioning defaults. |
| posthog/temporal/data_imports/sources/workos/tests/test_workos.py | Tests paginator behavior, resume checkpointing, and credential validation mapping. |
| posthog/temporal/data_imports/sources/workos/tests/test_workos_source.py | Tests source config surface, schema enumeration, non-retryable patterns, and resumable manager wiring. |
| posthog/temporal/data_imports/sources/SOURCES.md | Adds WorkOS to the tracked sources list. |
| posthog/temporal/data_imports/sources/generated_configs.py | Adds WorkOSSourceConfig and maps ExternalDataSourceType.WORKOS to it. |
| posthog/temporal/data_imports/sources/_load_all.py | Ensures WorkOS source module is imported/registered. |
| posthog/schema.py | Adds WORK_OS = "WorkOS" to the schema enum. |
| frontend/src/queries/schema/schema-general.ts | Adds WorkOS to the frontend externalDataSources list. |
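The credential-validation mapping and retryability rules mentioned in the table can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `map_credential_status` and `is_permanent_sync_failure` are hypothetical helper names, and the handling of status codes other than 2xx, 401, and 403 is an assumption. The only behavior taken from the PR description is that a 403 is accepted at source-create time (valid key, missing scope) while 401/403 during a sync fail permanently.

```python
from typing import Optional, Tuple


def map_credential_status(status_code: int) -> Tuple[bool, Optional[str]]:
    """Hypothetical mapping from the probe response status to (is_valid, error_message).

    The probe in the PR is GET /organizations?limit=1; here only the status
    code is modeled so the sketch stays free of network calls.
    """
    if 200 <= status_code < 300:
        return True, None
    if status_code == 403:
        # Per the PR description: the key is valid but lacks a scope,
        # so source creation is still allowed.
        return True, None
    if status_code == 401:
        return False, "Invalid WorkOS API key"
    # Assumption: any other status is surfaced as a generic failure.
    return False, f"Unexpected response from WorkOS: HTTP {status_code}"


def is_permanent_sync_failure(status_code: int) -> bool:
    """Hypothetical sync-time check: 401/403 are not worth retrying."""
    return status_code in (401, 403)
```

The asymmetry is deliberate in this sketch: a 403 passes validation so users can connect with scoped-down keys, but the same status during a sync is treated as permanent because retrying cannot grant the missing scope.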
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
posthog/schema.py:2221-2222
`posthog/schema.py` is autogenerated by `pnpm run schema:build` and should not be modified by hand. The PR description acknowledges this was hand-applied to "match deterministic generator output," but any drift between the hand-applied value and what the generator would actually produce won't surface until CI runs the build. Please run `pnpm run schema:build` (and `pnpm run generate:source-configs` for `generated_configs.py`) before merging and commit the generated output, rather than keeping the hand-written edit.
### Issue 2 of 2
posthog/temporal/data_imports/sources/workos/workos.py:64-68
After the terminal page (`list_metadata.after` is null), `_after` retains the cursor from the previous page while `_has_next_page` flips to `False`. The current code relies on the rest-client never calling `get_resume_state()` once `has_next_page` is `False`, but that contract is not enforced by the type system. If the rest-client's behaviour ever changes, `get_resume_state()` would return the stale cursor and `save_checkpoint` would persist it, causing the next run to re-fetch the last page unnecessarily. Clearing `_after` in the `else` branch removes the dependency on that implicit assumption.
```suggestion
        if next_after:
            self._after = next_after
            self._has_next_page = True
        else:
            self._after = None
            self._has_next_page = False
```
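To show the suggested fix in context, here is a minimal, self-contained sketch of an `after`-cursor paginator. It is an illustration under assumptions, not the class from `workos.py`: the name `AfterCursorPaginator`, the `update_state` and `request_params` methods, and the shape of the resume state are all hypothetical. Only the `list_metadata.after` envelope, the `_after` / `_has_next_page` fields, and `get_resume_state()` come from the review comment above.

```python
from __future__ import annotations

from typing import Any, Optional


class AfterCursorPaginator:
    """Hypothetical stand-in for the WorkOS paginator discussed in the review."""

    def __init__(self, after: Optional[str] = None) -> None:
        # A non-None `after` means we are resuming from a saved checkpoint.
        self._after = after
        self._has_next_page = True

    @property
    def has_next_page(self) -> bool:
        return self._has_next_page

    def update_state(self, response_json: dict[str, Any]) -> None:
        """Read the next cursor from the `list_metadata.after` field."""
        next_after = (response_json.get("list_metadata") or {}).get("after")
        if next_after:
            self._after = next_after
            self._has_next_page = True
        else:
            # Terminal page: clear the cursor so no stale value can be checkpointed.
            self._after = None
            self._has_next_page = False

    def request_params(self) -> dict[str, str]:
        """Query parameters for the next request."""
        return {"after": self._after} if self._after else {}

    def get_resume_state(self) -> Optional[dict[str, str]]:
        """State to persist; None once the listing is exhausted."""
        return {"after": self._after} if self._after else None


paginator = AfterCursorPaginator()
paginator.update_state({"data": [], "list_metadata": {"after": "org_123"}})
mid_sync_state = paginator.get_resume_state()

paginator.update_state({"data": [], "list_metadata": {"after": None}})
final_state = paginator.get_resume_state()
```

With the `else` branch clearing `_after`, `get_resume_state()` returns `None` after the terminal page regardless of whether the caller checks `has_next_page` first.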
Size Change: 0 B. Total Size: 80.9 MB.
- Add WorkOS to the `ExternalDataSourceType` enum in `schema.json` so the generated `schema.py` / `schema.json` up-to-date CI checks pass.
- Clear the cursor on the terminal page so the paginator never hands back a stale resume checkpoint, regardless of the rest-client calling contract.
Generated-By: PostHog Code
Task-Id: 2b53eabc-25d7-4ff9-a2c2-e47bcfa94b80
Query snapshots: Backend query snapshots updated
Changes: 1 snapshots (0 modified, 1 added, 0 deleted)
Migration SQL Changes
Hey 👋, we've detected some migrations on this PR. Here's the SQL output for each migration, make sure they make sense:
🔍 Migration Risk Analysis
We've analyzed your migrations for potential risks. Summary: 0 Safe | 1 Needs Review | 0 Blocked
The source config generator snapshot test enumerates every registered source type. Adding the WorkOS source means the generated mapping now includes `WORKOS`, so refresh the snapshot to match.
Generated-By: PostHog Code
Task-Id: 39c67bce-d6bd-405d-9f7d-439e06e2ea83
Query snapshots: Backend query snapshots updated
Changes: 1 snapshots (1 modified, 0 added, 0 deleted)
Problem
WorkOS is a common identity / directory-sync provider (SSO, SCIM, user management). Customers using WorkOS have no way to pull that data — organizations, users, SSO connections, and directory sync state — into the PostHog data warehouse for analysis alongside their product data.
Changes
Adds a new WorkOS data warehouse source under `posthog/temporal/data_imports/sources/workos/`.
- `ResumableSource`. WorkOS uses cursor pagination (`list_metadata.after` → `after` param), so the paginator persists the cursor to Redis and resumes mid-sync after heartbeat timeouts.
- `organizations`, `users` (`/user_management/users`), `connections`, `directories`, `directory_users`, `directory_groups`.
- `created_at`.
- `sk_`-prefixed API key (password field). `validate_credentials` probes `GET /organizations?limit=1`; a 403 is accepted at source-create (valid key, missing scope), while sync-time 401/403 fail permanently via `get_non_retryable_errors`.
- Outbound HTTP goes through the tracked `rest_source` transport.
- Registered in the backend enum (`products/data_warehouse/backend/types.py`), `schema-general.ts`, `_load_all.py`, generated config, `SOURCES.md`, plus a choices-only `AlterField` migration (`warehouse_sources/0003`). Icon committed at `frontend/public/services/workos.png`.

Marked `unreleasedSource=True` / `releaseStatus="alpha"` since it hasn't yet been exercised against a live WorkOS account.

Deliberately left out of v1: the `/events` API (the only true incremental path, but its 30-day-window / 90-day-retention semantics need dedicated handling) and `organization_memberships` (requires a `user_id` / `organization_id` filter, so it can't be listed top-level).
How did you test this code?
I'm an agent. I added two automated test modules (`tests/test_workos.py`, `tests/test_workos_source.py`) covering paginator state transitions, resume round-trips, end-to-end resume through `rest_api_resource`, credential-status mapping, and the source-class config/schema surface.

I could not run the test suite, codegen, migrations, or linters in my environment (no Django/node/ruff available). The generated-file edits (`generated_configs.py`, `posthog/schema.py`) and the migration were hand-applied to match deterministic generator output. Before merge, please run locally to verify:
Automatic notifications
🤖 Agent context
Authored by Claude Code (Opus 4.8) via the `implementing-warehouse-sources` skill. Workflow: researched the WorkOS REST API (pagination envelope, per-endpoint filters, error/rate-limit semantics) with a web-research subagent, then modeled the implementation on the existing Clerk source as the closest reference (auth provider, cursor pagination, resumable, full-refresh).

Key decisions: chose `ResumableSource` over `SimpleSource` because the `after` cursor is a deterministic resume point; kept all endpoints full-refresh after confirming no server-side timestamp filter exists; excluded `/events` and `organization_memberships` for the reasons noted above; accepted 403 at credential-validation time so users can connect with scoped-down keys. The Logo.dev SVG endpoint rejected the request on the available token, so the icon is a PNG (consistent with most other sources).

Agent-assisted, requires human review: do not self-merge.