Skip to content

Add OAuth 2.0 Device Flow backend for CLI authentication#11984

Merged
malhotra5 merged 45 commits intomainfrom
oauth-device-flow-backend
Dec 16, 2025
Merged

Add OAuth 2.0 Device Flow backend for CLI authentication#11984
malhotra5 merged 45 commits intomainfrom
oauth-device-flow-backend

Conversation

@malhotra5
Copy link
Copy Markdown
Collaborator

@malhotra5 malhotra5 commented Dec 10, 2025

Summary of PR

Implements OAuth 2.0 Device Flow backend for CLI authentication with OpenHands Cloud. Includes comprehensive unit tests (32 tests) and code cleanup.

OAuth 2.0 Device Flow

How it works:

  1. CLI requests codes: POST /oauth/device/authorize → returns device_code + user_code
  2. User authorizes: Visits verification URL, enters user_code, logs in via Keycloak
  3. CLI polls for token: POST /oauth/device/token with device_code → gets API key
  4. CLI authenticated: Uses API key for OpenHands Cloud access

Why both codes are needed:

  • user_code (ABC12345): Short, human-readable for browser entry
  • device_code (128 chars): Cryptographically secure for CLI polling

Key Features

🔐 Endpoints

  • POST /oauth/device/authorize - Start flow
  • POST /oauth/device/token - Poll for API key
  • GET /oauth/device/verify - User authorization page
  • GET /oauth/device/keycloak-callback - OAuth callback

Change Type

  • New feature

Checklist

  • I have read and reviewed the code and I understand what the code is doing.
  • I have tested the code to the best of my ability and ensured it works as expected.

Release Notes

  • Include this change in the Release Notes.

Added OAuth 2.0 Device Flow for CLI authentication


Related: OpenHands/OpenHands-CLI#192

Testing:

cd enterprise
PYTHONPATH=".:$PYTHONPATH" poetry run pytest tests/unit/storage/test_device_code*.py tests/unit/server/routes/test_oauth_device.py -v
# 32 passed in 1.48s

make lint
# All checks passed!

To run this PR locally, use the following command:

GUI with Docker:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.openhands.dev/openhands/runtime:83dede6-nikolaik   --name openhands-app-83dede6   docker.openhands.dev/openhands/openhands:83dede6

- Add OAuth device flow endpoints for CLI authentication
- Implement device code storage and management
- Create database migration for device codes table
- Integrate with existing API key system
- Return API keys directly instead of OAuth tokens for simplified flow

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 10, 2025

⚠️ This PR contains migrations. Please synchronize before merging to prevent conflicts.

- Remove client_id and scope requirements from request models since endpoints are already authenticated
- Add keycloak_user_id field to track which user initiated the device authorization
- Update database schema to store keycloak_user_id instead of client_id/scope
- Add user authentication dependency injection to endpoints using get_user_id
- Add security check to ensure only the requesting user can authorize their device
- Remove refresh_token handling as it's not needed for this flow
- Simplify DeviceCode model and store methods by removing unnecessary OAuth fields

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 10, 2025

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  enterprise
  saas_server.py 34-39, 61-66
  enterprise/server
  middleware.py 152, 166
  enterprise/server/routes
  oauth_device.py 116-118, 189-196, 208-220, 319-321
  enterprise/storage
  api_key_store.py 132, 136-142, 146-159
  device_code.py 45, 58, 105-109
  device_code_store.py 134, 137, 156-167
Project Total  

This report was generated by python-coverage-comment-action

openhands-agent and others added 16 commits December 10, 2025 16:07
The device authorization endpoint should not require authentication since
the whole point of device flow is to authenticate unauthenticated devices.

Changes:
- Remove authentication requirement from /oauth/device/authorize endpoint
- Make keycloak_user_id nullable in device_codes table (set during authorization)
- Update DeviceCodeStore.create_device_code() to not require keycloak_user_id
- Set keycloak_user_id during device authorization, not creation

This follows the OAuth 2.0 Device Flow specification where:
1. Device authorization is unauthenticated
2. User verification requires authentication
3. Token polling is unauthenticated

Co-authored-by: openhands <openhands@all-hands.dev>
The device authorization and token polling endpoints need to be public
since they are used by unauthenticated devices during the OAuth flow.

- /oauth/device/authorize - Device authorization (unauthenticated)
- /oauth/device/token - Token polling (unauthenticated)

The verification endpoint (/oauth/device/verify) remains authenticated
as it requires user login to authorize the device.

Co-authored-by: openhands <openhands@all-hands.dev>
Remove unnecessary grant_type validation to keep the authorization flow
minimal and simple. The device_code is sufficient for token requests.

Changes:
- Remove grant_type field from DeviceTokenRequest model
- Remove grant_type validation in token endpoint
- Simplify client request to only send device_code

Co-authored-by: openhands <openhands@all-hands.dev>
Fix two issues in OAuth Device Flow:

1. Timezone-aware datetime comparison error:
   - Handle timezone-naive datetime from database in is_expired() method
   - Ensure both sides of comparison are timezone-aware

2. Pydantic deprecation warning:
   - Replace deprecated .dict() calls with .model_dump()
   - Maintains compatibility with Pydantic V2

Fixes error: 'can't compare offset-naive and offset-aware datetimes'

Co-authored-by: openhands <openhands@all-hands.dev>
1. Change verification route from /oauth/device/verify to /oauth/verify
   - Update router prefix and route paths
   - Update verification URI in device authorization response
   - Update HTML form action

2. Fix authorization logic to properly link user ID during verification
   - Remove incorrect user ID matching check that always failed
   - Device codes are created without user ID (keycloak_user_id=None)
   - User ID is linked to device code during authorization via authorize() method
   - Add proper user authentication check

This fixes the 'You are not authorized to approve this device request' error
that occurred because we were checking for user ID match before linking it.

Co-authored-by: openhands <openhands@all-hands.dev>
1. Remove /oauth/verify from middleware whitelist
   - Route should require user authentication, not be public
   - Only /oauth/device/authorize and /oauth/device/token are public

2. Add authentication requirement to GET /oauth/verify
   - Both GET and POST routes now require user_id via Depends(get_user_id)
   - Users must be logged in to access device verification page

3. Remove redundant user authentication check in POST handler
   - FastAPI dependency injection guarantees user_id is valid
   - No need for manual None check

This ensures users must be authenticated to authorize devices, which is
the correct security model for OAuth 2.0 Device Flow verification.

Co-authored-by: openhands <openhands@all-hands.dev>
1. Handle authentication manually in OAuth device routes
   - Change user_id parameter to Optional[str] in both GET and POST /oauth/verify
   - Add manual authentication check and redirect to login if not authenticated
   - Build proper login URL with redirect_uri parameter

2. Add /oauth/verify back to middleware whitelist
   - Route now handles authentication manually instead of relying on middleware
   - Allows unauthenticated access so we can control the redirect behavior

This fixes the issue where unauthenticated users saw raw redirect HTML
instead of being properly redirected to the login page.

Co-authored-by: openhands <openhands@all-hands.dev>
1. Replace manual authentication with Keycloak OAuth flow
   - GET /oauth/verify now redirects to Keycloak for authentication
   - Encode user_code in JWT state parameter for callback processing
   - Remove POST /oauth/verify endpoint (no longer needed)

2. Add Keycloak callback endpoint
   - GET /oauth/keycloak-callback handles authentication completion
   - Decode JWT state to retrieve user_code
   - Get user info from Keycloak tokens
   - Automatically authorize device and create CLI API key

3. Update middleware whitelist
   - Add /oauth/keycloak-callback to public endpoints
   - Both verification endpoints now handle auth flow properly

This follows the same pattern as Slack integration and provides
a cleaner, more secure authentication flow for device authorization.

Co-authored-by: openhands <openhands@all-hands.dev>
- Import HOST_URL from integrations.utils
- Update authentication failure messages to match Slack integration pattern
- Use styled link to OpenHands Cloud with consistent messaging
- Apply to both Keycloak token and user info failure cases

This provides users with clear guidance on how to resolve authentication
issues during the device authorization flow.

Co-authored-by: openhands <openhands@all-hands.dev>
- Change redirect_uri from request.url.netloc to HOST_URL pattern
- Match Slack integration approach exactly
- Use HOST_URL/oauth/keycloak-callback for both redirect and token exchange
- This fixes 'Invalid parameter: redirect_uri' error from Keycloak

The redirect URI must match what Keycloak expects, which is the configured
HOST_URL rather than the dynamic request URL.

Co-authored-by: openhands <openhands@all-hands.dev>
Comment thread enterprise/migrations/versions/084_create_device_codes_table.py
openhands-agent and others added 4 commits December 11, 2025 15:27
- Remove user_id column from migration (redundant with keycloak_user_id)
- Remove access_token column from migration (should not be stored in DB)
- Update DeviceCode model to remove user_id and access_token fields
- Update DeviceCode.authorize() method to not store access_token
- Update device_code_store.py to not pass access_token to authorize method
- Update oauth_device.py to retrieve API key dynamically instead of storing it
- Add helper methods to ApiKeyStore for retrieving/deleting by name and user_id
- Fix bug where delete_api_key was called with name instead of key value

Co-authored-by: openhands <openhands@all-hands.dev>
- Add DeviceCode model tests with parameterized expiration and status checks
- Add DeviceCodeStore tests with parameterized lookup and authorization scenarios
- Add OAuth2 device flow API endpoint tests with comprehensive error handling
- Remove unused created_at field from DeviceCode model and migration
- Remove cleanup_expired_codes method from DeviceCodeStore
- Update conftest.py to include DeviceCode model for test database setup

Tests cover:
- Device code generation and validation
- Authorization workflow (success/failure cases)
- API endpoints (/authorize, /token, /verify, /keycloak-callback)
- Error handling and edge cases
- 32 tests total, all passing

Co-authored-by: openhands <openhands@all-hands.dev>
- Remove unused cli_api_key variable in keycloak_callback
- Fix import issues and formatting per ruff/mypy requirements
- Add noqa comment for DeviceCode import in conftest.py (needed for DB table creation)
- Fix trailing whitespace and end-of-file formatting
- All 32 tests still passing, linting now clean

Co-authored-by: openhands <openhands@all-hands.dev>
Comment thread enterprise/server/routes/oauth_device.py Outdated
- Remove user_code validation from /verify endpoint before authentication
- Move all database validation to happen after Keycloak authentication in callback
- Update test to reflect new behavior where invalid codes redirect to Keycloak
- This prevents potential probing of valid user codes without authentication

Co-authored-by: openhands <openhands@all-hands.dev>
@malhotra5 malhotra5 requested a review from xingyaoww December 13, 2025 04:01
openhands-agent and others added 6 commits December 13, 2025 19:37
- Add session.expunge() to get_by_device_code and get_by_user_code methods
- Properly detach objects from session before returning them
- Update type annotations to use modern | None syntax instead of Optional
- Prevents DetachedInstanceError and ensures clean session management

Co-authored-by: openhands <openhands@all-hands.dev>
- Replace check-then-act pattern with database constraint enforcement
- Use IntegrityError handling to retry on constraint violations
- Remove vulnerable _generate_unique_codes method that had race condition
- Add comprehensive tests for retry logic and constraint violations
- Database already has unique indexes on device_code and user_code columns
- Prevents race conditions where multiple processes could insert duplicate codes

This fixes the classic Time-of-Check-Time-of-Use (TOCTOU) bug where:
1. Process A checks if codes exist (they don't)
2. Process B checks if codes exist (they don't)
3. Process A inserts codes
4. Process B inserts same codes -> constraint violation

Now the database enforces uniqueness and we handle violations gracefully.

Co-authored-by: openhands <openhands@all-hands.dev>
- Remove delete_api_key_by_name call to allow multiple concurrent devices
- Create unique API keys per device using user_code in name pattern
- Update API key name to 'Device Link Access Key' for clarity
- Modify token retrieval to use device-specific API key lookup
- Add comprehensive test for multiple device authentication
- Ensure backward compatibility with existing single-device flows

This allows users to authenticate multiple devices simultaneously without
interference, fixing the previous limitation where only one device could
be authenticated at a time.

Co-authored-by: openhands <openhands@all-hands.dev>
- Add rate limiting fields to device_codes table (last_poll_time, current_interval)
- Implement check_rate_limit() and update_poll_time() methods in DeviceCode model
- Add slow_down error response with interval field when clients poll too frequently
- Increase polling interval by 5 seconds on each slow_down, capped at 60 seconds
- Update DeviceCodeStore with update_poll_time() method for database persistence
- Add comprehensive test suite covering all rate limiting scenarios:
  * First poll always allowed
  * Normal polling (respecting interval) allowed
  * Fast polling returns slow_down error with increased interval
  * Interval increases with repeated violations
  * Interval caps at maximum value (60 seconds)
  * Rate limiting applies even to authorized devices
- Fix existing tests to mock new rate limiting methods
- All tests passing with clean linting

This ensures RFC 8628 compliance by preventing clients from overwhelming
the server with excessive polling requests while providing clear feedback
about required polling intervals.

Co-authored-by: openhands <openhands@all-hands.dev>
- Changed from hardcoded DEVICE_TOKEN_POLL_INTERVAL to device_code_entry.current_interval
- Ensures clients get correct polling interval if device has increased interval from rate limiting
- Added test for device authorization with increased interval (15s vs default 5s)
- Maintains backward compatibility for new device codes with default interval
- Improves RFC 8628 compliance by providing accurate polling guidance

This ensures that if a device code already has an increased interval from
previous rate limiting violations, the authorization response correctly
communicates the current required polling interval to the client.

Co-authored-by: openhands <openhands@all-hands.dev>
@malhotra5
Copy link
Copy Markdown
Collaborator Author

malhotra5 commented Dec 13, 2025

Fixes made

  1. SQLAlchemy Session Detachment Bug
  2. Race Condition in Code Generation
  3. Broken Multi-Device Support
  4. HTML Templates in Python - UI is handled via FE
  5. Inconsistent Error Handling - backend doesn't handle HTML responses
  6. Incomplete Spec Compliance - slow down protocol is implemented. we don't support fine grain grants for now so I haven't implemented that (its a native api key with access to cloud endpoints)
  7. Inefficient Timezone Handling

Things not addressed in this PR

  1. DOS Attack Vector - we've tried system wide rate limiting in the past but there were some complications in prod. I'm not sure of the current status but I think that's out of scope for this PR

  2. Premature State Encoding - not a big deal imho

  3. Database Bloat - I'd like to do this in a follow up PR as it involves writing a CRON job. Considering this is a security heavy PR I'd prefer including chore related code in a separate PR to mitigate any oversights during the review process.

- Updated migration to use DateTime(timezone=True) for all datetime columns
- Modified DeviceCode model to use DateTime(timezone=True) columns
- Simplified is_expired() method by removing runtime timezone conversions
- Simplified check_rate_limit() method by removing timezone conversion logic
- Database now stores timezone-aware timestamps eliminating conversion overhead

Performance improvements:
- Eliminates timezone check on every is_expired() call (called every 5+ seconds per device)
- Removes datetime.replace() object creation overhead
- Ensures consistent timezone handling across all database backends
- PostgreSQL will use TIMESTAMP WITH TIME ZONE for proper timezone storage

This change ensures that datetime fields are always timezone-aware when retrieved
from the database, eliminating the need for runtime timezone conversions and
improving performance for frequently called methods.

Co-authored-by: openhands <openhands@all-hands.dev>
- Cleaned up overly specific comments about database timezone storage
- Comments were too implementation-specific and verbose for the context
- Code is now cleaner and more maintainable

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww
Copy link
Copy Markdown
Collaborator

@OpenHands I'm not an OAuth expert, please do a /codereview-roasted and tell me if the implementation is correct

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented Dec 15, 2025

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented Dec 15, 2025

Summary

I've completed a comprehensive Linus-style code review of the OAuth 2.0 Device Flow implementation in PR #11984. As requested, I focused on security correctness and implementation quality rather than making changes.

🔥 Review Results

Overall Rating: 🟡 Acceptable with caveats - Core logic is sound but has critical security issues that must be fixed before merging.

Critical Issues Found:

  1. 🚨 CSRF Vulnerability (HIGH PRIORITY)

    • Location: frontend/src/routes/device-verify.tsx lines 57-62
    • Problem: Auto-submits device authorization without user consent when authenticated user visits the URL
    • Risk: Attacker can trick users into authorizing attacker's devices by sending them a malicious link
    • Fix Required: Add explicit user confirmation UI before authorization
  2. 🚨 Transaction Atomicity Violation (HIGH PRIORITY)

    • Location: enterprise/server/routes/oauth_device.py lines 261-305
    • Problem: API key is created before device authorization; if authorization fails, orphaned API keys remain in database
    • Risk: Data integrity issues and API key leakage
    • Fix Required: Use single transaction or create API key AFTER successful authorization
  3. ⚠️ RFC 8628 Spec Violation (MEDIUM PRIORITY)

    • Location: enterprise/server/routes/oauth_device.py line 128
    • Problem: Token endpoint accepts JSON instead of application/x-www-form-urlencoded as required by RFC 8628
    • Risk: Breaks interoperability with standard OAuth clients
    • Fix Required: Change to Form parameters

Additional Issues:

  1. No rate limiting on device creation endpoint (DoS risk)
  2. API key expiration too short (24h vs industry standard 30-90 days)
  3. User code entropy could be higher for safety margin
  4. No cleanup of expired device codes (database bloat)

Positive Aspects:

✅ Previous reviewer's feedback was addressed (SQLAlchemy sessions, race conditions, timezone handling)
✅ RFC 8628 rate limiting properly implemented with slow_down
✅ Comprehensive test coverage (32 tests)
✅ Device-specific API keys for per-device revocation

Recommendation:

Do not merge until HIGH priority security issues are fixed. The CSRF vulnerability is particularly serious and could make headlines. The transaction atomicity issue will cause subtle production bugs. These should be addressed before deployment.


Note: I did not make any code changes as requested - this is a review-only assessment per the /codereview-roasted command.

View full conversation

openhands-agent and others added 6 commits December 15, 2025 15:23
- Reorder operations to authorize device before creating API key
- Add cleanup logic to revert authorization if API key creation fails
- Implement transaction integrity to prevent data inconsistency
- Add comprehensive error handling with detailed logging
- Include test suite for all failure scenarios

This fixes a critical security issue where API keys could be orphaned
in the database if device authorization failed after key creation.

Co-authored-by: openhands <openhands@all-hands.dev>
- Change /oauth/device/token endpoint to accept form data instead of JSON
- Replace DeviceTokenRequest model with Form parameter
- Update device_code parameter handling throughout the endpoint
- Ensures RFC 8628 compliance for OAuth 2.0 Device Authorization Grant

Co-authored-by: openhands <openhands@all-hands.dev>
- Remove automatic device authorization on page load
- Add explicit user consent UI with security warning
- Require user to click 'Authorize Device' button before API call
- Display device code clearly with security notice
- Maintain existing manual code entry functionality

This prevents attackers from tricking users into authorizing devices
by sending malicious links with user_code parameters.
- Remove unused variable api_key_created in oauth_device.py
- Fix code formatting and style issues
- Ensure all enterprise linting checks pass
- Remove non-existent DeviceTokenRequest import
- Update test calls to use device_code parameter instead of request object
- All 22 tests now pass successfully
- Tests align with current Form-based parameter implementation

Co-authored-by: openhands <openhands@all-hands.dev>
Comment thread enterprise/server/middleware.py Outdated
Comment on lines +74 to +109
def check_rate_limit(self) -> tuple[bool, int]:
"""Check if the client is polling too fast.

Returns:
tuple: (is_too_fast, current_interval)
- is_too_fast: True if client should receive slow_down error
- current_interval: Current polling interval to use
"""
now = datetime.now(timezone.utc)

# If this is the first poll, allow it
if self.last_poll_time is None:
return False, self.current_interval

# Calculate time since last poll
time_since_last_poll = (now - self.last_poll_time).total_seconds()

# Check if polling too fast
if time_since_last_poll < self.current_interval:
# Increase interval for slow_down (RFC 8628 section 3.5)
new_interval = min(self.current_interval + 5, 60) # Cap at 60 seconds
return True, new_interval

return False, self.current_interval

def update_poll_time(self, increase_interval: bool = False) -> None:
"""Update the last poll time and optionally increase the interval.

Args:
increase_interval: If True, increase the current interval for slow_down
"""
self.last_poll_time = datetime.now(timezone.utc)

if increase_interval:
# Increase interval by 5 seconds, cap at 60 seconds (RFC 8628)
self.current_interval = min(self.current_interval + 5, 60)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, @OpenHands is there an library we can use to do all these, instead of needing to implement these ourselves?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! xingyaoww can track my progress at all-hands.dev

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

I answered the question about whether there's a library available for OAuth 2.0 Device Flow implementation instead of building it from scratch.

Answer

Yes, there is a library: Authlib - Python's most comprehensive OAuth library with full RFC 8628 (Device Flow) support.

Key Points:

What Authlib Provides:

  • ✅ Built-in DeviceAuthorizationEndpoint and DeviceCodeGrant classes for RFC 8628
  • ✅ Battle-tested, standards-compliant implementation
  • ✅ Handles device code generation, rate limiting, and token polling

The Important Caveat:

  • ✅ Authlib has full OAuth server support for Flask and Django
  • ❌ For FastAPI/Starlette, Authlib only provides OAuth client functionality (not server)
  • ⚠️ Using Authlib with FastAPI as an OAuth server would still require:
    • Implementing abstract methods yourself
    • Building custom FastAPI integration layer
    • Connecting to your SQLAlchemy models

My Recommendation:
Your custom implementation is appropriate for this use case because:

  1. You're building an OAuth server with FastAPI (Authlib has no pre-built integration for this)
  2. Your implementation is already complete, tested (32 passing tests), and working
  3. Your use case is customized (returning API keys instead of standard OAuth tokens)
  4. The code is clean, well-structured, and maintainable

Bottom Line: While Authlib exists and is excellent for standard OAuth scenarios, your focused custom implementation (~300 lines) is a pragmatic choice given the lack of FastAPI OAuth server support in Authlib and your already-working solution. You could consider refactoring to Authlib in the future if OAuth requirements expand significantly beyond Device Flow.

View full conversation

openhands-agent and others added 4 commits December 15, 2025 19:59
- Fix TypeError when comparing timezone-naive expires_at from database with timezone-aware datetime.now(UTC)
- Add timezone handling in validate_api_key method to convert naive datetimes to UTC before comparison
- Add comprehensive tests for both timezone-aware and timezone-naive scenarios
- Ensures API keys created through OAuth device flow work immediately after creation

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Collaborator

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@malhotra5 malhotra5 merged commit 7853b41 into main Dec 16, 2025
25 checks passed
@malhotra5 malhotra5 deleted the oauth-device-flow-backend branch December 16, 2025 16:54
afanty2021 pushed a commit to afanty2021/OpenHands that referenced this pull request Dec 17, 2025
…1984)

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants