Skip to content

[TLS 1.3] Consolidate the Session Manager#3150

Merged
reneme merged 8 commits intomasterfrom
tls13/session_consolidation
Feb 24, 2023
Merged

[TLS 1.3] Consolidate the Session Manager#3150
reneme merged 8 commits intomasterfrom
tls13/session_consolidation

Conversation

@reneme
Copy link
Copy Markdown
Collaborator

@reneme reneme commented Dec 30, 2022

Pull Request Dependencies

Strong Types

Those are the first code changes that make use of the strong-type helper introduced in #3165. Particularly for Session_ID and Session_Ticket, which are otherwise simply std::vector<uint8_t> containers.

class Session

This is now a dumb data-sink that can (un)marshal itself (and knows how to encrypt/decrypt itself). In particular it doesn't have a notion about its own ID or ticket anymore. These are the realm of the Session_Manager.

struct Session_Handle

A descriptor for a session. As in the example above: It contains either a (TLS 1.2-ish) ID or a ticket (TLS 1.2 or 1.3) but never both. These invariants are ensured by the class. The Session_Manager emits a Session_Handle for new sessions and spits them out again given a Session_Handle. It can also store a Session along with an existing session handle.

class Session_Manager

I put some documentation in the header, so won't repeat that here. Main point: There is a sharp distinction between APIs useful for the server and the client:

Servers create Session objects and pass them to Session_Manager::establish(Session) -> Session_Handle. Hence, the Session_Manager is in full control of both the Session_ID and the Session_Ticket. Servers will retrieve Session objects directly via Session_Manager::retrieve(Session_Handle) -> Session (TLS 1.2) or via Session_Manager::choose_from_offered_tickets. Both are high-level methods implemented directly in the base class. They take care of session policy validation (e.g. expiry). Low-level storage methods must be implemented by all sub-classes.

Clients on the other hand always receive a Session_Handle from the server (that's just how TLS works). They use Session_Manager::store(Session, Session_Handle) to associate their sessions with the handles from the server. Usually, clients will then use Session_Manager::find(Server_Information) -> std::vector<std::pair<Session, Session_Handle>> to retrieve potential sessions for resumption. Again, find is a high-level method implemented in the base class in terms of a low-level abstract storage method.

Note that clients may retrieve multiple sessions for resumption. TLS 1.3 allows clients to provide several tickets (as PSK identities) at once. The Session_Manager can opt to always return a single Session, Session_Handle pair, of course. On the server-side Session_Manager::choose_from_offered_tickets() receives the tickets offered by the client and selects the "most favorable": The default implementation simply selects the first that matches the server's needs.

I'm hoping that typical users won't need to override the high-level (policy-enforcing) methods. They are marked as virtual though and can be overridden if needed.

Session deletion (e.g. after usage by the client/server) is done with Session_Manager::remove(Session_Handle).

class Session_Manager_Stateless

A strawman manager for a stateless server. It generates random Session IDs and creates tickets via Session::encrypt(). The sessions are not persisted and Session_Manager::retrieve(Session_ID) will always fail. To stay compatible with the current architecture Session_Manager_Stateless uses Credentials_Manager::psk() to obtain the necessary ticket key.

class Session_Manager_In_Memory

Like before, this keeps sessions in-memory in a std::map<>. Note that the SessionID strong-type acts as a key! This implementation will never emit session tickets.

class Session_Manager_SQLite

I adapted the implementation to the new Session_Manager API. Most notably, the implementation now detects old (and/or corrupted) databases and resets them transparently. Some more clean-ups could be done regarding the SQL database and statements wrapper. As the size of this patch is already getting out of hand, that will come later.

class Session_Manager_Hybrid

This internally combines a Session_Manager_Stateless with an arbitrary (stateful) Session_Manager implementation. It automatically creates either IDs or tickets depending on whether the TLS 1.2 peer can handle tickets. Applications can choose whether to prefer tickets or IDs at construction time. This is likely the most useful flavor for TLS servers.

What's next

First: Thanks for reading through those thoughts. 😄

A few open questions:

  • Strong types: yay? or nay?
  • Should we go as far and separate the Session_Manager into a server and a client part? Now that the API seems almost fully disjoint (apart from ::remove()).
  • Required technical understanding of the application developer: is this taking it too far?
    • Generation of valid SessionIDs and Tickets
    • Lifetime handling
    • Choosing from a number of offered tickets
    • Balancing how many tickets a TLS 1.3 client should offer at once
    • ...
  • ...

@reneme reneme force-pushed the tls13/session_consolidation branch from c07679d to e2dba40 Compare December 30, 2022 18:31
@reneme reneme force-pushed the tls13/session_consolidation branch from b3e1971 to 3312899 Compare January 2, 2023 09:43
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jan 2, 2023

Codecov Report

Base: 88.05% // Head: 88.16% // Increases project coverage by +0.11% 🎉

Coverage data is based on head (f5b500e) compared to base (a3c26f5).
Patch coverage: 88.78% of modified lines in pull request are covered.

❗ Current head f5b500e differs from pull request most recent head d9b2a79. Consider uploading reports for the commit d9b2a79 to get more accurate results

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3150      +/-   ##
==========================================
+ Coverage   88.05%   88.16%   +0.11%     
==========================================
  Files         608      615       +7     
  Lines       68674    69479     +805     
  Branches     6840     6906      +66     
==========================================
+ Hits        60470    61256     +786     
+ Misses       5343     5340       -3     
- Partials     2861     2883      +22     
Impacted Files Coverage Δ
src/cli/tls_utils.cpp 64.63% <0.00%> (ø)
src/fuzzer/tls_client.cpp 55.55% <0.00%> (ø)
src/fuzzer/tls_server.cpp 38.59% <0.00%> (ø)
src/lib/tls/tls13/tls_ticket_13.cpp 0.00% <0.00%> (ø)
src/lib/tls/tls_client.cpp 85.45% <ø> (ø)
src/tests/test_tls_messages.cpp 89.24% <0.00%> (ø)
src/tests/test_tls_stream_integration.cpp 91.81% <ø> (ø)
src/tests/unit_asio_stream.cpp 98.63% <ø> (ø)
src/cli/tls_client.cpp 59.87% <40.00%> (ø)
src/cli/tls_server.cpp 51.11% <40.00%> (ø)
... and 44 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@reneme reneme force-pushed the tls13/session_consolidation branch 2 times, most recently from 2c78ddc to ac56132 Compare January 3, 2023 09:54
@randombit randombit self-requested a review January 3, 2023 16:01
@randombit
Copy link
Copy Markdown
Owner

Strong types: yay? or nay?

I haven't seen this before in C++ tbh, but I'm generally a fan of newtypes in Rust which seem to be basically the same idea. Fine to proceed along these lines.

Should we go as far and separate the Session_Manager into a server and a client part. Now that the API seems almost fully disjoint (apart from ::remove()).

The underlying storage mechanism is going to be the same though, and we'd be duplicating that if we split the managers.
(Possibly the real fix here is to make the storage mechanism distinct from the policy logic)

Required technical understanding of the application developer: is this taking it too far?
...

I think we should provide toggles where someone might reasonably have differing needs, but in general people just want something to work and they shouldn't have to understand what any of this is doing unless they are doing something tricky or unusual. If in doubt we should just use whatever default parameters that say BoringSSL or OpenSSL uses.

@reneme
Copy link
Copy Markdown
Collaborator Author

reneme commented Jan 6, 2023

This seems to finally converge. Frankly, I underestimated the intricacies of consolidating 1.2 and 1.3 behind the same API, but I'm fairly happy with this now. Commit history will need cleanup and potentially, I'll also split this into two or three patch sets.

Main improvements from an (advanced) user's perspective:

  • Full control over the usage of stateful or stateless resumption by subclassing Session_Manager.
    • Previously, ticket creation was hard-coded
    • Stateful and stateless operation code is easily reusable for TLS 1.2 and 1.3
  • A number of default Session_Manager classes to choose from:
    • ..._In_Memory
      as before, keeps state in RAM. When used in servers, it effectively disables stateless ticket support. When used in clients, tickets as well as IDs can be stored and resumed. Clients will probably be fine with this (or the SQL flavor).
    • ..._Stateless
      effectively disables stateful resumption (does not store anything) in servers. Is not useful for clients.
    • ..._Hybrid
      internally combines a Session_Manager_Stateless with an arbitrary (stateful) Session_Manager implementation. Automatically creates either IDs or tickets depending on whether the TLS 1.2 peer can handle tickets. Applications can choose whether to prefer tickets or IDs at construction time. This is likely the most useful flavor for TLS servers.

Still to do

  • Cleanup
  • Move SQL-based managers to the new architecture
  • Remove Session_ID and Session_Ticket from the Session object
    (Sessions don't need to be aware of their handle)
  • Proliferate the strong types a bit more (likely in a follow-up)
  • Rebase [TLS 1.3] Support Session Resumption for TLS Server  #3140 onto this (re-posted below)
    (I smoke-tested this locally already to validate the API)

Questions

  • The SQL database schema will likely change, do we need a migration for this? Implemented as "drop table and re-initialize"
  • The Session DER-schema will likely change. Migration needed? No migration
  • Anything we could improve on the above schematics once we're on it?
  • Do you think the documentation for Session_Manager is sufficient to enable users to custom-make their own?

@randombit
Copy link
Copy Markdown
Owner

The SQL database schema will likely change, do we need a migration for this?
The Session DER-schema will likely change. Migration needed?

IMO no. TLS sessions are transitory. We should just make sure it's clearly called out in the release notes and migration guide.

@reneme reneme force-pushed the tls13/session_consolidation branch 8 times, most recently from 7a12ab8 to a257f5a Compare January 17, 2023 08:14
@reneme reneme force-pushed the tls13/session_consolidation branch from c0a4e81 to 3adece2 Compare January 20, 2023 12:44
@randombit randombit mentioned this pull request Jan 22, 2023
29 tasks
@reneme reneme force-pushed the tls13/session_consolidation branch from 3adece2 to c1f1198 Compare January 25, 2023 12:11
@reneme reneme force-pushed the tls13/session_consolidation branch from 343ee18 to 29f551e Compare January 26, 2023 14:38
@reneme reneme marked this pull request as ready for review January 26, 2023 16:16
@reneme
Copy link
Copy Markdown
Collaborator Author

reneme commented Jan 26, 2023

Sorry for leaving this dormant for a while. I waited for a continuous time slot to finish this up as it required some more focused refactoring and RFC pondering. Anyway, this should be the last oversized PR for TLS 1.3 😄.

Turns out that @randombit was right, that storage and policy should be somewhat separated:

Should we go as far and separate the Session_Manager into a server and a client part. Now that the API seems almost fully disjoint (apart from ::remove()).

The underlying storage mechanism is going to be the same though, and we'd be duplicating that if we split the managers.
(Possibly the real fix here is to make the storage mechanism distinct from the policy logic)

I didn't split the managers into a client and a server part. Instead I added a layer of indirection in the Session_Manager API that allows implementing policy filters (mainly expiry) in the base class where all derived managers can
benefit from it. Details are in Doxygen of session_manager.h.

Final touches

... that might be done in follow-ups as this PR is becoming ridiculously hard to reason about. Mainly because the API changes of Session_Manager require adaptions in examples, tests, CLI, fuzzer and so forth.

  • Update the PR's top-level description to provide an overview of the relevant changes
  • Commit history cleanup
  • Distinction between Policy::session_ticket_lifetime() and Policy::tls12_session_lifetime() (would be a new policy)
  • Session_Manager::used_in_handshake(bool success) to inform the manager that a ticket was used in a handshake so that it may be removed
    (Instead, implementations guarantee to call ::find() only once per handshake. I.e. if a manager handed out a session, it can assume that it will be used and remove it internally. Advantage: find() is an atomic operation that should facilitate concurrent accesses without additional complication)
  • TLS::Callbacks::tls_session_established() should be changed to reflect: [TLS 1.3] Overhaul TLS::Callbacks #2988 (comment)
  • eventually: Rebase [TLS 1.3] Support Session Resumption for TLS Server  #3140 onto this

Comment on lines +67 to +76
const auto ticket_age = callbacks.tls_current_timestamp() - session->start_time();
const bool expired = ticket_age > policy_lifetime;
Copy link
Copy Markdown
Collaborator Author

@reneme reneme Jan 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies the same expiry filter to both tickets and TLS 1.2 session IDs. Previously, there were two lifetimes:

  • Policy::session_ticket_lifetime() and
  • Session_Manager_Memory/SQL(..., session_lifetime).

TLS 1.3 doesn't make this distinction anyway, but we could introduce a Policy::tls12_session_id_lifetime() to allow the same flexibility as before or at least rename session_ticket_lifetime() to session_lifetime().

Note that derived Session_Manager classes each used to implement the expiry logic themselveds. That is still possible (by overriding ::retrieve() and ::find()) but not necessary anymore. For applications to use the default implementation, they would just need to override the "internal storage methods" ::retrieve_one() and ::find_all().

This introduces a recursive mutex into the base class and implements
'Lockable' for the base class. Derived classes can therefore just
lock *this using a scoped lock and don't have to maintain their own
mutex.
@reneme reneme force-pushed the tls13/session_consolidation branch from b1018b8 to 558f220 Compare February 21, 2023 15:31
@reneme
Copy link
Copy Markdown
Collaborator Author

reneme commented Feb 21, 2023

rebased to master to solve conflicts with enum adaptions.
Edit: had to work around the GCC ICE first seen in: #3246.

@reneme reneme force-pushed the tls13/session_consolidation branch from e9e2b49 to 4c1c299 Compare February 22, 2023 07:50
Copy link
Copy Markdown
Collaborator

@lieser lieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First incomplete round of review.

Comment on lines +26 to +27
BOTAN_ASSERT(session.side() == Connection_Side::Server,
"Client tried to establish a session");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not completely happy with the combined API of client and server session manager. Having to do this kind of misuse checks indicates to me that this is not optimal.

But since the API of the Session_Manager is more intended for internal use (i.e. a user should not really need to call any of the problematic methods directly himself) I am OK with keeping as is if a split would complicate things to much.

Copy link
Copy Markdown
Collaborator Author

@reneme reneme Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree. Let's spike an alternative API together.

As discussed: let's revisit after merging.

@reneme
Copy link
Copy Markdown
Collaborator Author

reneme commented Feb 23, 2023

Copy link
Copy Markdown
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not do a terribly careful review here, but leaving an approval to indicate I didn't notice any immediate problems / no objections to merging.

@reneme
Copy link
Copy Markdown
Collaborator Author

reneme commented Feb 24, 2023

Done from my side. Waiting for CI and @lieser's approval.

@reneme reneme requested a review from lieser February 24, 2023 10:20
@reneme reneme merged commit a8d0853 into master Feb 24, 2023
@reneme reneme deleted the tls13/session_consolidation branch February 24, 2023 11:36
@reneme
Copy link
Copy Markdown
Collaborator Author

reneme commented Feb 24, 2023

🥳

@oviano
Copy link
Copy Markdown
Contributor

oviano commented Mar 27, 2023

A side note on thread safety: The Session_Manager_SQL did not use a mutex so far. I believe it should though. I guess the implementation relied on SQLite being thread-safe by default. Though, this can be disabled by the application developer (e.g. when compiling their SQLite in single-threaded mode). Also, custom SQL adapters might not provide the same guarantees.

I wanted to comment on this as one of my use/test cases is a client-side test application which opens tens of thousands of TLS connections to a server. To force it to store separate session records for each client, despite connecting to the same host and port, I plan to append an id to the host name prior to the look-up.

It seems unfortunate to me not to be able to leverage the multi-threading capbilities of SQLite, and that we essentially have to serialise these look-ups.

Is it really necessary to lock a mutex around every single SQL look-up?

Under Botan 2.0 I had written my own session manager, but I figured with all the changes to the session manager API for 3.0, rather than retrofit what I have I'd give the SQLite one a go and see how it performed.

Perhaps it can be optional as a template argument to the session manager allowing you to supply a no-op mutex or something...

@reneme
Copy link
Copy Markdown
Collaborator Author

reneme commented Mar 27, 2023

Is it really necessary to lock a mutex around every single SQL look-up?

I see this mainly as a safety precaution. But perhaps you are right and that's a bit too broad. Maybe we could use sqlite3_threadsafe() to figure out whether we need the mutex or not?

@oviano
Copy link
Copy Markdown
Contributor

oviano commented Mar 27, 2023

Reading up a little bit on SQLite, my understanding is that there are three settings.

  1. Single threaded, so no mutex, not thread safe at all.
  2. Multi-threaded, allowing different threads to access the DB at the same time, provided they use their own connection
  3. Serialized, so SQLlite allows full access to the same database connection from multiple threads and applies the necessary locking to achieve this.

Actually, in my use case, where I have a fixed number of worker threads servicing connections then it would seem that something could be possible with option 2 above in conjunction with a thread_local helper to manage the database connection lifetime.

Maybe it would be a custom/derived SQLite Session_Manager, but there is still some locking going on in the base Session_Manager and Session_Manager_SQL classes which would need addressing somehow...so maybe something along the lines of being able to provide the mutex type (or a no-op) for the Session_Manager base class as a template argument and then the current Session_Manager_SQLite would pass in recursive_mutex_type, and some new Session_Manager_SQLite_TL (thread local) would pass in the noop_mutex but then would change how the DB is created, to do some sort of thread local magic to ensure one database connection exists per thread, plus also specifying to sqlite3_open that it can operate in multithreaded mode.

Just some thoughts anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants