Orion V2 initial implementation draft #48
The draft implementation I'm describing in this document can be found HERE.
This repository is based on the current squid-substrate-template and contains all the input schemas, refactored Atlas queries, drafts of the custom GraphQL resolvers etc., as well as some basic setup and reference code illustrating how certain issues can be addressed.
The official Subsquid documentation, which is often referenced throughout this document, can be found here: https://docs.subsquid.io/
How to run the local setup
I'll explain each part of the setup in more depth later in this document, but here is a quick-start reference:
- Clone the Joystream repository (if not already done):
  ```shell
  git clone https://github.com/Joystream/joystream.git
  ```
- Run the `joystream-node` service in the Joystream repository:
  ```shell
  # You can also specify the usual environment variables like `RUNTIME_PROFILE` etc.
  export JOYSTREAM_NODE_TAG=$(./scripts/runtime-code-shasum.sh)
  docker-compose up -d joystream-node
  ```
- Clone the Subsquid-Orion repository:
  ```shell
  cd ..
  git clone https://github.com/Lezek123/subsquid-orion.git
  ```
- Run the archive (indexer):
  ```shell
  cd subsquid-orion/archive
  docker-compose up -d
  ```
- Build the processor:
  ```shell
  cd ..
  npm install
  make codegen
  make build
  ```
- Run and migrate the processor database:
  ```shell
  make up
  make migrate
  ```
- Run the processor:
  ```shell
  make process
  ```
- Run the GraphQL server:
  ```shell
  make serve
  ```
After performing those steps you should be able to go to http://localhost:4350/graphql and access the GraphQL playground.
Currently the processor will produce some mock data on each block, so you can also test some of the existing queries.
On-chain data indexing and processing
Squid Archive
The Squid Archive is a concept analogous to the Hydra Indexer: it uses the Joystream node's WebSocket RPC endpoint to fetch data about on-chain events and extrinsics and stores it in a relational database (PostgreSQL).
We can configure the archive via a docker-compose file located in archive/docker-compose.yml.
The current Squid Archive configuration uses a local Joystream docker node (`ws://joystream-node:9944`) running on the `joystream_default` network as its source.
SubstrateBatchProcessor
`SubstrateBatchProcessor` is the class we use to instantiate the events processor. As opposed to Hydra, where we would only implement the "mapping" functions (or "mappings"), Subsquid lets us instantiate and programmatically configure the processor ourselves (a `manifest.yml` file is no longer required), which gives us more control over its behavior.
`SubstrateBatchProcessor` is just one of the many processor implementations available in Subsquid, but it's the one currently recommended for processing Substrate events and extrinsics. This specific processor implementation queries all blocks, along with the events of interest, from the Squid Archive (using the `@subsquid/substrate-gateway` service). The maximum number of blocks in a single batch currently depends on the `@subsquid/substrate-gateway` implementation. It's still a little unclear how this will work in the future, but currently there are two main components that affect the batch size:
- the time it takes to read & prepare a batch (by the gateway) is limited to 5 seconds
- the size is limited to "1 MB" (however note that this currently depends on some assumptions about "average" event/call/extrinsic size, which are not very reliable, so the final size may be much greater)
Current processor implementation:
In the current draft implementation:
- the processor is given a `new TypeormDatabase({ isolationLevel: 'READ COMMITTED' })` instance (which is then used to insert data into the database). As you can see, we can easily provide config for `TypeormDatabase` here, which we take advantage of by specifying `isolationLevel: 'READ COMMITTED'`. This isolation level reduces the possibility of conflicts, since the database state will be modified both by the processor and through the external API (i.e. counting video views, featuring, reporting a video/channel etc., as explained further down below);
- I added some code which populates the database with a bunch of "mock" entities on each `System.ExtrinsicSuccess` event;
- I specified `Content.VideoCreated` and `Content.ChannelCreated` among the events of interest for illustration purposes.
This implementation provides a decent general overview of how the "mappings" are written in Subsquid, and of how one can extract the events & data of interest from a batch and then perform bulk inserts/updates at the end of processing the batch, which considerably improves performance.
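The collect-then-bulk-insert pattern can be sketched with a minimal, self-contained example. The types below are simplified stand-ins, not the actual Subsquid API; in the real mappings the final step would be a single bulk `ctx.store.save(...)` call:

```typescript
// Simplified stand-ins for the block/event items delivered in a batch
type EventItem = { name: string; args: Record<string, unknown> }
type BatchBlock = { height: number; events: EventItem[] }

// Entity we want to persist (hypothetical shape)
interface Video {
  id: string
  createdInBlock: number
}

// Walk every block of the batch, collect entities derived from the events
// of interest, and return them for a single bulk insert at the end.
function processBatch(blocks: BatchBlock[]): Video[] {
  const videos: Video[] = []
  for (const block of blocks) {
    for (const event of block.events) {
      if (event.name === 'Content.VideoCreated') {
        videos.push({
          id: String(event.args.videoId),
          createdInBlock: block.height,
        })
      }
    }
  }
  return videos
}
```

Accumulating entities across the whole batch and persisting them once is what makes batch processing considerably faster than a per-event insert.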
The API
Input schema
The current input schema files can be found here: https://github.com/Lezek123/subsquid-orion/tree/main/schema
I tried to preserve a schema similar to the one we currently use in Hydra, however there are some notable differences:
- I only used entities that are actually of interest to Atlas (so no proposals, working groups, forum etc.);
- In some entities I reduced the set of fields to only those that are currently of use to Atlas. For example, since Atlas currently doesn't support channel collaborators, I removed the `collaborators` field from the `Channel` entity. This is mainly for simplification purposes and to reduce the initial scope of work; they can of course be added later if needed;
- Interfaces are no longer supported (unlike in Hydra); however, since unions can be used as an alternative, I refactored the events to use an `EventData` union instead. The result can be seen here. I will explain the differences that come from this change further down below;
- Since deeply nested filters are now supported, as well as nested field queries, I removed some redundant entity relationships etc., which I assume were mostly serving as a workaround for the lack of those features before;
- I replaced fields like `nftOwnerMember`, `isNftOwnerChannel`, `nftOwnerCuratorGroup` in the NFT entity with an `NftOwner` union for better clarity;
- Unified state: the input schema now also includes entities that previously existed only in Orion, like `ChannelReport` etc. Some of those entities will probably be moved away from the input schema, unless we want to take advantage of the autogenerated queries that Subsquid will provide for them. If not, we can just use custom models instead (as described further below);
- Entities like `Video` and `Channel` now include new `followsNum` and `videoViewsNum` counters, as Atlas relies on being able to execute queries that sort on those values, as well as receiving those values as part of the result set. With that in mind, I decided that the cost of introducing those additional fields is relatively low compared to the efficiency & simplicity benefits. However, some custom aggregation queries (like sorting based on the number of follows within a given time period) will also be needed, as explained further below;
- `activeVideoCounter` fields have been removed, as this data is now accessible through extended queries. See the Custom `type-graphql` Resolvers section for more details;
- `createdAt` and `updatedAt` fields are no longer automatically added, so in some cases I included them in the input schema (for the `Event` entity, however, I decided to name the field `timestamp` instead);
- `Many-to-Many` relationships are not supported in Subsquid, so they were refactored into two-sided `Many-to-One` relationships with a specific "join entity".
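To make the `NftOwner` refactoring more concrete, here is a hypothetical sketch of what such a union can look like in the input schema (the variant and field names are illustrative and may differ from the actual schema in the repository):

```graphql
union NftOwner = NftOwnerMember | NftOwnerChannel

type NftOwnerMember {
  member: Membership!
}

type NftOwnerChannel {
  channel: Channel!
}

type OwnedNft @entity {
  owner: NftOwner!
}
```

Compared to the previous `nftOwnerMember`/`isNftOwnerChannel`/`nftOwnerCuratorGroup` triple, a single union field makes it impossible to represent contradictory ownership states.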
Custom models
Subsquid comes with a nice directory structure allowing us to define our own TypeORM models separately from the autogenerated ones; they will all, however, become part of the same database.
Use cases:
The primary use case for defining those custom models is when we don't want Subsquid to autogenerate public API endpoints for querying certain (private) data, but we still want to keep this data as part of the same database to take advantage of the relational model. Take the `User` entity, for example. We want to be able to connect users with channels through a `User >- ChannelFollow -< Channel` relationship, but we don't necessarily want to expose any `User` data through the API. That's why we define custom models for `User` and `ChannelFollow`, but don't include those entities in the input schema.
Custom GraphQL api extensions
Subsquid allows us to add some custom extensions to the autogenerated GraphQL API. Those are stored in the `src/server-extension/` directory and constitute a significant part of the project.
Custom type-graphql Resolvers
Custom type-graphql resolvers are classes where we can define our custom GraphQL queries, mutations and subscriptions that will then be included in the final API.
Normally we run a Subsquid GraphQL server using the `@subsquid/graphql-server` library/service, which generates and runs a GraphQL server based on the input schema. For the purpose of generating the final ("output") schema and resolvers, it uses another library called `@subsquid/openreader`. The schema generated by `@subsquid/openreader` is then merged with the schema generated from the custom resolvers we provide in `src/server-extension/resolvers`. For this merge, the `mergeSchemas` method from the `graphql-tools` library is used.
The interesting property of `mergeSchemas` is that it also merges all individual GraphQL types defined in both schemas, which lets us reuse autogenerated types like `Video`, `VideoWhereInput`, `VideoOrderByInput` etc. All we have to do is define a GraphQL object with the same name in our resolvers space, with at least one property which matches the autogenerated object (for entities it can be, for example, `id: string`). Then, when the types are merged, we will get a consistent `Video` object with all the expected properties in the final schema.
This can probably be better understood by looking at the implementation inside https://github.com/Lezek123/subsquid-orion/tree/main/src/server-extension/resolvers, especially https://github.com/Lezek123/subsquid-orion/blob/main/src/server-extension/resolvers/baseTypes.ts, where the "placeholders" for the to-be-autogenerated types are defined.
There are also many other useful references in this directory:
- In https://github.com/Lezek123/subsquid-orion/blob/main/src/server-extension/resolvers/ChannelsResolver/index.ts#L31 I implemented the `extendedChannels` query to illustrate how we can take advantage of the Subsquid libraries to define queries that build "on top of" the autogenerated queries like `channels`. In this example implementation we make use of the autogenerated `ChannelWhereInput` and `ChannelOrderByInput` types and translate the client's request into a corresponding SQL query, the same way it would've been done by the `channels` query. On top of this, however, we're able to include some custom, additional fields (like, in this case, the `count()` of videos that meet certain criteria in each channel) and allow the client to filter and sort the results by those fields (they can also be included in the result set if needed). In this specific implementation I showed how we can get rid of the `Channel.activeVideosCount` field, which would otherwise need to be constantly updated on many different occasions, by replacing it with a subquery that can be run for each queried channel upon the client's request instead;
- In https://github.com/Lezek123/subsquid-orion/blob/main/src/server-extension/resolvers/StateResolver/index.ts#L30 I've shown how we can implement a custom subscription to keep Atlas informed about the current processing state;
- In https://github.com/Lezek123/subsquid-orion/blob/main/src/server-extension/resolvers/ChannelsResolver/index.ts#L164 and a few other places you can see how we can easily add some custom mutations.
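The idea behind replacing `Channel.activeVideosCount` with a per-request subquery can be sketched as follows. This is a self-contained simplification with assumed table/column names, not the actual resolver code (which builds on the autogenerated Subsquid types and query builder):

```typescript
// Builds a SQL query that computes `activeVideosCount` on the fly via a
// correlated subquery instead of reading a stored, constantly-updated column.
// Table and column names ("video", "is_public" etc.) are assumptions.
function extendedChannelsSql(minActiveVideos: number, limit: number): string {
  const activeVideosCount = `(
    SELECT COUNT(*)
    FROM video v
    WHERE v.channel_id = c.id
      AND v.is_public = true
      AND v.is_censored = false
  )`
  // Values are interpolated here only for readability; a real implementation
  // should pass them as bound query parameters.
  return `
    SELECT c.*, ${activeVideosCount} AS active_videos_count
    FROM channel c
    WHERE ${activeVideosCount} > ${minActiveVideos}
    ORDER BY c.created_at DESC
    LIMIT ${limit}
  `
}
```

In the real resolver, the filter and sort would of course be derived from `ChannelWhereInput`/`ChannelOrderByInput` rather than hard-coded.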
Use-cases summary for custom resolvers:
- Introducing custom queries or extending the autogenerated ones:
  - `extendedChannels` query (allows querying channels along with an `activeVideosCount` aggregation)
  - `extendedVideoCategories` query (allows querying video categories along with an `activeVideosCount` aggregation)
  - `mostRecentChannels` query (allows filtering and ordering results among the X most recent channels)
  - `channelNftCollectors` query (allows querying the list of members who collected the highest number of NFTs issued by a given channel)
  - `searchChannels` query (allows implementing custom channel search logic)
  - `searchVideos` query (allows implementing custom video search logic)
  - `mostViewedVideosConnection` query (allows querying videos with the highest number of views in a given time period)
  - `getKillSwitch` query (allows retrieving the current Atlas "killSwitch" status)
  - `videoHero` query (allows retrieving information about content currently featured in the Atlas Hero section)
- Introducing mutations:
  - `followChannel`
  - `unfollowChannel`
  - `reportChannel`
  - `reportVideo`
  - `addVideoView`
  - `setKillSwitch` (operator only)
  - `setVideoHero` (operator only)
- Introducing subscriptions:
  - `processorState` (allows Atlas to stay updated about the current processing state, similar to Hydra's `stateSubscription`)
checkRequest plugin
The checkRequest plugin is a Subsquid feature that allows us to act on the Apollo server's `requestDidStart` event. The handler function can be implemented inside `src/server-extension/checkRequest.ts` and receives information like request headers, the IP of the origin, all the data specific to the GraphQL request etc.
The current example implementation shows how this plugin can be used to introduce some authentication for all mutation requests.
Use cases:
- Authentication & access restriction: The main use case I see here is restricting access to operator-only mutations like `setKillSwitch`, `setVideoHero` etc. However, as described in the Known issues section, I believe this is a suboptimal way of doing this.
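A minimal sketch of the kind of check such a handler can perform, assuming a simplified request shape and a hypothetical `x-operator-secret` header (the real plugin receives the full Apollo `requestDidStart` context, so the interface differs):

```typescript
// Simplified request shape -- the actual handler receives headers, the
// origin IP, the parsed GraphQL document etc.
interface IncomingRequest {
  operationType: 'query' | 'mutation' | 'subscription'
  headers: Record<string, string | undefined>
}

type CheckResult = { ok: true } | { ok: false; error: string }

// Hypothetical operator secret; in practice loaded from the environment
const OPERATOR_SECRET = 'change-me'

function checkRequest(req: IncomingRequest): CheckResult {
  // Queries and subscriptions stay public
  if (req.operationType !== 'mutation') {
    return { ok: true }
  }
  // Mutations require the operator secret header
  if (req.headers['x-operator-secret'] === OPERATOR_SECRET) {
    return { ok: true }
  }
  return { ok: false, error: 'Unauthorized mutation request' }
}
```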
Atlas queries: Refactored!
I refactored all existing Atlas queries (https://github.com/Joystream/atlas/tree/master/packages/atlas/src/api/queries) to match the new schema.
The results can be seen here. The directory structure matches the one in the Atlas repository, which makes it easy to do a side-by-side comparison. I also added a `CHANGE:` comment in all places where changes were introduced.
The most notable changes can be observed in the notifications / events queries, due to the refactoring of Event entities. It is now easier to query all the events of interest together and apply filtering, sorting and a limit on the results of one query, instead of making separate queries for each event type and then post-processing the results client-side.
Some other notable changes include:
- Some unused (by Atlas) queries were removed;
- Wherever there was a reference to `entityId`, it had to be replaced with `entity.id`, as Subsquid no longer supports the former syntax;
- Wherever there was a reference to the `ID` GraphQL type, it had to be replaced with `String`, as Subsquid no longer supports the former;
- Wherever an `entityByUniqueInput` query was used, it had to be replaced with `entityById`, as `entityByUniqueInput` is not supported in Subsquid. For members, which used to be queryable by handle, we can now either add a custom query or use the existing `members` query (providing the handle in the `where` clause);
- Some redundant relations (like `event.data.bidder`, if it can also be accessed through, for example, `event.data.bid.bidder`) were removed, so filtering now goes deeper in some cases;
- For NFTs, fields like `nftOwnerMember`, `isNftOwnerChannel`, `nftOwnerCuratorGroup` were replaced with the `NftOwner` union;
- Some queries were renamed (like `admin` => `getKillSwitch`);
- Channel/videoCategory queries that included `activeVideosCounter` were changed to `extendedChannels`/`extendedVideoCategories` queries;
- Videos featured in a category are now simply queried via the `category.featuredVideos` relation;
- Some very specific, REST-API-like Orion queries like `top10Videos` were replaced with slightly more generic/customizable queries like `mostViewedVideosConnection`;
- New search queries (a separate one for channels and for videos);
- Changes related to `Many-to-Many` relationships no longer being supported.
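As a hypothetical illustration of the `entityId` => `entity.id` change (the entity and field names below are made up for the example, not copied from the actual queries):

```graphql
# Before (Hydra): filtering by a flat foreign-key field
# bids(where: { nftId_eq: "1" }) { id }

# After (Subsquid): filtering through the nested relation
query {
  bids(where: { nft: { id_eq: "1" } }) {
    id
  }
}
```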
Custom migrations: Setting up the database
Subsquid allows us to generate database migration files that we can then use to set up the processor database.
Besides that, we can also specify some custom migrations that will be run before or after the generated ones.
In the draft implementation I introduced 2 custom migrations: Views and Indexes (since the filenames and class names need to include a timestamp, I've just chosen some arbitrarily high values to make sure those migrations are always run after the autogenerated one).
Use cases for custom migrations:
- We can specify indexes on `jsonb` fields or expressions, which is not possible through the input schema. This is useful when dealing with unions where some of the variants include a reference to another entity, like the new `EventData` union.
- We can introduce views, which has a few benefits:
  - We can simplify complex queries
  - We can replace tables with views. For example, the `channel` table can be replaced with a `channel` view, which allows us to filter out certain channels from the results of any autogenerated query. In the draft implementation I use a `channel` view to exclude moderated channels; this way Atlas doesn't need to worry about including this filter in each query, and the censored channels are also hidden from anyone trying to query the server directly. If the channel gets "unmoderated", however, it will instantly re-appear in the results (which wouldn't be possible if we just deleted moderated channels permanently).
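The two use cases above can be sketched in a single custom migration. This is a self-contained simplification: the real implementation uses TypeORM's `MigrationInterface`, and the table/column/key names here are assumptions:

```typescript
// Minimal stand-in for TypeORM's QueryRunner so the sketch is self-contained
interface QueryRunner {
  query(sql: string): Promise<void>
}

class ViewsAndIndexes {
  async up(queryRunner: QueryRunner): Promise<void> {
    // 1. An expression index on a jsonb column -- not expressible through
    //    the input schema (column/key names are assumptions):
    await queryRunner.query(
      `CREATE INDEX "event_data_video_idx" ON "event" (("data"->>'video'))`
    )
    // 2. Hide moderated channels behind a view: rename the real table and
    //    expose a filtered view under the original name, so every
    //    autogenerated query transparently excludes censored channels.
    await queryRunner.query(`ALTER TABLE "channel" RENAME TO "channels"`)
    await queryRunner.query(
      `CREATE VIEW "channel" AS SELECT * FROM "channels" WHERE "is_censored" = false`
    )
  }
}
```

Because the view filters rather than deletes, flipping `is_censored` back to `false` instantly restores a channel in all query results.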
Performance
Using the mocked data I did some performance tests against the current implementation; here are some results:
`GetExtendedBasicChannels` query
Arguments:
```graphql
where: { activeVideosCount_gt: 2 },
orderBy: createdAt_DESC,
limit: 50
```
Number of channel entries: 12,921
Number of video entries: 257,400
Time to execute the query: 86ms
`GetNotifications` query
Arguments:
```graphql
channelId: "1",
memberId: "1",
limit: 50
```
Number of event entries: 2,574,000
Time to execute the query: 880ms
`GetNftHistory` query
Arguments:
```graphql
nftId: "1"
```
Number of event entries: 2,974,000
Time to execute the query: ~9 seconds (!)
Potential candidate for optimization
Benchmarks to be continued...
Known issues and unresolved questions
- The `Context` provided by `@subsquid/graphql-server` is very limited; for example, we cannot access the client's IP address or any request headers inside a GraphQL resolver. This is problematic, as it makes authentication more complex (we have to use the separate "checkRequest" plugin) and also makes it difficult to handle mutations like `addVideoView`, where such data would be necessary to prevent abuse.
- There are two ways I can think of to introduce global category filtering, but both have their pros and cons:
  - We can avoid storing unrelated videos/events completely, which seems like a natural approach to avoid unnecessary bloat of the database with unrelated content; however, this has two obvious drawbacks:
    - if the video category changes, there is no easy way to get all the required information in order to (re)store this video,
    - if the operator changes the supported category set, we run into the same problem - we have no access to data from categories that were not supported before (unless we re-process the chain from scratch).
  - We store all videos/events, which kind of defeats the idea of vertical scaling; however, we can then use views to filter out unrelated content, as described in the Custom migrations: Setting up the database section.
Alternatives to consider
"Manually" setting up graphql server instead of using @subsquid/graphql-server
To have more control over the setup, we can run the GraphQL server from within our own codebase instead of using `@subsquid/graphql-server`; we can still take advantage of `@subsquid/openreader`, however, to generate the initial schema and resolvers.
Pros of this approach:
- We solve the `Context` issue, so we can more easily implement our own authentication, authorization, rate limits and other restrictions.
- Generally more freedom when writing our own resolvers; we can also decide how we want to merge them with the autogenerated schema.
- We avoid potential issues if `@subsquid/graphql-server` changes in the future in a way that no longer supports the current assumptions.
Cons:
- Requires more work
- We would probably be repeating a lot of work the Subsquid team already did/does, while we could try to push for more customizability instead (which I think Subsquid as a project would benefit from too).