fix: video language detection fix #309
Merged
zeeshanakram3 merged 3 commits into Joystream:master on Mar 8, 2024
WRadoslaw reviewed on Mar 4, 2024
src/utils/language.ts (outdated)
    // console.log(`Cleaned text: ${cleanedText}`)
    // Get the most accurate language prediction
    return detectAll(cleanedText).length ? detectAll(cleanedText)[0] : undefined
Contributor
I'm not sure how costly it is to run detection over a string, but I'd be in favor of calling it only once. We don't even need to assign the result to a variable.
Suggested change
    - return detectAll(cleanedText).length ? detectAll(cleanedText)[0] : undefined
    + return detectAll(cleanedText)?.[0]
This should have the same outcome with a single call.
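To illustrate the reviewer's point, here is a minimal sketch that runs detection once and takes the first (assumed most accurate) prediction. The source does not show where `detectAll` is imported from, so the detector is injected as a parameter; `LanguagePrediction`, `DetectAll`, `topPrediction` and the call-counting stub are hypothetical names chosen for this example, with the `{ lang, accuracy }` shape assumed from the `.accuracy` field used elsewhere in the PR.

```typescript
// Assumed prediction shape; the PR reads `.accuracy`, `lang` is an assumption.
interface LanguagePrediction {
  lang: string
  accuracy: number
}

// Hypothetical detector signature: returns predictions sorted best-first.
type DetectAll = (text: string) => LanguagePrediction[]

// Runs detection exactly once and returns the top prediction,
// or undefined when the detector returns an empty array.
function topPrediction(detectAll: DetectAll, cleanedText: string): LanguagePrediction | undefined {
  return detectAll(cleanedText)[0]
}

// Usage with a stub that counts how often it is called.
let calls = 0
const stubDetectAll: DetectAll = (text) => {
  calls += 1
  return text ? [{ lang: 'en', accuracy: 0.8 }, { lang: 'nl', accuracy: 0.1 }] : []
}

const best = topPrediction(stubDetectAll, 'hello world')
console.log(best?.lang, calls) // en 1
```

The reviewer's `?.[0]` additionally guards against the detector returning `null` or `undefined`; with a detector that always returns an array, plain `[0]` already yields `undefined` for an empty result.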
Contributor
Author
Ah, right, that's a mistake on my end.
src/utils/language.ts (outdated)
    let detectedLang: string | undefined

    const titleLang = predictLanguage(title ?? '')
    if (titleLang && titleLang?.accuracy < 0.5) {
Contributor
Did you do any benchmarking to find the threshold below which predictions are mostly wrong?
Contributor
Author
No, it's just a numerical guess: if the prediction confidence is below 50%, the logic above is applied.
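The title-first strategy discussed here can be sketched as follows: detect on the title alone, and only if that prediction is missing or below the 0.5 threshold, detect again on title + description. This is a hedged illustration, not the PR's `predictVideoLanguage`; `predictVideoLanguageSketch`, `MIN_TITLE_ACCURACY`, the injected `DetectAll` type and the fake detector are all hypothetical, and 0.5 is the un-benchmarked guess from this thread.

```typescript
// Assumed prediction shape ({ lang, accuracy }), sorted best-first by the detector.
interface LanguagePrediction {
  lang: string
  accuracy: number
}
type DetectAll = (text: string) => LanguagePrediction[]

// Threshold from the review thread; a guess, not a measured value.
const MIN_TITLE_ACCURACY = 0.5

// Hypothetical helper: title first, then title + description as a fallback.
function predictVideoLanguageSketch(
  detectAll: DetectAll,
  title?: string,
  description?: string
): string | undefined {
  // One detector call on the title; the top-ranked prediction comes first.
  const titlePrediction = detectAll(title ?? '')[0]
  if (titlePrediction && titlePrediction.accuracy >= MIN_TITLE_ACCURACY) {
    return titlePrediction.lang
  }
  // Low confidence or no prediction: give the detector more text to work with.
  const combined = `${title ?? ''} ${description ?? ''}`.trim()
  return detectAll(combined)[0]?.lang
}

// Fake detector for illustration: confident only when the input is long enough.
const fakeDetectAll: DetectAll = (text) =>
  text.length > 20 ? [{ lang: 'de', accuracy: 0.9 }] : [{ lang: 'en', accuracy: 0.3 }]

console.log(predictVideoLanguageSketch(fakeDetectAll, 'Hallo', 'Ein langes deutsches Video über Katzen')) // de
```

Short titles tend to give low-confidence predictions, which is why the fallback feeds the detector the longer combined text; in this sketch, the fallback result is returned regardless of its own accuracy.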
WRadoslaw approved these changes on Mar 4, 2024
zeeshanakram3 added a commit that referenced this pull request on Mar 12, 2024
…ication events (#314) * Add is short field to video entity (#301) * add isShort field to video entity * regenerate db migrations * remove @joystream/metadata-protobuf patch from assets/patches * fix lint issue * Disable both in Appp and eail notifications for video posted events (#299) * bump package version and update CHANGELOG (#302) * bump package version and update CHANGELOG * change release version * Simple public homefeed query and mutation (#304) * update graphql schema * add partial index on 'video.include_in_home_feed' field * update video view definition to only include public videos * regenerate migrations * add dumbPublicFeedVideos custom query * add setPublicFeedVideos mutation * fix lint issue * add arg to skip video IDs * revert: update video view definition to only include public videos * add feat. to unset public feed videos * address requested change * bump package version and update CHANGELOG * Update `nara` from `master` (#300) * Adds mappings for `ChannelAssetsDeletedByModerator` & `VideoAssetsDeletedByModerator` events (#199) * mark 'VideoDeletedByModerator' & 'ChannelDeletedByModerator' events deprecated * Implements mappings for 'Content.VideoAssetsDeletedByModerator and 'Content.ChannelAssetsDeletedByModerator' runtime events * remove unused import * Nara/crt update (#244) * feat: build orion * feat: start generating schema * fix: extra entities * fixup! * fix: continue implementing design specs * fix: review and fix foreign key relationships * fix: formatting * fix: generation errors * fix: add comment * fix: relations * fix: final review * fixup! 
* fix: add ending blocks * fix: generate type & set typegen to ipv4 * fix: add support for event backward compatibility * feat: start adding mappings * fix: continue with mappnigs * feat: init sale * feat: patronage decreased to & fixed build * feat: claim patronage event * feat: tokens bought on amm * feat: tokens sold on amm * fix: add relation between sales and vesting schedules * feat: add Tokens sold on sale vente * feat: update upcoming sale * feat: revenue share issued * feat: member joined whitelist * feat: amm deactivated * feat: burned token * feat: transfer policy changed to permissionless * feat: sale finalized * feat: finish mappings * fix: review * fix: remove cascade deletions * fix: renaming & formatting * fixup! * fixup! * fix: patched protobuf packages with token proto * feat: update metadata and add event handler scheleton * feat: token metadata * feat: sale metadata * fix: review comments * fix: formatting * fix: revenue * Revert "fix: revenue" This reverts commit 0821abe. 
* fix: token status after sale * fix: fixmes * fix: formatting * fix: funds accounting during sale * fix: amount accounting * fix: linter * fix: review * fix: review 2 * fix: review * fix: linter * feat: migration for new db scheam * fix: update event versions * fix: patch types with crt_release types * fix: patch types * fix: generate all events versions since mainnet * fix: temp fix after event version generation * fix: event versioning * fix: add migration * fix: mignations * fix: solve channel not being added * fix: add id to TokenChannel * fix: non-nullable deleted field set * fix: format * feat: creator token init sale re enabling * feat: re enable sale init code * fix: update types * fix: amm id * fix: id computation for revenue share * fix: amm id computation for token * fix: issuer transfer accounting * fix: amm tx id * fix: destination accounting * feat: minor fix on holder transfer processing * fix: re-enable metadata * fix: metadata parsing * fix: post reword cleanup * fix: format * fix: silence ci checks * fix: event version * fix: address PR changes I edited all the entity that have a composite index like TokenAccount so that they have a synthetic ID and an optionally unique @index * fix: add hidden entities conditions * fix: add extra fields to token in order to keep track of ongoing status * fix: build errors * fix: adapt mapping to new token fields * fix: format * feat: add trailer video entity this is required so we can simply make trailer video hidden if video is hidden * fix: linter * chore: prettier * fix: from PR review * fix: vesting schedule schema & mappings I have replaced the vesting schedule back to the original schema with: - VestingSchedule: holding vesting schedule information such being amount agnostic - VestedAccount: contains information regarded to a vested account, the goal is to mimic the runtime logic * fix: burning from vesting * patch: metadata-protobuf package * patch: metadata-protobuf package * fix: generate migrations * 
fix: purchase token on sale * Update schema/token.graphql Co-authored-by: Leszek Wiesner <leszek@jsgenesis.com> * Update schema/token.graphql Co-authored-by: Leszek Wiesner <leszek@jsgenesis.com> * fix: address PR * fix: hidden entities * fix: migration ok * feat: add extra check for migrations * fix: docker network * fix: format * fix: remove unrequired constraint * fix: 🐛 post rebase fixes * feat: 🎨 add metadata processing for issue token * feat(crt-v1): ✨ chain metadata for v 2003 * fix(crt-v1): 🚑 comment out view element for orion playgroud * fix(crt-v1): 🎨 add playground config variable to .env * feat: ✅ add tests * fix(crt-v1): 📦 packages and patches * fix(crt-v1): ✅ update entity id used and other minor fixes * fix(crt-v1): ✅ update entity id used and other minor fixes * test(crt-v1): 🐛 misc fixes to have tests working * test(crt-v1): 🐛 misc fixes to have tests working * fix(crt-v1): 🐛 metadata and trailer video * feat(crt-v1): 🎨 update types * fix(crt-v1): ✨ Add correct Ratio denomination (Permill) * update with master * fix: 🐛 metadata not being set * fix: 🐛 parameters order * test: 🧪 fixing integration tests * test(crt-v1): 🧪 fix integration tests * feat(crt-v1): ✨ last price for token and recovered field for rev share part * feat: ✨ add resolver for dividend amount * feat(crt-v1): ✨ start adding channel fields for trackingtotal revenue * feat(crt-v1): ✨ add utils for royalty computation * feat(crt-v1): ✨ cumulative revenue on channel * feat(crt-v1): ✨ add resolver for transferrable amount * fix(crt-v1): ✨ add `acquiredAt` to pinpoint latest vesting schedule for account * Token metadata processing update * Prettier * chore(crt-v1): ⚡ dbgen * fix(crt-v1): 🧪 fix integration tests * fix(crt-v1): 🐛 missing fields in token sale vesting source * test(crt-v1): 🧪 test for transferrable balance amount * fix(crt-v1): 🐛 transferrable amount * test: 🧪 update tests after resolver fix * fix: 🐛 error on vesting schedules array * fix: 🎨 CI fixes * docs: update gitignore 
* fix: 🚨 prettier * build: 📌 chai depnedencies --------- Co-authored-by: Leszek Wiesner <leszek@jsgenesis.com> Co-authored-by: WRadoslaw <r.wyszynski00@gmail.com> * Clear benefits even if not passed (#282) * 🤑 Fix revenue share dividend estimation (#297) * Fix on revenue share dividend estimation * Fix type on result * 🛕 Historical revenue share participants (#286) * New field for revenue share * Set potential revenue share particitants at the time of start * fix: .gitignore not working * fix lint issues * re-generate db migrations * commit register.html.mst file * fix: notifications integration test --------- Co-authored-by: Ignazio Bovo <ignazio@jsgenesis.com> Co-authored-by: Leszek Wiesner <leszek@jsgenesis.com> Co-authored-by: WRadoslaw <r.wyszynski00@gmail.com> Co-authored-by: WRadoslaw <92513933+WRadoslaw@users.noreply.github.com> * Revert "Update `nara` from `master` (#300)" (#306) This reverts commit 887427c. * generate auth api docs and types * add is short derived field to video entity (#310) * add is shirt derived field to video entity * add indices on is short fields * fix: video language detection fix (#309) * fix: video language detection fix * address requested changes * fix: predictVideoLanguage function * fix: include max 1 video per channel in homepage videos (#313) * fix: include max 1 video per channel in homepage videos * update setOrionLanguage Migration script * format updateVideoRelevanceValue SQL query * fix: use UTC midnight epoch instead of current epoch to calculate video relevance score * bump package version and update CHANGELOG * fix: lint bug * add CRT token 'channelId' to amm burn/mint and sale mint notification events --------- Co-authored-by: Ignazio Bovo <ignazio@jsgenesis.com> Co-authored-by: Leszek Wiesner <leszek@jsgenesis.com> Co-authored-by: WRadoslaw <r.wyszynski00@gmail.com> Co-authored-by: WRadoslaw <92513933+WRadoslaw@users.noreply.github.com>
zeeshanakram3 added a commit that referenced this pull request on Mar 13, 2024
* Add is short field to video entity (#301) * add isShort field to video entity * regenerate db migrations * remove @joystream/metadata-protobuf patch from assets/patches * fix lint issue * Disable both in Appp and eail notifications for video posted events (#299) * bump package version and update CHANGELOG (#302) * bump package version and update CHANGELOG * change release version * Simple public homefeed query and mutation (#304) * update graphql schema * add partial index on 'video.include_in_home_feed' field * update video view definition to only include public videos * regenerate migrations * add dumbPublicFeedVideos custom query * add setPublicFeedVideos mutation * fix lint issue * add arg to skip video IDs * revert: update video view definition to only include public videos * add feat. to unset public feed videos * address requested change * bump package version and update CHANGELOG * Update `nara` from `master` (#300) * Adds mappings for `ChannelAssetsDeletedByModerator` & `VideoAssetsDeletedByModerator` events (#199) * mark 'VideoDeletedByModerator' & 'ChannelDeletedByModerator' events deprecated * Implements mappings for 'Content.VideoAssetsDeletedByModerator and 'Content.ChannelAssetsDeletedByModerator' runtime events * remove unused import * Nara/crt update (#244) * feat: build orion * feat: start generating schema * fix: extra entities * fixup! * fix: continue implementing design specs * fix: review and fix foreign key relationships * fix: formatting * fix: generation errors * fix: add comment * fix: relations * fix: final review * fixup! 
* fix: add ending blocks * fix: generate type & set typegen to ipv4 * fix: add support for event backward compatibility * feat: start adding mappings * fix: continue with mappnigs * feat: init sale * feat: patronage decreased to & fixed build * feat: claim patronage event * feat: tokens bought on amm * feat: tokens sold on amm * fix: add relation between sales and vesting schedules * feat: add Tokens sold on sale vente * feat: update upcoming sale * feat: revenue share issued * feat: member joined whitelist * feat: amm deactivated * feat: burned token * feat: transfer policy changed to permissionless * feat: sale finalized * feat: finish mappings * fix: review * fix: remove cascade deletions * fix: renaming & formatting * fixup! * fixup! * fix: patched protobuf packages with token proto * feat: update metadata and add event handler scheleton * feat: token metadata * feat: sale metadata * fix: review comments * fix: formatting * fix: revenue * Revert "fix: revenue" This reverts commit 0821abe. 
* fix: token status after sale * fix: fixmes * fix: formatting * fix: funds accounting during sale * fix: amount accounting * fix: linter * fix: review * fix: review 2 * fix: review * fix: linter * feat: migration for new db scheam * fix: update event versions * fix: patch types with crt_release types * fix: patch types * fix: generate all events versions since mainnet * fix: temp fix after event version generation * fix: event versioning * fix: add migration * fix: mignations * fix: solve channel not being added * fix: add id to TokenChannel * fix: non-nullable deleted field set * fix: format * feat: creator token init sale re enabling * feat: re enable sale init code * fix: update types * fix: amm id * fix: id computation for revenue share * fix: amm id computation for token * fix: issuer transfer accounting * fix: amm tx id * fix: destination accounting * feat: minor fix on holder transfer processing * fix: re-enable metadata * fix: metadata parsing * fix: post reword cleanup * fix: format * fix: silence ci checks * fix: event version * fix: address PR changes I edited all the entity that have a composite index like TokenAccount so that they have a synthetic ID and an optionally unique @index * fix: add hidden entities conditions * fix: add extra fields to token in order to keep track of ongoing status * fix: build errors * fix: adapt mapping to new token fields * fix: format * feat: add trailer video entity this is required so we can simply make trailer video hidden if video is hidden * fix: linter * chore: prettier * fix: from PR review * fix: vesting schedule schema & mappings I have replaced the vesting schedule back to the original schema with: - VestingSchedule: holding vesting schedule information such being amount agnostic - VestedAccount: contains information regarded to a vested account, the goal is to mimic the runtime logic * fix: burning from vesting * patch: metadata-protobuf package * patch: metadata-protobuf package * fix: generate migrations * 
fix: purchase token on sale * Update schema/token.graphql Co-authored-by: Leszek Wiesner <leszek@jsgenesis.com> * Update schema/token.graphql Co-authored-by: Leszek Wiesner <leszek@jsgenesis.com> * fix: address PR * fix: hidden entities * fix: migration ok * feat: add extra check for migrations * fix: docker network * fix: format * fix: remove unrequired constraint * fix: 🐛 post rebase fixes * feat: 🎨 add metadata processing for issue token * feat(crt-v1): ✨ chain metadata for v 2003 * fix(crt-v1): 🚑 comment out view element for orion playgroud * fix(crt-v1): 🎨 add playground config variable to .env * feat: ✅ add tests * fix(crt-v1): 📦 packages and patches * fix(crt-v1): ✅ update entity id used and other minor fixes * fix(crt-v1): ✅ update entity id used and other minor fixes * test(crt-v1): 🐛 misc fixes to have tests working * test(crt-v1): 🐛 misc fixes to have tests working * fix(crt-v1): 🐛 metadata and trailer video * feat(crt-v1): 🎨 update types * fix(crt-v1): ✨ Add correct Ratio denomination (Permill) * update with master * fix: 🐛 metadata not being set * fix: 🐛 parameters order * test: 🧪 fixing integration tests * test(crt-v1): 🧪 fix integration tests * feat(crt-v1): ✨ last price for token and recovered field for rev share part * feat: ✨ add resolver for dividend amount * feat(crt-v1): ✨ start adding channel fields for trackingtotal revenue * feat(crt-v1): ✨ add utils for royalty computation * feat(crt-v1): ✨ cumulative revenue on channel * feat(crt-v1): ✨ add resolver for transferrable amount * fix(crt-v1): ✨ add `acquiredAt` to pinpoint latest vesting schedule for account * Token metadata processing update * Prettier * chore(crt-v1): ⚡ dbgen * fix(crt-v1): 🧪 fix integration tests * fix(crt-v1): 🐛 missing fields in token sale vesting source * test(crt-v1): 🧪 test for transferrable balance amount * fix(crt-v1): 🐛 transferrable amount * test: 🧪 update tests after resolver fix * fix: 🐛 error on vesting schedules array * fix: 🎨 CI fixes * docs: update gitignore 
* fix: 🚨 prettier * build: 📌 chai depnedencies --------- Co-authored-by: Leszek Wiesner <leszek@jsgenesis.com> Co-authored-by: WRadoslaw <r.wyszynski00@gmail.com> * Clear benefits even if not passed (#282) * 🤑 Fix revenue share dividend estimation (#297) * Fix on revenue share dividend estimation * Fix type on result * 🛕 Historical revenue share participants (#286) * New field for revenue share * Set potential revenue share particitants at the time of start * fix: .gitignore not working * fix lint issues * re-generate db migrations * commit register.html.mst file * fix: notifications integration test --------- Co-authored-by: Ignazio Bovo <ignazio@jsgenesis.com> Co-authored-by: Leszek Wiesner <leszek@jsgenesis.com> Co-authored-by: WRadoslaw <r.wyszynski00@gmail.com> Co-authored-by: WRadoslaw <92513933+WRadoslaw@users.noreply.github.com> * Revert "Update `nara` from `master` (#300)" (#306) This reverts commit 887427c. * generate auth api docs and types * add is short derived field to video entity (#310) * add is shirt derived field to video entity * add indices on is short fields * fix: video language detection fix (#309) * fix: video language detection fix * address requested changes * fix: predictVideoLanguage function * fix: include max 1 video per channel in homepage videos (#313) * fix: include max 1 video per channel in homepage videos * update setOrionLanguage Migration script * format updateVideoRelevanceValue SQL query * fix: use UTC midnight epoch instead of current epoch to calculate video relevance score * bump package version and update CHANGELOG * fix: lint bug * remove NextEntityIdManager migration script * [offchainState] add v4.0.0 (CRT release) migrations * [offchainState] remove ORDER BY clause from UPDATE statements * add migration for NextEntityId * bump package version and update CHANGELOG --------- Co-authored-by: Ignazio Bovo <ignazio@jsgenesis.com> Co-authored-by: Leszek Wiesner <leszek@jsgenesis.com> Co-authored-by: WRadoslaw 
<r.wyszynski00@gmail.com> Co-authored-by: WRadoslaw <92513933+WRadoslaw@users.noreply.github.com>
malchililj added a commit to malchililj/orion that referenced this pull request on Sep 3, 2024
* fix: video language detection fix
* address requested changes
* fix: predictVideoLanguage function
To accurately detect the language of a video based on its `title` and `description`, this fix makes the following changes:
- Removes the `const cleanedString = input.replace(/[\p{P}\p{S}\p{N}\p{M}]/gu, '')` regular expression, as it unnecessarily strips many characters from the input string and changes its composition.
- Uses only the `title` for language detection first and, if the detected language's accuracy is not acceptable, uses `title` + `description` as the input for language detection.
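To show why the removed regex "changes the composition" of the input, the sketch below reproduces it under a hypothetical name, `legacyClean`. The character classes are standard Unicode property escapes: `\p{P}` punctuation, `\p{S}` symbols, `\p{N}` numbers and `\p{M}` combining marks. Stripping `\p{M}` is especially destructive for scripts that rely on marks, such as Devanagari, and for text in decomposed form.

```typescript
// The cleaning regex removed by this PR, reproduced for illustration only.
const LEGACY_STRIP = /[\p{P}\p{S}\p{N}\p{M}]/gu

// Hypothetical name for the old cleaning step.
function legacyClean(input: string): string {
  return input.replace(LEGACY_STRIP, '')
}

// Decomposed "café" (e + U+0301 COMBINING ACUTE ACCENT): the accent is a mark and is dropped.
console.log(legacyClean('cafe\u0301')) // cafe

// Hindi "नमस्ते": the virama and the vowel sign are marks, leaving only bare consonants.
console.log(legacyClean('नमस्ते')) // नमसत

// Digits and punctuation are removed as well, leaving doubled spaces behind.
console.log(legacyClean('Episode 2: Hello!')) // "Episode  Hello"
```

The Hindi case shows how the detector could end up seeing a letter sequence that no longer resembles the original word, which is consistent with the PR's motivation for dropping the regex.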