Skip to content

✨Add new amp-skimlinks extension #17907

Merged
lannka merged 155 commits intoampproject:masterfrom
skimhub:integration/amp-skimlinks-0.1
Nov 5, 2018
Merged

✨Add new amp-skimlinks extension #17907
lannka merged 155 commits intoampproject:masterfrom
skimhub:integration/amp-skimlinks-0.1

Conversation

@slocka
Copy link
Copy Markdown

@slocka slocka commented Sep 6, 2018

Implementation for the new amp-skimlinks extension as discussed here: I2I: amp-skimlinks · Issue #16435

Initial design doc is available here.

What's in this PR:

  • New amp-skimlinks extension allowing publishers to use Skimlinks affiliate services.
  • New 'Link Rewriter Service': A proposed generic solution to handle link rewriting (mostly for affiliate services but could be extended).

Critical changes:

  • New "Link rewriter service"
    The service is added in the main amp.js bundle since it is global to all extensions.

EDIT: Based on feedback, the service has been moved back to amp-skimlinks for now.

  • Adding DOCUMENT_REFERRER to SANDBOX_AVAILABLE_VARS:
    amp-skimlinks is using the amp-analytics internal API to send tracking requests which seems to use amp-analytics with the sandbox flag turned to true. amp-skimlinks needs to send the document referrer and from looking at this PR, it looks like there may not be any real reason to blacklist DOCUMENT_REFERRER.

amp-skimlinks extension

As a reminder amp-skimlinks does two main things:

  • Automatically transform a standard link to a merchant into an affiliate link. Skimlinks tries to be as less intrusive as possible by only replacing the links that can be affiliated.
  • Track information about the page then used in Skimlinks reporting platform.

In order to do so amp-skimlinks is divided into four main modules:

  • AmpSkimlinks: The custom component.
  • Tracking: Sends all the tracking requests through the amp-analytics API.
  • AffiliateLinkResolver: Communicates with the new "link rewriter service" to resolve through the Skimlinks domain resolver API, which links on the page should be swapped.
  • Waypoint: Builds the final Skimlinks affiliate URLs namely, the replacement URLs.

New "Link Rewriter Service"

In order to make the code from amp-skimlinks as generic as possible, we have created a new "link rewriter service" on top of which we have built amp-skimlinks.
The link rewriter service allows extensions to rewrite the href of anchors on the page at click time. The service also fixes the issue of conflicts between extensions trying to rewrite a URL at the same time by a system of priorities defined by the publisher.

EDIT: The service now lives inside amp-skimlinks but was build with the idea of being common to all extensions.

Initial solution:

The service is composed of two main parts interacting with each others:

LinkRewriter

A LinkRewriter instance allows you to specify in an asynchronous or synchronous way, which anchors should be replaced and by which replacement URLs.

LinkRewriterManager

As an AMP page can have multiple LinkRewriter instances running at the same time, we introduced LinkRewriterManager which is managing all the registered LinkRewriter.
Example:
A publisher wants to use Skimlinks to replace some specific links on the page but wants to use another affiliate service to replace all the other links.

A globalLinkRewriterManager instance is installed when the main amp.js script starts

Typical flow is:

1 - One or multiple LinkRewriter are registered through the LinkRewriterManager, each with:

  • A unique Id
  • A 'resolveUnknownLinks' function.
  • An optional CSS selector to restrict the links the LinkRewriter should handle.

2 - Each LinkRewriter scans the page to find all the anchors (based on the linkSelector option if provided).

3 - LinkRewriter ask through the provided 'resolveUnknownLinks' function what is the replacement for each anchor found on the page and store the results in a cache (LinkReplacementCache).

(4) - LinkRewriterManager detects a DOM_CHANGED event and ask all the registered LinkRewriter to re-scan the page (go back to 2)

5 - LinkRewriterManager detects a click on an anchor, iterates through the list of registered LinkRewriters by order of priority, asking one by one if they want to replace the link. If one LinkRewriter wants to replace the link, the iteration stops and LinkRewriter with a lower priority are ignored.

6 - A "click" event is propagated through an event system to all the registered LinkRewriter so all the LinkRewriters can at least track the click regardless of if they have mutated the anchor.

Potential future improvement:

Chainable linkRewriters.

In the current solution, there is no way to chain href mutation on the URL, only one LinkRewriter is allowed to modify the link at a time. If this solution fits most of the affiliate services we could make it more generic for other types of URL rewriting. We could imagine a chainable: true/false option to allow LinkRewriter to modify the URL without blocking the other LinkRewriter.

Config based service.

amp-skimlinks is doing more than just replacing links, it has a lot of logic related to tracking and how we resolve links. However, we could imagine for simpler use-cases a config based "Link Rewriter Service".

"config": {
  "id": "my-affiliate-network",
  // Resolve links based on api response
  "resolveLinksEndpoint": "https://affiliate-api/test",
  // Resolve links in a static way
  "replacementUrl": "htts://www.go.skimresources.com?url${href}",
  "linkSelector": "a.skimlinks"
}

Vincent added 30 commits July 30, 2018 12:40
@lepunk
Copy link
Copy Markdown

lepunk commented Nov 2, 2018

@zhouyx hi, I don't want to be pushy and understand that you are probably busy but my commercial team is pressuring me (those people, right?). Is there any way you could review @slocka's changes / comments? thank you.

We have some large publishers waiting on this feature to monetize their content for the holiday season.

@mrjoro we are pretty new to the amp project. could you please give us an estimate on how long you recon it would take to get this feature live / whats the process around it? thanks

@lannka
Copy link
Copy Markdown
Contributor

lannka commented Nov 2, 2018

hi @lepunk @slocka thanks for your patience. @zhouyx is traveling. I can take another round of review later today to see if all her comments are addressed.

@mrjoro
Copy link
Copy Markdown
Member

mrjoro commented Nov 2, 2018

@lepunk, regarding how long it will take to go live, there are a couple of aspects.

In general once code is merged in it will take 1-2 weeks to be fully in production. Full details are in our release-schedule.md.

We also generally prefer that new features are launched behind an experiment to allow it to be more fully tested. I'll defer to @lannka and @choumx on whether that is needed in this case.

TODO: add optional config param to .build() so we don't need to mutate
a private property from outside.
*/
analytics.config_.transport = {beacon: true};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given the other PR is merged, could you update here?

@lannka
Copy link
Copy Markdown
Contributor

lannka commented Nov 2, 2018

Just had one last comment. Otherwise LGTM

Vincent added 2 commits November 5, 2018 09:48
Turn xhrpost off to avoid wildcard in Access-Control-Origin error when navigator.sendBeacon() is not available.
@slocka
Copy link
Copy Markdown
Author

slocka commented Nov 5, 2018

@lannka @choumx @zhouyx I made all the changes, is it ready to be merged?

@mrjoro Since it's a new extension and it's not deployed on any pages I believe we don't need to hide it under an experiment flag?

@lannka
Copy link
Copy Markdown
Contributor

lannka commented Nov 5, 2018

LGTMed. since @zhouyx is back today, I'm checking with her if she want to take a final look.

@mrjoro
Copy link
Copy Markdown
Member

mrjoro commented Nov 5, 2018

One advantage of putting new stuff behind an experiment is that it makes site developers aware that a feature is new and may have rough edges, and developers can use the feature in a nearly-real environment to see how it behaves.

The downside is that currently experiments are enabled on a per-browser basis, and there isn't a good way to enable an experimental feature in production for a limited set of sites until we re-enable origin experiments (dependent on #16183).

I'll defer to @choumx and @lannka on what to do in this case.

@lannka lannka merged commit c61415d into ampproject:master Nov 5, 2018
@lannka
Copy link
Copy Markdown
Contributor

lannka commented Nov 5, 2018

@slocka @lepunk PR merged. According to our regular release process, it will be in canary tmr. You can then start to do some end to end test.

@slocka
Copy link
Copy Markdown
Author

slocka commented Nov 5, 2018

@lannka Thanks a lot!

I have seen on Slack a message from @mrjoro mentioning that there will be no releases this week.

As Alan mentioned, we will not push a new version of AMP to production (except for P0 fixes) next week, the week of November 6. Regular releases will resume the following week (week of November 13).

Doesn't this impact amp-skimlinks canary release as well?

@lannka
Copy link
Copy Markdown
Contributor

lannka commented Nov 5, 2018

There will be no prod release this week. But we will still do canary release.

Copy link
Copy Markdown
Contributor

@jridgewell jridgewell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was far too much code to effectively review.


export const AMP_SKIMLINKS_VERSION = '1.0.0';
export const XCUST_ATTRIBUTE_NAME = 'data-skimlinks-custom-tracking-id';
export const AFFILIATION_API = 'https://go.skimresources.com';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This should needs a trailing /, so it's impossible to change the domain name.

'domains': domains,
});

const beaconUrl = `${DOMAIN_RESOLVER_API_URL}?data=${JSON.stringify(data)}`;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: JSON.stringify is not 100% sufficient, since it won't encode & and = inside strings. The pubcode above comes from publisher HTML element.getAttribute('publisher-code')

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not expecting any & or = since domains contain a list of hostnames and publisher-code is validated using this regex: ^[0-9]+X[0-9]+$

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I didn't check the validator files.

* @private
*/
getNewDomains_(anchorList) {
return anchorList.reduce(((acc, anchor) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: unnecessary quotes around the arrow function.

unknownAnchors.map(anchor => ({anchor, replacementUrl: null}))
);
const twoStepsResponse = this.resolveUnknownLinks_(unknownAnchors);
user().assert(twoStepsResponse instanceof TwoStepsResponse,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check doesn't make sense to me. You control the API, of course it will be a new TwoStepsResponse.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your confusion. It's true that for the moment only Skimlinks is using the link rewriter API but it was designed to be able to be used by other extensions, hence the check here. There will be a separate discussion about if we want to move this code outside of amp-skimlinks to make this API available to other extensions. If not I will clean up the code and will remove this test.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍. instanceof is notoriously brittle across document boundaries, and AMP works in several modes that work that way. A duck-type check would be better.


/** @override */
isLayoutSupported() {
return true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be checking against Layout.NO_DISPLAY explicitly.

return false;
}
// Save so we can restore it.
anchor.setAttribute(ORIGINAL_URL_ATTRIBUTE, anchor.href);
Copy link
Copy Markdown
Contributor

@jridgewell jridgewell Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: This is not re-entrant, and it's too hard to reason about the caller to rewriteAnchorUrl.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to understand what you mean and I don't see the bug 🤔. Could you give more explanations?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be called multiple times on a link in the timespan, so we will have buildup of affiliate links pointing to affiliate links pointing to the original link. Kinda like, https://example.com/?redirect=https://example.com/?redirect=https://example.com/?redirect=original.com

@slocka
Copy link
Copy Markdown
Author

slocka commented Nov 8, 2018

@lannka @choumx Has this been shipped in the canary release? I have the feeling it hasn't, could you help with that?

alin04 pushed a commit to alin04/amphtml that referenced this pull request Nov 13, 2018
@alin04 alin04 mentioned this pull request Nov 13, 2018
alin04 added a commit that referenced this pull request Nov 13, 2018
* cl/220306609 Revision bump for #17907

* cl/220307253 Revision bump for #19128

* cl/220310523 Revision bump for #19129

* cl/220399983 Revision bump for #19167

* cl/221145203 n/a

* cl/221159765 Revision bump for #19214

* cl/221164382 Invalidate `<amp-script>` tag as well

* cl/221176616 Revision bump for #17939

* cl/221181356 Revision bump for #19171
Enriqe pushed a commit to Enriqe/amphtml that referenced this pull request Nov 28, 2018
* Basic skim-js implementation

* Trying to use amp-analytics to send the request.

* Force tracking request to be a GET request

* add plant uml graph for design review

* Updating design doc graph

* design doc v2

* Start implementing AMP solution

* Hook AffiliateLinks to navigation click handler

* Better version of affiliate links manager.

* Adding tracking module & refactor booting of the extension

* Adding options and handle no links to resolve on the page scenario. (We always want to call beacon)

* Fix issue with wrong 'this' context

* Use events instead of optional callback to trigger na click tracking

* New link rewriter service

* Remove typo

* session id is not a thing anymore

* FIx JS Doc lint issues && move link rewrite service to 'src'

* Add domain resolver tests around beacon API calls

* Add domain resolver tests && replace AnchorRewriteData by createAnchorReplacementTuple

* adding test around beacon callback

* Add placeholder tests around waypoint query params

* Add impression id to waypoint params

* Implement .skip() tests

* Moving domain resolver tests to separate file.

* Add tracking tests

* No need to mock internal functions.

* Adding some NA click tracking tests

* Adding xcust test for NA click tracking

* Add test for na tracking => rnd query params

* Adding generate page impression id test

* Add tracking "setup" tests

* Cleaning up

* Adding tests  amp-skimlinks bootstrap

* Test beacon callback handler

* Add test to check if options are parsed when loading the extension

* Refactor link-rewrite-service priority logic.

* Add more link-rewriter-service tests

* Remove unused function

* Add tests for LinkRewriter

* Replace AnchorRewriteDataResponse by twoStepsResponse object.

* Add new variable to whitelist

* Fix use of .calledOnce instead of .called when testing not called.

* Use fakeWin instead of realWin for quicker run of the tests

* Add tests for rewriteAnchorUrl method

* Adding tests for link rewriter

* Refactor call beacon fallback when no links on page.

* Fix issue, missing anchor param for NA click

* Fix issue with impression tracking getting wrong ae status.

We need to make sure the anchorLinkReplacment map is updated before calling this.trackingService.sendImpressionTracking.

* amp-skimlinks should be loaded with layout='nodisplay'.

* Amp analytics requests should be sent using sendBeacon

* Consider 'canonical url hostname and sourceUrl hostname as 'internal' domain. (Domain will be excluded by default)

* Add real links for manual testing

* Renaming variables

* Refactor again the beacon callback logic so we can access the guid before sending  impression tracking.

* Remove .only

* Cleanup

* Call onDomUpdate from registerLinkRewriter instead of LinkRewriter constructor.

* Send new 'platform' flag instead of xcreo .

* Move LinkRewriter to be global to the app.

* Handle link-rewrite-service click handler from navigation instead of propagating a click event.

* Move constant to constants.js

* Renaming constants

* Renaming file

* Remove first param since the guid is passed using  setTrackingInfo

* Extract waypoint logic from AffiliateLinkResolver

* We can not use replacement macro for waypoint URL.

Macro in links only support CLIENT_ID and QUERY_PARAM.
https://github.com/ampproject/amphtml/blob/master/spec/amp-var-substitutions.md#variable-substitution-in-links

* Code review

* Clean up

* Add tests for link rewriter called from navigation.js

* Add basic validator

* Fix typos

* Fix search & replace

* Fixing check types

* Adding documentation for affiliate-link-resolver class

* Adding documentation for Tracking module

* Use skimresources instead of go.redirectingat

* Add documentation for link-rewriter

* Adding 1st version of the amp-skimlinks readme

* Fix broken tests

* Documentation fixes

* Rename link-rewrite-service to link-rewriter-manager for clarity

* Code review

* Do not send NA click tracking on right click.

* Renaming test

* Remove anchorClickActions since we don't need it after all

* Fix issue when unregistered LinkRewriter are in priority list.

* Fix new tests after rebase

* .trim() is already done in code above.

* Update pubcode for testing

* Code review

* Rename link-rewrite-service to link-rewriter-manager for clarity

* Replace Map by LinkReplacementCache class  since Map are not allowed in this project.

* Fix broken tests

* Simplify updateLinkList

* FIx typo

* Add comment

* Fixing eslint issues part1

* Fix lint errors in tests

* Fix issue with go.redirectingat links showing in link  & page impressions + tracking as NA click (on top of waypoint)

* Fix issue with excluded-domains option overwriting internal and globally excluded domains.

* fix linter issue

* Update doc types around link rewriter API

* Improve comments

* Fix check types

* Remove plantUML file from repo.

* Cleaning up amp-skimlinks.html

* Add owners to amp-skimlinks

* Fix outdated comment

* Update comments

* Fix issues related to page pre-rendering scenario

* Improve comment

* Add copyright on top of file

* Don't initialise anything until buildCallback

* Fix typo

* Moving linkRewriter logic inside amp-skimlinks

* Fix eslint issues while running unit tests on travis.

* Access referrer through util.

*  Fix "Do NOT stub on a cross domain iframe!" error on travis

* Remove dependency on navigation.js

* Code review part1

- Use existing Observable class instead of EventMessenger.
- Move affiliate link status to enum instead of constants.

* Make closure compiler types more explicit.

* @Private & @public should be the last annotation.

* Code review part2

* Use JsonDict when reading/writing serialisable  data

* Use a class to represent TwoStepsResponse.

* Add comment

* Minor code review changes

* Missing annotations

* Shadow AMP support

* Don't use parseUrlDeprecated to get the hostname if possible since the function is extremly slow.

* Add Skimlinks privacy policy

* Use chunk function to scan the document.

Chunk will delay the execution of the document scaning to when the browser is idle. The purpose of this change is to avoid any potential performance issue that could affect the user experience when loading the page.

At this point it hasn't been proven that amp-skimlinks could create such  slowdown but it feels safer to wrap it inside the chunk function for now. Since this is not the ideal solution (we could potentially lose clicks and page impressions), it would be great make more measurement and optimization and remove it once we feel confident that amp-skimlinks can run without risking to interfer with the user experience.

* Fix ampproject#18282. CustomEventReporter should auto-load amp-analytics.

* Add choumx as amp-skimlinks owner.

* amp-skimlinks should be lowest priority.

* Remove getBoundFunction util

* Improve amp-skimlinks validator

* Code review

* Get referrer from javascript instead of using macro

* Remove access to private variable. Use default transport until ampproject#19014 is ready.

* Rename tracking param.

* Use custom transport config for tracking API.

Turn xhrpost off to avoid wildcard in Access-Control-Origin error when navigator.sendBeacon() is not available.
Enriqe pushed a commit to Enriqe/amphtml that referenced this pull request Nov 28, 2018
* cl/220306609 Revision bump for ampproject#17907

* cl/220307253 Revision bump for ampproject#19128

* cl/220310523 Revision bump for ampproject#19129

* cl/220399983 Revision bump for ampproject#19167

* cl/221145203 n/a

* cl/221159765 Revision bump for ampproject#19214

* cl/221164382 Invalidate `<amp-script>` tag as well

* cl/221176616 Revision bump for ampproject#17939

* cl/221181356 Revision bump for ampproject#19171
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.