Skip to content

Preview PDFs and videos#30667

Merged
georgeclaghorn merged 18 commits intorails:masterfrom
georgeclaghorn:activestorage-previews
Sep 28, 2017
Merged

Preview PDFs and videos#30667
georgeclaghorn merged 18 commits intorails:masterfrom
georgeclaghorn:activestorage-previews

Conversation

@georgeclaghorn
Copy link
Copy Markdown
Contributor

Add a mechanism for generating and storing previews (images extracted from non-image blobs). Provide two built-in previewers: one that extracts the first frame from a video, and one that extracts the first page from a PDF document. The video previewer relies on FFmpeg and the PDF previewer relies on mupdf.

/cc @dhh

@georgeclaghorn georgeclaghorn changed the title Preview PDFs and videos Preview PDFs and videos (WIP) Sep 20, 2017
Copy link
Copy Markdown
Member

@dhh dhh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really lovely. It's awesome that we can use the framework itself to provide additional features, like the preview.

class ActiveStorage::PreviewsController < ActionController::Base
def show
if blob = find_signed_blob
expires_in 5.minutes # service_url defaults to 5 minutes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should find a way to encode this information once. Maybe the service provider just sets this as a constant that we can reference here.

end

def disposition_param
params[:disposition].presence_in(%w( inline attachment )) || "inline"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the third place we'd perform this piece of logic. Let's extract it into the Service with something like normalize_disposition or whatever. Then we can just rely on passing params[:disposition] straight in.

end

private
def find_signed_blob
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this method is worth it. Let's just inline in all three controllers.

end

def decoded_variation
ActiveStorage::Variation.decode(params[:variation_key])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same story here. We can just have variant accept a variation_or_variation_key and then do the normalization there.


class_attribute :service

has_one_attached :image
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need something a bit more explicit here, maybe even just preview_image would do?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

# extracting its first frame, and a PDF blob can be previewed by extracting its first page.
#
# Active Storage provides previewers for videos and PDFs. You can add your own previewers or remove the built-in ones
# by modifying +ActiveStorage.previewers+.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe need a quick note that order of previewers matter, as it's first match that'll be used.

end

def preview
open do |input|
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will still require us to download the entire video file before we can start trying to do a preview. We should check what the biggest videos we have look like. I wouldn't be surprised if some of them are 20, 50, or even 100GB. It would kinda suck to have to download the whole file just to get the first frame for a preview.

I wonder if there's a way to explore whether some files can be used in truncated form. Like, could you just download the first 20mb and then do the preview off that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our discussion, we’re going to roll with this as-is. We’ll pursue performance optimizations if we find them necessary.

ActiveStorage::Variant.new(image, variation).processed
end


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✂️

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional CR for separation.

# * {ffmpeg}(https://www.ffmpeg.org)
# * {mupdf}(https://mupdf.com)
#
# These libraries are not provided by Rails; to use the built-in previewers, you must install them yourself.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we detecting their presence early enough and raising helpful messages? It looks like this is currently setting people up for an unexpected error in production

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine, because they'll fail as soon as you actually call #preview. Which will happen in test/dev. The upside of this is that it's a no config.

file.rewind
end


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✂️

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional CR for separation.

Copy link
Copy Markdown
Member

@dhh dhh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful 👌

Comment thread activestorage/lib/active_storage.rb Outdated
mattr_accessor :verifier

mattr_accessor :previewers do
[ ActiveStorage::Previewer::PdfPreviewer, ActiveStorage::Previewer::VideoPreviewer ]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we, and users who write their own previewer, have to add previewers here manually? Wouldn't an inherited hook work better?

@dhh
Copy link
Copy Markdown
Member

dhh commented Sep 22, 2017 via email

@simi
Copy link
Copy Markdown
Contributor

simi commented Sep 22, 2017

To install ffmpeg and mupdf on CI should be enough to add those lines to travis.yml:

addons: 
  apt: 
    sources: 
      - sourceline: 'ppa:mc3man/trusty-media' 
    packages: 
      - ffmpeg 
      - mupdf 

@kaspth
Copy link
Copy Markdown
Contributor

kaspth commented Sep 23, 2017

@dhh ah, so users are really meant to be able to do things like ActiveStore.previewers.prepend SpecificPreviewer? And not just append previewers?

@dhh
Copy link
Copy Markdown
Member

dhh commented Sep 23, 2017 via email

@matthewd
Copy link
Copy Markdown
Member

I think this is a more widely-applicable observation for Active Storage, so may not be worth dealing with in this PR, but things like previewers need to go through config


class_attribute :service

class_attribute :default_url_expiry, default: 5.minutes
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we often have configs that include default in the name. How about service_url_expiry? Or maybe service_url_expires_in since that connotes the relativeness of it and that users would want relative timestamps.

Alternatively, should this be a reader that wraps an ActiveStorage.service_url_expiry? (Which should then go through config.active_storage.service_url_expiry like @matthewd mentioned).


class_attribute :logger

class_attribute :url_expires_in, default: 5.minutes
Copy link
Copy Markdown
Contributor

@kaspth kaspth Sep 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat 👍 — good call, fits better here.

@bdewater
Copy link
Copy Markdown
Contributor

I'm slightly surprised to see no warning about MuPDF's licensing. When I read this PR I hoped it was a more liberally licensed alternative to GhostScript for PDF processing, only to discover it's also Affero GPL. Unless the app developer buys a commercial license from Artifex, by using MuPDF they need to make their source code available per the license requirements. AFAIK most (if not all) Rails dependencies use MIT or similarly liberal licenses?

@redtachyons
Copy link
Copy Markdown

Not an expert on this topic,
Section 13 of AGPL

  1. Remote Network Interaction; Use with the GNU General Public License.

Notwithstanding any other provision of this License, if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (if your version supports such interaction) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software. This Corresponding Source shall include the Corresponding Source for any work covered by version 3 of the GNU General Public License that is incorporated pursuant to the following paragraph.

Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the work with which it is combined will remain governed by version 3 of the GNU General Public License.

My understanding, please correct me if I am wrong
Without a commercial license of mupdf

  1. You can't distribute your rails application with mupdf bundled with less restrictive or closed source license

  2. You can't change the source of code of mupdf and use it in your own rails app server(a typical saas app setup)

If you are using mupdf without altering the source and re distribution is not a concern, then you can you mupdf without a commercial license.

But still a warning about license will be great

@simi
Copy link
Copy Markdown
Contributor

simi commented Sep 26, 2017

I went thru AGPL meaning few years ago.

TL;DR (sourced from wiki - https://en.wikipedia.org/wiki/Affero_General_Public_License)

This provision requires that the full source code be made available to any network user of the AGPL-licensed work, typically a web application.

If you use mupdf in Rails app, any application user gains right to obtain Rails app source code.

Also https://choosealicense.com/licenses/agpl-3.0/ helps here:

Network use is distribution (Users who interact with the software via network are given the right to receive a copy of the source code.)

Disclose source (Source code must be made available when the software is distributed.)

IMHO @bdewater is right. It will be nice to find some alternative to mupdf to be optional dependency by default.

@redtachyons
Copy link
Copy Markdown

redtachyons commented Sep 26, 2017

Seems like we are using mutool from shell here and it will be MereAggregation as per GPL terms, So I believe this usage won't make an issue in terms of licensing

@georgeclaghorn
Copy link
Copy Markdown
Contributor Author

georgeclaghorn commented Sep 26, 2017

Artifex is clear that use of MuPDF in a closed-source Rails application requires a commercial license. See their licensing page:

Generally speaking, “distribution” is the key word. Some examples of distribution/network use requiring a commercial license include:

[...]

Running Ghostscript or MuPDF on your network servers and permitting non-employees access to interact with these products as part of your non-AGPL application. Example: utilizing Ghostscript to interpret and display a PDF on a cloud application.

I have a version of the PDF previewer that uses Apache PDFBox working locally. The main drawback is that it requires Java. If there are no objections, I’ll push this change.

@georgeclaghorn
Copy link
Copy Markdown
Contributor Author

After some discussion, we’ve added a statement to the docs indicating that app developers should look into the licensing requirements that apply to them, since neither ffmpeg nor mupdf is released under the MIT License. This review of Artifex’s interpretation of the AGPL seems relevant.

@georgeclaghorn georgeclaghorn changed the title Preview PDFs and videos (WIP) Preview PDFs and videos Sep 28, 2017
@georgeclaghorn georgeclaghorn merged commit d305862 into rails:master Sep 28, 2017
@georgeclaghorn georgeclaghorn deleted the activestorage-previews branch September 28, 2017 20:43
hone added a commit to hone/rails that referenced this pull request Mar 1, 2018
mutool uses AGPL for licensing which has strict distribution rules.
rails#30667 (comment)

Poppler is licensed under GPL, so this should provide a free
alternative for most Rails apps.
Copy link
Copy Markdown

@marjoriewu marjoriewu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants