Valid filename characters made more permissive #7096

Closed
Convincible wants to merge 1 commit into jekyll:master from Convincible:patch-1
Conversation

@Convincible
Contributor

Fixes #7087

@pathawks
Member

pathawks commented Jul 8, 2018

Is there anything either built into Ruby or provided by a widely used Gem that could handle this for us? I don't like being the arbiter of what is a valid filename.

@Convincible
Contributor Author

Convincible commented Jul 9, 2018

@pathawks Good thought. The short answer as far as I can tell is no. From my understanding:

  • Codifying what constitutes a valid filename across OS flavours is pretty tough, as each OS will have quirks and a variety of reserved names.
  • However, some characters can safely be called invalid: / \ ? * : | " < > plus ASCII control characters (hexadecimal 00 to 1f).
  • Simply excluding these would be the most 'permissive' approach, but would fail in edge cases due to aforementioned quirks.
  • Instead, permitting only a proper subset of all safe filenames (e.g. via a Regex pattern) will guarantee filenames that work, but necessarily limit possible names.
  • So, you have to find a good middle ground.

Currently we are simply permitting only names that match the Regex [\w/\.-]+ i.e. a-z, 0-9, dot and hyphen.
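
To make the two approaches above concrete, here is a minimal Ruby sketch. The method names are hypothetical and the `\A...\z` anchoring is an assumption of this sketch; only the character class `[\w/\.-]+` and the list of unsafe characters come from the comment. It is not meant to reproduce Jekyll's actual implementation.

```ruby
# Allowlist approach: accept a name only if every character is in the
# class quoted above. Anchoring with \A and \z is an assumption here.
def allowed_by_whitelist?(name)
  name.match?(%r{\A[\w/.-]+\z})
end

# Denylist approach: reject a single filename only if it contains one of the
# characters listed above as unsafe, or an ASCII control character (00 to 1f).
def allowed_by_blacklist?(name)
  !name.match?(%r{[/\\?*:|"<>\x00-\x1f]})
end

puts allowed_by_whitelist?("sidebar--left.html") # true
puts allowed_by_whitelist?("foo#bar.html")       # false
puts allowed_by_blacklist?("foo#bar.html")       # true
puts allowed_by_blacklist?("foo:bar.html")       # false
```

The denylist accepts far more names, which is the "permissive" end of the trade-off; the allowlist is the stricter end that the proposal wants to loosen.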

I am suggesting nudging our 'middle ground' over just a bit to include other filename characters people will commonly want to use, such as _, #, ~, +, @ and brackets. In my view, those are in decreasing order of priority, i.e. it's common to see people naming files including underscores, then a bit less common with hashes, a bit less with tildes, etc. Others might prefer to cut this list short, i.e. permit a few more characters but not quite all of the ones I have listed.

I think it's fair enough for Jekyll to determine the characters it accepts i.e. impose a sort of file naming convention for its own sanity. This just needs to match how people may want to use it. Personally I find characters like _ and # useful in filenames when organising e.g. lots of layouts, some of which may be related to others, etc.

@ashmaroli
Member

@Convincible The character class \w already includes 'underscore (_)' so {% include foo_bar.html %} is already valid.
I'm not keen on whitelisting [#~+@] and the various brackets. I'd rather keep include filenames simple, especially since one can already use filenames such as sidebar--left.html and sidebar__right.html for categorizing.
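
A quick way to check the point about `\w` is to test individual characters in Ruby, where `\w` matches `[a-zA-Z0-9_]`. The helper name `word_char?` is hypothetical and exists only for this sketch.

```ruby
# Returns true when the single character is matched by \w,
# which in Ruby means an ASCII letter, a digit, or an underscore.
def word_char?(char)
  char.match?(/\A\w\z/)
end

['_', 'A', '7', '-', '#', '~', '+', '@'].each do |char|
  puts "#{char} => #{word_char?(char)}"
end
```

The underscore, the letter and the digit report `true`; the hyphen is not part of `\w` (it is allowed separately in the character class), and `#`, `~`, `+` and `@` report `false`.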

@Convincible
Contributor Author

@ashmaroli Fair enough, I suggest closing: I forgot that underscore is already included in \w. - and _ should be plenty for filenames. If we did want to add more flexibility in future, perhaps the inclusion of just 1 more such 'separator' character (e.g. #) might be worth considering.

@ashmaroli
Member

> perhaps the inclusion of just 1 more such 'separator' character (e.g. #) might be worth considering.

Possible if you can provide a strong use-case.

@Convincible
Contributor Author

Actually I think I remember where I was coming from originally. It's a pretty minor thing but: - and _ are usually sorted at the opposite end of an alphabetical sort to ~ (tilde). So, for instance, if you have a directory containing both a main file and a relative file to include, e.g.:

  • my-first-blog.md
  • my-first-blog[separator]header.html

...then your choice of _ vs ~ as a separator will determine whether the file-to-include is listed after or before the primary file. This can actually be pretty helpful just in keeping your file list browsable/readable for development/editing.

This is obviously dependent on the OS and program displaying the file list, so the developer would choose the separator based on how it behaves in their environment.
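
As one data point for the environment dependence, here is how the two example files order under Ruby's default string comparison, which compares bytes. The `listing` helper is hypothetical; file managers and locale-aware tools may order punctuation differently.

```ruby
# Sorts the main file and its include using Ruby's default byte-wise
# String comparison, for a given separator character.
def listing(separator)
  ["my-first-blog.md", "my-first-blog#{separator}header.html"].sort
end

p listing("-") # ["my-first-blog-header.html", "my-first-blog.md"]
p listing("_") # ["my-first-blog.md", "my-first-blog_header.html"]
p listing("~") # ["my-first-blog.md", "my-first-blog~header.html"]
```

Under this particular ordering, `-` sorts before the `.` of the main file's extension, while `_` and `~` both sort after it. A listing that ignores or reweights punctuation can give a different result, so the separator would have to be chosen per environment.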

@skullface

👋 Bump from the dead — I would love to see this implemented. I want to use a versioned .html file from an npm package as an include, and I can’t do so as-is because of the automatic @ character in a scoped package’s name.

Using the escape filter doesn’t work, because such a character is not allowed. The @ symbol is the point of failure here.
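
A small sketch of why a scoped package path fails: it lists the characters in a path that fall outside the `[\w/\.-]` class quoted earlier in the thread. The path and the helper name `rejected_characters` are hypothetical, chosen only to illustrate the `@` case.

```ruby
# Returns the distinct characters in the path that the character class
# [\w/.-] does not allow.
def rejected_characters(path)
  path.each_char.reject { |char| char.match?(%r{[\w/.-]}) }.uniq
end

# Hypothetical path to an HTML file inside a scoped npm package.
p rejected_characters("node_modules/@scope/package/widget.html") # ["@"]
p rejected_characters("includes/sidebar--left.html")             # []
```

Everything else in the scoped path is already allowed; the `@` alone causes the rejection.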

@ashmaroli
Member

Superseded by #8618

ashmaroli closed this Aug 25, 2021
jekyll locked and limited conversation to collaborators Aug 25, 2022
Development

Successfully merging this pull request may close these issues.

{% include %} is unnecessarily restrictive of file path characters