
Iashim [internetarchive shim for http://box/archive with NGINX]#2120

Closed
georgejhunt wants to merge 4 commits into iiab:master from georgejhunt:iashim

@georgejhunt
Contributor

We missed translating Apache to NGINX for internetarchive; this PR adds it.
This still fails because internetarchive uses Node, which is currently configured to listen on IPv6, but we've disabled IPv6 on IIAB.
The configuration that needs changing may be in dweb, but I cannot find it.

@holta changed the title from "Iashim" to "Iashim [internetarchive shim]" on Jan 8, 2020
@holta added this to the 7.1 milestone on Jan 8, 2020
@holta added the bug label and removed the enhancement label on Jan 8, 2020
src: /etc/apache2/sites-available/internetarchive.conf
path: /etc/apache2/sites-enabled/internetarchive.conf
state: link
- name: Install nginx config for nternetarchive for short URL http://box/archive (if debuntu and internetarchive_enabled)
Member


Suggested change
- name: Install nginx config for nternetarchive for short URL http://box/archive (if debuntu and internetarchive_enabled)
- name: Install nginx config for internetarchive for short URL http://box/archive (if debuntu and internetarchive_enabled)

- name: Restart to enable/disable http://box/archive (not just http://box:{{ internetarchive_port }})
systemd:
name: "{{ apache_service }}" # httpd or apache2
name: nginx # httpd or apache2
Member


Suggested change
name: nginx # httpd or apache2
name: nginx

@holta
Member

holta commented Jan 9, 2020

@mitra42 can we ask you to please have your Node web server listen on IPv4?

Let us know if you have any questions! Then we can merge this or similar.

@holta changed the title from "Iashim [internetarchive shim]" to "Iashim [internetarchive shim for NGINX]" on Jan 9, 2020
@holta changed the title from "Iashim [internetarchive shim for NGINX]" to "Iashim [internetarchive shim for http://box/archive with NGINX]" on Jan 9, 2020
@mitra42
Contributor

mitra42 commented Jan 9, 2020

Hi @holta - no idea what this is about. I've certainly not intentionally constrained anything to listen on IPv6. The servers in all other configurations run standalone, i.e. not through Apache or Nginx, and are certainly listening on IPv4 (no way I would remember their IPv6 address to key it in!).

I'm actually surprised that a proxypass works, since dweb-mirror can redirect URLs itself. But it must, as AFAIK IA is working on IIAB (I only have one RPI4 and it's set up with a different configuration at the moment). Anyway, it should at least get you the first page; problems, if there were any, would come after that point.

But ... as I said, I'm definitely not doing anything about using IPv6.

@holta
Member

holta commented Jan 9, 2020

@georgejhunt can you clarify a bit more what you need from @mitra42?

Thanks!

@mitra42
Contributor

mitra42 commented Jan 9, 2020

@georgejhunt you should be able to confirm what I'm saying by just pointing a browser at http://[IPv4-address-of-yourbox]:4244 from any other machine.

@georgejhunt
Contributor Author

georgejhunt commented Jan 10, 2020 via email

@mitra42
Contributor

mitra42 commented Jan 10, 2020

There's some discussion of this in nodejs/node#18041, regarding a change in Node's default behavior, which suggests a change to this line in mirrorHttp, in dweb-mirror:

const server = app.listen(config.apps.http.port);

The problem is that I'm running a different setup and have no way to know whether any change I made actually fixed your problem.

Editing that line and then doing service internetarchive restart should work fine.
I'd try
const server = app.listen(config.apps.http.port, "127.0.0.1");
or
const server = app.listen(config.apps.http.port, "0.0.0.0");

If you find a combination that works for your setup, I can look at testing it in other setups, and will put it in the config if that's needed.
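
A minimal sketch of the two bindings suggested above, using Node's built-in http module rather than dweb-mirror's actual code. Express's app.listen() forwards its arguments to http.Server's listen(), so the host argument should behave the same way there. Per the Node documentation, when the host is omitted the server listens on the unspecified IPv6 address (::) if IPv6 is available, and on 0.0.0.0 otherwise. The helper name listenOn and the request handler are hypothetical, for illustration only.

```javascript
const http = require("http");

// Hypothetical helper: start a tiny HTTP server bound to an explicit host.
// "127.0.0.1" accepts only local connections (enough behind a local web server);
// "0.0.0.0" accepts IPv4 connections on every interface.
function listenOn(host, port) {
  return new Promise((resolve, reject) => {
    const server = http.createServer((req, res) => {
      res.end("ok\n");
    });
    server.once("error", reject);
    // Passing the host explicitly avoids the default, which prefers IPv6 (::).
    server.listen(port, host, () => resolve(server));
  });
}
```

For example, listenOn("0.0.0.0", 4244) resolves with a server whose address() reports an IPv4 address, which is one way to confirm which binding actually took effect.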

@georgejhunt
Contributor Author

georgejhunt commented Jan 11, 2020 via email

@mitra42
Contributor

mitra42 commented Jan 11, 2020

I don't actually know what URL would work for proxypass on the new version; in fact I'm surprised it worked on the old version! The server uses redirects, so it's most comfortable when it's addressed directly, e.g. having nginx redirect to https://:4244.

I remember we struggled with this when we first got internet archive working on IIAB, but that was a while back. (I thought the problem was that under Apache with the rewrite module turned off (the previous IIAB config), and unlike under Nginx, you couldn't figure out what the IP address of your host was.)

Anyway, the proxypass line you've given won't do what I think you are trying to achieve, because I believe in nginx you do a "rewrite" of the URL and then the proxypass. For example, I had this block for a different service:

    location /ipfs {
        rewrite ^/ipfs/(.*)     /$1 break;
        proxy_set_header        Host $host;
        proxy_set_header        X-Real-IP $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header        X-Forwarded-Proto $scheme;
        proxy_pass              http://localhost:8080/;
        proxy_read_timeout      600;
    }

But as I said, I do not think proxypass will work at all: once the browser has the code, it's going to try to access URLs that won't work (like '/details/foo').
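
To illustrate the concern, here is a small sketch of how a browser resolves a root-relative URL such as '/details/foo', using the standard URL class available in both Node and browsers. The hostname box and the page paths are illustrative assumptions, and resolveLink is a hypothetical helper. The point is only that a path starting with '/' is resolved against the origin, so the /archive prefix is lost when the page is served through a proxy under http://box/archive/.

```javascript
// Resolve a link the way a browser would: relative to the page that contains it.
function resolveLink(link, pageUrl) {
  return new URL(link, pageUrl).href;
}

// Page served via a proxy under /archive/: the root-relative link escapes the prefix,
// so the request goes to the front-end web server, not to the archive server.
const viaProxy = resolveLink("/details/foo", "http://box/archive/");

// Page served directly from the archive server on port 4244: the same link stays there.
const direct = resolveLink("/details/foo", "http://box:4244/archive.html");

console.log(viaProxy); // http://box/details/foo
console.log(direct);   // http://box:4244/details/foo
```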

My best guess is that you want a block like:

location /archive {
   rewrite ^/archive/(.*) http://$host:4244/$1 redirect;
}

Once it gets there, you should see a redirect to ... http://<IP.OF.RPI>:4244/archive.html?mirror=<IP.OF.RPI>%3A4244&transport=HTTP&identifier=local;

If that doesn't work then I may have to replicate your setup to figure this out. Can you give me (abbreviated) instructions to build the same setup that is failing for you? (I have a RPI4 I'll use for this.)

@holta
Member

holta commented Jan 15, 2020

@mitra42 fyi IIAB 7.1's release was deferred until February (likely the 2nd half of February) so that NGINX and similar infra are much more solid and clear for all.

That means we have many weeks ahead of us to get things unstuck here.

But still give me a shout if you and @georgejhunt need a short call to accelerate this work, by early/mid-February at the latest.

@georgejhunt
Contributor Author

georgejhunt commented Jan 15, 2020 via email

@mitra42
Contributor

mitra42 commented Jan 15, 2020

Ok - I'm thinking it should be a redirect, from /archive/xxx to http://$host:4244/xxx

@mitra42
Contributor

mitra42 commented Jan 30, 2020

Hi @georgejhunt -
Anyway .... if we are moving to nginx, why not add something like ....

location /archive/ {
    rewrite ^(/archive/(.*)$ http://$host:4244/$1 permanent;
}

I'd propose it as a PR, but I've got one of these boxes up and am more than a little confused by having both Apache and nginx on it! I can't figure out how they are related: what handles what, and how they hand off between them. In particular, I can only find an Apache rule set for archive, and it doesn't look like it is being used.

@georgejhunt
Contributor Author

georgejhunt commented Jan 30, 2020 via email

@tim-moody
Contributor

I would have taken out the first '('. Doesn't $1 refer to the 2nd parens?

@mitra42
Contributor

mitra42 commented Jan 30, 2020

Arggh - sorry, I cut and pasted another rule and mis-edited it.

location /archive/ {
    rewrite ^/archive/(.*)$ http://$host:4244/$1 permanent;
}

Should work....
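
As a rough check of what this rule does, the sketch below applies the same regular expression in JavaScript. It only approximates nginx (which uses PCRE and, by default, appends the original query string to the replacement), the host name box stands in for $host, and archiveRedirect is a hypothetical helper. It also shows that capture groups are numbered by the position of their opening parenthesis, so in a pattern with an extra outer group, $1 is the outer group and $2 the inner one.

```javascript
// Approximate the nginx rule:
//   rewrite ^/archive/(.*)$ http://$host:4244/$1 permanent;
// Returns the redirect target, or null when the path does not match.
function archiveRedirect(path, host) {
  const m = /^\/archive\/(.*)$/.exec(path);
  return m ? `http://${host}:4244/${m[1]}` : null;
}

console.log(archiveRedirect("/archive/details/foo", "box")); // http://box:4244/details/foo
console.log(archiveRedirect("/kalite/", "box"));             // null

// Groups are numbered by their opening parenthesis, left to right.
const nested = /^(\/archive\/(.*))$/.exec("/archive/details/foo");
console.log(nested[1]); // /archive/details/foo
console.log(nested[2]); // details/foo
```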

Note the other half of the question was that I couldn't figure out where this rule goes, so I couldn't test it, given that both Apache and Nginx appear to be running in the current version (presumably because the switchover is half done).

@holta
Member

holta commented Jan 30, 2020

> Note the other half of the question was that I couldn't figure out where this rule goes, so I couldn't test it, given that both Apache and Nginx appear to be running in the current version (presumably because the switchover is half done).

Yeah please read https://github.com/iiab/iiab/blob/master/roles/nginx/README.md so that you understand why both Apache and NGINX are on.

Hit us back @mitra42 if you have any questions. Thanks! Obviously this isn't the final word (as the doc above explains, some apps/services cannot easily be made to run under NGINX yet, at this point and possibly for the coming year or so!)

@mitra42
Contributor

mitra42 commented Jan 30, 2020

Thanks for the pointer - answered most of my questions.

Point 2 doesn't parse: is FastCGI not available on nginx, or on Apache? I see it in the config for mediawiki. I'm also not sure what "validates Ngix" means: does it mean it validates that app for NGINX, or is it a typo meaning "Invalidates for NGINX"?

List (iv) is really services that run their own HTTP server (neither Apache nor NGINX), i.e. they don't need handling in nginx, just forwarding (a redirect for internetarchive; not sure which for kalite).

@holta
Member

holta commented Jan 30, 2020

> Point 2 doesn't parse: is FastCGI not available on nginx, or on Apache? I see it in the config for mediawiki. I'm also not sure what "validates Ngix" means: does it mean it validates that app for NGINX, or is it a typo meaning "Invalidates for NGINX"?

@georgejhunt I think you wrote that sentence, so hopefully you can explain more?

> List (iv) is really services that run their own HTTP server (neither Apache nor NGINX), i.e. they don't need handling in nginx, just forwarding (a redirect for internetarchive; not sure which for kalite).

Exactly. The point is that many (not all, of course) educators and users out there are very fond of mnemonics like http://box/archive and http://box/kalite, if we can make these possible. They give very rapid access to that day's lesson, without students (or anyone) getting overly distracted surfing around into other topics/apps off of IIAB's main page, e.g. http://box

@mitra42
Contributor

mitra42 commented Jan 30, 2020

Understood - and it should be easier with nginx because you have the $host variable, which makes redirects to the same host MUCH easier. With the rule above, http://box/archive/details/foo would redirect to http://box:4244/details/foo, which would work and access its own internal URLs correctly (which it won't do after a proxypass).

You can do this on Apache, but IIAB was running with the rewrite module disabled, which disables access to that variable.
