Page MenuHomeSoftware Heritage

varnish: Define vhost with forbidden access
ClosedPublic

Authored by ardumont on Jan 14 2021, 2:17 PM.

Details

Summary

This defined a vhost which explicitely refuses access when an unknown vhost is
detected.

This will allow to fix the current caveat, access to "hedgedoc" (default vhost) from
http(s)://swh-rproxy3.inria.fr or from the associated ip.

Manually deployed on rp1.internal.admin.swh.network and this seems to do the job.

The other vhosts are still responding appropriately.

Related to T2962

Test Plan

vagrant provision admin-rp1

Refuse access with:

curl --insecure https://10.168.50.20
<!DOCTYPE html>
<html>
  <head>
    <title>403 Forbidden access to unknown vhost 10.168.50.20</title>
  </head>
  <body>
    <h1>Error 403 Forbidden access to unknown vhost 10.168.50.20</h1>
    <p>Forbidden access to unknown vhost 10.168.50.20</p>
    <h3>Guru Meditation:</h3>
    <p>XID: 32789</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>

same goes for http://swh-rproxy3.inria.fr...

with local /etc/hosts tampered with:

...
10.168.50.10 bardo.internal.admin.swh.network
10.168.50.20 hedgedoc.softwareheritage.org swh-rproxy3.inria.fr toto
...

Diff Detail

Repository
rSPSITE puppet-swh-site
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.

Thanks!

However, I think we really want the inverse logic: let's make this a fallthrough on all varnishes for /all/ hostnames that the reverse proxy does not recognize.

I suggest removing the if and putting this vcl in the base varnish profile, at the lowest priority.

I guess the 403 error code is appropriate.

However, I think we really want the inverse logic: let's make this a fallthrough on all varnishes for /all/ hostnames that the reverse proxy does not recognize.

IIUR, I think that's my first implementation... [1]

And after trying it out, it worked. It think only because there is currently only 1
vhost (hedgedoc so far) behind that reverse proxy.

I believe that would not work with multiple vhosts behind the reverse proxy.

Because the first vhost failing the conditional would stop all subsequent checks.

Something like, first vhost try out the conditional on the req.http.host, fails, so the
else kicks in -> 403. Then no other vhosts has the occasion to test its condition.

[1]

sub vcl_recv {
    if (
        req.http.host ~ "^(?i)hedgedoc\.softwareheritage\.org(:[0-9]+)?$"
    ) {
        if (std.port(server.ip) == 80) {
            set req.http.x-redir = "https://" + req.http.host + req.url;
            return(synth(850, "Moved permanently"));
        } else {
            if (req.http.upgrade ~ "(?i)websocket") {
                return (pipe);
            }
            set req.http.X-Forwarded-Proto = "https";
            set req.backend_hint = hedgedoc;
        }
    } else {
        return (synth(403));
    }

Yeah, I understand; we should be able to do the following:

  • as early as possible in vcl_recv, define a new local variable (which gets initialized to false):
sub vcl_recv {
    declare local var.known_vhost BOOL;
}
  • in all known vhosts, set the variable to true:
sub vcl_recv {
    if (...) {
        set var.known_vhost = true;
    }
}
  • add a low priority snippet at the end of vcl_recv, synthesizing a 403 if the vhost is unknown:
sub vcl_recv {
    if (!var.known_vhost) {
        return(synth(403, "Unknown vhost " + req.http.host));
    }
}

Yeah, I understand; we should be able to do the following:

  • as early as possible in vcl_recv, define a new local variable (which gets initialized to false):
sub vcl_recv {
    declare local var.known_vhost BOOL;
}
  • in all known vhosts, set the variable to true:
sub vcl_recv {
    if (...) {
        set var.known_vhost = true;
    }
}
  • add a low priority snippet at the end of vcl_recv, synthesizing a 403 if the vhost is unknown:
sub vcl_recv {
    if (!var.known_vhost) {
        return(synth(403, "Unknown vhost " + req.http.host));
    }
}

It did not occur to me to check varnish's dsl.
That sounds more resilient indeed.
It should also cover more cases than this current diff.

i'll have a stab at it to change this diff implementation.

Thanks for the heads up and the suggestion ;)

As a first tryout, i'm not able to make it work, see the following from a tryout (with vagrant):

Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]: Message from VCC-compiler:
Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]: Symbol not found.
Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]: ('/etc/varnish/includes/11_known_vhost_initialize.vcl' Line 9 Pos 5)
Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]:     declare local var.known_vhost BOOL;
Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]: ----#######----------------------------
Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]: Running VCC-compiler failed, exited with 2

Looking for documentation about this, i found references of this in [1]
But that seems to be about fastly not varnish [2]

[1] https://developer.fastly.com/reference/vcl/variables/#user-defined-variables

[2] https://developer.fastly.com/reference/vcl/

@vsellier suggested to use a request header instead of the variable.
Which also sounds like it could work ;)

Rework according to review and exchanges in irc.

This is now using a X-Known-Vhost header pattern

This sets a X-Known-Vhost header to Yes when encountering a known vhost. This header is
removed prior to hitting the backend.

This introduces an unknown-vhost rules which checks for the existence of the
X-Known-Vhost, if not found, this refuses the access with a 403.

Note: It's not using a variable as tryouts did not work.

ardumont edited the summary of this revision. (Show Details)

Cool!

You'll need to scrub that header at the beginning of vcl_recv too, or we would let clients pass it through to us and bypass the feature.

I also think that we want the removal of the header to happen at the end of vcl_recv (before passing the request on to the backend) rather than in vcl_deliver (which will still pass the header to the backend: vcl_deliver happens on the response sent by the backend).

All in all it might be easier to actually use the var vmod ! :D

...

All in all it might be easier to actually use the var vmod ! :D

agreed on what you said btw.

I'll have a look closely to the var vmod [1]

[1] https://github.com/varnish/varnish-modules/blob/master/src/vmod_var.vcc

Use vmod var instead of mutating an header. It works (vagrant test again :).

This requires the installation of the varnish-modules debian packages (fortunately
already packaged! :)

This revision is now accepted and ready to land.Jan 25 2021, 10:10 AM