Details

Reviewers

olasd

Group Reviewers

System administrators

Maniphest Tasks

T2962: hedgedoc: Fix irrelevant access to hedgedoc instance

Commits

rSPSITE6702a957bc5c: varnish: Refuse access to unknown vhost

Summary

This defined a vhost which explicitely refuses access when an unknown vhost is
detected.

This will allow to fix the current caveat, access to "hedgedoc" (default vhost) from
http(s)://swh-rproxy3.inria.fr or from the associated ip.

Manually deployed on rp1.internal.admin.swh.network and this seems to do the job.

The other vhosts are still responding appropriately.

Related to T2962

Test Plan

vagrant provision admin-rp1

Refuse access with:

curl --insecure https://10.168.50.20
<!DOCTYPE html>
<html>
  <head>
    <title>403 Forbidden access to unknown vhost 10.168.50.20</title>
  </head>
  <body>
    <h1>Error 403 Forbidden access to unknown vhost 10.168.50.20</h1>
    <p>Forbidden access to unknown vhost 10.168.50.20</p>
    <h3>Guru Meditation:</h3>
    <p>XID: 32789</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>

same goes for http://swh-rproxy3.inria.fr...

with local /etc/hosts tampered with:

...
10.168.50.10 bardo.internal.admin.swh.network
10.168.50.20 hedgedoc.softwareheritage.org swh-rproxy3.inria.fr toto
...

Diff Detail

Repository

rSPSITE puppet-swh-site

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont requested review of this revision.Jan 14 2021, 2:17 PM

ardumont created this revision.

Harbormaster completed remote builds in B18368: Diff 17224.Jan 14 2021, 2:17 PM

ardumont mentioned this in T2962: hedgedoc: Fix irrelevant access to hedgedoc instance.Jan 14 2021, 2:19 PM

Thanks!

However, I think we really want the inverse logic: let's make this a fallthrough on all varnishes for /all/ hostnames that the reverse proxy does not recognize.

I suggest removing the if and putting this vcl in the base varnish profile, at the lowest priority.

I guess the 403 error code is appropriate.

However, I think we really want the inverse logic: let's make this a fallthrough on all varnishes for /all/ hostnames that the reverse proxy does not recognize.

IIUR, I think that's my first implementation... [1]

And after trying it out, it worked. It think only because there is currently only 1
vhost (hedgedoc so far) behind that reverse proxy.

I believe that would not work with multiple vhosts behind the reverse proxy.

Because the first vhost failing the conditional would stop all subsequent checks.

Something like, first vhost try out the conditional on the req.http.host, fails, so the
else kicks in -> 403. Then no other vhosts has the occasion to test its condition.

[1]

sub vcl_recv {
    if (
        req.http.host ~ "^(?i)hedgedoc\.softwareheritage\.org(:[0-9]+)?$"
    ) {
        if (std.port(server.ip) == 80) {
            set req.http.x-redir = "https://" + req.http.host + req.url;
            return(synth(850, "Moved permanently"));
        } else {
            if (req.http.upgrade ~ "(?i)websocket") {
                return (pipe);
            }
            set req.http.X-Forwarded-Proto = "https";
            set req.backend_hint = hedgedoc;
        }
    } else {
        return (synth(403));
    }

Yeah, I understand; we should be able to do the following:

as early as possible in vcl_recv, define a new local variable (which gets initialized to false):

sub vcl_recv {
    declare local var.known_vhost BOOL;
}

in all known vhosts, set the variable to true:

sub vcl_recv {
    if (...) {
        set var.known_vhost = true;
    }
}

add a low priority snippet at the end of vcl_recv, synthesizing a 403 if the vhost is unknown:

sub vcl_recv {
    if (!var.known_vhost) {
        return(synth(403, "Unknown vhost " + req.http.host));
    }
}

Yeah, I understand; we should be able to do the following:

as early as possible in vcl_recv, define a new local variable (which gets initialized to false):
sub vcl_recv {
    declare local var.known_vhost BOOL;
}
in all known vhosts, set the variable to true:
sub vcl_recv {
    if (...) {
        set var.known_vhost = true;
    }
}
add a low priority snippet at the end of vcl_recv, synthesizing a 403 if the vhost is unknown:
sub vcl_recv {
    if (!var.known_vhost) {
        return(synth(403, "Unknown vhost " + req.http.host));
    }
}

It did not occur to me to check varnish's dsl.
That sounds more resilient indeed.
It should also cover more cases than this current diff.

i'll have a stab at it to change this diff implementation.

Thanks for the heads up and the suggestion ;)

As a first tryout, i'm not able to make it work, see the following from a tryout (with vagrant):

Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]: Message from VCC-compiler:
Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]: Symbol not found.
Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]: ('/etc/varnish/includes/11_known_vhost_initialize.vcl' Line 9 Pos 5)
Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]:     declare local var.known_vhost BOOL;
Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]: ----#######----------------------------
Jan 14 18:07:58 rp1.internal.admin.swh.network varnishd[2722]: Running VCC-compiler failed, exited with 2

Looking for documentation about this, i found references of this in [1]
But that seems to be about fastly not varnish [2]

[1] https://developer.fastly.com/reference/vcl/variables/#user-defined-variables

[2] https://developer.fastly.com/reference/vcl/

@vsellier suggested to use a request header instead of the variable.
Which also sounds like it could work ;)

Rework according to review and exchanges in irc.

This is now using a X-Known-Vhost header pattern

This sets a X-Known-Vhost header to Yes when encountering a known vhost. This header is
removed prior to hitting the backend.

This introduces an unknown-vhost rules which checks for the existence of the
X-Known-Vhost, if not found, this refuses the access with a 403.

Note: It's not using a variable as tryouts did not work.

Harbormaster completed remote builds in B18385: Diff 17243.Jan 15 2021, 9:58 AM

ardumont edited the test plan for this revision. (Show Details)Jan 15 2021, 9:59 AM

ardumont edited the summary of this revision. (Show Details)

Amend commit message

Harbormaster completed remote builds in B18386: Diff 17244.Jan 15 2021, 10:04 AM

Cool!

You'll need to scrub that header at the beginning of vcl_recv too, or we would let clients pass it through to us and bypass the feature.

I also think that we want the removal of the header to happen at the end of vcl_recv (before passing the request on to the backend) rather than in vcl_deliver (which will still pass the header to the backend: vcl_deliver happens on the response sent by the backend).

All in all it might be easier to actually use the var vmod ! :D

...