Page MenuHomeSoftware Heritage

Add LibreJS compliance for swh-web
ClosedPublic

Authored by anlambert on Feb 18 2019, 5:56 PM.

Details

Summary

This diff enables to browse swh-web with its JavaScript features enabled while
using the LibreJS Firefox plugin (https://www.gnu.org/software/librejs/index.html).

The compliance is obtained through the following:

  • Generate a WebLabels HTML page through the help of a homemade webpack plugin referencing all the generated JavaScript assets we distribute. For each asset, all bundled source files in it are referenced along with their license and a link to their non-minified source code.
  • Slightly modify some Django templates to state the license of scripts directly included in the HTML.

Regarding the link to non-minified source files, I preferred to copy them to our web
server and serve them directly from there. Linking to external locations is also feasible
but there is currently no generic method to compute the relevant urls (some are easy to
compute but some are not, especially for not maintained npm modules). So in order to be
fully compliant with LibreJS specification, this is currently the best method to use.
In the future, we should be able to link directly to the swh archive using the source files
checksums, but this requires npm packages to be loaded in a regular basis into it which
is currently not the case.

That diff introduces a draft webpack plugin to ease the generation of the WebLabels page.
There is still a lot of improvements to add to it but I think it could be useful to other
FOSS web applications using webpack to generate static assets. It could be moved to a
dedicated repository in our forge and published on npm once it gets to a release state.

Related T1512

Test Plan

Not sure how I can test this.

Diff Detail

Repository
rDWAPPS Web applications
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Great work, thanks! I'm stopping by just to comment on this:

In the future, we should be able to link directly to the swh archive using the source files
checksums, but this requires npm packages to be loaded in a regular basis into it which
is currently not the case.

Even when this will be possible, I don't think it would be advisable, because it adds a dependency on the archive content in order to be in compliance with the Web Labels spec.
While it is true that, for the main archive, once we are in compliance we will remain so, it will not necessarily be the case for independent deployments of the webapp.

What are the magnet_links for?

We are missing the appropriate copyright notices for each library we use. For instance, the only copright info assigned to core-js is that it is an "Expat license", which is linked to http://www.jclark.com/xml/copying.txt .
However, the copyright notice must contain "Copyright (c) 2014-2019 Denis Pushkarev", not "Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd". This is an explicit requirement of the Expat license ("The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software." and many more licenses).

I also don't think it's a good idea to link to external websites, these links might rot. What about serving each project's original license files along the source files (which also solves the issue of copyright notices)?

You use node_modules as a literal at multiple places; could you use resolve.modules from the webpack config instead?

swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
116

Also, I don't fully understand what WebLabels are, but aren't them just the data in /jslicenses/? If so, it would make more sense to rename this folder to jssources or something similar.

182

I don't know about the prod config, but with the debug web server, the folder is /static/weblabels/, not /weblabels/. Would '../weblabels/' + srcFileData['id'] work?

255

Why 0R (zero ah) instead of OR (oh ah)? This seems to crash spdxParse.

260–261

spdxParse does not always return a dict with a license attribute. eg. when the licenseStr is 'MIT OR GPL-3.0', it will return:

{ left: { license: 'MIT' },
  conjunction: 'or',
  right: { license: 'GPL-3.0' } }
271

This probably would better be recursive, as the path comes from a config option: , {"recursive": true}

swh/web/assets/config/webpack.config.development.js
365

This is not related to this Diff, but I'm just noticing this file.

We bundle it in our code, so we must provide its license as well. (I know the current upstream doesn't do it either, but they probably should do it too.)

This revision now requires changes to proceed.Feb 19 2019, 2:33 PM

What are the magnet_links for?

They can be used to declare the license of scripts embedded in HTML page (see section 3.2.2 https://www.gnu.org/software/librejs/free-your-javascript.html).
I imported them in case we need to modify the script code processed by webpack on the fly when compiling assets (I tried this approach first but it did not work well).

We are missing the appropriate copyright notices for each library we use. For instance, the only copright info assigned to core-js is that it is an "Expat license", which is linked to http://www.jclark.com/xml/copying.txt .
However, the copyright notice must contain "Copyright (c) 2014-2019 Denis Pushkarev", not "Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd". This is an explicit requirement of the Expat license ("The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software." and many more licenses).

I also don't think it's a good idea to link to external websites, these links might rot. What about serving each project's original license files along the source files (which also solves the issue of copyright notices)?

LibreJS documentation recommends to use these urls for linking to licenses: https://www.gnu.org/software/librejs/manual/html_node/Free-Licenses-Detection.html#Free-Licenses-Detection.
I agree that linking to license files bundled in node packages is better. For the record, I tried to do that last week but the LibreJS plugin did not recognize the licenses when changing the urls and was blocking the swh-web scripts again.
I must have missed something as I can find some examples using other links as those recommended by LibreJS doc (https://guix.mdc-berlin.de/javascript).
I will make another try on it.

You use node_modules as a literal at multiple places; could you use resolve.modules from the webpack config instead?

I know, there is still too many harcoded paths in the current plugin implementation, the plan is to remove them.

swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
116

Yep, this seems a better name indeed

182

I considered that this path is relative to the root static folder.
This is for use with your web framework of choice who knows its location.
For instance with Django, using {% static <static_url> %} in a template works as expected.

255

whoopsie

260–261

Yes I know, this is currently not handled

271

ack

swh/web/assets/config/webpack.config.development.js
365

The license text is available from the WebLabels page. I can also copy it in source folder but a license header directly in the source file seems better.

swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
260–261

If we don't need it, you can drop the OR-joining above and check the root item is a leaf (so we don't silently ignore an AND).

swh/web/assets/config/webpack.config.development.js
365

I can also copy it in source folder but a license header directly in the source file seems better.

The GPL requires the full license: "and give all recipients a copy of this License along with the Program" https://www.gnu.org/licenses/gpl-3.0.html#section4

anlambert added inline comments.
swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
182

Ok so the best option here is to use the following:
srcFileData['static_url'] = stats.publicPath + 'weblabels/' + srcFileData['id'];

It enables to get the absolute url of the source file even when using webpack-dev-server.
So no need to use the Django template tag static anymore.

The same mechanism must be used to compute the absolute url of each generated asset.

260–261

As I said, I did not handle multiple licenses for the moment. I intend to do it once the single license case is properly handled (I need to find some examples first).

swh/web/assets/config/webpack.config.development.js
365

Ack, I will make the changes in master then rebase.

anlambert added inline comments.
swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
271

The recursive options is only available with node >= 10 but the node version available in stretch-backports is 8.11.1.
So I will have to handle the recursive mkdir myself as in lines 275-282 below and create a dedicated method.

First update to that diff addressing most of vlorentz comments plus other improvements:

  • Rename output directory from 'weblabels' to 'jssources' and allow to set it via the options provided to the plugin
  • Copy original license files to ouput directory and add links to display them in the weblabels page
  • Generate absolute urls for each file to serve from the weblabels page
  • Fix some weblabels generation issues when using webpack-dev-server
  • Use the path module from node for manipulating paths
  • Some code cleanup and factorization

Next steps:

  • Add proper error handling for missing licenses and licenses not supported by LibreJS
  • Handle the multiple licenses case

Second update to that Diff:

  • Handle source files with multiple licenses
  • Add log messages when a license is missing or does not have a LibreJS counterpart
  • Allow to generate either a plain HTML Web Labels page or a JSON file to be used with a template processor
  • Add validation of options provided to the plugin
  • Add a README.md documenting the plugin
  • Code cleanup
swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
304

This assumes that the left subtree is a leaf.

I understand it is hard to fully support spdx expressions.
It's probably best to give up when the expression is anything more than a list of licenses with OR operators, to keep the code readable.

anlambert added inline comments.
swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
304

You're right! I would prefer to properly handle those multiple license cases so I will improve that before landing the diff. I have an idea about how to do it right.

anlambert added inline comments.
swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
304

Thinking it back, I do not want to implement complex processing regarding the combination operators of licenses at the moment. The LibreJS specification does not handle that currently and simply says that if a js file has multiple licenses, enumerating all of them is sufficient. So I will simply extract all references licenses in the SPDX expression and add them to the Web Labels page.

Update:

  • Ensure all licenses get extracted for complex spdx expressions
  • Simplify multiple licenses display in Web Labels table
  • cosmetic improvements in HTML templates
swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
304

You should just check the operator is an OR. If it is not, it makes the plugin write incorrect information, so it's better to crash, or at least print a warning.

anlambert added inline comments.
swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
304

Considering that frontend node modules with complex spdx license expressions are pretty rare, I will leave it like it for the moment and add a TODO to properly handle complex license expressions once LibreJS will have a spec for it.

Update:

  • detail spdxToWebLabelsLicenses processing and its limitations
  • add a TODO to improve multiple licenses combination processing in the future
swh/web/assets/config/webpack-plugins/generate-weblabels-webpack-plugin/index.js
304

I still think we should print a warning in case of something that's not an OR, it's only three lines.

Last update: Add warning when encountering a SPDX license expression containing an AND operator

Update: check 'AND' presence instead of ' AND ' in spdx expression

This revision is now accepted and ready to land.Feb 28 2019, 5:39 PM
This revision was automatically updated to reflect the committed changes.