Background: we currently index licenses at the individual blob level, using FOSSology's nomossa. The recent work on metadata indexing by @vlorentz shows, among other things, that quickly indexing the most recently visited snapshot of all our origins is feasible, so we can complement file-level license indexing with project-level license indexing. (We already do some of this for projects that declare licenses in metadata files, but that leaves out a lot of projects.)
As a start, we can run GitHub's licensee on suitable revisions and associate its results with directories, commits, or snapshots (whichever is appropriate), similarly to what we do for metadata.
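To make the proposal concrete, here is a rough sketch of what such an indexer step might look like. It is not an existing indexer: `ProjectLicense`, `parse_licensee_output`, and `detect_project_license` are hypothetical names, the revision is assumed to be already checked out to a local path, and the JSON shape (a top-level `licenses` list whose entries carry an `spdx_id`) is an assumption about what `licensee detect --json` prints and should be checked against the installed version.

```python
import json
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProjectLicense:
    """Hypothetical record linking an archived object to detected licenses."""

    target_id: str       # identifier of the directory/commit/snapshot scanned
    spdx_ids: List[str]  # SPDX identifiers reported by the detector
    tool: str            # name of the tool that produced the result


def parse_licensee_output(target_id: str, raw_json: str) -> ProjectLicense:
    """Build a record from licensee's JSON report.

    Assumption: the report has a top-level "licenses" list whose items
    contain an "spdx_id" key. Missing keys are tolerated.
    """
    report = json.loads(raw_json)
    spdx_ids = sorted(
        {lic["spdx_id"] for lic in report.get("licenses", []) if lic.get("spdx_id")}
    )
    return ProjectLicense(target_id=target_id, spdx_ids=spdx_ids, tool="licensee")


def detect_project_license(target_id: str, checkout_path: str) -> ProjectLicense:
    """Run licensee on a checked-out tree and parse its report.

    Assumption: the `licensee` CLI is installed and on PATH, and exits
    with status 0 when it produces a report.
    """
    proc = subprocess.run(
        ["licensee", "detect", "--json", checkout_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_licensee_output(target_id, proc.stdout)
```

Keeping the parsing separate from the subprocess call means the mapping from detector output to stored record can be tested without licensee installed, and another detector could be swapped in later behind the same record type.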